Reverse geocoding is one of HERE Technologies' most heavily used services. Geocodes can be extracted from its access logs and counted with respect to some cellulation of the earth, creating a sparse heat map. For our Place & Address Search products, we use such a heat map to define a notion of relative place importance with which to rank and index addresses and places. However, the large data size, the sparsity, and the variations in traffic from which different global heat maps may be derived make faithful visualization and comparison a challenge. Additionally, common implementations of the spatial image processing techniques that can help address these challenges don’t map directly onto Spark’s computing engine.
In this talk, Arvind Rao will describe implementations of histogram equalization and kernel-based sparse image processing methods on Spark. Histogram equalization, best known as a method of contrast enhancement, automatically normalizes images, which facilitates comparison. Along the way, Arvind will talk about how HERE uses heat maps as a feature in its autocompletion service, and say just enough about the perception of contrast to put histogram equalization in context.
Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao
1. Arvind Rao, HERE Technologies
@cwcomplex
Spatial Processing of Global Heat Maps with Apache Spark
#EUds13
2. Agenda
• Definition of the heat map
– How it's built from access logs
• Contrast enhancement
– Description of histogram equalization
• Kernel-based image processing
– Gaussian blur, motivated by noise removal
5. Heat Map is a Spatial Histogram
• The earth is partitioned into cells
– Defined by fixed latitude & longitude increments
• Reverse geocodes within each cell are summed
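The binning described on this slide can be sketched locally in plain Python. This is only an illustration of the idea, not the pipeline used in the talk: the cell size `CELL_DEG`, the helper names `cell_index` and `build_heat_map`, and the sample coordinates are all assumptions made for the example. On Spark, the same logic would roughly correspond to deriving cell indices per log row and aggregating by cell.

```python
import math
from collections import Counter

# Hypothetical cell size in degrees; the slides do not state the increment used.
CELL_DEG = 0.5

def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to integer (row, col) indices of its cell."""
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return (row, col)

def build_heat_map(geocodes, cell_deg=CELL_DEG):
    """Count geocodes per cell. Cells with no hits are simply absent (sparse)."""
    return Counter(cell_index(lat, lon, cell_deg) for lat, lon in geocodes)

# Made-up sample: three points near Berlin, one near Paris.
geocodes = [(52.52, 13.40), (52.53, 13.41), (48.85, 2.35), (52.90, 13.10)]
heat_map = build_heat_map(geocodes)
```

Only cells that received at least one reverse geocode appear in `heat_map`, which is what makes the resulting histogram sparse.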
8. Definition & Data
• Not a conventional image
– Not acquired by a camera or sensor
– ~ 6.5 trillion tiles
• High Resolution
– For perspective, a 4K display has ~8 million pixels
– By comparison, the heat map is ~1 million times more detailed
9. How is it used in HERE Search?
19. Cumulative Distribution Function
• Input: Row(latitude, longitude, density)
• Group by on density
– Histogram: Row(density, count)
• Collect locally & accumulate CDF
– CDF: Array of (density, count)
• Broadcast of CDF Array
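A minimal local sketch of the histogram-to-CDF step follows, in plain Python rather than Spark. The function name `build_cdf` and the sample densities are assumptions for illustration, and normalizing the cumulative counts to the range (0, 1] is also an assumption, since the slide only says the CDF is accumulated. The histogram over distinct density values is typically small compared with the heat map itself, which is presumably why it can be collected and broadcast.

```python
from collections import Counter
from itertools import accumulate

def build_cdf(densities):
    """Return a sorted list of (density, cumulative fraction) pairs."""
    # "Group by on density": one (density, count) pair per distinct density.
    hist = sorted(Counter(densities).items())
    total = sum(count for _, count in hist)
    # "Accumulate CDF": running sum of counts, normalized by the total.
    running = accumulate(count for _, count in hist)
    return [(density, r / total) for (density, _), r in zip(hist, running)]

# Made-up cell densities: mostly small values with one hot spot.
densities = [1, 1, 1, 1, 2, 2, 5, 40]
cdf = build_cdf(densities)
```

In a Spark job the resulting array would be the object that gets broadcast to the executors.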
20. HE Implementation
Binary search finds the closest left bin to the given density in the CDF.
Then linearly interpolate between the left and right bins.
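The lookup described above could look roughly like the following sketch, which uses Python's standard `bisect` module for the binary search. The function name `equalize`, the hard-coded example CDF, and the clamping behaviour for densities outside the CDF's range are assumptions made for this example, not details confirmed by the slides.

```python
from bisect import bisect_right

def equalize(density, cdf):
    """Map a density to its equalized value using a sorted (density, cdf) array."""
    xs = [d for d, _ in cdf]
    # Binary search: index of the closest bin at or to the left of `density`.
    i = bisect_right(xs, density) - 1
    if i < 0:
        return cdf[0][1]    # assumption: clamp below the smallest bin
    if i == len(cdf) - 1:
        return cdf[-1][1]   # assumption: clamp at or above the largest bin
    (x0, y0), (x1, y1) = cdf[i], cdf[i + 1]
    # Linear interpolation between the left and right bins.
    return y0 + (y1 - y0) * (density - x0) / (x1 - x0)

# Example CDF as (density, cumulative fraction) pairs; values are made up.
cdf = [(1, 0.5), (2, 0.75), (5, 0.875), (40, 1.0)]
equalized = [equalize(d, cdf) for d in (1, 2, 3.5, 40)]
```

Because the CDF array is read-only during the lookup, each density can be transformed independently, which fits a row-wise map over the heat map.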
21. Some Things To Know
• May increase the contrast of background noise while decreasing the usable signal
• HE is often most useful for scientific images, such as thermal, satellite, or X-ray images
22. Spatial Processing
• Transform each image pixel by a function of its neighborhood's pixel values
• Examples of kernel-based methods:
– Mean, median, Gaussian, etc.
– Also useful in edge detection & segmentation
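As a small illustration of what a kernel is in this setting, the sketch below builds a normalized Gaussian kernel as a mapping from neighbor offsets to weights. The function name `gaussian_kernel` and the default `radius` and `sigma` values are assumptions chosen for the example; the slides do not specify the kernel size used.

```python
import math

def gaussian_kernel(radius=1, sigma=1.0):
    """Return {(d_row, d_col): weight} for a (2*radius+1)^2 Gaussian kernel."""
    weights = {
        (dr, dc): math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma))
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
    }
    total = sum(weights.values())
    # Normalize so the weights sum to 1 and blurring preserves total mass.
    return {offset: w / total for offset, w in weights.items()}

kernel = gaussian_kernel()
```

A mean filter would use equal weights over the same offsets, while a median filter is not a weighted sum and would need the neighborhood values themselves.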
25. Spatial Processing
Functional approach: process each pixel independently.
Sketch of Algorithm:
1. Explode heat map to DataFrame of pointers from new pixel indices to all non-null neighbors in the original image
2. Map over exploded DataFrame to apply kernel
3. GroupBy on pixel indices (latitude & longitude)
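The three steps above can be sketched locally in plain Python, with a dict standing in for the sparse heat map and lists standing in for DataFrames. This is an illustrative reading of the algorithm, not the talk's Spark code: the names `gaussian_kernel` and `blur_sparse`, the 3x3 kernel, and the sample pixel are assumptions, and wrap-around at the edges of the latitude and longitude range is ignored.

```python
import math
from collections import defaultdict

def gaussian_kernel(radius=1, sigma=1.0):
    """Normalized Gaussian weights keyed by (d_row, d_col) offset."""
    w = {(dr, dc): math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma))
         for dr in range(-radius, radius + 1)
         for dc in range(-radius, radius + 1)}
    total = sum(w.values())
    return {off: v / total for off, v in w.items()}

def blur_sparse(heat_map, kernel):
    """Apply a kernel to a sparse image given as {(row, col): value}."""
    # Step 1: explode. Each non-null pixel emits one row per kernel offset,
    # pointing from a target pixel index to this pixel as its neighbor.
    exploded = [((r - dr, c - dc), (dr, dc), value)
                for (r, c), value in heat_map.items()
                for (dr, dc) in kernel]
    # Step 2: map. Weight each neighbor value by the kernel at its offset.
    weighted = [(target, kernel[offset] * value)
                for target, offset, value in exploded]
    # Step 3: group by target pixel index and sum the contributions.
    out = defaultdict(float)
    for target, contribution in weighted:
        out[target] += contribution
    return dict(out)

kernel = gaussian_kernel()
blurred = blur_sparse({(10, 10): 4.0}, kernel)
```

Only pixel indices with at least one non-null neighbor within the kernel's reach ever appear in the output, so the empty part of the image is never materialized.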
29. Spatial Processing
Think of each row in the exploded DataFrame as a pointer from a pixel index in the image domain to one of its non-null neighbors in the input image.
Sparsity guarantees that a pixel index exists in the exploded DataFrame iff it has a non-null neighbor.