SlideShare a Scribd company logo
1 of 34
Download to read offline
Arvind Rao, HERE Technologies
@cwcomplex
Spatial Processing of
Global Heat Maps with
Apache Spark
#EUds13
Agenda
• Definition of the heat map
– How its built from access logs
• Contrast enhancement
– Description of histogram equalization
• Kernel based image processing
– Gaussian blur. Motivated by noise removal
2@cwcomplex #EUds13
3@cwcomplex #EUds13
4@cwcomplex #EUds13
Heat Map is a Spatial Histogram
• The earth is partitioned into cells
– Defined by fixed latitude & longitude increments
• Reverse geocodes within each cell are summed
5@cwcomplex #EUds13
Aggregation of Request Coordinates
6@cwcomplex #EUds13
Caveat: Cell Areas Vary by Latitude
7@cwcomplex #EUds13
India
Europe
Definition & Data
• Not a conventional image
– Not acquired by a camera or sensor
– ~ 6.5 trillion tiles
• High Resolution
– For perspective a 4k display has ~ 8 million pixels
– By comparison the heatmap is ~ 1 million times more
detailed
8@cwcomplex #EUds13
How is it used in HERE Search
9@cwcomplex #EUds13
The Histogram is SUPER Skewed
10@cwcomplex #EUds13
The Histogram is SUPER Skewed
11@cwcomplex #EUds13
Loose Description of Histogram
Equalization (HE)
@cwcomplex #EUds13 12
Loose Description of Histogram
Equalization (HE)
Literally mapping data to percentile rank.
@cwcomplex #EUds13 13
Loose Description of HE
𝑙 ↦ 𝑃(𝐿 < 𝑙)×𝑁
Max Pixel Value
@cwcomplex #EUds13 14
Loose Description of HE
T
@cwcomplex #EUds13 15
Log Scaled vs. Histogram Normalized
@cwcomplex #EUds13 16
HE – Well- Defined Comparison
@cwcomplex #EUds13 17
HE Implementation
18@cwcomplex #EUds13
Heat Map DF
Row(latitude, longitude,
count)
Map Row(latitude, longitude, density)
Row(latitude, longitude,
CDF(density) × N)
Map
Cumulative Distribution Function
19@cwcomplex #EUds13
Row(latitude, longitude, density)
Histogram
Row(density, count)
group by
on density
collect locally &
accumulate CDF
Array of (density, count)
CDF
Broadcast of CDF Array
HE Implementation
Binary search finds closest left bin to given
density in CDF.
Then linear interpolate
between left and right
bins
20@cwcomplex #EUds13
Some Things To Know
• May increase the contrast of background noise,
while decreasing the usable signal
• HE often best in scientific images, like thermal,
satellite, or X-rays
@cwcomplex #EUds13 21
Spatial Processing
• Transform image pixel by a function of neighborhood
pixel values
• Examples of kernel based methods:
– Mean, median, Gaussian, etc.
– Also useful in edge-detection & segmentation
22@cwcomplex #EUds13
Spatial Processing
23@cwcomplex #EUds13
neighborhood
pixel of interest
Imperative solution:
Slide kernel window
over image
• Iterate pixel by pixel
• Each pixel knows its
neighborhood.
Spatial Processing – Kernel for Blur
Gaussian kernel with 𝜎 = 1,
𝐻 𝑥, 𝑦 =
1
273
1 4 7
4 16 26
4 1
16 4
7 26 41
4
1
16
4
26
7
26 7
16
4
4
1
Transformed image,
𝐼7(𝑥, 𝑦) = 8 𝐻(Δ𝑥,Δ𝑦)𝐼(𝑥 + Δ𝑥, 𝑦 + Δ𝑦)
;<,;=
24@cwcomplex #EUds13
Implemented with
a GroupBy
kernel application
Implemented with
a GroupBy
kernel application
Spatial Processing
Functional approach: process each pixel
independently.
Sketch of Algorithm:
1. Explode heat map to DataFrame of pointers from
new pixel indices to all non-null neighbors in the
original image
2. Map over exploded DataFrame to apply kernel
3. GroupBy on pixel indices (latitude & longitude)
25@cwcomplex #EUds13
Spatial Processing
26@cwcomplex #EUds13
Exploding heat map
DataFrame gives all
indices of pixels in
the transformed
image.
Spatial Processing
27@cwcomplex #EUds13
The smoothed image is larger
and less sparse.
Spatial Processing
28@cwcomplex #EUds13
Being a
neighbor is
reciprocal.
Spatial Processing
29@cwcomplex #EUds13
Should think of rows in the exploded
DataFrame as a pointer from any pixel
index in the image domain to its non-null
neighbor from input image.
Sparsity guarantees that a
pixel index exists in exploded
DataFrame iff it has a non-null
neighbor.
Spatial Processing
30@cwcomplex #EUds13
𝚫x 𝚫𝐲 nb-x nb-y originating
value
0 0 34.23 120.56 120
1 -1 34.24 120.55 120
2 -2 34.25 120.54 120
Spatial Processing
Sketch of Algorithm
1. Explode heat map to DataFrame of pointers
from new pixel indices to all non-null neighbors
in the original image.
2. Map over exploded DataFrame to apply kernel.
3. GroupBy on pixel indices (latitude & longitude)
31@cwcomplex #EUds13
Frankfurt – Without & With Blurring
32@cwcomplex #EUds13
Berlin – Without & With Blurring
33@cwcomplex #EUds13
Arvind Rao, HERE Technologies
@cwcomplex
Take Care!
#EUds13

More Related Content

What's hot

Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Spark Summit
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Somnath Mazumdar
 
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
Anis Nasir
 

What's hot (20)

A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
Tricks of the trade with Ted Malaska
Tricks of the trade with Ted MalaskaTricks of the trade with Ted Malaska
Tricks of the trade with Ted Malaska
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Making Nested Columns as First Citizen in Apache Spark SQL
Making Nested Columns as First Citizen in Apache Spark SQLMaking Nested Columns as First Citizen in Apache Spark SQL
Making Nested Columns as First Citizen in Apache Spark SQL
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis -  Massimo PeriniDeep Stream Dynamic Graph Analytics with Grapharis -  Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
 
Federated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation TherapyFederated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation Therapy
 
How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te...
How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te...How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te...
How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te...
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDeep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
 
The Power of Both Choices: Practical Load Balancing for Distributed Stream Pr...
The Power of Both Choices: Practical Load Balancing for Distributed Stream Pr...The Power of Both Choices: Practical Load Balancing for Distributed Stream Pr...
The Power of Both Choices: Practical Load Balancing for Distributed Stream Pr...
 
Hadoop analytics provisioning based on a virtual infrastructure
Hadoop analytics provisioning based on a virtual infrastructureHadoop analytics provisioning based on a virtual infrastructure
Hadoop analytics provisioning based on a virtual infrastructure
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
 

Viewers also liked

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Spark Summit
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Spark Summit
 

Viewers also liked (18)

Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
 
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
 
Story Deduplication and Mutation with Antoine Amend and Andrew Morgan
Story Deduplication and Mutation with Antoine Amend and Andrew MorganStory Deduplication and Mutation with Antoine Amend and Andrew Morgan
Story Deduplication and Mutation with Antoine Amend and Andrew Morgan
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 
Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and ...
Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and ...Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and ...
Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and ...
 
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
 
Low Touch Machine Learning with Leah McGuire (Salesforce)
Low Touch Machine Learning with Leah McGuire (Salesforce)Low Touch Machine Learning with Leah McGuire (Salesforce)
Low Touch Machine Learning with Leah McGuire (Salesforce)
 
Building Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William BentonBuilding Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William Benton
 
Feature Hashing for Scalable Machine Learning with Nick Pentreath
Feature Hashing for Scalable Machine Learning with Nick PentreathFeature Hashing for Scalable Machine Learning with Nick Pentreath
Feature Hashing for Scalable Machine Learning with Nick Pentreath
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Art of Feature Engineering for Data Science with Nabeel Sarwar
Art of Feature Engineering for Data Science with Nabeel SarwarArt of Feature Engineering for Data Science with Nabeel Sarwar
Art of Feature Engineering for Data Science with Nabeel Sarwar
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 

Similar to Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao

Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
Aritra Sarkar
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
milad abbasi
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
inside-BigData.com
 

Similar to Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao (20)

Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
 
IMQA Poster
IMQA PosterIMQA Poster
IMQA Poster
 
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Caustic Object Construction Based on Multiple Caustic Patterns
Caustic Object Construction Based on Multiple Caustic PatternsCaustic Object Construction Based on Multiple Caustic Patterns
Caustic Object Construction Based on Multiple Caustic Patterns
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
HS Demo
HS DemoHS Demo
HS Demo
 
Temporal Superpixels Based on Proximity-Weighted Patch Matching
Temporal Superpixels Based on Proximity-Weighted Patch MatchingTemporal Superpixels Based on Proximity-Weighted Patch Matching
Temporal Superpixels Based on Proximity-Weighted Patch Matching
 
CG MODULE 1 (S6 CSE).pptx
CG MODULE 1 (S6 CSE).pptxCG MODULE 1 (S6 CSE).pptx
CG MODULE 1 (S6 CSE).pptx
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
 
Near Exascale Computing in the Cloud
Near Exascale Computing in the CloudNear Exascale Computing in the Cloud
Near Exascale Computing in the Cloud
 
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupDTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Spark Summit
 
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
Spark Summit
 

More from Spark Summit (20)

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
 
Variant-Apache Spark for Bioinformatics with Piotr Szul
Variant-Apache Spark for Bioinformatics with Piotr SzulVariant-Apache Spark for Bioinformatics with Piotr Szul
Variant-Apache Spark for Bioinformatics with Piotr Szul
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene Pang
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
 
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...
 

Recently uploaded

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 

Recently uploaded (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao

  • 1. Arvind Rao, HERE Technologies @cwcomplex Spatial Processing of Global Heat Maps with Apache Spark #EUds13
  • 2. Agenda • Definition of the heat map – How its built from access logs • Contrast enhancement – Description of histogram equalization • Kernel based image processing – Gaussian blur. Motivated by noise removal 2@cwcomplex #EUds13
  • 5. Heat Map is a Spatial Histogram • The earth is partitioned into cells – Defined by fixed latitude & longitude increments • Reverse geocodes within each cell are summed 5@cwcomplex #EUds13
  • 6. Aggregation of Request Coordinates 6@cwcomplex #EUds13
  • 7. Caveat: Cell Areas Vary by Latitude 7@cwcomplex #EUds13 India Europe
  • 8. Definition & Data • Not a conventional image – Not acquired by a camera or sensor – ~ 6.5 trillion tiles • High Resolution – For perspective a 4k display has ~ 8 million pixels – By comparison the heatmap is ~ 1 million times more detailed 8@cwcomplex #EUds13
  • 9. How is it used in HERE Search 9@cwcomplex #EUds13
  • 10. The Histogram is SUPER Skewed 10@cwcomplex #EUds13
  • 11. The Histogram is SUPER Skewed 11@cwcomplex #EUds13
  • 12. Loose Description of Histogram Equalization (HE) @cwcomplex #EUds13 12
  • 13. Loose Description of Histogram Equalization (HE) Literally mapping data to percentile rank. @cwcomplex #EUds13 13
  • 14. Loose Description of HE 𝑙 ↦ 𝑃(𝐿 < 𝑙)×𝑁 Max Pixel Value @cwcomplex #EUds13 14
  • 15. Loose Description of HE T @cwcomplex #EUds13 15
  • 16. Log Scaled vs. Histogram Normalized @cwcomplex #EUds13 16
  • 17. HE – Well- Defined Comparison @cwcomplex #EUds13 17
  • 18. HE Implementation 18@cwcomplex #EUds13 Heat Map DF Row(latitude, longitude, count) Map Row(latitude, longitude, density) Row(latitude, longitude, CDF(density) × N) Map
  • 19. Cumulative Distribution Function 19@cwcomplex #EUds13 Row(latitude, longitude, density) Histogram Row(density, count) group by on density collect locally & accumulate CDF Array of (density, count) CDF Broadcast of CDF Array
  • 20. HE Implementation Binary search finds closest left bin to given density in CDF. Then linear interpolate between left and right bins 20@cwcomplex #EUds13
  • 21. Some Things To Know • May increase the contrast of background noise, while decreasing the usable signal • HE often best in scientific images, like thermal, satellite, or X-rays @cwcomplex #EUds13 21
  • 22. Spatial Processing • Transform image pixel by a function of neighborhood pixel values • Examples of kernel based methods: – Mean, median, Gaussian, etc. – Also useful in edge-detection & segmentation 22@cwcomplex #EUds13
  • 23. Spatial Processing 23@cwcomplex #EUds13 neighborhood pixel of interest Imperative solution: Slide kernel window over image • Iterate pixel by pixel • Each pixel knows its neighborhood.
  • 24. Spatial Processing – Kernel for Blur Gaussian kernel with 𝜎 = 1, 𝐻 𝑥, 𝑦 = 1 273 1 4 7 4 16 26 4 1 16 4 7 26 41 4 1 16 4 26 7 26 7 16 4 4 1 Transformed image, 𝐼7(𝑥, 𝑦) = 8 𝐻(Δ𝑥,Δ𝑦)𝐼(𝑥 + Δ𝑥, 𝑦 + Δ𝑦) ;<,;= 24@cwcomplex #EUds13 Implemented with a GroupBy kernel application Implemented with a GroupBy kernel application
  • 25. Spatial Processing Functional approach: process each pixel independently. Sketch of Algorithm: 1. Explode heat map to DataFrame of pointers from new pixel indices to all non-null neighbors in the original image 2. Map over exploded DataFrame to apply kernel 3. GroupBy on pixel indices (latitude & longitude) 25@cwcomplex #EUds13
  • 26. Spatial Processing 26@cwcomplex #EUds13 Exploding heat map DataFrame gives all indices of pixels in the transformed image.
  • 27. Spatial Processing 27@cwcomplex #EUds13 The smoothed image is larger and less sparse.
  • 29. Spatial Processing 29@cwcomplex #EUds13 Should think of rows in the exploded DataFrame as a pointer from any pixel index in the image domain to its non-null neighbor from input image. Sparsity guarantees that a pixel index exists in exploded DataFrame iff it has a non-null neighbor.
  • 30. Spatial Processing 30@cwcomplex #EUds13 𝚫x 𝚫𝐲 nb-x nb-y originating value 0 0 34.23 120.56 120 1 -1 34.24 120.55 120 2 -2 34.25 120.54 120
  • 31. Spatial Processing Sketch of Algorithm 1. Explode heat map to DataFrame of pointers from new pixel indices to all non-null neighbors in the original image. 2. Map over exploded DataFrame to apply kernel. 3. GroupBy on pixel indices (latitude & longitude) 31@cwcomplex #EUds13
  • 32. Frankfurt – Without & With Blurring 32@cwcomplex #EUds13
  • 33. Berlin – Without & With Blurring 33@cwcomplex #EUds13
  • 34. Arvind Rao, HERE Technologies @cwcomplex Take Care! #EUds13