Deep Learning on Apache® Spark™: Workflows and Best Practices
Tim Hunter (Software Engineer)
Jules S. Damji (Spark Community Evangelist)
May 4, 2017
Agenda
• Logistics
• Databricks Overview
• Deep Learning on Apache® Spark™: Workflows and Best Practices
• Q & A
Logistics
• We can’t hear you…
• Recording will be available...
• Slides and Notebooks will be available...
• Queue up questions…
• Orange Button for Tech Support difficulties...
VISION
Empower anyone to innovate faster with big data.
WHO WE ARE
Founded by the creators of Apache Spark.
Contributes 75% of the open source code, 10x more than any other company.
PRODUCT
A data processing platform for data scientists, data engineers, and data analysts that simplifies data integration, real-time experimentation, machine learning, and deployment of production pipelines.
A New Paradigm
FIRST GENERATION: Data warehouses
ETL process is rigid, scaling out is expensive, limited to SQL
SECOND GENERATION: Hadoop + data lake
Hard to centralize data and extract value with disparate tools
THE BEST OF BOTH WORLDS: Virtual analytics
• Holistically analyze data from data warehouses, data lakes, and other data stores
• Utilize a single engine for batch, ML, streaming & real-time queries
• Enable enterprise-wide collaboration
VIRTUAL ANALYTICS PLATFORM
• CLUSTER TUNING & MANAGEMENT
• INTERACTIVE WORKSPACE
• PRODUCTION PIPELINE AUTOMATION
• OPTIMIZED DATA ACCESS
• DATABRICKS ENTERPRISE SECURITY
YOUR TEAMS: Data Science, Data Engineering, BI Analysts, many others…
YOUR DATA: Cloud Storage, Data Warehouses, Data Lake
Deep Learning on Apache® Spark™: Workflows and Best Practices
Tim Hunter (Software Engineer)
May 4, 2017
About Me
• Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user
• Contributor to MLlib
• Author of TensorFrames and GraphFrames
Deep Learning and Apache Spark
Deep Learning frameworks w/ Spark bindings
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• MXNet
• Paddle
• TensorFlow (TensorFlowOnSpark, TensorFrames)
Extensions to Spark for specialized hardware
• Blaze (UCLA & Falcon Computing Solutions)
• IBM Conductor with Spark
Native Spark
• BigDL
• DeepDist
• DeepLearning4J
• MLlib
• SparkCL
• SparkNet
Deep Learning and Apache Spark
2016: the year of emerging solutions for Spark + Deep Learning
No consensus
• Many approaches for libraries: integrate existing ones with Spark, build on top of Spark, modify Spark itself
• Official Spark MLlib support is limited (perceptron-like networks)
One Framework to Rule Them All?
Should we look for The One Deep Learning Framework?
Databricks’ perspective
• Databricks: hosted Spark platform on public cloud
• GPUs for compute-intensive workloads
• Customers use many Deep Learning frameworks: TensorFlow, MXNet, BigDL, Theano, Caffe, and more
This talk
• Lessons learned from supporting many Deep Learning frameworks
• Multiple ways to integrate Deep Learning & Spark
• Best practices for these integrations
Outline
• Deep Learning in data pipelines
• Recurring patterns in Spark + Deep Learning integrations
• Developer tips
• Monitoring
ML is a small part of data pipelines.
Hidden technical debt in Machine Learning systems
Sculley et al., NIPS 2015
DL in a data pipeline: Training
Data collection → ETL → Featurization → Deep Learning → Validation → Export, Serving
Stage labels: compute intensive, IO intensive, IO intensive
Cluster labels: large cluster (high memory/CPU ratio), small cluster (low memory/CPU ratio)
DL in a data pipeline: Transformation
Specialized data transforms: feature extraction & prediction
[Figure: Input → Output, with example output labels cat, dog, dog. Image: Saulius Garalevicius, CC BY-SA 3.0]
Outline
• Deep Learning in data pipelines
• Recurring patterns in Spark + Deep Learning integrations
• Developer tips
• Monitoring
Recurring patterns
Spark as a scheduler
• Data-parallel tasks
• Data stored outside Spark
Embedded Deep Learning transforms
• Data-parallel tasks
• Data stored in DataFrames/RDDs
Cooperative frameworks
• Multiple passes over data
• Heavy and/or specialized communication
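The "embedded Deep Learning transforms" pattern can be sketched as a per-partition function. This is only an illustration under stated assumptions, not the approach of any particular framework: `load_model`, `batched`, and `predict_partition` are hypothetical names, and the stand-in "model" simply returns the length of each record so the sketch runs without a DL library.

```python
from itertools import islice


def load_model():
    """Hypothetical stand-in for loading a trained network (assumption)."""
    # A real implementation would deserialize weights here; this toy
    # "model" labels each record with its length so the sketch is runnable.
    return lambda batch: [len(record) for record in batch]


def batched(iterator, batch_size):
    """Yield lists of up to batch_size items from an iterator."""
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def predict_partition(records, batch_size=2):
    """Apply the model to one partition, loading it once per partition."""
    model = load_model()
    for batch in batched(records, batch_size):
        for prediction in model(batch):
            yield prediction
```

With PySpark, a function of this shape could be passed to `RDD.mapPartitions`, for example `images.mapPartitions(predict_partition)`, so that the model is loaded once per partition rather than once per record.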
Streaming data through DL
Primary storage choices:
• Cold layer (HDFS/S3/etc.)
• Local storage: files, Spark’s on-disk persistence layer
• In memory: Spark RDDs or Spark DataFrames
Find out if you are I/O-constrained or processor-constrained
• How big is your dataset? MNIST or ImageNet?
If using PySpark:
• All frameworks heavily optimized for disk I/O
• Use Spark’s broadcast for small datasets that fit in memory
• Reading files is fast: use local files when the dataset does not fit in memory
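One rough way to find out whether a stage is I/O-constrained or processor-constrained is to time loading and processing separately. The sketch below is an assumption-laden illustration: `profile_stage` and `classify` are hypothetical helpers, the measurement is plain wall-clock time, and it ignores any overlap between loading and compute (for example prefetching).

```python
import time


def profile_stage(load_batch, process_batch, n_batches):
    """Return (io_seconds, compute_seconds) accumulated over n_batches."""
    io_seconds = 0.0
    compute_seconds = 0.0
    for index in range(n_batches):
        start = time.perf_counter()
        batch = load_batch(index)      # e.g. read files from local disk
        loaded = time.perf_counter()
        process_batch(batch)           # e.g. run the DL framework on the batch
        done = time.perf_counter()
        io_seconds += loaded - start
        compute_seconds += done - loaded
    return io_seconds, compute_seconds


def classify(io_seconds, compute_seconds):
    """Label the stage by whichever phase took longer."""
    if io_seconds > compute_seconds:
        return "io-constrained"
    return "processor-constrained"
```

Running `classify(*profile_stage(load, process, 100))` on a small sample of real batches gives a first indication before choosing between broadcast, local files, or cold storage.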
Cooperative frameworks
• Use Spark for data input
• Examples:
• IBM GPU efforts
• Skymind’s DeepLearning4J
• DistML and other Parameter Server efforts
[Diagram: an RDD (partitions 1 to n), a black box, and an RDD (partitions 1 to m)]
Cooperative frameworks
• Bypass Spark for asynchronous / specific communication patterns across machines
• Lose benefit of RDDs and DataFrames and reproducibility/determinism
• But these guarantees are not requested anyway when doing deep learning (stochastic gradient)
• “reproducibility is worth a factor of 2” (Leon Bottou, quoted by John Langford)
Outline
• Deep Learning in data pipelines
• Recurring patterns in Spark + Deep Learning integrations
• Developer tips
• Monitoring
The GPU software stack
• Deep Learning commonly used with GPUs
• A lot of work on Spark dependencies:
• Few dependencies on local machine when compiling Spark
• The build process works well in a large number of configurations (just Scala + Maven)
• GPUs present challenges: CUDA, support libraries, drivers, etc.
• Deep software stack, requires careful construction (hardware + drivers + CUDA + libraries)
• All these are expected by the user
• Turnkey stacks just starting to appear
• Provide a Docker image with all the GPU SDK
• Pre-install GPU drivers on the instance
The GPU software stack (top to bottom)
• Python / JVM clients
• Deep learning libraries (TensorFlow, etc.), JCUDA
• cuBLAS, cuDNN
• CUDA
• NV kernel driver (userspace interface)
• Linux kernel, NV kernel driver
• GPU hardware
Container: nvidia-docker, lxc, etc.
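Because the stack is deep, a quick discovery check on a worker can save debugging time. The sketch below is a hedged illustration: `check_gpu_stack` is a hypothetical helper, the short library names are assumptions about a typical install, and finding a library says nothing about whether its version matches the driver or the DL framework.

```python
import ctypes.util
import shutil


def check_gpu_stack(libraries=("cuda", "cudart", "cublas", "cudnn")):
    """Report which pieces of the GPU stack are discoverable on this machine."""
    # nvidia-smi ships with the NVIDIA driver's userspace tools.
    report = {"nvidia-smi": shutil.which("nvidia-smi") is not None}
    for name in libraries:
        # find_library returns None when the shared library is not found.
        report["lib" + name] = ctypes.util.find_library(name) is not None
    return report
```

Printing `check_gpu_stack()` inside a Spark task on each executor shows at a glance which layer is missing from an image or instance.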
Using GPUs through PySpark
• Popular choice for many independent tasks
• Many DL packages have Python interfaces: TensorFlow, Theano, Caffe, MXNet, etc.
• Lifetime of Python packages: the process
• Requires some configuration tweaks in Spark
PySpark recommendation
• spark.executor.cores = 1
• Gives the DL framework full access over all the resources
• Important for frameworks that optimize processor pipelines
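The recommended setting can be passed on the command line with `spark-submit --conf`. The sketch below only builds the argument list; `spark_submit_args` is a hypothetical helper and `train.py` is a placeholder script name, not something from the talk.

```python
def spark_submit_args(script, conf):
    """Build a spark-submit argument list from a dict of Spark properties."""
    args = ["spark-submit"]
    for key in sorted(conf):
        # Each property becomes one "--conf key=value" pair.
        args += ["--conf", "{}={}".format(key, conf[key])]
    args.append(script)
    return args


# One core per executor, as recommended on the slide.
dl_conf = {"spark.executor.cores": 1}
command = spark_submit_args("train.py", dl_conf)
```

The resulting list can be handed to `subprocess.run(command)` on a machine where `spark-submit` is on the PATH.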
Outline
• Deep Learning in data pipelines
• Recurring patterns in Spark + Deep Learning integrations
• Developer tips
• Monitoring
Monitoring
• How do you monitor the progress of your tasks?
• It depends on the granularity
• Around tasks
• Inside (long-running) tasks
Monitoring: Accumulators
• Good to check throughput or failure rate
• Works for Scala
• Limited use for Python (for now, SPARK-2868)
• No “real-time” update
# sc is the active SparkContext
# Counts processed batches; read batchesAcc.value on the driver after the job
batchesAcc = sc.accumulator(0)
def processBatch(i):
    # Tasks can only add to an accumulator, not read it
    batchesAcc.add(1)
    # Process image batch here
images = sc.parallelize(…)
images.map(processBatch).collect()
Monitoring: external system
• Plugs into an external system
• Existing solutions: Grafana, Graphite, Prometheus, etc.
• Most flexible, but more complex to deploy
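As one possible illustration of the external-system approach, a long-running task could push metrics to Graphite using its plaintext protocol (one `path value timestamp` line per metric, by default on TCP port 2003). This is a minimal sketch under assumptions: `graphite_line` and `send_metric` are hypothetical helpers, and the metric path `dl.batches` is made up for the example.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext-protocol line."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


def send_metric(host, port, path, value):
    """Send one metric over TCP; host and port are deployment-specific."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(graphite_line(path, value).encode("ascii"))
```

Calling something like `send_metric(host, 2003, "dl.batches", count)` every few batches from inside a task gives progress updates while the task is still running, which accumulators do not provide.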
Conclusion
• Distributed deep learning: exciting and fast-moving space
• Most insights are specific to a task, a dataset and an algorithm: nothing replaces experiments
• Get started with data-parallel jobs
• Move to cooperative frameworks only when your data are too large.
Challenges to address
For Spark developers
• Monitoring long-running tasks
• Presenting and introspecting intermediate results
For DL developers
• What boundary to put between the algorithm and Spark?
• How to integrate with Spark at the low level?
Resources
Recent blog posts: http://databricks.com/blog
• TensorFrames
• GPU acceleration
• Getting started with Deep Learning
• Intel’s BigDL
Docs for Deep Learning on Databricks: http://docs.databricks.com
• Getting started
• Spark integration
SPARK SUMMIT 2017
DATA SCIENCE AND ENGINEERING AT SCALE
JUNE 5 – 7 | MOSCONE CENTER | SAN FRANCISCO
ORGANIZED BY spark-summit.org/2017
Thank You!
Questions?
Happy Sparking & Deep Learning!

Ā 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To Production
Ā 

Deep Learning Best Practices on Apache Spark

  • 1. Deep Learning on Apache® Spark™: Workflows and Best Practices Tim Hunter (Software Engineer) Jules S. Damji (Spark Community Evangelist) May 4, 2017
  • 2. Agenda • Logistics • Databricks Overview • Deep Learning on Apache® Spark™: Workflows and Best Practices • Q & A
  • 3. Logistics • We can’t hear you… • Recording will be available... • Slides and Notebooks will be available... • Queue up Questions …. • Orange Button for Tech Support difficulties...
  • 4. Empower anyone to innovate faster with big data. Founded by the creators of Apache Spark. Contributes 75% of the open source code, 10x more than any other company. VISION WHO WE ARE A data processing for data scientists, data engineers, and data analysts that simplifies that data integration, real-time experimentation, machine learning and deployment of production pipelines. PRODUCT
  • 5. A New Paradigm SECOND GENERATION THE BEST OF BOTH WORLDS Hadoop + data lake Hard to centralize data and extract value with disparate tools Virtual analytics • Holistically analyze data from data warehouses, data lakes, and other data stores • Utilize a single engine for batch, ML, streaming & real-time queries • Enable enterprise-wide collaboration + FIRST GENERATION Data warehouses ETL process is rigid, scaling out is expensive, limited to SQL
  • 6. CLUSTER TUNING & MANAGEMENT INTERACTIVE WORKSPACE PRODUCTION PIPELINE AUTOMATION OPTIMIZED DATA ACCESS DATABRICKS ENTERPRISE SECURITY YOUR TEAMS Data Science Data Engineering Many others… BI Analysts YOUR DATA Cloud Storage Data Warehouses Data Lake VIRTUAL ANALYTICS PLATFORM
  • 7. Deep Learning on Apache® Spark™: Workflows and Best Practices Tim Hunter (Software Engineer) May 4, 2017
  • 8. About Me • Tim Hunter • Software engineer @ Databricks • Ph.D. from UC Berkeley in Machine Learning • Very early Spark user • Contributor to MLlib • Author of TensorFrames and GraphFrames
  • 9. Deep Learning and Apache Spark Deep Learning frameworks w/ Spark bindings • Caffe (CaffeOnSpark) • Keras (Elephas) • MXNet • Paddle • TensorFlow (TensorFlowOnSpark, TensorFrames) Extensions to Spark for specialized hardware • Blaze (UCLA & Falcon Computing Solutions) • IBM Conductor with Spark Native Spark • BigDL • DeepDist • DeepLearning4J • MLlib • SparkCL • SparkNet
  • 10. Deep Learning and Apache Spark 2016: the year of emerging solutions for Spark + Deep Learning No consensus • Many approaches for libraries: integrate existing ones with Spark, build on top of Spark, modify Spark itself • Official Spark MLlib support is limited (perceptron-like networks)
  • 11. One Framework to Rule Them All? Should we look for The One Deep Learning Framework?
  • 12. Databricks’ perspective • Databricks: hosted Spark platform on public cloud • GPUs for compute-intensive workloads • Customers use many Deep Learning frameworks: TensorFlow, MXNet, BigDL, Theano, Caffe, and more This talk • Lessons learned from supporting many Deep Learning frameworks • Multiple ways to integrate Deep Learning & Spark • Best practices for these integrations
  • 13. Outline • Deep Learning in data pipelines • Recurring patterns in Spark + Deep Learning integrations • Developer tips • Monitoring
  • 14. Outline • Deep Learning in data pipelines • Recurring patterns in Spark + Deep Learning integrations • Developer tips • Monitoring
  • 15. ML is a small part of data pipelines. Hidden technical debt in Machine Learning systems Sculley et al., NIPS 2016
  • 16. DL in a data pipeline: Training Data collection ETL Featurization Deep Learning Validation Export, Serving compute intensive IO intensive IO intensive Large cluster High memory/CPU ratio Small cluster Low memory/CPU ratio
  • 17. DL in a data pipeline: Transformation Specialized data transforms: feature extraction & prediction Input Output cat dog dog Saulius Garalevicius - CC BY-SA 3.0
  • 18. Outline • Deep Learning in data pipelines • Recurring patterns in Spark + Deep Learning integrations • Developer tips • Monitoring
  • 19. Recurring patterns Spark as a scheduler • Data-parallel tasks • Data stored outside Spark Embedded Deep Learning transforms • Data-parallel tasks • Data stored in DataFrames/RDDs Cooperative frameworks • Multiple passes over data • Heavy and/or specialized communication
  • 20. Streaming data through DL Primary storage choices: • Cold layer (HDFS/S3/etc.) • Local storage: files, Spark’s on-disk persistence layer • In memory: Spark RDDs or Spark DataFrames Find out if you are I/O constrained or processor-constrained • How big is your dataset? MNIST or ImageNet? If using PySpark: • All frameworks heavily optimized for disk I/O • Use Spark’s broadcast for small datasets that fit in memory • Reading files is fast: use local files when it does not fit
  • 21. Cooperative frameworks • Use Spark for data input • Examples: • IBM GPU efforts • Skymind’s DeepLearning4J • DistML and other Parameter Server efforts RDD Partition 1 Partition n RDD Partition 1 Partition m Black box
  • 22. Cooperative frameworks • Bypass Spark for asynchronous / specific communication patterns across machines • Lose benefit of RDDs and DataFrames and reproducibility/determinism • But these guarantees are not requested anyway when doing deep learning (stochastic gradient) • “reproducibility is worth a factor of 2” (Leon Bottou, quoted by John Langford)
  • 23. Outline • Deep Learning in data pipelines • Recurring patterns in Spark + Deep Learning integrations • Developer tips • Monitoring
  • 24. The GPU software stack • Deep Learning commonly used with GPUs • A lot of work on Spark dependencies: • Few dependencies on local machine when compiling Spark • The build process works well in a large number of configurations (just scala + maven) • GPUs present challenges: CUDA, support libraries, drivers, etc. • Deep software stack, requires careful construction (hardware + drivers + CUDA + libraries) • All these are expected by the user • Turnkey stacks just starting to appear
  • 25. • Provide a Docker image with all the GPU SDK • Pre-install GPU drivers on the instance Container: nvidia-docker, lxc, etc. The GPU software stack GPU hardware Linux kernel NV Kernel driver CuBLAS CuDNN Deep learning libraries (Tensorflow, etc.) JCUDA Python / JVM clients CUDA NV kernel driver (userspace interface)
  • 26. Using GPUs through PySpark • Popular choice for many independent tasks • Many DL packages have Python interfaces: TensorFlow, Theano, Caffe, MXNet, etc. • Lifetime for python packages: the process • Requires some configuration tweaks in Spark
  • 27. PySpark recommendation • spark.executor.cores = 1 • Gives the DL framework full access over all the resources • Important for frameworks that optimize processor pipelines
  • 28. Outline • Deep Learning in data pipelines • Recurring patterns in Spark + Deep Learning integrations • Developer tips • Monitoring
  • 30. Monitoring • How do you monitor the progress of your tasks? • It depends on the granularity • Around tasks • Inside (long-running) tasks
  • 31. Monitoring: Accumulators • Good to check throughput or failure rate • Works for Scala • Limited use for Python (for now, SPARK-2868) • No “real-time” update
        # sc is an existing SparkContext
        batchesAcc = sc.accumulator(1)
        def processBatch(i):
            global batchesAcc
            batchesAcc += 1
            # Process image batch here
        images = sc.parallelize(…)
        images.map(processBatch).collect()
  • 32. Monitoring: external system • Plugs into an external system • Existing solutions: Grafana, Graphite, Prometheus, etc. • Most flexible, but more complex to deploy
  • 33. Conclusion • Distributed deep learning: exciting and fast-moving space • Most insights are specific to a task, a dataset and an algorithm: nothing replaces experiments • Get started with data-parallel jobs • Move to cooperative frameworks only when your data are too large.
  • 34. Challenges to address For Spark developers • Monitoring long-running tasks • Presenting and introspecting intermediate results For DL developers • What boundary to put between the algorithm and Spark? • How to integrate with Spark at the low-level?
  • 35. Resources Recent blog posts: http://databricks.com/blog • TensorFrames • GPU acceleration • Getting started with Deep Learning • Intel’s BigDL Docs for Deep Learning on Databricks: http://docs.databricks.com • Getting started • Spark integration
  • 36. SPARK SUMMIT 2017 DATA SCIENCE AND ENGINEERING AT SCALE JUNE 5 – 7 | MOSCONE CENTER | SAN FRANCISCO ORGANIZED BY spark-summit.org/2017
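
Slide 19 lists "embedded Deep Learning transforms" as data-parallel tasks over data held in DataFrames/RDDs. One way this pattern might look in PySpark is sketched below; it is not taken from the talk. The helper names (`batched`, `make_partition_fn`, `load_parity_model`) are hypothetical, and the "model" is a trivial stand-in so the example runs without Spark or a DL framework. The idea is to load the model once per partition and call it on small batches rather than once per row.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def make_partition_fn(load_model: Callable[[], Callable[[Sequence], Sequence]],
                      batch_size: int = 32):
    """Build a function that could be passed to rdd.mapPartitions (hypothetical helper)."""
    def predict_partition(rows: Iterable) -> Iterator:
        model = load_model()  # loaded once per partition, not once per row
        for batch in batched(rows, batch_size):
            yield from model(batch)  # one model call per batch
    return predict_partition


# Stand-in "model" for illustration only: labels integers by parity.
def load_parity_model():
    return lambda batch: ["even" if x % 2 == 0 else "odd" for x in batch]


predict = make_partition_fn(load_parity_model, batch_size=2)
# With a real RDD this would be: predictions = rdd.mapPartitions(predict)
print(list(predict([1, 2, 3])))  # ['odd', 'even', 'odd']
```

Because `predict` only consumes an iterator and yields results, it can be exercised on plain Python lists before being handed to Spark.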
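
Slide 27 recommends `spark.executor.cores = 1` so that the DL framework has the executor's resources to itself. A minimal sketch of passing that setting on the command line follows; the helper `to_submit_args` and the script name `train.py` are assumptions made for illustration, while `--conf key=value` is the standard `spark-submit` syntax.

```python
from typing import Dict, List

# The setting from slide 27; add other settings here as needed.
DL_CONF: Dict[str, str] = {
    "spark.executor.cores": "1",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit `--conf key=value` arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# train.py is a hypothetical application script.
print(" ".join(["spark-submit"] + to_submit_args(DL_CONF) + ["train.py"]))
# spark-submit --conf spark.executor.cores=1 train.py
```

With one core per executor and the default of one CPU per task, each executor process runs a single task at a time, which is what leaves the framework free to use the machine's resources as it sees fit.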
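
Slide 32 mentions plugging into an external system such as Graphite for monitoring inside long-running tasks. As a rough sketch, assuming a Graphite server that accepts the plaintext protocol (one `path value timestamp` line per metric, conventionally on TCP port 2003), a task could report progress as shown below. The metric path and the function names are made up for the example.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Send one metric over TCP; host and port depend on your deployment."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))


# Formatting only; no network connection is made here.
print(repr(graphite_line("dl.train.batches_done", 128, timestamp=1493856000)))
```

Calling `send_metric` every few batches from inside a task gives updates while the task is still running, which the accumulator approach on slide 31 does not.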