SlideShare a Scribd company logo
1 of 38
Download to read offline
Spark Technology
Center
Convolutional Neural Networks at
Scale in MLlib
Jeremy
Nixon
Spark Technology
Center
1. Machine Learning Engineer at the Spark
Technology Center
2. Contributor to MLlib, dedicated to
scalable deep learning.
3. Previously, studied Applied Mathematics
to Computer Science and Economics at
Harvard
Jeremy Nixon
Spark Technology
Center
Large Scale Data Processing
● In-memory compute
● Up to 100x faster than Hadoop
Improved Usability
● Rich APIs in Scala, Java, Python
● Interactive Shell
Spark Technology
Center
Spark’s Machine Learning Library
● Alternating Least Squares
● Lasso
● Ridge Regression
● Logistic Regression
● Decision Trees
● Naive Bayes
● SVMs
● …
MLlib
Spark Technology
Center
Part of Spark
● Integrated Data Analysis
●
Scalable
Python, Scala, Java APIs
MLlib
Spark Technology
Center
● Deep Learning benefits from large
datasets
● Spark allows for Large Scale Data
Analysis
● Compute is Local to Data
● Integrated into organization’s Spark
Jobs
● Leverages existing compute cluster
Deep Learning
in MLlib
Spark Technology
Center
Github Link:
https://github.com/JeremyNixon/sparkdl
Spark Package:
https://spark-packages.org/package/JeremyNixon/sparkdl
Links
Spark Technology
Center
1. Framing Deep Learning
2. MLlib Deep Learning API
3. Optimization
4. Performance
5. Future Work
6. Deep Learning Options on Spark
7. Deep Learning Outside of Spark
Structure
Spark Technology
Center
1. Structural Assumptions
2. Automated Feature Engineering
3. Learning Representations
4. Applications
Framing
Convolutional
Neural Networks
Spark Technology
Center
Structural
Assumptions:
Location
Invariance
- Convolution is a restriction on the
features that can be combined.
- Location Invariance leads to strong
accuracy in vision, audio, and
language.
colah.github.io
Spark Technology
Center
Structural
Assumptions:
Hierarchical
Abstraction
Spark Technology
Center
- Pixels - Edges - Shapes - Parts - Objects
- Learn features that are optimized for the
data
- Makes transfer learning feasible
Structural
Assumptions:
Hierarchical
Abstraction
Spark Technology
Center
- Character - Word - Phrase - Sentence
- Phonemes - Words
- Pixels - Edges - Shapes - Parts - Objects
Structural
Assumptions:
Composition
Spark Technology
Center
1. CNNs - State of the art
a. Object Recognition
b. Object Localization
c. Image Segmentation
d. Image Restoration
e. Music Recommendation
2. RNNs (LSTM) - State of the Art
a. Speech Recognition
b. Question Answering
c. Machine Translation
d. Text Summarization
e. Named Entity Recognition
f. Natural Language Generation
g. Word Sense Disambiguation
h. Image / Video Captioning
i. Sentiment Analysis
Applications
Spark Technology
Center
● Computationally Efficient
● Makes Transfer Learning Easy
● Takes advantage of location
invariance
Structural
Assumptions:
Weight Sharing
Spark Technology
Center
- Network depth creates an extraordinary
range of possible models.
- That flexibility creates value in large
datasets to reduce variance.
Structural
Assumptions:
Combinatorial
Flexibility
Spark Technology
Center
Automated
Feature
Engineering
- Feature hierarchy is too complex to engineer manually
- Works well for compositional structure, overfits elsewhere
Spark Technology
Center
Learning
Representations
Hidden Layer
+
Nonlinearity
http://colah.github.io/posts/2014-03-NN-Manifolds-To
pology/
Spark Technology
Center
Flexibility. High level enough to be efficient.
Low level enough to be expressive.
MLlib Flexible Deep
Learning API
Spark Technology
Center
Modularity enables Logistic Regression,
Feedforward Networks.
MLlib Flexible Deep
Learning API
Spark Technology
Center
Optimization
Modern optimizers allow for
more efficient, stable
training.
Momentum cancels noise in
the gradient.
Spark Technology
Center
Optimization
Modern optimizers allow for
more efficient, stable
training.
RMSProp automatically
adapts the learning rate.
Spark Technology
Center
Parallel implementation of
backpropagation:
1. Each worker gets weights from master
node.
2. Each worker computes a gradient on its
data.
3. Each worker sends gradient to master.
4. Master averages the gradients and
updates the weights.
Distributed
Optimization
Spark Technology
Center
● Parallel MLP on Spark with 7 nodes ~=
Caffe w/GPU (single node).
● Advantages to parallelism diminish with
additional nodes due to
communication costs.
● Additional workers are valuable up to
~20 workers.
● See
https://github.com/avulanov/ann-benc
hmark for more details
Performance
Spark Technology
Center
Github: https://github.com/JeremyNixon/sparkdl
Spark Package:
https://spark-packages.org/package/JeremyNixon/s
parkdl
Access
Spark Technology
Center
1. GPU Acceleration (External)
2. Python API
3. Keras Integration
4. Residual Layers
5. Hardening
6. Regularization
7. Batch Normalization
8. Tensor Support
Future Work
Deep Learning on Spark
1. Major Projects
a. DL4J
b. BigDL
c. Spark-deep-learning
d. Tensorflow-on-Spark
e. SystemML
2. Important Comparisons
3. Minor & Abandoned Projects
a. H20AI DeepWater
b. TensorFrames
c. Caffe-on-Spark
d. Scalable-deep-learning
e. MLlib Deep Learning
f. Sparknet
g. DeepDist
● Distributed GPU support for all major deep learning architectures
○ CPU / Distributed CPU / Single GPU options exist
○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec
● Actively Supported and Improved
● APIs in Java, Scala, Python
○ Fairly Inelegant API, there’s a optin through ScalNet (Keras-like front end)
○ Working towards becoming a Keras Backend
● Backed by Skymind (Committed)
○ ~15 person startup, Adam Gibson + Chris Nicholson
● Modular front end in DL4J
● Backed by linear algebra library ND4J
○ Numerical computing wrapper over BLAS for various backends
● Python API has Keras import / export
● Production with proprietary ‘Skymind Intelligence Layer’
DL4J
BigDL
● Distributed CPU based library
○ Backed by Intel MKL / multithreading
○ No benchmark out as yet
● Support for most major deep learning architectures
○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove
● Backed by Intel (Committed)
○ Actively Supported / Improved
○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear.
○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL
● Scala and Python API Support
○ API Modeled after Torch
● Support for numeric computing via tensors
Spark-deep-learning
● Databricks’ library focused on model serving, to allow scaled out inference
● ‘Transfer Learning’ (Allows logistic regression layer to be retrained)
● Python API
○ One-liner for integrating Keras model into a pipeline
● Supports Tensorflow models
○ Keras Import for Tensorflow backed Keras Models
● Support for image processing only
● Weakly Supported by Databricks
○ Last commit was a month ago
○ Qualifying lines - “We will implement text processing, audio processing if there is interest”
1. Goal is to scale out Caffe / Tensorflow on heterogenous GPU / CPU setup
a. Each executor launches a Caffe / TF instance
b. RDMA / Infiniband for distributing compute in TF on Spark, improvement over TF’s
ethernet model
2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout
3. Allows for Model / Data parallelism
4. Weakly supported by Yahoo
a. Caffe-on-spark hasn’t seen a commit in 6 months
b. Tensorflow-on-spark gets about 2 minor commits / month
5. Yahoo demonstrated capability on large scale Flickr dataset
6. Visualization with tensorboard
Caffe / Tensorflow -on-Spark
SystemML
● Deep Learning library with single-node GPU support, moving towards
distributed GPU support
○ Supports CNNs for Classification, Localization, Segmentation
○ Supports RNNs / LSTM
● Attached to linear algebra focused ML library w/ linear algebra compiler
● Backed by IBM
○ Actively being Improved
● Provides CPU based support for most computer vision tasks
○ Convolutional Networks
● Caffe2DML for caffe integration
● DML API
○ SystemML has Python API for a handful of algorithms, may come out with Python DL API
Important Comparisons
Framework Hardware Supported Models API
DL4J CPU / GPU,
Distributed CPU / GPU
CNNs, RNNs,
Feedforward Nets,
Word2Vec
Java, Scala, Python
BigDL CPU / Distributed CPU CNNs, RNNs,
Feedforward Nets
Scala, Python
Spark-Deep-Learning CPU / Distributed CPU Vison - CNNs,
Feedforward Nets
Python
Caffe / Tensorflow on Spark CPU / GPU,
Distributed CPU / GPU
CNNs, RNNs,
Feedforward Nets,
Word2Vec
Python
SystemML Deep Learning CPU, Towards GPU /
Distrbuted GPU
CNNs, RNNS,
Feedforward Nets
DML, Potentially
Python
Important Comparisons
Framework Support Strength Goal Distinguishing Value
DL4J Skymind. Fully focused
on package, but still a
Startup.
Fully fledged Deep
Learning solution from
training to production
Comprehensive,
Distributed GPU.
BigDL Intel. Fairly strong
AI/DL commitment.
Has Chainer, Nirvana.
Spark / Hadoop
solution, bring DL to
the data
Comprehensive
Spark-Deep-Learning Databricks, ambiguous
level of commitment
Scaleout solution for
TF users
Scaling out with Spark
at inference time
Caffe / Tensorflow on Spark Yahoo. Caffe-on-spark
looks abandoned,
TF-on Spark better.
Scaling out training on
heterogenous
hardware.
Scaling out training
with distributed CPU /
GPU.
SystemML Deep Learning IBM team. Deep Learning
Training solution
GPU Support, Moving
towards Distributed
GPU Support.
Minor & Abandoned Projects
1. H20AI DeepWater
a. Integrates other frameworks (TF, MXNet, Caffe) into H20 Platform
b. Only native support is for feedforward networks
2. MXNet Integration
a. Nascent, few commits from Microsoft engineer
3. TensorFrames
a. Focused on hyperparameter tuning, running TF instances in parallel. ~ 2 commits / month
4. Caffe-on-Spark
a. No commits for ~6 months
5. Scalable-deep-learning
a. Only supports feedforward networks / autoencoder, CPU based
6. MLlib Deep Learning
a. Only supports feedforward networks, CPU based
7. Sparknet
a. Abandoned, no commits for 18 months
Deep Learning
Outside of Spark
Deep Learning
Outside of Spark
Spark Technology
Center
Thank you for your attention!
Questions?

More Related Content

What's hot

Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer ModelsDatabricks
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...Databricks
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Databricks
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkDatabricks
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotSteve Moore
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI TodayDESMOND YUEN
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...Databricks
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Databricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Databricks
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsDatabricks
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkSpark Summit
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibDataWorks Summit
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranDatabricks
 

What's hot (20)

Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache Spark
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph Algorithms
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlib
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel Kobran
 

Similar to Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkDatabricks
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Scalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC SystemsScalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC Systemsinside-BigData.com
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayNick Pentreath
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Wee Hyong Tok
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopJosh Patterson
 
Hadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningHadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningAdam Gibson
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jGuglielmo Iozzia
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkDatabricks
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Simplilearn
 

Similar to Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017 (20)

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
eScience Cluster Arch. Overview
eScience Cluster Arch. OvervieweScience Cluster Arch. Overview
eScience Cluster Arch. Overview
 
Scalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC SystemsScalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC Systems
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
 
Hadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningHadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep Learning
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache Spark
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

  • 1. Spark Technology Center Convolutional Neural Networks at Scale in MLlib Jeremy Nixon
  • 2. Spark Technology Center 1. Machine Learning Engineer at the Spark Technology Center 2. Contributor to MLlib, dedicated to scalable deep learning. 3. Previously, studied Applied Mathematics to Computer Science and Economics at Harvard Jeremy Nixon
  • 3. Spark Technology Center Large Scale Data Processing ● In-memory compute ● Up to 100x faster than Hadoop Improved Usability ● Rich APIs in Scala, Java, Python ● Interactive Shell
  • 4. Spark Technology Center Spark’s Machine Learning Library ● Alternating Least Squares ● Lasso ● Ridge Regression ● Logistic Regression ● Decision Trees ● Naive Bayes ● SVMs ● … MLlib
  • 5. Spark Technology Center Part of Spark ● Integrated Data Analysis ● Scalable Python, Scala, Java APIs MLlib
  • 6. Spark Technology Center ● Deep Learning benefits from large datasets ● Spark allows for Large Scale Data Analysis ● Compute is Local to Data ● Integrated into organization’s Spark Jobs ● Leverages existing compute cluster Deep Learning in MLlib
  • 7. Spark Technology Center Github Link: https://github.com/JeremyNixon/sparkdl Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl Links
  • 8. Spark Technology Center 1. Framing Deep Learning 2. MLlib Deep Learning API 3. Optimization 4. Performance 5. Future Work 6. Deep Learning Options on Spark 7. Deep Learning Outside of Spark Structure
  • 9. Spark Technology Center 1. Structural Assumptions 2. Automated Feature Engineering 3. Learning Representations 4. Applications Framing Convolutional Neural Networks
  • 10. Spark Technology Center Structural Assumptions: Location Invariance - Convolution is a restriction on the features that can be combined. - Location Invariance leads to strong accuracy in vision, audio, and language. colah.github.io
  • 12. Spark Technology Center - Pixels - Edges - Shapes - Parts - Objects - Learn features that are optimized for the data - Makes transfer learning feasible Structural Assumptions: Hierarchical Abstraction
  • 13. Spark Technology Center - Character - Word - Phrase - Sentence - Phonemes - Words - Pixels - Edges - Shapes - Parts - Objects Structural Assumptions: Composition
  • 14. Spark Technology Center 1. CNNs - State of the art a. Object Recognition b. Object Localization c. Image Segmentation d. Image Restoration e. Music Recommendation 2. RNNs (LSTM) - State of the Art a. Speech Recognition b. Question Answering c. Machine Translation d. Text Summarization e. Named Entity Recognition f. Natural Language Generation g. Word Sense Disambiguation h. Image / Video Captioning i. Sentiment Analysis Applications
  • 15. Spark Technology Center ● Computationally Efficient ● Makes Transfer Learning Easy ● Takes advantage of location invariance Structural Assumptions: Weight Sharing
  • 16. Spark Technology Center - Network depth creates an extraordinary range of possible models. - That flexibility creates value in large datasets to reduce variance. Structural Assumptions: Combinatorial Flexibility
  • 17. Spark Technology Center Automated Feature Engineering - Feature hierarchy is too complex to engineer manually - Works well for compositional structure, overfits elsewhere
  • 19. Spark Technology Center Flexibility. High level enough to be efficient. Low level enough to be expressive. MLlib Flexible Deep Learning API
  • 20. Spark Technology Center Modularity enables Logistic Regression, Feedforward Networks. MLlib Flexible Deep Learning API
  • 21. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. Momentum cancels noise in the gradient.
  • 22. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. RMSProp automatically adapts the learning rate.
  • 23. Spark Technology Center Parallel implementation of backpropagation: 1. Each worker gets weights from master node. 2. Each worker computes a gradient on its data. 3. Each worker sends gradient to master. 4. Master averages the gradients and updates the weights. Distributed Optimization
  • 24. Spark Technology Center ● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node). ● Advantages to parallelism diminish with additional nodes due to communication costs. ● Additional workers are valuable up to ~20 workers. ● See https://github.com/avulanov/ann-benc hmark for more details Performance
  • 25. Spark Technology Center Github: https://github.com/JeremyNixon/sparkdl Spark Package: https://spark-packages.org/package/JeremyNixon/s parkdl Access
  • 26. Spark Technology Center 1. GPU Acceleration (External) 2. Python API 3. Keras Integration 4. Residual Layers 5. Hardening 6. Regularization 7. Batch Normalization 8. Tensor Support Future Work
  • 27. Deep Learning on Spark 1. Major Projects a. DL4J b. BigDL c. Spark-deep-learning d. Tensorflow-on-Spark e. SystemML 2. Important Comparisons 3. Minor & Abandoned Projects a. H20AI DeepWater b. TensorFrames c. Caffe-on-Spark d. Scalable-deep-learning e. MLlib Deep Learning f. Sparknet g. DeepDist
  • 28. ● Distributed GPU support for all major deep learning architectures ○ CPU / Distributed CPU / Single GPU options exist ○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec ● Actively Supported and Improved ● APIs in Java, Scala, Python ○ Fairly Inelegant API, there’s a optin through ScalNet (Keras-like front end) ○ Working towards becoming a Keras Backend ● Backed by Skymind (Committed) ○ ~15 person startup, Adam Gibson + Chris Nicholson ● Modular front end in DL4J ● Backed by linear algebra library ND4J ○ Numerical computing wrapper over BLAS for various backends ● Python API has Keras import / export ● Production with proprietary ‘Skymind Intelligence Layer’ DL4J
  • 29. BigDL ● Distributed CPU based library ○ Backed by Intel MKL / multithreading ○ No benchmark out as yet ● Support for most major deep learning architectures ○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove ● Backed by Intel (Committed) ○ Actively Supported / Improved ○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear. ○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL ● Scala and Python API Support ○ API Modeled after Torch ● Support for numeric computing via tensors
  • 30. Spark-deep-learning ● Databricks’ library focused on model serving, to allow scaled out inference ● ‘Transfer Learning’ (Allows logistic regression layer to be retrained) ● Python API ○ One-liner for integrating Keras model into a pipeline ● Supports Tensorflow models ○ Keras Import for Tensorflow backed Keras Models ● Support for image processing only ● Weakly Supported by Databricks ○ Last commit was a month ago ○ Qualifying lines - “We will implement text processing, audio processing if there is interest”
  • 31. 1. Goal is to scale out Caffe / Tensorflow on heterogenous GPU / CPU setup a. Each executor launches a Caffe / TF instance b. RDMA / Infiniband for distributing compute in TF on Spark, improvement over TF’s ethernet model 2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout 3. Allows for Model / Data parallelism 4. Weakly supported by Yahoo a. Caffe-on-spark hasn’t seen a commit in 6 months b. Tensorflow-on-spark gets about 2 minor commits / month 5. Yahoo demonstrated capability on large scale Flickr dataset 6. Visualization with tensorboard Caffe / Tensorflow -on-Spark
  • 32. SystemML ● Deep Learning library with single-node GPU support, moving towards distributed GPU support ○ Supports CNNs for Classification, Localization, Segmentation ○ Supports RNNs / LSTM ● Attached to linear algebra focused ML library w/ linear algebra compiler ● Backed by IBM ○ Actively being Improved ● Provides CPU based support for most computer vision tasks ○ Convolutional Networks ● Caffe2DML for caffe integration ● DML API ○ SystemML has Python API for a handful of algorithms, may come out with Python DL API
  • 33. Important Comparisons Framework Hardware Supported Models API DL4J CPU / GPU, Distributed CPU / GPU CNNs, RNNs, Feedforward Nets, Word2Vec Java, Scala, Python BigDL CPU / Distributed CPU CNNs, RNNs, Feedforward Nets Scala, Python Spark-Deep-Learning CPU / Distributed CPU Vison - CNNs, Feedforward Nets Python Caffe / Tensorflow on Spark CPU / GPU, Distributed CPU / GPU CNNs, RNNs, Feedforward Nets, Word2Vec Python SystemML Deep Learning CPU, Towards GPU / Distrbuted GPU CNNs, RNNS, Feedforward Nets DML, Potentially Python
  • 34. Important Comparisons Framework Support Strength Goal Distinguishing Value DL4J Skymind. Fully focused on package, but still a Startup. Fully fledged Deep Learning solution from training to production Comprehensive, Distributed GPU. BigDL Intel. Fairly strong AI/DL commitment. Has Chainer, Nirvana. Spark / Hadoop solution, bring DL to the data Comprehensive Spark-Deep-Learning Databricks, ambiguous level of commitment Scaleout solution for TF users Scaling out with Spark at inference time Caffe / Tensorflow on Spark Yahoo. Caffe-on-spark looks abandoned, TF-on Spark better. Scaling out training on heterogenous hardware. Scaling out training with distributed CPU / GPU. SystemML Deep Learning IBM team. Deep Learning Training solution GPU Support, Moving towards Distributed GPU Support.
  • 35. Minor & Abandoned Projects 1. H20AI DeepWater a. Integrates other frameworks (TF, MXNet, Caffe) into H20 Platform b. Only native support is for feedforward networks 2. MXNet Integration a. Nascent, few commits from Microsoft engineer 3. TensorFrames a. Focused on hyperparameter tuning, running TF instances in parallel. ~ 2 commits / month 4. Caffe-on-Spark a. No commits for ~6 months 5. Scalable-deep-learning a. Only supports feedforward networks / autoencoder, CPU based 6. MLlib Deep Learning a. Only supports feedforward networks, CPU based 7. Sparknet a. Abandoned, no commits for 18 months
  • 38. Spark Technology Center Thank you for your attention! Questions?