Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Spark Technology
Center
Convolutional Neural Networks at
Scale in MLlib
Jeremy
Nixon

Spark Technology
Center
1. Machine Learning Engineer at the Spark
Technology Center
2. Contributor to MLlib, dedicated to
scalable deep learning.
3. Previously, studied Applied Mathematics
to Computer Science and Economics at
Harvard
Jeremy Nixon

Spark Technology
Center
Large Scale Data Processing
● In-memory compute
● Up to 100x faster than Hadoop
Improved Usability
● Rich APIs in Scala, Java, Python
● Interactive Shell

Spark Technology
Center
Spark’s Machine Learning Library
● Alternating Least Squares
● Lasso
● Ridge Regression
● Logistic Regression
● Decision Trees
● Naive Bayes
● SVMs
● …
MLlib

Spark Technology
Center
Part of Spark
● Integrated Data Analysis
●
Scalable
Python, Scala, Java APIs
MLlib

Spark Technology
Center
● Deep Learning benefits from large
datasets
● Spark allows for Large Scale Data
Analysis
● Compute is Local to Data
● Integrated into organization’s Spark
Jobs
● Leverages existing compute cluster
Deep Learning
in MLlib

Spark Technology
Center
Github Link:
https://github.com/JeremyNixon/sparkdl
Spark Package:
https://spark-packages.org/package/JeremyNixon/sparkdl
Links

Spark Technology
Center
1. Framing Deep Learning
2. MLlib Deep Learning API
3. Optimization
4. Performance
5. Future Work
6. Deep Learning Options on Spark
7. Deep Learning Outside of Spark
Structure

Spark Technology
Center
1. Structural Assumptions
2. Automated Feature Engineering
3. Learning Representations
4. Applications
Framing
Convolutional
Neural Networks

Spark Technology
Center
Structural
Assumptions:
Location
Invariance
- Convolution is a restriction on the
features that can be combined.
- Location Invariance leads to strong
accuracy in vision, audio, and
language.
colah.github.io

Spark Technology
Center
Structural
Assumptions:
Hierarchical
Abstraction

Spark Technology
Center
- Pixels - Edges - Shapes - Parts - Objects
- Learn features that are optimized for the
data
- Makes transfer learning feasible
Structural
Assumptions:
Hierarchical
Abstraction

Spark Technology
Center
- Character - Word - Phrase - Sentence
- Phonemes - Words
- Pixels - Edges - Shapes - Parts - Objects
Structural
Assumptions:
Composition

Spark Technology
Center
1. CNNs - State of the art
a. Object Recognition
b. Object Localization
c. Image Segmentation
d. Image Restoration
e. Music Recommendation
2. RNNs (LSTM) - State of the Art
a. Speech Recognition
b. Question Answering
c. Machine Translation
d. Text Summarization
e. Named Entity Recognition
f. Natural Language Generation
g. Word Sense Disambiguation
h. Image / Video Captioning
i. Sentiment Analysis
Applications

Spark Technology
Center
● Computationally Efficient
● Makes Transfer Learning Easy
● Takes advantage of location
invariance
Structural
Assumptions:
Weight Sharing

Spark Technology
Center
- Network depth creates an extraordinary
range of possible models.
- That flexibility creates value in large
datasets to reduce variance.
Structural
Assumptions:
Combinatorial
Flexibility

Spark Technology
Center
Automated
Feature
Engineering
- Feature hierarchy is too complex to engineer manually
- Works well for compositional structure, overfits elsewhere

Spark Technology
Center
Learning
Representations
Hidden Layer
+
Nonlinearity
http://colah.github.io/posts/2014-03-NN-Manifolds-To
pology/

Spark Technology
Center
Flexibility. High level enough to be efficient.
Low level enough to be expressive.
MLlib Flexible Deep
Learning API

Spark Technology
Center
Modularity enables Logistic Regression,
Feedforward Networks.
MLlib Flexible Deep
Learning API

Spark Technology
Center
Optimization
Modern optimizers allow for
more efficient, stable
training.
Momentum cancels noise in
the gradient.

Spark Technology
Center
Optimization
Modern optimizers allow for
more efficient, stable
training.
RMSProp automatically
adapts the learning rate.

Spark Technology
Center
Parallel implementation of
backpropagation:
1. Each worker gets weights from master
node.
2. Each worker computes a gradient on its
data.
3. Each worker sends gradient to master.
4. Master averages the gradients and
updates the weights.
Distributed
Optimization

Spark Technology
Center
● Parallel MLP on Spark with 7 nodes ~=
Caffe w/GPU (single node).
● Advantages to parallelism diminish with
additional nodes due to
communication costs.
● Additional workers are valuable up to
~20 workers.
● See
https://github.com/avulanov/ann-benc
hmark for more details
Performance

Spark Technology
Center
Github: https://github.com/JeremyNixon/sparkdl
Spark Package:
https://spark-packages.org/package/JeremyNixon/s
parkdl
Access

Spark Technology
Center
1. GPU Acceleration (External)
2. Python API
3. Keras Integration
4. Residual Layers
5. Hardening
6. Regularization
7. Batch Normalization
8. Tensor Support
Future Work

Deep Learning on Spark
1. Major Projects
a. DL4J
b. BigDL
c. Spark-deep-learning
d. Tensorflow-on-Spark
e. SystemML
2. Important Comparisons
3. Minor & Abandoned Projects
a. H20AI DeepWater
b. TensorFrames
c. Caffe-on-Spark
d. Scalable-deep-learning
e. MLlib Deep Learning
f. Sparknet
g. DeepDist

● Distributed GPU support for all major deep learning architectures
○ CPU / Distributed CPU / Single GPU options exist
○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec
● Actively Supported and Improved
● APIs in Java, Scala, Python
○ Fairly Inelegant API, there’s a optin through ScalNet (Keras-like front end)
○ Working towards becoming a Keras Backend
● Backed by Skymind (Committed)
○ ~15 person startup, Adam Gibson + Chris Nicholson
● Modular front end in DL4J
● Backed by linear algebra library ND4J
○ Numerical computing wrapper over BLAS for various backends
● Python API has Keras import / export
● Production with proprietary ‘Skymind Intelligence Layer’
DL4J

BigDL
● Distributed CPU based library
○ Backed by Intel MKL / multithreading
○ No benchmark out as yet
● Support for most major deep learning architectures
○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove
● Backed by Intel (Committed)
○ Actively Supported / Improved
○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear.
○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL
● Scala and Python API Support
○ API Modeled after Torch
● Support for numeric computing via tensors

Spark-deep-learning
● Databricks’ library focused on model serving, to allow scaled out inference
● ‘Transfer Learning’ (Allows logistic regression layer to be retrained)
● Python API
○ One-liner for integrating Keras model into a pipeline
● Supports Tensorflow models
○ Keras Import for Tensorflow backed Keras Models
● Support for image processing only
● Weakly Supported by Databricks
○ Last commit was a month ago
○ Qualifying lines - “We will implement text processing, audio processing if there is interest”

1. Goal is to scale out Caffe / Tensorflow on heterogenous GPU / CPU setup
a. Each executor launches a Caffe / TF instance
b. RDMA / Infiniband for distributing compute in TF on Spark, improvement over TF’s
ethernet model
2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout
3. Allows for Model / Data parallelism
4. Weakly supported by Yahoo
a. Caffe-on-spark hasn’t seen a commit in 6 months
b. Tensorflow-on-spark gets about 2 minor commits / month
5. Yahoo demonstrated capability on large scale Flickr dataset
6. Visualization with tensorboard
Caffe / Tensorflow -on-Spark

SystemML
● Deep Learning library with single-node GPU support, moving towards
distributed GPU support
○ Supports CNNs for Classification, Localization, Segmentation
○ Supports RNNs / LSTM
● Attached to linear algebra focused ML library w/ linear algebra compiler
● Backed by IBM
○ Actively being Improved
● Provides CPU based support for most computer vision tasks
○ Convolutional Networks
● Caffe2DML for caffe integration
● DML API
○ SystemML has Python API for a handful of algorithms, may come out with Python DL API

Important Comparisons
Framework Hardware Supported Models API
DL4J CPU / GPU,
Distributed CPU / GPU
CNNs, RNNs,
Feedforward Nets,
Word2Vec
Java, Scala, Python
BigDL CPU / Distributed CPU CNNs, RNNs,
Feedforward Nets
Scala, Python
Spark-Deep-Learning CPU / Distributed CPU Vison - CNNs,
Feedforward Nets
Python
Caffe / Tensorflow on Spark CPU / GPU,
Distributed CPU / GPU
CNNs, RNNs,
Feedforward Nets,
Word2Vec
Python
SystemML Deep Learning CPU, Towards GPU /
Distrbuted GPU
CNNs, RNNS,
Feedforward Nets
DML, Potentially
Python

Important Comparisons
Framework Support Strength Goal Distinguishing Value
DL4J Skymind. Fully focused
on package, but still a
Startup.
Fully fledged Deep
Learning solution from
training to production
Comprehensive,
Distributed GPU.
BigDL Intel. Fairly strong
AI/DL commitment.
Has Chainer, Nirvana.
Spark / Hadoop
solution, bring DL to
the data
Comprehensive
Spark-Deep-Learning Databricks, ambiguous
level of commitment
Scaleout solution for
TF users
Scaling out with Spark
at inference time
Caffe / Tensorflow on Spark Yahoo. Caffe-on-spark
looks abandoned,
TF-on Spark better.
Scaling out training on
heterogenous
hardware.
Scaling out training
with distributed CPU /
GPU.
SystemML Deep Learning IBM team. Deep Learning
Training solution
GPU Support, Moving
towards Distributed
GPU Support.

Minor & Abandoned Projects
1. H20AI DeepWater
a. Integrates other frameworks (TF, MXNet, Caffe) into H20 Platform
b. Only native support is for feedforward networks
2. MXNet Integration
a. Nascent, few commits from Microsoft engineer
3. TensorFrames
a. Focused on hyperparameter tuning, running TF instances in parallel. ~ 2 commits / month
4. Caffe-on-Spark
a. No commits for ~6 months
5. Scalable-deep-learning
a. Only supports feedforward networks / autoencoder, CPU based
6. MLlib Deep Learning
a. Only supports feedforward networks, CPU based
7. Sparknet
a. Abandoned, no commits for 18 months

Deep Learning
Outside of Spark

Spark Technology
Center
Thank you for your attention!
Questions?

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Similar to Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017 (20)

More from MLconf

More from MLconf (20)

Recently uploaded

Recently uploaded (20)

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017