Urs Köster Presenting at RE-Work DL Summit in Boston
1. Proprietary and confidential. Do not distribute.
Deep Learning at Scale
May 2016
Urs Köster, PhD
Nervana
MAKING MACHINES SMARTER.
2. About Nervana
• A platform for machine intelligence
• Enables deep learning at scale
• Optimized from algorithms to silicon
3. The Nervana Platform - a full-stack solution
• neon deep learning framework
• nervana cloud
• Solutions
• Images, Text, Tabular, Speech, Time series, Video
4. neon: Nervana Python deep learning library
• User-friendly, extensible, fast
• Support for many deep learning models
• Interface to nervana cloud
• Multiple backends:
  • Nervana Engine
  • GPU (optimized assembler kernels)
  • CPU cluster
Open source (Apache 2.0) on github.com/nervanaSystems/neon
6. Deep learning as a core technology
• 'Google Brain' model, with DL at the core: Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation
• Nervana Platform, with DL at the core: Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language
7. Video recognition with 3D convolution
Figure: Training Speed in epochs / hour (axis 0 to 1), neon vs. caffe
8. Object Localization / Segmentation
• CamVid Dataset: SegNet model
• KITTI Dataset: Fast R-CNN model
Model                            neon (ms)   caffe (ms)   Speedup
Fast R-CNN (batch size=4)        360         670          1.8x
SegNet (batch size=4)            267         1455         5.4x
SegNet (4 GPUs, batch size=16)   348         --           *5.9x
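The slide does not define the Speedup column. A minimal sketch, assuming it is simply caffe time divided by neon time, recomputes it for the two rows that report both timings. The function name `speedup` is illustrative and not part of neon or caffe.

```python
def speedup(neon_ms: float, caffe_ms: float) -> float:
    """Ratio of caffe time to neon time (assumed definition of Speedup)."""
    if neon_ms <= 0:
        raise ValueError("neon_ms must be positive")
    return caffe_ms / neon_ms


# Timings in ms copied from the table above (batch size 4).
timings = {
    "Fast R-CNN": (360, 670),
    "SegNet": (267, 1455),
}

ratios = {name: speedup(neon, caffe) for name, (neon, caffe) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios come out near 1.86 and 5.45, which agrees with the slide's 1.8x and 5.4x if the slide truncated to one decimal. The 4-GPU row has no caffe timing, so its starred figure cannot be rechecked this way.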
11. ImageNet ILSVRC Challenge
Figure: Top-5 error rate (axis 0% to 30%) by year, 2010 to 2015. Annotations: deep learning, human performance. Labeled entries: AlexNet, Clarifai, GoogLeNet, ResNet.
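For reference, top-5 error counts a prediction as wrong only when the true label is absent from the model's five highest-scoring classes. A minimal sketch of the metric follows. The function name `top5_error` and the toy scores are made up for illustration and are not challenge data.

```python
from typing import Sequence


def top5_error(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose true label is not among the 5 highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, keep the first five.
        top5 = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


# Two toy examples over 7 classes (illustrative values only).
toy_scores = [
    [0.05, 0.30, 0.10, 0.20, 0.15, 0.12, 0.08],
    [0.40, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02],
]
toy_labels = [2, 6]  # class 2 is in the first top 5, class 6 is not in the second

print(top5_error(toy_scores, toy_labels))
```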
12. Speeding up Deep Learning
• Same model, better performance:
  • Hardware improvements
  • Algorithmic improvements
Figure: AlexNet ms / iteration (axis 0 to 600, with a value marked 15,000 ...): CPU, GTX580, TitanX, neon
Figure: Soumith's AlexNet Benchmark in ms (axis 0 to 500), neon vs. CuDNN at 4/2015, 8/2015, 3/2016
Figure: Soumith's GoogLeNet Benchmark in ms (axis 0 to 500), neon vs. CuDNN at 4/2015, 8/2015, 3/2016
13. Dennard scaling has ended
Figure: Learning speed vs. number of processors
• Industry standard: communication overhead = performance ceiling
• Nervana: better communication fabric, near-linear scaling
Figure: Trends in transistors, clock speed, power, perf / clock
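The "communication overhead = performance ceiling" claim can be illustrated with a deliberately simplified toy model. It assumes each training step splits a fixed compute time evenly across processors and, once more than one processor is used, pays a fixed communication cost per step. The model, the function name `learning_speedup`, and the numbers are illustrative assumptions, not Nervana measurements.

```python
def learning_speedup(n_processors: int, compute_s: float, comm_s: float) -> float:
    """Speedup over one processor under the toy model described above."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    step_s = compute_s / n_processors
    if n_processors > 1:
        step_s += comm_s  # fixed per-step communication cost (assumption)
    return compute_s / step_s


# Hypothetical costs: 1.0 s of compute per step, 0.1 s vs. 0.0 s of communication.
for n in (1, 8, 64, 512):
    with_overhead = learning_speedup(n, compute_s=1.0, comm_s=0.1)
    no_overhead = learning_speedup(n, compute_s=1.0, comm_s=0.0)
    print(f"{n:4d} processors: {with_overhead:6.2f}x with overhead, {no_overhead:6.1f}x without")
```

In this model the speedup can never exceed compute_s / comm_s (10x for the values above), while with zero communication cost it grows linearly with the processor count.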
14. Nervana Engine (coming in 2017)
• Unprecedented computing power
• 10x speedup over current GPUs
• More memory on-chip
• High-Bandwidth Memory off-chip
• Six bi-directional high-bandwidth links for 3D torus interconnect
• 8 chips in a box, seamlessly scales to multiple chassis
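Six links per chip matches the structure of a 3D torus, where every node connects to its predecessor and successor along each of three axes, with wraparound at the edges. A minimal sketch of that neighbor rule follows. The function name `torus_neighbors` and the 4 x 4 x 4 shape are made up for illustration; the slides do not say how chips are arranged or addressed.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Endpoints of the six links of `node` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # The modulo wraps around the edge, which is what makes it a torus.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


# Hypothetical 4 x 4 x 4 arrangement; the corner node still has six neighbors.
corner_links = torus_neighbors((0, 0, 0), (4, 4, 4))
print(corner_links)
```

Because of the wraparound, corner and edge nodes have the same number of links as interior nodes, so no node is a special case when the system grows.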
15. Summary
• Deep learning is a new computational paradigm
• Learning and Inference on data
• neon with state-of-the-art GPU kernels
• Nervana Cloud with multi-GPU training
• Watch for Nervana Engine deep learning processor