Proprietary and confidential. Do not distribute.
Deep Learning at Scale
May 2016
Urs Köster, PhD
Nervana
MAKING MACHINES SMARTER.
About nervana
• A platform for machine intelligence
• Enables deep learning at scale
• Optimized from algorithms to silicon
The Nervana Platform: a full-stack solution
[Diagram: platform stack of the neon deep learning framework, nervana cloud, and Solutions for images, text, tabular data, speech, time series, and video]
neon: nervana python deep learning library
• User-friendly, extensible, fast
• Support for many deep learning models
• Interface to nervana cloud
• Multiple backends:
  • nervana engine
  • GPU (optimized assembler kernels)
  • CPU cluster
Open source (Apache 2.0) on github.com/nervanaSystems/neon
Nervana Cloud
[Screenshots: web interface, command line]
Deep learning as a core technology
[Diagram: the 'Google Brain' model, with DL at the core of Photos, Maps, Voice, Search, Self-driving car, Ad targeting, and Machine translation]
[Diagram: the Nervana Platform, with DL at the core of Image classification, Object localization, Video indexing, Speech recognition, and Natural language]
Video recognition with 3D convolution
[Chart: training speed (epochs/hour), neon vs. caffe]
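Video models use 3D convolution: the kernel slides over time as well as height and width, so each output value mixes a short spatio-temporal window of consecutive frames. The sketch below illustrates the idea in pure Python, assuming a single-channel clip, a single kernel, stride 1, and no padding; like most deep learning frameworks it computes cross-correlation (no kernel flip). The function name `conv3d_valid` is hypothetical and is not neon's API.

```python
from typing import List

# Hypothetical layout for illustration: volumes indexed as [time][height][width].
Volume = List[List[List[float]]]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with stride 1 and no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t, out_h, out_w = T - kt + 1, H - kh + 1, W - kw + 1
    out: Volume = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                # Sum over the kt x kh x kw window spanning several frames.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                out[t][y][x] = acc
    return out


if __name__ == "__main__":
    # 4 frames of 5x5 pixels, all ones, and a 3x3x3 kernel of ones.
    clip = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
    kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    result = conv3d_valid(clip, kernel)
    print(len(result), len(result[0]), len(result[0][0]))  # 2 3 3
    print(result[0][0][0])  # 27.0
```

Because the kernel also spans time, the output shrinks along the frame axis (4 frames become 2 here), and each output value sums 27 inputs instead of the 9 a 3x3 2D kernel would see; this extra work per output is one reason training speed matters so much for video.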
Object Localization / Segmentation
[Figures: SegNet model on the CamVid dataset; Fast R-CNN model on the KITTI dataset]
Model                           neon (ms)  caffe (ms)  Speedup
Fast R-CNN (batch size 4)             360         670     1.8x
SegNet (batch size 4)                 267        1455     5.4x
SegNet (4 GPUs, batch size 16)        348          --    *5.9x
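The Speedup column is consistent with the ratio of caffe time to neon time for the same model and batch size. A quick check of the two rows where both timings are reported is below; the slide appears to truncate rather than round (670/360 ≈ 1.86 is shown as 1.8x). The starred multi-GPU SegNet figure has no caffe timing in the table, so its baseline is unknown and it is not recomputed here.

```python
# Per-iteration times from the table, in milliseconds (single-GPU rows only).
timings_ms = {
    "Fast R-CNN (batch size 4)": {"neon": 360, "caffe": 670},
    "SegNet (batch size 4)": {"neon": 267, "caffe": 1455},
}


def speedup(neon_ms: float, caffe_ms: float) -> float:
    """How many times faster neon is than caffe for the same workload."""
    return caffe_ms / neon_ms


for model, t in timings_ms.items():
    print(f"{model}: {speedup(t['neon'], t['caffe']):.2f}x")
```

This prints 1.86x for Fast R-CNN and 5.45x for SegNet, matching the 1.8x and 5.4x on the slide to one decimal place.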
Image Classification (Residual Network)
Speech to text
ImageNet ILSVRC Challenge
[Chart: top-5 error rate (0% to 30%) by year, 2010 to 2015; deep learning entries AlexNet, Clarifai, GoogLeNet, and ResNet compared against human performance]
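ILSVRC results are reported as top-5 error: a prediction counts as correct if the true label is among the model's five highest-scoring classes. A minimal sketch of the metric, assuming per-class scores arrive as plain Python lists; the function name `top5_error` is illustrative, not part of any official evaluation kit.

```python
from typing import Sequence


def top5_error(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose true label is NOT among the 5 highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first five.
        top5 = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    # Two samples over 7 classes.
    scores = [
        [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # top-5 = classes 6, 5, 4, 3, 2
        [6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0],  # top-5 = classes 0, 1, 2, 3, 4
    ]
    labels = [2, 6]  # first sample is a hit, second is a miss
    print(top5_error(scores, labels))  # 0.5
```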
Speeding up Deep Learning
• Same model, better performance:
  • Hardware improvements
  • Algorithmic improvements
[Chart: AlexNet ms/iteration on CPU, GTX580, Titan X, and neon (axis break at 15,000 ms)]
[Chart: Soumith's AlexNet benchmark, ms, neon vs. cuDNN at 4/2015, 8/2015, and 3/2016]
[Chart: Soumith's GoogLeNet benchmark, ms, neon vs. cuDNN at 4/2015, 8/2015, and 3/2016]
Dennard scaling has ended
[Chart: trends in transistor count, clock speed, power, and performance per clock]
[Chart: learning speed vs. number of processors. Industry standard: communication overhead = performance ceiling. Nervana: better communication fabric, near-linear scaling]
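This slide contrasts scaling that hits a performance ceiling because of communication overhead with near-linear scaling. A toy model makes the contrast concrete; it is an illustrative assumption, not Nervana's analysis. Suppose one processor takes time 1 per training step, n processors split the compute to 1/n, and every step also pays a fixed communication cost `alpha` (a hypothetical parameter). The speedup is then 1 / (1/n + alpha), which can never exceed 1/alpha.

```python
# Toy scaling model (an illustrative assumption, not Nervana's analysis).


def toy_speedup(n: int, alpha: float) -> float:
    """Speedup of n processors over one, with single-processor step time = 1.

    Compute shrinks to 1/n, but every step also pays a fixed communication
    cost `alpha` (a fraction of the single-processor step time), so the
    speedup 1 / (1/n + alpha) can never exceed the ceiling 1/alpha.
    """
    return 1.0 / (1.0 / n + alpha)


if __name__ == "__main__":
    # Hypothetical overheads: a slower fabric (0.05) vs. a faster one (0.001).
    for alpha in (0.05, 0.001):
        row = ", ".join(f"n={n}: {toy_speedup(n, alpha):.1f}x" for n in (8, 32, 128))
        print(f"alpha={alpha}: {row} (ceiling {1 / alpha:.0f}x)")
```

With alpha = 0.05 the speedup flattens out below 20x even at 128 processors, while alpha = 0.001 stays close to linear over the same range; a better communication fabric in this model simply means a smaller alpha.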
Nervana Engine (coming in 2017)
• Unprecedented computing power
• 10x speedup over current GPUs
• More memory on-chip
• High-Bandwidth Memory off-chip
• Six bidirectional high-bandwidth links for 3D torus interconnect
• 8 chips in a box, scaling seamlessly to multiple chassis
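In a 3D torus each chip connects to its two neighbors along each of the x, y, and z axes, with wraparound at the edges, which is why six links per chip are enough. A sketch of the neighbor computation, assuming chips are addressed by hypothetical (x, y, z) coordinates in a torus of shape (X, Y, Z); this addressing scheme is for illustration only and is not a documented Nervana Engine API.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, shape: Coord) -> List[Coord]:
    """Return the six neighbors of `node` in a 3D torus of the given shape.

    Each axis contributes a +1 and a -1 neighbor; the modulo wraps indices
    around the edges. The six neighbors are distinct when every axis has
    length 3 or more.
    """
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % shape[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    # Corner node of a hypothetical 4 x 4 x 4 torus: wraparound gives index 3.
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The wraparound is what distinguishes a torus from a plain 3D mesh: edge and corner chips have the same six links as interior ones, so no chip is more than half the ring length away along any axis.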
Summary
• Deep learning is a new computational paradigm
• Learning and Inference on data
• neon with state-of-the-art GPU kernels
• Nervana Cloud with multi-GPU training
• Watch for Nervana Engine deep learning processor
Urs Köster Presenting at RE-Work DL Summit in Boston
