Proprietary and confidential. Do not distribute.
Nervana and the
Future of Computing
26 April 2016
Arjun Bansal
Co-founder & VP Algorithms, Nervana
MAKING MACHINES SMARTER.™
AI on demand using Deep Learning
Nervana Platform (DL): Image Classification, Object Localization, Video Indexing, Text Analysis, Machine Translation
Image classification and video activity detection
Deep learning model
• Trained on a public dataset [1] of 13K videos in 101 categories
• Training was approximately 3 times faster than a competing framework
• Can be extended to perform scene and object detection, action similarity labeling, video retrieval, and anomaly detection

Potential applications
• Activity detection and monitoring for security
• Automatic editing of captured moments from video cameras
• Facial recognition and image-based retrieval
• Sense-and-avoid systems for autonomous driving
• Baggage screening at airports and other public venues

https://www.youtube.com/watch?v=ydnpgUOpdBw
[1] UCF101 dataset: http://crcv.ucf.edu/data/UCF101.php
Object localization and recognition
Speech to text
https://youtu.be/NaqZkV_fBIM
Question answering
Stories
Mary journeyed to Texas.
John went to Maryland.
Mary went to Iowa.
John travelled to Florida.
Questions
Where is John located?
Answers
Florida
Reinforcement learning
Pong Breakout
https://youtu.be/KkIf0Ok5GCE
https://youtu.be/0ZlgrQS3krg
Application areas
Healthcare Agriculture Finance
Online Services Automotive Energy
Nervana is building the future of computing
The Economist, March 12, 2016
Cloud Computing
Custom ASIC
Deep Learning / AI
nervana cloud
Data: Images, Text, Tabular, Speech, Time series, Video
Cloud: import, build, train, deploy
nervana neon
• Fastest library
• Model support
  Models: Convnet; RNN, LSTM; MLP; DQN; NTM
  Domains: Images, Video, Speech, Text, Time series
• Cloud integration
  Running locally:
  % python rnn.py # or neon rnn.yaml
  Running in nervana cloud:
  % ncloud submit --py rnn.py # or --yaml rnn.yaml
  % ncloud show <model_id>
  % ncloud list
  % ncloud deploy <model_id>
  % ncloud predict <model_id> <data> # or use REST api
• Multiple backends
  Backends: CPU, GPU, Multiple GPUs, Parameter server, (Xeon Phi), nervana TPU
• Optimized at assembler level
nervana tensor processing unit (TPU)
• 10-100x gain
• Architecture optimized for
• Unprecedented compute density
  [Figure: 10 GPUs = 200 CPUs = 1 nervana engine]
• Scalable distributed architecture
• Memory near computation
  [Figure: CPU block with Ctrl, ALU, and instruction and data memory, next to a Nervana block with Ctrl and data memory]
• Learning and inference
• Exploit limited precision
• Power efficiency
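
The "exploit limited precision" bullet can be illustrated with a small sketch. The slides do not say which number format the Nervana hardware uses, so the 8-bit shared-scale integer scheme below is purely an assumption chosen for illustration, and the function names `quantize` and `quantized_dot` are hypothetical. It shows only the general idea that a dot product computed on low-precision integers can stay close to the full-precision result.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor.

    Illustrative scheme only; not the format used by any specific chip.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, bits=8):
    """Dot product using integer multiply-accumulate, rescaled at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    integer_sum = sum(x * y for x, y in zip(qa, qb))
    return integer_sum * scale_a * scale_b


a = [0.12, -0.5, 0.33, 0.9]
b = [1.5, 0.25, -0.75, 0.6]
exact = sum(x * y for x, y in zip(a, b))
approx = quantized_dot(a, b)
error = abs(approx - exact)  # small relative to the exact value
```

Narrower integers need less memory and less multiplier area than 32-bit floats, which is one plausible reading of how limited precision relates to the compute density and power efficiency bullets.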
Special purpose computation
1940s: Turing Bombe
Motivation: Automating
calculations, code breaking
General purpose computation
2000s: SoC
Motivation: reduce power
and cost, fungible
computing.
Enabled inexpensive
mobile devices.
Dennard scaling has ended
What business and
technology constraints do
we have now?
Many-core tiled architectures
Tile Processor Architecture Overview for the TILEPro Series
and provides high bandwidth and extremely low latency communication among tiles. The Tile Processor™ integrates external memory and I/O interfaces on chip and is a complete programmable multicore processor. External memory and I/O interfaces are connected to the tiles via the iMesh interconnect.
Figure 2-1 shows the 64-core TILEPro64™ Tile processor with details of an individual tile’s structure.
Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Each tile implements a 32-bit integer processor engine utilizing a three-way Very Long Instruction Word (VLIW) architecture with its own program counter (PC), cache, and DMA subsystem. An individual tile is capable of executing up to three operations per cycle.
[Figure 2-1. Tile Processor Hardware Architecture: 8 x 8 grid of tiles (0,0 to 7,7) with DDR2, PCIe (x4 lane), XAUI (10GbE), RGMII (GbE), and FlexI/O interfaces; the tile detail shows a processor engine, cache engine, and switch engine connected to the UDN, STN, MDN, IDN, TDN, and CDN networks]
2010s: multi-core, GPGPU
Motivation: increased
performance without clock
rate increase or smaller
devices.
Requires changes in
programming paradigm.
Tilera
NVIDIA GM204
Intel Xeon Phi Knights Landing
FPGA architectures
Altera Arria 10
Motivation: fine grained
parallelism, reconfigurable,
lots of IO, scalable.
Slow clock speed, lacks
compute density for
machine learning.
Neuromorphic architectures
IBM TrueNorth
[Figure: cropped excerpt from a paper describing TrueNorth. Legible text: "In terms of efficiency, TrueNorth’s power density is 20 mW per cm2, whereas that of a typical central processing"]
Neural network parallelism
Data parallelism
• Data chunk 1 … data chunk n, one per processor (Processor 1 … Processor n)
• Full deep network on each processor
• Parameter coordination through a parameter server
Model parallelism
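
The data-parallel scheme on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not Nervana's implementation: the "full network" is a single weight `w` in the model y = w * x, the workers run sequentially in one process, and every function name here is hypothetical. Each worker computes a gradient on its own data chunk, and a parameter-server step averages the gradients and updates the shared weight.

```python
def local_gradient(w, chunk):
    """Gradient of the mean squared error (w*x - y)^2 over one data chunk."""
    return sum(2.0 * (w * x - y) * x for x, y in chunk) / len(chunk)


def split_into_chunks(data, n_workers):
    """Round-robin split so each worker gets its own chunk of the data."""
    return [data[i::n_workers] for i in range(n_workers)]


def parameter_server_step(w, chunks, lr):
    """Collect one gradient per worker, average them, update the shared weight."""
    # On real hardware these gradient computations would run in parallel.
    grads = [local_gradient(w, chunk) for chunk in chunks]
    return w - lr * sum(grads) / len(grads)


def train(data, n_workers=4, lr=0.01, steps=200):
    chunks = split_into_chunks(data, n_workers)
    w = 0.0  # every worker starts each step from the same copy of the model
    for _ in range(steps):
        w = parameter_server_step(w, chunks, lr)
    return w


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # generated with w = 3
w_fit = train(data)
```

Because the chunks here are equal in size, averaging the per-worker gradients gives the same update as one gradient over the whole dataset; the coordination cost is the exchange of gradients and parameters at every step.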
Existing computing topologies are lacking
[Figure, built up over several steps: a server with CPUs, an SSD, four GPUs, and IB and 10G network links; then several such servers side by side; then PCIe switches (PCIE SW) attaching further groups of four GPUs]
nervana compute topology
[Figure: two CPUs, each with an SSD and IB and 10G network links, attached through PCIe switches (PCIE SW) to eight nervana engines (n)]
Distributed linear algebra and convolution
SUMMA: $n \times n$ matmul on a $P^{1/2} \times P^{1/2}$ grid
• $C[i,j]$ is the $n/P^{1/2} \times n/P^{1/2}$ submatrix of $C$ on processor $P_{ij}$
• $A[i,k]$ is an $n/P^{1/2} \times b$ submatrix of $A$
• $B[k,j]$ is a $b \times n/P^{1/2}$ submatrix of $B$
• $C[i,j] = C[i,j] + \sum_k A[i,k] \cdot B[k,j]$
  • summation over submatrices
• Need not be a square processor grid
[Figure: block row $A[i,k]$ times block column $B[k,j]$ gives block $C[i,j]$]
SUMMA distributed matrix multiply C=A*B
(Jim Demmel, CS267 lecture notes, Lecture 12, 02/27/2014)
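
The SUMMA update on this slide can be simulated in a single process. The sketch below is an illustration under simplifying assumptions, not code from the lecture notes: the processor grid is square, the grid size and the panel width `b` both divide `n`, and the row and column broadcasts are stood in for by plain slicing. The helper names `block`, `matmul_add`, and `summa` are hypothetical.

```python
def matmul_add(C, A, B):
    """C += A * B for small dense blocks stored as lists of rows."""
    for r in range(len(A)):
        for c in range(len(B[0])):
            C[r][c] += sum(A[r][t] * B[t][c] for t in range(len(B)))


def block(M, r0, r1, c0, c1):
    """Copy of the submatrix with rows r0..r1-1 and columns c0..c1-1."""
    return [row[c0:c1] for row in M[r0:r1]]


def summa(A, B, grid=2, b=1):
    """Simulate SUMMA for n x n matrices on a grid x grid processor mesh."""
    n = len(A)
    s = n // grid  # each processor owns an s x s block of C
    # C_blocks[i][j] is the block C[i,j] held by processor P_ij.
    C_blocks = [[[[0] * s for _ in range(s)] for _ in range(grid)]
                for _ in range(grid)]
    for k0 in range(0, n, b):  # one step per panel of width b
        for i in range(grid):
            # A[i,k]: s x b panel, broadcast along processor row i.
            A_ik = block(A, i * s, (i + 1) * s, k0, k0 + b)
            for j in range(grid):
                # B[k,j]: b x s panel, broadcast along processor column j.
                B_kj = block(B, k0, k0 + b, j * s, (j + 1) * s)
                matmul_add(C_blocks[i][j], A_ik, B_kj)  # local update on P_ij
    # Gather the distributed blocks into one matrix for inspection.
    return [[C_blocks[r // s][c // s][r % s][c % s] for c in range(n)]
            for r in range(n)]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
I = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
C = summa(A, I, grid=2, b=2)  # multiplying by the identity returns A
```

The panel width `b` trades message count against message size: a larger `b` means fewer, larger broadcasts per processor row and column.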
Matrix multiplication on multidimensional torus networks
Edgar Solomonik and James Demmel
Division of Computer Science
University of California at Berkeley, CA, USA
solomon@cs.berkeley.edu, demmel@cs.berkeley.edu
Abstract. Blocked matrix multiplication algorithms such as Cannon’s algorithm and SUMMA have
a 2-dimensional communication structure. We introduce a generalized ’Split-Dimensional’ version of
Cannon’s algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure.
This algorithm is useful for higher-dimensional torus interconnects that can achieve more injection
bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon
Summary
• Computers are tools for solving problems of their time
• Was: Coding, calculation, graphics, web
• Today: Learning and Inference on data
• Deep learning as a computational paradigm
• Custom architecture can do vastly better

More Related Content

What's hot

Introduction to deep learning @ Startup.ML by Andres Rodriguez
Introduction to deep learning @ Startup.ML by Andres RodriguezIntroduction to deep learning @ Startup.ML by Andres Rodriguez
Introduction to deep learning @ Startup.ML by Andres RodriguezIntel Nervana
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
Introduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntroduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntel Nervana
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA Taiwan
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
Nervana Systems
Nervana SystemsNervana Systems
Nervana SystemsNand Dalal
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataIntel Nervana
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016Intel Nervana
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsHPCC Systems
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA Taiwan
 
Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Rakuten Group, Inc.
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer BuildPetteriTeikariPhD
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Abhishek Thakur
 

What's hot (20)

Introduction to deep learning @ Startup.ML by Andres Rodriguez
Introduction to deep learning @ Startup.ML by Andres RodriguezIntroduction to deep learning @ Startup.ML by Andres Rodriguez
Introduction to deep learning @ Startup.ML by Andres Rodriguez
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
Introduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntroduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will Constable
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
Nervana Systems
Nervana SystemsNervana Systems
Nervana Systems
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio data
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
 
Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)
 

Viewers also liked

An Analysis of Convolution for Inference
An Analysis of Convolution for InferenceAn Analysis of Convolution for Inference
An Analysis of Convolution for InferenceIntel Nervana
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Intel Nervana
 
Google I/O 2016 Highlights That You Should Know
Google I/O 2016 Highlights That You Should KnowGoogle I/O 2016 Highlights That You Should Know
Google I/O 2016 Highlights That You Should KnowAppinventiv
 
Video Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleVideo Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleIntel Nervana
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architecturesinside-BigData.com
 
Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognitionIntel Nervana
 
High-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningHigh-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningIntel Nervana
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크지운 배
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)Amazon Web Services Korea
 
Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1NVIDIA Taiwan
 

Viewers also liked (14)

An Analysis of Convolution for Inference
An Analysis of Convolution for InferenceAn Analysis of Convolution for Inference
An Analysis of Convolution for Inference
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
 
Google I/O 2016 Highlights That You Should Know
Google I/O 2016 Highlights That You Should KnowGoogle I/O 2016 Highlights That You Should Know
Google I/O 2016 Highlights That You Should Know
 
Video Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleVideo Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model Example
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architectures
 
Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognition
 
Region Of Interest Extraction
Region Of Interest ExtractionRegion Of Interest Extraction
Region Of Interest Extraction
 
High-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningHigh-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep Learning
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)
AWS CLOUD 2017 - AWS 신규 서비스를 통해 본 클라우드의 미래 (김봉환 솔루션즈 아키텍트)
 
Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1
 

Similar to Nervana and the Future of Computing

Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemHai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemAI Frontiers
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA Taiwan
 
Possibilities of generative models
Possibilities of generative modelsPossibilities of generative models
Possibilities of generative modelsAlison B. Lowndes
 
Gömülü Sistemlerde Derin Öğrenme Uygulamaları
Gömülü Sistemlerde Derin Öğrenme UygulamalarıGömülü Sistemlerde Derin Öğrenme Uygulamaları
Gömülü Sistemlerde Derin Öğrenme UygulamalarıFerhat Kurt
 
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...Newprolab
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusDatabricks
 
QuAI platform
QuAI platformQuAI platform
QuAI platformTeddy Kuo
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangPAPIs.io
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AITyrone Systems
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...Edge AI and Vision Alliance
 
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsScalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsIndrajit Poddar
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI ConvergenceDAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergenceinside-BigData.com
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...Edge AI and Vision Alliance
 

Similar to Nervana and the Future of Computing (20)

Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemHai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
Possibilities of generative models
Possibilities of generative modelsPossibilities of generative models



Nervana and the Future of Computing

  • 1. Proprietary and confidential. Do not distribute. Nervana and the Future of Computing 26 April 2016 Arjun Bansal Co-founder & VP Algorithms, Nervana MAKING MACHINES SMARTER.™
  • 2. AI on demand using Deep Learning. Nervana Platform (DL): Image Classification, Object Localization, Video Indexing, Text Analysis, Machine Translation
  • 3. Image classification and video activity detection. Deep learning model: trained on a public dataset (UCF101, http://crcv.ucf.edu/data/UCF101.php) of 13K videos in 100 categories; training was approximately 3 times faster than a competitive framework; can be extended to perform scene and object detection, action similarity labeling, video retrieval, and anomaly detection. Potential applications: activity detection and monitoring for security; automatic editing of captured moments from a video camera; facial recognition and image-based retrieval; sense-and-avoid systems for autonomous driving; baggage screening at airports and other public venues. Video: https://www.youtube.com/watch?v=ydnpgUOpdBw
  • 4. Object localization and recognition
  • 5. Speech to text. https://youtu.be/NaqZkV_fBIM
  • 6. Question answering. Stories: Mary journeyed to Texas. John went to Maryland. Mary went to Iowa. John travelled to Florida. Question: Where is John located? Answer: Florida
  • 7. Reinforcement learning: Pong, Breakout. https://youtu.be/KkIf0Ok5GCE https://youtu.be/0ZlgrQS3krg
  • 8. Application areas: Healthcare, Agriculture, Finance, Online Services, Automotive, Energy
  • 9. Nervana is building the future of computing. The Economist, March 12, 2016. Cloud Computing, Custom ASIC, Deep Learning / AI
  • 10. nervana cloud. Data (Images, Text, Tabular, Speech, Time series, Video): import, train, build, deploy. Cloud
  • 11. nervana neon
  • 12. nervana neon: Fastest library
  • 13. nervana neon: Fastest library
  • 14. nervana neon: Fastest library; Model support. Models: Convnet; RNN, LSTM; MLP; DQN; NTM. Domains: Images, Video, Speech, Text, Time series
  • 15. nervana neon: Fastest library; Model support; Cloud integration.
    Running locally:
      % python rnn.py  # or neon rnn.yaml
    Running in nervana cloud:
      % ncloud submit --py rnn.py  # or --yaml rnn.yaml
      % ncloud show <model_id>
      % ncloud list
      % ncloud deploy <model_id>
      % ncloud predict <model_id> <data>  # or use REST api
  • 16. nervana neon: Fastest library; Model support; Cloud integration; Multiple backends. Backends: CPU, GPU, Multiple GPUs, Parameter server, (Xeon Phi), nervana TPU
  • 17. nervana neon: Fastest library; Model support; Cloud integration; Multiple backends; Optimized at assembler level
  • 18. nervana tensor processing unit (TPU)
  • 19. nervana tensor processing unit (TPU): Unprecedented compute density (1 nervana engine = 10 GPUs = 200 CPUs)
  • 20. nervana tensor processing unit (TPU): Unprecedented compute density; Scalable distributed architecture
  • 21. nervana tensor processing unit (TPU): Unprecedented compute density; Scalable distributed architecture; Memory near computation [diagram labels: CPU: instruction and data memory, Ctrl, ALU; Nervana: data memory, Ctrl]
  • 22. nervana tensor processing unit (TPU): Unprecedented compute density; Scalable distributed architecture; Memory near computation; Learning and inference
  • 23. nervana tensor processing unit (TPU): Unprecedented compute density; Scalable distributed architecture; Memory near computation; Learning and inference; Exploit limited precision
  • 24. nervana tensor processing unit (TPU): Unprecedented compute density; Scalable distributed architecture; Memory near computation; Learning and inference; Exploit limited precision; Power efficiency
  • 25. nervana tensor processing unit (TPU): 10-100x gain. Architecture optimized for: unprecedented compute density; scalable distributed architecture; memory near computation; learning and inference; exploiting limited precision; power efficiency
  • 26. Special purpose computation. 1940s: Turing Bombe. Motivation: automating calculations, code breaking.
  • 27. General purpose computation. 2000s: SoC. Motivation: reduce power and cost, fungible computing. Enabled inexpensive mobile devices.
  • 28. Dennard scaling has ended. What business and technology constraints do we have now?
  • 29. Many-core tiled architectures. 2010s: multi-core, GPGPU. Motivation: increased performance without clock rate increase or smaller devices. Requires changes in programming paradigm. Examples shown: NVIDIA GM204, Tilera, Intel Xeon Phi Knights Landing. Excerpt shown (Tile Processor Architecture Overview for the TILEPro Series): "[...] and provides high bandwidth and extremely low latency communication among tiles. The Tile Processor™ integrates external memory and I/O interfaces on chip and is a complete programmable multicore processor. External memory and I/O interfaces are connected to the tiles via the iMesh interconnect. Figure 2-1 shows the 64-core TILEPro64™ Tile processor with details of an individual tile's structure. Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Each tile implements a 32-bit integer processor engine utilizing a three-way Very Long Instruction Word (VLIW) architecture with its own program counter (PC), cache, and DMA subsystem. An individual tile is capable of executing up to three operations per cycle." [Figure 2-1. Tile Processor Hardware Architecture: 8 x 8 grid of tiles (0,0 to 7,7) with tile detail (Switch Engine, Cache Engine, Processor Engine), networks UDN, STN, MDN, IDN, TDN, CDN, and I/O: DDR2, XAUI (10GbE), RGMII (GbE), PCIe (x4 lane), FlexI/O, I2C, JTAG, HPI, UART, SPI ROM]
  • 30. FPGA architectures. Altera Arria 10. Motivation: fine-grained parallelism, reconfigurable, lots of I/O, scalable. Slow clock speed, lacks compute density for machine learning.
  • 31. Neuromorphic architectures. IBM TrueNorth. Excerpt shown (cropped on the slide): "[...]dress for the target axon and addresses representing core [...]ension to the target core). This [...] coded into a packet that is in[...] entering spikes (Fig. 2I). Spikes leaving the mesh are tagged with their row (for spikes traveling east-west) or column (for spikes traveling north-south) before being merged onto a shared link [...]ters (31,232 bits), destination addresses (6656 bits), and axonal delays (1024 bits). In terms of efficiency, TrueNorth's power density is 20 mW per cm², whereas that of a typical central processing [...]"
  • 32. Neural network parallelism. Data parallelism: data chunks 1 … n are assigned to processors 1 … n, with the full deep network on each processor and a parameter server for parameter coordination. Model parallelism.
  • 33. Existing computing topologies are lacking. [Diagram labels: GPU ×4, CPU ×2, SSD, IB, 10G]
  • 34. Existing computing topologies are lacking. [Diagram labels: GPU ×4, CPU ×2, SSD, IB, 10G]
  • 35. Existing computing topologies are lacking. [Diagram labels: GPU ×8, CPU ×4, SSD ×2, IB ×2, 10G ×2, PCIE SW ×2]
  • 36. Existing computing topologies are lacking. [Diagram labels: GPU ×8, CPU ×4, SSD ×2, IB ×2, 10G ×2, PCIE SW ×2]
  • 37. Existing computing topologies are lacking. [Diagram labels: GPU ×12, CPU ×6, SSD ×3, IB ×3, 10G ×3, PCIE SW ×3]
  • 38. Existing computing topologies are lacking. [Diagram labels: GPU ×16, CPU ×6, SSD ×3, IB ×3, 10G ×3, PCIE SW ×4]
  • 39. Existing computing topologies are lacking. [Diagram labels: GPU ×16, CPU ×6, SSD ×3, IB ×3, 10G ×3, PCIE SW ×4]
  • 40. nervana compute topology. [Diagram labels: CPU ×2, SSD ×2, IB ×2, 10G ×2, PCIE SW ×2, n ×8]
  • 41. Distributed linear algebra and convolution. SUMMA distributed matrix multiply C = A*B (Jim Demmel, CS267 lecture notes, Lecture 12, 02/27/2014). SUMMA: n x n matmul on a P^(1/2) x P^(1/2) grid. C[i,j] is the n/P^(1/2) x n/P^(1/2) submatrix of C on processor P_ij; A[i,k] is an n/P^(1/2) x b submatrix of A; B[k,j] is a b x n/P^(1/2) submatrix of B; C[i,j] = C[i,j] + Σ_k A[i,k]*B[k,j] (summation over submatrices); the processor grid need not be square. Also shown: "Matrix multiplication on multidimensional torus networks", Edgar Solomonik and James Demmel, Division of Computer Science, University of California at Berkeley, CA, USA (solomon@cs.berkeley.edu, demmel@cs.berkeley.edu). Abstract. Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for higher-dimensional torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon
  • 42. Summary. Computers are tools for solving problems of their time. Was: coding, calculation, graphics, web. Today: learning and inference on data. Deep learning as a computational paradigm. Custom architecture can do vastly better.
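
The data-parallel scheme on slide 32 (a full copy of the network on each processor, one data chunk per processor, and a parameter server coordinating the parameters) can be illustrated with a toy sketch. This is not Nervana's implementation: the one-parameter linear model, the synchronous gradient averaging, and the names `local_gradient`, `parameter_server_step`, and `train` are all assumptions made for illustration, and the "workers" run sequentially in one process.

```python
# Hypothetical toy example of synchronous data parallelism with a
# parameter server. The model is y = w * x with a single parameter w.

def local_gradient(w, chunk):
    """Mean gradient of 0.5 * (w*x - y)**2 over one worker's data chunk."""
    return sum((w * x - y) * x for x, y in chunk) / len(chunk)


def parameter_server_step(w, chunks, lr):
    """One synchronous update: every worker computes a gradient on its own
    chunk using the same w, then the server averages and applies them."""
    grads = [local_gradient(w, chunk) for chunk in chunks]  # parallel in practice
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


def train(data, n_workers=4, lr=0.05, steps=200):
    """Split the data into n_workers chunks and run synchronous steps."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    w = 0.0
    for _ in range(steps):
        w = parameter_server_step(w, chunks, lr)
    return w


# Synthetic data generated from y = 3x, so training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data)
```

In a real system each call to `local_gradient` would run on a separate processor, and the averaging step is where the communication cost of the parameter server shows up.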
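
The SUMMA update quoted on slide 41, C[i,j] = C[i,j] + Σ_k A[i,k]*B[k,j] over submatrices, can be sketched as a single-process simulation of a square p x p processor grid. This is a minimal sketch under stated assumptions, not the CS267 reference code: the panel width b from the slide is taken equal to the block size n/p, p is assumed to divide n, and the helper names (`matmul`, `add`, `block`, `summa`) are hypothetical.

```python
# Hypothetical single-process simulation of SUMMA on a p x p grid.
# Matrices are plain lists of rows; no real communication takes place.

def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(m, i, j, bs):
    """The bs x bs submatrix of m at block coordinates (i, j)."""
    return [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]


def summa(mat_a, mat_b, p):
    """Compute C = A * B for n x n inputs, with p dividing n."""
    n = len(mat_a)
    bs = n // p
    # c_blocks[i][j] is the block of C owned by processor P_ij.
    c_blocks = [[[[0.0] * bs for _ in range(bs)] for _ in range(p)]
                for _ in range(p)]
    for k in range(p):
        for i in range(p):
            a_ik = block(mat_a, i, k, bs)  # broadcast along processor row i
            for j in range(p):
                b_kj = block(mat_b, k, j, bs)  # broadcast along column j
                c_blocks[i][j] = add(c_blocks[i][j], matmul(a_ik, b_kj))
    # Stitch the per-processor blocks back into one n x n matrix.
    result = []
    for i in range(p):
        for r in range(bs):
            row = []
            for j in range(p):
                row.extend(c_blocks[i][j][r])
            result.append(row)
    return result


mat_a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mat_b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
c = summa(mat_a, mat_b, p=2)
```

Each processor only ever touches one block of A and one block of B per step k, which is what makes the scheme suitable for a distributed grid; the result should match an ordinary dense product such as `matmul(mat_a, mat_b)`.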