NVIDIA compute GPUs and software toolkits are key drivers behind major advancements in machine learning. Of particular interest is a technique called "deep learning", which uses Convolutional Neural Networks (CNNs) that have had great success in computer vision and seen widespread adoption in fields such as autonomous vehicles, cyber security, and healthcare. This talk presents a high-level introduction to deep learning, covering core concepts, success stories, and relevant use cases. We also provide an overview of essential frameworks and workflows for deep learning. Finally, we explore emerging domains for GPU computing such as large-scale graph analytics and in-memory databases.
https://tech.rakuten.co.jp/
3. 3
THE BIG BANG IN MACHINE LEARNING
DNN, GPU, BIG DATA
100 hours of video
uploaded every
minute
350 million
images uploaded
per day
2.5 Petabytes of
customer data
hourly
[Chart: TFLOPS (0.0 to 3.0) by year, 2008 to 2014, NVIDIA GPU vs x86 CPU]
4. 4
BIG DATA & ANALYTICS
AUTOMOTIVE
Auto sensors reporting
location, problems
COMMUNICATIONS
Location-based advertising
CONSUMER PACKAGED GOODS
Sentiment analysis of
what’s hot, problems
FINANCIAL SERVICES
Risk & portfolio analysis
New products
EDUCATION & RESEARCH
Experiment sensor analysis
HIGH TECHNOLOGY /
INDUSTRIAL MFG.
Mfg. quality
Warranty analysis
LIFE SCIENCES
Clinical trials
MEDIA/ENTERTAINMENT
Viewers / advertising
effectiveness
ON-LINE SERVICES /
SOCIAL MEDIA
People & career matching
HEALTH CARE
Patient sensors,
monitoring, EHRs
OIL & GAS
Drilling exploration sensor
analysis
RETAIL
Consumer sentiment
TRAVEL &
TRANSPORTATION
Sensor analysis for
optimal traffic flows
UTILITIES
Smart Meter analysis
for network capacity,
LAW ENFORCEMENT
& DEFENSE
Threat analysis - social media
monitoring, photo analysis
5. 5
EXPONENTIAL DATA GROWTH
INCREASING DATA VARIETY
Search
Marketing
Behavioral
Targeting
Dynamic
Funnels
User
Generated
Content
Mobile Web
SMS/MMS
Sentiment
HD Video
Speech To
Text
Product/
Service Logs
Social
Network
Business
Data Feeds
User Click
Stream
Sensors
Infotainment
Systems
Wearable
Devices
Cyber
Security Logs
Connected
Vehicles
Machine
Data
IoT Data
Dynamic
Pricing
Payment
Record
Purchase
Detail
Purchase
Record
Support
Contacts
Segmentation
Offer
Details
Web
Logs
Offer
History
A/B
Testing
BUSINESS
PROCESS
GIGABYTES, TERABYTES, PETABYTES, EXABYTES, ZETTABYTES
Streaming
Video
Natural
Language
Processing
WEB
DIGITAL
AI
90% of the world’s
data created in the
last year - IBM
7. 7
WHAT IS DEEP LEARNING?
ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP LEARNING
Perception
Reasoning
Planning
Optimization
Computational
Statistics
Supervised and
Unsupervised Learning
Neural networks
Distributed Representations
Hierarchical Explanatory Factors
Unsupervised Feature Engineering
8. 8
DEEP LEARNING FUELING DISCOVERY
Classify Satellite Images for
Carbon Monitoring
Analyze Obituaries on the Web for
Cancer-related Discoveries
Determine Drug Treatments to Increase
Child’s Chance of Survival
NASA AMES
9. 9
DEEP LEARNING FOR EVERY APPLICATION
Visual search for
e-commerce
Visual Search in
Geoinformatics
Improving Agriculture:
LettuceBot only
sprays weeds
13. 13
MORE THAN 1,500 AI STARTUPS
AROUND THE WORLD
Deep Learning
for Art
Deep Learning for
Cybersecurity
Deep Learning for
Genomics
Deep Learning for
Self-Driving Cars
14. 14
IMAGENET CHALLENGE
Where it all started … again
bird
frog
person
hammer
flower pot
power drill
person
car
helmet
motorcycle
person
dog
chair
1.2M training images • 1000 object categories
Challenge
15. 15
ACHIEVING SUPERHUMAN PERFORMANCE
2012: Deep Learning researchers worldwide discover GPUs
2015: Deep Learning achieves superhuman image recognition on ImageNet
2016: Microsoft achieves speech recognition milestone
16. 16
DEEP LEARNING ADOPTION IS EXPONENTIAL
# of Organizations Using Deep Learning
Source: Jeff Dean, Spark Summit 2016
17. 17
MASSIVE COMPUTING CHALLENGE
SPEECH RECOGNITION (10X Training Ops)
2014, Deep Speech 1: 80 GFLOP, 7,000 hrs of Data, ~8% Error
2015, Deep Speech 2: 465 GFLOP, 12,000 hrs of Data, ~5% Error
IMAGE RECOGNITION (16X Model)
2012, AlexNet: 8 Layers, 1.4 GFLOP, ~16% Error
2015, ResNet: 152 Layers, 22.6 GFLOP, ~3.5% Error
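The "10X Training Ops" and "16X Model" labels on this slide can be roughly reproduced from the listed figures. The sketch below is one plausible reading, not the presenter's stated method: it assumes the speech figure scales per-sample compute (GFLOP) by hours of training data, and that the image figure is the plain ratio of per-image GFLOP. The function and variable names are illustrative only.

```python
# Figures copied from the slide. The ratio formulas are an ASSUMPTION
# about how the "10X" and "16X" labels were derived.
deep_speech_1 = {"gflop": 80, "hours": 7_000}
deep_speech_2 = {"gflop": 465, "hours": 12_000}
alexnet_gflop = 1.4
resnet_gflop = 22.6


def training_ops_ratio(old, new):
    """Ratio of (per-sample compute x hours of data) between two models."""
    return (new["gflop"] * new["hours"]) / (old["gflop"] * old["hours"])


def model_ratio(old_gflop, new_gflop):
    """Ratio of per-sample compute between two models."""
    return new_gflop / old_gflop


speech_ratio = training_ops_ratio(deep_speech_1, deep_speech_2)
image_ratio = model_ratio(alexnet_gflop, resnet_gflop)

print(f"Deep Speech 2 vs Deep Speech 1, training ops: {speech_ratio:.1f}x")
print(f"ResNet vs AlexNet, model compute: {image_ratio:.1f}x")
```

Under these assumptions both ratios land close to the slide's rounded labels, which suggests the speech figure accounts for the larger data set as well as the larger model.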
18. 18
Device
NVIDIA DEEP LEARNING PLATFORM
TRAINING
DIGITS Training System
Deep Learning Frameworks
Tesla P100, DGX-1
DATACENTER INFERENCING
DeepStream SDK
TensorRT
Tesla P40 & P4
19. 19
Device
NVIDIA DEEP LEARNING PLATFORM
TRAINING: Tesla P100, 65X in 3 years
DATACENTER INFERENCING: Tesla P4, 40X vs CPU
Training: comparing to Kepler GPU in 2013 using Caffe. Inference: comparing img/sec/watt to CPU (Intel E5-2697v4) using AlexNet.
20. 20
TESLA P4
Maximum Efficiency for Scale-out Servers
5.5 TFLOPS
40x Efficient vs CPU, 8x Efficient vs FPGA
[Chart: AlexNet, Images/Sec/Watt (0 to 200): CPU, FPGA, 1x M4 (FP32), 1x P4 (INT8)]
TESLA P40
Highest Throughput for Scale-up Servers
4x Boost in Less than One Year
[Chart: GoogLeNet and AlexNet, Images/Sec (0 to 100,000): 8x M40 (FP32) vs 8x P40 (INT8)]
21. 21
INTRODUCING TESLA P100
Page Migration Engine
Virtually Unlimited Memory
CoWoS HBM2
3D Stacked Memory (i.e., fast!)
NVLink
GPU Interconnect for
Maximum Scalability
23. 23
Instant productivity: plug-and-play, supports every AI framework
Performance optimized across the entire stack
Always up-to-date via the cloud
Mixed framework environments, containerized
Direct access to NVIDIA experts
DGX STACK
Fully integrated Deep Learning platform
24. 24
NVIDIA POWERS DEEP LEARNING
Every major DL framework leverages NVIDIA SDKs
Mocha.jl
NVIDIA DEEP LEARNING SDK
COMPUTER VISION SPEECH & AUDIO NATURAL LANGUAGE PROCESSING
OBJECT
DETECTION
IMAGE
CLASSIFICATION
VOICE
RECOGNITION
LANGUAGE
TRANSLATION
RECOMMENDATION
ENGINES
SENTIMENT
ANALYSIS
25. 25
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Interactive deep neural network development environment for image classification and object detection
Schedule, monitor, and manage neural network training jobs
Analyze accuracy and loss in real time
Track datasets, results, and trained neural networks
Scale training jobs across multiple GPUs automatically
26. 26
NVIDIA cuDNN
Accelerating Deep Learning
High performance building blocks for deep learning frameworks
Drop-in acceleration for widely used deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, Torch and others
Accelerates industry-vetted deep learning algorithms, such as convolutions, LSTM, fully connected, and pooling layers
Fast deep learning training performance tuned for NVIDIA GPUs
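Two of the operations named above, convolution and pooling, can be sketched in plain Python to show what such building blocks compute. This is an illustration only: it does not use or mirror the cuDNN API, the function names `conv2d` and `max_pool2d` are made up for this sketch, and it assumes a single channel, stride 1, and no padding.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (what DL frameworks call convolution), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


# Toy 5x5 single-channel "image" and a 2x2 diagonal-difference filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 2, 1],
    [3, 0, 1, 1, 0],
    [1, 2, 0, 3, 1],
    [2, 1, 1, 0, 2],
]
kernel = [[1, 0], [0, -1]]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)     # 2x2 map after pooling
print(features)
print(pooled)
```

The nested loops here are what a GPU library parallelizes across many filters, channels, and images at once, which is where the speed-ups on this slide come from.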
Deep Learning Training Performance
[Chart: Caffe AlexNet, speed-up of Images/Sec vs K40 in 2013 (0x to 80x): K40, K80 + cuDN…, M40 + cuDNN4, P100 + cuDNN5]
“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.”
- Evan Shelhamer, Lead Caffe Developer, UC Berkeley
AlexNet training throughput on CPU: 1x E5-2680v3 12 Core 2.5GHz.
128GB System Memory, Ubuntu 14.04
M40 bar: 8x M40 GPUs in a node, P100: 8x P100 NVLink-enabled
27. 27
INTRODUCING NVIDIA TensorRT
High Performance Inference Engine
User Experience: Instant Response
45x Faster with Pascal + TensorRT
Faster, more responsive AI-powered services such as voice recognition, speech translation
Efficient inference on images, video, & other data in hyperscale production data centers
[Chart: Inference Execution Time (ms), axis 0 to 300, bars for P40, P4, and 1x CPU (14 cores); bar labels 11 ms, 6 ms, 260 ms]
Training
Device
Datacenter
28. 28
NVIDIA DEEPSTREAM SDK
Delivering Video Analytics at Scale
Hardware Decode, Preprocess, Inference
“Boy playing soccer”
Simple, high performance API for analyzing video
Decode H.264, HEVC, MPEG-2, MPEG-4, VP9
CUDA-optimized resize and scale
TensorRT
[Chart: Concurrent Video Streams Analyzed (0 to 100): 1x Tesla P4 Server + DeepStream SDK vs 13x E5-2650 v4 Servers]
29. 29
BILLIONS OF INTELLIGENT DEVICES
“Billions of intelligent devices will take advantage of deep learning to provide personalization and localization as GPUs become faster and faster over the next several years.” - Tractica
30. 30
SMART CITIES OF THE FUTURE
“Pittsburgh's ‘predictive policing’ program … police car laptops will display maps showing locations where crime is likely to occur, based on data-crunching algorithms developed by scientists at Carnegie Mellon University” - Science
32. 32
GPU-ACCELERATION HAS NO LIMITS
MapD
MapD is 55x to 1,000x faster than
comparable CPU databases on billion+
row datasets
Kinetica
Hardware costs that are 1⁄10 that of
standard in-memory databases
BlazeGraph
200-300x speed-up
Graphistry
See 100x more data at millisecond
speed
SQream
The supercomputing power of the GPU, combined with SQream’s patented technology, results in up to 100 times faster analytics performance on terabyte- to petabyte-scale data sets
33. 33
MASSIVE SCALE GPU ACCELERATED ANALYTICS
DEA theft of Silk Road bitcoins
SIEM attack escalation
Twitter botnet deconstruction