TAIPEI | SEP. 21-22, 2016
Robert Sheen
HPE APJeC Principal Solution Architect
Sep 21, 2016
A PLATFORM FOR ACCELERATING
MACHINE LEARNING APPLICATIONS
WHAT CONFUSION! ARTIFICIAL INTELLIGENCE …
MACHINE LEARNING … NEURAL NETWORKS … DEEP LEARNING
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
The (artificial) neuron.
Artificial Neural Networks
(ANNs) are inspired by
biological systems similar to
our brain
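[Figure: a single artificial neuron. The inputs $x_0, x_1, x_2, x_3, x_4$ are scaled by the weights $w^1_{1,0}$ through $w^1_{5,0}$, combined with the bias $b^1_0$ (bias = threshold), and passed through $f(z)$ to give the output $y_1$.]
$z^l_j = \sum_k w^l_{jk} x_k - b^l_j$
$a^l_j = f(z^l_j)$
(shown in the figure for $l = 1$, $j = 0$, i.e. $z^1_0$ and $a^1_0$)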
NNs are made up of neurons, which are a mathematical
approximation to biological neurons
Activation functions:
Hyperbolic tangent (output between -1 and +1): $a(z) = \tanh(z)$
ReLU: $a(z) = \max(0, z)$
Softplus: $a(z) = \ln(1 + e^{z})$
Logistic (sigma): $a(z) = \frac{1}{1 + e^{-z}}$
where $z = \sum_j w_j x_j - b$
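The four activation functions on this slide can be written down directly. The following is a minimal sketch in plain Python; the function names are chosen here for illustration and do not come from the slides or from any particular library.

```python
import math

def tanh_activation(z: float) -> float:
    """Hyperbolic tangent: output lies between -1 and +1."""
    return math.tanh(z)

def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def softplus(z: float) -> float:
    """Softplus: ln(1 + e^z), a smooth approximation of ReLU."""
    # Note: math.exp overflows for very large z; fine for a sketch.
    return math.log1p(math.exp(z))

def logistic(z: float) -> float:
    """Logistic (sigmoid): 1 / (1 + e^-z), output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Evaluate all four at a few sample points.
samples = {z: (tanh_activation(z), relu(z), softplus(z), logistic(z))
           for z in (-2.0, 0.0, 2.0)}
```

At z = 0 the logistic function returns 0.5 and softplus returns ln 2, which is a quick sanity check on any implementation.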
In a typical neuron the inputs ($x_k$) are multiplied by weights ($w^l_{jk}$) and then summed up ($\sum_k w^l_{jk} x_k$).
A non-linear activation function, $f(z)$, is applied to the summed and “thresholded” output ($z^l_j$).
This activation is the output of the neuron.
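The three steps just described (multiply by weights, sum and threshold, apply the activation) fit in a few lines. This is a sketch only: the weights, inputs and bias below are made-up illustrative values, and the bias is subtracted as a threshold, following the convention $z = \sum_j w_j x_j - b$ used on the activation-function slide.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, minus the bias (threshold), then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) - bias
    return activation(z)

# Five inputs, as in the neuron figure; all values are illustrative.
x = [1.0, 0.0, 1.0, 1.0, 0.0]
w = [0.4, 0.3, -0.2, 0.1, 0.9]
output = neuron(x, w, bias=0.3, activation=sigmoid)
```

Here the weighted sum equals the threshold (up to floating-point rounding), so the sigmoid output sits at its midpoint of about 0.5.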
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
To solve useful problems we have to connect multiple neurons together. The output from a neuron in one layer becomes
the input to neurons in the next layer.
Notice that the arrows go in one direction only. We will only be discussing “feed-forward” networks. There are others.
What is deep learning? It is essentially artificial neural networks consisting of many (>1) layers and a large number
of neurons (units). This is very computationally intensive and uses mathematical techniques typical of high
performance computing (matrix-matrix multiplies, vector operations, FFTs, convolutions) and requires HPC
hardware.
Training deep networks requires
high performance computing
hardware and techniques.
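A feed-forward pass can be sketched as a loop over layers, where each layer's outputs become the next layer's inputs. The layer sizes, weights and the choice of tanh below are assumptions made for illustration; real frameworks express the same computation as matrix-matrix multiplies on the GPU.

```python
import math

def layer_forward(inputs, weights, biases):
    """One layer: weights[j][k] connects input k to neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) - b
        outputs.append(math.tanh(z))
    return outputs

def feed_forward(inputs, layers):
    """Pass the activations through each layer in one direction only."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Illustrative network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]], [0.0, 0.5, -1.0]),
    ([[1.0, -1.0, 0.5]], [0.1]),
]
y = feed_forward([1.0, 2.0], layers)
```

The inner sums are exactly the matrix-vector (and, with batches, matrix-matrix) products that make deep networks an HPC workload.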
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
What do neural networks do?
They classify
- E.g., given an image: is it a bird? Is it a cat? Is it Stephen Fleischman?
- Given an audio signal: what are the words? What do they mean?
- This requires a training data set with inputs and their classes.
- This is supervised learning, which is what we will focus on.
They cluster
- Find groups of similar things.
- Does not require classified training sets.
- This is unsupervised learning.
- It is often used together with supervised learning.
MNIST handwriting recognition
data set for digits. Classify
each image as 0 .. 9.
• The most important networks that have solved the ImageNet challenge over the years are benchmarked.
• Some of them are:
– AlexNet (the original!)
– VGG_A
– OverFeat
– Inception V1 (and now Inception V3!) (from Google)
• The ImageNet dataset is a database of around 1.2 million annotated images.
• The challenge is to train the neural network using a subset of the database and then attempt to classify all the images in the dataset.
• The industry-standard metric is the number of images per second that we can train.
• Training time is the forward plus back-propagation time of the network.
• Every year various teams compete to classify the ImageNet dataset in the “ImageNet Large Scale Visual Recognition Challenge” (ILSVRC). The network that has the greatest accuracy wins.
Testing Performance
The ImageNet dataset and benchmark
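The images-per-second metric follows directly from the forward and back-propagation times. The sketch below shows the arithmetic; the batch size and timings are invented for illustration and are not measurements from this talk.

```python
def images_per_second(batch_size: int, forward_s: float, backward_s: float) -> float:
    """Training throughput: one batch costs a forward plus a backward pass."""
    return batch_size / (forward_s + backward_s)

def epoch_hours(dataset_size: int, throughput: float) -> float:
    """Time to pass once over the whole dataset at a given throughput."""
    return dataset_size / throughput / 3600.0

# Hypothetical timings for one batch of 128 images.
rate = images_per_second(batch_size=128, forward_s=0.08, backward_s=0.17)
hours = epoch_hours(1_200_000, rate)  # about 1.2 million ImageNet images
```

With these assumed numbers the rate is about 512 images per second, so one pass over roughly 1.2 million images takes about 0.65 hours.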
• The most important networks that have solved the ImageNet challenge over the years are benchmarked.
• The classification accuracy has been improving year on year, so much so that it is now better than human accuracy on this task!
Testing Performance
The ImageNet dataset and benchmark
[Chart; axis note: lower is better]
Computers have to be explicitly programmed
Analyze the problem to be solved.
Write the code in a programming language.
Deductive reasoning
Instruction and PC
Neural networks learn from examples
No requirement of an explicit description of the problem.
The neural computer adapts itself during a training period, based on examples of similar problems
Able to generalize or to handle incomplete data.
Inductive reasoning
Works well with “natural” data (like speech, image etc.)
How does a Neural Network work?
A quick introduction to (Deep) Neural Networks
Why is Deep Learning High Performance Computing?
DNNs are compute intensive: the training for a typical DNN application runs for weeks, even on modern hardware.
The work maps to BLAS functions such as SGEMM, and to finding max/min, matrix inversions, FFTs, etc.
It is easily mapped to accelerators, so these applications become a natural target for HPC platforms.
Analysis shows that about 80% of the time is spent in convolutions, which are basically SGEMM computations.
Recent developments in learning models have enhanced parallelism, with both data and model parallelism.
Recent advances in Nvidia libraries support multiple GPUs (1-8) in a single node.
Known to scale well in scale-out configurations too.
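One common way a convolution becomes a matrix multiply is the "im2col" layout: every image patch is unrolled into a row, and the filter into a column. The sketch below is an illustration in plain Python under that assumption; it is not the implementation used by any specific library, it uses the cross-correlation form (no kernel flip) that deep learning frameworks usually call convolution, and it uses double-precision Python floats rather than the single precision implied by SGEMM.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1) for j in range(w - k + 1)]

def matmul(a, b):
    """Plain matrix-matrix multiply (the GEMM step)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv2d_direct(image, kernel):
    """Reference: slide the kernel over the image directly."""
    k, h, w = len(kernel), len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)] for i in range(h - k + 1)]

image = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2], [0, 1, 2, 3]]
kernel = [[1, 0], [0, -1]]

patches = im2col(image, 2)                         # 9 patches x 4 values
kernel_col = [[v] for row in kernel for v in row]  # 4 x 1 column
gemm_out = [r[0] for r in matmul(patches, kernel_col)]
direct_out = [v for row in conv2d_direct(image, kernel) for v in row]
```

Both paths give the same nine output values; the GEMM path trades extra memory for the patch matrix against the ability to use a highly tuned matrix-multiply routine.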
A quick introduction to (Deep) Neural Networks
Challenges in training deep neural networks
– Slow convergence with millions of weights / parameters.
– Activations saturate or explode.
  – Depends on the function, but the result is that the weights going into that neuron stop training.
– Vanishing gradient problem.
  – A result of how we optimize the weights.
– Overfitting (or overtraining).
  – With so many parameters you can easily train to fit the training data but then be completely unable to generalize.
– Achieving scalability in training is crucial, but doing so on more than one GPU is a challenge.
For each of these challenges there are methods to ameliorate them. Which ones apply depends on the problem and on the choices that you make in the activation function, the cost function, the number of layers, the number of neurons, the types of layers, etc.
These are the hyper-parameters of the neural network model, and choosing them is currently 1) an art as much as a science, 2) an active area of research, and 3) a major factor in sizing the hardware for deep learning.
A quick introduction to (Deep) Neural Networks
Getting training to scale
– Model parallelism
  – Split the model (neural network) across GPUs and servers.
  – Parallelizes well on a single GPU.
  – Up to 8 GPUs currently, but some claims of better efficiency (Baidu).
  – Multiple servers are a problem.
– Data parallelism
  – Gather-scatter (SXM2)
    – Split the training set across processing units and gather the updates. Requires peer-to-peer communication.
  – Parameter servers (Master-Slave)
    – Traditional manager/worker parallelism. Use the CPU to gather and dispatch the data; it is not being used for much anyway. Need to store the entire model on the GPU, but no peer-to-peer communication is needed.
– Hyper-parameters
  – Figuring out the number of layers, number of neurons, and training momentum can be done in parallel.
– Consensus
  – Can have multiple neural networks training on the same data with different models and have them vote or otherwise combine their weights.
  – Potentially more suitable for clusters of servers.
– Inference: Run it in parallel if you replicate the model.
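The data-parallel idea (split the training set, compute updates per processing unit, gather them) can be sketched on a toy problem. Everything here is illustrative: the one-parameter model y ≈ w * x, the data, and the sequential loop standing in for GPUs that would really run concurrently.

```python
def gradient(w, batch):
    """Gradient of the mean squared error of y ≈ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, data, num_workers, lr):
    """One training step with the batch sharded across workers.

    Assumes len(data) is divisible by num_workers, so every shard has equal size.
    """
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    local_grads = [gradient(w, shard) for shard in shards]  # one per worker
    avg_grad = sum(local_grads) / num_workers               # gather the updates
    return w - lr * avg_grad                                # apply, then share new w

# Toy data generated from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
```

With equal-sized shards, the average of the per-worker gradients equals the gradient over the whole batch, so the sharded step matches the single-device step; the cost is the communication needed to gather the updates.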
A quick introduction to (Deep) Neural Networks
• Domain-specific embedded language with associated optimizing compiler and runtime
• Array programming language embedded in a state machine execution model
• Targets advanced analytics workloads on massively parallel distributed systems
• Design Goals
– Optimal deployment on parallel hardware
– Fast design iterations
– Enforce scalability
– Broad COTS hardware support
– Compatible with shared infrastructure
– High productivity for analysts and algorithm engineers
What is CogX?
CogX
Compute graph
[Figure: compute graph with a ColorMovie input. Nodes: movie(t), background(t), *0.999f, *0.001f, +, nextBackground(t) (which becomes background(t+1)), -, abs, reduceSum, suspicious(t).]
Opportunities for optimization
Compute graph
[Figure: the same compute graph (movie(t), background(t), nextBackground(t), suspicious(t), ColorMovie), with each operator (*0.001f, *0.999f, +, -, abs, reduceSum) marked as its own device kernel.]
Opportunities for optimization
Initially: 6 separate device kernels.
Compute graph
[Figure: the same compute graph, with the device-kernel boundaries redrawn after fusion.]
Opportunities for optimization
After a “single-output” kernel fuser pass: 2 device kernels remain.
Compute graph
[Figure: the same compute graph, enclosed in a single device-kernel boundary.]
Opportunities for optimization
After a “multi-output” kernel fuser pass: only a single device kernel remains.
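The effect of fusing the six operators of this graph can be imitated in plain Python. This is a sketch of the idea only, not CogX code: frames are flattened to lists of numbers, each list comprehension stands in for one device kernel that writes an intermediate buffer, and the reduction is simplified to a sum over the whole frame.

```python
def unfused(movie, background):
    """Six separate passes, each materializing its own intermediate buffer."""
    scaled_bg = [b * 0.999 for b in background]              # kernel 1: *0.999f
    scaled_mv = [m * 0.001 for m in movie]                   # kernel 2: *0.001f
    next_bg = [x + y for x, y in zip(scaled_bg, scaled_mv)]  # kernel 3: +
    diff = [b - m for b, m in zip(background, movie)]        # kernel 4: -
    magnitude = [abs(d) for d in diff]                       # kernel 5: abs
    suspicious = sum(magnitude)                              # kernel 6: reduce sum
    return next_bg, suspicious

def fused(movie, background):
    """One pass producing both outputs, with no intermediate buffers."""
    next_bg, suspicious = [], 0.0
    for m, b in zip(movie, background):
        next_bg.append(b * 0.999 + m * 0.001)
        suspicious += abs(b - m)
    return next_bg, suspicious

movie = [0.2, 0.8, 0.5]
background = [0.1, 0.9, 0.5]
result_unfused = unfused(movie, background)
result_fused = fused(movie, background)
```

Both versions compute the same values; the fused one reads each input once and never stores the four intermediate buffers, which is where the saving in memory traffic and kernel launches comes from.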
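[Figure: compiler pipeline. User CogX model (Scala) → parsing and OpenCL code generation → kernel circuit (kernels, field bufs) → optimizations, including kernel fusion → optimized kernel circuit (merged kernels).]
[Figure: before fusion, an OpenCL multiply kernel takes A and B and produces C, and an OpenCL add kernel takes C and D and produces E. After fusion, a single fused OpenCL multiply/add kernel takes A, B and D and produces E.]
CogX code snippet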
val A = ScalarField(10,10)
val B = ScalarField(10,10)
val C = A * B
val D = ScalarField(10,10)
val E = C + D
CogX compiler:
translating CogX to OpenCL with kernel fusion
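A kernel fuser pass can be pictured as a rewrite over a list of kernels. The toy pass below is an illustration written for this example and is not the CogX compiler's algorithm: the `Kernel` class and `fuse` function are hypothetical, expressions are plain strings, and a producer is inlined whenever its output feeds exactly one other kernel.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    output: str    # name of the field this kernel writes
    expr: str      # element-wise expression over field names
    inputs: tuple  # names of the fields this kernel reads

def fuse(kernels):
    """Repeatedly inline any kernel whose output has exactly one consumer."""
    kernels = list(kernels)
    while True:
        for producer in kernels:
            consumers = [k for k in kernels if producer.output in k.inputs]
            if len(consumers) == 1:
                break
        else:
            return kernels  # nothing left to fuse
        consumer = consumers[0]
        inputs = []
        for name in consumer.inputs:
            inputs.extend(producer.inputs if name == producer.output else (name,))
        merged = Kernel(consumer.output,
                        consumer.expr.replace(producer.output, f"({producer.expr})"),
                        tuple(dict.fromkeys(inputs)))  # de-duplicate, keep order
        kernels = [merged if k is consumer else k for k in kernels if k is not producer]

# The snippet above: C = A * B, then E = C + D.
unfused_kernels = [Kernel("C", "A * B", ("A", "B")), Kernel("E", "C + D", ("C", "D"))]
fused_kernels = fuse(unfused_kernels)
```

For the two-kernel example this leaves one kernel computing `(A * B) + D` from inputs A, B and D, so the intermediate field C never has to be written to memory.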
• Basic operators: +, -, *, /, %
• Logical operators: >, >=, <, <=, ===, !===
• Pointwise functions: cos, cosh, acos, sin, sinh, asin, tan, tanh, atan2, sq, sqrt, log, signum, pow, reciprocal, exp, abs, floor
• Comparison functions: max, min
• Shape manipulation: flip, shift, shiftCyclic, transpose, subfield, expand, select, stack, matrixRow, reshape, subfields, trim, vectorElement, vectorElements, transposeMatrices, transposeVectors, replicate, slice
• FFT/DCT: fft, fftInverse, fftRI, fftInverseRI, fftRows, fftInverseRows, fftColumns, fftInverseColumns, dct, dctInverse, dctTransposed, dctInverseTransposed
• Complex numbers: phase, magnitude, conjugate, realPart, imaginaryPart
• Convolution-like: crossCorrelate, crossCorrelateSeparable, convolve, convolveSeparable, projectFrame, backProjectFrame, crossCorrelateFilterAdjoint, convolveFilterAdjoint
• Gradient/divergence: backwardDivergence, backwardGradient, centralGradient, forwardGradient
• Linear algebra: dot, crossDot, reverseCrossDot
• Debugging: probe
• Type coercion: toScalarField, toVectorField, toMatrixField, toComplexField, toComplexVectorField, toColorField, toGenericComplexField
• Type construction: complex, polarComplex, vectorField, complexVectorField, matrixField, colorField
• Reductions: reduceSum, blockReduceSum, reduceMin, blockReduceMin, reduceMax, blockReduceMax, fieldReduceMax, fieldReduceMin, fieldReduceSum, fieldReduceMedian
• Normalizations: normalizeL1, normalizeL2
• Resampling: supersample, downsample, upsample
• Special operators: winnerTakeAll, random, solve, transform, warp, <==
CogX core functions and operators
• Computer Vision
– Annotation tools
– Color space transformations
– Polynomial dense optic flow
– Segmentation
• Solvers
– Boundary-gated nonlinear diffusion
– FISTA solver (with sub-variants)
– Golden section solver
– Incremental k-means implementation
– LSQR solver (with sub-variants)
– Poisson solver (with sub-variants)
• Filtering
– Contourlets
– 4 frequency-domain filters
– Mathematical morphology operators
– 27 space-domain filters (from a simple box filter up to local polynomial expansion and steerable Gabor filters)
– Steerable pyramid filter
– Wavelets
– Variants of whitening transforms
– Contrast normalization
– Domain transfer filter
– Gaussian pyramid
– Monogenic phase congruency
• Dynamical Systems
– Kalman filter
– Linear system modeling support
– CPU matrix pseudo-inverse
• Statistics
– Normal and uniform distributions
– Histograms
– Moment calculations
– Pseudo-random number generator sensors
CogX toolkit functions
[Figure: software stack]
Application
CogX debugger
CogX libraries/toolkit: Neural network toolkit, Sandbox toolkit, I/O toolkit
CogX core: CogX compiler and standard library, Scala CogX runtime, C++ CogX runtime
Cluster package
External libraries: HDF5 loader, JOCL, HDF5, OpenCL, Apache Mesos
Applications are written by users
– Introductory and training examples for single-GPU and distributed computation
– Performance benchmarks covering the core and neural network package
– Several larger-scale demo applications integrating multiple CogX functions
HPE Cognitive Computing Toolkit
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=S6772&searchItems=&sessionTopic=&sessionEvent=&sessionYear=&sessionFormat=&submit=&select=
SOME MACHINE LEARNING APPLICATIONS
Deep Learning Use Cases
The better-known, well-publicized implementations:
• Games
• Chat bots (Cortana, Suri, Jarvis, etc.)
• Intelligent Assistants (Siri, Alexa, etc.)
• Self-driving cars
... But what about “enterprise-class” use cases?
Finance: AI-assisted trading, beyond current algorithmic trading. Rise of “AI Hedge Funds”.
Medicine: Healthcare institutions use AI-assisted diagnosis and recommendations, and reduce human error.
E-Commerce: Agents and chatbots provide product recommendations and “interact” with potential shoppers.
Security: Beyond facial recognition, understand the “context” of danger and flag security threats.
AI in the Enterprise
Deep Learning and Neural Networks for the mainstream?
Social networking: Yann LeCun was hired by Facebook, Geoff Hinton by Google and Andrew Ng by Baidu. Sentiment analysis. Facial recognition. Understanding text.
Geospatial: Image recognition. Scene classification of high spatial resolution remote-sensing (HSR-RS) images (BoVWs).
Oil and Gas: Channel sands identification. Other seismic analysis.
AI in the Enterprise
Deep Learning and Neural Networks for the mainstream?
Self-driving cars
Deep neural networks are being used to understand the scene in self-driving cars!
The 4 Stage IoT Solutions Architecture:
The “Things”: primarily analog data sources (devices, machines, people, tools, cars, animals, clothes, toys, environment, buildings, etc.)
Stage 1: Sensors/Actuators (wired, wireless)
Stage 2: Internet Gateways, Data Acquisition Systems (data aggregation, A/D, measurement, control)
Stage 3: Edge IT (analytics, pre-processing)
Stage 4: Data Center / Cloud (analytics, management, archive)
[Figure labels: The Edge; Data Flow; Control Flow; Visualization; SW Stacks: Analytics, Management, Control (shown three times)]
• Enable workplace productivity
• Empower a data-driven organization
• Transform to a hybrid infrastructure
• Protect your digital enterprise
Automated Intelligence delivered by HPE Apollo 6500 and Deep Learning software solutions
Use Cases
• Video, Image, Text, Audio, time series pattern recognition
• Large, highly complex, unstructured simulation and modeling
• Real-time, near real-time analytics
Customer benefits
• Faster Model training time, better fusion of data*
* Benchmarking results provided at or shortly after announcement
HPE Apollo 6500 is an ideal HPC and Deep Learning platform providing unprecedented performance with 8 GPUs, high bandwidth
fabric and a configurable GPU topology to match deep learning workloads
– Up to 8 high powered GPUs per tray (node), 2P Intel E5-2600 v4 support
– Choice of high-speed, low latency fabrics with 2x IO expansion
– Workload optimized using flexible configuration capabilities
Deliver automated intelligence in real-time
Unprecedented performance and scale with HPE Apollo 6500 high density GPU solution
HPE Apollo platforms and solutions are optimized for HPC, IoT and Big Data
Platforms
• Apollo 8000: Supercomputing
• Apollo 6000: Rack Scale HPC
• Apollo 4000: Server Solutions Purpose Built for Big Data
• Apollo 2000: Enterprise Bridge to Scale-Out Compute
• Moonshot*: Optimized for Next Gen Workloads
Workload categories: HPC Workloads, Big Data Workloads, Next Gen Workloads
Solutions/ISVs
• Solution areas: Oil and gas, Life Sciences, Financial Services, Manufacturing CAD/CAE, Academia, Object Storage, Data Analytics, Video encoding, Mobile workplace, IoT
• Names shown: Schlumberger, Paradigm, Halliburton, Gaussian, BIOVIA, Redline, Synopsys, ANSYS, Custom Apps, Scality, Cleversafe, Ceph, Hortonworks, Hadoop, Cloudera
• Mellanox, NVIDIA, Seagate
HPE Software (i.e. Vertica, HPE Haven), HPE Enterprise Services
HP APOLLO 6000 POWER SHELF
Pooled Power Efficiency
• External pooled power shelf
• Fits up to 6 power supplies
• 2400W or 2650W power supplies
• Up to 15.9kW non-redundant
• Single or 3-phase AC input
• Up to twelve 12V DC cables
Dimensions: 1.5U (H) x 44.81 cm (W) x 78.44 cm (D), or 1.5U (H) x 17.64 in (W) x 30.88 in (D)
[Figure: front and back views, labeled 1.5U, 2.55 in, 17.64 in and 30.88 in]
New technologies, products
• HPE Apollo 6500
– Dense GPU server optimized for Deep Learning and HPC workloads
– Density optimization
– High performance fabrics
• Cluster Management Enhancements (Massive Scaling, Open APIs, tight Integration, multiple user interfaces)
• Deep Learning, HPC Software platform Enablement (HPE CCTK, Caffe, CUDA, Google TensorFlow, HPE IDOL)
Unique Solution differentiators
– GPU density
– Configurable GPU topologies
– More network bandwidth
– Power and cooling optimization
– Manageability
– Better productivity
HPE Apollo 6500 solution innovation
System Design Innovation to maximize GPU capacity and performance with lower TCO
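Apollo 2000 + NVIDIA GPU promotion
Limited-time, limited-quantity special bundles
The most powerful combination: HPE's best-density servers plus NVIDIA GPUs give you the most powerful combination.
A single 2U chassis can scale up to 2 HPE Apollo system servers and 4 NVIDIA high-performance computing accelerator cards.
Option 1: The top choice for enterprise virtualization
HPE Apollo 2000/XL190r 1 node + NVIDIA Tesla M60 *1
Apollo r2200 12LFF or r2600 24SFF
XL190r Gen9 specifications: E5-2640v4*2/ 16GB*2/ 1TB*1/ 800W/ 3yr Fndn Care 24*7 service, NVIDIA Tesla M60 Dual GPU*1
From NT$360,000 (excluding tax)
Option 2: The top choice for high-performance computing
HPE Apollo 2000/XL190r 1 node + NVIDIA Tesla K80 *1
Apollo r2200 12LFF or r2600 24SFF
XL190r Gen9 specifications: E5-2640v4*2/ 16GB*2/ 1TB*1/ 800W/ 3yr Fndn Care 24*7 service, NVIDIA Tesla K80 Dual GPU*1
From NT$360,000 (excluding tax)
※ Promotion ends 2016/12/31. If you are interested in the product, please call (02)2652-4040. This number can only be used within Taiwan.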
THANK YOU

More Related Content

What's hot

Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...Edge AI and Vision Alliance
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataIntel Nervana
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Intel® Software
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Intel® Software
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?Shinnosuke Furuya
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Intel® Software
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonIntel Nervana
 
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Intel Nervana
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
Urs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksUrs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksIntel Nervana
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnnNAVER D2
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...Edge AI and Vision Alliance
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...Hiroki Nakahara
 

What's hot (20)

Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio data
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in Boston
 
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Urs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksUrs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural Networks
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...
 

Viewers also liked

Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1Aeroprobing A.I. Drone with TX1
Aeroprobing A.I. Drone with TX1NVIDIA Taiwan
 
The Birth of Doraemon
The Birth of DoraemonThe Birth of Doraemon
The Birth of DoraemonNVIDIA Taiwan
 
Embedded and Reliable Computer Vision
Embedded and Reliable Computer VisionEmbedded and Reliable Computer Vision
Embedded and Reliable Computer VisionNVIDIA Taiwan
 
Medical Image Processing on NVIDIA TK1/TX1
Medical Image Processing on NVIDIA TK1/TX1Medical Image Processing on NVIDIA TK1/TX1
Medical Image Processing on NVIDIA TK1/TX1NVIDIA Taiwan
 
全面保護企業的關鍵智慧資產
全面保護企業的關鍵智慧資產全面保護企業的關鍵智慧資產
全面保護企業的關鍵智慧資產NVIDIA Taiwan
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA Taiwan
 
圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用NVIDIA Taiwan
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeNVIDIA Taiwan
 
How to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyHow to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyNVIDIA Taiwan
 
高效益、設計專利保護 如何達成雙贏?
高效益、設計專利保護 如何達成雙贏?高效益、設計專利保護 如何達成雙贏?
高效益、設計專利保護 如何達成雙贏?NVIDIA Taiwan
 
麗明營造 NVIDIA 使用成效分享
麗明營造 NVIDIA 使用成效分享麗明營造 NVIDIA 使用成效分享
麗明營造 NVIDIA 使用成效分享NVIDIA Taiwan
 
OpenPOWER Foundation Overview
OpenPOWER Foundation OverviewOpenPOWER Foundation Overview
OpenPOWER Foundation OverviewNVIDIA Taiwan
 
以深度學習加速語音及影像辨識應用發展
以深度學習加速語音及影像辨識應用發展以深度學習加速語音及影像辨識應用發展
以深度學習加速語音及影像辨識應用發展NVIDIA Taiwan
 
“樓下的房客”以數位特效技術 打造寫實近代台灣風格街景
“樓下的房客”以數位特效技術 打造寫實近代台灣風格街景“樓下的房客”以數位特效技術 打造寫實近代台灣風格街景
“樓下的房客”以數位特效技術 打造寫實近代台灣風格街景NVIDIA Taiwan
 
東海大學使用 NVIDIA Quadro & GRID 技術在教育雲端創新服務的經驗分享
 東海大學使用 NVIDIA Quadro & GRID 技術在教育雲端創新服務的經驗分享 東海大學使用 NVIDIA Quadro & GRID 技術在教育雲端創新服務的經驗分享
東海大學使用 NVIDIA Quadro & GRID 技術在教育雲端創新服務的經驗分享NVIDIA Taiwan
 
Artificial Intelligence: Predictions for 2017
Artificial Intelligence: Predictions for 2017Artificial Intelligence: Predictions for 2017
Artificial Intelligence: Predictions for 2017NVIDIA
 
Future of Making Things in Media & Entertainment FOMT - Design Visualisation ...
Future of Making Things in Media & Entertainment FOMT - Design Visualisation ...Future of Making Things in Media & Entertainment FOMT - Design Visualisation ...
Future of Making Things in Media & Entertainment FOMT - Design Visualisation ...NVIDIA Taiwan
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introductionHanibei
 
IEEE ITSS Nagoya Chapter NVIDIA
IEEE ITSS Nagoya Chapter NVIDIAIEEE ITSS Nagoya Chapter NVIDIA
IEEE ITSS Nagoya Chapter NVIDIATak Izaki
 
NVidia CUDA Tutorial - June 15, 2009
A Platform for Accelerating Machine Learning Applications

  • 1. A PLATFORM FOR ACCELERATING MACHINE LEARNING APPLICATIONS. Robert Sheen, HPE APJeC Principal Solution Architect. Taipei, Sep. 21-22, 2016 (presented Sep 21, 2016).
  • 2. WHAT CONFUSION! ARTIFICIAL INTELLIGENCE … MACHINE LEARNING … NEURAL NETWORKS … DEEP LEARNING
  • 3. A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS: the (artificial) neuron
      Artificial Neural Networks (ANNs) are inspired by biological systems similar to our brain. NNs are made up of neurons, which are a mathematical approximation to biological neurons.
      [Figure: a single neuron with inputs x0 to x4, weights w, bias b (bias = threshold), activation f(z) and output y1.]
      In a typical neuron the inputs ($x_k$) are multiplied by weights ($w^l_{jk}$) and then summed, and the bias (threshold) is subtracted: $z^l_j = \sum_k w^l_{jk} x_k - b^l_j$. A non-linear activation function $f$ is applied to this summed and "thresholded" value: $a^l_j = f(z^l_j)$. This activation is the output of the neuron.
      Common activation functions, where $z = \sum_j w_j x_j - b$:
      - Logistic (sigma): $a(z) = \frac{1}{1 + e^{-z}}$
      - Hyperbolic tangent: $a(z) = \tanh(z)$, ranging from -1 to +1
      - ReLU: $a(z) = \max(0, z)$
      - Softplus: $a(z) = \ln(1 + e^{z})$
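The neuron described on this slide can be sketched in a few lines of plain Scala. This is only an illustration, not code from the talk: the object name `NeuronSketch`, the helper names, and the numeric values are hypothetical, and the sign convention (subtracting the bias as a threshold) follows the slide's formula $z = \sum_j w_j x_j - b$.

```scala
object NeuronSketch {
  // The four activation functions listed on the slide.
  def logistic(z: Double): Double = 1.0 / (1.0 + math.exp(-z))
  def tanh(z: Double): Double = Math.tanh(z)
  def relu(z: Double): Double = math.max(0.0, z)
  def softplus(z: Double): Double = math.log(1.0 + math.exp(z))

  // z = sum_k w_k * x_k - b, using the slide's "bias = threshold" convention.
  def preActivation(w: Seq[Double], x: Seq[Double], b: Double): Double = {
    require(w.length == x.length, "weights and inputs must have the same length")
    w.zip(x).map { case (wk, xk) => wk * xk }.sum - b
  }

  // a = f(z): the output of the neuron.
  def neuron(w: Seq[Double], x: Seq[Double], b: Double, f: Double => Double): Double =
    f(preActivation(w, x, b))

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs, weights and bias, chosen only for illustration.
    val x = Seq(1.0, 0.5, -1.0, 2.0, 0.0)
    val w = Seq(0.2, -0.4, 0.1, 0.3, 0.7)
    val b = 0.1
    println(f"z        = ${preActivation(w, x, b)}%.4f")
    println(f"logistic = ${neuron(w, x, b, logistic)}%.4f")
    println(f"tanh     = ${neuron(w, x, b, tanh)}%.4f")
    println(f"relu     = ${neuron(w, x, b, relu)}%.4f")
    println(f"softplus = ${neuron(w, x, b, softplus)}%.4f")
  }
}
```

Passing the activation as a function argument keeps the weighted sum separate from the choice of non-linearity, which is how the slide presents them.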
  • 4. A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
      To solve useful problems we have to connect multiple neurons together. The output from a neuron in one layer becomes the input to neurons in the next layer.
      Notice that the arrows go in one direction only. We will only be discussing "feed-forward" networks. There are others.
      What is deep learning? It is essentially artificial neural networks consisting of many (>1) layers and a large number of neurons (units). This is very computationally intensive and uses mathematical techniques typical of high performance computing (matrix-matrix multiplies, vector operations, FFTs, convolutions).
      Training deep networks requires high performance computing hardware and techniques.
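A feed-forward pass, in which each layer's output becomes the next layer's input, might look like the following sketch. The names `FeedForwardSketch`, `Layer`, `forward` and `run`, and the tiny network in `main`, are assumptions made for illustration; the bias is subtracted to stay consistent with the neuron formula on the previous slide.

```scala
object FeedForwardSketch {
  type Vec = Vector[Double]
  type Mat = Vector[Vec] // one row of weights per neuron in the layer

  final case class Layer(weights: Mat, biases: Vec, activation: Double => Double)

  def relu(z: Double): Double = math.max(0.0, z)

  // One layer: a matrix-vector product, minus the bias, followed by the activation.
  def forward(layer: Layer, input: Vec): Vec =
    layer.weights.zip(layer.biases).map { case (row, b) =>
      require(row.length == input.length, "weight row and input sizes differ")
      layer.activation(row.zip(input).map { case (w, x) => w * x }.sum - b)
    }

  // Feed-forward: the output of each layer is the input of the next.
  def run(network: Seq[Layer], input: Vec): Vec =
    network.foldLeft(input)((activations, layer) => forward(layer, activations))

  def main(args: Array[String]): Unit = {
    // Hypothetical network: 3 inputs, 2 hidden units, 1 linear output.
    val hidden = Layer(Vector(Vector(1.0, 0.0, -1.0), Vector(0.5, 0.5, 0.5)), Vector(0.0, 0.25), relu)
    val output = Layer(Vector(Vector(1.0, 1.0)), Vector(0.0), (z: Double) => z)
    println(run(Seq(hidden, output), Vector(2.0, 1.0, 0.5)))
  }
}
```

With a batch of inputs instead of a single vector, each layer becomes a matrix-matrix multiply, which is where the HPC-style workload mentioned on the slide comes from.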
  • 5. A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS: what do neural networks do?
      They classify:
      - E.g., given an image, is it a bird? Is it a cat? Is it Stephen Fleischman?
      - Given an audio signal, what are the words? What do they mean?
      - This requires a training data set with inputs and their classes.
      - This is supervised learning and what we will focus on.
      They cluster:
      - Find groups of similar things.
      - Does not require classified training sets.
      - This is unsupervised learning.
      - It is often used together with supervised learning.
      [Figure: MNIST handwriting recognition data set for digits. Classify each image as 0 .. 9.]
  • 6. Testing Performance: the ImageNet dataset and benchmark
      - The most important networks that solved the ImageNet challenge over the years are benchmarked. Some of them are:
          - AlexNet (the original!)
          - VGG_A
          - OverFeat
          - Inception V1 (and now Inception V3!), from Google
      - The ImageNet dataset is a database of around 1.2 million annotated images.
      - The challenge is to train the neural network using a subset of the database and then attempt to classify all the images in the dataset.
      - The industry standard metric is the number of images per second that we can train.
      - Training time is the forward plus back propagation time of the network.
      - Every year various teams compete to classify the ImageNet dataset in the "ImageNet Large Scale Visual Recognition Challenge" (ILSVRC). The network that has the greatest accuracy wins.
  • 7. Testing Performance: the ImageNet dataset and benchmark
      - The most important networks that solved the ImageNet challenge over the years are benchmarked.
      - The classification accuracy has been improving year on year, so much so that it is now better than that of humans!
      [Figure: chart of ImageNet results over the years, labeled "Lower is better".]
  • 8. A quick introduction to (Deep) Neural Networks: how does a neural network work?
      Computers have to be explicitly programmed:
      - Analyze the problem to be solved.
      - Write the code in a programming language.
      - Deductive reasoning.
      - Instruction and PC.
      Neural networks learn from examples:
      - No requirement of an explicit description of the problem.
      - The neural computer adapts itself during a training period, based on examples of similar problems.
      - Able to generalize or to handle incomplete data.
      - Inductive reasoning.
      - Works well with "natural" data (like speech, images etc.).
  • 9. A quick introduction to (Deep) Neural Networks: why is deep learning high performance computing?
      - DNNs are compute intensive, and the training for a typical DNN application runs for weeks even on modern hardware.
      - The computation maps to BLAS functions like SGEMM, and to finding max/min, matrix inversions, FFTs etc.
      - It is easily mapped to accelerators, so these applications become a natural target for HPC platforms.
      - Analysis shows that about 80% of the time is spent in convolutions, which are basically SGEMM computations.
      - Recent developments in learning models have enhanced parallelism with both data and model parallelism.
      - Recent advances in Nvidia libraries support multiple GPUs (1-8) in a single node.
      - Known to scale well with scale-out configurations too.
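To illustrate why convolutions are "basically SGEMM computations", the sketch below lowers a 1-D convolution to a matrix multiply by copying each sliding window into a matrix row (the approach commonly called im2col). The talk does not say how any particular library does this; the object `ConvAsGemmSketch`, its helpers and the sample data are hypothetical, and the hand-written `gemm` merely stands in for a BLAS SGEMM call.

```scala
object ConvAsGemmSketch {
  // Direct 1-D "valid" convolution in the deep-learning sense (no kernel flip).
  def convDirect(signal: Array[Float], kernel: Array[Float]): Array[Float] =
    Array.tabulate(signal.length - kernel.length + 1) { i =>
      var acc = 0.0f
      for (k <- kernel.indices) acc += signal(i + k) * kernel(k)
      acc
    }

  // "im2col": copy each sliding window of the signal into one row of a matrix.
  def im2col(signal: Array[Float], window: Int): Array[Array[Float]] =
    Array.tabulate(signal.length - window + 1)(i => signal.slice(i, i + window))

  // Plain matrix-matrix multiply, standing in for a BLAS SGEMM call.
  def gemm(a: Array[Array[Float]], b: Array[Array[Float]]): Array[Array[Float]] = {
    val inner = b.length
    val cols = b(0).length
    a.map { row =>
      Array.tabulate(cols) { j =>
        var acc = 0.0f
        for (k <- 0 until inner) acc += row(k) * b(k)(j)
        acc
      }
    }
  }

  // All filters at once: (windows x taps) times (taps x filters).
  def convViaGemm(signal: Array[Float], filters: Array[Array[Float]]): Array[Array[Float]] = {
    val taps = filters(0).length
    val filterMatrix = Array.tabulate(taps, filters.length)((k, f) => filters(f)(k))
    gemm(im2col(signal, taps), filterMatrix)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical signal and two 3-tap filters.
    val signal = Array(1f, 2f, 3f, 4f, 5f)
    val filters = Array(Array(1f, 0f, -1f), Array(0.5f, 0.5f, 0.5f))
    val out = convViaGemm(signal, filters)
    println(out.map(_.mkString(", ")).mkString("\n")) // one row per window, one column per filter
    println(convDirect(signal, filters(0)).mkString(", "))
  }
}
```

The window matrix duplicates input data, which is the price paid for turning many small dot products into one large, accelerator-friendly matrix multiply.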
  • 10. A quick introduction to (Deep) Neural Networks: challenges in training deep neural networks
      - Slow convergence with millions of weights / parameters.
      - Activations saturate or explode. The details depend on the function, but the result is that weights going into that neuron stop training.
      - Vanishing gradient problem. This is a result of how we optimize the weights.
      - Overfitting (or overtraining): with so many parameters you can easily train to fit the training data but then be completely unable to generalize.
      - Achieving scalability in training is crucial, but doing so on more than one GPU is a challenge.
      For each of these challenges there are methods to ameliorate them. Which ones apply depends on the problem and on the choices that you make in the activation function, the cost function, the number of layers, the number of neurons, the types of layers etc. These are the hyper-parameters of the neural network model, and choosing them is currently 1) an art as much as a science, 2) an active area of research, and 3) a major factor in sizing the hardware for deep learning.
  • 11. A quick introduction to (Deep) Neural Networks: getting training to scale
      - Model parallelism: split the model (neural network) across GPUs and servers.
          - Parallelizes well on a single GPU.
          - Up to 8 GPUs currently, but there are some claims of better efficiency (Baidu).
          - Multiple servers are a problem.
      - Data parallelism:
          - Gather scatter (SXM2): split the training set across processing units and gather the updates. Requires peer to peer communication.
          - Parameter servers (Master-Slave): traditional manager/worker parallelism. Use the CPU to gather and dispatch the data; it is not being used for much anyway. Need to store the entire model on the GPU, but no peer to peer communication is needed.
      - Hyper-parameters: figuring out the number of layers, number of neurons and training momentum can be done in parallel.
      - Consensus: can have multiple neural networks training on the same data with different models and have them vote or otherwise combine their weights. Potentially more suitable for clusters of servers.
      - Inference: run it in parallel if you replicate the model.
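The data-parallel scheme on this slide (split the training set, compute updates per processing unit, gather them) can be sketched with a one-parameter linear model. Everything here is an assumption made for illustration: the object `DataParallelSketch`, the model $y = w x$ with a squared-error loss, and the sequential `map` that stands in for work that real hardware would run on separate GPUs.

```scala
object DataParallelSketch {
  final case class Sample(x: Double, y: Double)

  // Sum of d/dw (w*x - y)^2 over one shard: the work a single GPU would do.
  def shardGradientSum(w: Double, shard: Seq[Sample]): Double =
    shard.map(s => 2.0 * (w * s.x - s.y) * s.x).sum

  // Scatter the batch, compute per-shard gradients, gather, and apply one update.
  def dataParallelStep(w: Double, batch: Seq[Sample], workers: Int, lr: Double): Double = {
    val shardSize = math.max(1, math.ceil(batch.length.toDouble / workers).toInt)
    val shards = batch.grouped(shardSize).toSeq        // scatter
    val partials = shards.map(shardGradientSum(w, _))  // would run concurrently on real hardware
    val gradient = partials.sum / batch.length         // gather and average
    w - lr * gradient
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data drawn from y = 3x.
    val batch = (1 to 8).map(i => Sample(i.toDouble, 3.0 * i))
    var w = 0.0
    for (_ <- 1 to 50) w = dataParallelStep(w, batch, workers = 4, lr = 0.01)
    println(f"learned w = $w%.4f")
  }
}
```

Because the gathered gradient is the average over the whole batch, the update matches single-device training up to floating-point rounding; what changes is how much data each device processes and how much communication the gather step costs.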
  • 12. CogX: what is CogX?
      - Domain-specific embedded language with associated optimizing compiler and runtime
      - Array programming language embedded in a state machine execution model
      - Targets advanced analytics workloads on massively parallel distributed systems
      - Design goals:
          - Optimal deployment on parallel hardware
          - Fast design iterations
          - Enforce scalability
          - Broad COTS hardware support
          - Compatible with shared infrastructure
          - High productivity for analysts and algorithm engineers
  • 13. 13 Compute graph moviet backgroundt +*0.999f *0.001f nextBackgroundt backgroundt+1 - abs reduce Sum suspicioust ColorMovie Opportunities for optimization
  • 14. 14 Compute graph moviet backgroundt nextBackgroundt backgroundt+1 suspicioust ColorMovie *0.001f *0.999f + - Abs reduce Sum device kernel Opportunities for optimization Initially: 6 separate devie kernels.
  • 15. 15 Compute graph moviet backgroundt +*0.999f *0.001f nextBackgroundt backgroundt+1 - abs reduce Sum suspicioust ColorMovie device kernel Opportunities for optimization After a “single-output” kernel fuser pass: 2 device kernels remain.
  • 16. Opportunities for optimization
    [Figure: compute graph with device kernels outlined. Nodes: ColorMovie, movie_t, background_t, *0.999f, *0.001f, +, nextBackground_t, background_t+1, -, abs, reduceSum, suspicious_t.]
    After a “multi-output” kernel fuser pass: only a single device kernel remains.
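The effect of the fuser passes can be illustrated with a plain-Python sketch, not CogX code. It assumes one plausible wiring of the graph: the next background is `0.999 * background + 0.001 * movie`, and `suspicious` is the sum of `abs(background - movie)`. The transcript does not preserve the edges, so that wiring and the function names `unfused` and `fused` are assumptions.

```python
# Unfused: six passes, one per operator, each materializing a temporary buffer
# (analogous to six separate device kernels).
def unfused(movie, background):
    a = [0.999 * b for b in background]                # kernel 1: * 0.999f
    c = [0.001 * m for m in movie]                     # kernel 2: * 0.001f
    next_background = [x + y for x, y in zip(a, c)]    # kernel 3: +
    d = [b - m for b, m in zip(background, movie)]     # kernel 4: -
    e = [abs(v) for v in d]                            # kernel 5: abs
    suspicious = sum(e)                                # kernel 6: reduceSum
    return next_background, suspicious

# Fused: one pass over the data producing both outputs, with no temporaries
# (analogous to a single multi-output device kernel).
def fused(movie, background):
    next_background = []
    suspicious = 0.0
    for m, b in zip(movie, background):
        next_background.append(0.999 * b + 0.001 * m)
        suspicious += abs(b - m)
    return next_background, suspicious

movie = [0.0, 1.0, 2.0, 4.0]
background = [1.0, 1.0, 1.0, 1.0]
print(unfused(movie, background))
print(fused(movie, background))
```

Both functions compute the same values; the fused version reads each input once and never writes intermediate buffers, which is the memory-traffic saving that kernel fusion targets.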
  • 17. CogX compiler: translating CogX to OpenCL with kernel fusion
    – Pipeline: user CogX model (Scala) → parsing and OpenCL code generation → kernel circuit (kernels, field bufs) → optimizations, including kernel fusion → optimized kernel circuit (merged kernels)
    – CogX code snippet:
        val A = ScalarField(10,10)
        val B = ScalarField(10,10)
        val C = A * B
        val D = ScalarField(10,10)
        val E = C + D
    – [Figure: before fusion, an OpenCL multiply kernel (inputs A, B; output C) feeds an OpenCL add kernel (inputs C, D; output E). After fusion, a single fused OpenCL multiply/add kernel takes A, B, D and produces E.]
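A toy version of such a fusion pass, written in plain Python rather than Scala or OpenCL, might look like the sketch below. It is not the CogX compiler: the `Field` class, `fuse`, and `run_fused` are hypothetical names, only pointwise `*` and `+` are handled, and the "fused kernel" is simply a Python closure evaluated once per element.

```python
class Field:
    """A leaf input or a pointwise operation over two fields (toy stand-in)."""
    def __init__(self, name, op=None, inputs=()):
        self.name, self.op, self.inputs = name, op, inputs

    def __mul__(self, other):
        return Field(f"({self.name} * {other.name})", "*", (self, other))

    def __add__(self, other):
        return Field(f"({self.name} + {other.name})", "+", (self, other))

def fuse(node):
    """Collapse a tree of pointwise ops into (leaf names, per-element function)."""
    if node.op is None:
        return [node.name], lambda env: env[node.name]
    (left_names, left_fn), (right_names, right_fn) = (fuse(i) for i in node.inputs)
    leaves = left_names + [n for n in right_names if n not in left_names]
    if node.op == "*":
        return leaves, lambda env: left_fn(env) * right_fn(env)
    return leaves, lambda env: left_fn(env) + right_fn(env)

def run_fused(node, **fields):
    """Run the single fused function once per element; no buffer is made for C."""
    leaves, kernel = fuse(node)
    n = len(fields[leaves[0]])
    return [kernel({name: fields[name][i] for name in leaves}) for i in range(n)]

# Mirrors the slide's snippet: C = A * B, E = C + D.
A, B, D = Field("A"), Field("B"), Field("D")
E = A * B + D
result = run_fused(E, A=[1.0, 2.0], B=[3.0, 4.0], D=[0.5, 0.5])
print(E.name, result)
```

The intermediate field `C` exists only as a node in the expression tree; after fusion it is never stored, which corresponds to the merged multiply/add kernel in the slide's diagram.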
  • 18. CogX core functions and operators
    • Basic operators: +, -, *, /, %
    • Logical operators: >, >=, <, <=, ===, !===
    • Pointwise functions: cos, cosh, acos, sin, sinh, asin, tan, tanh, atan2, sq, sqrt, log, signum, pow, reciprocal, exp, abs, floor
    • Comparison functions: max, min
    • Shape manipulation: flip, shift, shiftCyclic, transpose, subfield, expand, select, stack, matrixRow, reshape, subfields, trim, vectorElement, vectorElements, transposeMatrices, transposeVectors, replicate, slice
    • FFT/DCT: fft, fftInverse, fftRI, fftInverseRI, fftRows, fftInverseRows, fftColumns, fftInverseColumns, dct, dctInverse, dctTransposed, dctInverseTransposed
    • Complex numbers: phase, magnitude, conjugate, realPart, imaginaryPart
    • Convolution-like: crossCorrelate, crossCorrelateSeparable, convolve, convolveSeparable, projectFrame, backProjectFrame, crossCorrelateFilterAdjoint, convolveFilterAdjoint
    • Gradient/divergence: backwardDivergence, backwardGradient, centralGradient, forwardGradient
    • Linear algebra: dot, crossDot, reverseCrossDot
    • Debugging: probe
    • Type coercion: toScalarField, toVectorField, toMatrixField, toComplexField, toComplexVectorField, toColorField, toGenericComplexField
    • Type construction: complex, polarComplex, vectorField, complexVectorField, matrixField, colorField
    • Reductions: reduceSum, blockReduceSum, reduceMin, blockReduceMin, reduceMax, blockReduceMax, fieldReduceMax, fieldReduceMin, fieldReduceSum, fieldReduceMedian
    • Normalizations: normalizeL1, normalizeL2
    • Resampling: supersample, downsample, upsample
    • Special operators: winnerTakeAll, random, solve, transform, warp, <==
  • 19. CogX toolkit functions
    • Computer Vision
      – Annotation tools
      – Color space transformations
      – Polynomial dense optic flow
      – Segmentation
    • Solvers
      – Boundary-gated nonlinear diffusion
      – FISTA solver (with sub-variants)
      – Golden section solver
      – Incremental k-means implementation
      – LSQR solver (with sub-variants)
      – Poisson solver (with sub-variants)
    • Filtering
      – Contourlets
      – 4 frequency-domain filters
      – Mathematical morphology operators
      – 27 space-domain filters (from a simple box filter up to local polynomial expansion and steerable Gabor filters)
      – Steerable pyramid filter
      – Wavelets
      – Variants of whitening transforms
      – Contrast normalization
      – Domain transfer filter
      – Gaussian pyramid
      – Monogenic phase congruency
    • Dynamical Systems
      – Kalman filter
      – Linear system modeling support
      – CPU matrix pseudo-inverse
    • Statistics
      – Normal and uniform distributions
      – Histograms
      – Moment calculations
      – Pseudo-random number generator sensors
  • 20. HPE Cognitive Computing Toolkit
    [Figure: software stack. Components: Application; CogX debugger; CogX compiler and standard library; Neural network toolkit; Sandbox toolkit; I/O toolkit; Scala CogX runtime; C++ CogX runtime; HDF5 loader; Cluster package; JOCL; OpenCL; HDF5; Apache Mesos. Legend: CogX core, External libraries, CogX libraries/toolkit.]
    Applications are written by users
    – Introductory and training examples for single-GPU and distributed computation
    – Performance benchmarks covering the core and neural network package
    – Several larger-scale demo applications integrating multiple CogX functions
    http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=S6772&searchItems=&sessionTopic=&sessionEvent=&sessionYear=&sessionFormat=&submit=&select=
  • 21. SOME MACHINE LEARNING APPLICATIONS
  • 22. Deep Learning Use Cases
    The better-known, well-publicized implementations:
    – Games
    – Chat bots (Cortana, Suri, Jarvis, etc.)
    – Intelligent Assistants (Siri, Alexa, etc.)
    – Self-driving cars
    ...But what about “enterprise-class” use cases?
  • 23. AI in the Enterprise: Deep Learning and Neural Networks for the mainstream?
    – Finance: AI-assisted trading, beyond current algorithmic trading. Rise of “AI Hedge Funds”.
    – Medicine: Healthcare institutions use AI-assisted diagnosis and recommendations to reduce human error.
    – E-Commerce: Agents and chatbots provide product recommendations and “interact” with potential shoppers.
    – Security: Beyond facial recognition, understand the “context” of danger and flag security threats.
  • 24. AI in the Enterprise: Deep Learning and Neural Networks for the mainstream?
    – Social networking: Yann LeCun was hired by Facebook, Geoff Hinton by Google and Andrew Ng by Baidu. Sentiment analysis. Facial recognition. Understanding text. Image recognition.
    – Geospatial: Scene classification of high spatial resolution remote-sensing (HSR-RS) images (BoVWs).
    – Oil and Gas: Channel sands identification. Other seismic analysis.
  • 25. Self-driving cars: Deep neural networks are being used to understand the scene in self-driving cars!
  • 26. The 4 Stage IoT Solutions Architecture
    – The “Things”: primarily analog data sources (devices, machines, people, tools, cars, animals, clothes, toys, environment, buildings, etc.)
    – Stage 1: Sensors/Actuators (wired, wireless)
    – Stage 2: Internet Gateways, Data Acquisition Systems (data aggregation, A/D, measurement, control)
    – Stage 3: Edge IT (analytics, pre-processing)
    – Stage 4: Data Center / Cloud (analytics, management, archive)
    – Diagram labels: The Edge; Visualization; Data Flow; Control Flow
    – SW stacks: Analytics, Management, Control
  • 27. Deliver automated intelligence in real-time: Unprecedented performance and scale with HPE Apollo 6500 high density GPU solution
    – Enable workplace productivity; Empower a data-driven organization; Transform to a hybrid infrastructure; Protect your digital enterprise
    – Automated Intelligence delivered by HPE Apollo 6500 and Deep Learning software
    – Use cases
      – Video, image, text, audio, time series pattern recognition solutions
      – Large, highly complex, unstructured simulation and modeling
      – Real-time, near real-time analytics
    – Customer benefits: Faster model training time, better fusion of data*
    – HPE Apollo 6500 is an ideal HPC and Deep Learning platform providing unprecedented performance with 8 GPUs, high bandwidth fabric and a configurable GPU topology to match deep learning workloads
      – Up to 8 high powered GPUs per tray (node), 2P Intel E5-2600 v4 support
      – Choice of high-speed, low latency fabrics with 2x IO expansion
      – Workload optimized using flexible configuration capabilities
    * Benchmarking results provided at or shortly after announcement
  • 28. HPE Apollo platforms and solutions are optimized for HPC, IoT and Big Data
    – Platforms
      – Apollo 8000: Supercomputing
      – Apollo 6000: Rack Scale HPC
      – Apollo 4000: Server Solutions Purpose Built for Big Data
      – Apollo 2000: Enterprise Bridge to Scale-Out Compute
      – Moonshot*: Optimized for Next Gen Workloads
    – Workload classes: HPC Workloads, Big Data Workloads, Next Gen Workloads
    – Workloads and industries: Video encoding, Mobile workplace, IoT, Oil and gas, Life Sciences, Financial Services, Manufacturing CAD/CAE, Academia, Object Storage, Data Analytics
    – Solutions/ISVs: Scality, Cleversafe, Ceph, Hortonworks, Hadoop, Cloudera, Schlumberger, Paradigm, Halliburton, Gaussian, BIOVIA, Redline, Synopsys, ANSYS, Custom Apps
    – Partners: Mellanox, NVIDIA, Seagate
    – HPE Software (e.g. Vertica, HPE Haven), HPE Enterprise Services
  • 29. HP APOLLO 6000 POWER SHELF
    Pooled Power Efficiency
    • External pooled power shelf
    • Fits up to 6 power supplies
    • 2400W or 2650W power supplies
    • Up to 15.9kW non-redundant (6 × 2650W)
    • Single- or 3-phase AC input
    • Up to twelve 12V DC cables
    Dimensions (front and back views): 1.5U (H) x 17.64 in (W) x 30.88 in (D), or 1.5U (H) x 44.81cm (W) x 78.44cm (D); height marked as 2.55”.
  • 30. HPE Apollo 6500 solution innovation: System Design Innovation to maximize GPU capacity and performance with lower TCO
    – New technologies, products
      – HPE Apollo 6500: dense GPU server optimized for Deep Learning and HPC workloads; density optimization; high performance fabrics
      – Cluster Management Enhancements (massive scaling, open APIs, tight integration, multiple user interfaces)
      – Deep Learning, HPC software platform enablement (HPE CCTK, Caffe, CUDA, Google TensorFlow, HPE IDOL)
    – Unique solution differentiators
      – GPU density
      – Configurable GPU topologies
      – More network bandwidth
      – Power and cooling optimization
      – Manageability
      – Better productivity
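  • 31. Apollo 2000 + NVIDIA GPU promotion: limited-time, limited-quantity bundles
    – The strongest combination: HPE's best-density servers plus NVIDIA GPUs give you the most powerful combination. A single 2U chassis can scale up to 2 HPE Apollo system servers and 4 NVIDIA high-performance computing accelerator cards.
    – Option 1: top choice for enterprise virtualization. From NT$360,000 (before tax).
      – HPE Apollo 2000/XL190r, 1 node + NVIDIA Tesla M60 x1
      – Apollo r2200 12LFF or r2600 24SFF
      – XL190r Gen9 specification: E5-2640v4 x2 / 16GB x2 / 1TB x1 / 800W / 3yr Fndn Care 24*7 service
      – NVIDIA Tesla M60 Dual GPU x1
    – Option 2: top choice for high-performance computing. From NT$360,000 (before tax).
      – HPE Apollo 2000/XL190r, 1 node + NVIDIA Tesla K80 x1
      – Apollo r2200 12LFF or r2600 24SFF
      – XL190r Gen9 specification: E5-2640v4 x2 / 16GB x2 / 1TB x1 / 800W / 3yr Fndn Care 24*7 service
      – NVIDIA Tesla K80 Dual GPU x1
    ※ Promotion ends: 2016/12/31. If you are interested in the products, please call (02)2652-4040. This number is for use in Taiwan only.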
  • 32. TAIPEI | SEP. 21-22, 2016 THANK YOU