Robert Sheen from HPE gave a presentation on machine learning applications and accelerating deep learning. He provided a quick introduction to neural networks, discussing their structure and how they are inspired by biological neurons. Deep learning requires high performance computing due to its computational intensity during training. Popular deep learning frameworks like CogX were also discussed, which provide tools and libraries to help build and optimize neural networks. Finally, several enterprise use cases for machine learning and deep learning were highlighted, such as in finance, healthcare, security, and geospatial applications.
3. 3
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
The (artificial) neuron.
Artificial Neural Networks (ANNs) are inspired by biological systems similar to our brain.
[Figure: a single neuron. Inputs x0 ... x4, weights $w_{1,0}^{1}$ ... $w_{5,0}^{1}$, bias (= threshold) $b_{0}^{1}$, activation function f(z), output y1]
$z_k^l = \sum_j w_{jk}^l x_j - b_k^l$, $a_k^l = f(z_k^l)$
NNs are made up of neurons, which are a mathematical
approximation to biological neurons
Activation functions:
Hyperbolic tangent (range -1 to +1): $a(z) = \tanh(z)$
ReLU: $a(z) = \max(0, z)$
Softplus: $a(z) = \ln(1 + e^{z})$
Logistic (sigma): $a(z) = \frac{1}{1 + e^{-z}}$
where $z = \sum_j w_j x_j - b$
In a typical neuron the inputs ($x_j$) are multiplied by weights ($w_{jk}^l$) and then summed up ($\sum_j w_{jk}^l x_j$).
A non-linear activation function, $f(z)$, is applied to the summed and “thresholded” output ($z_k^l$).
This activation is the output of the neuron.
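As a minimal sketch of the neuron described above, the following plain Python computes the weighted sum, subtracts the bias (threshold), and applies each of the activation functions listed on this slide. The function names and the input, weight, and bias values are assumptions chosen for illustration; they do not come from the presentation.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def softplus(z):
    return math.log(1.0 + math.exp(z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, minus the bias (threshold), then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) - bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0, -1.0]
w = [0.5, 0.25, 0.5]
b = 0.5

for name, f in [("tanh", math.tanh), ("ReLU", relu),
                ("softplus", softplus), ("logistic", logistic)]:
    print(name, neuron(x, w, b, f))
```

With these values the thresholded sum is exactly zero, so the printed outputs show each activation function evaluated at z = 0.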
4. 4
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
To solve useful problems we have to connect multiple neurons together. The output from a neuron in one layer becomes
the input to neurons in the next layer.
Notice that the arrows go in one direction only. We will only be discussing “feed-forward” networks. There are others.
What is deep learning? It is essentially artificial neural networks consisting of many (>1) layers and a large number
of neurons (units). This is very computationally intensive and uses mathematical techniques typical of high
performance computing (matrix-matrix multiplies, vector operations, FFTs, convolutions) and requires HPC
hardware.
Training deep networks requires
high performance computing
hardware and techniques.
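To make the link between layered networks and matrix operations concrete, here is a small sketch of a feed-forward pass in plain Python. Each layer is a matrix-vector product followed by an activation, and with a batch of inputs the same step becomes a matrix-matrix multiply. The network shape, the weights, and the helper names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def layer(W, b, x, activation):
    """One fully connected layer: activation(W x - b), applied per neuron."""
    return [activation(z - bk) for z, bk in zip(matvec(W, x), b)]

def feed_forward(layers, x):
    """Pass x through each (W, b) pair in order; outputs feed the next layer."""
    for W, b in layers:
        x = layer(W, b, x, math.tanh)
    return x

# Hypothetical 3-2-1 network with made-up weights.
network = [
    ([[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]], [0.0, 0.1]),  # 3 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                               # 2 inputs -> 1 neuron
]
y = feed_forward(network, [1.0, 2.0, 3.0])
print(y)
```

Data only flows from one layer to the next, which is what makes this a feed-forward network.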
5. 5
A QUICK INTRODUCTION TO (DEEP) NEURAL NETWORKS
What do neural networks do?
They classify
- E.g., given an image, is it a bird? Is it a cat? Is it Stephen Fleischman?
- Given an audio signal, what are the words? What do they mean?
- This requires a training data set with inputs and their classes.
- This is supervised learning and what we will focus on.
They cluster
- Find groups of similar things.
- Does not require classified training sets.
- This is unsupervised learning.
- It is often used together with supervised learning.
MNIST handwriting recognition
data set for digits. Classify
each image as 0 .. 9.
6. 6
The most important networks that solve the ImageNet
challenge over the years are benchmarked.
Some of them are:
AlexNet (the original!)
VGG_A
OverFeat
Inception V1 (and now Inception V3!) (from Google)
The ImageNet dataset is a database of around 1.2 million annotated images.
The challenge is to train the neural network using a subset of the database and then attempt to classify all
the images in the dataset.
The industry-standard metric is the number of images per second that can be trained.
Training time is the forward propagation time plus the back propagation time of the network.
Every year various teams compete to classify the ImageNet dataset in the “ImageNet Large Scale Visual
Recognition Challenge” (ILSVRC). The network that has the greatest accuracy wins.
Testing Performance
The ImageNet dataset and benchmark
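The images-per-second metric follows directly from the forward and back propagation times. The sketch below shows the arithmetic; the batch size and timings are hypothetical placeholders, not measured benchmark results.

```python
def images_per_second(batch_size, forward_s, backward_s):
    """Training throughput: images in a batch divided by forward + backward time."""
    return batch_size / (forward_s + backward_s)

def epoch_hours(num_images, throughput):
    """Time for one pass over the data set at the given throughput."""
    return num_images / throughput / 3600.0

# Hypothetical timings for illustration only, not measured results.
rate = images_per_second(batch_size=128, forward_s=0.25, backward_s=0.75)
print(rate)                          # images per second
print(epoch_hours(1_200_000, rate))  # hours for about 1.2 million images
```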
7. 7
The most important networks that solve the ImageNet challenge over the years are benchmarked.
The classification accuracy has been improving year on year, so much so that it is now better than humans!
Testing Performance
The ImageNet dataset and benchmark
Lower is better
8. 8
Computers have to be explicitly programmed
Analyze the problem to be solved.
Write the code in a programming language.
Deductive reasoning
Instruction and PC
Neural networks learn from examples
No requirement of an explicit description of the problem.
The neural computer adapts itself during a training period, based on examples of similar problems
Able to generalize or to handle incomplete data.
Inductive reasoning
Works well with “natural” data (like speech, image etc.)
How does a Neural Network work?
A quick introduction to (Deep) Neural Networks
9. 9
Why is Deep Learning High Performance Computing?
DNNs are compute intensive and the training for a typical DNN application runs for weeks even on
modern hardware
Maps to BLAS functions like SGEMM, finding max/min, matrix inversions, FFTs etc.
Easily mapped to accelerators, so these applications become a natural target for HPC platforms
Analysis shows that about 80% of time is spent in convolutions, which are basically SGEMM
computations
Recent developments in learning models have enhanced parallelism, with both data parallelism and model parallelism
Recent advances with Nvidia libraries have supported multiple GPUs (1-8) in a single node
Known to scale well with scale-out configurations too.
A quick introduction to (Deep) Neural Networks
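One common way a convolution is turned into a matrix multiply is to unroll every image patch into a row of a matrix (often called im2col) and multiply by the flattened filter. The sketch below illustrates that idea in plain Python for a single channel and a single filter. It is an assumption about how the mapping can be done, not a description of any particular library, and like most deep learning code it computes a cross-correlation (the filter is not flipped).

```python
def conv2d_direct(image, kernel):
    """Valid 2D cross-correlation computed with nested loops."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def im2col(image, kh, kw):
    """Unroll every kh x kw patch into one row of a matrix."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[image[i + a][j + b] for a in range(kh) for b in range(kw)]
            for i in range(oh) for j in range(ow)]

def conv2d_gemm(image, kernel):
    """Same result as conv2d_direct, expressed as a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    flat = [sum(p * k for p, k in zip(row, flat_kernel)) for row in patches]
    return [flat[r:r + ow] for r in range(0, len(flat), ow)]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
print(conv2d_direct(image, kernel))
print(conv2d_gemm(image, kernel))
```

With many filters, the flattened filters form the columns of a second matrix and the product becomes a matrix-matrix multiply, which is the SGEMM-shaped work mentioned on this slide.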
10. 10
Challenges in training deep neural networks
– Slow convergence with millions of weights / parameters.
– Activations saturate or explode.
– The details depend on the activation function, but the result is that the weights going into that neuron stop training.
– Vanishing gradient problem.
– Result of how we optimize the weights.
– Overfitting (or Overtraining)
- So many parameters you can easily train to fit the training data but then be completely unable to generalize.
– Achieving scalability in training is crucial, but it is hard to do on more than one GPU.
For each of these challenges there are methods to ameliorate them. Which method works depends on the problem and the
choices that you make in the activation function, the cost function, the number of layers, the number of
neurons, the types of layers, etc.
These are the hyper-parameters of the neural network model and choosing them is currently 1) an art as
much as a science 2) an active area of research 3) a major factor in sizing the hardware for deep learning.
A quick introduction to (Deep) Neural Networks
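The saturation and vanishing gradient problems can be seen in a toy calculation. The sketch below assumes a chain of single-neuron layers with logistic activations, which is a simplification chosen for illustration. Back propagation multiplies the gradient by the activation's derivative at each layer, and for the logistic function that derivative is never larger than 0.25.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad(z):
    """Derivative of the logistic function; never larger than 0.25."""
    a = logistic(z)
    return a * (1.0 - a)

def gradient_scale(num_layers, z=0.0, weight=1.0):
    """Factor multiplying the gradient after back-propagating through a chain
    of num_layers single-neuron layers, each with the same z and weight."""
    return (weight * logistic_grad(z)) ** num_layers

for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))

# A saturated neuron (large |z|) passes back almost no gradient at all.
print(logistic_grad(5.0))
```

Even in the best case (z = 0) the factor shrinks geometrically with depth, which is one reason the choice of activation function matters for deep networks.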
11. 11
Getting training to scale
– Model parallelism
– Split the model (neural network) across GPUs and servers.
– Parallelizes well on a single GPU
– Up to 8 GPUs currently but some claims of better efficiency (Baidu).
– Multiple servers are a problem.
– Data parallelism
– Gather scatter (SXM2)
– Split the training set across processing units and gather the updates. Requires peer to peer communication.
– Parameter servers (Master-Slave)
– Traditional manager/worker parallelism. Use the CPU, which is not being used for much anyway, to gather and dispatch the data. Each GPU needs to store the
entire model, but no peer-to-peer communication is required.
– Hyper-parameters
– Figuring out the number of layers, number of neurons, and training momentum can be done in parallel.
– Consensus
– Can have multiple neural networks training on the same data with different models and have them vote or otherwise combine their weights.
– Potentially more suitable for clusters of servers.
– Inference: Run it in parallel if you replicate the model.
A quick introduction to (Deep) Neural Networks
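Data parallelism can be sketched in a few lines: split the training batch across workers, let each compute a gradient on its shard, then gather and average the results. The example below uses a one-weight linear model and made-up data. The worker loop runs sequentially here, whereas in a real system each shard would go to a separate GPU.

```python
def gradient(w, batch):
    """Mean gradient of squared error 0.5*(w*x - y)^2 over (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute one gradient per worker,
    then gather and average them. Assumes len(batch) divides evenly."""
    shard = len(batch) // num_workers
    shards = [batch[i * shard:(i + 1) * shard] for i in range(num_workers)]
    local = [gradient(w, s) for s in shards]   # would run on separate GPUs
    return sum(local) / num_workers            # the "gather" step

# Hypothetical data: y = 2x, so the ideal weight is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
for _ in range(50):
    w -= 0.05 * data_parallel_gradient(w, data, num_workers=2)
print(w)
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the split changes where the work is done but not the update that is applied.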
12. 12
• Domain-specific embedded language with associated optimizing compiler and runtime
• Array programming language embedded in a state machine execution model
• Targets advanced analytics workloads on massively parallel distributed systems
• Design Goals
– Optimal deployment on parallel hardware
– Fast design iterations
– Enforce scalability
– Broad COTS hardware support
– Compatible with shared infrastructure
– High productivity for analysts and algorithm engineers
What is CogX?
CogX
17. 17
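[Figure: CogX compilation pipeline. User CogX model (Scala) → parsing and OpenCL code generation → kernel circuit (kernels, field buffers) → optimizations, including kernel fusion → optimized kernel circuit (merged kernels)]
[Figure: kernel fusion. Before: A and B feed an OpenCL multiply kernel producing C; C and D feed an OpenCL add kernel producing E. After: A, B and D feed a single fused OpenCL multiply/add kernel producing E]
CogX code snippet: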
val A = ScalarField(10,10)
val B = ScalarField(10,10)
val C = A * B
val D = ScalarField(10,10)
val E = C + D
CogX compiler:
translating CogX to OpenCL with kernel fusion
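The effect of kernel fusion on the multiply-then-add snippet can be illustrated without CogX or OpenCL. The sketch below is plain Python standing in for the generated kernels, and all function names are hypothetical. The unfused version writes the intermediate field C and reads it back, while the fused version produces E in a single pass.

```python
def multiply_kernel(a, b):
    """First kernel: element-wise product, written to an intermediate buffer."""
    return [x * y for x, y in zip(a, b)]

def add_kernel(c, d):
    """Second kernel: element-wise sum, reading the intermediate buffer back."""
    return [x + y for x, y in zip(c, d)]

def fused_multiply_add_kernel(a, b, d):
    """Fused kernel: one pass over the data, no intermediate buffer for C."""
    return [x * y + z for x, y, z in zip(a, b, d)]

# Flat lists stand in for the 10 x 10 scalar fields of the snippet.
A = [float(i) for i in range(100)]
B = [2.0] * 100
D = [1.0] * 100

E_unfused = add_kernel(multiply_kernel(A, B), D)
E_fused = fused_multiply_add_kernel(A, B, D)
print(E_unfused == E_fused)
```

Both versions compute the same E; the fused form simply avoids one kernel launch and the memory traffic for C.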
22. 22
Deep Learning Use Cases
The better-known, well-publicized implementations:
Games
Self-driving cars
Chat bots (Cortana, Suri, Jarvis, etc.)
Intelligent Assistants (Siri, Alexa, etc.)
...But what about “enterprise-class” use cases?
23. 23
Finance: AI-assisted trading, beyond current algorithmic trading. Rise of “AI Hedge Funds”.
Medicine: Healthcare institutions use AI-assisted diagnosis and recommendations to reduce human error.
E-Commerce: Agents and chatbots provide product recommendations and “interact” with potential shoppers.
Security: Beyond facial recognition, understand the “context” of danger and flag security threats.
AI in the Enterprise
Deep Learning and Neural Networks for the mainstream?
24. 24
Social networking: Yann LeCun was hired by Facebook, Geoff Hinton by Google and Andrew Ng by Baidu. Sentiment analysis. Facial recognition. Understanding text. Image recognition.
Geospatial: High spatial resolution remote-sensing (HSR-RS) image scene classification (BoVWs).
Oil and Gas: Channel sands identification. Other seismic analysis.
AI in the Enterprise
Deep Learning and Neural Networks for the mainstream?
26. The 4 Stage IoT Solutions Architecture:
The “Things”: primarily analog data sources (devices, machines, people, tools, cars, animals, clothes, toys, environment, buildings, etc.)
Stage 1: Sensors/Actuators (wired, wireless)
Stage 2: Internet Gateways, Data Acquisition Systems (data aggregation, A/D, measurement, control)
Stage 3: Edge IT (analytics, pre-processing)
Stage 4: Data Center / Cloud (analytics, management, archive)
SW Stacks: Analytics, Management, Control
[Figure labels: The Edge, Data Flow, Control Flow, Visualization]
27. 27
Enable workplace productivity
Empower a data-driven organization
Transform to a hybrid infrastructure
Protect your digital enterprise
Automated Intelligence delivered by HPE Apollo 6500 and Deep Learning software solutions
Use Cases
– Video, image, text, audio, time series pattern recognition
– Large, highly complex, unstructured simulation and modeling
– Real-time, near real-time analytics
Customer benefits
– Faster model training time, better fusion of data*
* Benchmarking results provided at or shortly after announcement
HPE Apollo 6500 is an ideal HPC and Deep Learning platform providing unprecedented performance with 8 GPUs, high bandwidth
fabric and a configurable GPU topology to match deep learning workloads
– Up to 8 high powered GPUs per tray (node), 2P Intel E5-2600 v4 support
– Choice of high-speed, low latency fabrics with 2x IO expansion
– Workload optimized using flexible configuration capabilities
Deliver automated intelligence in real-time
Unprecedented performance and scale with HPE Apollo 6500 high density GPU solution
28. HPE Apollo platforms and solutions are optimized for HPC, IoT and Big Data
Platforms
– Apollo 8000: Supercomputing
– Apollo 6000: Rack Scale HPC
– Apollo 4000: Server Solutions Purpose Built for Big Data
– Apollo 2000: Enterprise Bridge to Scale-Out Compute
– Moonshot*: Optimized for Next Gen Workloads
Workload groups: HPC Workloads, Big Data Workloads, Next Gen Workloads
Next Gen Workloads: Video encoding, Mobile workplace, IoT
Segments: Oil and gas, Life Sciences, Financial Services, Manufacturing CAD/CAE, Academia, Object Storage, Data Analytics
Solutions/ISVs: Scality, Cleversafe, Ceph, Hortonworks, Hadoop, Cloudera, Schlumberger, Paradigm, Halliburton, Gaussian, BIOVIA, Redline, Synopsys, ANSYS, Custom Apps
Partners: Mellanox, NVIDIA, Seagate
HPE Software (i.e. Vertica, HPE Haven), HPE Enterprise Services
29. 29
HP APOLLO 6000 POWER SHELF
Pooled Power Efficiency
Efficiency
• External pooled power shelf
• Fits up to 6 power supplies
• 2400W or 2650W power supplies
• Up to 15.9kW non-redundant
• Single-phase or 3-phase AC input
• Up to twelve 12V DC cables
[Figure: front and back views of the power shelf, labeled 1.5U, 2.55”, 17.64” and 30.88”]
Dimensions: 1.5U (H) x 44.81 cm (W) x 78.44 cm (D), or 1.5U (H) x 17.64 in (W) x 30.88 in (D)
30. 30
New technologies, products
HPE Apollo 6500
– Dense GPU server optimized for Deep Learning and HPC workloads
– Density optimization
– High performance fabrics
Cluster Management Enhancements (massive scaling, open APIs, tight integration, multiple user interfaces)
Deep Learning, HPC Software platform Enablement (HPE CCTK, Caffe, CUDA, Google TensorFlow, HPE IDOL)
Unique solution differentiators
– GPU density
– Configurable GPU topologies
– More network bandwidth
– Power and cooling optimization
– Manageability
– Better productivity
HPE Apollo 6500 solution innovation
System Design Innovation to maximize GPU capacity and performance with lower TCO
31. 31
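Apollo 2000 + NVIDIA GPU promotion
Limited-time, limited-quantity bundle offers
The most powerful combination
HPE's highest-density servers plus NVIDIA GPUs give you the most powerful combination.
A single 2U chassis can be expanded to hold up to 2 HPE Apollo system servers and 4 NVIDIA high-performance computing accelerator cards.
Option 1: First choice for enterprise virtualization
HPE Apollo 2000/XL190r 1 node + NVIDIA Tesla M60 x1
Apollo r2200 12LFF or r2600 24SFF
XL190r Gen9 specifications: E5-2640v4 x2 / 16GB x2 / 1TB x1 / 800W / 3yr Fndn Care 24*7 service, NVIDIA Tesla M60 Dual GPU x1
From NT$360,000 (price before tax)
Option 2: First choice for high-performance computing
HPE Apollo 2000/XL190r 1 node + NVIDIA Tesla K80 x1
Apollo r2200 12LFF or r2600 24SFF
XL190r Gen9 specifications: E5-2640v4 x2 / 16GB x2 / 1TB x1 / 800W / 3yr Fndn Care 24*7 service, NVIDIA Tesla K80 Dual GPU x1
From NT$360,000 (price before tax)
※ Promotion ends: 2016/12/31. If you are interested in these products, please call (02)2652-4040. This number is for use in Taiwan only.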