© 2018 IBM Corporation
조규남 /mystous@
Samsung Research
Samsung Electronics
Deep Learning
100x faster than everyone else
(High Performance Computing for AI)
mystous@kyunam.com:~$ who am i
– Principal Software Engineer @ Samsung Electronics
– 22 years and counting since I first understood C pointers…
– About this topic
Possibility of HPC application on Cloud infrastructure by container cluster
(ACM Symposium On Applied Computing, 2019 – Submission)
Parallel Eigenvalue Computation for very Large Scale Symmetric Matrix
(Master's thesis, 2016)
Time-efficient simulations of tight-binding electronic structures with
Intel Xeon Phi™ many-core processors
(Computer Physics Communications, vol. 209, 2016; second author)
Design of Efficient Light-Absorption Layers with Earth-Abundant Materials:
A Tight-Binding Study on Inter-Band Transition Rate of Si:P Quantum Dots
(MRS fall 2015)
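Parallelizing a Poisson equation solver with Intel Xeon Phi
(Korea Information Processing Society, 2015 Fall Conference)
Excellence Award, Korea Supercomputing Programming Contest (2015)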
What is High Performance?
High Performance: High Throughput, Low Latency, High Availability
Resource Demand: Increase Computation Efficiency; Reduce Computation Overhead; Manage Event Rate; Control Frequency of Sampling
Resource Management: Introduce Concurrency; Maintain Multiple Copies; Increase Available Resource
Resource Arbitration: Scheduling Policy
BASS, Len; CLEMENTS, Paul; KAZMAN, Rick. Software architecture in practice. Addison-Wesley Professional, 2003.
AI saga
From the Deep Neural Network
ⓒ Bikingdog@Wikimedia Commons
Era of Artificial intelligence
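How the domains are changing
AI applied to every domain
: AI expands into and is applied across all domains; per-domain solutions converge
The era of domain specialists
: HPC, end-user apps, AI, and other domains each developed their technology separately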
High Performance Computing
Application Service Artificial intelligence
ⓒ Kamran kowsari@Wikimedia Commons
Application, Disaster recovery, HPC, Medical, Transportation
Journey of machine learning training
All icon from the noun project (http://thenounprojecct.com) - National Park service, Yazmin Alanis, Bakunetsu Kaito, Gan Khoon Lay, ProSymbols, pxLens, Matt Hawdon
Collect
Cleansing & Labeling
Model Selection
Training
Evaluation
Parameter Tuning
Prediction
Where does the lifecycle begin and end?
Where are the bottlenecks?
Bottleneck & Lifecycle
AI Engineer
Trend in Machine Learning (Deep Neural Network)
Record counts of 25 open datasets
Adapted from the Analytics Vidhya web page (https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/)
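Category | Dataset | Compressed | Data Size (MB) | Number of Records | Type of Record
Image | MNIST | - | 50 | 70,000 | Image
Image | MS-COCO | O | 25,000 | 330,000 | Image
Image | MS-COCO | O | 25,000 | 250,000 | Key point
Image | ImageNet | - | 150,000 | 1,500,000 | Image
Image | Open Images Dataset | O | 500,000 | 9,011,219 | Image
Image | VisualQA | O | 250,000 | 265,016 | Image
Image | The Street View House Numbers (SVHN) | - | 2,500 | 630,420 | Image
Image | CIFAR-10 | - | 170 | 60,000 | Image
Image | Fashion-MNIST | - | 30 | 70,000 | Image
Audio/Speech | Free Spoken Digit Dataset | - | 10 | 1,500 | Audio
Audio/Speech | Free Music Archive (FMA) | - | 1,000,000 | 100,000 | Audio
Audio/Speech | Ballroom | O | 14,000 | 700 | Audio
Audio/Speech | Million Song Dataset | - | 280,000 | 1,000,000 | Audio
Audio/Speech | LibriSpeech | - | 60,000 | 1,000 | Hours
Audio/Speech | VoxCeleb | - | 150 | 100,000 | Utterance
Natural Language Processing | IMDB Reviews | - | 80 | 25,000 | Article
Natural Language Processing | Twenty Newsgroups | - | 20 | 20,000 | Message
Natural Language Processing | Sentiment140 | O | 80 | 1,600,000 | Tweet
Natural Language Processing | WordNet | - | 10 | 117,000 | Synset
Natural Language Processing | Yelp Reviews | O | 2,660 (JSON) | 5,200,000 | Article
Natural Language Processing | Yelp Reviews | O | 2,900 (SQL) | 174,000 | Business attributes
Natural Language Processing | Yelp Reviews | O | 7,500 (Photos) | 200,000 | Image
Natural Language Processing | The Wikipedia Corpus | - | 20 | 4,400,000 | Article
Natural Language Processing | The Blog Authorship Corpus | - | 300 | 140,000,000 | Word
Natural Language Processing | Machine Translation of Various Languages | - | 15,000 | 30,000,000 | Sentence
Highlighted record counts: MNIST × 70,000; ImageNet × 1,500,000; Open Images Dataset × 9,011,219; WordNet × 117,000; Yelp Review (JSON) × 5,200,000; The Blog Authorship Corpus × 140,000,000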
@arunsasi revisited from @xkcdComic
Trend in Machine Learning (Deep Neural Network)
ResNet-50 Success Case
ResNet-50 Success | Time-to-accuracy | GPUs | Efficiency
Facebook (Caffe2) | 2 days 1 hour | 352 | 90% (Large-batch)
IBM Power AI (Caffe) | 50 minutes | 256 | 95% (Large-batch)
Google (TensorFlow) | ~24 hours | 64 TPUs | > 90%
Preferred Networks (Chainer) | 15 minutes | 1,000 | > 90%
Cray @ CSCS (TensorFlow) | < 14 minutes | 1,000 | > 95%
Excerpted from “AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management” by Dr. Rangan Sukumar, Office of CTO, Cray Inc., SCA2018
High Performance
Computing (HPC)
Definition & Technologies
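High Performance Computing
What is HPC (High Performance Computing)?
A system or environment built on high-performance computers (supercomputers) and high-performance computer clusters for workloads that demand high-precision and large-scale computation. It also includes a range of numerical-analysis and math libraries.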
Framework & Library: Parallel programming
Algorithm: Problem domain specific solution
System Hardware: High Speed devices & Algorithm
Image from ‘Theoretical Heterogeneous Catalysis from Advanced Ab Initio
Molecular Dynamics Simulations project from Gauss Centre for Supercomputing’
1) http://www.gauss-centre.eu/gauss-centre/EN/Projects/MaterialsScienceChemistry/2018/marx_pr63ce.html?nn=1236240
General Approach of HPC Application
Coarse grain, Fine grain
Coarse grain – Task & Data Parallelism
Fine grain – Data & Loop Parallelism
Figure labels: Original Image, Rectangle tiles, Workers
Source) Teodoro, G., Pan, T., Kurc, T. M., Kong, J., Cooper, L. A. D., Podhorszki, N., … Saltz, J. H. (2013). High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
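The figure labels on this slide (original image, rectangle tiles, workers) suggest a coarse-grain pattern: split the image into tiles and hand each tile to a worker, with a fine-grain loop running over the pixels inside each tile. A minimal sketch of that idea follows. The function names, the tile size, and the per-tile operation (a pixel sum) are assumptions made for illustration and do not come from the cited paper. Threads are used only to keep the sketch self-contained; in CPython they will not speed up this pure-Python loop, so a real workload would use processes or native kernels.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_h, tile_w):
    """Coarse grain: cut a 2D image (list of rows) into rectangular tiles."""
    tiles = []
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in image[top:top + tile_h]])
    return tiles


def tile_sum(tile):
    """Fine grain: a loop over every pixel of one tile (hypothetical workload)."""
    return sum(sum(row) for row in tile)


def parallel_pixel_sum(image, tile_h, tile_w, workers=4):
    """Distribute the tiles to a pool of workers and combine the partial results."""
    tiles = split_into_tiles(image, tile_h, tile_w)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(tile_sum, tiles))


# An 8x8 test image whose pixels are 0..63, split into four 4x4 tiles.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
total = parallel_pixel_sum(image, tile_h=4, tile_w=4)
```

The partial results are combined with a plain sum here; any associative reduction would fit the same structure.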
HPC Technical Landscape
Some icons from the noun project (http://thenounprojecct.com) - Creaticca Creative Agency, Chad Remsing,
Cluster Hardware
System Software
HPC Technology
Middleware & Management
InfiniBand + Ethernet, SAN + Local Node Storage, GPGPU or Accelerators
Revisited from A. Reed, Daniel & Dongarra, Jack. (2015). Exascale Computing and Big Data. Communications of the ACM. 58. 56-68. 10.1145/2699414.
Parallel Framework
Numerical Libraries
System Tool
Development Language
ⓒ Romanzes637@Wikimedia Commons @Wikimedia Commons ⓒ Éducation nationale @Wikimedia Commons
HPC Applications
Linux OS variant, Hadoop
HPC Technology
SIMD & Vectorization
Scalar operation: A0 + B0 -> C0 (one element per instruction)
SIMD operation: [A0 ... A11] + [B0 ... B11] -> [C0 ... C11] (many elements per instruction)
Asynchronous Communication & Pipeline
Reduce Cache Misses & Align Memory
Sparse Matrix Representation
• Cf. Compressed Sparse Rows (CSR)
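The Compressed Sparse Rows (CSR) format named on this slide stores only the non-zero entries of a matrix, together with their column indices and a row-pointer array that marks where each row starts. The sketch below is a minimal pure-Python illustration; the function names and the sample matrix are assumptions, not material from the talk.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for col, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(col)
        # row_ptr[r + 1] marks the end of row r inside `values`.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Sparse matrix-vector product y = A @ x that touches only non-zeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


dense = [
    [5, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 3, 0],
    [0, 6, 0, 0],
]
values, col_indices, row_ptr = dense_to_csr(dense)
y = csr_matvec(values, col_indices, row_ptr, [1, 2, 3, 4])
```

For this 4x4 matrix only four values are stored instead of sixteen, and the inner loop of the product visits exactly those four.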
Exascale Computing in HPC
Image from Robert F. Service. Design for U.S. exascale computer takes shape. Science 09 Feb 2018
HPC for AI
HPC technology in Machine
Learning Training
Characteristic of HPC
Low Latency and High Throughput, not Traffic
Enterprise Solution – Mass Traffic Handling
HPC – Large Scale Problem Solving
HPC goes on
For Science and Engineering
+ For Big Data (~2010)
+ For Machine Learning (~2016)
The Convergence of Big Data, AI and HPC
Image from Rajesh Chhabra (raj@cray.com). New Era of High Performance Computing (convergence of AI, Big Data, HPC)
Hardware Trend in HPC for AI
GPGPU Trend
Table and Image from NVIDIA White paper (http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf)
Higher integration density (21.1B transistors) increases the absolute amount of computation
Larger cache memory and faster communication through NVLink
Not only for Latency and Throughput
Image from NVIDIA White paper(http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf)
Support for INT data types and
hardware-level support for tensor operations
Hardware Trend in HPC for AI
CPU Trend
Microarchitecture | Process | Cadence
Nehalem | 45nm | Tock
Westmere | 32nm | Tick
Sandy Bridge | 32nm | Tock
Ivy Bridge | 22nm | Tick
Haswell | 22nm | Tock
Broadwell | 14nm | Tick
Skylake | 14nm | Tock
Instruction sets, from the oldest group of generations to the newest:
• MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
• MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, AES, AVX
• MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, AES, AVX, AVX2, FMA3, TSX
SIMD width (INT): 128 bit, then 256 bit
SIMD width (FP): 128 bit, then 256 bit
Flops (SP): 8, then 16, then 32
Flops (DP): 4, then 8, then 16
Interconnect: 2.5 GT/s QPI, 5 GT/s DMI, 8 GT/s DMI
Images and Data sheets from https://prohardver.hu/teszt/intel_architekturak_nehalemtol_skylake-ig/nyomtatobarat/teljes.html
Case Study - Accelerating deep neural network training for action recognition on a cluster of GPUs
Guojing Cong; Giacomo Domeniconi; Joshua Shapiro; Fan Zhou; Barry Chen(2018). Accelerating deep neural network training for action recognition on a cluster of GPUs. In HPML 2018.
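Technology: Distributed multinode computing, Reducing Communication
How to: Use AAVG to compute the optimal batch size and keep adjusting it during training
*Optical flow: consecutive image frames sampled at random. The flow of pixels is used to estimate speed and position in the next frame.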
Two-Streams CNNs for Action Recognition
Accuracy
• RGB stream = 85.04%
• Flow stream = 84.5%
• Two-streams combined = 91.3%
There are 4 IBM Minsky nodes.
Each node has 2 POWER8 CPUs with 10 cores and 4 NVIDIA
Tesla P100 GPUs. The nodes communicate over InfiniBand.
AAVG: Adaptive-batchsize model averaging with sparse communication
An algorithm for distributed training that decides the batch size dynamically at every epoch before training.
Each node runs SGD in parallel, then the averaged gradients are combined to decide the batch size, which improves performance.
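The model-averaging part of the description above can be sketched as follows: every worker runs SGD on its own data shard, and the workers synchronize only once per epoch by averaging their models. This is plain periodic model averaging on a toy one-parameter regression, not AAVG itself. The slides do not give the rule AAVG uses to adapt the batch size, so `batch_size` is a fixed, assumed parameter here, and all function names and data are hypothetical.

```python
import random


def local_sgd(w, samples, lr, batch_size):
    """One pass of mini-batch SGD on a worker's shard for the model y = w * x."""
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Gradient of the mean squared error over the mini-batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w


def train_with_model_averaging(shards, epochs, lr=0.1, batch_size=4):
    """Workers train independently, then average their models once per epoch."""
    w = 0.0
    for _ in range(epochs):
        local_models = [local_sgd(w, shard, lr, batch_size) for shard in shards]
        # Sparse communication: a single model exchange per epoch.
        w = sum(local_models) / len(local_models)
    return w


rng = random.Random(0)
true_w = 3.0
# Four workers, each holding 32 noise-free samples of y = 3x.
shards = [[(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(32))]
          for _ in range(4)]
w = train_with_model_averaging(shards, epochs=30)
```

Communicating once per epoch instead of once per mini-batch is what keeps the traffic between nodes low; an adaptive scheme would additionally change `batch_size` between epochs.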
Conclusion
Baseline, single GPU: takes 2,067 minutes of training to reach similar validation accuracy.
UCF101 with 16 GPUs: takes 61 minutes (x33.8) using AAVG with a customized Adam optimizer.
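The speedup quoted in this conclusion can be checked with a few lines of arithmetic. The two timings and the GPU count are the figures on the slide; the parallel-efficiency definition (speedup divided by the number of GPUs) is the usual one and is an addition here, not something the slide states.

```python
baseline_minutes = 2067   # single GPU, from the slide
parallel_minutes = 61     # 16 GPUs with AAVG, from the slide
gpus = 16

# Speedup: how many times faster the 16-GPU run reaches similar accuracy.
speedup = baseline_minutes / parallel_minutes

# Parallel efficiency: speedup per GPU; 1.0 would be ideal linear scaling.
efficiency = speedup / gpus

print(f"speedup = {speedup:.2f}x, efficiency = {efficiency:.2f}")
```

The ratio comes out just under 33.9, in line with the x33.8 on the slide. An efficiency above 1.0 suggests that the gain does not come from the extra hardware alone; presumably the algorithmic changes also reduce the work needed to reach the target accuracy.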
Case Study - A Case Study on Optimizing Accurate Half Precision Average
Kenny Peou; Joel Falcou and Alan Kelly(2018). A Case Study on Optimizing Accurate Half Precision Average. In HPML 2018.
Technology: SIMD, Optimized Algorithm
How to: Improve how the average is computed in FP16, for both performance and efficiency
Computing an average is an extremely frequent operation in machine learning (k-means, mean shift, average pooling, etc.).
Because massive data sets must be processed hundreds of millions or even trillions of times, improving the accuracy and speed of basic operations matters a great deal.
However, using FP16 for memory efficiency introduces many errors.
ㆍSum then divide: fast, but overflows easily.
ㆍIterative average: fails for large N.
ㆍKahan sum: still susceptible to overflow and misalignments.
ㆍCascading average
- Pairwise operations
- No overflow
- ≤ 2N operations
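The contrast between "sum then divide" and a cascading (pairwise) average can be reproduced with Python's standard library, which can round a float to IEEE 754 half precision through the `struct` format code `e`. The sketch below is an illustration under stated assumptions, not the paper's implementation: the helper names are hypothetical, FP16 arithmetic is emulated by rounding after every operation, and the cascade is restricted to a power-of-two number of values so that every level pairs up evenly.

```python
import math
import struct


def fp16(x):
    """Round a Python float to the nearest half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_then_divide(values):
    """Accumulate in FP16, then divide: the running sum can exceed 65504."""
    acc = 0.0
    for v in values:
        acc = fp16(acc + fp16(v))
    return fp16(acc / len(values))


def cascading_average(values):
    """Average neighbouring pairs level by level; no intermediate exceeds the inputs."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of values")
    level = [fp16(v) for v in values]
    while len(level) > 1:
        level = [fp16(fp16(a / 2) + fp16(b / 2))
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]


values = [1000.0] * 128
naive = sum_then_divide(values)       # the running sum overflows to infinity
cascaded = cascading_average(values)  # stays at 1000.0
```

Halving each operand before adding keeps every intermediate value within the range of the inputs, which is why the cascade cannot overflow even though the plain sum of these 128 values is far beyond the FP16 maximum.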
Conclusion
Case Study - Deep Learning on Large-scale Multicore Clusters
Kazumasa Sakiyama; Shinpei Kato; Yutaka Ishikawa; Atsushi Hori and Abraham Monrroy (2018). Deep Learning on Large-scale Multicore Clusters. In HPML 2018.
Technology: Lowering, Data and Model Parallelism
How to: Parallelize the matrix operations of a CNN at the node level and the CPU-core level
Lowering is one technique for accelerating CNNs (lowering: converting each convolutional layer into a matrix multiplication).
Lowering and the matrix multiplication were parallelized so that they run concurrently on a single core, and performance was compared across the following three cases:
1. Simple Convolution
2. Lowering Convolution
3. Pipelined Lowering Convolution
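Lowering, as described above, rewrites a convolution as a matrix multiplication: every input patch becomes one row of a matrix (often called im2col), and the kernel becomes a column vector. The sketch below compares a direct convolution with its lowered form on a tiny single-channel example. The function names are hypothetical, and the operation is the unflipped convolution (cross-correlation) commonly used in CNNs; pipelining and multi-core execution are not shown.

```python
def conv2d_direct(image, kernel):
    """Simple convolution: slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def lower(image, kh, kw):
    """Lowering (im2col): one row per output pixel, holding its flattened patch."""
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(out_h) for c in range(out_w)]


def matmul(a, b):
    """Plain matrix multiplication on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_lowered(image, kernel):
    """Lowered convolution: patches matrix times flattened kernel, then reshape."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    patches = lower(image, kh, kw)                       # (out_h*out_w) x (kh*kw)
    weights = [[kernel[i][j]] for i in range(kh) for j in range(kw)]  # (kh*kw) x 1
    flat = [row[0] for row in matmul(patches, weights)]
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]
direct = conv2d_direct(image, kernel)
lowered = conv2d_lowered(image, kernel)
```

The lowered form duplicates overlapping pixels in the patch matrix, trading extra memory for a single large matrix multiplication that optimized libraries handle well.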
Conclusion
Pipelined convolution is about 1.64x faster than conventional convolution.
The performance of pipelined convolution drops on layers with large weights.
When the CNN ran in parallel on a multi-core cluster, performance improved linearly with data parallelism.
Case Study – Performance comparison between Dist. tensorflow and w/Horovod
Technology: Optimize communication, Reducing communication overhead
How to: Compare the performance of distributed TensorFlow with and without a communication-optimizing framework, and with hardware optimization
Horovod is a distributed training framework for TensorFlow,
Keras, and PyTorch.
The goal of Horovod is to make distributed Deep Learning
fast and easy to use.
Images, Description and Sample code from Horovod Github (https://github.com/uber/horovod)
import tensorflow as tf
import horovod.tensorflow as hvd

# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Build model...
loss = ...
# Scale the learning rate by the number of workers
opt = tf.train.AdagradOptimizer(0.01 * hvd.size())

# Add Horovod Distributed Optimizer
opt = hvd.DistributedOptimizer(opt)

# Add hook to broadcast variables from rank 0 to all other processes during
# initialization.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

# Make training operation
train_op = opt.minimize(loss)

# Save checkpoints only on worker 0 to prevent other workers from corrupting them.
checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

# The MonitoredTrainingSession takes care of session initialization,
# restoring from a checkpoint, saving to a checkpoint, and closing when done
# or an error occurs.
with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                       config=config,
                                       hooks=hooks) as mon_sess:
    while not mon_sess.should_stop():
        # Perform synchronous training.
        mon_sess.run(train_op)
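In the sample above, `hvd.DistributedOptimizer` combines the gradients of all processes before each update. A ring allreduce is one well-known, bandwidth-efficient way to perform such a combination, and the sketch below simulates it in plain Python to show the communication pattern. It is an illustration only: the function name and data are hypothetical, everything runs in one process, and real deployments delegate this step to libraries such as MPI or NCCL.

```python
def ring_allreduce(vectors):
    """Sum equal-length vectors so that every simulated worker ends with the total.

    Each worker only ever sends one chunk to its right-hand neighbour per step.
    """
    n = len(vectors)
    size = len(vectors[0])
    if size % n != 0:
        raise ValueError("this sketch needs the vector length to be a multiple of n")
    chunk = size // n
    data = [list(v) for v in vectors]

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: after n-1 steps worker r owns the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        msgs = [((r - step) % n, None) for r in range(n)]
        msgs = [(c, [data[r][i] for i in span(c)]) for r, (c, _) in enumerate(msgs)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] += v

    # Phase 2, allgather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [((r + 1 - step) % n, None) for r in range(n)]
        msgs = [(c, [data[r][i] for i in span(c)]) for r, (c, _) in enumerate(msgs)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] = v
    return data


# Three simulated workers, each holding a 6-element gradient.
grads = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(grads)
averaged = [[g / len(grads) for g in worker] for worker in reduced]
```

Each worker transfers roughly twice the vector size in total regardless of how many workers take part, which is why this pattern suits gradient averaging on large clusters.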
GPUDirect RDMA
Revisits from https://developer.nvidia.com/gpudirect
Mellanox. Accelerating High Performance Computing with GPUDirect RDMA. GTC 2013
Image Source from http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-gpudirect-rdma.pdf
Single Root Complex
Revisits from https://developer.nvidia.com/gpudirect
Images from Microway homepage https://www.microway.com/product/octoputer-4u-8-gpu-server-2-5-drives/octoputer-8-gpu-with-dual-root-tesla-v100/
https://www.microway.com/product/octoputer-4u-8-gpu-server-2-5-drives/
Conclusion
[ Throughput comparison ] [ Parallel efficiency comparison ]
Difference
between HPC and AI
Apart, yet together
Journey of machine learning training
All icon from the noun project (http://thenounprojecct.com) - National Park service, Yazmin Alanis, Bakunetsu Kaito, Gan Khoon Lay, ProSymbols, pxLens, Matt Hawdon, Rose Alice Design
Collect
Cleansing & Labeling
Model Selection
Training
Evaluation
Parameter Tuning
Prediction
Bottleneck & Lifecycle
AI Engineer
Company
Closed Loop
HPC vs AI
HPC: Data Modeling & Generation → Calculation → Evaluation → Paper or Applying
AI: Data Collecting → Data Cleaning → Reduce Dimensions → Distributing Data → Calculation → Evaluation → Model Deployment
Closed Loop
Cluster Management Server
Outside of system
HPC Technical Landscape (Recall)
HPC on Cloud Technical Landscape
Cluster Hardware
System Software
HPC Technology
Middleware & Management
Some icons from the noun project (http://thenounprojecct.com) - Creaticca Creative Agency, Chad Remsing,
InfiniBand + Ethernet, SAN + Local Node Storage
Linux OS variant
GPGPU or Accelerators
Revisited from A. Reed, Daniel & Dongarra, Jack. (2015). Exascale Computing and Big Data. Communications of the ACM. 58. 56-68. 10.1145/2699414.
Parallel Framework
Numerical Libraries
System Tool
Development Language
ⓒ Romanzes637@Wikimedia Commons @Wikimedia Commons ⓒ Éducation nationale @Wikimedia Commons
HPC & AI Applications
ML Framework
Hadoop
Source) https://towardsdatascience.com/gan-by-example-using-keras-on-tensorflow-backend-1a6d515a60d0
ML or AI Platform in Public and Private
Copyright to AWS
Image from https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/how-it-works-training.html
Copyright to Airbnb
Image from https://www.slideshare.net/databricks/bighead-airbnbs-endtoend-machine-learning-platform-with-krishna-puttaswamy-and-andrew-hoh
Copyright to Facebook
Image from https://www.matroid.com/scaledml/2018/yangqing.pdf
+Others…
Wrap up
AI Saga HPC and AI
HPC for AI
Development
ⓒ Kamran kowsari@Wikimedia Commons
Application, Disaster recovery, HPC, Medical, Transportation
Thank you

More Related Content

Recently uploaded

英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Recently uploaded (20)

英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Deep Learning, 100x Faster Than Everyone Else (High Performance Computing for AI)

  • 1. © 2018 IBM Corporation. 조규남 / mystous@, Samsung Research, Samsung Electronics. Deep Learning, 100x Faster Than Everyone Else (High Performance Computing for AI)
  • 2. mystous@kyunam.com:~$ who am i
      – Principal Software Engineer @ Samsung Electronics
      – 22 years and counting since first understanding C pointers…
      – About this topic
          Possibility of HPC application on Cloud infrastructure by container cluster (ACM Symposium On Applied Computing, 2019 – Submission)
          Parallel Eigenvalue Computation for very Large Scale Symmetric Matrix (Master's thesis, 2016)
          Time-efficient simulations of tight-binding electronic structures with Intel Xeon Phi™ many-core processors (Computer Physics Communications, vol. 209, 2016; second author)
          Design of Efficient Light-Absorption Layers with Earth-Abundant Materials: A Tight-Binding Study on Inter-Band Transition Rate of Si:P Quantum Dots (MRS Fall 2015)
          Parallelizing a Poisson equation solver with Intel Xeon Phi (Korea Information Processing Society, 2015 Fall Conference)
          Excellence Award, Korea Supercomputing Programming Contest (2015)
  • 3. What is High Performance?
      High Performance: High Throughput, Low Latency, High Availability
      – Resource Demand: Increase Computation Efficiency; Reduce Computation Overhead; Manage event rate; Control frequency of sampling
      – Resource Management: Introduce Concurrency; Maintain Multiple Copies; Increase Available Resource
      – Resource Arbitration: Scheduling Policy
      BASS, Len; CLEMENTS, Paul; KAZMAN, Rick. Software architecture in practice. Addison-Wesley Professional, 2003.
  • 4. AI saga: From the Deep Neural Network
      ⓒ Bikingdog@Wikimedia Commons
  • 5. Era of Artificial intelligence: how the domains are changing
      – Era of domain specialists: HPC, end-user apps, AI, and other areas each advanced their own technology separately (High Performance Computing; Application Service; Artificial intelligence)
      – AI applied to every domain: AI expands into and is applied across all areas, and per-domain solutions converge (Application, Disaster recovery, HPC, Medical, Transportation)
      ⓒ Kamran kowsari@Wikimedia Commons
  • 6. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      All icon from the noun project (http://thenounprojecct.com) - National Park service, Yazmin Alanis, Bakunetsu Kaito, Gan Khoon Lay, ProSymbols, pxLens, Matt Hawdon
  • 7. Journey of machine learning training
      From where to where is Lifecycle? Where are bottleneck points?
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
  • 8. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      Bottleneck & Lifecycle: AI Engineer
  • 9. Trend in Machine Learning (Deep Neural Network): Record numbers of 25 Open dataset
      Adapted from the Analytics Vidhya web page (https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/)

      Category | Dataset | O mark | Compressed Data Size | Number of Records | Type of Record
      Image | MNIST | | 50 | 70,000 | Image
      Image | MS-COCO | O | 25,000 | 330,000 | Image
      Image | MS-COCO | | | 250,000 | key point
      Image | ImageNet | | 150,000 | 1,500,000 | Image
      Image | Open Images Dataset | O | 500,000 | 9,011,219 | Image
      Image | VisualQA | O | 250,000 | 265,016 | Image
      Image | The Street View House Numbers (SVHN) | | 2,500 | 630,420 | Image
      Image | CIFAR-10 | | 170 | 60,000 | Image
      Image | Fashion-MNIST | | 30 | 70,000 | Image
      Audio/Speech | Free Spoken Digit Dataset | | 10 | 1,500 | Audio
      Audio/Speech | Free Music Archive (FMA) | | 1,000,000 | 100,000 | Audio
      Audio/Speech | Ballroom | O | 14,000 | 700 | Audio
      Audio/Speech | Million Song Dataset | | 280,000 | 1,000,000 | Audio
      Audio/Speech | LibriSpeech | | 60,000 | 1,000 | Hours
      Audio/Speech | VoxCeleb | | 150 | 100,000 | Utterance
      Natural Language Processing | IMDB Reviews | | 80 | 25,000 | Article
      Natural Language Processing | Twenty Newsgroups | | 20 | 20,000 | Message
      Natural Language Processing | Sentiment140 | O | 80 | 1,600,000 | Tweet
      Natural Language Processing | WordNet | | 10 | 117,000 | Synset
      Natural Language Processing | Yelp Reviews | O | 2,660 (JSON) | 5,200,000 | Article
      Natural Language Processing | Yelp Reviews | | 2,900 (SQL) | 174,000 | Business attributes
      Natural Language Processing | Yelp Reviews | | 7,500 (Photos) | 200,000 | Image
      Natural Language Processing | The Wikipedia Corpus | | 20 | 4,400,000 | Article
      Natural Language Processing | The Blog Authorship Corpus | | 300 | 140,000,000 | Word
      Natural Language Processing | Machine Translation of Various Languages | | 15,000 | 30,000,000 | Sentence

      Highlighted record counts: 70,000 (MNIST); 1,500,000 (ImageNet); 9,011,219 (Open Images Dataset); 117,000 (WordNet); 5,200,000 (Yelp Review, JSON); 140,000,000 (The Blog Authorship Corpus)
  • 10. Trend in Machine Learning (Deep Neural Network): Open dataset record number
      Same dataset table and highlighted record counts as slide 9.
      @arunsasi revisited from @xkcdComic
      Adapted from the Analytics Vidhya web page (https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/)
  • 11. Trend in Machine Learning (Deep Neural Network): ResNet-50 Success Case

      ResNet-50 Success | Time-to-accuracy | GPUs | Efficiency
      Facebook (Caffe2) | 2 days 1 hour | 352 | 90% (Large-batch)
      IBM Power AI (Caffe) | 50 minutes | 256 | 95% (Large-batch)
      Google (TensorFlow) | ~24 hours | 64 TPUs | > 90%
      Preferred Networks (Chainer) | 15 minutes | 1,000 | > 90%
      Cray @ CSCS (Tensorflow) | < 14 minutes | 1,000 | > 95%

      Excerpted from “AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management” by Dr. Rangan Sukumar, Office of CTO, Cray Inc @SCA2018
  • 12. High Performance Computing (HPC): Definition & Technologies
  • 13. High Performance Computing
      HPC (High Performance Computing) is a system or environment built from high-performance computers (supercomputers) and high-performance computer clusters for workloads that demand high-precision and large-scale computation. It includes a range of numerical-analysis and math libraries.
      – Framework & Library : Parallel programming
      – Algorithm : Problem domain specific solution
      – System Hardware : High Speed devices & Algorithm
      Image from ‘Theoretical Heterogeneous Catalysis from Advanced Ab Initio Molecular Dynamics Simulations project from Gauss Centre for Supercomputing’ 1) http://www.gauss-centre.eu/gauss-centre/EN/Projects/MaterialsScienceChemistry/2018/marx_pr63ce.html?nn=1236240
  • 14. General Approach of HPC Application: Coarse grain, Fine grain
      – Coarse grain – Task & Data Parallelism
      – Fine grain – Data & Loop Parallelism
      Figure labels: Original Image, Rectangle tiles, Workers
      Source) Teodoro, G., Pan, T., Kurc, T. M., Kong, J., Cooper, L. a. D., Podhorszki, N., … Saltz, J. H. (2013). High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
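The two granularities on slide 14 can be sketched in a few lines. This is a minimal illustration and not the cited paper's pipeline: the image is a small list of lists, `split_into_tiles`, `tile_sum`, and `total_intensity` are hypothetical names, and a thread pool stands in for the cluster's workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_h, tile_w):
    """Coarse grain: cut the image into rectangular tiles, one task per tile."""
    tiles = []
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in image[top:top + tile_h]])
    return tiles


def tile_sum(tile):
    """Fine grain: the per-pixel loop inside one tile (a SIMD/GPU candidate)."""
    return sum(sum(row) for row in tile)


def total_intensity(image, tile_h, tile_w, workers=4):
    """Hand each tile to a worker, then combine the partial results."""
    tiles = split_into_tiles(image, tile_h, tile_w)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(tile_sum, tiles))


# Toy 4 x 6 image whose pixel values are 0..23.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
print(total_intensity(image, tile_h=2, tile_w=3))
```

Threads keep the sketch short; for CPU-bound tiles in CPython a process pool or separate cluster nodes would be the more realistic choice.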
  • 15. HPC Technical Landscape
      Layers: Cluster Hardware; System Software; Middleware & Management; HPC Technology; HPC Applications
      Components: Infiniband + Ethernet; SAN + Local Node Storage; GPGPU or Accelerators; Linux OS variant; Parallel Framework; Numerical Libraries; System Tool; Development Language; Hadoop
      Revisited from A. Reed, Daniel & Dongarra, Jack. (2015). Exascale Computing and Big Data. Communications of the ACM. 58. 56-68. 10.1145/2699414.
      Some icons from the noun project (http://thenounprojecct.com) - Creaticca Creative Agency, Chad Remsing; ⓒ Romanzes637@Wikimedia Commons; ⓒ Éducation nationale @Wikimedia Commons
  • 16. HPC Technology
      – SIMD & Vectorization
          Scalar Operation: A0 + B0 → C0
          SIMD Operation: A0…A11 + B0…B11 → C0…C11
      – Asynchronous Communication & Pipeline
      – Sparse Matrix Representation (cf. Compressed Sparse Rows, CSR)
      – Reduce Cache miss & Memory align
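Slide 16 names Compressed Sparse Rows (CSR) without showing the layout. The sketch below stores only the non-zero entries in three flat arrays and multiplies the result by a vector. The helper names `to_csr` and `csr_matvec` are illustrative and not taken from any library.

```python
def to_csr(dense):
    """Convert a dense row-major matrix into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # the non-zero entry itself
                col_idx.append(j)     # its column
        row_ptr.append(len(values))   # where the next row starts in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x while touching only the stored non-zeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


dense = [[5, 0, 0],
         [0, 0, 3],
         [0, 2, 0]]
values, col_idx, row_ptr = to_csr(dense)
print(values, col_idx, row_ptr)
print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))
```

Because `values` and `col_idx` are contiguous, a row's non-zeros are read sequentially, which is also what makes the format friendly to the cache behaviour listed on the same slide.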
  • 17. Exascale Computing in HPC
      Image from Robert F. Service. Design for U.S. exascale computer takes shape. Science 09 Feb 2018
  • 18. HPC for AI: HPC technology in Machine Learning Training
  • 19. Characteristic of HPC: Low Latency and High Throughput, not Traffic
      – Enterprise Solution – Mass Traffic Handling
      – HPC – Large Scale Problem Solving
  • 20. Characteristic of HPC: Low Latency and High Throughput, not Traffic
      – Enterprise Solution – Mass Traffic Handling
      – HPC – Large Scale Problem Solving
  • 21. HPC goes on
      For Science and Engineering + For Big Data (~2010) + For Machine Learning (~2016)
  • 22. The Convergence of Big Data, AI and HPC
      Image from Rajesh Chhabra (raj@cray.com). New Era of High Performance Computing (convergence of AI, Big Data, HPC)
  • 23. Hardware Trend in HPC for AI: GPGPU Trend
      – Higher integration (21.1B transistors) raises the absolute amount of computation available
      – Larger cache memory, and faster communication through NVLink
      Table and Image from NVIDIA White paper (http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf)
  • 24. Hardware Trend in HPC for AI: Not only for Latency and Throughput
      – Support for INT data types and hardware-level support for tensor operations
      Image from NVIDIA White paper (http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf)
  • 25. Hardware Trend in HPC for AI: CPU Trend

      Microarchitecture | Process | Tick/Tock
      Nehalem | 45nm | Tock
      Westmere | 32nm | Tick
      Sandy Bridge | 32nm | Tock
      Ivy Bridge | 22nm | Tick
      Haswell | 22nm | Tock
      Broadwell | 14nm | Tick
      Skylake | 14nm | Tock

      Instruction sets, in order of introduction:
      – MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
      – MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, AES, AVX
      – MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, AES, AVX, AVX2, FMA3, TSX
      SIMD width of INT: 128 bit, then 256 bit
      SIMD width of FP: 128 bit, then 256 bit
      Flops (SP): 8, then 16, then 32
      Flops (DP): 4, then 8, then 16
      Interconnect: 2.5 GT/s QPI; 5 GT/s DMI; 8 GT/s DMI
      Images and Data sheets from https://prohardver.hu/teszt/intel_architekturak_nehalemtol_skylake-ig/nyomtatobarat/teljes.html
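The Flops figures on slide 25 follow from SIMD width and the number of floating-point units. The sketch below reproduces the 8/16/32 (SP) and 4/8/16 (DP) progression under an assumed mapping that the flattened slide does not spell out: Nehalem with 128-bit SSE and separate add and multiply units, Sandy Bridge with 256-bit AVX on the same two units, and Haswell with two 256-bit FMA units. `peak_flops_per_cycle` is a hypothetical helper.

```python
def peak_flops_per_cycle(simd_bits, element_bits, units, flops_per_op):
    """Peak flops per cycle per core = lanes x units x flops per operation."""
    lanes = simd_bits // element_bits
    return lanes * units * flops_per_op


# (name, SIMD width in bits, FP units, flops per op): assumed mapping.
generations = [
    ("Nehalem", 128, 2, 1),       # one add unit + one multiply unit
    ("Sandy Bridge", 256, 2, 1),  # AVX doubles the width of both units
    ("Haswell", 256, 2, 2),       # two FMA units, a fused op counts as 2 flops
]

results = {}
for name, bits, units, flops in generations:
    sp = peak_flops_per_cycle(bits, 32, units, flops)  # single precision
    dp = peak_flops_per_cycle(bits, 64, units, flops)  # double precision
    results[name] = (sp, dp)
    print(f"{name}: {sp} SP / {dp} DP flops per cycle")
```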
  • 26. Case Study - Accelerating deep neural network training for action recognition on a cluster of GPUs
      Technology : Distributed multinode computing, Reducing Communication
      How to : Use AAVG to compute the optimal batch size and keep adjusting it during training
      AAVG : Adaptive-batchsize model averaging with sparse communication. An algorithm for distributed training that decides the batch size dynamically at every epoch before training. Each node runs SGD in parallel, and the averaged gradients are then combined to determine the batch size, which improves performance.
      Two-Streams CNNs for Action Recognition
      *Optical flow: consecutive image frames sampled at random. The flow of pixels is used to estimate speed and position in the next frame.
      Accuracy
      • RGB stream = 85.04%
      • Flow stream = 84.5%
      • Two-streams combined = 91.3%
      There are 4 IBM Minsky nodes. Each node has 2 Power8 CPUs with 10 cores and 4 NVIDIA Tesla P100 GPUs. The Nodes communicate using infiniband.
      Guojing Cong; Giacomo Domeniconi; Joshua Shapiro; Fan Zhou; Barry Chen(2018). Accelerating deep neural network training for action recognition on a cluster of GPUs. In HPML 2018.
  • 27. Case Study - Accelerating deep neural network training for action recognition on a cluster of GPUs: Conclusion
      – Base-line single-GPU: Takes 2,067 minutes to train to achieve similar validation accuracy.
      – UCF101 with 16 GPUs : Takes 61 minutes (x33.8) using AAVG with customized adam
      Guojing Cong; Giacomo Domeniconi; Joshua Shapiro; Fan Zhou; Barry Chen(2018). Accelerating deep neural network training for action recognition on a cluster of GPUs. In HPML 2018.
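The x33.8 figure on slide 27 is the ratio of the two training times. A short check, using only the numbers on the slide (variable names are illustrative):

```python
baseline_minutes = 2067   # single-GPU baseline, from the slide
parallel_minutes = 61     # 16 GPUs with AAVG and customized Adam, from the slide
gpus = 16

speedup = baseline_minutes / parallel_minutes
per_gpu_efficiency = speedup / gpus

print(f"speedup: {speedup:.2f}x")
print(f"speedup per GPU: {per_gpu_efficiency:.2f}")
```

The ratio comes out just under 33.9, which the slide truncates to x33.8. The per-GPU value is above 1, so the gain cannot come from the GPU count alone; presumably the algorithmic changes (AAVG, the customized optimizer) account for the rest, though the slide does not break this down.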
  • 28. Case Study - A Case Study on Optimizing Accurate Half Precision Average
      Technology : SIMD, Optimized Algorithm
      How to : Improve how averages are computed in FP16, for both performance and efficiency
      Computing a mean is extremely common in Machine Learning (k-means, mean shift, average pooling, and so on). Vast amounts of data go through this operation hundreds of millions to trillions of times, so improving the accuracy and speed of the basic operation matters a great deal. Using FP16 for memory efficiency, however, introduces many errors.
      ㆍSum then divide : High speed. But overflow easily.
      ㆍIterative average : Fails for large N
      ㆍKahan sum : Still susceptible to overflow and misalignments
      ㆍCascading Average : Pairwise operations, no overflow, ≤ 2N operations
      Kenny Peou; Joel Falcou and Alan Kelly(2018). A Case Study on Optimizing Accurate Half Precision Average. In HPML 2018.
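The overflow problem on slide 28 is easy to reproduce. The sketch below emulates FP16 rounding with Python's `struct` half-precision format and compares "sum then divide" with a pairwise average in the spirit of the cascading approach. It is not the paper's algorithm: `fp16`, `sum_then_divide`, and `pairwise_mean` are hypothetical helpers, and the pairwise version assumes the difference of two partial means stays within FP16 range (true when all inputs share a sign).

```python
import math
import struct


def fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # magnitude beyond the FP16 maximum of 65504
        return math.copysign(math.inf, x)


def sum_then_divide(values):
    """Accumulate in FP16, then divide: fast, but the running sum can overflow."""
    total = 0.0
    for v in values:
        total = fp16(total + fp16(v))
    return fp16(total / len(values))


def pairwise_mean(values):
    """Average the two halves recursively; intermediates stay near the data range."""
    n = len(values)
    if n == 1:
        return fp16(values[0])
    mid = n // 2
    left = pairwise_mean(values[:mid])
    right = pairwise_mean(values[mid:])
    weight = fp16((n - mid) / n)  # share of the right half
    return fp16(left + fp16(weight * fp16(right - left)))


data = [100.0] * 1000  # true mean is 100, true sum is 100000 (> 65504)
print(sum_then_divide(data), pairwise_mean(data))
```

Every intermediate in `pairwise_mean` is itself a mean of inputs, so it never grows with N the way a running sum does.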
  • 29. Case Study - A Case Study on Optimizing Accurate Half Precision Average: Conclusion
      Kenny Peou; Joel Falcou and Alan Kelly(2018). A Case Study on Optimizing Accurate Half Precision Average. In HPML 2018.
  • 30. Case Study - Deep Learning on Large-scale Multicore Clusters
      Technology : Lowering, Data and Model Parallelism
      How to : Parallelize the matrix operations of a CNN at the node level and the CPU-core level
      Lowering is one way to accelerate a CNN (lowering: rewriting each convolutional layer as a matrix multiplication). The work parallelizes lowering and the matrix multiplication so that they run concurrently on a single core, and compares performance for the three cases below.
      1. Simple Convolution
      2. Lowering Convolution
      3. Pipelined Lowering Convolution
      Kazumasa Sakiyama, Shinpei Kato; Yutaka Ishikawa; Atsushi Hori and Abraham Monrroy (2018). Deep Learning on Large-scale Multicore Clusters. In HPML 2018.
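Lowering, as described on slide 30, can be shown on a single-channel toy case. The sketch below compares a direct sliding-window convolution with the lowered form, where every receptive field becomes one row of a matrix that is then multiplied by the flattened kernel. It assumes stride 1 and no padding, and the function names are illustrative; the paper's pipelined variant is not reproduced here.

```python
def conv2d_direct(image, kernel):
    """Simple convolution: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def lower(image, kh, kw):
    """Lowering (im2col): one flattened receptive field per row."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            for i in range(oh) for j in range(ow)]


def conv2d_lowered(image, kernel):
    """Lowered convolution: a single matrix-vector product, then a reshape."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    flat_out = [sum(a * b for a, b in zip(row, flat_kernel))
                for row in lower(image, kh, kw)]
    return [flat_out[k:k + ow] for k in range(0, len(flat_out), ow)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_direct(image, kernel))
print(conv2d_lowered(image, kernel))
```

The lowered matrix duplicates overlapping pixels, so the technique trades extra memory for the ability to reuse a tuned matrix-multiplication routine.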
  • 31. Case Study - Deep Learning on Large-scale Multicore Clusters: Conclusion
      – Pipelined Convolution was about 1.64x faster than Conventional Convolution.
      – The performance of Pipelined Convolution drops for layers with heavy weights.
      – When the CNN was processed in parallel on a multi-core cluster, performance improved linearly with data parallelism.
      Kazumasa Sakiyama, Shinpei Kato; Yutaka Ishikawa; Atsushi Hori and Abraham Monrroy (2018). Deep Learning on Large-scale Multicore Clusters. In HPML 2018.
  • 32. Case Study – Performance comparison between Dist. tensorflow and w/Horovod
      Technology : Optimize communication, Reducing communication overhead
      How to : Compare Dist. tensorflow performance with and without a communication-optimizing framework, and with and without hardware optimization
      Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.
      Images, Description and Sample code from Horovod Github (https://github.com/uber/horovod)

        import tensorflow as tf
        import horovod.tensorflow as hvd

        # Initialize Horovod
        hvd.init()

        # Pin GPU to be used to process local rank (one GPU per process)
        config = tf.ConfigProto()
        config.gpu_options.visible_device_list = str(hvd.local_rank())

        # Build model...
        loss = ...
        opt = tf.train.AdagradOptimizer(0.01 * hvd.size())

        # Add Horovod Distributed Optimizer
        opt = hvd.DistributedOptimizer(opt)

        # Add hook to broadcast variables from rank 0 to all other processes during
        # initialization.
        hooks = [hvd.BroadcastGlobalVariablesHook(0)]

        # Make training operation
        train_op = opt.minimize(loss)

        # Save checkpoints only on worker 0 to prevent other workers from corrupting them.
        checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

        # The MonitoredTrainingSession takes care of session initialization,
        # restoring from a checkpoint, saving to a checkpoint, and closing when done
        # or an error occurs.
        with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                               config=config,
                                               hooks=hooks) as mon_sess:
            while not mon_sess.should_stop():
                # Perform synchronous training.
                mon_sess.run(train_op)
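The communication step that a distributed optimizer such as the one on slide 32 relies on is an allreduce of the gradients. The sketch below simulates a ring allreduce in plain Python to show why it limits communication overhead: each worker only ever talks to its neighbour and sends one chunk per step. It is a conceptual model under simplifying assumptions (in-process lists instead of network transfers, vector length divisible by the worker count) and not Horovod's implementation; `ring_allreduce` is a hypothetical name.

```python
def ring_allreduce(vectors):
    """Return per-worker copies of the element-wise sum of `vectors`."""
    p = len(vectors)
    n = len(vectors[0])
    assert n % p == 0, "sketch assumes the vector splits evenly into p chunks"
    size = n // p
    data = [list(v) for v in vectors]

    def chunk(c):
        return range(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: after p - 1 steps worker r owns the full sum
    # of chunk (r + 1) % p.
    for step in range(p - 1):
        # Snapshot outgoing chunks first so all sends in a step are simultaneous.
        msgs = [((r - step) % p, [data[r][i] for i in chunk((r - step) % p)])
                for r in range(p)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % p
            for i, v in zip(chunk(c), payload):
                data[dst][i] += v

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(p - 1):
        msgs = [((r + 1 - step) % p, [data[r][i] for i in chunk((r + 1 - step) % p)])
                for r in range(p)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % p
            for i, v in zip(chunk(c), payload):
                data[dst][i] = v
    return data


# Three workers, each holding its own gradient vector.
grads = [[1, 2, 3, 4, 5, 6],
         [10, 20, 30, 40, 50, 60],
         [100, 200, 300, 400, 500, 600]]
summed = ring_allreduce(grads)
averaged = [[g / len(grads) for g in worker] for worker in summed]
print(summed[0])
```

Each worker sends 2(p - 1) chunks of size n/p in total, so the per-worker traffic stays close to 2n regardless of how many workers join the ring.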
  • 33. Case Study – Performance comparison between Dist. tensorflow and w/Horovod: GPUDirect RDMA
      Revisits from https://developer.nvidia.com/gpudirect
      Mellanox. Accelerating High Performance Computing with GPUDirect RDMA. GTC 2013
      Image Source from http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-gpudirect-rdma.pdf
  • 34. Case Study – Performance comparison between Dist. tensorflow and w/Horovod: Single Root Complex
      Revisits from https://developer.nvidia.com/gpudirect
      Images from Microway homepage https://www.microway.com/product/octoputer-4u-8-gpu-server-2-5-drives/octoputer-8-gpu-with-dual-root-tesla-v100/ https://www.microway.com/product/octoputer-4u-8-gpu-server-2-5-drives/
  • 35. Case Study – Performance comparison between Dist. tensorflow and w/Horovod: Conclusion
      [ Throughput comparison ] [ Parallel efficiency comparison ]
  • 36. Difference between HPC and AI: apart, yet together
  • 37. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      Bottleneck & Lifecycle: AI Engineer
  • 38. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      Bottleneck & Lifecycle: AI Engineer
  • 39. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      Bottleneck & Lifecycle: Company
      All icon from the noun project (http://thenounprojecct.com) - National Park service, Yazmin Alanis, Bakunetsu Kaito, Gan Khoon Lay, ProSymbols, pxLens, Matt Hawdon, Rose Alice Design
  • 40. Journey of machine learning training
      Collect → Cleansing & Labeling → Model Selection → Training → Evaluation → Parameter Tuning → Prediction
      Bottleneck & Lifecycle: Company, Closed Loop
  • 41. HPC vs AI
      – HPC: Data Modeling & Generation → Calculation → Evaluation → Paper or Applying
      – AI: Data Collecting → Data Cleaning → Reduce Dimensions → Distributing Data → Calculation → Evaluation → Model Deployment
  • 42. HPC vs AI
      – HPC: Data Modeling & Generation → Calculation → Evaluation → Paper or Applying
      – AI: Data Collecting → Data Cleaning → Reduce Dimensions → Distributing Data → Calculation → Evaluation → Model Deployment
      Annotations: Closed Loop; Cluster Management Server; Outside of system
  • 43. HPC Technical Landscape (Recall)
      Layers: Cluster Hardware; System Software; Middleware & Management; HPC Technology; HPC Applications
      Components: Infiniband + Ethernet; SAN + Local Node Storage; GPGPU or Accelerators; Linux OS variant; Parallel Framework; Numerical Libraries; System Tool; Development Language; Hadoop
      Revisited from A. Reed, Daniel & Dongarra, Jack. (2015). Exascale Computing and Big Data. Communications of the ACM. 58. 56-68. 10.1145/2699414.
  • 44. HPC on Cloud Technical Landscape
      Layers: Cluster Hardware; System Software; Middleware & Management; HPC Technology; HPC & AI Applications
      Components: Infiniband + Ethernet; SAN + Local Node Storage; GPGPU or Accelerators; Linux OS variant; Parallel Framework; Numerical Libraries; System Tool; Development Language; ML Framework; Hadoop
      Revisited from A. Reed, Daniel & Dongarra, Jack. (2015). Exascale Computing and Big Data. Communications of the ACM. 58. 56-68. 10.1145/2699414.
      Source) https://towardsdatascience.com/gan-by-example-using-keras-on-tensorflow-backend-1a6d515a60d0
  • 45. ML or AI Platform in Public and Private
      – Copyright to AWS. Image from https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/how-it-works-training.html
      – Copyright to Airbnb. Image from https://www.slideshare.net/databricks/bighead-airbnbs-endtoend-machine-learning-platform-with-krishna-puttaswamy-and-andrew-hoh
      – Copyright to Facebook. Image from https://www.matroid.com/scaledml/2018/yangqing.pdf
      – + Others…
  • 46. Wrap up
      AI Saga; HPC and AI; HPC for AI Development
      ⓒ Kamran kowsari@Wikimedia Commons (Application, Disaster recovery, HPC, Medical, Transportation)
  • 47. Thank you