ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
National Institute of Advanced Industrial Science and Technology (AIST)
Artificial Intelligence Research Center
Hitoshi Sato
About AIST
ITRI
Information Technology
Research Institute
Artificial Intelligence
Research Center
Real World Big-Data
Computing – Open
Innovation Laboratory
One of the public research institutes in Japan under METI*, focused on
“bridging” the gap between innovative technological seeds and commercialization.
*Ministry of Economy, Trade and Industry
@Odaiba, Tokyo Joint Lab w/ Tokyo Tech
HQ @Tsukuba
Current Status of AI R&D in Japan:
AI requires the convergence of Algorithm, Big Data, and
Computing Infrastructure
Big Data
Algorithm
Computing Infrastructure
Deep Learning, Machine Learning, etc.
Cutting edge infrastructures,
Clouds, Supercomputers, etc.
We lack cutting-edge infrastructure
dedicated to AI and Big Data that is open to public use
Big Companies AI/BD
R&D (also Science)
AI Venture Startups
AI/BD Centers & Labs in National
Labs & Universities
• Dense Problem
• Compute-intensive
• Dense Matrix Op., Multiply-Add
• Reduced Precision
• Regular data/memory accesses
• Sparse Problem
– Graph Data Analytics, Symbol Processing
• Data-intensive
• Sparse Matrix Op., Integer
• Big Data Kernels (Sort, PrefixSum, etc.)
• Irregular data/memory accesses
Different Workload Characteristics, but Similar to HPC
AI requires HPC
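To make the contrast above concrete, here is a minimal Python sketch (not from the slides; problem sizes are illustrative) of the two kernel classes: a dense, compute-intensive matrix multiply with regular accesses, and a sparse, data-intensive matrix-vector multiply with irregular accesses.

```python
# Minimal sketch of the two workload classes (illustrative sizes).
# Assumes NumPy and SciPy are available.
import numpy as np
import scipy.sparse as sp

n = 4096

# Dense problem: compute-intensive multiply-add, reduced precision,
# regular data/memory accesses.
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
c = a @ b

# Sparse problem: data-intensive, integer indices, irregular memory
# accesses (graph-like matrix with ~0.1% nonzeros).
g = sp.random(n, n, density=1e-3, format="csr")
x = np.random.rand(n)
y = g @ x
```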
Example of Real AI/DL Apps Usage
• lang2program
– https://github.com/kelvinguu/lang2program
– Implementation for ACL 2017
(http://acl2017.org/ ) paper
– Provided as Dockerfile
• Including TensorFlow, CUDA, CuDNN,
PostgreSQL, Python pip packages, etc.
• Various software dependencies for installation
Can traditional shared HPC systems support
such DL Apps?
• Docker is not allowed for security reasons
(rootless execution is required)
• Manual installation takes elaborate effort
[Figure: Big Data / Cloud software stack vs. HPC software stack (original labels partly garbled). Legible Cloud-stack labels: User Applications; MapReduce Frameworks (Spark/Hadoop); RDB (PostgreSQL); ML Libraries (MLlib/Mahout); Graph Libraries (GraphX/Giraph); SQL (Hive/Pig); CloudDB/NoSQL (HBase/Cassandra/MongoDB); Coordination Engine (ZooKeeper); Java/Scala/IDL; Distributed Filesystems (HDFS); VMs/Linux Containers; Linux OS; Ethernet; Local Node Storage; x86 CPU. The HPC-stack labels are not recoverable.]
• Cloud jobs are often interactive, w/ resource control via REST APIs
• HPC jobs are batch-oriented, w/ resource control by MPI
• Cloud employs high-productivity languages, but performance is neglected; the focus is on
data analytics and frequent dynamic changes
• HPC employs high-performance languages but requires ninja programmers, so productivity
is low; kernels & compilers are well tuned, results are shared by many programs,
and code is rarely rewritten
• Cloud HW is based on “commodity” x86 web servers, w/ distributed storage on the
nodes assuming REST API access
• HPC HW aggressively adopts new technologies such as GPUs, focusing on ultimate
performance at higher cost, w/ shared storage to support legacy apps
Various convergence research efforts are underway, but no realistic converged
SW stack exists yet
Software ecosystems: Cloud vs. HPC
• Cloud is focused on databases and data-manipulation workflows
• HPC is focused on compute kernels, even for data processing; jobs scale to
thousands of tasks, which complicates debugging and performance tuning
• Cloud requires purpose-specific computing/data environments as well as their mutual
isolation & security
• HPC requires environments for fast & lean use of resources, which on modern machines
require considerable system software support
Characteristics of Deep Learning I/O Workloads
• Small read I/O to huge files
– Native media files such as JPEG, WAV, text, etc.
– Painful I/O patterns for shared parallel file
systems
– Increasing metadata operations
• Randomness is essential
– Runtime Data Augmentation for
Training Accuracy
• Random Crop
• Random Horizontal Flip
• Normalization, etc.
Training Dataset: ImageNet-1K
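As a concrete illustration, the sketch below implements the augmentation steps listed above with NumPy; the 224x224 crop size and the scalar mean/std are assumptions, since the slide gives no parameters.

```python
# Hedged sketch of runtime data augmentation for training:
# random crop, random horizontal flip, normalization.
import numpy as np

def augment(img, crop=224, mean=0.5, std=0.25):  # crop/mean/std assumed
    """img: HxWxC float array with values in [0, 1]."""
    h, w, _ = img.shape
    top = np.random.randint(0, h - crop + 1)     # random crop origin
    left = np.random.randint(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop, :]
    if np.random.rand() < 0.5:                   # random horizontal flip
        out = out[:, ::-1, :]
    return (out - mean) / std                    # normalization
```

Because each epoch re-reads and re-randomizes every sample, the file system sees many small, randomly ordered reads rather than large sequential ones.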
• Expected Power Limit of Real HPC Systems
– Multi MW for systems
– 60-70 kW per rack
• General-purpose Data Centers
– Multi kW per rack
– Lack of power density for HPC
→ How to introduce supercomputer cooling technologies
to data centers
✓ Power efficiency, thermal density, maintainability, etc.
✓ Inexpensive build
Lineage of cooling technologies:
• TSUBAME2 (water cooling): #2 Green500, Nov 2010
• TSUBAME-KFC (liquid immersion cooling): #1 Green500, Nov 2013
• AIST AI Cloud (air cooling): #3 Green500, June 2017
• Accelerating AI/Big Data R&D
– Dedicated HPC Infra converged w/ AI/Big Data
• Extreme Computing Power
• Ultra-high bandwidth and low latency in memory, network, and storage
• Extreme Green
• Bridging AI Tech through Common Platform
– For AI R&D, Apps, Services, and Infrastructure Designs
– Aiming fast tech transfer through industry and academia collaboration
– De facto Standard Software
• Cloud-Datacenter-ready Infrastructure, but Supercomputing
– Supercomputer Cooling Technologies to Datacenter
– Inexpensive Facilities, etc.
AI Bridging Cloud Infrastructure
as World’s First Large-scale Open AI Infrastructure
• Open, Public, and Dedicated infrastructure for AI/Big Data
• Platform to accelerate joint academic-industry R&D for AI in Japan
• Top-level compute capability w/ 0.550 EFlops (AI), 37.2 PFlops (DP)
Univ. Tokyo Kashiwa II Campus
Operation Scheduled in 2018
• 1088x compute nodes w/ 4352x NVIDIA Tesla V100 GPUs, 476 TiB of memory,
1.6 PB of NVMe SSDs, 22 PB of HDD-based storage, and an Infiniband EDR network
• Ultra-dense IDC design from the ground up w/ 20x the thermal density of a standard IDC
• Extreme green w/ ambient warm liquid cooling, high-efficiency power supplies, etc.,
commoditizing supercomputer cooling technologies to clouds (2.3 MW, 70 kW/rack)
Gateway and
Firewall
Computing Nodes: 0.550 EFlops(HP), 37 PFlops(DP)
476 TiB Mem, 1.6 PB NVMe SSD
Storage: 22 PB GPFS
High Performance Computing Nodes (w/GPU) x1088
• Intel Xeon Gold6148 (2.4GHz/20cores) x2
• NVIDIA Tesla V100 (SXM2) x 4
• 384GiB Memory, 1.6TB NVMe SSD
Multi-platform Nodes (w/o GPU) x10
• Intel Xeon Gold6132 (2.6GHz/14cores) x2
• 768GiB Memory, 3.8TB NVMe SSD
Interactive Nodes
DDN SFA14K
(w/ SS8462 Enclosure x 10) x 3
• 12TB 7.2Krpm NL-SAS HDD x 2400
• 3.84TB SAS SSD x 216
• NSD Servers x 12
Object Storage for Protocol Nodes
100GbE
Service Network (10GbE)
External
Networks
SINET5
Interconnect (Infiniband EDR)
ABCI: AI Bridging Cloud Infrastructure
0.550 EFlops (AI), 37.2 PFlops (DP); 19.88 PFlops (Peak), Ranked #5 Top500 June 2018
System hierarchy:
• Chips (GPU, CPU)
– GPU (NVIDIA Tesla V100, 16GB SXM2): 7.8 TFlops (DP), 125 TFlops (AI)
– CPU (Intel Xeon Gold 6148, 27.5M Cache, 2.40 GHz, 20 cores): 1.53 TFlops (DP), 3.07 TFlops (AI)
• Compute Node (4 GPUs, 2 CPUs): 34.2 TFlops (DP), 506 TFlops (AI); 384 GiB MEM; 3.72 TB/s MEM BW; 200 Gbps NW BW; 1.6 TB NVMe SSD
• Node Chassis (2 Compute Nodes): 68.5 TFlops (DP), 1.01 PFlops (AI)
• Rack (17 Chassis): 1.16 PFlops (DP), 17.2 PFlops (AI); 131 TB/s MEM BW; full bisection BW within rack; 70 kW max
• System (32 Racks): 37.2 PFlops (DP), 0.550 EFlops (AI); 1088 compute nodes; 4352 GPUs; 4.19 PB/s MEM BW; 1/3 oversubscription BW; 2.3 MW
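The per-level numbers roll up consistently; the short check below (assumed arithmetic, not an official spec sheet) reproduces the AI-flops figures from the chip level.

```python
# Roll-up of peak AI flops from chips to the full system.
gpu_ai = 125.0   # TFlops (AI) per NVIDIA Tesla V100
cpu_ai = 3.07    # TFlops (AI) per Intel Xeon Gold 6148

node_ai = 4 * gpu_ai + 2 * cpu_ai   # ~506 TFlops per compute node
chassis_ai = 2 * node_ai            # ~1.01 PFlops per node chassis
rack_ai = 17 * chassis_ai           # ~17.2 PFlops per rack
system_ai = 32 * rack_ai            # ~0.550 EFlops for 32 racks

print(f"{node_ai:.0f} TF, {chassis_ai/1e3:.2f} PF, "
      f"{rack_ai/1e3:.1f} PF, {system_ai/1e6:.3f} EF")
# -> 506 TF, 1.01 PF, 17.2 PF, 0.551 EF
```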
GPU Compute Nodes
• NVIDIA TESLA V100
(16GB, SXM2) x 4
• Intel Xeon Gold 6148
x 2 Sockets
– 20 cores per Socket
• 384GiB of DDR4 Memory
• 1.6TB NVMe SSD x 1
– Intel DC P4600 u.2
• EDR Infiniband HCA x 2
– Connected to other compute nodes
and filesystems
[Node block diagram: two Xeon Gold 6148 sockets linked by UPI x3 (10.4 GT/s); DDR4-2666 32GB x 6 per socket (128 GB/s per socket); one IB HCA (100 Gbps) per socket; NVMe SSD; PCIe gen3 x16 links through x48/x64 PCIe switches to four Tesla V100 SXM2 GPUs interconnected with NVLink2 x2.]
Rack as Dense-packaged “Pod”
Pod #1
LEAF#1
(SB7890)
LEAF#2
(SB7890)
LEAF#3
(SB7890)
LEAF#4
(SB7890)
SPINE#1
(CS7500)
SPINE#2
(CS7500)
CX400#1: CX2570#1, CX2570#2
CX400#2: CX2570#3, CX2570#4
CX400#3: CX2570#5, CX2570#6
…
CX400#17: CX2570#33, CX2570#34
FBB#1
(SB7890)
FBB#2
(SB7890)
FBB#3
(SB7890)
1/3 Oversubscription BW
IB-EDR x 24
Full bisection BW
IB-EDR x 72
InfiniBand EDR x1
InfiniBand EDR x6
InfiniBand EDR x4
x 32 pods
Hierarchical Storage Tiers
• Local Storage
– 1.6 TB NVMe SSD (Intel DC P4600 u.2) per Node
– Local Storage Aggregation w/ BeeOND
• Parallel Filesystem
– 22PB of GPFS
• DDN SFA14K ( w/ SS8462 Enclosure x 10) x 3 set
• Bare Metal NSD servers and Flash-based Metadata
Volumes for metadata operation acceleration
– Home and Shared Use
• Object Storage
– Part of GPFS using OpenStack Swift
– S3-like API Access, Global Shared Use
– Additional Secure Volumes w/ Encryption
(Planned)
Parallel Filesystem
Local Storage
as Burst Buffers
Object Storage as Campaign Storage
• Environments
– ABCI 64 nodes (256 GPUs)
– Framework: ChainerMN v1.3.0
• Chainer 4.2.0, CuPy 4.2.3, mpi4py 3.0.0, Python 3.6.5
– Baremetal
• CentOS 7.4, gcc-4.8.5,
CUDA 9.2, CuDNN 7.1.4, NCCL 2.2, OpenMPI 2.1.3
• Settings
– Dataset: ImageNet-1K
– Model: ResNet-50
– Training:
• Batch size: 32 per GPU, 32 x 256 in total
• Learning Rate: Starting 0.1 and x0.1 at 30, 60, 80 epoch
w/ warm up scheduling
• Optimization: Momentum SGD (momentum=0.9)
• Weight Decay: 0.0001
• Training Epoch: 100
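The stepped learning-rate schedule above is easy to state in code; this is a hedged sketch in which the 5-epoch linear warm-up length is an assumption (the slide only says "w/ warm up scheduling").

```python
# Learning-rate schedule: base 0.1, x0.1 at epochs 30/60/80,
# with an assumed 5-epoch linear warm-up.
def learning_rate(epoch, base_lr=0.1, warmup_epochs=5):
    if epoch < warmup_epochs:                  # linear warm-up
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for boundary in (30, 60, 80):              # x0.1 decay steps
        if epoch >= boundary:
            lr *= 0.1
    return lr

print([round(learning_rate(e), 5) for e in (0, 4, 30, 60, 80)])
# -> [0.02, 0.1, 0.01, 0.001, 0.0001]
```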
[Floor plan (19m x 24m x 6m): compute racks #1-#32, InfiniBand switch racks IB1/IB2, storage racks STRG33-STRG38, MPF racks, and others. Caption (translated): rack layout plan with the IB switch racks placed in the UPS zone.]
High Voltage Transformers (3.25 MW)
Passive Cooling Tower (free cooling), Cooling Capacity: 3 MW
Lithium Battery (planned), 1 MWh, 1 MVA
Active Chillers, Cooling Capacity: 200 kW
Future Expansion Space
UPS
72 Racks w/ Hybrid Cooling
18 Racks w/ UPS
Server Room Cooling
• Skeleton frame structure for
computer racks, water circuit,
air-cooling units, and cables
• 70 kW/rack cooling capacity:
60 kW liquid (32°C water) + 10 kW air
[Rack cooling diagram: 19- or 23-inch 48U racks with computing servers; water blocks on CPUs and/or accelerators on a hot water circuit (32°C supply, 40°C return) via a CDU; fan coil units (cooling capability 10 kW) on a cold water circuit handle the remaining heat in air; cold aisle 35°C, hot aisle 40°C, separated by capping walls (hot aisle capping).]
Software Stack for ABCI
• Batch Job Scheduler
– High throughput computing
• Minimum Pre-installed Software
– Users can deploy their environments
using anaconda, pip, python venv, etc.
– Reduce operational cost
• Container Support
– Singularity for multi-node jobs w/
user customized images
– Docker for single-node jobs w/
site-certified images
[Software stack diagram, top to bottom: User Applications; DL Frameworks; Python, Ruby, R, Java, Scala, Perl, Lua, etc.; Hadoop/Spark, OSS, ISV Apps; GCC, PGI, Intel Parallel Studio XE Cluster Edition; OpenMPI, MVAPICH2; CUDA/CuDNN/NCCL; GPFS, BeeOND, OpenStack Swift; Univa Grid Engine; Singularity, Docker; CentOS/RHEL.]
• Singularity
– HPC Linux Container developed at LBNL and Sylabs Inc.
• http://singularity.lbl.gov/
• https://github.com/singularityware/singularity
– Rootless program execution, storage access
• No daemons w/ root privilege, unlike Docker
– Computing resources are managed by job schedulers such as UGE
• Root privilege is given by commands w/ setuid
– Mount, Linux namespace creation, Bind paths between host and container
– Existing Docker images available via DockerHub
– HPC software stacks available
• CUDA, MPI, Infiniband, etc.
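As a usage sketch (not from the slides), the snippet below drives Singularity from Python to pull a Docker image and run a GPU job rootlessly. The image name is the aistairc/aaic prototype mentioned on a later slide; the output filename and the smoke-test command are assumptions.

```python
# Hedged example: pull a DockerHub image and run it with Singularity.
import subprocess

# Convert a Docker image into a local Singularity image
# (Singularity 2.x writes e.g. aaic.simg; 3.x writes a .sif file).
subprocess.run(["singularity", "pull", "docker://aistairc/aaic"],
               check=True)

# Run a command inside the container without root; --nv exposes the
# host NVIDIA driver/GPU to the container.
subprocess.run(["singularity", "exec", "--nv", "aaic.simg",
                "python", "-c", "import chainer"],  # assumed smoke test
               check=True)
```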
How to Create Container Images for Distributed DL
• Host
– Mount native device drivers and
system libraries to containers
• GPU, Infiniband, etc.
• cf. nvidia-docker
• Containers
– Install user space libraries
• CUDA, CuDNN, NCCL2, ibverbs, MPI, etc.
– Install distributed deep learning
frameworks
• Optimized build for underlying computing
resources
[Diagram: base drivers and libraries on the host (CUDA drivers, Infiniband drivers, filesystem libraries for GPFS/Lustre) are bind-mounted into the container; userland libraries inside the container (CUDA, CuDNN, NCCL2, MPI with mpi4py, ibverbs) support the distributed deep learning frameworks (Caffe2, ChainerMN, Distributed TensorFlow, MXNet).]
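Putting the pieces together, a multi-node job can let the host-side MPI launch one container instance per rank while the userland MPI/NCCL inside the image handles communication. A hedged sketch follows; the image name, script, and scale are illustrative, not ABCI-specific.

```python
# Launch containerized distributed training: mpirun starts one
# Singularity process per rank; --nv mounts the host GPU driver.
import subprocess

nodes, gpus_per_node = 64, 4                   # illustrative scale
subprocess.run(
    ["mpirun",
     "-np", str(nodes * gpus_per_node),        # one rank per GPU
     "-npernode", str(gpus_per_node),
     "singularity", "exec", "--nv",
     "chainermn.simg",                         # hypothetical image
     "python", "train_imagenet.py"],           # hypothetical script
    check=True,
)
```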
AICloudModules: Container Image Collection for AI
• Base Images
– Optimized OS Images for
AI Cloud
• Including CUDA, CuDNN, NCCL,
MPI, etc. libraries
• Excluding CUDA, Infiniband, GPFS,
etc. native drivers on the host
• DL Images
– Installed Deep Learning
Frameworks and AI Applications
based on Base OS Images
[Diagram: Base Images = base OS images (CentOS, Ubuntu) incl. CUDA, CuDNN, NCCL2, MPI, Infiniband; DL Images = images for deep learning frameworks (Caffe2, ChainerMN, Distributed TensorFlow, MXNet, CNTK) built on the base (HPCBase); both rely on the base drivers and libraries on the host (GPU, Infiniband, GPFS, MPI).]
Published at https://hub.docker.com/r/aistairc/aaic/ as a prototype
• Accommodate open ecosystems of data and software for AI R&D
– Sharable/Reusable/Reproducible algorithm methods,
training data w/ labels, trained models, etc.
• Creating various social applications that use AI (ML, DL, etc.)
[Workflow diagram: public data on the Internet plus open data in enterprise and government → data collection and cleansing → training dataset → modeling and learning with a deep learning framework → trained model → inference.]
Publish trained models to the public
to apply them in various domains
Prepare Training Datasets
Trial and error for designing
networks, parameters
Accelerate learning with HPC
to improve accuracy of models
Reuse
Trained
Model
Implementing AI technologies in society
Retail Stores
Factories
Logistics Centers
Object Detection and
Tracking
Summary
• Issues for Public AI Infrastructures
– Accelerating AI R&D
– Bridging AI Tech through Common Platform
– Cloud-Datacenter-ready Infrastructure, but Supercomputing
• ABCI (AI Bridging Cloud Infrastructure)
– 0.550 EFlops (AI), 37.2 PFlops (DP),
19.88 PFlops (Peak); #5 Top500 June 2018
– Open, Public, Dedicated Infrastructure for AI/Big Data
– A Reference for AI/Big Data and HPC System