SlideShare a Scribd company logo
1 of 15
Download to read offline
On the benchmark of Chainer
2016年7⽉2⽇
Chainer Meetup #3@Dowango Seminar Room
Preferred Networks Inc.
Kenta Oono
oono@preferred.jp
Self Introduction
• Kenta Oono (twitter: @delta2323_)
– Bio. : MSc@MathSci Univ. Tokyo → 2012.4 PFI → 2014.10 PFN
– Role: BioHealthcare project, Chainer dev. team, etc.
– blog: http://delta2323.github.io
• Recent activity
– Study meetup (NIPS2014, ICML2015, NIPS2015)
– Several articles and talks on Deep Learning
7⽉21⽇ ICML2016読み会
@ドワンゴセミナールーム
What is Benchmark?
• Metrics that evaluate the performance of frameworks
– elapsed time, memory consumption, easiness of use etc.
• Related to, but different from profiling
– Profiling needs finer information of frameworks, possibly at the cost of
performance
– Benchmarking measures the overall behavior of frameworks
• For framework developers:
– provides suggestion for further enhancement of the framework
– provides objective comparison with other frameworks
• For framework users:
– provides better choice of frameworks that satisfies their needs
Example: convnet-benchmarks
• Author: Soumith Chintala(Facebook AI Research)
• Measures latencies of convolutional neural networks
• Provides objective comparison across various frameworks
• Metric
– Elapsed time of forward and backward propagation
• Architecture
– AlexNet-OWT / Overfeat / VGG-A / GoogleNet
– Single 2D convolution layer of various sizes
• Frameworks
– Torch, neon, TenforFlow, fbfft (Torch), Chainer, cudaconvnet2, Caffe,
CL-nn, Caffe-CL GreenTea etc.
convnet-benchmark
Basics of measurement of kernel execution
• We cannot measure GPU execution
time as CPU because launch of
kernels is asynchronous !
clock_t start, end;
start = clock();
// launch kernel
end = clock();
elapsed_time = end - start;
CPU GPU
kernel
exec.
clock
kick
clock kernel
exec.
Basics of measurement of kernel execution
• We can measure the kernel execution time
by inserting two events at the start and end
of the launch.
float elapsed=0;
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
// launch the kernel
cudaEventRecord(stop, 0);
cudaEventSynchronize (stop);
cudaEventElapsedTime(
&elapsed, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
CPU GPU
record
kick
record
kernel
exec.
kernel
exec.
Event
Eventsync
Measurement of single Chainer Function Execution
• Suppose GPU impl. of F.f consists of
Python parts and single GPU kernel.
• Elapsed time calculates kernel
execution in this case.
CPU GPU
Python
kick
record
kernel
exec.
kernel
exec.
Event
Eventsync
Python
record
start = cupy.cuda.Event()
end = cupy.cuda.Event()
start.record()
y = F.f(x) # forward prop
end.record()
end.synchronize()
cupy.cuda.get_elapsed_time(
start, end)
F.f
Measurement of single Chainer Function Execution
• Suppose
– no other kernels are waiting in the
queue
– Python overhead is large
– the kernel is light
• get_elapsed_time equals to whole
execution time including Python code.
CPU GPU
Python
kick
record
kernel
exec.
Event
Event
sync
Python
record
start = cupy.cuda.Event()
end = cupy.cuda.Event()
start.record()
y = F.f(x) # forward prop
end.record()
end.synchronize()
cupy.cuda.get_elapsed_time(
start, end)
Measurement of single Chainer Function Execution
• In general, the elapsed time between two
events are different from what we measured
in the two previous situations.
• What we really measure depends on
– the status of the waiting queue
– the amounts of Python code and kernel
CPU GPU
Python
kick
record
kernel
exec.
Event
Event
sync
Python
record
kernel
exec.
start = cupy.cuda.Event()
end = cupy.cuda.Event()
start.record()
y = F.f(x) # forward prop
end.record()
end.synchronize()
cupy.cuda.get_elapsed_time(
start, end)
Synchronization before start Event
• It ensures the start Event point is
right before the execution of
Python code.
• But the timing of end Event is still
undetermined.
start = cupy.cuda.Event()
end = cupy.cuda.Event()
start.record()
start.synchronize()
y = F.f(x) # forward prop
end.record()
end.synchronize()
cupy.cuda.get_elapsed_time(
start, end)
CPU GPU
kick
record
kernel
exec.
Event
kernel
exec.
sync
Python
・・・
・・・
Measurement of multi-layered NNs
• Should we insert synchronization points
before all function executions?
• But it exposes Python code that should
have been hidden if it were not for the
synchronization.
• I guess this is the reason why convnet-
benchmarks offers the architectures that
consist of single convolution layer.
CPU GPU
kick
record
kernel
exe-
cution
Event
kernel
exe-
cution
sync
Python
record
Python
kick
record
sync
Python
record
Python
Event
Event
kernel
exe-
cution
It would be hidden by the
kernel execution if we did
not measure elapsed times.
Tentative solution (Timer class: PR #1249)
• Offers start and stop methods for measuring lap times.
• Three patterns for synchronization before measurement by
blocking_methodargument
– block_every_time: synchronizes every start events
– block_first_time: synchronizes only first start event
– non_block: does not synchronize at the start of measurement
• When we get the total time, Timer class implicitly call synchronize
method.
• synchronize method synchronizes all Events inserted by start and
stop and calculates lap times lazily.
• Once synchronize is invoked, the timer CANNOT accumulate lap times
until it is reset.
DeepMark aurthored by Soumith Chintala
• Comparison with convnet-benchmarks
– Not only image recognition but also various use cases
– Relatively newer architectures are employed
– Multi-GPU evaluation will be supported (planned)
• Many details of specifications are under discussion.
• Architectures (planned)
– Images : InceptionV3-batchnorm / Alexnet-OWT / VGG-D / ResNet-50
– Video: C3D
– Audio: DeepSpeech2 / MSR's 5 layer FC
– Text: Small RNN LSTM / Large RNN LSTM
• Chainer support (delta2323/chainer-deepmark)
– Not all features are supported (see issue for details)
Conclusion
• Measurement of elapsed time of multi-layered NNs have many things to be
considered.
• We will participate in DeepMark, a general-purpose deep learning
benchmarks.
• Many criteria are to be measured:
– Elapsed time <- Today’s topic
– Memory consumption
– etc…
We are hiring!

More Related Content

What's hot

Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
 
From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R Kai Lichtenberg
 
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon Garcia
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon GarciaOpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon Garcia
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon GarciaOpenNebula Project
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & ProfilingIsuru Perera
 
FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed ObjectsFCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed ObjectsKusano Hitoshi
 
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...Continuent
 
Galaxy CloudMan performance on AWS
Galaxy CloudMan performance on AWSGalaxy CloudMan performance on AWS
Galaxy CloudMan performance on AWSEnis Afgan
 
Kqueue : Generic Event notification
Kqueue : Generic Event notificationKqueue : Generic Event notification
Kqueue : Generic Event notificationMahendra M
 
Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Niketan Pansare
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsBrendan Gregg
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science Domino Data Lab
 
From swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceFrom swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceSpyros Trigazis
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 

What's hot (20)

Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R
 
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon Garcia
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon GarciaOpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon Garcia
OpenNebulaConf2015 1.09.02 Installgems Add-on - Alvaro Simon Garcia
 
Docker meetup
Docker meetupDocker meetup
Docker meetup
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
 
FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed ObjectsFCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
 
Java 8 고급 (6/6)
Java 8 고급 (6/6)Java 8 고급 (6/6)
Java 8 고급 (6/6)
 
Numba Overview
Numba OverviewNumba Overview
Numba Overview
 
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...
 
Galaxy CloudMan performance on AWS
Galaxy CloudMan performance on AWSGalaxy CloudMan performance on AWS
Galaxy CloudMan performance on AWS
 
Kqueue : Generic Event notification
Kqueue : Generic Event notificationKqueue : Generic Event notification
Kqueue : Generic Event notification
 
Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cache
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance Results
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
Js on-microcontrollers
Js on-microcontrollersJs on-microcontrollers
Js on-microcontrollers
 
ChainerX and How to Take Part
ChainerX and How to Take PartChainerX and How to Take Part
ChainerX and How to Take Part
 
From swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceFrom swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container service
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 

Viewers also liked

深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02Yuta Kashino
 
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA Japan
 
Chainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたChainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたsamacoba1983
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例Yahoo!デベロッパーネットワーク
 
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstationYusuke HIDESHIMA
 
Chainer, Cupy入門
Chainer, Cupy入門Chainer, Cupy入門
Chainer, Cupy入門Yuya Unno
 
マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例nlab_utokyo
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp TalkYusuke Oda
 
Chainer meetup
Chainer meetupChainer meetup
Chainer meetupkikusu
 
Towards Chainer v1.5
Towards Chainer v1.5Towards Chainer v1.5
Towards Chainer v1.5Seiya Tokui
 
Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Jun-ya Norimatsu
 
Chainer meetup20151014
Chainer meetup20151014Chainer meetup20151014
Chainer meetup20151014Jiro Nishitoba
 
深層学習ライブラリのプログラミングモデル
深層学習ライブラリのプログラミングモデル深層学習ライブラリのプログラミングモデル
深層学習ライブラリのプログラミングモデルYuta Kashino
 
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例Jun-ya Norimatsu
 
Chainer入門と最近の機能
Chainer入門と最近の機能Chainer入門と最近の機能
Chainer入門と最近の機能Yuya Unno
 
Lighting talk chainer hands on
Lighting talk chainer hands onLighting talk chainer hands on
Lighting talk chainer hands onOgushi Masaya
 
Chainer Contribution Guide
Chainer Contribution GuideChainer Contribution Guide
Chainer Contribution GuideKenta Oono
 
ボケるRNNを学習したい (Chainer meetup 01)
ボケるRNNを学習したい (Chainer meetup 01)ボケるRNNを学習したい (Chainer meetup 01)
ボケるRNNを学習したい (Chainer meetup 01)Motoki Sato
 

Viewers also liked (20)

深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
 
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
NVIDIA 更新情報: Tesla P100 PCIe/cuDNN 5.1
 
Chainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみたChainerを使って細胞を数えてみた
Chainerを使って細胞を数えてみた
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
 
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
俺のtensorが全然flowしないのでみんなchainer使おう by DEEPstation
 
Chainer, Cupy入門
Chainer, Cupy入門Chainer, Cupy入門
Chainer, Cupy入門
 
マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例マシンパーセプション研究におけるChainer活用事例
マシンパーセプション研究におけるChainer活用事例
 
Deep parking
Deep parkingDeep parking
Deep parking
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp Talk
 
Chainer meetup
Chainer meetupChainer meetup
Chainer meetup
 
Towards Chainer v1.5
Towards Chainer v1.5Towards Chainer v1.5
Towards Chainer v1.5
 
LT@Chainer Meetup
LT@Chainer MeetupLT@Chainer Meetup
LT@Chainer Meetup
 
Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)
 
Chainer meetup20151014
Chainer meetup20151014Chainer meetup20151014
Chainer meetup20151014
 
深層学習ライブラリのプログラミングモデル
深層学習ライブラリのプログラミングモデル深層学習ライブラリのプログラミングモデル
深層学習ライブラリのプログラミングモデル
 
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例
Capitalicoでのchainer 1.1 → 1.5 バージョンアップ事例
 
Chainer入門と最近の機能
Chainer入門と最近の機能Chainer入門と最近の機能
Chainer入門と最近の機能
 
Lighting talk chainer hands on
Lighting talk chainer hands onLighting talk chainer hands on
Lighting talk chainer hands on
 
Chainer Contribution Guide
Chainer Contribution GuideChainer Contribution Guide
Chainer Contribution Guide
 
ボケるRNNを学習したい (Chainer meetup 01)
ボケるRNNを学習したい (Chainer meetup 01)ボケるRNNを学習したい (Chainer meetup 01)
ボケるRNNを学習したい (Chainer meetup 01)
 

Similar to On the benchmark of Chainer

Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdfRajMantry
 
Nt1330 Final Paper
Nt1330 Final PaperNt1330 Final Paper
Nt1330 Final PaperTraci Webb
 
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerFaster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerPyData
 
CPU scheduling in Operating System Explanation
CPU scheduling in Operating System ExplanationCPU scheduling in Operating System Explanation
CPU scheduling in Operating System ExplanationAnitaSofiaKeyser
 
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Adam Dunkels
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...CODE BLUE
 
Renesas DevCon 2010: Starting a QT Application with Minimal Boot
Renesas DevCon 2010: Starting a QT Application with Minimal BootRenesas DevCon 2010: Starting a QT Application with Minimal Boot
Renesas DevCon 2010: Starting a QT Application with Minimal Bootandrewmurraympc
 
Module 3-cpu-scheduling
Module 3-cpu-schedulingModule 3-cpu-scheduling
Module 3-cpu-schedulingHesham Elmasry
 
A Project Run@Timer, J2SE,
A Project Run@Timer, J2SE,A Project Run@Timer, J2SE,
A Project Run@Timer, J2SE,Swapnil Dubey
 
Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrixRehman baig
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and pythonChetan Giridhar
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLPeace Lee
 
Measurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetMeasurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetVasyl Senko
 
Process scheduling
Process schedulingProcess scheduling
Process schedulingHao-Ran Liu
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 

Similar to On the benchmark of Chainer (20)

Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdf
 
AMC Minor Technical Issues
AMC Minor Technical IssuesAMC Minor Technical Issues
AMC Minor Technical Issues
 
Nt1330 Final Paper
Nt1330 Final PaperNt1330 Final Paper
Nt1330 Final Paper
 
Section05 scheduling
Section05 schedulingSection05 scheduling
Section05 scheduling
 
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerFaster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
 
exp 3.docx
exp 3.docxexp 3.docx
exp 3.docx
 
CPU Scheduling
CPU SchedulingCPU Scheduling
CPU Scheduling
 
CPU scheduling in Operating System Explanation
CPU scheduling in Operating System ExplanationCPU scheduling in Operating System Explanation
CPU scheduling in Operating System Explanation
 
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
 
Renesas DevCon 2010: Starting a QT Application with Minimal Boot
Renesas DevCon 2010: Starting a QT Application with Minimal BootRenesas DevCon 2010: Starting a QT Application with Minimal Boot
Renesas DevCon 2010: Starting a QT Application with Minimal Boot
 
Module 3-cpu-scheduling
Module 3-cpu-schedulingModule 3-cpu-scheduling
Module 3-cpu-scheduling
 
Os2
Os2Os2
Os2
 
A Project Run@Timer, J2SE,
A Project Run@Timer, J2SE,A Project Run@Timer, J2SE,
A Project Run@Timer, J2SE,
 
Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrix
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and python
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGL
 
Measurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetMeasurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNet
 
Process scheduling
Process schedulingProcess scheduling
Process scheduling
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 

More from Kenta Oono

Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...
Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...
Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...Kenta Oono
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryKenta Oono
 
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Kenta Oono
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Kenta Oono
 
深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介Kenta Oono
 
20170422 数学カフェ Part2
20170422 数学カフェ Part220170422 数学カフェ Part2
20170422 数学カフェ Part2Kenta Oono
 
20170422 数学カフェ Part1
20170422 数学カフェ Part120170422 数学カフェ Part1
20170422 数学カフェ Part1Kenta Oono
 
情報幾何学の基礎、第7章発表ノート
情報幾何学の基礎、第7章発表ノート情報幾何学の基礎、第7章発表ノート
情報幾何学の基礎、第7章発表ノートKenta Oono
 
GTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionGTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionKenta Oono
 
Tokyo Webmining Talk1
Tokyo Webmining Talk1Tokyo Webmining Talk1
Tokyo Webmining Talk1Kenta Oono
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
Common Design of Deep Learning Frameworks
Common Design of Deep Learning FrameworksCommon Design of Deep Learning Frameworks
Common Design of Deep Learning FrameworksKenta Oono
 
Introduction to Chainer and CuPy
Introduction to Chainer and CuPyIntroduction to Chainer and CuPy
Introduction to Chainer and CuPyKenta Oono
 
Stochastic Gradient MCMC
Stochastic Gradient MCMCStochastic Gradient MCMC
Stochastic Gradient MCMCKenta Oono
 
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用 2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用 Kenta Oono
 
Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)Kenta Oono
 
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料Kenta Oono
 
提供AMIについて
提供AMIについて提供AMIについて
提供AMIについてKenta Oono
 
Chainerインストール
ChainerインストールChainerインストール
ChainerインストールKenta Oono
 
Caffeインストール
CaffeインストールCaffeインストール
CaffeインストールKenta Oono
 

More from Kenta Oono (20)

Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...
Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...
Minimax statistical learning with Wasserstein distances (NeurIPS2018 Reading ...
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介
 
20170422 数学カフェ Part2
20170422 数学カフェ Part220170422 数学カフェ Part2
20170422 数学カフェ Part2
 
20170422 数学カフェ Part1
20170422 数学カフェ Part120170422 数学カフェ Part1
20170422 数学カフェ Part1
 
情報幾何学の基礎、第7章発表ノート
情報幾何学の基礎、第7章発表ノート情報幾何学の基礎、第7章発表ノート
情報幾何学の基礎、第7章発表ノート
 
GTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introductionGTC Japan 2016 Chainer feature introduction
GTC Japan 2016 Chainer feature introduction
 
Tokyo Webmining Talk1
Tokyo Webmining Talk1Tokyo Webmining Talk1
Tokyo Webmining Talk1
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Common Design of Deep Learning Frameworks
Common Design of Deep Learning FrameworksCommon Design of Deep Learning Frameworks
Common Design of Deep Learning Frameworks
 
Introduction to Chainer and CuPy
Introduction to Chainer and CuPyIntroduction to Chainer and CuPy
Introduction to Chainer and CuPy
 
Stochastic Gradient MCMC
Stochastic Gradient MCMCStochastic Gradient MCMC
Stochastic Gradient MCMC
 
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用 2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用
2015年9月18日 (GTC Japan 2015) 深層学習フレームワークChainerの導入と化合物活性予測への応用
 
Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)Introduction to Chainer (LL Ring Recursive)
Introduction to Chainer (LL Ring Recursive)
 
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料
日本神経回路学会セミナー「DeepLearningを使ってみよう!」資料
 
提供AMIについて
提供AMIについて提供AMIについて
提供AMIについて
 
Chainerインストール
ChainerインストールChainerインストール
Chainerインストール
 
Caffeインストール
CaffeインストールCaffeインストール
Caffeインストール
 

Recently uploaded

Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 

Recently uploaded (20)

Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 

On the benchmark of Chainer

  • 1. On the benchmark of Chainer 2016年7⽉2⽇ Chainer Meetup #3@Dowango Seminar Room Preferred Networks Inc. Kenta Oono oono@preferred.jp
  • 2. Self Introduction • Kenta Oono (twitter: @delta2323_) – Bio. : MSc@MathSci Univ. Tokyo → 2012.4 PFI → 2014.10 PFN – Role: BioHealthcare project, Chainer dev. team, etc. – blog: http://delta2323.github.io • Recent activity – Study meetup (NIPS2014, ICML2015, NIPS2015) – Several articles and talks on Deep Learning 7⽉21⽇ ICML2016読み会 @ドワンゴセミナールーム
  • 3. What is Benchmark? • Metrics that evaluate the performance of frameworks – elapsed time, memory consumption, easiness of use etc. • Related to, but different from profiling – Profiling needs finer information of frameworks, possibly at the cost of performance – Benchmarking measures the overall behavior of frameworks • For framework developers: – provides suggestion for further enhancement of the framework – provides objective comparison with other frameworks • For framework users: – provides better choice of frameworks that satisfies their needs
  • 4. Example: convnet-benchmarks • Author: Soumith Chintala(Facebook AI Research) • Measures latencies of convolutional neural networks • Provides objective comparison across various frameworks • Metric – Elapsed time of forward and backward propagation • Architecture – AlexNet-OWT / Overfeat / VGG-A / GoogleNet – Single 2D convolution layer of various sizes • Frameworks – Torch, neon, TenforFlow, fbfft (Torch), Chainer, cudaconvnet2, Caffe, CL-nn, Caffe-CL GreenTea etc.
  • 6. Basics of measurement of kernel execution • We cannot measure GPU execution time as CPU because launch of kernels is asynchronous ! clock_t start, end; start = clock(); // launch kernel end = clock(); elapsed_time = end - start; CPU GPU kernel exec. clock kick clock kernel exec.
  • 7. Basics of measurement of kernel execution • We can measure the kernel execution time by inserting two events at the start and end of the launch. float elapsed=0; cudaEvent_t start, stop; cudaEventCreate(&start); cudaEventCreate(&stop); cudaEventRecord(start, 0); // launch the kernel cudaEventRecord(stop, 0); cudaEventSynchronize (stop); cudaEventElapsedTime( &elapsed, start, stop); cudaEventDestroy(start); cudaEventDestroy(stop); CPU GPU record kick record kernel exec. kernel exec. Event Eventsync
  • 8. Measurement of single Chainer Function Execution • Suppose GPU impl. of F.f consists of Python parts and single GPU kernel. • Elapsed time calculates kernel execution in this case. CPU GPU Python kick record kernel exec. kernel exec. Event Eventsync Python record start = cupy.cuda.Event() end = cupy.cuda.Event() start.record() y = F.f(x) # forward prop end.record() end.synchronize() cupy.cuda.get_elapsed_time( start, end) F.f
  • 9. Measurement of single Chainer Function Execution • Suppose – no other kernels are waiting in the queue – Python overhead is large – the kernel is light • get_elapsed_time equals to whole execution time including Python code. CPU GPU Python kick record kernel exec. Event Event sync Python record start = cupy.cuda.Event() end = cupy.cuda.Event() start.record() y = F.f(x) # forward prop end.record() end.synchronize() cupy.cuda.get_elapsed_time( start, end)
  • 10. Measurement of single Chainer Function Execution • In general, the elapsed time between two events are different from what we measured in the two previous situations. • What we really measure depends on – the status of the waiting queue – the amounts of Python code and kernel CPU GPU Python kick record kernel exec. Event Event sync Python record kernel exec. start = cupy.cuda.Event() end = cupy.cuda.Event() start.record() y = F.f(x) # forward prop end.record() end.synchronize() cupy.cuda.get_elapsed_time( start, end)
  • 11. Synchronization before start Event • It ensures the start Event point is right before the execution of Python code. • But the timing of end Event is still undetermined. start = cupy.cuda.Event() end = cupy.cuda.Event() start.record() start.synchronize() y = F.f(x) # forward prop end.record() end.synchronize() cupy.cuda.get_elapsed_time( start, end) CPU GPU kick record kernel exec. Event kernel exec. sync Python ・・・ ・・・
  • 12. Measurement of multi-layered NNs • Should we insert synchronization points before all function executions? • But it exposes Python code that should have been hidden if it were not for the synchronization. • I guess this is the reason why convnet- benchmarks offers the architectures that consist of single convolution layer. CPU GPU kick record kernel exe- cution Event kernel exe- cution sync Python record Python kick record sync Python record Python Event Event kernel exe- cution It would be hidden by the kernel execution if we did not measure elapsed times.
  • 13. Tentative solution (Timer class: PR #1249) • Offers start and stop methods for measuring lap times. • Three patterns for synchronization before measurement by blocking_methodargument – block_every_time: synchronizes every start events – block_first_time: synchronizes only first start event – non_block: does not synchronize at the start of measurement • When we get the total time, Timer class implicitly call synchronize method. • synchronize method synchronizes all Events inserted by start and stop and calculates lap times lazily. • Once synchronize is invoked, the timer CANNOT accumulate lap times until it is reset.
  • 14. DeepMark aurthored by Soumith Chintala • Comparison with convnet-benchmarks – Not only image recognition but also various use cases – Relatively newer architectures are employed – Multi-GPU evaluation will be supported (planned) • Many details of specifications are under discussion. • Architectures (planned) – Images : InceptionV3-batchnorm / Alexnet-OWT / VGG-D / ResNet-50 – Video: C3D – Audio: DeepSpeech2 / MSR's 5 layer FC – Text: Small RNN LSTM / Large RNN LSTM • Chainer support (delta2323/chainer-deepmark) – Not all features are supported (see issue for details)
  • 15. Conclusion • Measurement of elapsed time of multi-layered NNs have many things to be considered. • We will participate in DeepMark, a general-purpose deep learning benchmarks. • Many criteria are to be measured: – Elapsed time <- Today’s topic – Memory consumption – etc… We are hiring!