SlideShare a Scribd company logo
1 of 77
Download to read offline
Using Raspberry Pi GPU
for Deep Neural Network
2017/9/3
Noriyuki OHKAWA
• Advantage
• Data transfer cost
DNN prediction on end devices
DNN prediction on end devices
• Disadvantage
• Poor computing resources
How to implement on end device ?
• Reduce Parameter (include approximation)
• Binary Net
• Parameter Quantization
• Learning Sparse Matrix
• Software
• Fast Convolution Algorithm / Fusion
• Hardware
• FPGA / ASIC
In this presentation
• Software Approach on Raspberry Pi Series
• Use Raspberry Pi GPU efficiently
• There is no..
• Hardware Approach
• Approximation and Re-training
Raspberry Pi GPU
Raspberry Pi CPU/GPU Spec.
Pi 3 Pi Zero/W
CPU ARM Cortex-A53
Quad Core 1.2Ghz
ARM1176JZF-S
Single Core 1Ghz
GPU Broadcom VideoCore IV
400MHz
Broadcom VideoCore IV
250MHz
Single Precision flops (theoretical)
Pi 3 Pi Zero/W
CPU 38.4 Gflops
(1.1 Gflops/$)
1 Gflops
(0.1-0.2 Gflops/$)
GPU 38.4 Gflops
(1.1 Gflops/$)
24 Gflops
(2.4-4.8 Gflops/$)
Architecture Overview
Architecture Overview
QPU / Quad Processing Unit
QPU / Quad Processing Unit
• general purpose register A/B x32 (= 64 register)
• accumulation register r0,r1,r2,r3 (= 4 register)
• see. Architecture Reference Guide
Architecture Overview
TMU / Texture and Memory Lookup Unit
TMU / Texture and Memory Lookup Unit
TMU / Texture and Memory Lookup Unit
Architecture Overview
Uniforms Cache
Uniforms Cache
Uniforms Cache
Architecture Overview
VPM / Vertex Pipe Memory
Efficient Data Transfer
Read GPU Memory from CPU
SLOW
SHOULD compute every layer by GPU
Conv2D in GPU
ReLU in CPU
Conv2D in GPU
Conv2D in GPU
ReLU in GPU
Conv2D in GPU
GPU Mem Read
Architecture Overview
EXCLUSIVE
Architecture Overview
SGEMM
References
• Raspberry PiでGPGPU
• Raspberry PiのGPUで行列乗算(その1)
• Raspberry PiのGPUで行列乗算(その2)
Inner/Direct Product
Inner/Direct Product
Direct Product
Direct Product
Code Example
• PyVideoCore
rotate(broadcast, r1, 0)
fmul(r3, r4, r5)
fadd(ra0, ra0, r3)
rotate(broadcast, r1, 1)
fmul(r3, r4, r5)
fadd(ra1, ra1, r3)
rotate(broadcast, r1, 2)
fmul(r3, r4, r5)
fadd(ra1, ra1, r3)
r5 has
broadcast
result
Code Example
• PyVideoCore
rotate(broadcast, r1, 0)
fmul(r3, r4, r5)
fadd(ra0, ra0, r3)
rotate(broadcast, r1, 1)
fmul(r3, r4, r5)
fadd(ra1, ra1, r3)
rotate(broadcast, r1, 2)
fmul(r3, r4, r5)
fadd(ra1, ra1, r3)
Code Example
• PyVideoCore
rotate(broadcast, r1, 0)
fmul(r3, r4, r5)
rotate(broadcast, r1, 1)
fadd(ra0, ra0, r3).fmul(r3, r4, r5)
rotate(broadcast, r1, 2)
fadd(ra1, ra1, r3).fmul(r3, r4, r5)
rotate(broadcast, r1, 3)
fadd(ra2, ra2, r3).fmul(r3, r4, r5)
rotate(broadcast, r1, 4)
Practical Limit for DNN
Pi 3 Pi Zero/W
CPU ? Gflops ? Gflops
GPU 38.4→19.2 Gflops 24→12 Gflops
Convolution(2D)
Rejected Algorithms
• im2col
• TMU enable equivalent load
• Winograd
• increase data transfer > reduce operation
• Not enough register
• Direct (NCHW → NHWC, im2col equiv. TMU load)
• NHWC has bad data locality for next layer
Direct (NCHW → NHWC)
Accepted Algorithms
• Direct (NCHW → NCHW)
• DMA store with Transpose(C,HW)
• for small image (HxW < 2048)
• Direct (NCHW → NHCW)
• DMA store with Transpose(C,W)
Specialize for Kernel Size
• 3x3 with stride = 1
• Best performance
• Pi3: 18.5 Gflops
• 48% of theoretical limit (96% of practical limit)
• 1x1
• 1xK
• Kx1
• KxK
Specialize for Output Shape
• DMA Transfer Block Size
• HxWxC = 2x16x16
• HxWxC = 4x8x16
• for small image
• HxWxC = 2x14x16+2x16x16
• OverLap-Add Method
• 3x3 only
2x16x16 = 4x8x16 = use 32 general purpose register for accumulation
Other 32 registers have convolution parameters.
Combination
• Using 22 specialized implementations
• NCHW→NCHW / NCHW→NHCW
• 2x14x16+
2x16x16
2x16x16 4x8x16
3x3 Use Use Use
1x1 - Use Use
1xK - Use Use
Kx1 - Use Use
KxK - Use Use
Optimization
Convert Time Optimization
• Fuse Affine Transform
• ex) Convolution + BN + Scale → Convolution
• Inline (Leaky)ReLU
Instruction Golf
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Decrement r2
Jump if non-0
Instruction Golf
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Add Op. Only
Mul Op. Only
Mul Op. Only
Instruction Golf
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Instruction Golf
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -14, set_flags=True).rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3, set_flags=False).fmul(r3, r4, r5)
iadd(r2, r2, -1, cond='zs').rotate(broadcast, r1, -15)
jzc(L.loop)
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Instruction Golf
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -14, set_flags=True).rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3, set_flags=False).fmul(r3, r4, r5)
iadd(r2, r2, -1, cond='zs').rotate(broadcast, r1, -15)
jzc(L.loop)
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Can t use 2 different Imm.Can t use 2 different Imm.
Instruction Golf
iadd(null, element_number, -13, set_flags=True).rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3, set_flags=False).fmul(r3, r4, r5)
isub(r2, r2, -14, cond='zs', set_flags=False).rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3, set_flags=False).fmul(r3, r4, r5)
iadd(r2, r2, -15, cond='zs').rotate(broadcast, r1, -15)
jzc(L.loop)
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
Instruction Golf
iadd(null, element_number, -13, set_flags=True).rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3, set_flags=False).fmul(r3, r4, r5)
isub(r2, r2, -14, cond='zs', set_flags=False).rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3, set_flags=False).fmul(r3, r4, r5)
iadd(r2, r2, -15, cond='zs').rotate(broadcast, r1, -15)
jzc(L.loop)
rotate(broadcast, r1, -13)
fadd(rb[14], rb[14], r3).fmul(r3, r4, r5)
rotate(broadcast, r1, -14)
fadd(ra[14], ra[14], r3).fmul(r3, r4, r5)
iadd(null, element_number, -15, set_flags=True).rotate(broadcast, r1, -15)
isub(r2, r2, 1, cond='zs')
jzc(L.loop)
-1
-(-14)
-15
Experiments
Performance
• GoogLeNet
• Pi3: 320ms
• Pi0: 670ms
• ResNet50
• Pi3: 820ms
• YOLO tiny
• Pi3: 580ms
1 99.58% cicada, cicala
2 0.19% cockroach, roach
3 0.06% cricket
4 0.05% grasshopper, hopper
5 0.04% leafhopper
6 0.02% lacewing, lacewing fly
7 0.01% barn spider, Araneus cavaticus
8 0.00% ground beetle, carabid beetle
9 0.00% isopod
10 0.00% mantis, mantid
Future Work
Remove Wasteful Computation
• ex) ResNet
Remove Wasteful Computation
• ex) ResNet
Remove Wasteful Computation
• ex) ResNet
Remove Wasteful Computation
• ex) ResNet
Remove Wasteful Computation
• ex) ResNet
1/4
1/4
1/4
Improve
Data Locality
Improve
Data Locality
Remove Wasteful Computation
• ex) ResNet50
• Pi3: 820ms → 720ms (by hand-made optimization)
Thank You !
Appendix 1: 3x3 Convolution
4x8x16 Case
convolution area
required area
4x8x16 Case TMU load x 4
(16x4 < 6x10)
2x16x16 Case
2x16x16 Case TMU load x 5
(16x4 < 4x18 < 16x5)
2x16x16 Case TMU load x 5
(16x4 < 4x18 < 16x5)
Over
TMU load request
queue size = 4
2x14x16+2x16x16 Case
2x14x16+2x16x16 CaseTMU load x 4
(16x4 = 4x16)
2x14x16+2x16x16 Case OverLap
2x14x16+2x16x16 Case OverLap
Appendix 2: Pooling
Max / Average Pooling
• Convolution like TMU loading
• Specialized for global pooling

More Related Content

What's hot

闇魔術を触ってみた
闇魔術を触ってみた闇魔術を触ってみた
闇魔術を触ってみたSatoshi Sato
 
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)Shinya Takamaeda-Y
 
BoostAsioで可読性を求めるのは間違っているだろうか
BoostAsioで可読性を求めるのは間違っているだろうかBoostAsioで可読性を求めるのは間違っているだろうか
BoostAsioで可読性を求めるのは間違っているだろうかYuki Miyatake
 
SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介MITSUNARI Shigeo
 
Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門Fixstars Corporation
 
PythonとVeriloggenを用いたRTL設計メタプログラミング
PythonとVeriloggenを用いたRTL設計メタプログラミングPythonとVeriloggenを用いたRTL設計メタプログラミング
PythonとVeriloggenを用いたRTL設計メタプログラミングShinya Takamaeda-Y
 
M5StackをRustで動かす
M5StackをRustで動かすM5StackをRustで動かす
M5StackをRustで動かすKenta IDA
 
initとプロセス再起動
initとプロセス再起動initとプロセス再起動
initとプロセス再起動Takashi Takizawa
 
いまさら聞けない!CUDA高速化入門
いまさら聞けない!CUDA高速化入門いまさら聞けない!CUDA高速化入門
いまさら聞けない!CUDA高速化入門Fixstars Corporation
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)Mr. Vengineer
 
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~Unity Technologies Japan K.K.
 
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』Nagi Teramo
 
ARM CPUにおけるSIMDを用いた高速計算入門
ARM CPUにおけるSIMDを用いた高速計算入門ARM CPUにおけるSIMDを用いた高速計算入門
ARM CPUにおけるSIMDを用いた高速計算入門Fixstars Corporation
 
Code igniter + ci phpunit-test
Code igniter + ci phpunit-testCode igniter + ci phpunit-test
Code igniter + ci phpunit-testME iBotch
 

What's hot (20)

闇魔術を触ってみた
闇魔術を触ってみた闇魔術を触ってみた
闇魔術を触ってみた
 
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
 
BoostAsioで可読性を求めるのは間違っているだろうか
BoostAsioで可読性を求めるのは間違っているだろうかBoostAsioで可読性を求めるのは間違っているだろうか
BoostAsioで可読性を求めるのは間違っているだろうか
 
SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
 
Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門Halide による画像処理プログラミング入門
Halide による画像処理プログラミング入門
 
PythonとVeriloggenを用いたRTL設計メタプログラミング
PythonとVeriloggenを用いたRTL設計メタプログラミングPythonとVeriloggenを用いたRTL設計メタプログラミング
PythonとVeriloggenを用いたRTL設計メタプログラミング
 
プログラムを高速化する話
プログラムを高速化する話プログラムを高速化する話
プログラムを高速化する話
 
TVM VTA (TSIM)
TVM VTA (TSIM) TVM VTA (TSIM)
TVM VTA (TSIM)
 
M5StackをRustで動かす
M5StackをRustで動かすM5StackをRustで動かす
M5StackをRustで動かす
 
initとプロセス再起動
initとプロセス再起動initとプロセス再起動
initとプロセス再起動
 
いまさら聞けない!CUDA高速化入門
いまさら聞けない!CUDA高速化入門いまさら聞けない!CUDA高速化入門
いまさら聞けない!CUDA高速化入門
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
 
ゆるバグ
ゆるバグゆるバグ
ゆるバグ
 
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~
【Unite Tokyo 2018 Training Day】C#JobSystem & ECSでCPUを極限まで使い倒そう ~C# JobSystem 編~
 
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
 
ARM CPUにおけるSIMDを用いた高速計算入門
ARM CPUにおけるSIMDを用いた高速計算入門ARM CPUにおけるSIMDを用いた高速計算入門
ARM CPUにおけるSIMDを用いた高速計算入門
 
Code igniter + ci phpunit-test
Code igniter + ci phpunit-testCode igniter + ci phpunit-test
Code igniter + ci phpunit-test
 
競プロでGo!
競プロでGo!競プロでGo!
競プロでGo!
 

Viewers also liked

TensorFlow XLAの可能性
TensorFlow XLAの可能性 TensorFlow XLAの可能性
TensorFlow XLAの可能性 Mr. Vengineer
 
モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化Yusuke Uchida
 
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjpcocodrips
 
CTFはとんでもないものを 盗んでいきました。私の時間です…
CTFはとんでもないものを 盗んでいきました。私の時間です…CTFはとんでもないものを 盗んでいきました。私の時間です…
CTFはとんでもないものを 盗んでいきました。私の時間です…Hiromu Yakura
 
Pythonではじめる競技プログラミング
Pythonではじめる競技プログラミングPythonではじめる競技プログラミング
Pythonではじめる競技プログラミングcocodrips
 
パッケージングの今
パッケージングの今パッケージングの今
パッケージングの今Atsushi Odagiri
 
[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, StrongerDeep Learning JP
 
20170721 future of reactive architectures
20170721 future of reactive architectures20170721 future of reactive architectures
20170721 future of reactive architecturesJamie Allen
 
DeNAの機械学習・深層学習活用した 体験提供の挑戦
DeNAの機械学習・深層学習活用した体験提供の挑戦DeNAの機械学習・深層学習活用した体験提供の挑戦
DeNAの機械学習・深層学習活用した 体験提供の挑戦Koichi Hamada
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayAdam Gibson
 
バイナリニューラルネットとハードウェアの関係
バイナリニューラルネットとハードウェアの関係バイナリニューラルネットとハードウェアの関係
バイナリニューラルネットとハードウェアの関係Kento Tajiri
 
Scala の関数型プログラミングを支える技術
Scala の関数型プログラミングを支える技術Scala の関数型プログラミングを支える技術
Scala の関数型プログラミングを支える技術Naoki Aoyama
 
iOSエンジニアのためのScala入門
iOSエンジニアのためのScala入門iOSエンジニアのためのScala入門
iOSエンジニアのためのScala入門Masaya Dake
 
元インフラエンジニアが
Scalaを触ってつまづいたところ。
元インフラエンジニアが
Scalaを触ってつまづいたところ。元インフラエンジニアが
Scalaを触ってつまづいたところ。
元インフラエンジニアが
Scalaを触ってつまづいたところ。takako onoue
 
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY
 
Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Jun-ya Norimatsu
 
Chainer meetup
Chainer meetupChainer meetup
Chainer meetupkikusu
 
Chainer meetup20151014
Chainer meetup20151014Chainer meetup20151014
Chainer meetup20151014Jiro Nishitoba
 
Towards Chainer v1.5
Towards Chainer v1.5Towards Chainer v1.5
Towards Chainer v1.5Seiya Tokui
 

Viewers also liked (20)

TensorFlow XLAの可能性
TensorFlow XLAの可能性 TensorFlow XLAの可能性
TensorFlow XLAの可能性
 
モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化
 
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp
強くなるためのプログラミング -プログラミングに関する様々なコンテストとそのはじめ方-#pyconjp
 
CTFはとんでもないものを 盗んでいきました。私の時間です…
CTFはとんでもないものを 盗んでいきました。私の時間です…CTFはとんでもないものを 盗んでいきました。私の時間です…
CTFはとんでもないものを 盗んでいきました。私の時間です…
 
Pythonではじめる競技プログラミング
Pythonではじめる競技プログラミングPythonではじめる競技プログラミング
Pythonではじめる競技プログラミング
 
パッケージングの今
パッケージングの今パッケージングの今
パッケージングの今
 
[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger
 
20170721 future of reactive architectures
20170721 future of reactive architectures20170721 future of reactive architectures
20170721 future of reactive architectures
 
DeNAの機械学習・深層学習活用した 体験提供の挑戦
DeNAの機械学習・深層学習活用した体験提供の挑戦DeNAの機械学習・深層学習活用した体験提供の挑戦
DeNAの機械学習・深層学習活用した 体験提供の挑戦
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 
バイナリニューラルネットとハードウェアの関係
バイナリニューラルネットとハードウェアの関係バイナリニューラルネットとハードウェアの関係
バイナリニューラルネットとハードウェアの関係
 
Scala の関数型プログラミングを支える技術
Scala の関数型プログラミングを支える技術Scala の関数型プログラミングを支える技術
Scala の関数型プログラミングを支える技術
 
iOSエンジニアのためのScala入門
iOSエンジニアのためのScala入門iOSエンジニアのためのScala入門
iOSエンジニアのためのScala入門
 
元インフラエンジニアが
Scalaを触ってつまづいたところ。
元インフラエンジニアが
Scalaを触ってつまづいたところ。元インフラエンジニアが
Scalaを触ってつまづいたところ。
元インフラエンジニアが
Scalaを触ってつまづいたところ。
 
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. AvailabilityHPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
HPC DAY 2017 | NVIDIA Volta Architecture. Performance. Efficiency. Availability
 
Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)Chainer Meetup LT (Alpaca)
Chainer Meetup LT (Alpaca)
 
LT@Chainer Meetup
LT@Chainer MeetupLT@Chainer Meetup
LT@Chainer Meetup
 
Chainer meetup
Chainer meetupChainer meetup
Chainer meetup
 
Chainer meetup20151014
Chainer meetup20151014Chainer meetup20151014
Chainer meetup20151014
 
Towards Chainer v1.5
Towards Chainer v1.5Towards Chainer v1.5
Towards Chainer v1.5
 

Similar to Using Raspberry Pi GPU for DNN

Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILPLec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILPHsien-Hsin Sean Lee, Ph.D.
 
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)Go Go Gadget! - An Intro to Return Oriented Programming (ROP)
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)Miguel Arroyo
 
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etc
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etcComparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etc
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etcYukio Okuda
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumertirlukachaitanya
 
JRuby @ Boulder Ruby
JRuby @ Boulder RubyJRuby @ Boulder Ruby
JRuby @ Boulder RubyNick Sieger
 
OSPF (open shortest path first) part iii
OSPF (open shortest path first) part  iiiOSPF (open shortest path first) part  iii
OSPF (open shortest path first) part iiiNetwax Lab
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Hsien-Hsin Sean Lee, Ph.D.
 
Introduction to Functional Programming with Scala
Introduction to Functional Programming with ScalaIntroduction to Functional Programming with Scala
Introduction to Functional Programming with ScalaDaniel Cukier
 
Practical Testing of Ruby Core
Practical Testing of Ruby CorePractical Testing of Ruby Core
Practical Testing of Ruby CoreHiroshi SHIBATA
 
Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1Tom Paulus
 
Implementing R7RS on R6RS Scheme
Implementing R7RS on R6RS SchemeImplementing R7RS on R6RS Scheme
Implementing R7RS on R6RS SchemeKato Takashi
 
Vc4c development of opencl compiler for videocore4
Vc4c  development of opencl compiler for videocore4Vc4c  development of opencl compiler for videocore4
Vc4c development of opencl compiler for videocore4nomaddo
 

Similar to Using Raspberry Pi GPU for DNN (20)

Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILPLec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
 
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)Go Go Gadget! - An Intro to Return Oriented Programming (ROP)
Go Go Gadget! - An Intro to Return Oriented Programming (ROP)
 
assembly
assemblyassembly
assembly
 
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etc
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etcComparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etc
Comparing On-The-Fly Accelerating Packages: Numba, TensorFlow, Dask, etc
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
 
JRuby @ Boulder Ruby
JRuby @ Boulder RubyJRuby @ Boulder Ruby
JRuby @ Boulder Ruby
 
Tokyo r18
Tokyo r18Tokyo r18
Tokyo r18
 
OSPF (open shortest path first) part iii
OSPF (open shortest path first) part  iiiOSPF (open shortest path first) part  iii
OSPF (open shortest path first) part iii
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
 
Apache pig
Apache pigApache pig
Apache pig
 
Introduction to Functional Programming with Scala
Introduction to Functional Programming with ScalaIntroduction to Functional Programming with Scala
Introduction to Functional Programming with Scala
 
Practical Testing of Ruby Core
Practical Testing of Ruby CorePractical Testing of Ruby Core
Practical Testing of Ruby Core
 
Design Of 10 gbps
Design Of 10 gbpsDesign Of 10 gbps
Design Of 10 gbps
 
Apache Spark & Streaming
Apache Spark & StreamingApache Spark & Streaming
Apache Spark & Streaming
 
Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1
 
Implementing R7RS on R6RS Scheme
Implementing R7RS on R6RS SchemeImplementing R7RS on R6RS Scheme
Implementing R7RS on R6RS Scheme
 
Vc4c development of opencl compiler for videocore4
Vc4c  development of opencl compiler for videocore4Vc4c  development of opencl compiler for videocore4
Vc4c development of opencl compiler for videocore4
 
Rug hogan-10-03-2012
Rug hogan-10-03-2012Rug hogan-10-03-2012
Rug hogan-10-03-2012
 
November 2014 HUG: Apache Pig 0.14
November 2014 HUG: Apache Pig 0.14 November 2014 HUG: Apache Pig 0.14
November 2014 HUG: Apache Pig 0.14
 

Recently uploaded

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 

Recently uploaded (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Using Raspberry Pi GPU for DNN