A Peek into Google’s
Edge TPU
Koan-Sin Tan

freedom@computer.org

April 18th, 2019

Hsinchu Coding Serfs Meeting
1
Who Am I?
• An old programmer, learned to use “open
source” stuff on VAX-11/780 running 4.3BSD
before the term “open source” was coined

• TensorFlow Contributor

• Search “Koan-Sin” at https://github.com/tensorflow/tensorflow/releases

• PRs, https://github.com/tensorflow/tensorflow/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Afreedomtan+

• Contributing to TensorFlow is quite easy.
There are many typos :-)

• Interested in running NNs on edge devices, so
I learned TFLite

• label_image for TFLite
2
Google Edge TPU
!3
https://coral.withgoogle.com/products/
Google Edge TPU
• Announced at Google Next
2018 (July 2018)

• Available to general developers
right before TensorFlow Dev
Summit 2019 (Mar, 2019)

• USB: Coral Accelerator

• Dev Board: Coral Dev Board

• More are coming, e.g., PCI-E
Accelerator and SOM

• Supported framework: TFLite
https://coral.withgoogle.com/products/
4
• Updates released on April 11th, 2019

• Compiler: removed the restriction for specific architectures

• New TensorFlow Lite C++ API

• Updated Python API, mainly for multiple Edge TPUs

• Updated Mendel OS and Mendel Development Tool (MDT)

• Environmental Sensor Board, https://coral.withgoogle.com/products/environmental/

https://developers.googleblog.com/2019/04/updates-from-coral-new-compiler-and.html 

https://coral.withgoogle.com/news/updates-04-2019/
!5
biology hobbyist in Edge TPU team?
!6
https://en.wikipedia.org/wiki/Coral https://en.wikipedia.org/wiki/Charles_Darwin
https://en.wikipedia.org/wiki/HMS_Beagle https://en.wikipedia.org/wiki/Gregor_Mendel
Coral USB Accelerator
• USB 3.1 (gen 1) port and
cable (SuperSpeed, 5Gb/s
transfer speed)

• MobileNet V1 1.0 224
quantized: ~ 4.3 MiB, so moving the model over the link takes at least
$4.3 \times 10^{6} \times 8 / (5 \times 10^{9})\,\mathrm{s} \approx 6.9\,\mathrm{ms}$

• Recommended operating
conditions

Operating frequency | Max ambient temperature
Default | 35°C
Maximum | 25°C

• https://coral.withgoogle.com/tutorials/accelerator-datasheet/
• https://coral.withgoogle.com/tutorials/accelerator/
• Software environment

• Linux computer with a USB Port

• Debian 6.0 or higher, or any
derivative thereof (such as Ubuntu
10.0+)

• System architecture of either x86_64
or ARM64 with ARMv8 instruction
set

• Some caveats

• USB 2.0 hurts

• With newer Ubuntu, you have to
modify the installation script

• actually, ARMv7 also works
7
https://coral.withgoogle.com/tutorials/accelerator-datasheet/
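The model-transfer estimate on this slide, and the "USB 2.0 hurts" caveat, can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses nominal signalling rates (5 Gb/s for USB 3.1 gen 1 as on the slide; 480 Mb/s for USB 2.0 is an assumption not stated on the slide), rounds 4.3 MiB to 4.3 × 10^6 bytes as the slide does, and ignores protocol overhead, so real transfers will be slower. Working the numbers through gives roughly 6.9 ms over USB 3.1 gen 1.

```python
def transfer_time_ms(size_bytes, link_bits_per_s):
    """Lower bound on the time to push size_bytes over a link, in milliseconds."""
    return size_bytes * 8 / link_bits_per_s * 1e3


MODEL_BYTES = 4.3e6    # MobileNet V1 1.0 224 quantized, rounded as on the slide
USB3_GEN1_BPS = 5e9    # SuperSpeed rate quoted on the slide
USB2_BPS = 480e6       # nominal Hi-Speed USB 2.0 rate (assumption)

usb3_ms = transfer_time_ms(MODEL_BYTES, USB3_GEN1_BPS)
usb2_ms = transfer_time_ms(MODEL_BYTES, USB2_BPS)

print(f"USB 3.1 gen 1: {usb3_ms:.2f} ms")
print(f"USB 2.0:       {usb2_ms:.2f} ms")
```

Under these assumptions the same model needs about ten times longer over USB 2.0, which is one plausible reason the slower port hurts.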
Performance Setting for
USB Accelerator
!8
!9
Coral Dev Board
• Edge TPU Module (SOM)
◦ NXP i.MX 8M SOC (Quad-core
Cortex-A53, plus Cortex-M4F)
◦ Google Edge TPU ML accelerator
coprocessor
◦ Cryptographic coprocessor
◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac
2.4/5GHz)
◦ Bluetooth 4.1
◦ 8GB eMMC
◦ 1GB LPDDR4
• USB connections
◦ USB Type-C power port (5V DC)
◦ USB 3.0 Type-C OTG port
◦ USB 3.0 Type-A host port
◦ USB 2.0 Micro-B serial console port
• Audio connections
◦ 3.5mm audio jack (CTIA compliant)
◦ Digital PDM microphone (x2)
◦ 2.54mm 4-pin terminal for stereo speakers
• Video connections
◦ HDMI 2.0a (full size)
◦ 39-pin FFC connector for MIPI DSI
display (4-lane)
◦ 24-pin FFC connector for MIPI CSI-2
camera (4-lane)
• MicroSD card slot
• Gigabit Ethernet port
• 40-pin GPIO expansion header
• Supports Mendel Linux (derivative of Debian)
https://coral.withgoogle.com/tutorials/devboard-datasheet/
https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/
10
Mendel Linux?
• https://pypi.org/project/mendel-development-tool/

• https://coral.googlesource.com/mdt.git

• 404, several weeks ago

• now it’s there

• actually, there is a lot more
information at https://coral.googlesource.com/; let’s
look at it later
https://pypi.org/project/mendel-development-tool/
11
Mendel Linux
• It’s a Debian-based distribution, so the apt tools can tell us many things

• And take a look at /etc/apt/sources.list. Yup, it’s there

• https://packages.cloud.google.com/apt/dists/mendel-bsp-enterprise-beaker/main

• https://packages.cloud.google.com/apt/dists/mendel-beaker/main
!12
Mendel Linux
• https://packages.cloud.google.com/apt/dists/

mendel-animal
mendel-beaker
mendel-bsp-enterprise-animal
mendel-bsp-enterprise-beaker
mendel-bsp-enterprise-chef
mendel-bsp-enterprise-unstable
mendel-chef
mendel-chef-unstable
mendel-core-animal
mendel-core-beaker
mendel-core-chef
mendel-core-unstable
mendel-unstable
mendel-upstream-stretch
13
Performance?
https://coral.withgoogle.com/tutorials/edgetpu-faq/
!14
Let’s start from the first
demo
• USB getting started guide:

• https://coral.withgoogle.com/tutorials/accelerator/
• BasicEngine->{ClassificationEngine, DetectionEngine}, ImprintingEngine

• BasicEngine is a single line

• from edgetpu.swig.edgetpu_cpp_wrapper import BasicEngine
• swig: yes, SWIG, which is more than 20 years old

• _edgetpu_cpp_wrapper.so
!15
[Class diagram: BasicEngine, ClassificationEngine, DetectionEngine, ImprintingEngine]

ClassificationEngine
ClassifyWithImage(img, threshold=0.1, top_k=3, resample=Image.NEAREST)
ClassifyWithInputTensor(input_tensor, threshold=0.0, top_k=3)
__dict__
…

BasicEngine
RunInference(input)
get_input_tensor_shape()
get_all_output_tensors_sizes()
get_num_of_output_tensors()
get_output_tensor_size()
required_input_array_size()
total_output_array_size()
model_path()
get_raw_output()
get_inference_time()
device_path()
__dict__
…
What is in the Engines
• BasicEngine

• input and output related

• Classification

• still I/O related

• classification specific:
resizing input image and
what to output
16
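To make the "what to output" part concrete, here is a minimal sketch of a post-processing step in plain Python. The helper `top_k_above_threshold` is hypothetical and not part of the edgetpu API; it only assumes that the `threshold` and `top_k` parameters of ClassifyWithInputTensor() mean "drop scores below the threshold, then keep the best k", which is a guess at the behaviour rather than the library's actual code.

```python
def top_k_above_threshold(scores, threshold=0.0, top_k=3):
    """Return up to top_k (label_id, score) pairs with score >= threshold, best first."""
    candidates = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


# Made-up scores standing in for the raw output of an inference run.
raw_output = [0.02, 0.70, 0.05, 0.20, 0.03]
result = top_k_above_threshold(raw_output, threshold=0.1, top_k=3)
print(result)
```

Only two entries survive the 0.1 threshold here, so fewer than `top_k` results come back.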
performance!
• no existing way to reproduce those numbers

• classify_image.py uses
ClassificationEngine.ClassifyWithImage()

• ClassifyWithImage() ->
ClassifyWithInputTensor() ->
RunInference()

• preprocessing: image resize time

• post-processing: top_k and finding labels/
classes

• BasicEngine.get_inference_time() returns
something I cannot understand

• modified label_image.py (and
object_detection) for TFLite

• quite close
https://github.com/freedomtan/edge_tpu_python_scripts
17
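One way to keep image resizing and top-k search out of the reported number is to time each stage separately. The sketch below shows the idea with a hypothetical `time_call` helper and dummy stand-in stages; a real script would call the image resize, RunInference(), and the label lookup in their place. It is not taken from the scripts in the repository above.

```python
import time


def time_call(fn, *args, warmup=1, runs=10):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1e3


# Dummy stand-ins for the three stages (hypothetical, for illustration only).
def preprocess(image):
    return [p / 255.0 for p in image]


def infer(tensor):
    return [sum(tensor)] * 5


def postprocess(output):
    return sorted(output, reverse=True)[:3]


image = list(range(256)) * 4
tensor = preprocess(image)
output = infer(tensor)

timings = {
    "preprocess": time_call(preprocess, image),
    "inference": time_call(infer, tensor),
    "postprocess": time_call(postprocess, output),
}
for stage, ms in timings.items():
    print(f"{stage}: {ms:.3f} ms")
```

The warm-up run matters on real hardware, since the first inference typically includes one-time setup costs.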
numbers in a git repo
• numbers and scripts

•
18
inception_v1_224_quant.tflite | 412.79
inception_v1_224_quant_edgetpu.tflite | 4.00
inception_v4_299_quant.tflite | 3328.34
inception_v4_299_quant_edgetpu.tflite | 100.33
mobilenet_ssd_v1_coco_quant_postprocess.tflite | 391.34
mobilenet_ssd_v1_coco_quant_postprocess_edgetpu.tflite | 14.83
mobilenet_ssd_v2_coco_quant_postprocess.tflite | 355.48
mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite | 16.92
mobilenet_ssd_v2_face_quant_postprocess.tflite | 369.02
mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite | 7.78
mobilenet_v1_1.0_224_quant.tflite | 184.99
mobilenet_v1_1.0_224_quant_edgetpu.tflite | 2.22
mobilenet_v2_1.0_224_quant.tflite | 160.94
mobilenet_v2_1.0_224_quant_edgetpu.tflite | 2.56
• benchmarks/basic_engine_benchmarks.py
• benchmarks/classification_benchmarks.py
• benchmarks/detection_benchmarks.py
• benchmarks/imprinting_benchmarks.py
• benchmarks/multiple_tpus_performance_analysis.py
• benchmarks/reference/basic_engine_reference_aarch64.csv
• benchmarks/reference/basic_engine_reference_rp3b+.csv
• benchmarks/reference/basic_engine_reference_rp3b.csv
• benchmarks/reference/basic_engine_reference_x86_64.csv
• benchmarks/reference/classification_reference_aarch64.csv
• benchmarks/reference/classification_reference_rp3b+.csv
• benchmarks/reference/classification_reference_rp3b.csv
• benchmarks/reference/classification_reference_x86_64.csv
• benchmarks/reference/detection_reference_aarch64.csv
• benchmarks/reference/detection_reference_rp3b+.csv
• benchmarks/reference/detection_reference_rp3b.csv
• benchmarks/reference/detection_reference_x86_64.csv
• benchmarks/reference/imprinting_reference_aarch64.csv
• benchmarks/reference/imprinting_reference_rp3b+.csv
• benchmarks/reference/imprinting_reference_rp3b.csv
• benchmarks/reference/imprinting_reference_x86_64.csv
https://coral.googlesource.com/edgetpu/+/refs/heads/release-chef
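The per-model numbers on this slide are easier to compare as ratios. The sketch below copies them verbatim and divides each plain `.tflite` figure by its `_edgetpu.tflite` counterpart. It assumes the numbers are inference times, where smaller is better, and that each pair was measured under the same conditions; the slide does not spell out either point.

```python
# (plain .tflite number, _edgetpu.tflite number), copied from the slide
numbers = {
    "inception_v1_224_quant": (412.79, 4.00),
    "inception_v4_299_quant": (3328.34, 100.33),
    "mobilenet_ssd_v1_coco_quant_postprocess": (391.34, 14.83),
    "mobilenet_ssd_v2_coco_quant_postprocess": (355.48, 16.92),
    "mobilenet_ssd_v2_face_quant_postprocess": (369.02, 7.78),
    "mobilenet_v1_1.0_224_quant": (184.99, 2.22),
    "mobilenet_v2_1.0_224_quant": (160.94, 2.56),
}

# Ratio > 1 means the _edgetpu variant has the smaller number.
speedups = {name: plain / edgetpu for name, (plain, edgetpu) in numbers.items()}

for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} {ratio:6.1f}x")
```

Read this way, the ratios range from roughly 20x to roughly 100x, with the SSD models at the low end.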
Comparing with NCS 2
!19
device | MobileNet V1 1.0/224 | MobileNet V2 1.0/224 | Inception V3 | ResNet 50 | SqueezeNet 1.1 | MobileNet V1 0.25/128 | SSD MobileNet V1 COCO | SSD MobileNet V2 COCO
Coral: Edge TPU | 2.74 | 2.87 | 43.27 | 42.41 | 1.90 | 1.11 | 10.05 | 12.48
NCS 2 (fp16) | 12.11 | 14.87 | 52.25 | 33.1 | 3.99 | 4.08 | 23.53 | 39.11
iPhone Xs Max (Neural Engine accelerated, fp16) | 1.74 | 2.15 | 8.65 | 6.91 | 1.75 | 1.16 | |
Mobilenet V1/V2 and SSD Mobilenet V1/V2 are quite good
• Edge TPU: my scripts, https://github.com/freedomtan/edge_tpu_python_scripts
• NCS 2: ./benchmark_app -d MYRIAD -niter 50 -nireq 10 ..
• iPhone Xs Max: my CoreML benchmark, https://github.com/freedomtan/coremlbenchmark
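To see where the Edge TPU leads and where it does not, the Edge TPU and NCS 2 rows of the comparison can be turned into per-model ratios. The values are copied from the slide; the only assumption is that both rows use the same unit and that smaller is better.

```python
models = [
    "MobileNet V1 1.0/224", "MobileNet V2 1.0/224", "Inception V3", "ResNet 50",
    "SqueezeNet 1.1", "MobileNet V1 0.25/128", "SSD MobileNet V1 COCO",
    "SSD MobileNet V2 COCO",
]
edge_tpu = [2.74, 2.87, 43.27, 42.41, 1.90, 1.11, 10.05, 12.48]
ncs2 = [12.11, 14.87, 52.25, 33.1, 3.99, 4.08, 23.53, 39.11]

# Ratio > 1: the Edge TPU number is smaller; ratio < 1: the NCS 2 number is smaller.
ratio = {m: n / e for m, e, n in zip(models, edge_tpu, ncs2)}

for model, r in ratio.items():
    leader = "Edge TPU" if r > 1 else "NCS 2"
    print(f"{model:24s} NCS 2 / Edge TPU = {r:4.2f}  ({leader} ahead)")
```

ResNet 50 is the one model in this set where the NCS 2 comes out ahead, and the Inception V3 margin is small.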
[Figure: "Mobilenet V1: Edge TPU and NCS2"; y-axis: time (ms), 0 to 14; series: ncs2 and coral, each for mobilenet_v1_0.25, mobilenet_v1_0.5, mobilenet_v1_0.75, mobilenet_v1_1.0]
Mobilenet V1 on EdgeTPU
and NCS2
20
inference time | size=128x128 | size=160x160 | size=192x192 | size=224x224
ncs2 mobilenet_v1_0.25 | 3.83 | 3.95 | 4.06 | 4.4
ncs2 mobilenet_v1_0.5 | 4.98 | 4.86 | 5.51 | 6.51
ncs2 mobilenet_v1_0.75 | 6.04 | 6.67 | 7.93 | 9.4
ncs2 mobilenet_v1_1.0 | 7.43 | 8.68 | 10.13 | 12.2
coral mobilenet_v1_0.25 | 1.07 | 1.24 | 1.30 | 1.47
coral mobilenet_v1_0.5 | 1.16 | 1.40 | 1.53 | 1.95
coral mobilenet_v1_0.75 | 1.29 | 1.70 | 1.80 | 2.16
coral mobilenet_v1_1.0 | 1.50 | 1.95 | 2.15 | 2.85
https://www.tensorflow.org/lite/images/convert/workflow.svg
https://coral.withgoogle.com/docs/edgetpu/models-intro/
• It’s said that the Edge TPU supports
TFLite

• well, it does not run TFLite
models directly
Edge TPU’s canned model
!21
Edge TPU’s canned model
• What do you mean by a single
custom op?
The compiler creates a single custom op for all Edge TPU
compatible ops; anything else stays the same
https://coral.withgoogle.com/docs/edgetpu/models-intro/
22
[Figure: MobileNet V1 graph; nodes: input, edgetpu-custom-op, Softmax; tensor shapes: 1×224×224×3, 1×1001]
[Figure: SSD MobileNet V1 graph; nodes: normalized_input_image_tensor, edgetpu-custom-op, TFLite_Detection_PostProcess; tensor shapes: 1×300×300×3, 1×1917×91, 1917×4, 1×10×4, 1×10, 1×10, 1; outputs: TFLite_Detection_PostProcess, TFLite_Detection_PostProcess:1, TFLite_Detection_PostProcess:2, TFLite_Detection_PostProcess:3]
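The "single custom op" idea can be illustrated with a toy graph rewrite. This is only a sketch of the concept and not how the Edge TPU compiler is implemented: the function `fold_into_custom_op`, the op list, and the `SUPPORTED` set are all made up for illustration, and the sketch assumes that only the leading run of compatible ops gets folded.

```python
def fold_into_custom_op(ops, supported):
    """Replace the leading run of supported ops with one 'edgetpu-custom-op'."""
    n = 0
    while n < len(ops) and ops[n] in supported:
        n += 1
    if n == 0:
        return list(ops)  # nothing compatible at the front: graph stays the same
    return ["edgetpu-custom-op"] + list(ops[n:])


# Illustrative only: not the compiler's real list of supported ops.
SUPPORTED = {"Conv2D", "DepthwiseConv2D", "AveragePool2D", "Reshape"}

toy_mobilenet = [
    "Conv2D", "DepthwiseConv2D", "Conv2D",
    "AveragePool2D", "Conv2D", "Reshape", "Softmax",
]
folded = fold_into_custom_op(toy_mobilenet, SUPPORTED)
print(folded)
```

With this made-up op list the result is one custom op followed by Softmax, which matches the shape of the MobileNet V1 graph shown on the slide.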
Beyond Python
• _edgetpu_cpp_wrapper.so

• TensorFlow Lite runtime and others

• let’s take a look at _wrap_new_BasicEngine: aiy::BasicEngine::BasicEngine()
• aiy::BasicEngine::RunInference() —>
aiy::BasicEngine::RunInferenceHelper() —>
tflite::Interpreter::Invoke()
• unresolved edgetpu::EdgeTpuManager::GetSingleton()

• libedgetpu.so

• OpenSSL, Edge TPU context, communicating with the Edge TPU via USB or PCI

• edgetpu::EdgeTpuManager::GetSingleton()
• platforms::darwinn::tflite::EdgeTpuManagerDirect::GetSingleton()
!23
Edge TPU C++ API
• Released on April 11th, 2019

• binaries for x86_64, aarch64, and armeabi-v7a

• a simple header file

• two simple examples

• some doc at https://coral.withgoogle.com/docs/edgetpu/api-cpp/

• Native build on Dev Board

• the Dev Board is a quad-core Cortex-A53 board, so surely we can build code on it

• a small aarch64 patch https://github.com/tensorflow/tensorflow/commit/5520a9d82e5,
https://github.com/tensorflow/tensorflow/pull/16175

• https://github.com/freedomtan/edgetpu-native, label_image for tflite ported
!24
Edge TPU C++ API
• class EdgeTpuManager
• static EdgeTpuManager* GetSingleton();
• 3 different std::unique_ptr<EdgeTpuContext> NewEdgeTpuContext()
• std::vector<DeviceEnumerationRecord> EnumerateEdgeTpu()
• TfLiteStatus SetVerbosity(int verbosity)
• std::string Version()
• let’s take a look at ‘-v’ logs

• https://drive.google.com/drive/folders/1-MhGIgWHuhbKM6XrhPqyuLJDzoLD1t2g?usp=sharing

• in short, USB ones seem to
have more overhead
25
https://github.com/freedomtan/edgetpu-native/blob/label_image/libedgetpu/edgetpu.h#L110-L158
[Figure: imprinting model graph; input (1×224×224×3) -> edgetpu-custom-op (1×1×1×1024) -> L2Normalization (1×1×1×1024) -> Conv2D, weights 5×1×1×1024, bias 5 (1×1×1×5) -> Reshape (1×5) -> Softmax (1×5) -> Output]
Imprinting Engine
• Yes, let’s check what it is

• The Imprinting Engine implements a low-shot learning technique
called ‘Imprinted Weights’ [1][2]

• Can be used to retrain classifiers on-device (either on USB
Accelerator or Dev Board), no back-propagation gradient involved.

• Why?

• Transfer-learning happens on-device, at near-realtime speed.

• You don't need to recompile the model.

• Limitations

• Training data size is limited to a max of 200 images per class.

• It is most suitable for datasets that have small intra-class
variation.

• The last fully-connected layer runs on the CPU, not the Edge
TPU. So it will be slightly less efficient than running a
pre-compiled model on the Edge TPU.

• if you are interested in it, check the paper and
aiy::learn::imprinting::ImprintingEngine::Train(unsigned char const*, int, int)
26
[1] https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/

[2] https://arxiv.org/abs/1712.07136
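The core of weight imprinting, as described in [2], can be sketched without any back-propagation: each class weight is the normalized mean of the normalized embeddings of that class, and classification picks the class with the largest dot product. The code below is a simplified pure-Python illustration with made-up 3-dimensional embeddings; it is not the ImprintingEngine implementation, and it leaves out the scaling factor and softmax.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def imprint(embeddings_by_class):
    """One weight vector per class: normalized mean of the normalized embeddings."""
    weights = {}
    for label, embeddings in embeddings_by_class.items():
        normalized = [l2_normalize(e) for e in embeddings]
        mean = [sum(col) / len(normalized) for col in zip(*normalized)]
        weights[label] = l2_normalize(mean)
    return weights


def classify(embedding, weights):
    """Pick the class whose weight vector has the largest cosine similarity."""
    e = l2_normalize(embedding)
    scores = {label: sum(a * b for a, b in zip(e, w)) for label, w in weights.items()}
    return max(scores, key=scores.get)


# Made-up embeddings; on the device they would come from the embedding extractor.
train = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "dog": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.1]],
}
weights = imprint(train)
print(classify([0.8, 0.3, 0.0], weights))
```

Adding a new class only means computing one more weight vector, which is why no recompilation of the model is needed.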
[Figure: model graph; nodes: input, edgetpu-custom-op, AvgPool; tensor shapes: 1×224×224×3, 1×1×1×1024]
PCIe device?
• it’s Linux

• `uname -a`: Linux hopeful-nexus 4.9.51-imx #1 SMP PREEMPT Thu Jan 31 01:58:26 UTC 2019 aarch64 GNU/Linux

• there is /proc/config.gz

• $ zcat /proc/config.gz | grep -i edge
• CONFIG_SND_GOOGLE_EDGETPU_CARD=y
!27
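The same kernel-config check can be done from Python, which is handy inside a larger probing script. This is a small sketch: `grep_kernel_config` is a hypothetical helper, and the demo runs against a tiny stand-in file so that it works anywhere. On the Dev Board the path would be /proc/config.gz.

```python
import gzip
import os
import tempfile


def grep_kernel_config(path, needle):
    """Return config lines containing needle, case-insensitively (like zcat | grep -i)."""
    with gzip.open(path, "rt") as f:
        return [line.strip() for line in f if needle.lower() in line.lower()]


# Stand-in for /proc/config.gz so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "config.gz")
    with gzip.open(sample, "wt") as f:
        f.write("CONFIG_USB=y\nCONFIG_SND_GOOGLE_EDGETPU_CARD=y\n")
    matches = grep_kernel_config(sample, "edge")

print(matches)
```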
PCIe Device
• apex driver is in gasket
• https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/gasket
• It was already upstreamed last year
• https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/staging/gasket/apex_driver.c
!28
Global Unichip Corp
USB Vendor id 0x1a6e = “Global Unichip Corp”
PCI Vendor id 0x1ac1 = “Global Unichip Corp”
!29
USB Accelerator opened
https://twitter.com/generuso/status/1111733195244998656
!30
MCU on USB Accelerator
!31
https://www.seeedstudio.com/Coral-USB-Accelerator-p-2899.html
Power Consumption of the
USB Accelerator
• 4.94 V × 0.18 A ≈ 0.9 W

• running Mobilenet-SSD
https://twitter.com/exsiva/status/1108692847719407616
32
Architecture of Edge TPU?
• Nope, I didn’t read it. Just
FYR

• https://patents.google.com/patent/US20190050717A1/
33
Concluding Remarks
• Edge TPU is quite good for small models that you can convert to canned
ones

• Quantized UINT8

• not so good for some common larger models, e.g., Inception V3 and
ResNet 50

• your USB and CPU could be problems

• on-device re-training looks promising

• NCS 2 supports many more models for now

• How about NVIDIA Jetson Nano? Dunno, let’s wait and see. I don’t believe
GPUs will win in the long run.
!34
questions?
!35
!36
~ $99.00

More Related Content

What's hot

Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情NVIDIA Japan
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...Edge AI and Vision Alliance
 
HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?NVIDIA Japan
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Koan-Sin Tan
 
HPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのHPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのNVIDIA Japan
 
データ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラデータ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラNVIDIA Japan
 
GPU Virtualization in SUSE
GPU Virtualization in SUSEGPU Virtualization in SUSE
GPU Virtualization in SUSELiang Yan
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Databricks
 
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜Takeo Imai
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Koan-Sin Tan
 
CUDAプログラミング入門
CUDAプログラミング入門CUDAプログラミング入門
CUDAプログラミング入門NVIDIA Japan
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門NVIDIA Japan
 
TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT   TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT Mia Chang
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architectureKhanh Le
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...Edge AI and Vision Alliance
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowNicholas McClure
 
Automatic Mixed Precision の紹介
Automatic Mixed Precision の紹介Automatic Mixed Precision の紹介
Automatic Mixed Precision の紹介Kuninobu SaSaki
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Kynetics
 

What's hot (20)

Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
Physics-ML のためのフレームワーク NVIDIA Modulus 最新事情
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?HPC 的に H100 は魅力的な GPU なのか?
HPC 的に H100 は魅力的な GPU なのか?
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite
 
HPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのHPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなの
 
データ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラデータ爆発時代のネットワークインフラ
データ爆発時代のネットワークインフラ
 
GPU Virtualization in SUSE
GPU Virtualization in SUSEGPU Virtualization in SUSE
GPU Virtualization in SUSE
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0
 
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜
DNNコンパイラの歩みと最近の動向 〜TVMを中心に〜
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
 
CUDAプログラミング入門
CUDAプログラミング入門CUDAプログラミング入門
CUDAプログラミング入門
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT   TensorFlow Lite for mobile & IoT
TensorFlow Lite for mobile & IoT
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
Automatic Mixed Precision の紹介
Automatic Mixed Precision の紹介Automatic Mixed Precision の紹介
Automatic Mixed Precision の紹介
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7
 

Similar to A Peek into Google's Edge TPU

Go & multi platform GUI Trials and Errors
Go & multi platform GUI Trials and ErrorsGo & multi platform GUI Trials and Errors
Go & multi platform GUI Trials and ErrorsYoshiki Shibukawa
 
Mozilla chirimen firefox os dwika v5
Mozilla chirimen firefox os dwika v5Mozilla chirimen firefox os dwika v5
Mozilla chirimen firefox os dwika v5Dwika Sudrajat
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf ToolsRaj Pandey
 
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...Edge AI and Vision Alliance
 
APIs in production - we built it, can we fix it?
APIs in production - we built it, can we fix it?APIs in production - we built it, can we fix it?
APIs in production - we built it, can we fix it?Martin Gutenbrunner
 
Insertable Streams and E2EE @ ClueCon2020
Insertable Streams and E2EE @ ClueCon2020Insertable Streams and E2EE @ ClueCon2020
Insertable Streams and E2EE @ ClueCon2020Lorenzo Miniero
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devs
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devsITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devs
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devsITCamp
 
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0César Hernández
 
Paving the way with Jakarta EE and apache TomEE at cloudconferenceday
Paving the way with Jakarta EE and apache TomEE at cloudconferencedayPaving the way with Jakarta EE and apache TomEE at cloudconferenceday
Paving the way with Jakarta EE and apache TomEE at cloudconferencedayCésar Hernández
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityScyllaDB
 
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo OmuraPreferred Networks
 
Flutter Festival - Session 1
Flutter Festival - Session 1Flutter Festival - Session 1
Flutter Festival - Session 1AmanVerma36049
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
Machine Learning in Google I/O 19
Machine Learning in Google I/O 19Machine Learning in Google I/O 19
Machine Learning in Google I/O 19Jeongkyu Shin
 
You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel" You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel" Peter Hlavaty
 
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...44CON
 
1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptxFEG
 

Similar to A Peek into Google's Edge TPU (20)

Go & multi platform GUI Trials and Errors
Go & multi platform GUI Trials and ErrorsGo & multi platform GUI Trials and Errors
Go & multi platform GUI Trials and Errors
 
Mozilla chirimen firefox os dwika v5
Mozilla chirimen firefox os dwika v5Mozilla chirimen firefox os dwika v5
Mozilla chirimen firefox os dwika v5
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
 
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...
“TensorFlow Lite for Microcontrollers (TFLM): Recent Developments,” a Present...
 
APIs in production - we built it, can we fix it?
APIs in production - we built it, can we fix it?APIs in production - we built it, can we fix it?
APIs in production - we built it, can we fix it?
 
A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
Insertable Streams and E2EE @ ClueCon2020
Insertable Streams and E2EE @ ClueCon2020Insertable Streams and E2EE @ ClueCon2020
Insertable Streams and E2EE @ ClueCon2020
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devs
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devsITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devs
ITCamp 2013 - Alessandro Pilotti - Git crash course for Visual Studio devs
 
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0
Pavimentando el Camino con Jakarta EE 9 y Apache TomEE 9.0.0
 
Paving the way with Jakarta EE and apache TomEE at cloudconferenceday
Paving the way with Jakarta EE and apache TomEE at cloudconferencedayPaving the way with Jakarta EE and apache TomEE at cloudconferenceday
Paving the way with Jakarta EE and apache TomEE at cloudconferenceday
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & Observability
 
20190423 meetup japan_public
20190423 meetup japan_public20190423 meetup japan_public
20190423 meetup japan_public
 
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
 
Flutter Festival - Session 1
Flutter Festival - Session 1Flutter Festival - Session 1
Flutter Festival - Session 1
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
Machine Learning in Google I/O 19
Machine Learning in Google I/O 19Machine Learning in Google I/O 19
Machine Learning in Google I/O 19
 
You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel" You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel"
 
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...
44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett...
 
1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx
 

More from Koan-Sin Tan

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsKoan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolKoan-Sin Tan
 
A Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowA Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowKoan-Sin Tan
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Koan-Sin Tan
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphonesKoan-Sin Tan
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on AndroidKoan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserKoan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchKoan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android BenchmarksKoan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08Koan-Sin Tan
 

A Peek into Google's Edge TPU

  • 1. A Peek into Google’s Edge TPU. Koan-Sin Tan, freedom@computer.org. April 18th, 2019, Hsinchu Coding Serfs Meeting
  • 2. Who Am I?
      • An old programmer who learned to use “open source” stuff on a VAX-11/780 running 4.3BSD, before the term “open source” was coined
      • TensorFlow contributor
      • Search for “Koan-Sin” at https://github.com/tensorflow/tensorflow/releases
      • PRs: https://github.com/tensorflow/tensorflow/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Afreedomtan+
      • Contributing to TensorFlow is quite easy. There are many typos :-)
      • Interested in using NNs on edge devices, so I learned TFLite
      • label_image for TFLite
  • 4. Google Edge TPU
      • Announced at Google Next 2018 (July 2018)
      • Available to general developers right before TensorFlow Dev Summit 2019 (March 2019)
      • USB: Coral Accelerator
      • Dev Board: Coral Dev Board
      • More are coming, e.g., PCI-E Accelerator and SOM
      • Supported framework: TFLite
      • https://coral.withgoogle.com/products/
  • 5. Updates released on April 11th, 2019
      • Compiler: removed the restriction to specific model architectures
      • New TensorFlow Lite C++ API
      • Updated Python API, mainly for multiple Edge TPUs
      • Updated Mendel OS and Mendel Development Tool (MDT)
      • Environmental Sensor Board, https://coral.withgoogle.com/products/environmental/
      • https://developers.googleblog.com/2019/04/updates-from-coral-new-compiler-and.html
      • https://coral.withgoogle.com/news/updates-04-2019/
  • 6. A biology hobbyist on the Edge TPU team?
      • https://en.wikipedia.org/wiki/Coral
      • https://en.wikipedia.org/wiki/Charles_Darwin
      • https://en.wikipedia.org/wiki/HMS_Beagle
      • https://en.wikipedia.org/wiki/Gregor_Mendel
  • 7. Coral USB Accelerator
      • USB 3.1 (gen 1) port and cable (SuperSpeed, 5 Gb/s transfer speed)
      • MobileNet V1 1.0 224 quantized: ~4.3 MiB, so the ideal transfer time is $4.3 \times 10^{6} \times 8 / (5 \times 10^{9})\ \text{s} \approx 6.9\ \text{ms}$
      • Recommended operating conditions

            Operating frequency    Max ambient temperature
            Default                35°C
            Maximum                25°C

      • https://coral.withgoogle.com/tutorials/accelerator-datasheet/
      • https://coral.withgoogle.com/tutorials/accelerator/
      • Software environment
      • Linux computer with a USB port
      • Debian 6.0 or higher, or any derivative thereof (such as Ubuntu 10.0+)
      • System architecture of either x86_64 or ARM64 with ARMv8 instruction set
      • Some caveats
      • USB 2.0 hurts
      • With newer Ubuntu, you have to modify the installation script
      • Actually, ARMv7 also works
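The back-of-envelope transfer time on this slide can be rechecked with a few lines of Python. This is only a sketch: it assumes the full raw signalling rate is available for payload (real USB throughput is lower because of encoding and protocol overhead), the 480 Mb/s USB 2.0 High Speed figure is added here for comparison and is not from the slide, and the function name is made up for illustration.

```python
def transfer_time_ms(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal time to move size_bytes over a link, in milliseconds."""
    return size_bytes * 8 / link_bits_per_s * 1e3

MODEL_BYTES = 4.3e6   # quantized MobileNet V1 1.0 224, size taken from the slide
USB3_GEN1 = 5e9       # 5 Gb/s SuperSpeed, from the slide
USB2_HS = 480e6       # 480 Mb/s USB 2.0 High Speed (assumption added for comparison)

usb3_ms = transfer_time_ms(MODEL_BYTES, USB3_GEN1)
usb2_ms = transfer_time_ms(MODEL_BYTES, USB2_HS)
print(f"USB 3.1 Gen 1: {usb3_ms:.2f} ms")
print(f"USB 2.0:       {usb2_ms:.2f} ms")
```

Under these idealized assumptions the same transfer takes roughly ten times longer over USB 2.0, which fits the "USB 2.0 hurts" caveat.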
  • 10. Coral Dev Board
      • Edge TPU Module (SOM)
          ◦ NXP i.MX 8M SOC (Quad-core Cortex-A53, plus Cortex-M4F)
          ◦ Google Edge TPU ML accelerator coprocessor
          ◦ Cryptographic coprocessor
          ◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5GHz)
          ◦ Bluetooth 4.1
          ◦ 8GB eMMC
          ◦ 1GB LPDDR4
      • USB connections
          ◦ USB Type-C power port (5V DC)
          ◦ USB 3.0 Type-C OTG port
          ◦ USB 3.0 Type-A host port
          ◦ USB 2.0 Micro-B serial console port
      • Audio connections
          ◦ 3.5mm audio jack (CTIA compliant)
          ◦ Digital PDM microphone (x2)
          ◦ 2.54mm 4-pin terminal for stereo speakers
      • Video connections
          ◦ HDMI 2.0a (full size)
          ◦ 39-pin FFC connector for MIPI DSI display (4-lane)
          ◦ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
      • MicroSD card slot
      • Gigabit Ethernet port
      • 40-pin GPIO expansion header
      • Supports Mendel Linux (derivative of Debian)
      • https://coral.withgoogle.com/tutorials/devboard-datasheet/
      • https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/
  • 11. Mendel Linux?
      • https://pypi.org/project/mendel-development-tool/
      • https://coral.googlesource.com/mdt.git
      • It returned 404 several weeks ago
      • Now it’s there
      • Actually, there is a lot more information at https://coral.googlesource.com/; let’s look at it later
  • 12. Mendel Linux
      • It’s a Debian-based distribution, so the apt tools can tell us many things
      • Take a look at /etc/apt/sources.list. Yup, it’s there
      • https://packages.cloud.google.com/apt/dists/mendel-bsp-enterprise-beaker/main
      • https://packages.cloud.google.com/apt/dists/mendel-beaker/main
  • 15. Let’s start from the first demo
      • USB getting started guide: https://coral.withgoogle.com/tutorials/accelerator/
      • BasicEngine -> {ClassificationEngine, DetectionEngine}, ImprintingEngine
      • BasicEngine is a single line: from edgetpu.swig.edgetpu_cpp_wrapper import BasicEngine
      • swig: yes, the more-than-20-year-old SWIG
      • _edgetpu_cpp_wrapper.so
  • 16. What’s in the Engines
      • BasicEngine: input and output related
      • BasicEngine members: RunInference(input), get_input_tensor_shape(), get_all_output_tensors_sizes(), get_num_of_output_tensors(), get_output_tensor_size(), required_input_array_size(), total_output_array_size(), model_path(), get_raw_output(), get_inference_time(), device_path(), __dict__, …
      • ClassificationEngine: still I/O related, plus classification-specific parts (resizing the input image and deciding what to output)
      • ClassificationEngine members: ClassifyWithImage(img, threshold=0.1, top_k=3, resample=Image.NEAREST), ClassifyWithInputTensor(input_tensor, threshold=0.0, top_k=3), __dict__, …
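The "what to output" part of a classification engine is essentially thresholding plus top-k selection over the raw scores. A minimal pure-Python sketch of that step follows. It is not the edgetpu library's implementation; the helper name `top_k_above_threshold` is hypothetical, and it assumes the raw output is a flat list of scores in [0, 1].

```python
import heapq

def top_k_above_threshold(scores, threshold=0.1, top_k=3):
    """Return up to top_k (class_id, score) pairs with score >= threshold, best first."""
    candidates = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    return heapq.nlargest(top_k, candidates, key=lambda pair: pair[1])

# Made-up scores standing in for an engine's raw output.
raw_output = [0.02, 0.61, 0.05, 0.25, 0.07]
result = top_k_above_threshold(raw_output, threshold=0.1, top_k=3)
print(result)
```

The default values mirror the `threshold` and `top_k` parameters shown in the `ClassifyWithImage` signature on the slide.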
  • 17. performance!
      • No existing way to reproduce those numbers
      • classify_image.py uses ClassificationEngine.ClassifyWithImage()
      • ClassifyWithImage() -> ClassifyWithInputTensor() -> RunInference()
      • Preprocessing: image resize time
      • Post-processing: top_k and finding labels/classes
      • BasicEngine.get_inference_time() returns something I cannot understand
      • Modified label_image.py (and object_detection) for TFLite
      • Quite close
      • https://github.com/freedomtan/edge_tpu_python_scripts
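One way to see why end-to-end numbers differ from raw inference time is to time preprocessing, inference, and post-processing separately. The sketch below does that with `time.perf_counter`. The three stage functions are placeholders invented for illustration, not edgetpu calls; on real hardware they would be replaced by the image resize, `RunInference()`, and the top-k step.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1e3

# Placeholders standing in for image resize, RunInference(), and top-k selection.
def fake_resize(image):
    return [p / 255.0 for p in image]

def fake_inference(tensor):
    return [sum(tensor) / len(tensor)] * 4

def fake_top_k(scores):
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:3]

image = list(range(256)) * 100
tensor, pre_ms = timed(fake_resize, image)
scores, infer_ms = timed(fake_inference, tensor)
labels, post_ms = timed(fake_top_k, scores)
total_ms = pre_ms + infer_ms + post_ms
print(f"pre {pre_ms:.3f} ms, inference {infer_ms:.3f} ms, "
      f"post {post_ms:.3f} ms, total {total_ms:.3f} ms")
```

Reporting the stages separately makes it clear which part of a published number a script is actually reproducing.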
  • 18. numbers in a git repo
      • Numbers and scripts

            Model (base name)                          <name>.tflite    <name>_edgetpu.tflite
            inception_v1_224_quant                         412.79              4.00
            inception_v4_299_quant                        3328.34            100.33
            mobilenet_ssd_v1_coco_quant_postprocess        391.34             14.83
            mobilenet_ssd_v2_coco_quant_postprocess        355.48             16.92
            mobilenet_ssd_v2_face_quant_postprocess        369.02              7.78
            mobilenet_v1_1.0_224_quant                     184.99              2.22
            mobilenet_v2_1.0_224_quant                     160.94              2.56

      • Benchmark scripts and reference numbers added to the repo:
      • benchmarks/basic_engine_benchmarks.py
      • benchmarks/classification_benchmarks.py
      • benchmarks/detection_benchmarks.py
      • benchmarks/imprinting_benchmarks.py
      • benchmarks/multiple_tpus_performance_analysis.py
      • benchmarks/reference/basic_engine_reference_aarch64.csv
      • benchmarks/reference/basic_engine_reference_rp3b+.csv
      • benchmarks/reference/basic_engine_reference_rp3b.csv
      • benchmarks/reference/basic_engine_reference_x86_64.csv
      • benchmarks/reference/classification_reference_aarch64.csv
      • benchmarks/reference/classification_reference_rp3b+.csv
      • benchmarks/reference/classification_reference_rp3b.csv
      • benchmarks/reference/classification_reference_x86_64.csv
      • benchmarks/reference/detection_reference_aarch64.csv
      • benchmarks/reference/detection_reference_rp3b+.csv
      • benchmarks/reference/detection_reference_rp3b.csv
      • benchmarks/reference/detection_reference_x86_64.csv
      • benchmarks/reference/imprinting_reference_aarch64.csv
      • benchmarks/reference/imprinting_reference_rp3b+.csv
      • benchmarks/reference/imprinting_reference_rp3b.csv
      • benchmarks/reference/imprinting_reference_x86_64.csv
      • https://coral.googlesource.com/edgetpu/+/refs/heads/release-chef
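The pairs of numbers on this slide are easier to compare as ratios. The sketch below copies the values from the slide and divides the number for each plain `.tflite` file by the number for its `_edgetpu.tflite` counterpart. It assumes both numbers in a pair are times in the same unit, which the slide does not state explicitly.

```python
# (plain .tflite, _edgetpu.tflite) numbers copied from the slide.
numbers = {
    "inception_v1_224_quant": (412.79, 4.00),
    "inception_v4_299_quant": (3328.34, 100.33),
    "mobilenet_ssd_v1_coco_quant_postprocess": (391.34, 14.83),
    "mobilenet_ssd_v2_coco_quant_postprocess": (355.48, 16.92),
    "mobilenet_ssd_v2_face_quant_postprocess": (369.02, 7.78),
    "mobilenet_v1_1.0_224_quant": (184.99, 2.22),
    "mobilenet_v2_1.0_224_quant": (160.94, 2.56),
}

# Ratio of the plain model's number to the Edge TPU-compiled model's number.
speedup = {name: plain / compiled for name, (plain, compiled) in numbers.items()}

for name, ratio in sorted(speedup.items(), key=lambda item: -item[1]):
    print(f"{name:45s} {ratio:6.1f}x")
```

With these values the ratio ranges from about 21x (SSD MobileNet V2 COCO) to about 103x (Inception V1).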
  • 19. Comparing with NCS 2

            Model                     Coral: Edge TPU    NCS 2 (fp16)    iPhone Xs Max (Neural Engine accelerated, fp16)
            MobileNet V1 1.0/224            2.74            12.11             1.74
            MobileNet V2 1.0/224            2.87            14.87             2.15
            Inception V3                   43.27            52.25             8.65
            ResNet 50                      42.41            33.1              6.91
            SqueezeNet 1.1                  1.90             3.99             1.75
            MobileNet V1 0.25/128           1.11             4.08             1.16
            SSD MobileNet V1 COCO          10.05            23.53             -
            SSD MobileNet V2 COCO          12.48            39.11             -

      • Mobilenet V1/V2 and SSD Mobilenet V1/V2 are quite good
      • Edge TPU: my scripts, https://github.com/freedomtan/edge_tpu_python_scripts
      • NCS 2: ./benchmark_app -d MYRIAD -niter 50 -nireq 10 ..
      • iPhone Xs Max: my CoreML benchmark, https://github.com/freedomtan/coremlbenchmark
  • 20. Mobilenet V1 on EdgeTPU and NCS2
      • [Chart: “Mobilenet V1: Edge TPU and NCS2”, y-axis time (ms) from 0 to 14, one series per row of the table below]

            inference time            size=128x128   size=160x160   size=192x192   size=224x224
            ncs2 mobilenet_v1_0.25        3.83           3.95           4.06           4.4
            ncs2 mobilenet_v1_0.5         4.98           4.86           5.51           6.51
            ncs2 mobilenet_v1_0.75        6.04           6.67           7.93           9.4
            ncs2 mobilenet_v1_1.0         7.43           8.68          10.13          12.2
            coral mobilenet_v1_0.25       1.07           1.24           1.30           1.47
            coral mobilenet_v1_0.5        1.16           1.40           1.53           1.95
            coral mobilenet_v1_0.75       1.29           1.70           1.80           2.16
            coral mobilenet_v1_1.0        1.50           1.95           2.15           2.85
  • 21. Edge TPU’s canned model
      • It’s said that the Edge TPU supports TFLite
      • Well, it does not run TFLite models directly
      • https://www.tensorflow.org/lite/images/convert/workflow.svg
      • https://coral.withgoogle.com/docs/edgetpu/models-intro/
  • 22. Edge TPU’s canned model
      • What do you mean by a single custom op?
      • The compiler creates a single custom op for all Edge TPU compatible ops; anything else stays the same (https://coral.withgoogle.com/docs/edgetpu/models-intro/)
      • [Figure: graph of the compiled MobileNet V1; labels shown: input, 1×224×224×3, edgetpu-custom-op, Softmax, 1×1001]
      • [Figure: graph of the compiled SSD MobileNet V1; labels shown: normalized_input_image_tensor, 1×300×300×3, edgetpu-custom-op, 1×1917×91, 1917×4, TFLite_Detection_PostProcess, outputs TFLite_Detection_PostProcess, :1, :2, :3 with shapes 1×10×4, 1×10, 1×10, 1]
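The "single custom op" idea can be pictured with a toy model of the rewrite. The sketch below is not the Edge TPU compiler: it treats a model as a flat list of op names, assumes that only the leading run of supported ops is folded into one `edgetpu-custom-op`, and uses an illustrative `SUPPORTED` set that is not the compiler's real list.

```python
def collapse_for_edgetpu(ops, supported):
    """Toy model: fold the leading run of supported ops into one custom op."""
    split = 0
    while split < len(ops) and ops[split] in supported:
        split += 1
    if split == 0:
        return list(ops)  # nothing to delegate; the graph is unchanged
    return ["edgetpu-custom-op"] + list(ops[split:])

# Illustrative set only, not the compiler's actual supported-op list.
SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "RESHAPE", "CONCATENATION", "LOGISTIC"}

# Hypothetical op sequence loosely shaped like an SSD detection model.
ssd_like = ["CONV_2D", "DEPTHWISE_CONV_2D", "CONV_2D", "RESHAPE",
            "CONCATENATION", "LOGISTIC", "TFLite_Detection_PostProcess"]

compiled = collapse_for_edgetpu(ssd_like, SUPPORTED)
print(compiled)
```

The result, one custom op followed by the untouched post-processing op, has the same shape as the SSD MobileNet V1 graph on the slide.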
  • 23. Beyond Python
      • _edgetpu_cpp_wrapper.so
      • TensorFlow Lite runtime and others
      • Let’s take a look at _wrap_new_BasicEngine: aiy::BasicEngine::BasicEngine()
      • aiy::BasicEngine::RunInference() -> aiy::BasicEngine::RunInferenceHelper() -> tflite::Interpreter::Invoke()
      • Unresolved edgetpu::EdgeTpuManager::GetSingleton()
      • libedgetpu.so
      • OpenSSL, Edge TPU context, communicating with the Edge TPU via USB or PCI
      • edgetpu::EdgeTpuManager::GetSingleton()
      • platforms::darwinn::tflite::EdgeTpuManagerDirect::GetSingleton()
  • 24. Edge TPU C++ API
      • Released on April 11th, 2019
      • Binaries for x86_64, aarch64, and armeabi-v7a
      • A simple header file
      • Two simple examples
      • Some docs at https://coral.withgoogle.com/docs/edgetpu/api-cpp/
      • Native build on Dev Board
      • The Dev Board is a quad-core Cortex-A53 board, so surely we can build code on it
      • A small aarch64 patch: https://github.com/tensorflow/tensorflow/commit/5520a9d82e5, https://github.com/tensorflow/tensorflow/pull/16175
      • https://github.com/freedomtan/edgetpu-native, label_image for tflite ported
  • 25. Edge TPU C++ API
      • class EdgeTpuManager
      • static EdgeTpuManager* GetSingleton();
      • 3 different std::unique_ptr<EdgeTpuContext> NewEdgeTpuContext()
      • std::vector<DeviceEnumerationRecord> EnumerateEdgeTpu()
      • TfLiteStatus SetVerbosity(int verbosity)
      • std::string Version()
      • Let’s take a look at ‘-v’ logs
      • https://drive.google.com/drive/folders/1-MhGIgWHuhbKM6XrhPqyuLJDzoLD1t2g?usp=sharing
      • In short, USB ones seem to have more overhead
      • https://github.com/freedomtan/edgetpu-native/blob/label_image/libedgetpu/edgetpu.h#L110-L158
  • 26. Imprinting Engine
      • Yes, let’s check what it is
      • The Imprinting Engine implements a low-shot learning technique called ‘Imprinted Weights’ [1][2]
      • Can be used to retrain classifiers on-device (either on USB Accelerator or Dev Board), with no back-propagation of gradients involved
      • Why?
      • Transfer learning happens on-device, at near-realtime speed
      • You don't need to recompile the model
      • Limitations
      • Training data size is limited to a max of 200 images per class
      • It is most suitable for datasets with small intra-class variation
      • The last fully-connected layer runs on the CPU, not the Edge TPU, so it will be slightly less efficient than running a pre-compiled model on the Edge TPU
      • If you are interested in it, check the paper and aiy::learn::imprinting::ImprintingEngine::Train(unsigned char const*, int, int)
      • [Figure: model graph; labels shown: input, 1×224×224×3, edgetpu-custom-op, 1×1×1×1024, L2Normalization, 1×1×1×1024, Conv2D (weights 5×1×1×1024, bias 5), 1×1×1×5, Reshape, 1×5, Softmax, 1×5, Output]
      • [Figure: model graph; labels shown: input, 1×224×224×3, edgetpu-custom-op, AvgPool, 1×1×1×1024]
      • [1] https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/
      • [2] https://arxiv.org/abs/1712.07136
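The core idea of imprinted weights [2] fits in a few lines: the weight vector for a class is the L2-normalized mean of the L2-normalized embeddings of its training examples, and prediction picks the class with the highest cosine similarity. The pure-Python sketch below illustrates that idea only. It is not the `ImprintingEngine` code; the tiny 3-dimensional embeddings and the class names are made up (the slide's model uses 1024-dimensional embeddings), and zero-length vectors are not handled.

```python
import math

def l2_normalize(vec):
    """Scale vec to unit length (assumes vec is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def imprint(embeddings_by_class):
    """One weight vector per class: normalized mean of normalized embeddings."""
    weights = {}
    for label, embeddings in embeddings_by_class.items():
        normed = [l2_normalize(e) for e in embeddings]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        weights[label] = l2_normalize(mean)
    return weights

def classify(weights, embedding):
    """Pick the class whose weight vector has the highest cosine similarity."""
    query = l2_normalize(embedding)
    return max(weights,
               key=lambda label: sum(w * q for w, q in zip(weights[label], query)))

# Made-up embeddings; in practice these come from the embedding extractor.
train = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "dog": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.1]],
}
weights = imprint(train)
print(classify(weights, [0.8, 0.3, 0.0]))
```

Because adding a class is only an averaging step, no gradient computation is needed, which is why retraining can run on-device at near-realtime speed.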
  • 27. PCIe device?
      • It’s Linux
      • `uname -a`: Linux hopeful-nexus 4.9.51-imx #1 SMP PREEMPT Thu Jan 31 01:58:26 UTC 2019 aarch64 GNU/Linux
      • There is /proc/config.gz
      • $ zcat /proc/config.gz | grep -i edge
      • CONFIG_SND_GOOGLE_EDGETPU_CARD=y
  • 28. PCIe Device
      • The apex driver is in gasket
      • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/gasket
      • It was already upstreamed last year
      • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/staging/gasket/apex_driver.c
  • 29. Global Unichip Corp
      • USB vendor ID 0x1a6e = “Global Unichip Corp”
      • PCI vendor ID 0x1ac1 = “Global Unichip Corp”
  • 31. MCU on USB Accelerator
      • https://www.seeedstudio.com/Coral-USB-Accelerator-p-2899.html
  • 32. Power Consumption of the USB Accelerator
      • 4.94 V × 0.18 A ≈ 0.9 W
      • Measured while running Mobilenet-SSD
      • https://twitter.com/exsiva/status/1108692847719407616
  • 33. Architecture of Edge TPU?
      • Nope, I didn’t read it. Just FYR
      • https://patents.google.com/patent/US20190050717A1/
  • 34. Concluding Remarks
      • Edge TPU is quite good for small models that you can convert to canned ones
      • Quantized UINT8
      • Not so good for some common larger models, e.g., Inception V3 and ResNet 50
      • Your USB and CPU could be problems
      • On-device re-training looks promising
      • NCS 2 supports many more models for now
      • How about NVIDIA Jetson Nano? Dunno, let’s wait and see. I don’t believe GPUs will win in the long run.