Speculating about what happens when TensorFlow Lite runs on the Google Edge TPU

TFUG Hardware Group: Jetson Nano, Edge TPU & TF Lite micro special
@ Google

Created: 2019/6/2
Published on SlideShare: 2019/6/10
@Vengineer

Blog (since 2007): Vengineerの戯言
  http://blogs.yahoo.co.jp/verification_engineer

SlideShare:
  https://www.slideshare.net/ssuser479fa3

Twitter (since 2009):
@Vengineer
Source code analysis craftsman

From the Online Compiler, which was the Google Edge TPU's blind spot,
to an Offline Compiler!
 
https://coral.withgoogle.com/docs/edgetpu/compiler/

I tried compiling official TensorFlow models:
the following four from Hosted Models

  ・mnasnet_0.5_224.tflite
  ・mobilenet_v2_1.0_224_quant.tflite
  ・inception_v1_224_quant.tflite
  ・detect.tflite

AutoML mobile models : mnasnet_0.5_224.tflite 
　　 
Edge TPU Compiler version 1.0.249710469 
INFO: Initialized TensorFlow Lite runtime. 
Invalid model: mnasnet_0.5_224.tflite 
Model not quantized

  Models that are not quantized are rejected!
    => quantization-aware training
Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference
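
To make the "must be quantized" requirement concrete, here is a minimal sketch of the affine 8-bit scheme described in the paper above, where a real value r is represented by an integer code q as r = scale * (q - zero_point). The struct, the function names, and the example parameters are illustrative assumptions; this is not code from TensorFlow Lite or the Edge TPU compiler.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical container for per-tensor quantization parameters.
struct QuantParams {
  float scale;     // real-value step between adjacent integer codes
  int zero_point;  // integer code that represents real 0.0
};

// Map a real value to an 8-bit code: q = round(r / scale) + zero_point,
// saturated to the range [0, 255].
std::uint8_t Quantize(float r, const QuantParams& p) {
  long q = std::lround(r / p.scale) + p.zero_point;
  q = std::max(0L, std::min(255L, q));
  return static_cast<std::uint8_t>(q);
}

// Recover the approximate real value: r = scale * (q - zero_point).
float Dequantize(std::uint8_t q, const QuantParams& p) {
  return p.scale * static_cast<float>(static_cast<int>(q) - p.zero_point);
}
```

With a scale of 0.5 and a zero point of 128, the real value 1.0 is stored as code 130, and values outside the representable range saturate at 0 or 255.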

mobilenet_v2_1.0_224_quant.tflite

  ・Input model size : 3.41MB
  ・Output model size : 3.89MB
  ・Available for caching : 6.53MB
  ・On-chip memory : 3.75MB
  ・Off-chip memory : 0.00B

  ・Subgraphs : 1, Ops : 65

inception_v1_224_quant.tflite 
　　 
　・Off-chip memory : 182.19KB 
 

Parameter data caching 
Quote:
The Edge TPU has roughly 8 MB of SRAM that
can cache the model's parameter data.  
 
However, a small amount of the RAM is first reserved for the
model's inference executable, so the parameter data uses
whatever space remains after that.

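The chip has roughly 8 MB of internal SRAM for model parameters.

However, part of it is taken first by the model's inference executable,
so not all 8 MB can be used for model parameters.

  ・Off-chip memory : 0.00B (mobilenet_v2_1.0_224_quant.tflite) => Safe
  ・Off-chip memory : 182.19KB (inception_v1_224_quant.tflite) => Out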
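
The caching behavior quoted above can be sketched as a simple budget calculation. This assumes, purely for illustration, that parameter data fills the available on-chip cache first and that any remainder has to be fetched from off-chip memory. How the real compiler allocates memory is not documented in this deck, and all names below are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical result of splitting parameter data between the two memories.
struct ParamPlacement {
  std::uint64_t on_chip_bytes;
  std::uint64_t off_chip_bytes;
};

// Assumed policy: fill the on-chip cache first, spill the rest off-chip.
// `cache_bytes` is the SRAM left after the inference executable is placed.
ParamPlacement PlaceParameters(std::uint64_t param_bytes,
                               std::uint64_t cache_bytes) {
  const std::uint64_t on_chip = std::min(param_bytes, cache_bytes);
  return ParamPlacement{on_chip, param_bytes - on_chip};
}

// A model is "safe" in the sense used above when nothing spills off-chip.
bool FitsOnChip(const ParamPlacement& p) { return p.off_chip_bytes == 0; }
```

For the MobileNet V2 result above (3.75MB of parameters against 6.53MB available for caching), this policy keeps everything on-chip, which is consistent with the reported Off-chip memory of 0.00B.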

Continued

Naturally, saving the parameter data on the Edge
TPU RAM enables faster inferencing speed
compared to fetching the parameter data from
external memory.

  => Probably the host's system memory

How is the model converted?

  ・Subgraphs : 1, Ops : 65

mobilenet_v2_1.0_224_quant_edgetpu.tflite
inception_v1_224_quant_edgetpu.tflite

std::unique_ptr<tflite::Interpreter> BuildEdgeTpuInterpreter(
    const tflite::FlatBufferModel& model,
    edgetpu::EdgeTpuContext* edgetpu_context) {
  tflite::ops::builtin::BuiltinOpResolver resolver;
  // Register the Edge TPU custom op next to the built-in ops.
  resolver.AddCustom(edgetpu::kCustomOp, edgetpu::RegisterCustomOp());
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(model, resolver)(&interpreter) != kTfLiteOk) {
    std::cerr << "Failed to build interpreter." << std::endl;
  }
  // Bind given context with interpreter.
  interpreter->SetExternalContext(kTfLiteEdgeTpuContext, edgetpu_context);
  interpreter->SetNumThreads(1);
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    std::cerr << "Failed to allocate tensors." << std::endl;
  }
  return interpreter;
}
 
https://coral.googlesource.com/edgetpu-native/+/refs/heads/release-chef/edgetpu/cpp/examples/utils.cc#181

// EdgeTPU custom op.
static const char kCustomOp[] = "edgetpu-custom-op";

  As with TensorFlow XLA,
  the ops seem to be fused into a single op,
  with something happening inside it. (Speculation)
 
https://coral.googlesource.com/edgetpu-native/+/refs/heads/release-chef/libedgetpu/edgetpu.h#95

Object Detection : detect.tflite 
　　 
 

Model successfully compiled but not all operations are
supported by the Edge TPU. A percentage of the model will
instead run on the CPU, which is slower. If possible, consider
updating your model to use only operations supported by the
Edge TPU.  
 
For details, visit g.co/coral/model-reqs. 
Number of operations that will run on Edge TPU: 63 
Number of operations that will run on CPU: 1

detect_edgetpu.log 
　　 
Operator            Count   Status
DEPTHWISE_CONV_2D   13      Mapped to Edge TPU
RESHAPE             13      Mapped to Edge TPU
LOGISTIC            1       Mapped to Edge TPU
CUSTOM              1       Operation is working on an unsupported data type
CONCATENATION       2       Mapped to Edge TPU
CONV_2D             34      Mapped to Edge TPU
 
Currently, the Edge TPU compiler cannot partition the model more than once, so
as soon as an unsupported operation occurs, that operation and everything after it
executes on the CPU, even if supported operations occur later.
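
The single-partition rule quoted above can be sketched as follows. The sketch assumes the graph is given as a linear, execution-ordered list of ops, and that the supported prefix is collapsed into one "edgetpu-custom-op" node, as the compiled models suggest. The types and the function name are hypothetical, and this is not the compiler's actual algorithm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one op in an execution-ordered graph.
struct Op {
  std::string name;
  bool supported;  // true if the Edge TPU can run this op
};

// Hypothetical summary of the compiled model.
struct CompiledModel {
  std::vector<std::string> ops;  // op names after compilation
  std::size_t on_tpu = 0;        // original ops mapped to the Edge TPU
  std::size_t on_cpu = 0;        // original ops left on the CPU
};

// Partition once: everything before the first unsupported op goes to the
// Edge TPU as a single custom op; that op and everything after it stay on
// the CPU, even if some of the later ops are supported.
CompiledModel PartitionOnce(const std::vector<Op>& graph) {
  CompiledModel out;
  std::size_t i = 0;
  while (i < graph.size() && graph[i].supported) ++i;
  out.on_tpu = i;
  out.on_cpu = graph.size() - i;
  if (i > 0) out.ops.push_back("edgetpu-custom-op");
  for (; i < graph.size(); ++i) out.ops.push_back(graph[i].name);
  return out;
}
```

Under this rule, the result for detect.tflite (63 ops on the Edge TPU, 1 on the CPU) is consistent with the single unsupported CUSTOM op being the last op in the graph.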

Confirmation from @PINTO03091:
https://twitter.com/PINTO03091/status/1135098223989682176
https://twitter.com/PINTO03091/status/1135099238352711681

"TFLite_Detection_PostProcess" is offloaded to the CPU.

https://coral.withgoogle.com/docs/edgetpu/models-intro/#model-requirements

Summary

  To use the Google Edge TPU, quantize the model, and:

  ・keep model + parameters within 8MB

  ・use only ops that the Edge TPU supports

  Models that break these rules will not perform well.

  Quantization => quantization-aware training

@PINTO03091,
  thank you for all your help.

I am not a deep learning craftsman.
I am a computer engineer.


Thank you
@Vengineer
Source code analysis craftsman
