Introduction to TensorFlow Lite
1. What I Know about TensorFlow Lite
Koan-Sin Tan
freedom@computer.org
Hsinchu Coding Serfs Meeting
Dec 7th, 2017
2. • We heard about Android NN and TensorFlow Lite back in Google I/O 2017
• My COSCUP 2017 slide deck “TensorFlow on Android”
• https://www.slideshare.net/kstan2/tensorflow-on-android
• People knew a bit about the Android NN API before it was announced and released
• There was no information about TensorFlow Lite, at least to me, before it was released in November 2017
3. tf-lite and Android NN in Google I/O
• New TensorFlow runtime
• Optimized for mobile and embedded apps
• Runs TensorFlow models on device
• Leverages the Android NN API
• Soon to be open sourced
from the Google I/O 2017 video
4. Actual Android NN API
• Announced with Android 8.1 Preview 1
• Available to developers in the NDK
• yes, the NDK
• The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive operations for machine learning on mobile devices
• NNAPI is designed to provide a base layer of functionality for higher-level machine learning frameworks (such as TensorFlow Lite, Caffe2, or others) that build and train neural networks
• The API is available on all devices running Android 8.1 (API level 27) or higher
https://developer.android.com/ndk/images/nnapi/nnapi_architecture.png
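To make the "C API" point concrete, here is a minimal sketch (mine, not from the slides) of the NNAPI model/compilation/execution flow from <android/NeuralNetworks.h> on API level 27. The single-ADD graph, the 4-element shape, and the function RunAdd are made up for illustration; error handling is omitted.

// Minimal NNAPI flow (API level 27): model -> compilation -> execution.
// Builds a graph with one ADD of two 1-D float tensors (illustrative only).
#include <android/NeuralNetworks.h>

void RunAdd(const float* a, const float* b, float* out) {
  ANeuralNetworksModel* model = nullptr;
  ANeuralNetworksModel_create(&model);

  uint32_t dims[1] = {4};
  ANeuralNetworksOperandType tensor_type;
  tensor_type.type = ANEURALNETWORKS_TENSOR_FLOAT32;
  tensor_type.dimensionCount = 1;
  tensor_type.dimensions = dims;
  tensor_type.scale = 0.0f;
  tensor_type.zeroPoint = 0;

  ANeuralNetworksOperandType scalar_type;
  scalar_type.type = ANEURALNETWORKS_INT32;
  scalar_type.dimensionCount = 0;
  scalar_type.dimensions = nullptr;
  scalar_type.scale = 0.0f;
  scalar_type.zeroPoint = 0;

  ANeuralNetworksModel_addOperand(model, &tensor_type);  // operand 0: input a
  ANeuralNetworksModel_addOperand(model, &tensor_type);  // operand 1: input b
  ANeuralNetworksModel_addOperand(model, &scalar_type);  // operand 2: fused activation
  ANeuralNetworksModel_addOperand(model, &tensor_type);  // operand 3: output

  int32_t act = ANEURALNETWORKS_FUSED_NONE;
  ANeuralNetworksModel_setOperandValue(model, 2, &act, sizeof(act));

  uint32_t op_inputs[3] = {0, 1, 2};
  uint32_t op_outputs[1] = {3};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD,
                                    3, op_inputs, 1, op_outputs);

  uint32_t model_inputs[2] = {0, 1};
  uint32_t model_outputs[1] = {3};
  ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, model_inputs,
                                                1, model_outputs);
  ANeuralNetworksModel_finish(model);

  ANeuralNetworksCompilation* compilation = nullptr;
  ANeuralNetworksCompilation_create(model, &compilation);
  ANeuralNetworksCompilation_finish(compilation);

  ANeuralNetworksExecution* execution = nullptr;
  ANeuralNetworksExecution_create(compilation, &execution);
  ANeuralNetworksExecution_setInput(execution, 0, nullptr, a, 4 * sizeof(float));
  ANeuralNetworksExecution_setInput(execution, 1, nullptr, b, 4 * sizeof(float));
  ANeuralNetworksExecution_setOutput(execution, 0, nullptr, out, 4 * sizeof(float));

  // Computation is asynchronous; wait on the returned event.
  ANeuralNetworksEvent* event = nullptr;
  ANeuralNetworksExecution_startCompute(execution, &event);
  ANeuralNetworksEvent_wait(event);

  ANeuralNetworksEvent_free(event);
  ANeuralNetworksExecution_free(execution);
  ANeuralNetworksCompilation_free(compilation);
  ANeuralNetworksModel_free(model);
}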
5. Android NN on Pixel 2
• Only the CPU fallback is available
• Actually, you can already see Android NN API-related code in AOSP after the Oreo MR1 (8.1) release
• user-level code, see https://android.googlesource.com/platform/frameworks/ml/+/oreo-mr1-release
• HAL, see https://android.googlesource.com/platform/hardware/interfaces/+/oreo-mr1-release/neuralnetworks/
6. TensorFlow Lite
• TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices
• It enables on-device machine learning inference with low latency and a small binary size
• Low latency techniques: optimizing the kernels for mobile apps, pre-fused activations, and quantized kernels that allow smaller and faster (fixed-point math) models
• TensorFlow Lite also supports hardware acceleration with the Android Neural Networks API
https://www.tensorflow.org/mobile/tflite/
7. What does TensorFlow Lite contain?
• a set of core operators, both quantized and float, which have been tuned for mobile platforms
• pre-fused activations and biases to further enhance performance and quantized accuracy
• using custom operations in models is also supported
• a new model file format, based on FlatBuffers
• the primary difference is that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data (see the sketch after this list)
• the code footprint of FlatBuffers is an order of magnitude smaller than that of protocol buffers
• a new mobile-optimized interpreter
• key goals: keeping apps lean and fast
• a static graph ordering and a custom (less-dynamic) memory allocator to ensure minimal load, initialization, and execution latency
• an interface to the Android NN API, if available
https://www.tensorflow.org/mobile/tflite/
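To illustrate the "no parsing/unpacking" point above: with FlatBuffers you read the model in place, straight out of a buffer (e.g., an mmap'd .tflite file). A sketch, assuming the generated schema header under ${TF_ROOT}/tensorflow/contrib/lite/schema/; the accessor names come from the schema, the function InspectModel is mine:

// Sketch: FlatBuffers gives direct, in-place access; no deserialization pass.
#include <cstdio>
#include "tensorflow/contrib/lite/schema/schema_generated.h"

void InspectModel(const void* buffer) {  // e.g., an mmap'd .tflite file
  // GetModel is essentially a pointer adjustment, not a parse.
  const tflite::Model* model = tflite::GetModel(buffer);
  printf("schema version: %u\n", model->version());
  printf("subgraphs: %u\n", model->subgraphs()->size());
  printf("operator codes: %u\n", model->operator_codes()->size());
}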
8. why a new mobile-specific library?
• Innovation at the silicon layer is enabling new possibilities for hardware acceleration, and frameworks such as the Android Neural Networks API make it easy to leverage these
• Recent advances in real-time computer vision and spoken language understanding have led to mobile-optimized benchmark models being open sourced (e.g. MobileNets, SqueezeNet)
• Widely available smart appliances create new possibilities for on-device intelligence
• Interest in stronger user data privacy paradigms where user data does not need to leave the mobile device
• Ability to serve ‘offline’ use cases, where the device does not need to be connected to a network
https://www.tensorflow.org/mobile/tflite/
9. • A set of core operators, both quantized and float, many of which have been tuned for mobile platforms. These can be used to create and run custom models. Developers can also write their own custom operators and use them in models
• A new FlatBuffers-based model file format
• On-device interpreter with kernels optimized for faster execution on mobile
• TensorFlow converter to convert TensorFlow-trained models to the TensorFlow Lite format
• Smaller in size: TensorFlow Lite is smaller than 300KB when all supported operators are linked, and less than 200KB when using only the operators needed for InceptionV3 and MobileNet
• Pre-tested models
• Inception V3, MobileNet, On Device Smart Reply
• Quantized versions of the MobileNet model, which run faster than the non-quantized (float) version on CPU
• New Android demo app to illustrate the use of TensorFlow Lite with a quantized MobileNet model for object classification
• Java and C++ API support
https://www.tensorflow.org/mobile/tflite/
10. • Java API: A convenience wrapper around the C++ API on Android
• C++ API: Loads the TensorFlow Lite Model File and invokes the Interpreter. The same library is available on both Android and iOS
https://www.tensorflow.org/mobile/tflite/
11. • Let $TF_ROOT be the root of the TensorFlow tree
• source of tf-lite: ${TF_ROOT}/tensorflow/contrib/lite/
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/README.md
• examples
• two for Android, two for iOS
• APIs: ${TF_ROOT}/tensorflow/contrib/lite/g3doc/apis.md, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/apis.md
• no benchmark_model: well, there is one, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/tools/benchmark_model.cc
• but it’s incomplete
• no command-line label_image (like https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/label_image)
12. • model: a .tflite model
• resolver: if there are no custom ops, the builtin op resolver is enough
• interpreter: we need it to compute the graph
• interpreter->AllocateTensors(): allocates stuff for you, e.g., input tensor(s)
• fill the input
• interpreter->Invoke(): run the graph
• process the output
// Load the model, build an interpreter with the builtin op resolver.
tflite::FlatBufferModel model(path_to_model);
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(model, resolver)(&interpreter);
// Resize input tensors, if desired.
interpreter->AllocateTensors();
float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.
interpreter->Invoke();
float* output = interpreter->typed_output_tensor<float>(0);
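The "resize input tensors, if desired" step can be made concrete with ResizeInputTensor() from interpreter.h. A sketch; the 1x224x224x3 shape is just an illustrative MobileNet-style input, not something the slides prescribe:

// Optional: override an input shape before allocating buffers.
int input_index = interpreter->inputs()[0];
interpreter->ResizeInputTensor(input_index, {1, 224, 224, 3});
interpreter->AllocateTensors();  // buffers must be (re)allocated after a resize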
13. beyond basic stuff
• https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/interpreter.h
• const char* GetInputName(int index): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/interpreter.h#L157
• const char* GetOutputName(int index): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/interpreter.h#L166
• int tensors_size(): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/interpreter.h#L171
• TfLiteTensor* tensor(int tensor_index)
• int nodes_size(): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/interpreter.h#L174
• const std::pair<TfLiteNode, TfLiteRegistration>* node_and_registration(int node_index)
• Yes, we should be able to enumerate/traverse tensors and nodes (see the sketch below)
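A sketch (mine, not from the slides) of such a traversal using the accessors listed above; the TfLiteTensor fields used (name, bytes) and TfLiteRegistration::builtin_code are assumed from context.h:

// Walk all tensors and nodes of a loaded interpreter.
#include <cstdio>
#include "tensorflow/contrib/lite/interpreter.h"

void Traverse(tflite::Interpreter* interpreter) {
  for (int i = 0; i < interpreter->tensors_size(); ++i) {
    TfLiteTensor* t = interpreter->tensor(i);
    printf("tensor %d: %s, %zu bytes\n", i,
           t->name ? t->name : "(unnamed)", t->bytes);
  }
  for (int i = 0; i < interpreter->nodes_size(); ++i) {
    const std::pair<TfLiteNode, TfLiteRegistration>* nr =
        interpreter->node_and_registration(i);
    printf("node %d: builtin_code %d\n", i, nr->second.builtin_code);
  }
}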
15. • logs of running label_image
• https://drive.google.com/file/d/11LAI_b1fVOM2GxOT_gOpOMpASaqe5m_U/view?usp=sharing
• builtin state dump function (usage sketched after this list)
• void PrintInterpreterState(Interpreter* interpreter): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/optional_debug_tools.h#L25
• https://github.com/freedomtan/tensorflow/blob/label_image_tflite_pr/tensorflow/contrib/lite/examples/label_image/label_image.cc#L147
• converting TF operations to TF Lite operations is not trivial
• https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/tf_ops_compatibility.md
• https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/nnapi_delegate.cc
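Calling the state dump is a one-liner. A sketch, assuming an interpreter built as on slide 12:

// Dump tensor/node state, e.g., right after AllocateTensors().
#include "tensorflow/contrib/lite/optional_debug_tools.h"

tflite::PrintInterpreterState(interpreter.get());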
16. speed of the quantized model
• It seems to be much better than the naive quantization we saw before
• On Nexus 9 (MobileNet 1.0/224)
• Quantized
• ./label_image -t 2: ~ 160 ms
• ./label_image -t 2 -c 100: ~ 60 ms
• Floating point
• ./label_image -t 2 -f 1 -m ./mobilenet_v1_1.0_224.tflite: ~ 300 ms
• ./label_image -t 2 -c 100 -f 1 -m ./mobilenet_v1_1.0_224.tflite: ~ 82 ms
• TFLiteCameraDemo: 130 - 180 ms
• Pixel 2
• TFLiteCameraDemo: ~ 100 ms
• I didn’t see a significant difference with or without the Android NN runtime
18. Fake Quantization
• How hard can it be? How much time is needed?
• Several pre-tested models are available
• https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/models.md
• but only one of them (https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_224_android_quant_2017_11_08.zip) is a quantized one
• as we can guess from the related docs, retraining is kinda required to get the accuracy back
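For reference, the quantization scheme behind these models, as documented by gemmlowp, is the affine mapping real = scale × (quantized − zero_point); fake quantization simulates its rounding in float during training so the network can recover accuracy. A sketch of that round trip (the helper name FakeQuantize and the fixed uint8 range are my illustration):

// Affine quantization: real = scale * (quantized - zero_point).
// Fake quantization runs this quantize/dequantize round trip in float during
// training, so learned weights already tolerate the rounding error.
#include <algorithm>
#include <cmath>
#include <cstdint>

float FakeQuantize(float real, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::round(real / scale)) + zero_point;
  q = std::min(255, std::max(0, q));  // clamp to the uint8 range
  return scale * (q - zero_point);    // dequantize back to float
}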
19. • BLAS part: eigen (http://eigen.tuxfamily.org/) and gemmlowp (https://github.com/google/gemmlowp)