2. • disclaimer: opinions are my own
• feel free to interrupt me if you have questions during the presentation
• questions can be in Taiwanese, English, or Mandarin
3. • Used open source before the term “open source” was coined
• A software guy; learned to use Unix and open source software on a VAX-11/780 running 4.3BSD
• Used to be a programming language junkie
• Worked on various system software, e.g., CPU scheduling and power management of non-CPU components
• Recently, working on NN performance on edge devices
• Contributed from time to time to TensorFlow Lite
• started a command-line label_image for TFLite
Who I am
https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
4. The VAX-11/780 CPU consists of TTL ICs
https://en.wikipedia.org/wiki/Transistor%E2%80%93transistor_logic https://en.wikipedia.org/wiki/7400-series_integrated_circuits
5. Why TFLite?
• TensorFlow Lite
• TensorFlow is one of the most popular machine learning frameworks
• TFLite: a lightweight runtime for edge devices
• originally mobile devices —> mobile and IoT/embedded devices
• could be accelerated by GPU, DSP, or ASIC accelerator
• How about PyTorch?
• yes, it is popular, but not yet on mobile devices
• Yes, there are other open source NN frameworks, but none is as comprehensive as TFLite, as far as I can tell
• See my talk slide deck at COSCUP 2019 for more discussion, https://www.slideshare.net/kstan2/status-quo-of-tensor-flow-lite-on-edge-devices-coscup-2019
6. Outline
• Overview of TFLite on Android and iOS devices,
• TFLite metadata and TFLite Android code generator,
• Some new features: CoreML delegate and XNNPACK delegate
7. What is TensorFlow Lite
• TensorFlow Lite is a cross-platform framework for deploying ML on mobile
devices and embedded systems
• Mobile devices -> mobile and IoT/embedded devices
• TFLite for Android and iOS
• TFLu: TFLite micro for micro-controllers
8. Why ML on Edge devices
• Low latency & close-knit interactions
• “There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed — you can't bribe God.” -- David D. Clark
• network connectivity
• you probably heard “always-on” back in the 3G days; you know that’s still not true in the 5G era
• privacy preserving
• sensors
9. from TF Dev Summit 2020, https://youtu.be/27Zx-4GOQA8
10. • We’ll talk about
• TFLite metadata and codegen, which are in the TFLite Support library
• two delegates which enable using hardware capabilities
• Other things you may want to dig into
• quantization: fixed-point, integer arithmetic
• the ARM dot product instructions, Apple A13 matrix operations in CPUs (yes, CPUs)
• the GPU delegate started supporting quantized models a couple of months ago
• GPUs usually support fp16 first
• new MLIR-based runtimes, such as TFRT and IREE
• I’ll talk a little bit about TFRT tomorrow
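As a taste of the quantization topic in the list above, here is a minimal NumPy sketch of the affine (uniform) scheme TFLite's quantized models use, real_value = scale * (quantized_value - zero_point); the particular scale and zero_point values below are illustrative, not from any real model:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Affine-quantize float32 values to uint8: q = round(x / scale) + zero_point
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Recover approximate float32 values: x ~= scale * (q - zero_point)
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 2.0 / 255.0, 128  # maps roughly [-1, 1] onto [0, 255]
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

The round trip loses at most about one quantization step of precision, which is why the choice of scale/zero_point per tensor matters so much for quantized model accuracy.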
11. So how to start using TFLite
• TFLite actually has two main parts
• interpreter: loads and runs a model on various hardware
• converter: converts TF models to a TFLite-specific format to be used by the interpreter
• see https://www.tensorflow.org/lite/guide for more introduction materials
• There is a good guide on how to load a model and do inference on devices
using TFLite interpreter, in Java, Swift, Objective-C, C++, and Python
• https://www.tensorflow.org/lite/guide/inference
12. load and run a model in C++
other APIs are wrappers around C++ code
https://www.tensorflow.org/lite/guide/inference
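Since the other language APIs wrap the same flow, a Python sketch of load-and-run may help. The toy model (y = 2x + 1) and its shapes are made up so the snippet is self-contained; normally you would pass model_path= pointing at a .tflite file instead of converting in memory:

```python
import numpy as np
import tensorflow as tf

# Build a toy model and convert it to the TFLite flatbuffer format in memory,
# just so this example is self-contained.
class Toy(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return 2.0 * x + 1.0

toy = Toy()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [toy.__call__.get_concrete_function()], toy)
tflite_model = converter.convert()

# The load-and-run flow: create an interpreter, allocate tensors,
# set the input, invoke, read the output.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.ones((1, 4), dtype=np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```

The Java, Swift, and Objective-C APIs follow the same allocate / set input / invoke / get output sequence around the C++ core.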
14. TFLite Metadata
• before TFLite Metadata was introduced, when we loaded and ran a model
• it was the user’s/developer’s responsibility to figure out what the input and output tensors are. E.g.,
• we know an image classifier usually expects preprocessed (resized, cropped, padded, etc.) and normalized ([0, 1] or [-1, 1]) data
• the label file is not included in the model
• in TFLite metadata, there are three parts in the schema:
• Model information - Overall description of the model as well as items such as licence terms.
See ModelMetadata.
• Input information - Description of the inputs and pre-processing required such as normalization.
See SubGraphMetadata.input_tensor_metadata.
• Output information - Description of the output and post-processing required such as mapping to labels.
See SubGraphMetadata.output_tensor_metadata.
https://www.tensorflow.org/lite/convert/metadata
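The normalization parameters described by the input metadata are the classic mean/std pair, normalized = (pixel - mean) / std. A minimal NumPy sketch of what a model consumer does with them; the mean/std values below are the two common conventions from the slide above, not from any particular model:

```python
import numpy as np

def normalize(image, mean, std):
    # Normalize uint8 pixels the way NormalizationOptions describes:
    # normalized = (pixel - mean) / std
    return (image.astype(np.float32) - mean) / std

# Stand-in for a resized camera frame: an all-white 224x224 RGB image.
rgb = np.full((224, 224, 3), 255, dtype=np.uint8)

# mean=0, std=255 maps [0, 255] onto [0, 1]
x01 = normalize(rgb, mean=0.0, std=255.0)
# mean=127.5, std=127.5 maps [0, 255] onto [-1, 1]
x11 = normalize(rgb, mean=127.5, std=127.5)
```

With the parameters packed in metadata, generated wrapper code can do this for you instead of each developer hard-coding the convention.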
15. • Supported Input / Output types
• Feature - Numbers which are unsigned integers or float32.
• Image - Metadata currently supports RGB and greyscale images.
• Bounding box - Rectangular shape bounding boxes. The schema supports a
variety of numbering schemes.
• Pack the associated files, e.g.,
• label file(s)
• Normalization and quantization parameters
16. • With the example at https://www.tensorflow.org/lite/convert/metadata, we can create an image classifier with
• image input, and
• label output
https://www.tensorflow.org/lite/convert/metadata
19. My exercise using Android CameraX and TFLite codegen in Kotlin
• To test TFLite metadata and codegen, I need an Android app that can
• grab camera inputs and
• convert them into Android Bitmap to feed into the generated model
wrapper.
• Since I knew nothing about the Android Camera API and Kotlin, I started this from the CameraX tutorial. It turned out to be quite easy.
• https://github.com/freedomtan/CameraxTFLite
24. What is a TFLite delegate?
• “A TensorFlow Lite delegate is a way to delegate part or all of graph execution to another executor.”
• Why delegates?
• running computation-intensive NN models on mobile CPUs is resource demanding; processing power and energy consumption can be problems
• and matrix multiplication, which is the core of convolution and fully connected ops, is highly parallel
• Thus, some devices have hardware accelerators, such as GPUs or DSPs, that provide better performance and higher energy efficiency through the Android NNAPI
• To use NNAPI, TFLite has had an NNAPI delegate from the very beginning. Then there are GPU delegates (GL ES, OpenCL, and Metal for now; a Vulkan one is coming) and others.
• my COSCUP 2019 slide deck on how the NNAPI and GPU delegates work: https://www.slideshare.net/kstan2/tflite-nnapi-and-gpu-delegates
25. XNNPACK and CoreML Delegates
• “XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM,
WebAssembly, and x86 platforms.”
• “XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead it
provides low-level performance primitives for accelerating high-level machine learning frameworks, such
as TensorFlow Lite, TensorFlow.js, PyTorch, and MediaPipe.", https://github.com/google/XNNPACK
• NNPACK —> QNNPACK —> XNNPACK
• In TFLite, there is an XNNPACK delegate
• CoreML is Apple’s machine learning framework
• the only official way to use the Neural Engine, Apple’s NN accelerator, introduced with the A11
• nope, CoreML cannot use the A11 Neural Engine, https://www.slideshare.net/kstan2/why-you-cannot-use-neural-engine-to-run-your-nn-models-on-a11-devices
26. • convolution is at the core of current neural network models
• how convolution is implemented, either in SW or HW:
• “direct convolution”: 6 or 7 levels of nested for loops,
• im2col, then GEMM,
• other transforms, e.g., Winograd
• XNNPACK found a way to reuse GEMM efficiently (the indirect convolution algorithm)
XNNPACK
https://arxiv.org/pdf/1907.02129.pdf
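The first two items in the list above can be sketched in NumPy: a direct convolution with six nested loops, and an im2col-plus-GEMM version computing the same result. The shapes, stride 1, and no padding are simplifying assumptions for brevity:

```python
import numpy as np

def conv2d_direct(x, w):
    # "Direct convolution": nested loops over output position, filter taps,
    # and channels. x: (H, W, Cin), w: (KH, KW, Cin, Cout), stride 1, no padding.
    H, W, Cin = x.shape
    KH, KW, _, Cout = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    y = np.zeros((OH, OW, Cout), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            for kh in range(KH):
                for kw in range(KW):
                    for ci in range(Cin):
                        for co in range(Cout):
                            y[oh, ow, co] += x[oh + kh, ow + kw, ci] * w[kh, kw, ci, co]
    return y

def conv2d_im2col(x, w):
    # im2col: copy each receptive field into a row, then do one big GEMM.
    H, W, Cin = x.shape
    KH, KW, _, Cout = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    cols = np.empty((OH * OW, KH * KW * Cin), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            cols[oh * OW + ow] = x[oh:oh + KH, ow:ow + KW, :].ravel()
    return (cols @ w.reshape(KH * KW * Cin, Cout)).reshape(OH, OW, Cout)

# Both paths agree on a small example.
x = np.arange(48, dtype=np.float32).reshape(4, 4, 3)
w = np.ones((3, 3, 3, 2), dtype=np.float32)
assert np.allclose(conv2d_direct(x, w), conv2d_im2col(x, w))
```

im2col trades extra memory (each input pixel is copied into up to KH*KW rows) for a single highly optimized matrix multiplication, which is exactly the tension the indirect-GEMM work in the paper above addresses.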
27. Using XNNPACK in label_image.cc
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc#L109-L116
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/evaluation/utils.h#L64-L88
29. Concluding remarks
• TFLite is getting more mature and comprehensive
• If you haven’t started using it, you may want to start with TFLite metadata and the Android code generator
• nope, there is no iOS code generator (yet)
• To speed up execution of NN models, use TFL delegates
• note that not all accelerators are created equal
• some are fp only; some are int/quant only