2. Quick Intro
• Caffe2
• 2nd generation of Caffe, which was the most popular deep learning
framework from Berkeley (before TensorFlow)
• What's the difference? Caffe2 improves Caffe 1.0 in a series of directions:
• first-class support for large-scale distributed training
• mobile deployment
• new hardware support (in addition to CPU and CUDA)
• flexibility for future directions such as quantized computation
• stress tested by the vast scale of Facebook applications
https://caffe2.ai/docs/caffe-migration.html
3. Caffe2 on Android
• Official Android demo
• https://caffe2.ai/docs/AI-Camera-demo-android.html, https://github.com/caffe2/AICamera
• SqueezeNet 1.1:
• 5.8/5.7 fps on Samsung S7 and Google Pixel
• not very impressive
• OpenGL backend
• https://www.facebook.com/Caffe2AI/videos/126340488008269/
• up to 6X speedup (24 FPS) compared to CPU on high-end Android devices (e.g.
Galaxy S8) for style transfer models
5. • TensorFlow Lite is also exploring the possibility of an
OpenGL ES backend
• https://github.com/tensorflow/tensorflow/issues/16189
6. What can we use on Android now?
https://github.com/caffe2/caffe2/tree/master/caffe2/mobile/contrib
7. Caffe2 backends for Android that I know of
• ARM CPU:
• NNPACK, Eigen: quite mature
• OpenGL ES:
• OpenGL: not actively maintained (?)
• ARM Compute Library (GL ES part): newly added, still growing
• NEON and OpenCL
• NNAPI: not fully integrated yet.
8. How to build
• > scripts/build_android.sh
• With that alone, no command line test binaries are built
• Caffe2 has some tests and a simple command line benchmark tool
called speed_benchmark
> scripts/build_android.sh -DBUILD_TEST=ON -DBUILD_BINARY=ON
• then we can get build_android/bin/speed_benchmark and
other test binaries
• PyTorch has a good tutorial on using it, http://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html
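Once speed_benchmark is built, a typical workflow is to push it to a device with adb and run it there. A minimal sketch (the file names and on-device directory are assumptions; the flags match the invocation shown on the next slide):

```shell
# Assumed on-device scratch directory; adjust for your device.
DEVICE_DIR=/data/local/tmp

# Push the binary and model files first (uncomment with a device attached):
# adb push build_android/bin/speed_benchmark init_net.pb predict_net.pb \
#          input.blobproto "$DEVICE_DIR"

# Compose the benchmark invocation; run it via: adb shell "$CMD"
CMD="$DEVICE_DIR/speed_benchmark --input_file $DEVICE_DIR/input.blobproto \
--input data --init_net $DEVICE_DIR/init_net.pb --net $DEVICE_DIR/predict_net.pb \
--caffe2_log_level=0"
echo "$CMD"
```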
9. Some results
• > ./speed_benchmark --input_file input.blobproto --input data --init_net init_net.pb --net predict_net.pb --caffe2_log_level=0
01-06 23:15:42.073 32623 32623 I native : [I net_simple.cc:101] Starting benchmark.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:102] Running warmup runs.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:112] Main runs.
01-06 23:15:43.805 32623 32623 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 173.15. Iters per second: 5.77535
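As a sanity check on the log above, milliseconds per iteration and iterations per second are just reciprocals, and they line up with the ~5.8 fps SqueezeNet 1.1 number quoted earlier:

```python
ms_per_iter = 173.15  # from the speed_benchmark log above
iters_per_second = 1000.0 / ms_per_iter
print(round(iters_per_second, 2))  # ≈ 5.78, consistent with the reported 5.77535
```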
10. Some results
• ARM Compute Library backend: Caffe2 added a Compute Library backend at the end of February 2018. With some tweaks, it's
possible to run SqueezeNet 1.1 faster with OpenGL than on the CPU (NNPACK)
01-04 03:41:38.297 25523 25523 I native : [I gl_model_test.h:52] [C2DEBUG] Benchmarking OpenGL Net
01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:104] Starting benchmark.
01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:105] Running warmup runs.
01-04 03:41:38.796 25523 25523 I native : [I net_gl.cc:121] Main runs.
01-04 03:41:43.107 25523 25523 I native : [I net_gl.cc:134] [C2DEBUG] Main run finished. Milliseconds per iter: 43.1077. Iters per second: 23.1977
01-04 03:41:43.110 25523 25523 I native : [I gl_model_test.h:66] [C2DEBUG] Benchmarking CPU Net
01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:101] Starting benchmark.
01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:102] Running warmup runs.
01-04 03:41:43.768 25523 25523 I native : [I net_simple.cc:112] Main runs.
01-04 03:41:50.229 25523 25523 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 64.6136. Iters per second: 15.4766
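Comparing the two logs above, the OpenGL (Compute Library) net is roughly 1.5x faster than the CPU path on this model and device:

```python
gl_ms = 43.1077   # OpenGL net, ms per iter (from the log above)
cpu_ms = 64.6136  # CPU (NNPACK) net, ms per iter (from the log above)
print(round(cpu_ms / gl_ms, 2))  # → 1.5
```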
11. Comparing with TF Lite
• cmake is easier than bazel :-)
• Relatively large, or rather comprehensive: if you want to enable something like on-device learning, it's
easier to start with Caffe2 than with TF Lite.
• the binary could be large
• The code looks clean
• The review process, or say, the software engineering, is not as rigid as TensorFlow's
• TF has a larger team (?)
• See https://www.oreilly.com/ideas/how-the-tensorflow-team-handles-open-source-support
• Some interesting code:
• The Observer design pattern could be used to measure performance, https://en.wikipedia.org/wiki/Observer_pattern
• https://github.com/caffe2/caffe2/tree/master/caffe2/observers
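The idea behind caffe2/observers can be sketched in a few lines: observers attach to a net and get notified around each run, so timing is added without touching the net's own code. A minimal Python sketch (class names here are made up for illustration, not Caffe2's actual API):

```python
import time

class TimeObserver:
    """Observer that records wall-clock time around each run."""
    def __init__(self):
        self.timings_ms = []

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self.timings_ms.append((time.perf_counter() - self._t0) * 1000)

class Net:
    """Subject: notifies attached observers before and after every run."""
    def __init__(self):
        self.observers = []

    def attach(self, observer):
        self.observers.append(observer)

    def run(self):
        for obs in self.observers:
            obs.start()
        sum(i * i for i in range(10000))  # stand-in for real inference work
        for obs in self.observers:
            obs.stop()

net = Net()
timer = TimeObserver()
net.attach(timer)
for _ in range(3):
    net.run()
print(len(timer.timings_ms))  # one timing recorded per run
```

The benefit of this design is that measurement is opt-in: a production build simply attaches no observers and pays no cost.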