2. Quick Intro
• Caffe2
• 2nd generation of Caffe, which was the most popular deep learning
framework from Berkeley (before TensorFlow)
• What's the difference? Caffe2 improves Caffe 1.0 in a series of directions:
• first-class support for large-scale distributed training
• mobile deployment
• new hardware support (in addition to CPU and CUDA)
• flexibility for future directions such as quantized computation
• stress tested by the vast scale of Facebook applications
https://caffe2.ai/docs/caffe-migration.html
3. Caffe2 on Android
• Official Android demo
• https://caffe2.ai/docs/AI-Camera-demo-android.html, https://github.com/caffe2/AICamera
• SqueezeNet 1.1:
• 5.8/5.7 fps on Samsung S7 and Google Pixel
• not very impressive
• OpenGL backend
• https://www.facebook.com/Caffe2AI/videos/126340488008269/
• up to 6X speedup (24 FPS) compared to CPU on high-end Android devices (e.g.
Galaxy S8) for style transfer models
5. • TensorFlow Lite is also exploring the possibility of an
OpenGL ES backend
• https://github.com/tensorflow/tensorflow/issues/16189
6. What can we use on Android now?
https://github.com/caffe2/caffe2/tree/master/caffe2/mobile/contrib
7. Caffe2 backends for Android that I know of
• ARM CPU:
• NNPACK, Eigen: quite mature
• OpenGL ES:
• OpenGL: not actively maintained (?)
• ARM Compute Library (GL ES part): newly added, still growing
• NEON and OpenCL
• NNAPI: not fully integrated yet.
8. How to build
• > scripts/build_android.sh
• With that alone, no command line test binaries are built
• Caffe2 has some tests and a simple command line benchmark tool
called speed_benchmark
> scripts/build_android.sh -DBUILD_TEST=ON -DBUILD_BINARY=ON
• then we can get build_android/bin/speed_benchmark and
other test binaries
• PyTorch has a good tutorial on using it, http://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html
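Once speed_benchmark is built, a typical workflow is to push it to a device with adb and run it there. A minimal sketch (the file names and on-device directory are assumptions; the flags match the invocation shown on the next slide):

```shell
# Assumed on-device scratch directory; adjust for your device.
DEVICE_DIR=/data/local/tmp

# Push the binary and model files first (uncomment with a device attached):
# adb push build_android/bin/speed_benchmark init_net.pb predict_net.pb \
#          input.blobproto "$DEVICE_DIR"

# Compose the benchmark invocation; run it via: adb shell "$CMD"
CMD="$DEVICE_DIR/speed_benchmark --input_file $DEVICE_DIR/input.blobproto \
--input data --init_net $DEVICE_DIR/init_net.pb --net $DEVICE_DIR/predict_net.pb \
--caffe2_log_level=0"
echo "$CMD"
```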
9. Some results
• > ./speed_benchmark --input_file input.blobproto --input data --init_net init_net.pb --net predict_net.pb --caffe2_log_level=0
01-06 23:15:42.073 32623 32623 I native : [I net_simple.cc:101] Starting benchmark.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:102] Running warmup runs.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:112] Main runs.
01-06 23:15:43.805 32623 32623 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 173.15. Iters per second: 5.77535
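As a sanity check on the log above, milliseconds per iteration and iterations per second are just reciprocals, and they line up with the ~5.8 fps SqueezeNet 1.1 number quoted earlier:

```python
ms_per_iter = 173.15  # from the speed_benchmark log above
iters_per_second = 1000.0 / ms_per_iter
print(round(iters_per_second, 2))  # ≈ 5.78, consistent with the reported 5.77535
```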
10. Some results
• ARM Compute Library backend: Caffe2 added a Compute Library backend at the end of February 2018. With some tweaks, it's
possible to run SqueezeNet 1.1 faster with OpenGL than on the CPU (NNPACK)
01-04 03:41:38.297 25523 25523 I native : [I gl_model_test.h:52] [C2DEBUG] Benchmarking OpenGL Net
01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:104] Starting benchmark.
01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:105] Running warmup runs.
01-04 03:41:38.796 25523 25523 I native : [I net_gl.cc:121] Main runs.
01-04 03:41:43.107 25523 25523 I native : [I net_gl.cc:134] [C2DEBUG] Main run finished. Milliseconds per iter: 43.1077. Iters per second: 23.1977
01-04 03:41:43.110 25523 25523 I native : [I gl_model_test.h:66] [C2DEBUG] Benchmarking CPU Net
01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:101] Starting benchmark.
01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:102] Running warmup runs.
01-04 03:41:43.768 25523 25523 I native : [I net_simple.cc:112] Main runs.
01-04 03:41:50.229 25523 25523 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 64.6136. Iters per second: 15.4766
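Comparing the two logs above, the OpenGL (Compute Library) net is roughly 1.5x faster than the CPU path on this model and device:

```python
gl_ms = 43.1077   # OpenGL net, ms per iter (from the log above)
cpu_ms = 64.6136  # CPU (NNPACK) net, ms per iter (from the log above)
print(round(cpu_ms / gl_ms, 2))  # → 1.5
```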
11. Comparing with TF Lite
• cmake is easier than bazel :-)
• Relatively large, or rather comprehensive: if you want to enable something like on-device learning, it's
easier to start with Caffe2 than with TF Lite.
• the binary could be large
• The code looks clean
• The review process, or say, the software engineering, is not as rigid as TensorFlow's
• TF has a larger team (?)
• See https://www.oreilly.com/ideas/how-the-tensorflow-team-handles-open-source-support
• Some interesting code:
• The Observer design pattern could be used to measure performance, https://en.wikipedia.org/wiki/Observer_pattern
• https://github.com/caffe2/caffe2/tree/master/caffe2/observers
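The idea behind caffe2/observers can be sketched in a few lines: observers attach to a net and get notified around each run, so timing is added without touching the net's own code. A minimal Python sketch (class names here are made up for illustration, not Caffe2's actual API):

```python
import time

class TimeObserver:
    """Observer that records wall-clock time around each run."""
    def __init__(self):
        self.timings_ms = []

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self.timings_ms.append((time.perf_counter() - self._t0) * 1000)

class Net:
    """Subject: notifies attached observers before and after every run."""
    def __init__(self):
        self.observers = []

    def attach(self, observer):
        self.observers.append(observer)

    def run(self):
        for obs in self.observers:
            obs.start()
        sum(i * i for i in range(10000))  # stand-in for real inference work
        for obs in self.observers:
            obs.stop()

net = Net()
timer = TimeObserver()
net.attach(timer)
for _ in range(3):
    net.run()
print(len(timer.timings_ms))  # one timing recorded per run
```

The benefit of this design is that measurement is opt-in: a production build simply attaches no observers and pays no cost.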