11. TensorFlow Lite => Android Neural Networks API
[Stack diagram: Android App → TensorFlow Lite Model File (.tflite) → Interpreter (Java API / C++ API / Kernel) → Android Neural Networks API → Hardware (CPU/GPU/DSP/Custom)]
The default is the CPU.
Custom: Pixel Visual Core (Google)
12. Interpreter — Java API: run
[Stack diagram as on slide 11; the Java API layer is highlighted]
tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/NativeInterpreterWrapper.java
The .tflite file is loaded inside the C++ code, via the internal C++ API, when the Java Interpreter class (the NativeInterpreterWrapper class) is created.
private static native long[] run(...);
17. public void runForMultipleInputsOutputs(
        @NotNull Object[] inputs, @NotNull Map<Integer, Object> outputs) {
      if (wrapper == null) {
        throw new IllegalStateException("The Interpreter has already been closed.");
      }
      Tensor[] tensors = wrapper.run(inputs);
      if (outputs == null || tensors == null || outputs.size() > tensors.length) {
        throw new IllegalArgumentException("Outputs do not match with model outputs.");
      }
      final int size = tensors.length;
      for (Integer idx : outputs.keySet()) {
        if (idx == null || idx < 0 || idx >= size) {
          throw new IllegalArgumentException(
              String.format("Invalid index of output %d (should be in range [0, %d))", idx, size));
        }
        tensors[idx].copyTo(outputs.get(idx));
      }
    }
Calls the run method of the NativeInterpreterWrapper class.
18. tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/NativeInterpreterWrapper.java
    Tensor[] run(Object[] inputs) {
      int[] dataTypes = new int[inputs.length];
      Object[] sizes = new Object[inputs.length];
      int[] numsOfBytes = new int[inputs.length];
      for (int i = 0; i < inputs.length; ++i) {
        DataType dataType = dataTypeOf(inputs[i]);
        dataTypes[i] = dataType.getNumber();
        if (dataType == DataType.BYTEBUFFER) {
          ByteBuffer buffer = (ByteBuffer) inputs[i];
          numsOfBytes[i] = buffer.limit();
          sizes[i] = getInputDims(interpreterHandle, i, numsOfBytes[i]);
        } else if (isNonEmptyArray(inputs[i])) {
          int[] dims = shapeOf(inputs[i]);
          sizes[i] = dims;
          numsOfBytes[i] = dataType.elemByteSize() * numElements(dims);
        }
      }
      // (remainder of the method omitted on the slide)
    }
The run method of the NativeInterpreterWrapper class.
23. Interpreter — C++ API (Android NDK API): Java_org_tensorflow_lite_NativeInterpreterWrapper_run
[Stack diagram as on slide 11; the C++ API layer is highlighted]
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
interpreter->Invoke()
34. void AddOpsAndParams(tflite::Interpreter* interpreter,
                         ANeuralNetworksModel* nn_model, uint32_t next_id) {
      // ... (omitted) ...
      ANeuralNetworksExecution* execution = nullptr;
      CHECK_NN(ANeuralNetworksExecution_create(nn_compiled_model_, &execution));
      // Currently perform deep copy of input buffer
      for (size_t i = 0; i < interpreter->inputs().size(); i++) {
        int input = interpreter->inputs()[i];
        // TODO(aselle): Is this what we want or do we want input instead?
        // TODO(aselle): This should be called setInputValue maybe to be cons.
        TfLiteTensor* tensor = interpreter->tensor(input);
        CHECK_NN(ANeuralNetworksExecution_setInput(
            execution, i, nullptr, tensor->data.raw, tensor->bytes));
      }
C++ model: the AddOpsAndParams function
Input data
35.   for (size_t i = 0; i < interpreter->outputs().size(); i++) {
        int output = interpreter->outputs()[i];
        TfLiteTensor* tensor = interpreter->tensor(output);
        CHECK_NN(ANeuralNetworksExecution_setOutput(
            execution, i, nullptr, tensor->data.raw, tensor->bytes));
      }
      // Currently use blocking compute.
      ANeuralNetworksEvent* event = nullptr;
      CHECK_NN(ANeuralNetworksExecution_startCompute(execution, &event));
      CHECK_NN(ANeuralNetworksEvent_wait(event));
      ANeuralNetworksEvent_free(event);
      ANeuralNetworksExecution_free(execution);
      return kTfLiteOk;
    }
Output data
Start inference
Wait for completion
Release the data
C++ model: the AddOpsAndParams function (continued)
37. Android Neural Networks API
[Stack diagram as on slide 11; the Android Neural Networks API layer is highlighted]
https://android.googlesource.com/platform/frameworks/ml
Android 8.1
Only the CPU version has been released publicly.
https://developer.android.com/ndk/guides/neuralnetworks/index.html
58. Sample Driver
nn/driver/sample/
    // Base class used to create sample drivers for the NN HAL. This class
    // provides some implementation of the more common functions.
    //
    class SampleDriver : public V1_0::IDevice {
     public:
      SampleDriver(const char* name) : mName(name) {}
      ~SampleDriver() override {}
      Return<ErrorStatus> prepareModel(const Model& model,
                                       const sp<IPreparedModelCallback>& callback) override;
      Return<DeviceStatus> getStatus() override;
      int run();

     protected:
      std::string mName;
    };
64. Mobile Machine Learning Hardware at ARM:
A Systems-on-Chip (SoC) Perspective
Yuhao Zhu, Department of Computer Science, University of Rochester
Matthew Mattina, Machine Learning & AI, ARM Research
Paul Whatmough, Machine Learning & AI, ARM Research
The CNN Accelerator and the CPU cluster (L3)
are connected via the ACP.
https://arxiv.org/abs/1801.06274
65. Key points of the software stack
The key of such a programming interface is a clear
abstraction that allows applications to execute DNN jobs
efficiently on (one of many) hardware accelerators, or fall
back to execution on a CPU or GPU.
The AndroidNN API provides an example of this principle,
by abstracting common DNN kernels such as convolution,
and scheduling execution through a hardware abstraction
layer (HAL).
66. Arm NN SDK & Arm ML Processor
Downloads, resources, and documentation
Available March 2018.
Source: https://developer.arm.com/products/processors/machine-learning/arm-nn
73. TensorFlow Lite => Android Neural Networks API
[Stack diagram as on slide 11: Android App → TensorFlow Lite Model File (.tflite) → Interpreter (Java API / C++ API / Kernel) → Android Neural Networks API → Hardware (CPU/GPU/DSP/Custom)]
The default is the CPU: ARM Cortex-A (NEON)
GPU: ARM Mali (Compute Library)
Custom: Pixel Visual Core (Google)
        Kirin 970 (Huawei)
        Helio P60 (MediaTek)
        Snapdragon 845 (Qualcomm)