A Sneak Peek of
MLIR in TensorFlow
Koan-Sin Tan

freedom@computer.org

Hsinchu Coding Serfs Meeting

July 11th, 2019
Why MLIR
https://medium.com/tensorflow/mlir-a-new-intermediate-representation-and-compiler-framework-beba999ed18d
• MLIR is intended to be a hybrid IR which can support multiple different requirements in a unified infrastructure. For
example, this includes:
• The ability to represent all TensorFlow graphs, including dynamic shapes, the user-extensible op ecosystem,
TensorFlow variables, etc.
• Optimizations and transformations typically done on a TensorFlow graph, e.g. in Grappler.
• Quantization and other graph transformations done on a TensorFlow graph or the TF Lite representation.
• Representation of kernels for ML operations in a form suitable for optimization.
• Ability to host high-performance-computing-style loop optimizations across kernels (fusion, loop interchange,
tiling, etc.) and to transform memory layouts of data.
• Code generation "lowering" transformations such as DMA insertion, explicit cache management, memory tiling,
and vectorization for 1D and 2D register architectures.
• Ability to represent target-specific operations, e.g. the MXU on TPUs.
• non-goals:
• low level machine code generation algorithms (like register allocation and instruction scheduling)
• MLIR as a source language that end-users would themselves write kernels in (analogous to CUDA C++)
https://github.com/tensorflow/mlir/blob/master/README.md
• Entire TensorFlow graph: nope, the “tf” dialect isn’t public yet
• Initial MLIR code landed in the TensorFlow repo on June 28th, 2019
• Early TF, TFLite and XLA support: floating-point MobilenetV1 TF .pb -> TFLite flatbuffer works
• No, quantized ones don’t work yet, although many components are there
• Simple quant, fxp, affine, and vector code is there
• So it’s possible to start exploring tiling and other techniques with affine, vector, and other dialects
• more GPU support, including Vulkan SPIR-V
• Low-level code generation
• MLIR relies on LLVM and other existing backends
• Where to start
• MLIR’s git repo has
• links to three slide decks, one of which is a tutorial from EuroLLVM 2019
• docs for the Toy language and the linear algebra dialect
• TensorFlow MLIR: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir
TF .pb -> TFLite .tflite
• build TensorFlow MLIR related binaries

bazel build --config opt tensorflow/compiler/mlir/...
• get your model, e.g., 

wget http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz
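• extract it; a minimal sketch, assuming the standard tarball layout (it contains mobilenet_v1_1.0_224_frozen.pb at the top level; the quantized commands below likewise assume mobilenet_v1_1.0_224_quant.tgz was fetched and extracted the same way)

tar xzf mobilenet_v1_1.0_224.tgz -C /tmp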
• convert it

./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_frozen.pb --tf-input-arrays=input -o /tmp/foo.tflite
• yes, it works like a charm. Nope, not for the quantized one: neither

./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_QUINT8 -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/bar.tflite
nor

./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/bar.tflite --tf-inference-type=TF_QUINT8
works
How the converter works
• Import from GraphDef, in .pb or .pbtxt format, into MLIR

• Raise control-flow graph. Converts TF Control Flow dialect to TF dialect.

• The Canonicalization pass iteratively applies canonicalization transformations in a
greedy way until no further changes occur. Canonicalization includes constant
folding.

• The Legalize pass converts TensorFlow operations to TensorFlow Lite ones. The
operations that cannot be mapped to TensorFlow Lite dialect are left as TensorFlow
operations. Unsupported op handling follows the proposed TFLite mechanism.

• Optimizations are performed in both the TF & TFLite dialect; aiming for small size
and high performance (among the core value proposition of TensorFlow Lite models).

• The Export pass writes out TensorFlow Lite FlatBuffer format. This pass operates on
MLIR TensorFlow Lite dialect and is simple/direct translation.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/README.md
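• the same flow can be driven by hand with the standalone tools shown in the following pages; a rough sketch, using only flags that appear elsewhere in this deck (the single tf_tfl_translate command on the previous page is the supported one-shot path):

./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 --tf-input-arrays=input /tmp/mobilenet_v1_1.0_224_frozen.pb | ./bazel-bin/tensorflow/compiler/mlir/tf-opt --tf-raise-control-flow --canonicalize --tfl-legalize-tf | ./bazel-bin/tensorflow/compiler/mlir/lite/flatbuffer_translate --mlir-to-tflite-flatbuffer > /tmp/foo.tflite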
tf-mlir-translate
• graphdef -> mlir

$ ./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --help
OVERVIEW: MLIR translation driver
USAGE: tf-mlir-translate [options] <input file>
OPTIONS:
Color Options:
--color - Use colors in output (default=autodetect)
General options:
--mlir-max-pattern-match-iterations=<uint> - Max number of iterations scanning the functions for pattern match
--mlir-pretty-debuginfo - Print pretty debug info in MLIR output
--mlir-print-debuginfo - Print debug info in MLIR output
-o=<filename> - Output filename
--remarks-yaml-string-table -
Translation to perform
--deserialize-spirv - deserialize-spirv
--graphdef-to-mlir - graphdef-to-mlir
--graphdef-to-splatted-mlir - graphdef-to-splatted-mlir
--mlir-to-graphdef - mlir-to-graphdef
--mlir-to-llvmir - mlir-to-llvmir
--mlir-to-nvvmir - mlir-to-nvvmir
--serialize-spirv - serialize-spirv
--test-only-mlir-to-tf-nodedef - test-only-mlir-to-tf-nodedef
--tf-debug-info=<string> - Path to the debug info file of the input graph def.
--tf-inference-type=<string> - Sets the type of real-number arrays in the output file. Only allows float and quantized types
--tf-input-arrays=<string> - Input tensor names, separated by ','
--tf-input-data-types=<string> - Input tensor data types, separated by ','
--tf-input-max-values=<string> - Sets the upper bound of the input data. Separated by ','; Each entry in the list should match
an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type.
--tf-input-min-values=<string> - Sets the lower bound of the input data. Separated by ','; Each entry in the list should match
an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type.
--tf-input-shapes=<string> - Input tensor shapes. Shapes for different tensors are separated by ':', and dimension sizes
for the same tensor are separated by ','
--tf-output-arrays=<string> - Output tensor names, separated by ','
--tf-prune-unused-nodes - Prune unused nodes in the input graphdef
--time-trace-granularity=<uint> - Minimum time granularity (in microseconds) traced by time profile
_tf dialect
./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input |less
func @main(%arg0: tensor<1x224x224x3xf32>) -> tensor<1x1001xf32>
attributes {tf.entry_function = {inputs = "input", outputs = "MobilenetV1/Predictions/Reshape_1"}} {
%0:2 = "_tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/
beta", value = opaque<"tf",
"0x746674656E736F722464747970653A2044545F464C4F41540A74656E736F725F7368617065207B0A202064696D207B0A20202020
73697A653A2033320A20207D0A7D0A74656E736F725F636F6E74656E743A20225C3234335C3335305C3233355C3237375C3234345C3
330305C303335405C323134395C3337353D685F5C3333315C323736315A5C3333303F5C3232305C3232305C303137405C3235325C33
37375C273F5C3331315C32373523405C3231315C3336325C3237335C3237365C3336345C3230365C3234315C3237375C32373054655
C3237375C3237345C3333375C30323140695C3236355C30303040795C3233375C3237373F5C3230346F393F485C3333314D40515C33
32335C3237345C3237375C3230325C3234305C303335405C3233335C3230365C3233353E5C323633525C3337373F5C3030355C32343
25C3032315C3237375C3232305C3230332A5C323734405C3331355C725C3330305C3332345C3230335040235C3336325C3030375C33
30305C3237355C6E5C303235405C323735295C32323440515C3030345C3334325C3237365C333037465C303334405C3236375C33313
15C3236343F5C3232305C3233365C3237335C323736655C3330325440220A"> : tensor<32xf32>} : () ->
(tensor<32xf32>, !_tf.control)
%1:2 = "_tf.Identity"(%0#0) {T = "tfdtype$DT_FLOAT", _class = ["loc:@MobilenetV1/Conv2d_0/BatchNorm/
beta"], device = "", name = "MobilenetV1/Conv2d_0/BatchNorm/beta/read"} : (tensor<32xf32>) ->
(tensor<32xf32>, !_tf.control)
%2:2 = "_tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/
gamma", value = opaque<"tf",
"0x746674656E736F722464747970653A2044545F464C4F41540A74656E736F725F7368617065207B0A202064696D207B0A20202020
73697A653A2033320A20207D0A7D0A74656E736F725F636F6E74656E743A20225C3330315C3030305C3031373F5C333332776E3F5C3
233365C3334305C3230323F5C30303445643F675C3334344D3F2E345C3031363F425C3032325C3234363F5C313737595C3332353E62
5C303137773F4B5C3334355C3233323E5C3332365C3030326F3F5C323035515C3230303F5C323431665C303236405C3232335C32313
65C3032343F5C3231355C323235753F295C3230345C3232373F3F5C3337305C3236363F5C3237365C323736213F5C3332305C333630
5C323036405C3030345C3237355C3334343E5C3337305C22743F5C3235325C3233355C6E3F5C3031305C3031375C3233323F685C333
5315C3232373F5C3233365C3235317E3F5C303337435C3234343F675C3235326A3F5C32333752733F5C3235325C3335325C3232313F
77565C3233313F5C3030355C3032326C3F5C32313053573F220A"> : tensor<32xf32>} : () -> (tensor<32xf32>, !
_tf.control)
TensorFlow Dialects
#include "tensorflow/compiler/mlir/tensorflow/ir/control_flow_ops.h"
#include "tensorflow/compiler/mlir/tensorflow/ir/tf_executor.h"
#include "tensorflow/compiler/mlir/tensorflow/ir/tf_ops.h"
using namespace mlir;
// Static initialization for TF dialect registration.
static DialectRegistration<TFControlFlow::TFControlFlowDialect>
TFControlFlowOps;
static DialectRegistration<TF::TensorFlowDialect> TFOps;
static DialectRegistration<tf_executor::TensorFlowExecutorDialect>
TfExecutorDialect;
tensorflow/compiler/mlir/tensorflow/ir/dialect_registration.cc
TensorFlow Dialects
• More on TensorFlow dialects:

• tf: the main dialect, representing the regular operations in a TensorFlow graph (the ones that
don’t have special contract with the executor).

• tf_executor:  dialect that represents the execution model of the TensorFlow executor (e.g.,
control dependencies, deadness propagation)

• _tf: per the TensorFlow MLIR open-source announcement mail thread, https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/xe522DD4ZYA, the control-flow dialect "_tf" is temporary.

• "One intent of this design is that TensorFlow 2.x features can choose to target just the tf
dialect, allowing us to phase out the tf_executor dialect in subsequent TensorFlow releases. The
combination of the two dialects allows to represent arbitrary existing TensorFlow graphs." [1]

[1] "https://github.com/tensorflow/community/pull/115
tf dialect
./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input | ./bazel-bin/tensorflow/compiler/mlir/tf-opt --tf-raise-control-flow |less
func @main(%arg0: tensor<1x224x224x3xf32>) -> tensor<1x1001xf32>
attributes {tf.entry_function = {inputs = "input", outputs = "MobilenetV1/Predictions/Reshape_1"}} {
%cst = "tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/
beta", value = opaque<"tf",
"0x746674656E736F722464747970653A2044545F464C4F41540A74656E736F725F7368617065207B0A202064696D207B0A20202020
73697A653A2033320A20207D0A7D0A74656E736F725F636F6E74656E743A20225C3234335C3335305C3233355C3237375C3234345C3
330305C303335405C323134395C3337353D685F5C3333315C323736315A5C3333303F5C3232305C3232305C303137405C3235325C33
37375C273F5C3331315C32373523405C3231315C3336325C3237335C3237365C3336345C3230365C3234315C3237375C32373054655
C3237375C3237345C3333375C30323140695C3236355C30303040795C3233375C3237373F5C3230346F393F485C3333314D40515C33
32335C3237345C3237375C3230325C3234305C303335405C3233335C3230365C3233353E5C323633525C3337373F5C3030355C32343
25C3032315C3237375C3232305C3230332A5C323734405C3331355C725C3330305C3332345C3230335040235C3336325C3030375C33
30305C3237355C6E5C303235405C323735295C32323440515C3030345C3334325C3237365C333037465C303334405C3236375C33313
15C3236343F5C3232305C3233365C3237335C323736655C3330325440220A"> : tensor<32xf32>} : () -> tensor<32xf32>
%0 = "tf.Identity"(%cst) {T = "tfdtype$DT_FLOAT", _class = ["loc:@MobilenetV1/Conv2d_0/BatchNorm/beta"],
device = "", name = "MobilenetV1/Conv2d_0/BatchNorm/beta/read"} : (tensor<32xf32>) -> tensor<32xf32>
%cst_0 = "tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/
gamma", value = opaque<"tf",
"0x746674656E736F722464747970653A2044545F464C4F41540A74656E736F725F7368617065207B0A202064696D207B0A20202020
73697A653A2033320A20207D0A7D0A74656E736F725F636F6E74656E743A20225C3330315C3030305C3031373F5C333332776E3F5C3
233365C3334305C3230323F5C30303445643F675C3334344D3F2E345C3031363F425C3032325C3234363F5C313737595C3332353E62
5C303137773F4B5C3334355C3233323E5C3332365C3030326F3F5C323035515C3230303F5C323431665C303236405C3232335C32313
65C3032343F5C3231355C323235753F295C3230345C3232373F3F5C3337305C3236363F5C3237365C323736213F5C3332305C333630
5C323036405C3030345C3237355C3334343E5C3337305C22743F5C3235325C3233355C6E3F5C3031305C3031375C3233323F685C333
5315C3232373F5C3233365C3235317E3F5C303337435C3234343F675C3235326A3F5C32333752733F5C3235325C3335325C3232313F
77565C3233313F5C3030355C3032326C3F5C32313053573F220A"> : tensor<32xf32>} : () -> tensor<32xf32>
Leaky ReLU
• a LeakyReLU example

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
%2 = "tf.LeakyRelu"(%1) { alpha = 0.1 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
return %2 : tensor<*xf32>
}
• round trip

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
%0 = "tf.LeakyRelu"(%arg0) {alpha = 1.000000e-01 : f32} : (tensor<*xf32>) -> tensor<*xf32>
return %0 : tensor<*xf32>
}
Leaky ReLU w/ alpha = 1.0
• a LeakyReLU example

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
%2 = "tf.LeakyRelu"(%1) { alpha = 1.0 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
return %2 : tensor<*xf32>
}
• round trip

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
%0 = "tf.LeakyRelu"(%arg0) {alpha = 1.000000e+00 : f32} : (tensor<*xf32>) -> tensor<*xf32>
return %0 : tensor<*xf32>
}
• constant folding (with alpha = 1.0, LeakyRelu is the identity, so the op folds away)

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --test-constant-fold ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
return %arg0 : tensor<*xf32>
}
• canonicalization

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --canonicalize ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
return %arg0 : tensor<*xf32>
}
Leaky ReLU Legalization
• a LeakyReLU, alpha = 0.1

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
%2 = "tf.LeakyRelu"(%1) { alpha = 0.1 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
return %2 : tensor<*xf32>
}
• Leaky ReLU legalization, alpha = 0.1

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --tfl-legalize-tf ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
%0 = "tfl.leaky_relu"(%arg0) {alpha = 1.000000e-01 : f32} : (tensor<*xf32>) -> tensor<*xf32>
return %0 : tensor<*xf32>
}
• Leaky ReLU legalization, alpha = 1.0

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --tfl-legalize-tf ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
return %arg0 : tensor<*xf32>
}
tf -> tfl: Conv2D+BiasAdd+Relu -> conv_2d
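• a sketch of the pattern (hypothetical shapes; attribute values are illustrative; note TFLite's OHWI filter layout, so the real passes also transpose the filter):

// before legalization: three TF ops
%0 = "tf.Conv2D"(%input, %filter) {strides = [1, 1, 1, 1], padding = "SAME"} : (tensor<1x224x224x3xf32>, tensor<3x3x3x32xf32>) -> tensor<1x224x224x32xf32>
%1 = "tf.BiasAdd"(%0, %bias) : (tensor<1x224x224x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>
%2 = "tf.Relu"(%1) : (tensor<1x224x224x32xf32>) -> tensor<1x224x224x32xf32>

// after: a single tfl.conv_2d with the bias as a third operand and the ReLU fused
%0 = "tfl.conv_2d"(%input, %filter_ohwi, %bias) {dilation_h_factor = 1 : i32, dilation_w_factor = 1 : i32, fused_activation_function = "RELU", padding = "SAME", stride_h = 1 : i32, stride_w = 1 : i32} : (tensor<1x224x224x3xf32>, tensor<32x3x3x3xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>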
tf.FakeQuant()
• simple FakeQuant

func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32> {
%0 = "tf.FakeQuantWithMinMaxArgs"(%arg0) {max = 1.000000e+00 : f32, min = -1.000000e+00 : f32, num_bits = 3 : i64} : (tensor<8x8x8x8xf32>)
-> tensor<8x8x8x8xf32>
return %0 : tensor<8x8x8x8xf32>
}
• legalize to tfl

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_fake_quant.mlir --tfl-legalize-tf
func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32> {
%0 = "tfl.quantize"(%arg0) {qtype = tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>} : (tensor<8x8x8x8xf32>) ->
tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
%1 = "tfl.dequantize"(%0) : (tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>) -> tensor<8x8x8x8xf32>
return %1 : tensor<8x8x8x8xf32>
}
• --tfl-post-quantize

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_fake_quant.mlir --tfl-legalize-tf --tfl-post-quantize
func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>> {
%0 = "tfl.quantize"(%arg0) {qtype = tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>} : (tensor<8x8x8x8xf32>) ->
tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
return %0 : tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
}
TFLite Native Quantization
• Take input min/max information and set the ArrayInfo (which really is
InputOrOutputArrayInfo).

• In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).

• Hardcode logic/propagation needs to happen here.

• Run TF constant folding.

• In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).

• Run quantization pass that take (tfl.DQ (for both input and weights) -> op ->
tfl.Q) and replaces with (op). Also replace (constant_float -> tfl.Q) with
(constant_quant).
https://github.com/tensorflow/mlir/blob/master/g3doc/Quantization.md#tflite-native-quantization
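• a sketch of the (tfl.DQ -> op -> tfl.Q) rewrite in the last step (hypothetical shapes and quantization parameters):

// before the quantize pass: a float op sandwiched between dequantize/quantize
%0 = "tfl.dequantize"(%q0) : (tensor<4x!quant.uniform<u8:f32, 0.1:128>>) -> tensor<4xf32>
%1 = "tfl.dequantize"(%q1) : (tensor<4x!quant.uniform<u8:f32, 0.1:128>>) -> tensor<4xf32>
%2 = "tfl.add"(%0, %1) {fused_activation_function = "NONE"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
%3 = "tfl.quantize"(%2) {qtype = tensor<4x!quant.uniform<u8:f32, 0.1:128>>} : (tensor<4xf32>) -> tensor<4x!quant.uniform<u8:f32, 0.1:128>>

// after: the op operates directly on quantized types
%3 = "tfl.add"(%q0, %q1) {fused_activation_function = "NONE"} : (tensor<4x!quant.uniform<u8:f32, 0.1:128>>, tensor<4x!quant.uniform<u8:f32, 0.1:128>>) -> tensor<4x!quant.uniform<u8:f32, 0.1:128>>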
tfl passes
namespace mlir {
class FunctionPassBase;
class ModulePassBase;
namespace TFL {
// Creates an instance of the TensorFlow Lite dialect LegalizeTF pass.
FunctionPassBase *CreateLegalizeTFPass();
// Creates an instance of the TensorFlow Lite dialect Optimize pass.
FunctionPassBase *CreateOptimizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareTF pass.
FunctionPassBase *CreatePrepareTFPass();
// Creates an instance of the TensorFlow Lite dialect LowerStaticTensorList
// pass.
ModulePassBase *CreateLowerStaticTensorListPass();
// Creates an instance of the TensorFlow Lite dialect Quantize pass.
FunctionPassBase *CreateQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareQuantize pass.
FunctionPassBase *CreatePrepareQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PostQuantize pass.
FunctionPassBase *CreatePostQuantizePass(bool emit_quant_adaptor_ops);
} // namespace TFL
} // namespace mlir
quantization passes
• prepare-quantize

• Applies prepare-quantization on the model in the TFL dialect. This pass runs before
the quantization pass and propagates quantization parameters across ops. This step
is necessary for post-training quantization and also simplifies the quantization
rules for some operations under quantization-aware training.

• quantize

• tensorflow/compiler/mlir/lite/transforms/quantize.cc

• tensorflow/compiler/mlir/lite/transforms/quantize_patterns.td

• post-quantize

• Remove Quantization Adaptor Ops
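• a rough sketch of running them in order with tf-opt; --tfl-post-quantize appears earlier in this deck, while --tfl-prepare-quantize and --tfl-quantize are assumed flag names derived from the pass names:

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_fake_quant.mlir --tfl-legalize-tf --tfl-prepare-quantize --tfl-quantize --tfl-post-quantize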
TFL optimization
• activation into convolution

• an add op that adds a constant value to the output of a convolution op with
constant bias (sketched after this list)

• a mul op that multiplies the output of a convolution op with constant filter
and bias by a constant value

• quantize/dequantize

• fully connected with add

tensorflow/compiler/mlir/lite/transforms/optimize.cc
tensorflow/compiler/mlir/lite/transforms/optimize_patterns.td
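• e.g., the add-into-bias case looks roughly like this (a sketch with hypothetical operands; conv attributes elided for brevity):

// before: a constant add following a conv with constant bias
%0 = "tfl.conv_2d"(%input, %filter, %bias) {fused_activation_function = "NONE", ...} : (tensor<1x224x224x3xf32>, tensor<32x3x3x3xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>
%1 = "tfl.add"(%0, %cst) {fused_activation_function = "NONE"} : (tensor<1x224x224x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>

// after: the add disappears; the bias operand becomes the constant-folded %bias + %cst
%0 = "tfl.conv_2d"(%input, %filter, %new_bias) {fused_activation_function = "NONE", ...} : (tensor<1x224x224x3xf32>, tensor<32x3x3x3xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>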
control flow: tf.If()
func @main(%arg0: tensor<i1>, %arg1: tensor<1xf32>, %arg2: tensor<1xf32>) -> tensor<1xf32> {
%0 = "tf.Placeholder.input"(%arg0) : (tensor<i1>) -> tensor<i1>
%1 = "tf.Placeholder.input"(%arg1) : (tensor<1xf32>) -> tensor<1xf32>
%2 = "tf.Placeholder.input"(%arg2) : (tensor<1xf32>) -> tensor<1xf32>
%3 = "tf.If"(%0, %1, %2) {
else_branch = @testIfElse, then_branch = @testIfThen
} : (tensor<i1>, tensor<1xf32>, tensor<1xf32>) -> tensor<1xf32>
return %1 : tensor<1xf32>
}
func @testIfThen(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
return %arg0 : tensor<*xf32>
}
func @testIfElse(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
return %arg1 : tensor<*xf32>
}
tf.If() not legalized
$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_tf_if_main.mlir --tfl-legalize-tf
func @main(%arg0: tensor<i1>, %arg1: tensor<1xf32>, %arg2: tensor<1xf32>) -> tensor<1xf32> {
%0 = "tfl.pseudo_input"(%arg0) : (tensor<i1>) -> tensor<i1>
%1 = "tfl.pseudo_input"(%arg1) : (tensor<1xf32>) -> tensor<1xf32>
%2 = "tfl.pseudo_input"(%arg2) : (tensor<1xf32>) -> tensor<1xf32>
%3 = "tf.If"(%0, %1, %2) {
else_branch = @testIfElse, then_branch = @testIfThen
} : (tensor<i1>, tensor<1xf32>, tensor<1xf32>) -> tensor<1xf32>
return %1 : tensor<1xf32>
}
func @testIfThen(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
return %arg0 : tensor<*xf32>
}
func @testIfElse(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
return %arg1 : tensor<*xf32>
}
no tfl.if()?
• yes, there is no tfl.if() or equivalent in

tensorflow/compiler/mlir/lite/ir/tfl_ops.{cc, h, td}
• however, we can convert the MLIR on the previous page to a TFLite flatbuffer, because there is

CustomOptionsOffset Translator::CreateIfOpCustomOptions(mlir::TF::IfOp op) {
int then_subgraph_index = subgraph_index_map_.at(op.getThen().str());
int else_subgraph_index = subgraph_index_map_.at(op.getElse().str());
auto flex_builder = absl::make_unique<flexbuffers::Builder>();
flex_builder->Map([&]() {
flex_builder->Int("then_subgraph_index", then_subgraph_index);
flex_builder->Int("else_subgraph_index", else_subgraph_index);
});
flex_builder->Finish();
return builder_.CreateVector(flex_builder->GetBuffer());
}
tensorflow/compiler/mlir/lite/flatbuffer_translate.cc
flatbuffer_translate --mlir-to-tflite-flatbuffer
$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_tf_if_main.mlir --tfl-legalize-tf | bazel-bin/tensorflow/compiler/mlir/lite/flatbuffer_translate --mlir-to-tflite-flatbuffer | ./bazel-bin/tensorflow/compiler/mlir/lite/flatbuffer_to_string -
XLA
• simple div in TensorFlow

func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
%0 = "tf.Div"(%arg0, %arg1) : (tensor<4xi32>, tensor<4xi32>) -> tensor<4xi32>
return %0 : tensor<4xi32>
}
• legalize to xla

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/div.mlir --xla-legalize-tf
func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
%0 = xla.div %arg0, %arg1 : tensor<4xi32>
return %0 : tensor<4xi32>
}
• legalize to standard mlir

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/div.mlir --xla-legalize-tf --xla-legalize-to-std
func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
%0 = divis %arg0, %arg1 : tensor<4xi32>
return %0 : tensor<4xi32>
}
Recap: MLIR for TF and TFLite
• Conversion of Floating point models

• Infrastructure for quantized models is there

• Custom ops, such as the tf.If control flow, can be handled in the MLIR ->
flatbuffer translation

• How about LSTM? It seems something like OpHint [1] is not there yet

• XLA: some ops work

[1] https://www.tensorflow.org/api_docs/python/tf/lite/OpHint
Existing passes in MLIR repo in early June
• Affine transformations: https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/Affine.md
• dma: https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#dma_start-operation, https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#dma_wait-operation
• Canonicalize: converting into a canonical form, https://github.com/tensorflow/mlir/blob/master/g3doc/Canonicalization.md
• CSE
• Fixed point math: currently only two uniformly quantized optimizations supported
• Quant: convert const, convert training-time simulated quantization to quantize/dequantize casts
• https://github.com/tensorflow/mlir/blob/master/g3doc/Quantization.md
• Linalg dialect opts: https://github.com/tensorflow/mlir/blob/master/g3doc/Tutorials/Linalg/
• lower-affine, lower-to-llvm
• memref: memref is an MLIR data type: https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#memref-type
More Passes in TensorFlow
Using other passes?
• GPU: nvvmir, spirv, ..

• for codegen and other purposes

• linalg, affine, memref:

• tiling, polyhedral etc.

• NO, not yet

• MLIR is incremental. Things won’t happen overnight.
Fin

More Related Content

What's hot

Building Embedded Linux Full Tutorial for ARM
Building Embedded Linux Full Tutorial for ARMBuilding Embedded Linux Full Tutorial for ARM
Building Embedded Linux Full Tutorial for ARMSherif Mousa
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたMITSUNARI Shigeo
 
Intel x86 and ARM Data types
Intel x86 and ARM Data typesIntel x86 and ARM Data types
Intel x86 and ARM Data typesRowena Cornejo
 
Constexpr 中3女子テクニック
Constexpr 中3女子テクニックConstexpr 中3女子テクニック
Constexpr 中3女子テクニックGenya Murakami
 
FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料一路 川染
 
Vivado hls勉強会5(axi4 stream)
Vivado hls勉強会5(axi4 stream)Vivado hls勉強会5(axi4 stream)
Vivado hls勉強会5(axi4 stream)marsee101
 
Intel i7 Technologies
Intel i7 TechnologiesIntel i7 Technologies
Intel i7 TechnologiesBibhu Biswal
 
Superscalar and VLIW architectures
Superscalar and VLIW architecturesSuperscalar and VLIW architectures
Superscalar and VLIW architecturesAmit Kumar Rathi
 
Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage jemin lee
 
CSRを自動生成する!
CSRを自動生成する!CSRを自動生成する!
CSRを自動生成する!Taichi Ishitani
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetJames Wernicke
 
4章 Linuxカーネル - 割り込み・例外 3
4章 Linuxカーネル - 割り込み・例外 34章 Linuxカーネル - 割り込み・例外 3
4章 Linuxカーネル - 割り込み・例外 3mao999
 
Vivado hlsのシミュレーションとhlsストリーム
Vivado hlsのシミュレーションとhlsストリームVivado hlsのシミュレーションとhlsストリーム
Vivado hlsのシミュレーションとhlsストリームmarsee101
 
Heap exploitation
Heap exploitationHeap exploitation
Heap exploitationAngel Boy
 
RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V IntroductionYi-Hsiu Hsu
 
Process' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxProcess' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxVarun Mahajan
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
Intellectual property in vlsi
Intellectual property in vlsiIntellectual property in vlsi
Intellectual property in vlsiSaransh Choudhary
 

What's hot (20)

Building Embedded Linux Full Tutorial for ARM
Building Embedded Linux Full Tutorial for ARMBuilding Embedded Linux Full Tutorial for ARM
Building Embedded Linux Full Tutorial for ARM
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみた
 
Intel x86 and ARM Data types
Intel x86 and ARM Data typesIntel x86 and ARM Data types
Intel x86 and ARM Data types
 
Constexpr 中3女子テクニック
Constexpr 中3女子テクニックConstexpr 中3女子テクニック
Constexpr 中3女子テクニック
 
FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料
 
Vivado hls勉強会5(axi4 stream)
Vivado hls勉強会5(axi4 stream)Vivado hls勉強会5(axi4 stream)
Vivado hls勉強会5(axi4 stream)
 
Intel i7 Technologies
Intel i7 TechnologiesIntel i7 Technologies
Intel i7 Technologies
 
Superscalar and VLIW architectures
Superscalar and VLIW architecturesSuperscalar and VLIW architectures
Superscalar and VLIW architectures
 
Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage
 
CSRを自動生成する!
CSRを自動生成する!CSRを自動生成する!
CSRを自動生成する!
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over Ethernet
 
4章 Linuxカーネル - 割り込み・例外 3
4章 Linuxカーネル - 割り込み・例外 34章 Linuxカーネル - 割り込み・例外 3
4章 Linuxカーネル - 割り込み・例外 3
 
Vivado hlsのシミュレーションとhlsストリーム
Vivado hlsのシミュレーションとhlsストリームVivado hlsのシミュレーションとhlsストリーム
Vivado hlsのシミュレーションとhlsストリーム
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
Heap exploitation
Heap exploitationHeap exploitation
Heap exploitation
 
RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V Introduction
 
Process' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxProcess' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/Linux
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Intellectual property in vlsi
Intellectual property in vlsiIntellectual property in vlsi
Intellectual property in vlsi
 

Similar to A Sneak Peek of MLIR in TensorFlow

running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in ProductionMatthias Feys
 
TensorFlow 2.0 Autographs - For TFUG - Vik Pant
TensorFlow 2.0 Autographs - For TFUG - Vik PantTensorFlow 2.0 Autographs - For TFUG - Vik Pant
TensorFlow 2.0 Autographs - For TFUG - Vik PantDevatanu Banerjee
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Koan-Sin Tan
 
Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017Sam Witteveen
 
190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pubJaewook. Kang
 
Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016Chris Fregly
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesKoan-Sin Tan
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Koan-Sin Tan
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference acceleratorsDarshanG13
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward
 
XML / JSON Data Exchange with PLC
XML / JSON Data Exchange with PLCXML / JSON Data Exchange with PLC
XML / JSON Data Exchange with PLCFeri Handoyo
 
Terraform + ansible talk
Terraform + ansible talkTerraform + ansible talk
Terraform + ansible talkJames Strong
 
Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxJanagi Raman S
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...Chris Fregly
 

Similar to A Sneak Peek of MLIR in TensorFlow (20)

A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
TensorFlow 2.0 Autographs - For TFUG - Vik Pant
TensorFlow 2.0 Autographs - For TFUG - Vik PantTensorFlow 2.0 Autographs - For TFUG - Vik Pant
TensorFlow 2.0 Autographs - For TFUG - Vik Pant
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite
 
Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017
 
190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub
 
Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
Edge and ai
Edge and aiEdge and ai
Edge and ai
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
 
XML / JSON Data Exchange with PLC
XML / JSON Data Exchange with PLCXML / JSON Data Exchange with PLC
XML / JSON Data Exchange with PLC
 
Terraform + ansible talk
Terraform + ansible talkTerraform + ansible talk
Terraform + ansible talk
 
Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptx
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 

More from Koan-Sin Tan

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsKoan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolKoan-Sin Tan
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPUKoan-Sin Tan
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Koan-Sin Tan
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphonesKoan-Sin Tan
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on AndroidKoan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserKoan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchKoan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android BenchmarksKoan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08Koan-Sin Tan
 

More from Koan-Sin Tan (14)

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPU
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphones
 
Caffe2 on Android
Caffe2 on AndroidCaffe2 on Android
Caffe2 on Android
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on Android
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
 

Recently uploaded

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

A Sneak Peek of MLIR in TensorFlow

  • 1. A Sneak Peek of MLIR in TensorFlow Koan-Sin Tan freedom@computer.org Hsinchu Coding Serfs Meeting July 11th, 2019
  • 3. • MLIR is intended to be a hybrid IR which can support multiple different requirements in a unified infrastructure. For example, this includes: • The ability to represent all TensorFlow graphs, including dynamic shapes, the user-extensible op ecosystem, TensorFlow variables, etc. • Optimizations and transformations typically done on a TensorFlow graph, e.g. in Grappler. • Quantization and other graph transformations done on a TensorFlow graph or the TF Lite representation. • Representation of kernels for ML operations in a form suitable for optimization. • Ability to host high-performance-computing-style loop optimizations across kernels (fusion, loop interchange, tiling, etc.) and to transform memory layouts of data. • Code generation "lowering" transformations such as DMA insertion, explicit cache management, memory tiling, and vectorization for 1D and 2D register architectures. • Ability to represent target-specific operations, e.g. the MXU on TPUs. • non-goals: • low level machine code generation algorithms (like register allocation and instruction scheduling) • MLIR as a source language that end-users would themselves write kernels in analogous to CUDA C++ https://github.com/tensorflow/mlir/blob/master/README.md
  • 4. • Entire TensorFlow graph: nope, the “tf” dialect isn’t public yet • Initial MLIR for in TensorFLow repo on June 28th, 2019 • Early TF, TFLite and XLA support: floating point MobilenetV1 TF pb ! TFLite flatbuffer works • No, quantized ones don’t work yet although many components are there • Simple quant, fxp, affine, and vector code is there • So it’s possible to start exploring tiling and other techniques with affine, vector, and other dialects • more GPU supports, including Vulkan SPIR-V • Low-level code generation • MLIR relies on LLVM and other existing backends • Where to start • MLIR’s git repo has • links to 3 slide deck, one of them is a tutorial in Euro-LLVM 2019 • Docs for Toy lang and linear algebra dialect • TensorFlow MLIR: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir
  • 5. TF .pb -> TFLite .tflite • build TensorFlow MLIR related binaries bazel build --config opt tensorflow/compiler/mlir/... • get your model, e.g., wget http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz • convert it ./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf- output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_frozen.pb --tf-input-arrays=input -o /tmp/foo.tflite • yes, it works like a charm. Nope, not for quantized one? neither ./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_QUINT8 -tf- output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/ bar.tflite nor ./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf- output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/ bar.tflite —tf-inference-type=TF_QUINT8 works
  • 6. How the converter works? • Import from GraphDef, in .pb or .pbtxt format, into MLIR • Raise control-flow graph. Converts TF Control Flow dialect to TF dialect. • The Canonicalization pass iteratively applies canonicalization transformations in a greedy way until no further changes occur. Canonicalization includes constant folding. • The Legalize pass converts TensorFlow operations to TensorFlow Lite ones. The operations that cannot be mapped to TensorFlow Lite dialect are left as TensorFlow operations. Unsupported op handling follows the proposed TFLite mechanism. • Optimizations are performed in both the TF & TFLite dialect; aiming for small size and high performance (among the core value proposition of TensorFlow Lite models). • The Export pass writes out TensorFlow Lite FlatBuffer format. This pass operates on MLIR TensorFlow Lite dialect and is simple/direct translation. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/ README.md
  • 7. tf-mlir-translate • graphdef —> mlir $ ./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --help OVERVIEW: MLIR translation driver USAGE: tf-mlir-translate [options] <input file> OPTIONS: Color Options: --color - Use colors in output (default=autodetect) General options: --mlir-max-pattern-match-iterations=<uint> - Max number of iterations scanning the functions for pattern match --mlir-pretty-debuginfo - Print pretty debug info in MLIR output --mlir-print-debuginfo - Print debug info in MLIR output -o=<filename> - Output filename --remarks-yaml-string-table - Translation to perform --deserialize-spirv - deserialize-spirv --graphdef-to-mlir - graphdef-to-mlir --graphdef-to-splatted-mlir - graphdef-to-splatted-mlir --mlir-to-graphdef - mlir-to-graphdef --mlir-to-llvmir - mlir-to-llvmir --mlir-to-nvvmir - mlir-to-nvvmir --serialize-spirv - serialize-spirv --test-only-mlir-to-tf-nodedef - test-only-mlir-to-tf-nodedef --tf-debug-info=<string> - Path to the debug info file of the input graph def. --tf-inference-type=<string> - Sets the type of real-number arrays in the output file. Only allows float and quantized types --tf-input-arrays=<string> - Input tensor names, separated by ',' --tf-input-data-types=<string> - Input tensor data types, separated by ',' --tf-input-max-values=<string> - Sets the upper bound of the input data. Separated by ','; Each entry in the list should match an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type. --tf-input-min-values=<string> - Sets the lower bound of the input data. Separated by ','; Each entry in the list should match an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type. --tf-input-shapes=<string> - Input tensor shapes. Shapes for different tensors are separated by ':', and dimension sizes for the same tensor are separated by ',' --tf-output-arrays=<string> - Output tensor names, separated by ',' --tf-prune-unused-nodes - Prune unused nodes in the input graphdef --time-trace-granularity=<uint> - Minimum time granularity (in microseconds) traced by time profile
8. _tf dialect

./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input | less

(opaque constant payloads truncated for readability)

func @main(%arg0: tensor<1x224x224x3xf32>) -> tensor<1x1001xf32>
  attributes {tf.entry_function = {inputs = "input", outputs = "MobilenetV1/Predictions/Reshape_1"}} {
  %0:2 = "_tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/beta", value = opaque<"tf", "0x746674656E736F7224..."> : tensor<32xf32>} : () -> (tensor<32xf32>, !_tf.control)
  %1:2 = "_tf.Identity"(%0#0) {T = "tfdtype$DT_FLOAT", _class = ["loc:@MobilenetV1/Conv2d_0/BatchNorm/beta"], device = "", name = "MobilenetV1/Conv2d_0/BatchNorm/beta/read"} : (tensor<32xf32>) -> (tensor<32xf32>, !_tf.control)
  %2:2 = "_tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/gamma", value = opaque<"tf", "0x746674656E736F7224..."> : tensor<32xf32>} : () -> (tensor<32xf32>, !_tf.control)
9. TensorFlow Dialects

#include "tensorflow/compiler/mlir/tensorflow/ir/control_flow_ops.h"
#include "tensorflow/compiler/mlir/tensorflow/ir/tf_executor.h"
#include "tensorflow/compiler/mlir/tensorflow/ir/tf_ops.h"

using namespace mlir;

// Static initialization for TF dialect registration.
static DialectRegistration<TFControlFlow::TFControlFlowDialect> TFControlFlowOps;
static DialectRegistration<TF::TensorFlowDialect> TFOps;
static DialectRegistration<tf_executor::TensorFlowExecutorDialect> TfExecutorDialect;

tensorflow/compiler/mlir/tensorflow/ir/dialect_registration.cc
10. TensorFlow Dialects
• More on the TensorFlow dialects:
• tf: the main dialect, representing the regular operations in a TensorFlow graph (the ones that don't have a special contract with the executor).
• tf_executor: dialect that represents the execution model of the TensorFlow executor (e.g., control dependencies, deadness propagation).
• _tf: the TensorFlow MLIR open-source announcement mail thread, https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/xe522DD4ZYA, says the control-flow dialect "_tf" is temporary.
• "One intent of this design is that TensorFlow 2.x features can choose to target just the tf dialect, allowing us to phase out the tf_executor dialect in subsequent TensorFlow releases. The combination of the two dialects allows to represent arbitrary existing TensorFlow graphs." [1]

[1] https://github.com/tensorflow/community/pull/115
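To make the division of labor concrete, here is a minimal hand-written sketch (not tool output; the op and shapes are made up for illustration) of how the tf_executor dialect wraps regular tf ops in islands inside a graph region:

func @main(%arg0: tensor<f32>) -> tensor<f32> {
  // the graph region models the TensorFlow executor
  %0 = tf_executor.graph {
    // an island wraps "regular" tf ops and also yields a control token
    %outputs, %control = tf_executor.island {
      %1 = "tf.Identity"(%arg0) : (tensor<f32>) -> tensor<f32>
      tf_executor.yield %1 : tensor<f32>
    }
    tf_executor.fetch %outputs : tensor<f32>
  }
  return %0 : tensor<f32>
}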
11. tf dialect

./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input | ./bazel-bin/tensorflow/compiler/mlir/tf-opt --tf-raise-control-flow | less

(opaque constant payloads truncated for readability; note that after raising, the ops no longer produce !_tf.control tokens)

func @main(%arg0: tensor<1x224x224x3xf32>) -> tensor<1x1001xf32>
  attributes {tf.entry_function = {inputs = "input", outputs = "MobilenetV1/Predictions/Reshape_1"}} {
  %cst = "tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/beta", value = opaque<"tf", "0x746674656E736F7224..."> : tensor<32xf32>} : () -> tensor<32xf32>
  %0 = "tf.Identity"(%cst) {T = "tfdtype$DT_FLOAT", _class = ["loc:@MobilenetV1/Conv2d_0/BatchNorm/beta"], device = "", name = "MobilenetV1/Conv2d_0/BatchNorm/beta/read"} : (tensor<32xf32>) -> tensor<32xf32>
  %cst_0 = "tf.Const"() {device = "", dtype = "tfdtype$DT_FLOAT", name = "MobilenetV1/Conv2d_0/BatchNorm/gamma", value = opaque<"tf", "0x746674656E736F7224..."> : tensor<32xf32>} : () -> tensor<32xf32>
12. Leaky ReLU
• a LeakyReLU example

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
  %2 = "tf.LeakyRelu"(%1) { alpha = 0.1 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
  return %2 : tensor<*xf32>
}

• round trip

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  %0 = "tf.LeakyRelu"(%arg0) {alpha = 1.000000e-01 : f32} : (tensor<*xf32>) -> tensor<*xf32>
  return %0 : tensor<*xf32>
}
13. Leaky ReLU w/ alpha = 1.0
• a LeakyReLU example

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
  %2 = "tf.LeakyRelu"(%1) { alpha = 1.0 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
  return %2 : tensor<*xf32>
}

• round trip

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  %0 = "tf.LeakyRelu"(%arg0) {alpha = 1.000000e+00 : f32} : (tensor<*xf32>) -> tensor<*xf32>
  return %0 : tensor<*xf32>
}

• constant folding (LeakyRelu with alpha = 1.0 is the identity, so the op folds away)

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --test-constant-fold ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  return %arg0 : tensor<*xf32>
}

• canonicalization

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --canonicalize ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  return %arg0 : tensor<*xf32>
}
14. Leaky ReLU Legalization
• a LeakyReLU, alpha = 0.1

func @testLeakyReLU(%1: tensor<*xf32>) -> tensor<*xf32> {
  %2 = "tf.LeakyRelu"(%1) { alpha = 0.1 : f32 } : (tensor<*xf32>) -> tensor<*xf32>
  return %2 : tensor<*xf32>
}

• Leaky ReLU legalization, alpha = 0.1

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --tfl-legalize-tf ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  %0 = "tfl.leaky_relu"(%arg0) {alpha = 1.000000e-01 : f32} : (tensor<*xf32>) -> tensor<*xf32>
  return %0 : tensor<*xf32>
}

• Leaky ReLU legalization, alpha = 1.0 (the op is folded to the identity before legalization)

$ bazel-bin/tensorflow/compiler/mlir/tf-opt --tfl-legalize-tf ~/work/mlir/test_leaky_relu.mlir
func @testLeakyReLU(%arg0: tensor<*xf32>) -> tensor<*xf32> {
  return %arg0 : tensor<*xf32>
}
15. tf -> tfl: Conv2D+BiasAdd+Relu -> conv_2d
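The deck shows this fusion as a diagram. As a hand-written sketch of the idea (illustrative IR, not tool output; filter layout differences between TF's HWIO and TFLite's OHWI are glossed over here):

// before legalization: three separate tf ops
%0 = "tf.Conv2D"(%input, %filter) {padding = "SAME", strides = [1, 1, 1, 1]} : (tensor<1x224x224x3xf32>, tensor<3x3x3x32xf32>) -> tensor<1x224x224x32xf32>
%1 = "tf.BiasAdd"(%0, %bias) : (tensor<1x224x224x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>
%2 = "tf.Relu"(%1) : (tensor<1x224x224x32xf32>) -> tensor<1x224x224x32xf32>

// after legalization: one tfl.conv_2d, with the bias as a third operand and the
// activation recorded in the fused_activation_function attribute
%0 = "tfl.conv_2d"(%input, %filter, %bias) {dilation_h_factor = 1 : i32, dilation_w_factor = 1 : i32, fused_activation_function = "RELU", padding = "SAME", stride_h = 1 : i32, stride_w = 1 : i32} : (tensor<1x224x224x3xf32>, tensor<3x3x3x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>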
16. tf.FakeQuant()
• simple FakeQuant

func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32> {
  %0 = "tf.FakeQuantWithMinMaxArgs"(%arg0) {max = 1.000000e+00 : f32, min = -1.000000e+00 : f32, num_bits = 3 : i64} : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32>
  return %0 : tensor<8x8x8x8xf32>
}

• legalize to tfl

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_fake_quant.mlir --tfl-legalize-tf
func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32> {
  %0 = "tfl.quantize"(%arg0) {qtype = tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>} : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
  %1 = "tfl.dequantize"(%0) : (tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>) -> tensor<8x8x8x8xf32>
  return %1 : tensor<8x8x8x8xf32>
}

• --tfl-post-quantize

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_fake_quant.mlir --tfl-legalize-tf --tfl-post-quantize
func @testValidFakeQuantWithMinMaxArgs(%arg0: tensor<8x8x8x8xf32>) -> tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>> {
  %0 = "tfl.quantize"(%arg0) {qtype = tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>} : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
  return %0 : tensor<8x8x8x8x!quant.uniform<u8:f32, 0.0078431372549019607:128>>
}
17. TFLite Native Quantization
• Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
• In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
• Hardcoded logic/propagation needs to happen here.
• Run TF constant folding.
• In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
• Run the quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op); see the sketch after this list. Also replace (constant_float -> tfl.Q) with (constant_quant).
https://github.com/tensorflow/mlir/blob/master/g3doc/Quantization.md#tflite-native-quantization
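A hand-written sketch of that last rewrite (illustrative IR; the !quant.uniform parameters and shapes are made up for this example):

// before: the op computes in float between a dequantize and a quantize
%0 = "tfl.dequantize"(%q) : (tensor<1x112x112x32x!quant.uniform<u8:f32, 2.000000e-02:128>>) -> tensor<1x112x112x32xf32>
%1 = "tfl.average_pool_2d"(%0) {filter_height = 2 : i32, filter_width = 2 : i32, fused_activation_function = "NONE", padding = "VALID", stride_h = 2 : i32, stride_w = 2 : i32} : (tensor<1x112x112x32xf32>) -> tensor<1x56x56x32xf32>
%2 = "tfl.quantize"(%1) {qtype = tensor<1x56x56x32x!quant.uniform<u8:f32, 2.000000e-02:128>>} : (tensor<1x56x56x32xf32>) -> tensor<1x56x56x32x!quant.uniform<u8:f32, 2.000000e-02:128>>

// after: the op consumes and produces quantized tensors directly
%0 = "tfl.average_pool_2d"(%q) {filter_height = 2 : i32, filter_width = 2 : i32, fused_activation_function = "NONE", padding = "VALID", stride_h = 2 : i32, stride_w = 2 : i32} : (tensor<1x112x112x32x!quant.uniform<u8:f32, 2.000000e-02:128>>) -> tensor<1x56x56x32x!quant.uniform<u8:f32, 2.000000e-02:128>>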
18. tfl passes

namespace mlir {
class FunctionPassBase;
class ModulePassBase;

namespace TFL {
// Creates an instance of the TensorFlow Lite dialect LegalizeTF pass.
FunctionPassBase *CreateLegalizeTFPass();
// Creates an instance of the TensorFlow Lite dialect Optimize pass.
FunctionPassBase *CreateOptimizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareTF pass.
FunctionPassBase *CreatePrepareTFPass();
// Creates an instance of the TensorFlow Lite dialect LowerStaticTensorList pass.
ModulePassBase *CreateLowerStaticTensorListPass();
// Creates an instance of the TensorFlow Lite dialect Quantize pass.
FunctionPassBase *CreateQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareQuantize pass.
FunctionPassBase *CreatePrepareQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PostQuantize pass.
FunctionPassBase *CreatePostQuantizePass(bool emit_quant_adaptor_ops);
} // namespace TFL
} // namespace mlir
19. quantization passes
• prepare-quantize
  • Applies prepare-quantization on the model in the TFL dialect. This pass runs before the quantization pass and propagates the quantization parameters across ops. This step is necessary for post-training quantization, and it also makes the quantization rules for some operations in quantization-aware training simpler.
• quantize
  • tensorflow/compiler/mlir/lite/transforms/quantize.cc
  • tensorflow/compiler/mlir/lite/transforms/quantize_patterns.td
• post-quantize
  • Removes quantization adaptor ops.
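Assuming these passes are registered with tf-opt under names following the pattern of --tfl-legalize-tf and --tfl-post-quantize seen on earlier slides (the first two flag spellings below are assumptions, not taken from this deck, and model.mlir is a placeholder), the flow would presumably be driven like:

$ bazel-bin/tensorflow/compiler/mlir/tf-opt model.mlir --tfl-prepare-quantize --tfl-quantize --tfl-post-quantize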
20. TFL optimization
• fuse an activation into a convolution
• fold an add op that adds a constant value into a convolution op with a constant bias (see the sketch after this list)
• fold a mul op that multiplies by a constant value into a convolution op with a constant filter and bias
• quantize/dequantize
• fully connected with add

tensorflow/compiler/mlir/lite/transforms/optimize.cc
tensorflow/compiler/mlir/lite/transforms/optimize_patterns.td
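A hand-written sketch of the add-into-bias fold (illustrative IR and constants, not tool output):

// before: a constant add following a convolution with a constant bias
%bias = constant dense<1.000000e+00> : tensor<32xf32>
%cst = constant dense<5.000000e-01> : tensor<32xf32>
%0 = "tfl.conv_2d"(%input, %filter, %bias) {dilation_h_factor = 1 : i32, dilation_w_factor = 1 : i32, fused_activation_function = "NONE", padding = "SAME", stride_h = 1 : i32, stride_w = 1 : i32} : (tensor<1x224x224x3xf32>, tensor<3x3x3x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>
%1 = "tfl.add"(%0, %cst) {fused_activation_function = "NONE"} : (tensor<1x224x224x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>

// after: the added constant is folded into the convolution's bias (1.0 + 0.5 = 1.5)
%bias = constant dense<1.500000e+00> : tensor<32xf32>
%0 = "tfl.conv_2d"(%input, %filter, %bias) {dilation_h_factor = 1 : i32, dilation_w_factor = 1 : i32, fused_activation_function = "NONE", padding = "SAME", stride_h = 1 : i32, stride_w = 1 : i32} : (tensor<1x224x224x3xf32>, tensor<3x3x3x32xf32>, tensor<32xf32>) -> tensor<1x224x224x32xf32>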
21. control flow: tf.If()

func @main(%arg0: tensor<i1>, %arg1: tensor<1xf32>, %arg2: tensor<1xf32>) -> tensor<1xf32> {
  %0 = "tf.Placeholder.input"(%arg0) : (tensor<i1>) -> tensor<i1>
  %1 = "tf.Placeholder.input"(%arg1) : (tensor<1xf32>) -> tensor<1xf32>
  %2 = "tf.Placeholder.input"(%arg2) : (tensor<1xf32>) -> tensor<1xf32>
  %3 = "tf.If"(%0, %1, %2) { else_branch = @testIfElse, then_branch = @testIfThen } : (tensor<i1>, tensor<1xf32>, tensor<1xf32>) -> tensor<1xf32>
  return %1 : tensor<1xf32>
}

func @testIfThen(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
  return %arg0 : tensor<*xf32>
}

func @testIfElse(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
  return %arg1 : tensor<*xf32>
}
22. tf.If() not legalized

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_tf_if_main.mlir --tfl-legalize-tf
func @main(%arg0: tensor<i1>, %arg1: tensor<1xf32>, %arg2: tensor<1xf32>) -> tensor<1xf32> {
  %0 = "tfl.pseudo_input"(%arg0) : (tensor<i1>) -> tensor<i1>
  %1 = "tfl.pseudo_input"(%arg1) : (tensor<1xf32>) -> tensor<1xf32>
  %2 = "tfl.pseudo_input"(%arg2) : (tensor<1xf32>) -> tensor<1xf32>
  %3 = "tf.If"(%0, %1, %2) { else_branch = @testIfElse, then_branch = @testIfThen } : (tensor<i1>, tensor<1xf32>, tensor<1xf32>) -> tensor<1xf32>
  return %1 : tensor<1xf32>
}

func @testIfThen(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
  return %arg0 : tensor<*xf32>
}

func @testIfElse(%arg0: tensor<*xf32>, %arg1: tensor<*xf32>) -> tensor<*xf32> {
  return %arg1 : tensor<*xf32>
}
23. no tfl.if()?
• yes, there is no tfl.if() or equivalent in tensorflow/compiler/mlir/lite/ir/tfl_ops.{cc, h, td}
• however, we can convert the MLIR on the previous page to a TFLite flatbuffer, because there is

CustomOptionsOffset Translator::CreateIfOpCustomOptions(mlir::TF::IfOp op) {
  int then_subgraph_index = subgraph_index_map_.at(op.getThen().str());
  int else_subgraph_index = subgraph_index_map_.at(op.getElse().str());

  auto flex_builder = absl::make_unique<flexbuffers::Builder>();
  flex_builder->Map([&]() {
    flex_builder->Int("then_subgraph_index", then_subgraph_index);
    flex_builder->Int("else_subgraph_index", else_subgraph_index);
  });
  flex_builder->Finish();
  return builder_.CreateVector(flex_builder->GetBuffer());
}

tensorflow/compiler/mlir/lite/flatbuffer_translate.cc
24. flatbuffer_translate --mlir-to-tflite-flatbuffer

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/test_tf_if_main.mlir --tfl-legalize-tf | bazel-bin/tensorflow/compiler/mlir/lite/flatbuffer_translate --mlir-to-tflite-flatbuffer | ./bazel-bin/tensorflow/compiler/mlir/lite/flatbuffer_to_string -
25. XLA
• simple div in TensorFlow

func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
  %0 = "tf.Div"(%arg0, %arg1) : (tensor<4xi32>, tensor<4xi32>) -> tensor<4xi32>
  return %0 : tensor<4xi32>
}

• legalize to xla

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/div.mlir --xla-legalize-tf
func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
  %0 = xla.div %arg0, %arg1 : tensor<4xi32>
  return %0 : tensor<4xi32>
}

• legalize to standard mlir

$ bazel-bin/tensorflow/compiler/mlir/tf-opt ~/work/mlir/div.mlir --xla-legalize-tf --xla-legalize-to-std
func @div(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
  %0 = divis %arg0, %arg1 : tensor<4xi32>
  return %0 : tensor<4xi32>
}
26. Recap: MLIR for TF and TFLite
• Conversion of floating-point models works.
• The infrastructure for quantized models is there.
• Custom ops, such as the tf.If() control flow, can be handled in the mlir -> flatbuffer translation.
• How about LSTM? Something like OpHint [1] does not seem to be there yet.
• XLA: some ops work.

[1] https://www.tensorflow.org/api_docs/python/tf/lite/OpHint
27. Existing passes in MLIR repo in early June
28.
• Affine transformations: https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/Affine.md
• dma: https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#dma_start-operation, https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#dma_wait-operation
• Canonicalize: converting into a canonical form, https://github.com/tensorflow/mlir/blob/master/g3doc/Canonicalization.md
• CSE
• Fixed-point math: currently only two uniformly-quantized optimizations are supported
• Quant: convert const; convert training-time simulated quantization to quantize/dequantize casts
  • https://github.com/tensorflow/mlir/blob/master/g3doc/Quantization.md
• Linalg dialect opts: https://github.com/tensorflow/mlir/blob/master/g3doc/Tutorials/Linalg/
• lower-affine, lower-to-llvm
• memref: memref is an MLIR data type: https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md#memref-type
29. More Passes in TensorFlow
30. Using other passes?
• GPU: NVVM IR, SPIR-V, ...
  • for codegen and other purposes
• linalg, affine, memref:
  • tiling, polyhedral techniques, etc.
• No, not yet.
• MLIR is incremental. Things won't happen overnight.
31. Fin