Koan-Sin Tan,
freedom@computer.org
COSCUP, Aug 2nd, 2020
TensorFlow Runtime
A Peek into the Future of TensorFlow
1
• disclaimer: opinions are my own

• feel free to interrupt me if you have any questions during the presentation

• questions can be asked in Taiwanese, English, or Mandarin

• most of the TFRT materials are adapted from the TFRT deep dive at an MLIR design meeting [1] and from the TFRT docs [2]

• code around Aug 1, 2020 (git commit ecf1c20 [3])

[1] TFRT Deep Dive,  slides - recording, https://mlir.llvm.org/talks/

[2] https://github.com/tensorflow/runtime/tree/master/documents

[3] https://github.com/tensorflow/runtime/commit/ecf1c20
2
• Used open source before the term “open source” was coined
• A software guy; learned to use Unix and open-source software on a VAX-11/780 running 4.3BSD
• Used to be a programming language junkie
• Worked on various system software, e.g., CPU scheduling and power management of non-CPU components
• Recently, working on NN performance on edge devices
• Contribute from time to time to TensorFlow Lite
• started a command-line label_image for TFLite
who i am
https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
3
What is TFRT
• TensorFlow Runtime (TFRT) is one of two new MLIR-based runtimes that have emerged in 2020 so far.

• The other one is the Intermediate Representation Execution Environment (IREE). So far, TFRT seems to have the better design documentation

• Both of them have mobile / edge environments in mind.

• I haven't seen mobile acceleration code in TFRT yet.

• IREE already has some Vulkan-related code and some simple code that works on Android

• ResNet GPU inference is 28% faster with TFRT

• https://github.com/tensorflow/runtime, https://youtu.be/15tiQoPpuZ8
4
Build it
• if you follow the instructions described in README.md, it should just work, at least on x86_64 Linux.

• however, it's not tested on non-Linux environments yet

• ssize_t and int64_t (see the sketch at the end of this list)

• on Mac OS X: ssize_t is long, int64_t is long long
• the current code mixes the use of ssize_t and int64_t

• test: one of the acclaimed features of TFRT, like MLIR, is its use of LLVM FileCheck

• with my hacks, shape-related (ssize_t) tests are not fixed yet

• it's not tested on non-x86 platforms, such as aarch64, either

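To illustrate the ssize_t/int64_t issue above, here is a minimal sketch. This is not TFRT code; the NumElements helper is made up. It shows why both types being 64-bit is not enough: on Mac OS X they are distinct types, so a template instantiated for one does not accept a container of the other.

// Why mixing ssize_t and int64_t breaks the macOS build:
// on Mac OS X, ssize_t is 'long' while int64_t is 'long long'.
// Both are 64-bit, but they are different types to the compiler.
#include <cstdint>
#include <sys/types.h>
#include <vector>

// A shape helper instantiated on the element type of its argument.
template <typename T>
T NumElements(const std::vector<T>& dims) {
  T n = 1;
  for (T d : dims) n *= d;
  return n;
}

int main() {
  std::vector<int64_t> shape = {2, 3, 4};
  int64_t n = NumElements(shape);  // fine everywhere
  // On macOS the next line would not compile: std::vector<int64_t> is not
  // std::vector<ssize_t>, because long long != long. On 64-bit Linux both
  // are 'long', so the mismatch goes unnoticed there.
  // ssize_t m = NumElements<ssize_t>(shape);
  return n == 24 ? 0 : 1;
}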
5
• The three key directories under the TFRT root directory are

• lib: Contains core TFRT infrastructure code

• backends: Contains device specific infrastructure and op/kernel implementations

• include: Contains public header files for core TFRT infrastructure
6
Walking thru the tutorial
• unfortunately, it's not easy to jump directly into the source code without some background knowledge

• so we’ll walk thru the tutorial [1]

• What's in the tutorial

• print hello world

• print integer

• adding kernels

[1] https://github.com/tensorflow/runtime/blob/master/documents/tutorial.md
7
using tfrt and tfrt_test
hello.mlir
func @hello() {
  %chain = tfrt.new.chain
  // Create a string containing "hello world" and store it in %hello.
  %hello = "tfrt_test.get_string"() { string_attr = "hello world" } : () -> !tfrt.string
  // Print the string in %hello.
  "tfrt_test.print_string"(%hello, %chain) : (!tfrt.string, !tfrt.chain) -> !tfrt.chain
  tfrt.return
}
The @hello function above shows how to create and print a string. The text after each ':' specifies the types involved:

• () -> !tfrt.string means that tfrt_test.get_string takes no arguments and returns a !tfrt.string. tfrt is an MLIR dialect prefix (or namespace) for TFRT

• (!tfrt.string, !tfrt.chain) -> !tfrt.chain means that tfrt_test.print_string takes two arguments (!tfrt.string and !tfrt.chain) and returns a !tfrt.chain. chain [1] is a TFRT abstraction to manage dependencies

[1] https://github.com/tensorflow/runtime/blob/master/documents/explicit_dependency.md
8
hello world in MLIR
func @stringconstant() -> !llvm<"[12 x i8]"> {
  %1 = llvm.constant("Hello world!") : !llvm<"i8*">
  // CHECK: ret [12 x i8] c"Hello world!"
  llvm.return %1 : !llvm<"i8*">
}

func @main() {
  %0 = llvm.constant(0) : !llvm.i64
  %1 = call @stringconstant() : () -> !llvm<"[12 x i8]">
  %2 = llvm.getelementptr %1[%0] : (!llvm<"[12 x i8]">, !llvm.i64) -> !llvm<"i8*">
  %3 = llvm.bitcast %2 : !llvm<"i8*"> to !llvm<"i8*">
  %32 = llvm.call @puts(%2) : (!llvm<"i8*">) -> !llvm.i32
  return
}

func @puts(!llvm<"i8*">) -> !llvm.i32
• MLIR's “standard dialect” doesn't have I/O functions

• there is the LLVM dialect, so of course we can use it to call standard libc functions
9
Hello integer
func @hello_integers() {
  %chain = tfrt.new.chain
  // Create an integer containing 42.
  %forty_two = tfrt.constant.i32 42
  // Print 42.
  tfrt.print.i32 %forty_two, %chain
  tfrt.return
}
• as stated in the tutorial, we can run other functions in the same module

• we can turn to more basic ones, such as integers or floating-point numbers

• @hello_integers shows how to create and print integers

• This example does not have the verbose type information we saw in @hello because there are custom parsers for the tfrt.constant.i32 and tfrt.print.i32 kernels in basic_kernels.td
10
basic_kernels.td
• .td (table description?) files are for LLVM TableGen

[1] TableGen, https://llvm.org/docs/TableGen/
class ConstantOp<string suffix, Type baseType, Attr attr>
  : TFRT_Op<"constant." # suffix, [NoSideEffect]> {
  let summary = "host executor constant value constructor";
  let arguments = (ins attr:$value);
  let results = (outs baseType);
}

class PrintOp<string suffix, Type type> : TFRT_Op<"print." # suffix> {
  let summary = "tfrt.print operation";
  let description = [{
    An operation takes a number input and a chain input.
    It prints the number to stdout and returns a chain output.
    The chain input must be the second operand.

    Example:
      %2 = tfrt.print.i32 %0, %1
  }];
  let arguments = (ins type, TFRT_ChainType);
  let results = (outs TFRT_ChainType);
  let assemblyFormat = "operands attr-dict";
  let verifier = ?;
}
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L376-L390
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L58-L64
11
Define kernels
12
user defined kernels
func @print_coordinate() {
  %chain = tfrt.new.chain
  %two = tfrt.constant.i32 2
  %four = tfrt.constant.i32 4
  %coordinate = "my.create_coordinate"(%two, %four) : (i32, i32) -> !my.coordinate
  "my.print_coordinate"(%coordinate, %chain) : (!my.coordinate, !tfrt.chain) -> !tfrt.chain
  tfrt.return
}
coordinate.mlir shows several TFRT features:

• MLIR types that begin with exclamation mark (!) are user-defined types like !my.coordinate,
compared to built-in types like i32

• Kernels are just C++ functions with a name in MLIR: my.print_coordinate is the MLIR name for
the C++ PrintCoordinate function

• Kernels may pass arbitrary user-defined types: my.create_coordinate passes a custom Coordinate struct to my.print_coordinate
13
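For reference, the C++ side behind coordinate.mlir looks roughly like the sketch below, adapted from the tutorial: a plain struct, two plain C++ functions, and a registration step that binds the MLIR kernel names to those functions via TFRT_KERNEL. Treat the header paths and exact signatures as approximations and check tutorial.md and include/tfrt/host_context/kernel_utils.h for the real code.

// Sketch of the kernels behind coordinate.mlir (details approximate).
#include <cstdint>
#include <cstdio>

#include "tfrt/host_context/chain.h"            // tfrt::Chain
#include "tfrt/host_context/kernel_registry.h"  // tfrt::KernelRegistry
#include "tfrt/host_context/kernel_utils.h"     // TFRT_KERNEL

// The arbitrary user-defined type passed between kernels (!my.coordinate).
struct Coordinate {
  int32_t x = 0;
  int32_t y = 0;
};

// C++ function exposed to MLIR as my.create_coordinate.
static Coordinate CreateCoordinate(int32_t x, int32_t y) {
  return Coordinate{x, y};
}

// C++ function exposed to MLIR as my.print_coordinate. It returns a Chain so
// the runtime can order the side effect.
static tfrt::Chain PrintCoordinate(Coordinate coordinate) {
  printf("(%d, %d)\n", coordinate.x, coordinate.y);
  return tfrt::Chain();
}

// Registration maps the MLIR kernel names to the C++ functions.
void RegisterCoordinateKernels(tfrt::KernelRegistry* registry) {
  registry->AddKernel("my.create_coordinate", TFRT_KERNEL(CreateCoordinate));
  registry->AddKernel("my.print_coordinate", TFRT_KERNEL(PrintCoordinate));
}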
to dig into some code we need
more system information
14
Host Runtime
15
• A TensorFlow user passes into TFRT a TensorFlow graph created via high-level TensorFlow APIs, and

• TFRT then calls the MLIR-based graph
compiler to optimize and lower the
graph into BEF, a Binary Executable
Format for TFRT graph execution (MLIR
is the compiler infrastructure that we
use to represent TFRT host programs). 

• The blue arrows in the simplified
TensorFlow training stack diagram
show this flow.
16
• In the README.md we are told to build two binaries: tfrt_translate and bef_executor

• tfrt_translate

• The tfrt_translate program does round trip
translation between MLIR and BEF, similar
to an assembler and disassembler.

• bef_executor

• The bef_executor program is the
execution driver of BEF files. It reads in a
BEF file, sets up the runtime, and
asynchronously executes function(s) in
that file.
17
TFRT Host Runtime
• Foundation of TFRT: schedules work on the host and devices

• Clean separation between host and device runtimes:

• Host runtime does not know anything about devices, just their runtimes (sets of kernels) 

• Key design points:

• Fully asynchronous - kernel executions cannot block

• Excellent error propagation in the presence of asynchrony

• Performance as a first-class concern, for graph and eager

• Outline:

• Common runtime infrastructure

• Graph execution

• Op-by-op execution (“eager”)
18
Key Abstraction: AsyncValue

• Container for data or resources

• Not Tensor specific

• A “future” type, fulfilled with exactly one value, or an error

• Lock-free, low memory overhead, type erased, reference counted

• Helper class AsyncValueRef<T> provides type safety when the contained type is known

• AsyncValues enable efficient asynchronous compute

• Asynchronous functions return unavailable AsyncValues

• Caller can schedule dependent computations with AsyncValue::AndThen()

• Caller need not block until the AsyncValue becomes available

https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/async_value.h
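To make the AndThen() pattern concrete, here is a rough usage sketch: an asynchronous function that returns an unavailable AsyncValue immediately and fulfills it later, plus a caller that chains dependent work instead of blocking. The names (MakeUnconstructedAsyncValueRef, HostContext::EnqueueWork, AndThen, emplace) follow async_value_ref.h and host_context.h as I understand them; treat the exact signatures as assumptions.

// Sketch only: AsyncValue producer and consumer.
#include "tfrt/host_context/async_value_ref.h"
#include "tfrt/host_context/host_context.h"

tfrt::AsyncValueRef<int> AsyncAdd(tfrt::HostContext* host, int a, int b) {
  // Allocate an AsyncValue that has no value yet.
  auto result = tfrt::MakeUnconstructedAsyncValueRef<int>(host);
  // Fulfill it on a worker thread; exactly one value (or an error) is set.
  host->EnqueueWork([result = result.CopyRef(), a, b] { result.emplace(a + b); });
  return result;
}

void Caller(tfrt::HostContext* host) {
  tfrt::AsyncValueRef<int> sum = AsyncAdd(host, 1, 2);
  // The caller does not block; it schedules dependent computation instead.
  sum.AndThen([sum = sum.CopyRef()] {
    if (sum.IsError()) {
      // Errors propagate through the same AsyncValue.
      return;
    }
    int v = sum.get();
    (void)v;  // use the now-available value
  });
}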
19
Kernels
• Kernel: unit of computation scheduled by the runtime

• Similar to kernel concept in current TensorFlow

• Kernels accept AsyncValue inputs and produce AsyncValue outputs

• Runtime coordinates dataflow of AsyncValues between kernels

• Outputs may not be immediately available, unlike current TensorFlow

• Runtime generally does not understand kernel semantics
// Kernel that adds two integers.
// AsyncKernelFrame holds the kernel’s arguments and results.
static void TFRTAdd(AsyncKernelFrame* frame) {
  // Fetch the kernel’s 0th argument.
  AsyncValue* arg1 = frame->GetArgAt(0);
  // Fetch the kernel’s 1st argument.
  AsyncValue* arg2 = frame->GetArgAt(1);
  int v1 = arg1->get<int>();
  int v2 = arg2->get<int>();
  // Set the kernel’s 0th result.
  frame->EmplaceResultAt<int>(0, v1 + v2);
}
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_host_runtime_design.md
https://github.com/tensorflow/runtime/blob/master/lib/basic_kernels/integer_kernels.cc#L39-L45
https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/kernel_utils.h#L61-L149
20
Host Program
• Host programs encode a dataflow graph

• Similar to GraphDef in current TensorFlow

• Expressed in MLIR. Typically compiler generated

• Designed for low-level dispatch efficiency

• Designed for compiler transformations and analysis, e.g., 

• Use dataflow analysis for buffer reuse
func @sample_function() -> i32 {
  %one = tfrt.constant.i32 1        // Make AsyncValue with value 1
  %two = tfrt.constant.i32 2        // Make AsyncValue with value 2
  %three = tfrt.add.i32 %one, %two  // Make AsyncValue with value 3 (1+2)
  %ch0 = tfrt.new.chain
  tfrt.print.i32 %three, %ch0       // Print AsyncValue %three
  tfrt.return %three : i32          // Return AsyncValue %three
}
21
TFRT Binary Executable Format (BEF)
• BEF encodes a hardware-specific lowered graph
function

• Primary interface between compiler and runtime 

• Designed for efficient execution

• Low overhead: execute program by reading mmap’d
byte array 

• Persistent and stable: Compile once offline, run many times online. Great for inference use-cases

• Composed of sections, similar to ELF. Each section
has its own format 

• Extensible: BEF is versioned, reader ignores unknown
sections, new versions may define new sections 
 https://github.com/tensorflow/runtime/blob/master/documents/binary_executable_format.md
22
BEF Executor
• BEF Executor evaluates a BEF dataflow graph “executor” style:

• Not a bytecode-like interpreter: no concept of program counter

• “Strict” execution by default: run a kernel only when all its inputs are available

• Executor features:

• Lock-free: atomics instead of mutexes

• Non-blocking: defer dependent work with AsyncValue::AndThen

• Supports “non-strict” execution: may run a kernel when some of its
inputs are available

• Good for efficiently forwarding unavailable inputs to outputs

• Key concepts:

• BEF: dataflow graph

• Kernel: dataflow node

• AsyncValues: dataflow edge
https://github.com/tensorflow/runtime/blob/master/lib/bef_executor/bef_interpreter.cc#L223-L254
23
Host Runtime Summary 

24
How about Core Runtime?
• Surely, we could do a similar walkthrough, but that would take more time

• Two things

• Op Execution API, Execute()

• BEF Executor can handle it too
void CoreRuntime::Impl::Execute(const ExecutionContext& exec_ctx,
                                string_view op_name, OpHandler* op_handler,
                                MutableArrayRef<TensorHandle> arguments,
                                const OpAttrsRef& attrs,
                                MutableArrayRef<TensorHandle> results,
                                AsyncValueRef<Chain>* chain) {
  // Ask the op_handler to execute the op. If successful, we're done.
  auto op_handle = op_handler->MakeOp(op_name);
  if (op_handle) {
    op_handle.get()(exec_ctx, arguments, attrs, results, chain);
    return;
  }
  // Otherwise, we fail with an 'unknown op' error.
  auto err =
      EmitErrorAsync(exec_ctx, "op '" + op_name.str() + "' is not supported");
  for (auto& result : results) result = TensorHandle(err.CopyRef());
  if (chain) *chain = std::move(err);
}
25
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/core_runtime.cc#L124-L143
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_op_by_op_execution_design.md
BEF Executor for “op” graph
• corert.executeop

• sample (two variants of the same example below; the second threads a chain into corert.get_op_handler and annotates each corert.executeop with its number of results, the ": 1" suffix)
26
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/kernels.cc
func @example() -> !tfrt.chain {
  %cpu = corert.get_op_handler("cpu")
  // Create TensorHandles
  %lhs = corert.executeop(%cpu)
    "test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] }
  %rhs = corert.executeop(%cpu)
    "test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] }
  %result = corert.executeop(%cpu) "test.add" (%lhs, %rhs)
  %ch0 = tfrt.new.chain
  %ch1 = corert.print_tensorhandle(%result, %ch0)
  tfrt.return %ch1 : !tfrt.chain
}

func @example() -> !tfrt.chain {
  %ch0 = tfrt.new.chain
  %cpu = corert.get_op_handler %ch0 "cpu"
  // Create TensorHandles
  %lhs = corert.executeop(%cpu)
    "test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] } : 1
  %rhs = corert.executeop(%cpu)
    "test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] } : 1
  %result = corert.executeop(%cpu) "test.add" (%lhs, %rhs) : 1
  %ch1 = "corert.print_tensorhandle"(%result, %ch0) : (!corert.tensorhandle, !tfrt.chain) -> !tfrt.chain
  tfrt.return %ch1 : !tfrt.chain
}
Device Runtime
CPU
27
//===----------------------------------------------------------------------===//
// CPU Relu kernels
//===----------------------------------------------------------------------===//

// Computes B = Relu(A).
template <typename T>
static AsyncValueRef<Chain> Relu(const DenseHostTensor& A, DenseHostTensor* B,
                                 const ExecutionContext& exec_ctx) {
  auto fn = [](auto& a, auto& b) { return a.cwiseMax(static_cast<T>(0)); };
  return ::tfrt::compat::UnaryEigenKernelAsync<T, T>(A, B, std::move(fn),
                                                     exec_ctx);
}

//===----------------------------------------------------------------------===//
// CPU BiasAdd kernels
//===----------------------------------------------------------------------===//

// A special case of tf.add where bias is restricted to be 1-D.
// Currently only support NHWC data format.
template <typename T, size_t RANK>
static AsyncValueRef<Chain> BiasAdd(const DenseHostTensor& input,
                                    const DenseHostTensor& bias,
                                    DenseHostTensor* output,
                                    const ExecutionContext& exec_ctx) {
  DHTIndexableView<T, RANK> input_view(&input);
  MutableDHTIndexableView<T, RANK> output_view(output);
  DHTIndexableView<T, 1> bias_view(&bias);
  const auto& shape_input = input_view.FixedShape();
  const auto& shape_bias = bias_view.FixedShape();
  const auto& shape_output = output_view.FixedShape();
  if (shape_input != shape_output) {
    return EmitErrorAsync(exec_ctx, "unexpected output shape");
  }
  if (shape_bias[0] != shape_input[RANK - 1]) {
    return EmitErrorAsync(exec_ctx, "bias shape does not match input shape");
  }
  // Reshape bias to the shape of input. Broadcast along the last axis of input.
  Eigen::array<Eigen::Index, RANK> reshape_dims;
  Eigen::array<Eigen::Index, RANK> broadcast_dims;
  for (size_t i = 0; i < RANK - 1; ++i) {
    reshape_dims[i] = static_cast<Eigen::Index>(1);
    broadcast_dims[i] = static_cast<Eigen::Index>(shape_input[i]);
  }
  reshape_dims[RANK - 1] = static_cast<Eigen::Index>(shape_bias[0]);
  broadcast_dims[RANK - 1] = static_cast<Eigen::Index>(1);
  auto input_t = AsEigenConstTensor(input_view);
  auto bias_t = AsEigenConstTensor(bias_view);
  auto output_t = AsEigenTensor(output_view);
  auto expr = input_t + bias_t.reshape(reshape_dims).broadcast(broadcast_dims);
  return AsyncAssign(
      exec_ctx.host()->GetOrCreateSharedContext<EigenHostContext>(),
      std::move(output_t), std::move(expr),
      KeepBuffers::alive(&input, &bias, output));
}
https://github.com/tensorflow/runtime/blob/master/backends/cpu/lib/kernels/cpu_kernels.h
Dialects we can see now
• tfrt: we know what this is for

• tfrt_test: to test tfrt

• tfrt_data: tf.data, to deal with input pipeline

• tfrt_dht: dense host tensor

• corert: Core Runtime, eager execution

• ts: tensor shape

• coo: COOrdinate list sparse tensor

• eigen: wrapper around the eigen library

• btf: binary tensor format

• cuda: you know what cuda means :-)
28
Concluding Remarks
• MLIR related talks and publications, https://mlir.llvm.org/talks/

• We have only scratched the surface of the TFRT host runtime and core runtime. There are more details:

• threading model: thread pool / work queue,

• memory allocation: tcmalloc for server, other small allocators for embedded systems,

• non-strict execution, and

• registers: BEF executor is a register machine

• we didn't touch other important components, such as the device runtimes, esp. the GPU part, and the distributed environment
29
Fin
30
Device Runtime Design Principles 

• A thin wrapper of low-level (driver) APIs, exposing device capabilities to graph compiler

• Memory Allocation

• Async host <-> device transfer, and kernel execution

• Dependency management

• Focus on mechanism instead of policy

• E.g. No built-in special-purpose streams for GPU support:
• For pure eager execution, can default to one stream for everything 

• For tf.function execution, compiler can pick streams
31
