What is
Intel Nervana Graph?
@Vengineer
2017/05/22
Updated 2017/07/01
As usual,
I dug into
the source code
About me
Blog: Vengineerの戯言
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
FPGA Magazine (No. 16/17)
"FPGAコミュニティのススメ" (An Invitation to the FPGA Community)
http://fpga.cqpub.co.jp/
SlideShare
https://www.slideshare.net/ssuser479fa3
This material is a compilation of
publicly available information from various companies,
found via Google searches.
Use it at your own risk.
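On August 9, 2016, Intel
acquired Nervana Systems
for more than $350 million
A startup founded two years earlier, it had raised nearly $25 million from investors
In other words, the investors cashed out at roughly 10x in two years
About $300 million in two years
SoftBank Group's acquisition of ARM was 24 billion pounds, so this is roughly 1/100 of that
Source: http://jp.techcrunch.com/2016/08/10/20160809intel-buys-deep-learning-startup-nervana-systems-for-a-reported-350-million/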
Nervana Graph Compiler
Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
・Frontends : neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
・Nervana Graph
・Transformers : CPU / GPU (CUDA)
Lowering
TensorFlow graph
Converted to an XLA graph
Code generation
JIT or AOT
Uses LLVM
Lowering
TensorFlow XLA
CPU
GPU (CUDA)
Nervana Graph Compiler
and
TensorFlow XLA
Aren't they pretty much the same thing?
And here it is
https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/
The connection between the XLA and
Intel Nervana Graph APIs was quite
straightforward given the similar
projects’ intent for a compact and
explicit intermediate representation.
While today the XLA/Intel Nervana
Graph integration is at a pre-alpha level,
we’d love for people to take it for a spin
and kick the tires. We’re working on
ironing out known performance issues and
improving op and backend support.
Intel Nervana Graph Beta : 2017/6/22
Intel neon
neon
https://github.com/NervanaSystems/neon
The latest version is v1.9
It shares its name with ARM's NEON, though
neon is
Intel Nervana's reference
deep learning framework committed
to best performance on all hardware
Datasets
Images: MNIST, CIFAR-10, ImageNet 1K,
PASCAL VOC, Mini-Places2
Text: IMDB, Penn Treebank,
Shakespeare Text, bAbI, Hutter-prize
Video: UCF101
Others: flickr8k, flickr30k, COCO
neon vs cuDNN 4
“Not so fast, FFT”: Winograd (March 3, 2016)
Source: https://www.nervanasys.com/winograd/
cuDNN 5
Optimizing Recurrent Neural Networks in
cuDNN 5 (April 6, 2016)
https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/
Faster forward
and backward convolutions
using the Winograd
convolution algorithm;
Faster with Winograd!
Fast Algorithms
for Convolutional Neural Networks
Andrew Lavin, Scott Gray
https://arxiv.org/abs/1509.09308
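As a rough illustration of why Winograd helps: the minimal filtering algorithm F(2,3) computes two outputs of a 3-tap filter with 4 multiplications instead of 6. The sketch below is a plain-Python, 1-D version written for this note. The function names are made up here, and it is not the kernel code used in neon or cuDNN.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, computed directly (6 multiplications)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3) (4 multiplications)."""
    half = Fraction(1, 2)
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) * half
    u2 = (g[0] - g[1] + g[2]) * half
    u3 = g[2]
    # Input transform followed by the four element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1, 2, 3, 4]   # input tile
g = [1, 0, -1]     # filter taps
print(direct_f23(d, g))                        # [-2, -2]
print(winograd_f23(d, g) == direct_f23(d, g))  # True
```

Exact fractions are used only to keep the comparison free of rounding; real kernels work in floating point and apply the 2-D form of the transform to tiles.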
Going beyond full utilization: The inside scoop
on Nervana’s Winograd kernels (June 29, 2016)
https://www.nervanasys.com/winograd-2/
neon v1.3 vs cuDNN v5.1
Still not slowing down: Benchmarking optimized
Winograd implementations (July 25, 2016)
Source: https://www.nervanasys.com/winograd-3/
vs cuDNN v4 vs cuDNN v5.1
Scott Gray
https://twitter.com/scottgray76
High-Performance GPU kernels for deep learning
• Fast matrix multiply for small minibatches
• Direct convolution leveraging GEMM advances
• Even faster convolution with Winograd
Nervana (October 2014 ~ July 2017)
Currently at OpenAI (~ July 2017)
Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6485-scott-gray-gpu-programming-deep-learning.pdf
Intel Nervana
Graph Compiler
Nervana Graph Compiler
Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
・Frontends : neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
・Nervana Graph
・Transformers : CPU / GPU (CUDA)
Lowering
Where the Graph Compiler fits
Source: http://pc.watch.impress.co.jp/docs/news/1034408.html
MKL-DNN Support
Mar 23, 2017: after the acquisition by Intel
To install with Intel MKL-DNN support, first download MKL-DNN from [here](https://github.com/01org/mkl-dnn) and follow the installation instructions there to install MKL-DNN. Set environment variable MKLDNN_ROOT to point to the installed location and follow the rest of the steps to install Ngraph.
Source: https://github.com/NervanaSystems/ngraph/commit/f3b7306214f40b4c1b4c40e3e223080797afb382
Transformer API
・Supports CPU and GPU
  Memory usage optimization passes
  Transformers allow users to register an included set of optional compiler passes for debug and visualization.
・GPU
  Automatic kernel fusion/compounding for increased performance
・A mechanism similar to LLVM passes
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
Building graphs
・Nervana Graph structure
  Data Dependencies
  Initializers
  Non-data Control Dependencies
・General properties of ops
・Op Hierarchy
・Ops influencing evaluation
・Derivatives
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/building_graphs.rst
Example
import ngraph as ng
import ngraph.transformers as ngt

# A placeholder with no axes, i.e. a scalar input
x = ng.placeholder(())
x_plus_one = x + 1

# Compile the graph into a callable that takes x and returns x + 1
transformer = ngt.make_transformer()
plus_one = transformer.computation(x_plus_one, x)

for i in range(5):
    print(plus_one(i))
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/overview.rst
Planned for future support?
・Nervana Graph serialization/deserialization
・Further improvements/abstractions to graph
composability for usability/optimization
・Distributed, heterogeneous backend target support
・C APIs for interoperability to enable other languages
to create/execute graphs
・Better debugging
・Support for model deployment
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
From here on,
we dig into the source code of
the Intel Nervana Graph Compiler
ngraph
https://github.com/NervanaSystems/ngraph
Caffe example
from __future__ import print_function
import ngraph.transformers as ngt
from ngraph.frontends.caffe.cf_importer.importer import parse_prototxt

model = "sum.prototxt"
# Import the Caffe model definition as a map of ngraph ops
op_map = parse_prototxt(model, verbose=True)
# Look up the op for the layer named "D"
op = op_map.get("D")
# Compile and run a computation that returns that op's value
res = ngt.make_transformer().computation(op)()
print("Result is:", res)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/caffe.rst
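TensorFlow example
# Imports are not shown on the slide; the TFImporter path below is assumed
# by analogy with the Caffe importer and should be checked against tensorflow.rst
import tensorflow as tf
import ngraph.transformers as ngt
from ngraph.frontends.tensorflow.tf_importer.importer import TFImporter

# Build a small TensorFlow graph
x = tf.constant(1.)
y = tf.constant(2.)
f = x + y

# Import the TensorFlow graph definition into ngraph
importer = TFImporter()
importer.import_graph_def(tf.Session().graph_def)

# Get the ngraph op that corresponds to f, then compile and run it
f_ng = importer.get_op_handle(f)
transformer = ngt.make_transformer()
f_result = transformer.computation(f_ng)()
print(f_result)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/tensorflow.rst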
Transformers
Transformers are used to convert the Op graph into a backend
specific executable format. Once the graph has been defined,
one or more computations are created using a transformer.
Computations are handles to executable objects created by
the transformer, which can be called to evaluate a subset of
the entire graph. All transformers must implement a common
abstract interface allowing users to easily switch between
backends without altering their computation graph definition.
Supported backends
・CPUs (via NumPy)
・NVIDIA GPUs (via PyCUDA)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Creating transformers
1) Default
from ngraph.transformers import make_transformer
transformer = make_transformer()
2) Using a factory
import ngraph.transformers as ngt
available_transformers = ngt.transformer_choices()
# Switch the default factory to the GPU backend when it is available
if 'gpu' in available_transformers:
    factory = ngt.make_transformer_factory('gpu')
    ngt.set_transformer_factory(factory)

transformer = ngt.make_transformer()
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Computations
Computation objects are created by the transformer and
provide an interface to evaluate a subset of the graph. The
format of the executable used for evaluation depends on the
transformer that created the computation. For example the
CPU transformer generates python NumPy code which is called
to evaluate the computation, while the GPU transformer
generates a series of CUDA kernels which can be called to
evaluate the computation.
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Creating computations
import ngraph as ng
a = ng.constant(4)
b = ng.placeholder(())
c = ng.placeholder(())
d = ng.multiply(a, b)
e = ng.add(d, c)
example_comp = transformer.computation(e, b, c)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Executing computations
example_comp = transformer.computation(e, b, c)
  result_e = value returned for e
  b = first argument
  c = second argument
result_e = example_comp(2, 7)  # b = 2, c = 7
result_e = (4 * b) + c => (4 * 2) + 7 = 15
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Executing computations
Multiple return values
example_comp2 = transformer.computation([d, e], b, c)
  result_d = value returned for d, result_e = value returned for e
  b = first argument
  c = second argument
result_d, result_e = example_comp2(2, 7)
result_d = (4 * b) = (4 * 2) = 8
result_e = (4 * b) + c => (4 * 2) + 7 = 15
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
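The calling convention above (first argument: what to return, remaining arguments: the placeholders to bind) can be mimicked in a few lines of plain Python. This is only a toy model written for this note, assuming nothing about ngraph's internals: the `Op` class and the helper functions are hypothetical stand-ins, and evaluation is a simple recursive walk rather than generated code.

```python
class Op:
    """Toy graph node (hypothetical; not ngraph's Op class)."""

    def __init__(self, kind, inputs=(), value=None):
        self.kind = kind
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Op("const", value=value)


def placeholder():
    return Op("placeholder")


def multiply(a, b):
    return Op("mul", (a, b))


def add(a, b):
    return Op("add", (a, b))


def computation(returns, *params):
    """Build a callable that evaluates `returns` given values for `params`."""

    def run(*args):
        # Bind each placeholder to the positional argument in the same slot.
        env = dict(zip(params, args))

        def evaluate(op):
            if op.kind == "const":
                return op.value
            if op.kind == "placeholder":
                return env[op]
            left, right = (evaluate(i) for i in op.inputs)
            return left * right if op.kind == "mul" else left + right

        # A list of ops yields a tuple of values; a single op yields one value.
        if isinstance(returns, (list, tuple)):
            return tuple(evaluate(r) for r in returns)
        return evaluate(returns)

    return run


a = constant(4)
b = placeholder()
c = placeholder()
d = multiply(a, b)
e = add(d, c)

example_comp = computation(e, b, c)
example_comp2 = computation([d, e], b, c)
print(example_comp(2, 7))    # 15
print(example_comp2(2, 7))   # (8, 15)
```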
Transformer implementation
・Transformer creation
・Computation creation
・Transformer initialization
  Transformer Passes
  Initialization Computation
  Tensor Description Initialization
  Computation Transformation
・Computation execution
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_implementation.rst
Transformer implementation
base.py : Transformer_ABC_Meta
base.py : Transformer (base)
cputransform.py : CPUTransformer
gputransform.py : GPUTransformer
hetrtransform.py : HetrTransformer
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Transformer_ABC_Meta class
class Transformer_ABC_Meta(abc.ABCMeta):
    """
    metaclass for the backend objects
    takes care of registering all the backend subclasses
    """
    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, 'transformers'):
            # First possible transformer class sets things up
            cls.transformers = {}

        # If this transformer has a transformer_name, register it
        transformer_name = getattr(cls, 'transformer_name', None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls
        super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
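The registration trick in this metaclass can be tried in isolation. The following is a minimal Python 3 sketch of the same pattern; `RegistryMeta`, `BaseTransformer`, `ToyCPU` and `ToyGPU` are illustrative names invented here, not ngraph classes.

```python
import abc


class RegistryMeta(abc.ABCMeta):
    """Metaclass that records every named subclass in a shared dict."""

    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, "transformers"):
            # The first class created with this metaclass owns the registry.
            cls.transformers = {}
        # Subclasses see the same dict through attribute inheritance.
        transformer_name = getattr(cls, "transformer_name", None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls
        super().__init__(name, bases, dict_)


class BaseTransformer(metaclass=RegistryMeta):
    transformer_name = None  # the base class itself is not registered


class ToyCPU(BaseTransformer):
    transformer_name = "cpu"


class ToyGPU(BaseTransformer):
    transformer_name = "gpu"


# Simply defining the classes filled the registry; no explicit register() call.
print(sorted(BaseTransformer.transformers))   # ['cpu', 'gpu']
backend = BaseTransformer.transformers["cpu"]()
print(type(backend).__name__)                 # ToyCPU
```

One consequence of this design is that a backend becomes selectable by name as soon as its module is imported, which is presumably what lets a name such as 'gpu' be looked up when choosing a transformer.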
Transformer class
class Transformer(with_metaclass(Transformer_ABC_Meta, object)):
    """
    Produce an executable version of op-graphs.

    Computations are subsets of Ops to compute. The transformer determines storage
    allocation and transforms the computations and allocations into functions.

    Arguments:
        fusion (bool): Whether to combine sequences of operations into one operation.
        **kwargs: Args for related classes.

    Attributes:
        computations (:obj:`set` of :class:`Computation`): The set of requested computations.
        all_results (:obj:`set` of :class:`ngraph.op_graph.op_graph.Op`): A root set of Ops that
            need to be computed.
        finalized (bool): True when transformation has been performed.
        initialized (bool): True when variables have been initialized/restored.
        fusion (bool): True when fusion was enabled.
        device_buffers (set): Set of handles for storage allocations.
    """
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
Computation implementation
base.py : Computation (base)
cputransform.py : CPUComputation
gputransform.py : GPUComputation
hetrtransform.py : HetrComputation
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Computation class
class Computation(NameableValue):
    """
    A handle for a computation function.

    Arguments:
        transformer (obj:`Transformer`): The associated transformer.
        returns: If an Op, return the value
            of the Op, if sequence of Ops, return the sequence of values, if
            a set return a map, if None, return None.
        *args: AllocationOps marked input will be arguments to the function.
        **kwargs: Args for related classes.
    """
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Computation class
    def __init__(self, transformer, computation, **kwargs):
        super(Computation, self).__init__(**kwargs)
        self.transformer = transformer
        self.computation = computation
        self.computation_name = None
        self.executor = None

        self.send_nodes = []
        self.recv_nodes = []
        self.scatter_send_nodes = []
        self.scatter_recv_nodes = []
        self.gather_send_nodes = []
        self.gather_recv_nodes = []
        self.allreduce_nodes = []
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
Pass implementations (part 1)
passes.py GraphPass (base class)
passes.py GraphBuildingPass
passes.py GraphRewritePass (New)
passes.py PeepholeGraphPass
passes.py RequiredTensorShaping
passes.py CPUTensorShaping
passes.py SimplePrune
flexpass.py FlexDtypePass
flexpass.py FlexDECPass
flexpass.py ClearTensorDescriptions
nviz.py JSONPass (subclass of GraphPass)
nviz.py VizPass (subclass of GraphPass)
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/base.py
Pass implementations (part 2)
layout.py PruneContiguousPass
layout.py GenerateLayoutDomains
layout.py GenerateLayoutConstraints
layout.py AssignLayouts
layout.py AddLayoutConversions
cpufusion.py FusionPass
cpulayout.py CPUTensorLayout
gpusimplification.py GPUSubstitution
hetrpasses.py DeviceAssignPass
hetrpasses.py CommunicationPass
hetrpasses.py DistributedPass
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
Pass implementations (part 3) (New)
mkldnnpasses.py MklCreateOpDescriptors
mkldnnpasses.py MklAddLayoutConversions
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
GraphPass class
class GraphPass(with_metaclass(abc.ABCMeta, object)):
    @abc.abstractmethod
    def do_pass(self, ops, transformer):
        pass
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/passes.py
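To see how an abstract `do_pass` interface turns into a pass pipeline, here is a small self-contained sketch. Everything in it is hypothetical: ops are plain strings, `DropNoOps`, `CountOps` and `run_passes` are invented for illustration, and having `do_pass` return the new op list is an assumption of this toy, not something the slide states about ngraph.

```python
import abc


class ToyGraphPass(abc.ABC):
    """Abstract pass: subclasses must implement do_pass."""

    @abc.abstractmethod
    def do_pass(self, ops, transformer):
        """Inspect or rewrite `ops` and return the resulting list."""


class DropNoOps(ToyGraphPass):
    """Rewriting pass: removes ops that do nothing."""

    def do_pass(self, ops, transformer):
        return [op for op in ops if op != "noop"]


class CountOps(ToyGraphPass):
    """Analysis pass: records how many ops it saw, changes nothing."""

    def __init__(self):
        self.count = 0

    def do_pass(self, ops, transformer):
        self.count = len(ops)
        return ops


def run_passes(passes, ops, transformer=None):
    """Apply each pass in order, feeding its output to the next one."""
    for graph_pass in passes:
        ops = graph_pass.do_pass(ops, transformer)
    return ops


counter = CountOps()
result = run_passes([DropNoOps(), counter], ["add", "noop", "mul"])
print(result)         # ['add', 'mul']
print(counter.count)  # 2
```

Because every pass shares one method signature, a backend can assemble a different ordered list of passes without the passes knowing about each other, which is the same idea as an LLVM-style pass manager.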
CPUTransformer class
class CPUTransformer(Transformer):
    def __init__(self, **kwargs):
        super(CPUTransformer, self).__init__(**kwargs)
        self.current_computation = None
        self.conv_engine = CPUConvEngine()
        # One code generator per section of the generated Python source
        self.init_code = CPUCodeGenerator(self)
        self.allocate_storage_code = CPUCodeGenerator(self)
        self.allocate_code = CPUCodeGenerator(self)
        self.compute_code = CPUCodeGenerator(self)
        self.code = CPUCodeGenerator(self)
        …..
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
CPUCodeGenerator class
class CPUCodeGenerator(PyGen):
    def __init__(self, transformer, **kwargs):
        super(CPUCodeGenerator, self).__init__(prefix="op", **kwargs)
        self.transformer = transformer

    def name(self, x):
        # Buffers and tensors are referred to by their reference string
        if isinstance(x, CPUDeviceBufferStorage):
            return x.ref_str
        if isinstance(x, CPUDeviceTensor):
            return x.ref_str
        return x
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
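The transformer usage documentation quoted earlier says the CPU transformer generates Python NumPy code that is then called to evaluate the computation. The general technique (emit Python source as text, then `exec` it) can be sketched as follows. `ToyPyGen` and `generate_computation` are hypothetical names for this note and do not reproduce ngraph's `PyGen`; the generated body hard-codes the (4 * b) + c example and uses plain Python arithmetic instead of NumPy.

```python
class ToyPyGen:
    """Accumulates indented Python source lines (hypothetical stand-in)."""

    def __init__(self):
        self.lines = []
        self.indent = 0

    def append(self, line):
        self.lines.append("    " * self.indent + line)

    @property
    def code(self):
        return "\n".join(self.lines) + "\n"


def generate_computation(name):
    """Emit source for a function that computes e = (4 * b) + c."""
    gen = ToyPyGen()
    gen.append("def {}(b, c):".format(name))
    gen.indent += 1
    gen.append("d = 4 * b")
    gen.append("e = d + c")
    gen.append("return e")
    return gen.code


source = generate_computation("op_example")
print(source)

# Turn the generated text into a real callable.
namespace = {}
exec(source, namespace)
op_example = namespace["op_example"]
print(op_example(2, 7))  # 15
```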
CPUComputation class
class CPUComputation(Computation):
    def __init__(self, transformer, computation, **kwargs):
        super(CPUComputation, self).__init__(transformer, computation, **kwargs)
        self.pool_params = dict()
        self.pool_slices = dict()
        self.conv_params = dict()
        self.conv_slices = dict()
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
Thank you
Blog: Vengineerの戯言
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
Study groups organized:
Xilinx Zynq MPSoC (2016/02/20)
Altera SDK for OpenCL (2016/06/10)
Xilinx SDSoC (2017/01/28)
PYNQ Festival (PYNQ祭り) (2017/03/04)
FPGA Deep Learning Practice Meetup (2017/05/20)
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?
Intel Nervana Graph とは?

More Related Content

What's hot

Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!Anne Nicolas
 
Implementing Lightweight Networking
Implementing Lightweight NetworkingImplementing Lightweight Networking
Implementing Lightweight Networkingguest6972eaf
 
Zn task - defcon russia 20
Zn task  - defcon russia 20Zn task  - defcon russia 20
Zn task - defcon russia 20DefconRussia
 
TensorFlow Lite (r1.5) & Android 8.1 Neural Network API
TensorFlow Lite (r1.5) & Android 8.1 Neural Network APITensorFlow Lite (r1.5) & Android 8.1 Neural Network API
TensorFlow Lite (r1.5) & Android 8.1 Neural Network APIMr. Vengineer
 
syzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugssyzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugsDmitry Vyukov
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackKernel TLV
 
Translation Cache Policies for Dynamic Binary Translation
Translation Cache Policies for Dynamic Binary TranslationTranslation Cache Policies for Dynamic Binary Translation
Translation Cache Policies for Dynamic Binary TranslationSaber Ferjani
 
Davide Berardi - Linux hardening and security measures against Memory corruption
Davide Berardi - Linux hardening and security measures against Memory corruptionDavide Berardi - Linux hardening and security measures against Memory corruption
Davide Berardi - Linux hardening and security measures against Memory corruptionlinuxlab_conf
 
Ищем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxИщем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxPositive Hack Days
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Gavin Guo
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzerDmitry Vyukov
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesAnne Nicolas
 
Developer support/process automation tools
Developer support/process automation toolsDeveloper support/process automation tools
Developer support/process automation toolsDmitry Vyukov
 
System Hacking Tutorial #3 - Buffer Overflow - Egg Hunting
System Hacking Tutorial #3 - Buffer Overflow - Egg HuntingSystem Hacking Tutorial #3 - Buffer Overflow - Egg Hunting
System Hacking Tutorial #3 - Buffer Overflow - Egg Huntingsanghwan ahn
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaXKernel TLV
 
The Simple Scheduler in Embedded System @ OSDC.TW 2014
The Simple Scheduler in Embedded System @ OSDC.TW 2014The Simple Scheduler in Embedded System @ OSDC.TW 2014
The Simple Scheduler in Embedded System @ OSDC.TW 2014Jian-Hong Pan
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudAndrea Righi
 

What's hot (20)

Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
 
The pocl Kernel Compiler
The pocl Kernel CompilerThe pocl Kernel Compiler
The pocl Kernel Compiler
 
Implementing Lightweight Networking
Implementing Lightweight NetworkingImplementing Lightweight Networking
Implementing Lightweight Networking
 
Zn task - defcon russia 20
Zn task  - defcon russia 20Zn task  - defcon russia 20
Zn task - defcon russia 20
 
TensorFlow Lite (r1.5) & Android 8.1 Neural Network API
TensorFlow Lite (r1.5) & Android 8.1 Neural Network APITensorFlow Lite (r1.5) & Android 8.1 Neural Network API
TensorFlow Lite (r1.5) & Android 8.1 Neural Network API
 
syzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugssyzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugs
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network Stack
 
Translation Cache Policies for Dynamic Binary Translation
Translation Cache Policies for Dynamic Binary TranslationTranslation Cache Policies for Dynamic Binary Translation
Translation Cache Policies for Dynamic Binary Translation
 
Davide Berardi - Linux hardening and security measures against Memory corruption
Davide Berardi - Linux hardening and security measures against Memory corruptionDavide Berardi - Linux hardening and security measures against Memory corruption
Davide Berardi - Linux hardening and security measures against Memory corruption
 
Ищем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxИщем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре Linux
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzer
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
 
Developer support/process automation tools
Developer support/process automation toolsDeveloper support/process automation tools
Developer support/process automation tools
 
System Hacking Tutorial #3 - Buffer Overflow - Egg Hunting
System Hacking Tutorial #3 - Buffer Overflow - Egg HuntingSystem Hacking Tutorial #3 - Buffer Overflow - Egg Hunting
System Hacking Tutorial #3 - Buffer Overflow - Egg Hunting
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
 
The Simple Scheduler in Embedded System @ OSDC.TW 2014
The Simple Scheduler in Embedded System @ OSDC.TW 2014The Simple Scheduler in Embedded System @ OSDC.TW 2014
The Simple Scheduler in Embedded System @ OSDC.TW 2014
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
ARM 64bit has come!
ARM 64bit has come!ARM 64bit has come!
ARM 64bit has come!
 

Viewers also liked

Altera SDK for OpenCL解体新書 perlスクリプト編
Altera SDK for OpenCL解体新書 perlスクリプト編Altera SDK for OpenCL解体新書 perlスクリプト編
Altera SDK for OpenCL解体新書 perlスクリプト編Mr. Vengineer
 
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係Altera SDK for OpenCL解体新書 : ホストとデバイスの関係
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係Mr. Vengineer
 
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)Mr. Vengineer
 
プロファイラGuiを用いたコード分析 20160610
プロファイラGuiを用いたコード分析 20160610プロファイラGuiを用いたコード分析 20160610
プロファイラGuiを用いたコード分析 20160610HIDEOMI SUZUKI
 
FPGAアクセラレータの作り方
FPGAアクセラレータの作り方FPGAアクセラレータの作り方
FPGAアクセラレータの作り方Mr. Vengineer
 
Altera sdk for open cl アンケート集計結果(公開版)
Altera sdk for open cl アンケート集計結果(公開版)Altera sdk for open cl アンケート集計結果(公開版)
Altera sdk for open cl アンケート集計結果(公開版)Hiroki Nakahara
 
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみたHiroki Nakahara
 
TensorFlow XLA とハードウェア
TensorFlow XLA とハードウェアTensorFlow XLA とハードウェア
TensorFlow XLA とハードウェアMr. Vengineer
 

Viewers also liked (8)

Altera SDK for OpenCL解体新書 perlスクリプト編
Altera SDK for OpenCL解体新書 perlスクリプト編Altera SDK for OpenCL解体新書 perlスクリプト編
Altera SDK for OpenCL解体新書 perlスクリプト編
 
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係Altera SDK for OpenCL解体新書 : ホストとデバイスの関係
Altera SDK for OpenCL解体新書 : ホストとデバイスの関係
 
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)
SDSoC解体新書2016.2版ソフトウェア編 (チラ見) : Inside SDSoC v2016.2 (Software short edtion)
 
プロファイラGuiを用いたコード分析 20160610
プロファイラGuiを用いたコード分析 20160610プロファイラGuiを用いたコード分析 20160610
プロファイラGuiを用いたコード分析 20160610
 
FPGAアクセラレータの作り方
FPGAアクセラレータの作り方FPGAアクセラレータの作り方
FPGAアクセラレータの作り方
 
Altera sdk for open cl アンケート集計結果(公開版)
Altera sdk for open cl アンケート集計結果(公開版)Altera sdk for open cl アンケート集計結果(公開版)
Altera sdk for open cl アンケート集計結果(公開版)
 
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた
電波望遠鏡用の分光器をAltera SDK for OpenCL使ってサクッと作ってみた
 
TensorFlow XLA とハードウェア
TensorFlow XLA とハードウェアTensorFlow XLA とハードウェア
TensorFlow XLA とハードウェア
 

Similar to Intel Nervana Graph とは?

Systemtap
SystemtapSystemtap
SystemtapFeng Yu
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkDatabricks
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Deep learning - the conf br 2018
Deep learning - the conf br 2018Deep learning - the conf br 2018
Deep learning - the conf br 2018Fabio Janiszevski
 
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...Valeriy Kravchuk
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 
GopherCon IL 2020 - Web Application Profiling 101
GopherCon IL 2020 - Web Application Profiling 101GopherCon IL 2020 - Web Application Profiling 101
GopherCon IL 2020 - Web Application Profiling 101yinonavraham
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingRuymán Reyes
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDatabricks
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Ray Jenkins
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Databricks
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
Bridge TensorFlow to run on Intel nGraph backends (v0.4)
Bridge TensorFlow to run on Intel nGraph backends (v0.4)Bridge TensorFlow to run on Intel nGraph backends (v0.4)
Bridge TensorFlow to run on Intel nGraph backends (v0.4)Mr. Vengineer
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideLinaro
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardJian-Hong Pan
 

Similar to Intel Nervana Graph とは? (20)

Systemtap
SystemtapSystemtap
Systemtap
 
Java in flames
Java in flamesJava in flames
Java in flames
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Deep learning - the conf br 2018
What Is Intel Nervana Graph?