What is Intel Nervana Graph?

@Vengineer
2017/05/22 (updated 2017/07/01)
As usual, I went digging through the source code.

About me
Blog: Vengineerの戯言 (Vengineer's Ramblings)
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
FPGA Magazine (No. 16/17)
"An Invitation to the FPGA Community"
http://fpga.cqpub.co.jp/
SlideShare
https://www.slideshare.net/ssuser479fa3

This material summarizes publicly available information from various companies, gathered through Google searches.
Use it at your own risk.

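On August 9, 2016, Intel acquired Nervana Systems for more than $350 million.
Nervana was a two-year-old startup that had raised nearly $25 million from investors.
In other words, the investors exited at roughly 10x in two years.
That is about $300 million in two years.
SoftBank Group's acquisition of ARM cost £24 billion, so this deal is roughly 1/100 of that.
Source: http://jp.techcrunch.com/2016/08/10/20160809intel-buys-deep-learning-startup-nervana-systems-for-a-reported-350-million/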

Nervana Graph Compiler
Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
・Frontends: neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
・Nervana Graph
・Transformers : CPU / GPU (CUDA)
Lowering

TensorFlow XLA
TensorFlow graph → converted to an XLA graph → code generation (JIT or AOT, using LLVM)
Lowering targets: CPU / GPU (CUDA)

Nervana Graph Compiler and TensorFlow XLA:
aren't these pretty much the same thing?

And here it is.
https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/
"The connection between the XLA and Intel Nervana Graph APIs was quite straightforward given the similar projects' intent for a compact and explicit intermediate representation. While today the XLA/Intel Nervana Graph integration is at a pre-alpha level, we'd love for people to take it for a spin and kick the tires. We're working on ironing out known performance issues and improving op and backend support."
Intel Nervana Graph Beta: 2017/6/22

neon
https://github.com/NervanaSystems/neon
The latest version is v1.9.
It shares its name with ARM's NEON, but:
"neon is Intel Nervana's reference deep learning framework committed to best performance on all hardware"

Datasets
Images: MNIST, CIFAR-10, ImageNet 1K, PASCAL VOC, Mini-Places2
Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize
Video: UCF101
Others: flickr8k, flickr30k, COCO

neon vs cuDNN 4
"Not so fast, FFT": Winograd (March 3, 2016)
Source: https://www.nervanasys.com/winograd/

cuDNN 5
Optimizing Recurrent Neural Networks in cuDNN 5 (April 6, 2016)
https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/
"Faster forward and backward convolutions using the Winograd convolution algorithm;"

Faster with Winograd!
Fast Algorithms for Convolutional Neural Networks
Andrew Lavin, Scott Gray
https://arxiv.org/abs/1509.09308
Going beyond full utilization: The inside scoop on Nervana's Winograd kernels (June 29, 2016)
https://www.nervanasys.com/winograd-2/
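To give a feel for why Winograd helps, here is a minimal sketch of the one-dimensional F(2,3) case described in the Lavin and Gray paper: two outputs of a 3-tap filter are computed with 4 data-dependent multiplications instead of 6. This is plain Python for illustration only; the function names `direct_fir` and `winograd_f23` are made up for this sketch and have nothing to do with the actual neon or cuDNN kernels.

```python
def direct_fir(d, g):
    """Reference: two outputs of a 3-tap filter over a 4-element tile (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 multiplies that involve the data."""
    # Filter transform. In a convolution layer this depends only on the
    # weights, so it can be computed once and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2.0
    u2 = (g[0] - g[1] + g[2]) / 2.0
    u3 = g[2]

    # Data transform followed by the four element-wise multiplies.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]   # input tile
g = [0.5, -1.0, 2.0]       # filter taps
print(direct_fir(d, g))
print(winograd_f23(d, g))
```

Both calls print the same pair of values. The 2D variants used for convolution layers nest this idea, which is where the larger savings come from.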

neon v1.3 vs cuDNN v5.1
Still not slowing down: Benchmarking optimized Winograd implementations (July 25, 2016)
Source: https://www.nervanasys.com/winograd-3/
vs cuDNN v4 vs cuDNN v5.1

Scott Gray
https://twitter.com/scottgray76
High-Performance GPU kernels for deep learning
• Fast matrix multiply for small minibatches
• Direct convolution leveraging GEMM advances
• Even faster convolution with Winograd
Nervana (October 2014 to July 2017)
Now at OpenAI (since July 2017)
Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6485-scott-gray-gpu-programming-deep-learning.pdf

Where the Graph Compiler fits
Source: http://pc.watch.impress.co.jp/docs/news/1034408.html

MKL-DNN Support
Mar 23, 2017 (after the acquisition by Intel)
"To install with Intel MKL-DNN support, first download MKL-DNN from [here](https://github.com/01org/mkl-dnn) and follow the installation instructions there to install MKL-DNN. Set environment variable MKLDNN_ROOT to point to the installed location and follow the rest of the steps to install Ngraph"
Source: https://github.com/NervanaSystems/ngraph/commit/f3b7306214f40b4c1b4c40e3e223080797afb382

Transformer API
・Supports CPU and GPU
  Memory usage optimization passes
  Transformers allow users to register an included set of optional compiler passes for debug and visualization.
・GPU
  automatic kernel fusion/compounding for increased performance
・A mechanism similar to LLVM passes
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md

Building a graph
・Nervana Graph structure
  Data Dependencies
  Initializers
  Non-data Control Dependencies
・General properties of ops
・Op Hierarchy
・Ops influencing evaluation
・Derivatives
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/building_graphs.rst

Example
    import ngraph as ng
    import ngraph.transformers as ngt
    # A scalar placeholder and a graph that adds one to it
    x = ng.placeholder(())
    x_plus_one = x + 1
    # Turn the graph into a callable computation that takes x as its argument
    transformer = ngt.make_transformer()
    plus_one = transformer.computation(x_plus_one, x)
    for i in range(5):
        print(plus_one(i))
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/overview.rst

Planned for future support?
・Nervana Graph serialization/deserialization
・Further improvements/abstractions to graph composability for usability/optimization
・Distributed, heterogeneous backend target support
・C APIs for interoperability to enable other languages to create/execute graphs
・Better debugging
・Support for model deployment
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md

From here on, we dig into the source code of the Intel Nervana Graph Compiler.
ngraph
https://github.com/NervanaSystems/ngraph

Caffe example
    from __future__ import print_function
    import ngraph.transformers as ngt
    from ngraph.frontends.caffe.cf_importer.importer import parse_prototxt
    # Import the Caffe model and get a map from layer names to ngraph ops
    model = "sum.prototxt"
    op_map = parse_prototxt(model, verbose=True)
    # Look up the op for layer "D" and evaluate it
    op = op_map.get("D")
    res = ngt.make_transformer().computation(op)()
    print("Result is:", res)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/caffe.rst

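TensorFlow example
    import tensorflow as tf
    import ngraph.transformers as ngt
    # Import path assumed from the layout of the ngraph TensorFlow frontend
    from ngraph.frontends.tensorflow.tf_importer.importer import TFImporter
    # Build a small TensorFlow graph
    x = tf.constant(1.)
    y = tf.constant(2.)
    f = x + y
    # Import the TensorFlow graph definition into ngraph
    importer = TFImporter()
    importer.import_graph_def(tf.Session().graph_def)
    # Get the ngraph op that corresponds to f and evaluate it
    f_ng = importer.get_op_handle(f)
    transformer = ngt.make_transformer()
    f_result = transformer.computation(f_ng)()
    print(f_result)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/tensorflow.rst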

Transformers
"Transformers are used to convert the Op graph into a backend specific executable format. Once the graph has been defined, one or more computations are created using a transformer. Computations are handles to executable objects created by the transformer, which can be called to evaluate a subset of the entire graph. All transformers must implement a common abstract interface allowing users to easily switch between backends without altering their computation graph definition."
Supported backends
・CPUs (via NumPy)
・NVIDIA GPUs (via PyCUDA)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst

Creating transformers
1) Default
    from ngraph.transformers import make_transformer
    transformer = make_transformer()
2) Using a factory
    import ngraph.transformers as ngt
    available_transformers = ngt.transformer_choices()
    if 'gpu' in available_transformers:
        factory = ngt.make_transformer_factory('gpu')
        ngt.set_transformer_factory(factory)

Computations
"Computation objects are created by the transformer and provide an interface to evaluate a subset of the graph. The format of the executable used for evaluation depends on the transformer that created the computation. For example the CPU transformer generates python NumPy code which is called to evaluate the computation, while the GPU transformer generates a series of CUDA kernels which can be called to evaluate the computation."
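The idea of "generate Python source, then call it" can be illustrated with a toy code generator. This is only a sketch of the general technique and not ngraph's actual generator: the tuple-based expression format and the names `generate_source` and `compile_computation` are invented for this example, and plain Python arithmetic stands in for NumPy.

```python
def generate_source(func_name, arg_names, expr):
    """Emit Python source for a function that evaluates `expr`.

    `expr` is a nested tuple: ("const", value), ("arg", name),
    ("add", lhs, rhs) or ("mul", lhs, rhs).
    """
    def emit(node):
        kind = node[0]
        if kind == "const":
            return repr(node[1])
        if kind == "arg":
            return node[1]
        operator = {"add": "+", "mul": "*"}[kind]
        return "(" + emit(node[1]) + " " + operator + " " + emit(node[2]) + ")"

    return "def {}({}):\n    return {}\n".format(
        func_name, ", ".join(arg_names), emit(expr))


def compile_computation(func_name, arg_names, expr):
    """Compile the generated source and return (callable, source)."""
    source = generate_source(func_name, arg_names, expr)
    namespace = {}
    exec(source, namespace)  # run the generated code to define the function
    return namespace[func_name], source


# (4 * b) + c, with b and c as the arguments of the generated function
expr = ("add", ("mul", ("const", 4), ("arg", "b")), ("arg", "c"))
comp, source = compile_computation("example_comp", ["b", "c"], expr)
print(source)
print(comp(2, 7))
```

A GPU backend would follow the same outline but emit kernel source instead of a Python function.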

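Creating computations
    import ngraph as ng
    import ngraph.transformers as ngt
    transformer = ngt.make_transformer()
    a = ng.constant(4)
    b = ng.placeholder(())
    c = ng.placeholder(())
    d = ng.multiply(a, b)
    e = ng.add(d, c)
    # e is the value to return; b and c become the arguments
    example_comp = transformer.computation(e, b, c)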

Running computations
    example_comp = transformer.computation(e, b, c)
・The return value (result_e) is the value of e
・b is the first argument
・c is the second argument
    result_e = example_comp(2, 7)    # b = 2, c = 7
    result_e = (4 * b) + c = (4 * 2) + 7 = 15

Running computations: multiple return values
    example_comp2 = transformer.computation([d, e], b, c)
・The return values are result_d (the value of d) and result_e (the value of e)
・b is the first argument
・c is the second argument
    result_d, result_e = example_comp2(2, 7)
    result_d = (4 * b) = (4 * 2) = 8
    result_e = (4 * b) + c = (4 * 2) + 7 = 15
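The calling convention above (the first argument names what to return, the remaining ones become positional parameters) can be mimicked without ngraph. The sketch below is a toy stand-in written for this purpose: `Node`, `constant`, `placeholder`, `multiply`, `add` and `computation` are hypothetical helpers that only imitate the shape of the API, and nothing here reflects how ngraph evaluates graphs internally.

```python
class Node(object):
    """A node in a toy expression graph (not an ngraph Op)."""

    def __init__(self, kind, inputs=(), value=None):
        self.kind = kind
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Node("constant", value=value)


def placeholder():
    return Node("placeholder")


def multiply(x, y):
    return Node("multiply", (x, y))


def add(x, y):
    return Node("add", (x, y))


def computation(returns, *parameters):
    """Return a callable that evaluates `returns` given values for `parameters`."""
    many = isinstance(returns, (list, tuple))
    outputs = list(returns) if many else [returns]

    def run(*args):
        if len(args) != len(parameters):
            raise TypeError("expected %d arguments" % len(parameters))
        env = dict(zip(parameters, args))  # placeholder node -> supplied value
        cache = {}                         # evaluate shared nodes only once

        def evaluate(node):
            if node in cache:
                return cache[node]
            if node.kind == "constant":
                result = node.value
            elif node.kind == "placeholder":
                result = env[node]
            else:
                x, y = [evaluate(i) for i in node.inputs]
                result = x * y if node.kind == "multiply" else x + y
            cache[node] = result
            return result

        results = [evaluate(n) for n in outputs]
        return tuple(results) if many else results[0]

    return run


a = constant(4)
b = placeholder()
c = placeholder()
d = multiply(a, b)
e = add(d, c)

example_comp = computation(e, b, c)
example_comp2 = computation([d, e], b, c)
print(example_comp(2, 7))    # 15
print(example_comp2(2, 7))   # (8, 15)
```

Note that `d` is shared between both outputs of `example_comp2`; the cache ensures it is computed once per call.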

Transformer implementation
・Transformer creation
・Computation creation
・Transformer initialization
  Transformer Passes
  Initialization Computation
  Tensor Description Initialization
  Computation Transformation
・Computation execution
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_implementation.rst

Transformer implementation (files)
base.py : Transformer_ABC_Meta
base.py : Transformer (base)
cputransform.py : CPUTransformer
gputransform.py : GPUTransformer
hetrtransform.py : HetrTransformer
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers

Transformer_ABC_Meta class
    class Transformer_ABC_Meta(abc.ABCMeta):
        """
        metaclass for the backend objects
        takes care of registering all the backend subclasses
        """
        def __init__(cls, name, bases, dict_):
            if not hasattr(cls, 'transformers'):
                # First possible transformer class sets things up
                cls.transformers = {}
            # If this transformer has a transformer_name, register it
            transformer_name = getattr(cls, 'transformer_name', None)
            if transformer_name is not None:
                cls.transformers[transformer_name] = cls
            super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
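The registration trick in this metaclass can be tried in isolation. The following is a self-contained sketch in Python 3 syntax; `RegistryMeta`, `ToyTransformer`, `ToyCPUTransformer` and `ToyGPUTransformer` are stand-in names for illustration, not the real ngraph classes.

```python
import abc


class RegistryMeta(abc.ABCMeta):
    """Metaclass that records every subclass declaring a transformer_name."""

    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, "transformers"):
            # The first class created with this metaclass owns the registry;
            # subclasses see the same dict through normal attribute lookup.
            cls.transformers = {}
        transformer_name = getattr(cls, "transformer_name", None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls
        super(RegistryMeta, cls).__init__(name, bases, dict_)


class ToyTransformer(metaclass=RegistryMeta):
    transformer_name = None  # the base class itself is not registered


class ToyCPUTransformer(ToyTransformer):
    transformer_name = "cpu"


class ToyGPUTransformer(ToyTransformer):
    transformer_name = "gpu"


# Simply defining the subclasses was enough to register them.
print(sorted(ToyTransformer.transformers))
backend = ToyTransformer.transformers["cpu"]()
print(type(backend).__name__)
```

A name-based lookup like this is presumably what lets a call such as `make_transformer_factory('gpu')` find the right class, though that link is an inference from the slide, not something verified here.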

Transformer class
    class Transformer(with_metaclass(Transformer_ABC_Meta, object)):
        """
        Produce an executable version of op-graphs.
        Computations are subsets of Ops to compute. The transformer determines storage
        allocation and transforms the computations and allocations into functions.
        Arguments:
            fusion (bool): Whether to combine sequences of operations into one operation.
            **kwargs: Args for related classes.
        Attributes:
            computations (:obj:`set` of :class:`Computation`): The set of requested computations.
            all_results (:obj:`set` of :class:`ngraph.op_graph.op_graph.Op`): A root set of Ops that
                need to be computed.
            finalized (bool): True when transformation has been performed.
            initialized (bool): True when variables have been initialized/restored.
            fusion (bool): True when fusion was enabled.
            device_buffers (set): Set of handles for storage allocations.
        """

Computation implementation (files)
base.py : Computation (base)
cputransform.py : CPUComputation
gputransform.py : GPUComputation
hetrtransform.py : HetrComputation

Computation class
    class Computation(NameableValue):
        """
        A handle for a computation function.
        Arguments:
            transformer (obj:`Transformer`): The associated transformer.
            returns: If an Op, return the value
                of the Op, if sequence of Ops, return the sequence of values, if
                a set return a map, if None, return None.
            *args: AllocationOps marked input will be arguments to the function.
            **kwargs: Args for related classes.
        """

Computation class (constructor)
        def __init__(self, transformer, computation, **kwargs):
            super(Computation, self).__init__(**kwargs)
            self.transformer = transformer
            self.computation = computation
            self.computation_name = None
            self.executor = None
            self.send_nodes = []
            self.recv_nodes = []
            self.scatter_send_nodes = []
            self.scatter_recv_nodes = []
            self.gather_send_nodes = []
            self.gather_recv_nodes = []
            self.allreduce_nodes = []

Pass implementation (part 1)
passes.py : GraphPass (base class)
passes.py : GraphBuildingPass
passes.py : GraphRewritePass (New)
passes.py : PeepholeGraphPass
passes.py : RequiredTensorShaping
passes.py : CPUTensorShaping
passes.py : SimplePrune
flexpass.py : FlexDtypePass
flexpass.py : FlexDECPass
flexpass.py : ClearTensorDescriptions
nviz.py : JSONPass(GraphPass)
nviz.py : VizPass(GraphPass)
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/base.py

Pass implementation (part 2)
layout.py : PruneContiguousPass
layout.py : GenerateLayoutDomains
layout.py : GenerateLayoutConstraints
layout.py : AssignLayouts
layout.py : AddLayoutConversions
cpufusion.py : FusionPass
cpulayout.py : CPUTensorLayout
gpusimplification.py : GPUSubstitution
hetrpasses.py : DeviceAssignPass
hetrpasses.py : CommunicationPass
hetrpasses.py : DistributedPass
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes

Pass implementation (part 3) (New)
mkldnnpasses.py : MklCreateOpDescriptors
mkldnnpasses.py : MklAddLayoutConversions
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes

GraphPass class
    class GraphPass(with_metaclass(abc.ABCMeta, object)):
        @abc.abstractmethod
        def do_pass(self, ops, transformer):
            pass
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/passes.py
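To show how an abstract `do_pass(ops, transformer)` interface leads to an LLVM-style pass pipeline, here is a small self-contained sketch. Everything in it is hypothetical: the ops are plain strings, `ToyGraphPass`, `DropNoOps`, `FuseMulAdd` and `ToyPassRunner` are invented names, and having `do_pass` return the rewritten list is an assumption of this sketch, not a statement about ngraph's convention.

```python
import abc


class ToyGraphPass(abc.ABC):
    """Base class: each pass takes a list of ops and returns a new list."""

    @abc.abstractmethod
    def do_pass(self, ops, transformer):
        """Return the (possibly rewritten) list of ops."""


class DropNoOps(ToyGraphPass):
    """Pruning pass: remove ops that do nothing."""

    def do_pass(self, ops, transformer):
        return [op for op in ops if op != "noop"]


class FuseMulAdd(ToyGraphPass):
    """Peephole pass: replace an adjacent mul/add pair with one fused op."""

    def do_pass(self, ops, transformer):
        fused = []
        i = 0
        while i < len(ops):
            if ops[i] == "mul" and i + 1 < len(ops) and ops[i + 1] == "add":
                fused.append("fused_mul_add")
                i += 2
            else:
                fused.append(ops[i])
                i += 1
        return fused


class ToyPassRunner(object):
    """Stand-in for a transformer that owns an ordered list of passes."""

    def __init__(self, passes):
        self.graph_passes = list(passes)

    def run_passes(self, ops):
        for graph_pass in self.graph_passes:
            ops = graph_pass.do_pass(ops, self)
        return ops


runner = ToyPassRunner([DropNoOps(), FuseMulAdd()])
result = runner.run_passes(["mul", "noop", "add", "relu"])
print(result)   # ['fused_mul_add', 'relu']
```

Pass order matters here: the fusion only fires because the pruning pass first removes the `noop` sitting between `mul` and `add`.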

CPUTransformer class
    class CPUTransformer(Transformer):
        def __init__(self, **kwargs):
            super(CPUTransformer, self).__init__(**kwargs)
            self.current_computation = None
            self.conv_engine = CPUConvEngine()
            self.init_code = CPUCodeGenerator(self)
            self.allocate_storage_code = CPUCodeGenerator(self)
            self.allocate_code = CPUCodeGenerator(self)
            self.compute_code = CPUCodeGenerator(self)
            self.code = CPUCodeGenerator(self)
            …..
Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py

CPUCodeGenerator class
    class CPUCodeGenerator(PyGen):
        def __init__(self, transformer, **kwargs):
            super(CPUCodeGenerator, self).__init__(prefix="op", **kwargs)
            self.transformer = transformer
        def name(self, x):
            if isinstance(x, CPUDeviceBufferStorage):
                return x.ref_str
            if isinstance(x, CPUDeviceTensor):
                return x.ref_str
            return x

CPUComputation class
    class CPUComputation(Computation):
        def __init__(self, transformer, computation, **kwargs):
            super(CPUComputation, self).__init__(transformer, computation, **kwargs)
            self.pool_params = dict()
            self.pool_slices = dict()
            self.conv_params = dict()
            self.conv_slices = dict()

Thank you
Blog: Vengineerの戯言 (Vengineer's Ramblings)
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
Study sessions organized:
Xilinx Zynq MPSoC (2016/02/20)
Altera SDK for OpenCL (2016/06/10)
Xilinx SDSoC (2017/01/28)
PYNQ Matsuri (PYNQ Festival) (2017/03/04)
FPGA Deep Learning Practice Meetup (2017/05/20)
