SlideShare a Scribd company logo
1 of 105
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
PyTorch under the hood
A guide to understand PyTorch internals
Christian S. Perone
(christian.perone@gmail.com)
http://blog.christianperone.com
PyData Montreal, Feb 2019
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Agenda
TENSORS
Tensors
Python objects
Zero-copy
Tensor storage
Memory allocators (CPU/GPU)
The big picture
JIT
Just-in-time compiler
Tracing
Scripting
Why TorchScript ?
Building IR and JIT Phases
Optimizations
Serialization
Using models in other languages
PRODUCTION
Some tips
Q&A
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHO AM I
Christian S. Perone
14 years working with Machine
Learning, Data Science and Software
Engineering in industry R&D
Blog at
blog.christianperone.com
Open-source projects at
https://github.com/perone
Twitter @tarantulae
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
DISCLAIMER
PyTorch is a moving target, Deep Learning ecosystem moves
fast and big changes happens every week;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
DISCLAIMER
PyTorch is a moving target, Deep Learning ecosystem moves
fast and big changes happens every week;
This is not a talk to teach you the basics of PyTorch or how to
train your network, but to teach you how PyTorch
components works under the hood in a intuitive way;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
DISCLAIMER
PyTorch is a moving target, Deep Learning ecosystem moves
fast and big changes happens every week;
This is not a talk to teach you the basics of PyTorch or how to
train your network, but to teach you how PyTorch
components works under the hood in a intuitive way;
This talk is updated to the PyTorch v.1.0.1 version;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Section I
TENSORS
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.]
[ 1., -1.]])
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.]
[ 1., -1.]])
>>> t.dtype # They have a type
torch.float32
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.]
[ 1., -1.]])
>>> t.dtype # They have a type
torch.float32
>>> t.shape # a shape
torch.Size([2, 2])
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Simply put, TENSORS are a generalization of vectors and matrices.
In PyTorch, they are a multi-dimensional matrix containing elements
of a single data type.
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.]
[ 1., -1.]])
>>> t.dtype # They have a type
torch.float32
>>> t.shape # a shape
torch.Size([2, 2])
>>> t.device # and live in some device
device(type='cpu')
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Although PyTorch has an elegant python first design, all PyTorch
heavy work is actually implemented in C++.
In Python, the integration of C++ code is (usually) done using
what is called an extension;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Although PyTorch has an elegant python first design, all PyTorch
heavy work is actually implemented in C++.
In Python, the integration of C++ code is (usually) done using
what is called an extension;
PyTorch uses ATen, which is the foundational tensor operation
library on which all else is built;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Although PyTorch has an elegant python first design, all PyTorch
heavy work is actually implemented in C++.
In Python, the integration of C++ code is (usually) done using
what is called an extension;
PyTorch uses ATen, which is the foundational tensor operation
library on which all else is built;
To do automatic differentiation, PyTorch uses Autograd, which
is an augmentation on top of the ATen framework;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSORS
Although PyTorch has an elegant python first design, all PyTorch
heavy work is actually implemented in C++.
In Python, the integration of C++ code is (usually) done using
what is called an extension;
PyTorch uses ATen, which is the foundational tensor operation
library on which all else is built;
To do automatic differentiation, PyTorch uses Autograd, which
is an augmentation on top of the ATen framework;
In the Python API, PyTorch previously had separate
Variable and a Tensor types, after v.0.4.0 they were
merged into Tensor .
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
QUICK RECAP PYTHON OBJECTS
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
QUICK RECAP PYTHON OBJECTS
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
typedef struct _object {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
QUICK RECAP PYTHON OBJECTS
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
typedef struct _object {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
struct _typeobject *ob_type
Py_ssize_t ob_refcnt
objectPyObject
double ob_fval
PyObject_HEAD
objectPyFloatObject
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
QUICK RECAP PYTHON OBJECTS
struct THPVariable {
PyObject_HEAD
torch::autograd::Variable cdata;
PyObject* backward_hooks;
};
The TH prefix is from TorcH, and P means Python.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
QUICK RECAP PYTHON OBJECTS
struct THPVariable {
PyObject_HEAD
torch::autograd::Variable cdata;
PyObject* backward_hooks;
};
(object fields)
PyObject_HEAD (w/ ref counter)
objectTHPVariable
variable_a
variable_b
Ref Count = 1
Ref Count = 2
The TH prefix is from TorcH, and P means Python.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
IN PYTHON, EVERYTHING IS AN OBJECT
>>> a = 300
>>> b = 300
>>> a is b
False
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
IN PYTHON, EVERYTHING IS AN OBJECT
>>> a = 300
>>> b = 300
>>> a is b
False
>>> a = 200
>>> b = 200
>>> a is b
True
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
IN PYTHON, EVERYTHING IS AN OBJECT
>>> a = 300
>>> b = 300
>>> a is b
False
>>> a = 200
>>> b = 200
>>> a is b
True (object fields)
PyObject_HEAD
objectPyIntObject
a
b
Ref Count = 1
Ref Count = 2
(object fields)
PyObject_HEAD
objectPyIntObject
(object fields)
PyObject_HEAD
objectPyIntObject
a
b
Ref Count = 1
Ref Count = 1
A typical Python program spend much of its time
allocating/deallocating integers. CPython then caches the small
integers.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
Underline after an operation means an in-place operation.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
Underline after an operation means an in-place operation.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
>>> torch_array.add_(1.0)
Underline after an operation means an in-place operation.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
It is very common to load tensors in numpy and convert them to
PyTorch, or vice-versa;
>>> np_array = np.ones((2,2))
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.tensor(np_array)
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
>>> torch_array.add_(1.0)
>>> np_array
array([[1., 1.], # array is intact, a copy was made
[1., 1.]])
Underline after an operation means an in-place operation.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Now imagine that you have a batch of 128 images, 3 channels
each (RGB) and with size of 224x224;
0
1
1
1
0
0
1
1
1
0
0
1
1
1
1
1
1
0
1
0
1
1
1
1
0
1
0
0
0
1
1
1
0
1
0
0
1
0
0
1
1
0
0
1
1
0
0
0
Column
Row
Channel
This will yield a size in memory of ∼ 74MB. We don’t want to
duplicate memory (except when copying them to discrete GPUs
of course);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
>>> np_array
array([[2., 2.],
[2., 2.]])
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Let’s see now a slightly different code using the function
torch.from_numpy() this time:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array.add_(1.0)
>>> np_array
array([[2., 2.],
[2., 2.]])
The original numpy array was changed, because it used a zero-copy
operation.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
Difference between in-place and standard operations might not be
so clear in some cases:
>>> np_array
array([[1., 1.],
[1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> np_array = np_array + 1.0
>>> torch_array
tensor([[1., 1.],
[1., 1.]], dtype=torch.float64)
However, if you use np_array += 1.0 , that is an in-place
operation that will change torch_array memory.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ZERO-COPYING TENSORS
at::Tensor tensor_from_numpy(PyObject* obj) {
// (...) - omitted for brevity
auto array = (PyArrayObject*)obj;
int ndim = PyArray_NDIM(array);
auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
// (...) - omitted for brevity
void* data_ptr = PyArray_DATA(array);
auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
Py_INCREF(obj);
return type.tensorFromBlob(data_ptr, sizes, strides,
[obj](void* data) {
AutoGIL gil;
Py_DECREF(obj);
});
}
Pay attention to the reference counting using Py_INCREF() and the
call to tensorFromBlob() function.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
DATA POINTERS
(object fields)
data_pointer*
objectPyArrayObject
(object fields)
data_pointer*
objectFloatTensor
The tensor FloatTensor did a copy of the numpy array data
pointer and not of the contents. The reference is kept safe by the
Python reference counting mechanism.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .
struct C10_API StorageImpl final : (...) {
// (...)
private:
// (...)
caffe2::TypeMeta data_type_;
DataPtr data_ptr_;
int64_t numel_;
Allocator* allocator_;
}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The abstraction responsible for holding the data isn’t actually the
Tensor , but the Storage .
struct C10_API StorageImpl final : (...) {
// (...)
private:
// (...)
caffe2::TypeMeta data_type_;
DataPtr data_ptr_;
int64_t numel_;
Allocator* allocator_;
}
Holds a pointer to the raw data and contains information such as
the size and allocator;
Storage is a dumb abstraction, there is no metadata telling us
how to interpret the data it holds;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
We can have multiple tensors sharing the same storage, but
with different interpretations, also called views, but without
duplicating memory:
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
We can have multiple tensors sharing the same storage, but
with different interpretations, also called views, but without
duplicating memory:
>>> tensor_a = torch.ones((2, 2))
>>> tensor_b = tensor_a.view(4)
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TENSOR STORAGE
The Storage abstraction is very powerful because it decouples
the raw data and how we can interpret it;
We can have multiple tensors sharing the same storage, but
with different interpretations, also called views, but without
duplicating memory:
>>> tensor_a = torch.ones((2, 2))
>>> tensor_b = tensor_a.view(4)
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True
tensor_b is a different view (interpretation) of the same data
present in the underlying storage that is shared between both
tensors.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
MEMORY ALLOCATORS (CPU/GPU)
The tensor storage can be allocated either in the CPU memory
or GPU, therefore a mechanism is required to switch between
these different allocations:
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
MEMORY ALLOCATORS (CPU/GPU)
The tensor storage can be allocated either in the CPU memory
or GPU, therefore a mechanism is required to switch between
these different allocations:
struct Allocator {
virtual ~Allocator() {}
virtual DataPtr allocate(size_t n) const = 0;
virtual DeleterFnPtr raw_deleter() const {...}
void* raw_allocate(size_t n) {...}
void raw_deallocate(void* ptr) {...}
};
There are Allocator s that will use the GPU allocators such as
cudaMallocHost() when the storage should be used for the
GPU or posix_memalign() POSIX functions for data in the
CPU memory.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
THE BIG PICTURE
(object fields)
Storage *storage
objectTensor
Allocator *allocator
(object fields)
DataPtr data_ptr
objectStorage
raw_deallocate()
(object fields)
raw_allocate()
objectAllocator
Raw Data
The Tensor has a Storage which in turn has a pointer to
the raw data and to the Allocator to allocate memory
according to the destination device.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Section II
JIT
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
JIT - JUST-IN-TIME COMPILER
PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
JIT - JUST-IN-TIME COMPILER
PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;
However, this poses problems for optimization and for
decoupling it from Python (the model itself is Python code);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
JIT - JUST-IN-TIME COMPILER
PyTorch is eager by design, which means that it is easily
hackable to debug, inspect, etc;
However, this poses problems for optimization and for
decoupling it from Python (the model itself is Python code);
PyTorch 1.0 introduced torch.jit , which has two main
methods to convert a PyTorch model to a serializable and
optimizable format;
TorchScript was also introduced as a statically-typed subset of
Python;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
JIT - JUST-IN-TIME COMPILER
Two very different worlds with their own requirements.
Prototype, debug, train,
experiment
EAGER MODE
Optimization, other
languages, deployment
SCRIPT MODE
tracing
scripting
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TRACING
def my_function(x):
if x.mean() > 1.0:
r = torch.tensor(1.0)
else:
r = torch.tensor(2.0)
return r
>>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
>>> ftrace.graph
graph(%x : Float(2, 2)) {
%4 : Float() = prim::Constant[value={2}]()
%5 : Device = prim::Constant[value="cpu"]()
%6 : int = prim::Constant[value=6]()
%7 : bool = prim::Constant[value=0]()
%8 : bool = prim::Constant[value=0]()
%9 : Float() = aten::to(%4, %5, %6, %7, %8)
%10 : Float() = aten::detach(%9)
return (%10); }
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TRACING
To call the JIT’ed function, just call the forward() method:
>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
TRACING
To call the JIT’ed function, just call the forward() method:
>>> x = torch.ones(2, 2)
>>> ftrace.forward(x)
tensor(2.)
However, tracing will not record any control-flow like if statements
or loops, it executes the code with the given context and creates the
graph. You can see this limitation below:
>>> x = torch.ones(2, 2).add_(1.0)
>>> ftrace.forward(x)
tensor(2.)
According to my_function() , result should have been 1.0. Tracing
also checks for differences between traced and Python function, but
what about Dropout ?
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SCRIPTING
Another alternative is to use scripting, where you can use decorators
such as @torch.jit.script :
@torch.jit.script
def my_function(x):
if bool(x.mean() > 1.0):
r = 1
else:
r = 2
return r
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SCRIPTING
>>> my_function.graph
graph(%x : Tensor) {
%2 : float = prim::Constant[value=1]()
%5 : int = prim::Constant[value=1]()
%6 : int = prim::Constant[value=2]()
%1 : Tensor = aten::mean(%x)
%3 : Tensor = aten::gt(%1, %2)
%4 : bool = prim::Bool(%3)
%r : int = prim::If(%4)
block0() {
-> (%5)
}
block1() {
-> (%6)
}
return (%r);
}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SCRIPTING
The my_function() is now a ScriptModule :
>>> type(my_function)
torch.jit.ScriptModule
When we check the results again:
>>> x = torch.ones(2, 2)
>>> my_function(x)
2
>>> x = torch.ones(2, 2).add_(1.0)
>>> my_function(x)
1
Control-flow logic was preserved !
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHY TORCHSCRIPT ?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHY TORCHSCRIPT ?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;
This opens the door to:
Decouple the model (computationl graph) from Python runtime;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHY TORCHSCRIPT ?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;
This opens the door to:
Decouple the model (computationl graph) from Python runtime;
Use it in production with C++ (no GIL) or other languages;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHY TORCHSCRIPT ?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;
This opens the door to:
Decouple the model (computationl graph) from Python runtime;
Use it in production with C++ (no GIL) or other languages;
Capitalize on optimizations (whole program);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
WHY TORCHSCRIPT ?
The concept of having a well-defined Intermediate
Representation (IR) is very powerful, it’s the main concept
behind LLVM platform as well;
This opens the door to:
Decouple the model (computationl graph) from Python runtime;
Use it in production with C++ (no GIL) or other languages;
Capitalize on optimizations (whole program);
Split the development world of hackable and easy to debug from
the world of putting these models in production and optimize
them.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
BUILDING THE IR
To build the IR, PyTorch takes leverage of the Python Abstract
Syntax Tree (AST) which is a tree representation of the syntactic
structure of the source code.
>>> ast_mod = ast.parse("print(1 + 2)")
>>> astpretty.pprint(ast_mod.body[0], show_offsets=False)
Expr(
value=Call(
func=Name(id='print', ctx=Load()),
args=[
BinOp(
left=Num(n=1),
op=Add(),
right=Num(n=2),
),
],
keywords=[],
),
)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
BUILDING THE IR
print(1 + 2)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
PYTORCH JIT PHASES
Parsing Checking Optimization
Translation Execution○
AST Code
or
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
EXECUTING
Just like Python interpreter executes your code, PyTorch has a
interpreter that executes the IR instructions:
bool runImpl(Stack& stack) {
auto& instructions = function->instructions;
size_t last = instructions.size();
while (pc < last) {
auto& inst = instructions[pc];
try {
loadTensorsFromRegisters(inst.inputs, stack);
size_t new_pc = pc + 1 + inst.callback(stack);
for (int i = inst.outputs.size - 1; i >= 0; --i) {
int reg = get(inst.outputs, i);
registers[reg] = pop(stack);
}
pc = new_pc;
// (...) omitted
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
OPTIMIZATIONS
Many optimizations can be used on the computational graph of the
model, such as Loop Unrolling:
for i.. i+= 1 for i.. i+= 4
for j.. for j..
code(i, j) code(i, j)
code(i+1, j)
code(i+2, j)
code(i+3, j)
remainder loop
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
OPTIMIZATIONS
Also Peephole optimizations such as:
x.t().t() = x
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
OPTIMIZATIONS
Also Peephole optimizations such as:
x.t().t() = x
Example:
def dumb_function(x):
return x.t().t()
>>> traced_fn = torch.jit.trace(dumb_function,
... torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
return (%x);
}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
OPTIMIZATIONS
Also Peephole optimizations such as:
x.t().t() = x
Example:
def dumb_function(x):
return x.t().t()
>>> traced_fn = torch.jit.trace(dumb_function,
... torch.ones(2,2))
>>> traced_fn.graph_for(torch.ones(2,2))
graph(%x : Float(*, *)) {
return (%x);
}
Other optimizations include Constant Propagation, Dead Code
Elimination (DCE), fusion, inlining, etc.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
$ file resnet.pt
resnet.pt: Zip archive data
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
>>> resnet = torch.jit.trace(models.resnet18(),
... torch.rand(1, 3, 224, 224))
>>> resnet.save("resnet.pt")
$ file resnet.pt
resnet.pt: Zip archive data
$ unzip resnet.pt
Archive: resnet.pt
extracting: resnet/version
extracting: resnet/code/resnet.py
extracting: resnet/model.json
extracting: resnet/tensors/0
(...)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
code/resnet.py
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
input_2 = torch._convolution(input_1, self.conv1.weight, ...)
# (...)
input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
self.bn1.running_mean, self.bn1.running_var, ...)
# (...)
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
code/resnet.py
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
input_2 = torch._convolution(input_1, self.conv1.weight, ...)
# (...)
input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
self.bn1.running_mean, self.bn1.running_var, ...)
# (...)
model.json
{"parameters":
[{ "isBuffer": false,
"tensorId": "1",
"name": "weight" }],
"name": "conv1",
"optimize": true}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
SERIALIZATION
code/resnet.py
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
input_2 = torch._convolution(input_1, self.conv1.weight, ...)
# (...)
input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias,
self.bn1.running_mean, self.bn1.running_var, ...)
# (...)
model.json
{"parameters":
[{ "isBuffer": false,
"tensorId": "1",
"name": "weight" }],
"name": "conv1",
"optimize": true}
model.json
[{"isBuffer": true,
"tensorId": "4",
"name": "running_mean"},
{"isBuffer": true,
"tensorId": "5",
"name": "running_var"}],
"name": "bn1",
"optimize": true}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
USING THE MODEL IN C++
PyTorch also has a C++ API that you can use to load/train models in
C++. This is good for production, mobile, embedded devices, etc.
Example of loading a traced model in PyTorch C++ API:
#include <torch/script.h>
int main(int argc, const char* argv[])
{
auto module = torch::jit::load("resnet.pt");
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));
at::Tensor output = module->forward(inputs).toTensor();
}
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
USING THE MODEL IN NODEJS
Complete tutorial at https://goo.gl/7wMJuS.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Section III
PRODUCTION
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
They don’t pay attention to zero-copy practices, so they often
transform, reshape, convert to numpy, convert to PyTorch, etc;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
They don’t pay attention to zero-copy practices, so they often
transform, reshape, convert to numpy, convert to PyTorch, etc;
They often use HTTP/1;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
They don’t pay attention to zero-copy practices, so they often
transform, reshape, convert to numpy, convert to PyTorch, etc;
They often use HTTP/1;
They seldom do batching (important for GPUs);
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
ISSUES WITH TUTORIALS
Be careful with online tutorials using Flask, etc. They are simple,
but they often fail on good practices:
They often use JSON and base64 to serialize images. This adds ∼
33% overhead per call (uncompressed);
They don’t pay attention to zero-copy practices, so they often
transform, reshape, convert to numpy, convert to PyTorch, etc;
They often use HTTP/1;
They seldom do batching (important for GPUs);
They never put that "production" code in production.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
PREFER BINARY SERIALIZATION FORMATS
Prefer using good binary serialization methods such as Protobuf
that offers a schema and a schema evolution mechanism.
Example from EuclidesDB RPC message:
message AddImageRequest {
int32 image_id = 1;
bytes image_data = 2;
// This field can encode JSON data
bytes image_metadata = 3;
repeated string models = 4;
}
* http://euclidesdb.readthedocs.io
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
AVOID EXTRA COPIES
Be careful to avoid extra copies of your tensors, especially during
pre-processing;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
AVOID EXTRA COPIES
Be careful to avoid extra copies of your tensors, especially during
pre-processing;
You can use in-place operations. It is a functional anti-pattern
because it introduces side-effects, but it’s a fair price to pay for
performance;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
AVOID EXTRA COPIES
Be careful to avoid extra copies of your tensors, especially during
pre-processing;
You can use in-place operations. It is a functional anti-pattern
because it introduces side-effects, but it’s a fair price to pay for
performance;
Caveat: in-place operations doesn’t make much sense when you
need gradients. PyTorch uses tensor versioning to catch that:
>>> a = torch.tensor(1.0, requires_grad=True)
>>> y = a.tanh()
>>> y.add_(2.0)
>>> y.backward() # error !
>>> a._version
0
>>> y._version
1
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
Client Server
Time
HTTP 1.1 - Pipelining
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
Client Server
Time
HTTP 1.1 - Pipelining
Client Server
Time
HTTP 1.1 - HoL
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
Client Server
Time
HTTP 1.1 - Pipelining
Client Server
Time
HTTP 1.1 - HoL
Client Server
Time
HTTP 2.0 - Multiplexing
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
Client Server
Time
HTTP 1.1 - Pipelining
Client Server
Time
HTTP 1.1 - HoL
Client Server
Time
HTTP 2.0 - Multiplexing
Use HTTP 2.0 if possible, and avoid the head-of-line blocking;
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
A TALE OF TWO HTTPS
Client Server
Time
HTTP 1.0
Client Server
Time
HTTP 1.1 - Pipelining
Client Server
Time
HTTP 1.1 - HoL
Client Server
Time
HTTP 2.0 - Multiplexing
Use HTTP 2.0 if possible, and avoid the head-of-line blocking;
Even better, you can use frameworks such as gRPC that uses
HTTP/2.0 and Protobuf.
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
BATCHING
Batching data is a way to amortize the performance bottleneck.
GPU
Non-batching
Requests
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
BATCHING
Batching data is a way to amortize the performance bottleneck.
GPU
Non-batching
Requests
GPU
Batch 2 Batch 1
BatchingRequests
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Section IV
Q&A
PyTorch under the hood - Christian S. Perone (2019)
TENSORS JIT PRODUCTION Q&A
Q&A
Thanks !

More Related Content

What's hot

What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...Simplilearn
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentationbhavesh_physics
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMsSylvainGugger
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...Simplilearn
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...Edge AI and Vision Alliance
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...Chris Fregly
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...Edge AI and Vision Alliance
 
Horovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyHorovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyAlexander Sergeev
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkYan Xu
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningRoberto Pereira Silveira
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Databricks
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 

What's hot (20)

What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
 
Bert
BertBert
Bert
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Introduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdfIntroduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdf
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
 
BERT
BERTBERT
BERT
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
 
Horovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made EasyHorovod - Distributed TensorFlow Made Easy
Horovod - Distributed TensorFlow Made Easy
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
Python ppt
Python pptPython ppt
Python ppt
 

Similar to PyTorch under the hood

data-microscopes
data-microscopesdata-microscopes
data-microscopesStephen Tu
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)Hansol Kang
 
a quick Introduction to PyPy
a quick Introduction to PyPya quick Introduction to PyPy
a quick Introduction to PyPyKai Aras
 
Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?takluyver
 
Current State of Python Packaging
Current State of Python PackagingCurrent State of Python Packaging
Current State of Python PackagingClayton Parker
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataGael Varoquaux
 
PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimeNational Cheng Kung University
 
Getting Started with Python
Getting Started with PythonGetting Started with Python
Getting Started with PythonEmily Dong
 
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor DumitrescuThinkOpen
 
PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)HemaArora2
 
Python_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. txPython_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. txvishwanathgoudapatil1
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
AI Deeplearning Programming
AI Deeplearning ProgrammingAI Deeplearning Programming
AI Deeplearning ProgrammingPaulSombat
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of PythonAsia Smith
 
Introduction to python 3
Introduction to python 3Introduction to python 3
Introduction to python 3Youhei Sakurai
 

Similar to PyTorch under the hood (20)

PyTorch 2 Internals
PyTorch 2 InternalsPyTorch 2 Internals
PyTorch 2 Internals
 
data-microscopes
data-microscopesdata-microscopes
data-microscopes
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
 
a quick Introduction to PyPy
a quick Introduction to PyPya quick Introduction to PyPy
a quick Introduction to PyPy
 
chapter1(3).pdf
chapter1(3).pdfchapter1(3).pdf
chapter1(3).pdf
 
Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?Python packaging: how did we get here, and where are we going?
Python packaging: how did we get here, and where are we going?
 
Current State of Python Packaging
Current State of Python PackagingCurrent State of Python Packaging
Current State of Python Packaging
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Analyzing and Sharing HDF5 Data with Python
Analyzing and Sharing HDF5 Data with PythonAnalyzing and Sharing HDF5 Data with Python
Analyzing and Sharing HDF5 Data with Python
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptx
 
PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtime
 
Getting Started with Python
Getting Started with PythonGetting Started with Python
Getting Started with Python
 
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu
 
PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)PYTHON FOR BEGINNERS (BASICS OF PYTHON)
PYTHON FOR BEGINNERS (BASICS OF PYTHON)
 
Python_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. txPython_Haegl.powerpoint presentation. tx
Python_Haegl.powerpoint presentation. tx
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
AI Deeplearning Programming
AI Deeplearning ProgrammingAI Deeplearning Programming
AI Deeplearning Programming
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Introduction to python 3
Introduction to python 3Introduction to python 3
Introduction to python 3
 
Python Tutorial .pdf
Python Tutorial .pdfPython Tutorial .pdf
Python Tutorial .pdf
 

More from Christian Perone

Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionChristian Perone
 
Bayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studiesBayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studiesChristian Perone
 
Uncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningUncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningChristian Perone
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonChristian Perone
 
Deep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural ZooDeep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural ZooChristian Perone
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Machine Learning com Python e Scikit-learn
Machine Learning com Python e Scikit-learnMachine Learning com Python e Scikit-learn
Machine Learning com Python e Scikit-learnChristian Perone
 
Python - Introdução Básica
Python - Introdução BásicaPython - Introdução Básica
Python - Introdução BásicaChristian Perone
 
C++0x :: Introduction to some amazing features
C++0x :: Introduction to some amazing featuresC++0x :: Introduction to some amazing features
C++0x :: Introduction to some amazing featuresChristian Perone
 

More from Christian Perone (10)

Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
 
Bayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studiesBayesian modelling for COVID-19 seroprevalence studies
Bayesian modelling for COVID-19 seroprevalence studies
 
Uncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningUncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep Learning
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
 
Deep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural ZooDeep Learning - Convolutional Neural Networks - Architectural Zoo
Deep Learning - Convolutional Neural Networks - Architectural Zoo
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Machine Learning com Python e Scikit-learn
Machine Learning com Python e Scikit-learnMachine Learning com Python e Scikit-learn
PyTorch under the hood

  • 1. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A PyTorch under the hood A guide to understand PyTorch internals Christian S. Perone (christian.perone@gmail.com) http://blog.christianperone.com PyData Montreal, Feb 2019
  • 2. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Agenda TENSORS Tensors Python objects Zero-copy Tensor storage Memory allocators (CPU/GPU) The big picture JIT Just-in-time compiler Tracing Scripting Why TorchScript ? Building IR and JIT Phases Optimizations Serialization Using models in other languages PRODUCTION Some tips Q&A
  • 3. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHO AM I Christian S. Perone 14 years working with Machine Learning, Data Science and Software Engineering in industry R&D Blog at blog.christianperone.com Open-source projects at https://github.com/perone Twitter @tarantulae
  • 4. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target; the Deep Learning ecosystem moves fast and big changes happen every week;
  • 5. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target; the Deep Learning ecosystem moves fast and big changes happen every week; This is not a talk to teach you the basics of PyTorch or how to train your network, but to teach you how PyTorch components work under the hood in an intuitive way;
  • 6. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DISCLAIMER PyTorch is a moving target; the Deep Learning ecosystem moves fast and big changes happen every week; This is not a talk to teach you the basics of PyTorch or how to train your network, but to teach you how PyTorch components work under the hood in an intuitive way; This talk is updated for PyTorch v1.0.1;
  • 7. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Section I TENSORS
  • 8. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional matrix containing elements of a single data type.
  • 9. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.], [ 1., -1.]])
  • 10. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.], [ 1., -1.]]) >>> t.dtype # They have a type torch.float32
  • 11. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.], [ 1., -1.]]) >>> t.dtype # They have a type torch.float32 >>> t.shape # a shape torch.Size([2, 2])
  • 12. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Simply put, TENSORS are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional matrix containing elements of a single data type. >>> import torch >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> t tensor([[ 1., -1.], [ 1., -1.]]) >>> t.dtype # They have a type torch.float32 >>> t.shape # a shape torch.Size([2, 2]) >>> t.device # and live in some device device(type='cpu')
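As a quick illustration of those three attributes, dtype and device can also be set explicitly at construction time; a minimal sketch (the GPU branch assumes a CUDA-enabled build):

    import torch

    # dtype and device can be chosen explicitly at construction time
    t = torch.zeros((2, 2), dtype=torch.float64)
    print(t.dtype)   # torch.float64
    print(t.device)  # device(type='cpu')

    # move it to the GPU only when one is present (assumes a CUDA build)
    if torch.cuda.is_available():
        print(t.to("cuda").device)  # device(type='cuda', index=0)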
  • 13. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant Python-first design, all of PyTorch's heavy lifting is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension;
  • 14. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant Python-first design, all of PyTorch's heavy lifting is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built;
  • 15. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant Python-first design, all of PyTorch's heavy lifting is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built; To do automatic differentiation, PyTorch uses Autograd, which is an augmentation on top of the ATen framework;
  • 16. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSORS Although PyTorch has an elegant Python-first design, all of PyTorch's heavy lifting is actually implemented in C++. In Python, the integration of C++ code is (usually) done using what is called an extension; PyTorch uses ATen, which is the foundational tensor operation library on which all else is built; To do automatic differentiation, PyTorch uses Autograd, which is an augmentation on top of the ATen framework; In the Python API, PyTorch previously had separate Variable and Tensor types; after v0.4.0 they were merged into Tensor .
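Since the merge, the autograd state lives on the Tensor itself via the requires_grad flag; a minimal sketch:

    import torch

    # since v0.4.0 a plain Tensor carries the autograd state itself;
    # there is no separate Variable wrapper anymore
    x = torch.ones(2, 2, requires_grad=True)
    y = (x * 3).sum()
    y.backward()   # autograd runs directly on the Tensor
    print(x.grad)  # tensor([[3., 3.], [3., 3.]])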
  • 17. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject;
  • 18. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; typedef struct _object { Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject;
  • 19. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; typedef struct _object { Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject; [diagram: PyFloatObject embeds PyObject_HEAD (ob_refcnt, ob_type) followed by its double ob_fval field]
  • 20. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS struct THPVariable { PyObject_HEAD torch::autograd::Variable cdata; PyObject* backward_hooks; }; The TH prefix is from TorcH, and P means Python.
  • 21. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A QUICK RECAP PYTHON OBJECTS struct THPVariable { PyObject_HEAD torch::autograd::Variable cdata; PyObject* backward_hooks; }; [diagram: variable_a and variable_b both point to the same THPVariable, raising its reference count from 1 to 2] The TH prefix is from TorcH, and P means Python.
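The same reference counting is observable from Python; a minimal sketch (note that sys.getrefcount() reports one extra reference for its own argument):

    import sys
    import torch

    variable_a = torch.ones(2, 2)
    print(sys.getrefcount(variable_a))  # typically 2: variable_a + the argument

    variable_b = variable_a             # a second name, no data copy
    print(sys.getrefcount(variable_a))  # one higher than before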
  • 22. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A IN PYTHON, EVERYTHING IS AN OBJECT >>> a = 300 >>> b = 300 >>> a is b False
  • 23. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A IN PYTHON, EVERYTHING IS AN OBJECT >>> a = 300 >>> b = 300 >>> a is b False >>> a = 200 >>> b = 200 >>> a is b True
  • 24. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A IN PYTHON, EVERYTHING IS AN OBJECT >>> a = 300 >>> b = 300 >>> a is b False >>> a = 200 >>> b = 200 >>> a is b True [diagram: with caching, a and b share a single PyIntObject (refcount 2); without it, each name owns its own PyIntObject (refcount 1 each)] A typical Python program spends much of its time allocating/deallocating integers. CPython therefore caches the small integers.
  • 25. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS It is very common to load tensors in numpy and convert them to PyTorch, or vice-versa; >>> np_array = np.ones((2,2)) >>> np_array array([[1., 1.], [1., 1.]]) An underscore suffix on an operation means it is an in-place operation.
  • 26. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS It is very common to load tensors in numpy and convert them to PyTorch, or vice-versa; >>> np_array = np.ones((2,2)) >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.tensor(np_array) >>> torch_array tensor([[1., 1.], [1., 1.]], dtype=torch.float64) An underscore suffix on an operation means it is an in-place operation.
  • 27. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS It is very common to load tensors in numpy and convert them to PyTorch, or vice-versa; >>> np_array = np.ones((2,2)) >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.tensor(np_array) >>> torch_array tensor([[1., 1.], [1., 1.]], dtype=torch.float64) >>> torch_array.add_(1.0) An underscore suffix on an operation means it is an in-place operation.
  • 28. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS It is very common to load tensors in numpy and convert them to PyTorch, or vice-versa; >>> np_array = np.ones((2,2)) >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.tensor(np_array) >>> torch_array tensor([[1., 1.], [1., 1.]], dtype=torch.float64) >>> torch_array.add_(1.0) >>> np_array array([[1., 1.], # array is intact, a copy was made [1., 1.]]) An underscore suffix on an operation means it is an in-place operation.
  • 29. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Now imagine that you have a batch of 128 images, 3 channels each (RGB), each of size 224x224; [diagram: a 3D image tensor indexed by channel, row and column] This will yield a size in memory of ∼ 74MB. We don’t want to duplicate memory (except when copying them to discrete GPUs, of course);
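That ∼ 74MB figure is just the product of the dimensions; a quick back-of-the-envelope check, assuming float32 elements (4 bytes each):

    # 128 images x 3 channels x 224 x 224 pixels x 4 bytes per float32
    batch, channels, height, width = 128, 3, 224, 224
    total_bytes = batch * channels * height * width * 4
    print(total_bytes)                # 77070336
    print(total_bytes / (1024 ** 2))  # ~73.5 MiB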
  • 30. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Let’s see now a slightly different code using the function torch.from_numpy() this time: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array)
  • 31. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Let’s see now a slightly different code using the function torch.from_numpy() this time: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> torch_array.add_(1.0)
  • 32. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Let’s see now a slightly different code using the function torch.from_numpy() this time: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> torch_array.add_(1.0) >>> np_array array([[2., 2.], [2., 2.]])
  • 33. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Let’s see now a slightly different code using the function torch.from_numpy() this time: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> torch_array.add_(1.0) >>> np_array array([[2., 2.], [2., 2.]]) The original numpy array was changed, because it used a zero-copy operation.
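One way to verify the zero-copy behaviour is to compare the underlying data pointers of both objects; a minimal sketch:

    import numpy as np
    import torch

    np_array = np.ones((2, 2))
    torch_array = torch.from_numpy(np_array)

    # both objects point at the very same memory block
    print(torch_array.data_ptr() == np_array.ctypes.data)  # True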
  • 34. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Difference between in-place and standard operations might not be so clear in some cases: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array)
  • 35. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Difference between in-place and standard operations might not be so clear in some cases: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> np_array = np_array + 1.0
  • 36. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Difference between in-place and standard operations might not be so clear in some cases: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> np_array = np_array + 1.0 >>> torch_array tensor([[1., 1.], [1., 1.]], dtype=torch.float64)
  • 37. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS Difference between in-place and standard operations might not be so clear in some cases: >>> np_array array([[1., 1.], [1., 1.]]) >>> torch_array = torch.from_numpy(np_array) >>> np_array = np_array + 1.0 >>> torch_array tensor([[1., 1.], [1., 1.]], dtype=torch.float64) However, if you use np_array += 1.0 , that is an in-place operation that will change torch_array memory.
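A minimal sketch of that caveat — the augmented assignment writes into the shared buffer, while the plain addition above rebinds the name to a new array:

    import numpy as np
    import torch

    np_array = np.ones((2, 2))
    torch_array = torch.from_numpy(np_array)

    np_array += 1.0     # in-place: writes into the shared buffer
    print(torch_array)  # tensor([[2., 2.], [2., 2.]], dtype=torch.float64)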
  • 38. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ZERO-COPYING TENSORS at::Tensor tensor_from_numpy(PyObject* obj) { // (...) - omitted for brevity auto array = (PyArrayObject*)obj; int ndim = PyArray_NDIM(array); auto sizes = to_aten_shape(ndim, PyArray_DIMS(array)); auto strides = to_aten_shape(ndim, PyArray_STRIDES(array)); // (...) - omitted for brevity void* data_ptr = PyArray_DATA(array); auto& type = CPU(dtype_to_aten(PyArray_TYPE(array))); Py_INCREF(obj); return type.tensorFromBlob(data_ptr, sizes, strides, [obj](void* data) { AutoGIL gil; Py_DECREF(obj); }); } Pay attention to the reference counting using Py_INCREF() and the call to tensorFromBlob() function.
  • 39. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A DATA POINTERS [diagram: the PyArrayObject and the FloatTensor both hold the same data_pointer] The FloatTensor copied the numpy array's data pointer, not its contents. The reference is kept safe by the Python reference counting mechanism.
  • 40. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The abstraction responsible for holding the data isn’t actually the Tensor , but the Storage .
  • 41. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The abstraction responsible for holding the data isn’t actually the Tensor , but the Storage . struct C10_API StorageImpl final : (...) { // (...) private: // (...) caffe2::TypeMeta data_type_; DataPtr data_ptr_; int64_t numel_; Allocator* allocator_; }
  • 42. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The abstraction responsible for holding the data isn’t actually the Tensor , but the Storage . struct C10_API StorageImpl final : (...) { // (...) private: // (...) caffe2::TypeMeta data_type_; DataPtr data_ptr_; int64_t numel_; Allocator* allocator_; } Holds a pointer to the raw data and contains information such as the size and allocator; Storage is a dumb abstraction, there is no metadata telling us how to interpret the data it holds;
  • 43. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The Storage abstraction is very powerful because it decouples the raw data and how we can interpret it;
  • 44. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The Storage abstraction is very powerful because it decouples the raw data and how we can interpret it; We can have multiple tensors sharing the same storage, but with different interpretations, also called views, but without duplicating memory:
  • 45. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The Storage abstraction is very powerful because it decouples the raw data and how we can interpret it; We can have multiple tensors sharing the same storage, but with different interpretations, also called views, but without duplicating memory: >>> tensor_a = torch.ones((2, 2)) >>> tensor_b = tensor_a.view(4) >>> tensor_a_data = tensor_a.storage().data_ptr() >>> tensor_b_data = tensor_b.storage().data_ptr() >>> tensor_a_data == tensor_b_data True
  • 46. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TENSOR STORAGE The Storage abstraction is very powerful because it decouples the raw data and how we can interpret it; We can have multiple tensors sharing the same storage, but with different interpretations, also called views, but without duplicating memory: >>> tensor_a = torch.ones((2, 2)) >>> tensor_b = tensor_a.view(4) >>> tensor_a_data = tensor_a.storage().data_ptr() >>> tensor_b_data = tensor_b.storage().data_ptr() >>> tensor_a_data == tensor_b_data True tensor_b is a different view (interpretation) of the same data present in the underlying storage that is shared between both tensors.
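Because the storage is shared, a write through one view is visible through the other; a minimal sketch:

    import torch

    tensor_a = torch.ones((2, 2))
    tensor_b = tensor_a.view(4)  # same storage, different shape

    tensor_b[0] = 42.0           # write through the flat view...
    print(tensor_a[0, 0])        # tensor(42.) ...is visible in the 2x2 view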
  • 47. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A MEMORY ALLOCATORS (CPU/GPU) The tensor storage can be allocated either in the CPU memory or GPU, therefore a mechanism is required to switch between these different allocations:
  • 48. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A MEMORY ALLOCATORS (CPU/GPU) The tensor storage can be allocated either in the CPU memory or GPU, therefore a mechanism is required to switch between these different allocations: struct Allocator { virtual ~Allocator() {} virtual DataPtr allocate(size_t n) const = 0; virtual DeleterFnPtr raw_deleter() const {...} void* raw_allocate(size_t n) {...} void raw_deallocate(void* ptr) {...} }; There are Allocator s that will use the GPU allocators such as cudaMallocHost() when the storage should be used for the GPU or posix_memalign() POSIX functions for data in the CPU memory.
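From Python, the allocator choice shows up indirectly: pinning a CPU tensor requests page-locked memory (the kind cudaMallocHost() provides), which speeds up host-to-GPU copies; a minimal sketch, assuming a CUDA-enabled build for the pinned branch:

    import torch

    t = torch.ones(1024)
    print(t.is_pinned())  # False: regular (pageable) CPU allocation

    if torch.cuda.is_available():
        pinned = t.pin_memory()    # page-locked allocation, as from cudaMallocHost()
        print(pinned.is_pinned())  # True: faster host-to-GPU copies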
  • 49. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A THE BIG PICTURE [diagram: Tensor → Storage (data_ptr) → raw data, with Storage also pointing to an Allocator (raw_allocate/raw_deallocate)] The Tensor has a Storage which in turn has a pointer to the raw data and to the Allocator that allocates memory according to the destination device.
  • 50. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Section II JIT
  • 51. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A JIT - JUST-IN-TIME COMPILER PyTorch is eager by design, which means that it is easily hackable to debug, inspect, etc;
  • 52. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A JIT - JUST-IN-TIME COMPILER PyTorch is eager by design, which means that it is easily hackable to debug, inspect, etc; However, this poses problems for optimization and for decoupling it from Python (the model itself is Python code);
  • 53. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A JIT - JUST-IN-TIME COMPILER PyTorch is eager by design, which means that it is easily hackable to debug, inspect, etc; However, this poses problems for optimization and for decoupling it from Python (the model itself is Python code); PyTorch 1.0 introduced torch.jit , which has two main methods to convert a PyTorch model to a serializable and optimizable format; TorchScript was also introduced as a statically-typed subset of Python;
  • 54. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A JIT - JUST-IN-TIME COMPILER Two very different worlds with their own requirements. [diagram: EAGER MODE (prototype, debug, train, experiment) is converted to SCRIPT MODE (optimization, other languages, deployment) via tracing or scripting]
  • 55. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TRACING def my_function(x): if x.mean() > 1.0: r = torch.tensor(1.0) else: r = torch.tensor(2.0) return r
  • 56. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TRACING def my_function(x): if x.mean() > 1.0: r = torch.tensor(1.0) else: r = torch.tensor(2.0) return r >>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2)))
  • 57. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TRACING def my_function(x): if x.mean() > 1.0: r = torch.tensor(1.0) else: r = torch.tensor(2.0) return r >>> ftrace = torch.jit.trace(my_function, (torch.ones(2, 2))) >>> ftrace.graph graph(%x : Float(2, 2)) { %4 : Float() = prim::Constant[value={2}]() %5 : Device = prim::Constant[value="cpu"]() %6 : int = prim::Constant[value=6]() %7 : bool = prim::Constant[value=0]() %8 : bool = prim::Constant[value=0]() %9 : Float() = aten::to(%4, %5, %6, %7, %8) %10 : Float() = aten::detach(%9) return (%10); }
  • 58. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TRACING To call the JIT’ed function, just call the forward() method: >>> x = torch.ones(2, 2) >>> ftrace.forward(x) tensor(2.)
  • 59. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A TRACING To call the JIT’ed function, just call the forward() method: >>> x = torch.ones(2, 2) >>> ftrace.forward(x) tensor(2.) However, tracing will not record any control flow like if statements or loops; it executes the code with the given context and creates the graph. You can see this limitation below: >>> x = torch.ones(2, 2).add_(1.0) >>> ftrace.forward(x) tensor(2.) According to my_function() , the result should have been 1.0. Tracing also checks for differences between the traced and the Python function, but what about Dropout?
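The tracer can surface this divergence if you hand it extra inputs to re-check against; a minimal sketch using the check_inputs argument of torch.jit.trace() (the check fails with a tracing check error for the input that takes the other branch):

    import torch

    def my_function(x):
        if x.mean() > 1.0:
            return torch.tensor(1.0)
        return torch.tensor(2.0)

    # this call is expected to fail: for the second input the traced graph
    # returns 2.0 while the Python function returns 1.0
    ftrace = torch.jit.trace(
        my_function,
        (torch.ones(2, 2),),
        check_inputs=[(torch.ones(2, 2).add_(1.0),)],
    )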
  • 60. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SCRIPTING Another alternative is to use scripting, where you can use decorators such as @torch.jit.script : @torch.jit.script def my_function(x): if bool(x.mean() > 1.0): r = 1 else: r = 2 return r
  • 61. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SCRIPTING >>> my_function.graph graph(%x : Tensor) { %2 : float = prim::Constant[value=1]() %5 : int = prim::Constant[value=1]() %6 : int = prim::Constant[value=2]() %1 : Tensor = aten::mean(%x) %3 : Tensor = aten::gt(%1, %2) %4 : bool = prim::Bool(%3) %r : int = prim::If(%4) block0() { -> (%5) } block1() { -> (%6) } return (%r); }
  • 62. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SCRIPTING The my_function() is now a ScriptModule : >>> type(my_function) torch.jit.ScriptModule
  • 63. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SCRIPTING The my_function() is now a ScriptModule : >>> type(my_function) torch.jit.ScriptModule When we check the results again: >>> x = torch.ones(2, 2) >>> my_function(x) 2
  • 64. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SCRIPTING The my_function() is now a ScriptModule : >>> type(my_function) torch.jit.ScriptModule When we check the results again: >>> x = torch.ones(2, 2) >>> my_function(x) 2 >>> x = torch.ones(2, 2).add_(1.0) >>> my_function(x) 1 Control-flow logic was preserved !
  • 65. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHY TORCHSCRIPT ? The concept of having a well-defined Intermediate Representation (IR) is very powerful; it’s the main concept behind the LLVM platform as well;
  • 66. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHY TORCHSCRIPT ? The concept of having a well-defined Intermediate Representation (IR) is very powerful; it’s the main concept behind the LLVM platform as well; This opens the door to: Decouple the model (computational graph) from the Python runtime;
  • 67. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHY TORCHSCRIPT ? The concept of having a well-defined Intermediate Representation (IR) is very powerful; it’s the main concept behind the LLVM platform as well; This opens the door to: Decouple the model (computational graph) from the Python runtime; Use it in production with C++ (no GIL) or other languages;
  • 68. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHY TORCHSCRIPT ? The concept of having a well-defined Intermediate Representation (IR) is very powerful; it’s the main concept behind the LLVM platform as well; This opens the door to: Decouple the model (computational graph) from the Python runtime; Use it in production with C++ (no GIL) or other languages; Capitalize on optimizations (whole program);
  • 69. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A WHY TORCHSCRIPT ? The concept of having a well-defined Intermediate Representation (IR) is very powerful; it’s the main concept behind the LLVM platform as well; This opens the door to: Decouple the model (computational graph) from the Python runtime; Use it in production with C++ (no GIL) or other languages; Capitalize on optimizations (whole program); Split the hackable, easy-to-debug development world from the world of putting these models in production and optimizing them.
  • 70. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A BUILDING THE IR To build the IR, PyTorch leverages the Python Abstract Syntax Tree (AST), which is a tree representation of the syntactic structure of the source code. >>> ast_mod = ast.parse("print(1 + 2)") >>> astpretty.pprint(ast_mod.body[0], show_offsets=False) Expr( value=Call( func=Name(id='print', ctx=Load()), args=[ BinOp( left=Num(n=1), op=Add(), right=Num(n=2), ), ], keywords=[], ), )
  • 71. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A BUILDING THE IR print(1 + 2)
  • 72. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A PYTORCH JIT PHASES [diagram: code or AST → Parsing → Checking → Optimization → Translation → Execution]
  • 73. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A EXECUTING Just like the Python interpreter executes your code, PyTorch has an interpreter that executes the IR instructions: bool runImpl(Stack& stack) { auto& instructions = function->instructions; size_t last = instructions.size(); while (pc < last) { auto& inst = instructions[pc]; try { loadTensorsFromRegisters(inst.inputs, stack); size_t new_pc = pc + 1 + inst.callback(stack); for (int i = inst.outputs.size - 1; i >= 0; --i) { int reg = get(inst.outputs, i); registers[reg] = pop(stack); } pc = new_pc; // (...) omitted
  • 74. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A OPTIMIZATIONS Many optimizations can be applied to the computational graph of the model, such as Loop Unrolling: [diagram: a loop stepping i by 1 is rewritten to step i by 4, repeating the body for i, i+1, i+2 and i+3, with a remainder loop handling the leftover iterations]
  • 75. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A OPTIMIZATIONS Also Peephole optimizations such as: x.t().t() = x
  • 76. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A OPTIMIZATIONS Also Peephole optimizations such as: x.t().t() = x Example: def dumb_function(x): return x.t().t() >>> traced_fn = torch.jit.trace(dumb_function, ... torch.ones(2,2)) >>> traced_fn.graph_for(torch.ones(2,2)) graph(%x : Float(*, *)) { return (%x); }
  • 77. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A OPTIMIZATIONS Also Peephole optimizations such as: x.t().t() = x Example: def dumb_function(x): return x.t().t() >>> traced_fn = torch.jit.trace(dumb_function, ... torch.ones(2,2)) >>> traced_fn.graph_for(torch.ones(2,2)) graph(%x : Float(*, *)) { return (%x); } Other optimizations include Constant Propagation, Dead Code Elimination (DCE), fusion, inlining, etc.
  • 78. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION >>> resnet = torch.jit.trace(models.resnet18(), ... torch.rand(1, 3, 224, 224)) >>> resnet.save("resnet.pt")
  • 79. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION >>> resnet = torch.jit.trace(models.resnet18(), ... torch.rand(1, 3, 224, 224)) >>> resnet.save("resnet.pt") $ file resnet.pt resnet.pt: Zip archive data
  • 80. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION >>> resnet = torch.jit.trace(models.resnet18(), ... torch.rand(1, 3, 224, 224)) >>> resnet.save("resnet.pt") $ file resnet.pt resnet.pt: Zip archive data $ unzip resnet.pt Archive: resnet.pt extracting: resnet/version extracting: resnet/code/resnet.py extracting: resnet/model.json extracting: resnet/tensors/0 (...)
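The saved archive can also be loaded back in Python and called like a regular module; a minimal sketch:

    import torch

    # load the serialized ScriptModule and run it like an ordinary module
    loaded = torch.jit.load("resnet.pt")
    output = loaded(torch.rand(1, 3, 224, 224))
    print(output.shape)  # torch.Size([1, 1000])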
  • 81. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION code/resnet.py op_version_set = 0 def forward(self, input_1: Tensor) -> Tensor: input_2 = torch._convolution(input_1, self.conv1.weight, ...) # (...) input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias, self.bn1.running_mean, self.bn1.running_var, ...) # (...)
  • 82. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION code/resnet.py op_version_set = 0 def forward(self, input_1: Tensor) -> Tensor: input_2 = torch._convolution(input_1, self.conv1.weight, ...) # (...) input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias, self.bn1.running_mean, self.bn1.running_var, ...) # (...) model.json {"parameters": [{ "isBuffer": false, "tensorId": "1", "name": "weight" }], "name": "conv1", "optimize": true}
  • 83. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A SERIALIZATION code/resnet.py op_version_set = 0 def forward(self, input_1: Tensor) -> Tensor: input_2 = torch._convolution(input_1, self.conv1.weight, ...) # (...) input_3 = torch.batch_norm(input_2, self.bn1.weight, self.bn1.bias, self.bn1.running_mean, self.bn1.running_var, ...) # (...) model.json {"parameters": [{ "isBuffer": false, "tensorId": "1", "name": "weight" }], "name": "conv1", "optimize": true} model.json [{"isBuffer": true, "tensorId": "4", "name": "running_mean"}, {"isBuffer": true, "tensorId": "5", "name": "running_var"}], "name": "bn1", "optimize": true}
  • 84. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A USING THE MODEL IN C++ PyTorch also has a C++ API that you can use to load/train models in C++. This is good for production, mobile, embedded devices, etc. Example of loading a traced model in PyTorch C++ API: #include <torch/script.h> int main(int argc, const char* argv[]) { auto module = torch::jit::load("resnet.pt"); std::vector<torch::jit::IValue> inputs; inputs.push_back(torch::ones({1, 3, 224, 224})); at::Tensor output = module->forward(inputs).toTensor(); }
  • 85. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A USING THE MODEL IN NODEJS Complete tutorial at https://goo.gl/7wMJuS.
  • 86. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Section III PRODUCTION
  • 87. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ISSUES WITH TUTORIALS Be careful with online tutorials using Flask, etc. They are simple, but they often fail to follow good practices: They often use JSON and base64 to serialize images. This adds ∼ 33% overhead per call (uncompressed);
  • 88. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ISSUES WITH TUTORIALS Be careful with online tutorials using Flask, etc. They are simple, but they often fail to follow good practices: They often use JSON and base64 to serialize images. This adds ∼ 33% overhead per call (uncompressed); They don’t pay attention to zero-copy practices, so they often transform, reshape, convert to numpy, convert to PyTorch, etc;
  • 89. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ISSUES WITH TUTORIALS Be careful with online tutorials using Flask, etc. They are simple, but they often fail to follow good practices: They often use JSON and base64 to serialize images. This adds ∼ 33% overhead per call (uncompressed); They don’t pay attention to zero-copy practices, so they often transform, reshape, convert to numpy, convert to PyTorch, etc; They often use HTTP/1;
  • 90. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ISSUES WITH TUTORIALS Be careful with online tutorials using Flask, etc. They are simple, but they often fail to follow good practices: They often use JSON and base64 to serialize images. This adds ∼ 33% overhead per call (uncompressed); They don’t pay attention to zero-copy practices, so they often transform, reshape, convert to numpy, convert to PyTorch, etc; They often use HTTP/1; They seldom do batching (important for GPUs);
  • 91. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A ISSUES WITH TUTORIALS Be careful with online tutorials using Flask, etc. They are simple, but they often fail to follow good practices: They often use JSON and base64 to serialize images. This adds ∼ 33% overhead per call (uncompressed); They don’t pay attention to zero-copy practices, so they often transform, reshape, convert to numpy, convert to PyTorch, etc; They often use HTTP/1; They seldom do batching (important for GPUs); They never put that "production" code in production.
  • 92. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A PREFER BINARY SERIALIZATION FORMATS Prefer using good binary serialization methods such as Protobuf that offers a schema and a schema evolution mechanism. Example from EuclidesDB RPC message: message AddImageRequest { int32 image_id = 1; bytes image_data = 2; // This field can encode JSON data bytes image_metadata = 3; repeated string models = 4; } * http://euclidesdb.readthedocs.io
  • 93. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A AVOID EXTRA COPIES Be careful to avoid extra copies of your tensors, especially during pre-processing;
  • 94. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A AVOID EXTRA COPIES Be careful to avoid extra copies of your tensors, especially during pre-processing; You can use in-place operations. It is a functional anti-pattern because it introduces side-effects, but it’s a fair price to pay for performance;
  • 95. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A AVOID EXTRA COPIES Be careful to avoid extra copies of your tensors, especially during pre-processing; You can use in-place operations. It is a functional anti-pattern because it introduces side-effects, but it’s a fair price to pay for performance; Caveat: in-place operations don’t make much sense when you need gradients. PyTorch uses tensor versioning to catch that: >>> a = torch.tensor(1.0, requires_grad=True) >>> y = a.tanh() >>> y.add_(2.0) >>> y.backward() # error ! >>> a._version 0 >>> y._version 1
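When gradients are needed, the usual way out is to keep the autograd path out-of-place; a minimal sketch:

    import torch

    a = torch.tensor(1.0, requires_grad=True)
    y = a.tanh()
    z = y + 2.0    # out-of-place: y's version counter is untouched
    z.backward()   # works: autograd can still use y in tanh's backward
    print(a.grad)  # tensor(0.4200), i.e. 1 - tanh(1)^2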
  • 96. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: client/server request timeline for HTTP 1.0]
  • 97. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: timelines for HTTP 1.0 and HTTP 1.1 pipelining]
  • 98. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: timelines for HTTP 1.0, HTTP 1.1 pipelining and HTTP 1.1 head-of-line (HoL) blocking]
  • 99. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: timelines for HTTP 1.0, HTTP 1.1 pipelining, HTTP 1.1 HoL blocking and HTTP 2.0 multiplexing]
  • 100. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: the four timelines] Use HTTP 2.0 if possible, and avoid the head-of-line blocking;
  • 101. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A A TALE OF TWO HTTPS [diagram: the four timelines] Use HTTP 2.0 if possible, and avoid the head-of-line blocking; Even better, you can use frameworks such as gRPC that use HTTP/2.0 and Protobuf.
  • 102. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A BATCHING Batching data is a way to amortize the performance bottleneck. [diagram: individual requests hitting the GPU one at a time (non-batching)]
  • 103. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A BATCHING Batching data is a way to amortize the performance bottleneck. [diagram: non-batching vs. requests grouped into Batch 1 and Batch 2 before hitting the GPU]
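A minimal sketch of the batching idea — the queue, timeout and model below are illustrative assumptions, not code from the talk:

    import queue
    import torch

    def serve_batches(model, requests, max_batch=32, timeout=0.01):
        """Group single-request tensors into one batched forward pass."""
        while True:
            batch = [requests.get()]           # block until the first request
            try:
                while len(batch) < max_batch:  # gather more until the timeout
                    batch.append(requests.get(timeout=timeout))
            except queue.Empty:
                pass                           # timeout: run with what we have
            inputs = torch.stack(batch)        # one (N, ...) input tensor
            with torch.no_grad():
                outputs = model(inputs)        # a single call amortizes overhead
            # ... route each outputs[i] back to its original caller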
  • 104. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Section IV Q&A
  • 105. PyTorch under the hood - Christian S. Perone (2019) TENSORS JIT PRODUCTION Q&A Q&A Thanks !