From RenderMan 22.0® to Next-Gen RenderMan XPU and Beyond: Role of Open Shading Language (OSL) with Intel® Advanced Vector Extensions (Intel® AVX-512)
Presenters: Steena Monteiro (Intel) and Max Liani (Pixar Animation Studios)
Contributors: Alex M. Wells (Intel), Steena Monteiro (Intel), Louis Feng (Intel), Max Liani (Pixar Animation Studios), Stephen Friedman (Pixar Animation Studios), Larry Gritz (Sony Pictures Imageworks)
• This document contains information on products, services and/or processes in development. All information provided here is subject to change without
notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
• Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance
varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at
intel.com.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those
factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks
• Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes.
Any differences in your system hardware, software or configuration may affect your actual performance.
• Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm
whether referenced data are accurate.
• Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are
intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer
to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
• Intel, Xeon and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
• *Other names and brands may be claimed as the property of others
• © Intel Corporation.
Legal Disclaimers and Optimization Notices
Shading in Physically Based Rendering
Image credit Sony Pictures Imageworks
Shading Network
• Multiple reusable shading nodes
• Connect nodes to define complex materials
• Production shading networks can grow very large, to 100s or 1000s of nodes
C++ Shader Limitations
• Lack of context at compile time
• Input parameters unknown
• Geometry being shaded unknown
• Mode of shading unknown
• Surrounding shading network unknown
• Branchy testing required
• Lack of portability
• Requires “Performance Ninjas”
Image Credit: Ninja Working AT Desk from Vector.me (by Hector Gomez)
Open Shading Language
• Developed by Sony Pictures Imageworks*
• C-like DSL for programmable shading
• API to connect shaders into networks
• Open source
• http://github.com/imageworks/OpenShadingLanguage
• Sci-Tech Award* in 2017
Logo owned by Academy of Motion Picture Arts and Sciences for Infobox
*Other names and brands may be claimed as the property of others.
Poster images (c) Sony Pictures*, Paramount*, Warner Brothers*, Disney*, Fox*, Universal*
Example OSL Shader
shader marble (color Cin = .5,
               float freq = 1.0,
               output color Cout = 0)
{
    float sum = 0;
    float freqVal = freq;
    point Pshad = transform ("object", P);
    for (int i = 0; i < 6; i++)
    {
        sum = sum + 1/freqVal * abs(.5 - noise( 4 * freqVal * Pshad));
        freqVal = 2 * freqVal;
    }
    Cout = Cin * sum;
}
Callouts: Shader Globals (input set by renderer); Library Calls
Motivation for SIMD Open Shading Language
• In its native form, OSL is unable to leverage Intel® Advanced Vector Extensions (Intel® AVX-512) on Intel® Xeon® processors
• Intel has been leading the re-architecture of OSL since 2016
Image © Disney/Pixar
OSL Framework
• Offline: a shader written in OSL is compiled by oslc (the offline compiler) into intermediate OSO (instructions + operands)
• Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): scene management, ray tracing/path tracing, light integration
• OSL Runtime: build shading network; render-time optimization with LLVM* JIT (just-in-time compilation) plus pre-compiled library functions; execute shading network (per point) as optimized x86-64
• Renderer and runtime interact through callbacks and QueryOutputs
SIMD OSL’s Batched Interface
Renderer and Shading System exchange data in two ways:
• Single point: submit a single point with execute(ShaderGlobals, …); query results with symbol_address(…)
• New “batched” interface: submit a batch of points with execute_batch(ShaderGlobalsBatch, …); query a batch of results with Wide<T>(symbol_address)
• ShaderGlobalsBatch holds uniform data (context pointers, raytype, …) and a queue of varying data (surface position, incident ray, surface normal, …)
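The uniform/varying split in the batched interface can be pictured with a small structure-of-arrays container. This is a minimal sketch, assuming a batch width of 16 (one 512-bit register of floats); `Vec3Block`, `UniformGlobals`, `VaryingGlobals`, `BatchSketch`, and `add_point` are hypothetical names for illustration and are not the actual OSL `ShaderGlobalsBatch` definition.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 16;  // assumed batch width: 16 floats per 512-bit register

struct Vec3 { float x, y, z; };

// Structure-of-arrays block: each component of all lanes is stored contiguously.
struct Vec3Block {
    float x[kWidth], y[kWidth], z[kWidth];
    void set(int lane, const Vec3& v) { x[lane] = v.x; y[lane] = v.y; z[lane] = v.z; }
    Vec3 get(int lane) const { return Vec3{x[lane], y[lane], z[lane]}; }
};

// Values shared by every point in the batch (hypothetical fields).
struct UniformGlobals {
    void* renderer_context = nullptr;
    int raytype = 0;
};

// Values that differ per point, stored lane by lane (hypothetical fields).
struct VaryingGlobals {
    Vec3Block P;  // surface position
    Vec3Block I;  // incident ray
    Vec3Block N;  // surface normal
};

struct BatchSketch {
    UniformGlobals uniform;
    VaryingGlobals varying;
    int count = 0;  // number of points queued so far

    bool full() const { return count == kWidth; }

    // Queue one shading point and return the lane it was assigned to.
    int add_point(const Vec3& P, const Vec3& I, const Vec3& N) {
        assert(!full());
        varying.P.set(count, P);
        varying.I.set(count, I);
        varying.N.set(count, N);
        return count++;
    }

    // Bit i is set when lane i holds a queued point.
    std::uint32_t active_mask() const {
        return count == 0 ? 0u : (0xFFFFFFFFu >> (32 - count));
    }
};
```

A renderer following this pattern would fill the batch point by point and pass the active mask along with it, so that partially filled batches can still be executed.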
SIMD OSL Framework
• Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): scene management, ray tracing/path tracing, light integration
• SIMD OSL Runtime: render-time optimization with LLVM* Wide JIT (just-in-time compilation); execute shading network (per point) as optimized Intel® AVX-512, AVX2, or AVX code
• Separate pre-compiled library functions for Intel® AVX-512, Intel® AVX2, and Intel® AVX
• Renderer and runtime interact through callbacks and QueryOutputs
Components in SIMD OSL Render-time
Render-time optimization with LLVM* JIT (just-in-time compilation), producing optimized x86-64
• Wide Library
Wizard Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html
void my_callback(void *wS, void *wM, void *wV, void *wVS, void *wVT,
                 unsigned int mask_value)
{
    Mask mask (mask_value);
    ASSERT(mask.any_on());
    // Wrap the raw pointers in typed accessors over the SOA data
    Wide<const float> wScale (wS);
    Wide<const Vec3> wVec (wV);
    Wide<const Matrix44> wMat (wM);
    Masked<Vec3> wVT_result (wVT, mask);
    Masked<Vec3> wVS_result (wVS, mask);
    for(int lane = 0; lane < __OSL_WIDTH; ++lane) {
        Vec3 V = wVec[lane];
        float F = wScale[lane];
        Matrix44 M = wMat[lane];
        wVS_result[lane] = V*F;
        wVT_result[lane] = transform(M,V);
    }
}
Callouts on the callback code:
• Accessors give a transparent AOS view of SOA data
• Reads extract data from a lane of the SOA
• Array subscript returns a proxy object to that lane
• Assignment through the proxy is skipped if the lane is masked off
SIMD OSL’s Wide Library
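The accessor and proxy behavior listed in the callouts can be sketched in a few lines. This is a simplified illustration, assuming a 16-wide batch and a `Vec3`-only accessor; `WideVec3`, `MaskedVec3`, `LaneProxy`, and `scale_lanes` are hypothetical stand-ins and not the real `Wide<T>` / `Masked<T>` templates from SIMD OSL.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 16;  // assumed SIMD width

struct Vec3 { float x, y, z; };

// SOA storage for kWidth Vec3 values.
struct Vec3Block { float x[kWidth], y[kWidth], z[kWidth]; };

// Read-only accessor: operator[] gathers one lane into an ordinary Vec3.
class WideVec3 {
public:
    explicit WideVec3(const Vec3Block& b) : block_(b) {}
    Vec3 operator[](int lane) const {
        return Vec3{block_.x[lane], block_.y[lane], block_.z[lane]};
    }
private:
    const Vec3Block& block_;
};

// Write accessor: operator[] returns a proxy; assigning to the proxy scatters
// the value into the SOA block only when that lane's mask bit is set.
class MaskedVec3 {
public:
    class LaneProxy {
    public:
        LaneProxy(Vec3Block& b, int lane, bool on) : block_(b), lane_(lane), on_(on) {}
        const LaneProxy& operator=(const Vec3& v) const {
            if (on_) {
                block_.x[lane_] = v.x;
                block_.y[lane_] = v.y;
                block_.z[lane_] = v.z;
            }
            return *this;
        }
    private:
        Vec3Block& block_;
        int lane_;
        bool on_;
    };

    MaskedVec3(Vec3Block& b, std::uint32_t mask) : block_(b), mask_(mask) {}
    LaneProxy operator[](int lane) {
        return LaneProxy(block_, lane, ((mask_ >> lane) & 1u) != 0);
    }
private:
    Vec3Block& block_;
    std::uint32_t mask_;
};

// Kernel in the style of the callback: scale every active lane.
inline void scale_lanes(const Vec3Block& in, float s, Vec3Block& out, std::uint32_t mask) {
    WideVec3 wIn(in);
    MaskedVec3 wOut(out, mask);
    for (int lane = 0; lane < kWidth; ++lane) {
        Vec3 v = wIn[lane];                             // AOS view of one lane
        wOut[lane] = Vec3{v.x * s, v.y * s, v.z * s};   // skipped if masked off
    }
}
```

The loop body reads like scalar code, which is the point of the accessor design: the per-lane loop stays simple enough for a compiler to vectorize.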
Components in SIMD OSL Render-time
Render-time optimization with LLVM* JIT (just-in-time compilation), producing optimized x86-64
• Wide Library
• Divergent Control Flows
Divergent Control Flows
if (x > 0.5)
{
    ...
    if (y > 0.5)
    {
        …
        if (powB > 0.23)
        {
            …
        }
        else
        {
            …
        }
    } //y
} //x
Stack of masks
Effective mask (result of combining stack)
Mask stack operations as execution walks through the nested conditionals (one slide per step):
• PUSH on entering if (x > 0.5)
• PUSH on entering if (y > 0.5)
• PUSH on entering if (powB > 0.23)
• POP on leaving the powB branch
• NEGATE and PUSH on entering the else branch
• POP on leaving the else branch
• POP on leaving the y block
• POP on leaving the x block
At every step the effective mask is the result of combining the stack.
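The PUSH / POP / NEGATE sequence can be sketched as a small class. This is an illustrative sketch, assuming a 16-lane batch with one mask bit per lane; `MaskStack`, `negate`, `Trace`, and `walk` are hypothetical names. As a simplification, the stack here stores the already-combined (effective) mask at each depth rather than the raw per-condition masks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Mask = std::uint16_t;  // one bit per lane, assuming a 16-wide batch

inline Mask negate(Mask m) { return static_cast<Mask>(~m); }

class MaskStack {
public:
    explicit MaskStack(Mask initial) : effective_{initial} {}

    // Entering a conditional block: a lane stays active only if it was active
    // outside the block and its condition evaluated to true.
    void push(Mask condition) {
        effective_.push_back(static_cast<Mask>(effective_.back() & condition));
    }

    // Leaving a block restores the enclosing effective mask.
    void pop() {
        assert(effective_.size() > 1);
        effective_.pop_back();
    }

    Mask effective() const { return effective_.back(); }

private:
    std::vector<Mask> effective_;  // entry i is the AND of all conditions down to depth i
};

// Effective masks observed while walking the nested if / if / if-else example.
struct Trace { Mask in_x, in_y, in_then, in_else, after; };

inline Trace walk(Mask batch, Mask x_cond, Mask y_cond, Mask pow_cond) {
    MaskStack s(batch);
    Trace t{};
    s.push(x_cond);            t.in_x = s.effective();     // if (x > 0.5)
    s.push(y_cond);            t.in_y = s.effective();     // if (y > 0.5)
    s.push(pow_cond);          t.in_then = s.effective();  // if (powB > 0.23)
    s.pop();                                               // leave then-branch
    s.push(negate(pow_cond));  t.in_else = s.effective();  // else
    s.pop();                                               // leave else-branch
    s.pop();                                               // } //y
    s.pop();                                               // } //x
    t.after = s.effective();
    return t;
}
```

Each lane executes both sides of a divergent branch in lockstep; the effective mask decides which lanes are allowed to commit results.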
Components in SIMD OSL Render-time
Render-time optimization with LLVM* JIT (just-in-time compilation), producing optimized x86-64
• Wide Library
• Divergent Control Flow
• Vectorized IR Generation
General LLVM Code Flow for OSL Operations
OSL:
• Retrieve symbols for operands
• Emit LLVM-defined operations OR call appropriate functions
• Store result
What changes in SIMD OSL
OSL:
• Retrieve symbols for operands
• Load values
• Initialize values
• Emit LLVM-defined operations OR call appropriate functions
• Store result
What changes in SIMD OSL: four operand/result combinations
• Operands → Uniform, Results → Uniform
• Operands → Uniform, Results → Varying
• Operands → Varying, Results → Uniform
• Operands → Varying, Results → Varying

Operands → Uniform, Results → Uniform (SIMD OSL):
• Retrieve symbols for operands
• Call uniform function
• Store result

Operands → Uniform, Results → Varying (SIMD OSL):
• Retrieve symbols for operands
• Call uniform function
• Widen result
• Store result

Operands → Varying, Results → Varying (SIMD OSL):
• Retrieve symbols for operands
• Add effective mask to arguments
• Add address for results to arguments
• Call varying function

Operands → Uniform and Varying, Results → Varying (SIMD OSL):
• Retrieve symbols for operands
• Allocate a varying temp
• Widen uniform operands and store to varying temp
• Add effective mask to all arguments
• Add address for results to arguments
• Call varying function

Operands → Varying, Results → Uniform: unreachable
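The widening steps in these cases can be illustrated outside the JIT with plain C++. This is a sketch under stated assumptions: a 16-wide batch, `std::array` standing in for a vector register, and hypothetical helper names (`widen`, `wide_pow`, `pow_mixed`). The real implementation emits LLVM IR rather than calling functions like these.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kWidth = 16;  // assumed batch width
using WideFloat = std::array<float, kWidth>;

// Uniform operands, varying result: compute once, then broadcast ("widen").
inline WideFloat widen(float uniform_value) {
    WideFloat w;
    w.fill(uniform_value);
    return w;
}

// Varying function: takes the operand blocks, the address of the result block,
// and the effective mask. Lanes whose mask bit is clear are left untouched.
inline void wide_pow(const WideFloat& base, const WideFloat& exponent,
                     WideFloat* result, std::uint32_t mask) {
    for (int lane = 0; lane < kWidth; ++lane) {
        if ((mask >> lane) & 1u) {
            (*result)[lane] = std::pow(base[lane], exponent[lane]);
        }
    }
}

// Mixed uniform and varying operands: the uniform operand is first widened
// into a varying temporary so the varying function can be called.
inline void pow_mixed(const WideFloat& base, float uniform_exponent,
                      WideFloat* result, std::uint32_t mask) {
    WideFloat temp = widen(uniform_exponent);
    wide_pow(base, temp, result, mask);
}
```

Keeping uniform values scalar until they actually meet a varying operand avoids doing sixteen copies of work whose answer is the same in every lane.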
Components in SIMD OSL Render-time
Render-time optimization with LLVM* JIT (just-in-time compilation), producing optimized x86-64
• Wide Library
• Divergent Control Flow
• Vectorized IR Generation
• “For-each-unique” algorithm
For-Each-Unique Algorithm
if (layer == 1) file = "r.tex";
if (layer == 2) file = "g.tex";
if (layer == 3) file = "r.tex";
if (layer == 4) file = "g.tex";
wrap_mode = (layer%2==0) ? "clamp" : "mirror";
texture(file, u, v, "wrap", wrap_mode);

[Figure: per-lane rows for layer, file, mask, and wrap; layer values as extracted: 3 3 1 2 1 1 2 1 2 2 2 2 3 3 3 1 4]
• JIT’d binning
• Full flexibility
• BatchedRendererServices
• 1st pass: texture("r.tex", "mirror", …);
• 2nd pass: texture("g.tex", "clamp", …);
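The binning idea behind for-each-unique can be sketched as a loop that issues one call per distinct value among the active lanes. This is a minimal runtime sketch, assuming a 16-wide batch; `for_each_unique` is a hypothetical helper name, and the deck describes the real binning as JIT-generated rather than a library template like this one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr int kWidth = 16;  // assumed batch width

// Calls fn(value, lanes) once per distinct value found among the active lanes.
// `lanes` is the mask of active lanes that share that value.
template <typename T, typename Fn>
void for_each_unique(const T (&values)[kWidth], std::uint32_t active, Fn&& fn) {
    std::uint32_t remaining = active & ((1u << kWidth) - 1u);
    while (remaining != 0) {
        // Find the first lane that has not been handled yet.
        int lead = 0;
        while (((remaining >> lead) & 1u) == 0) ++lead;

        // Collect every remaining lane that holds the same value.
        std::uint32_t same = 0;
        for (int lane = lead; lane < kWidth; ++lane) {
            if (((remaining >> lane) & 1u) && values[lane] == values[lead]) {
                same |= (1u << lane);
            }
        }

        fn(values[lead], same);  // one pass for this unique value
        remaining &= ~same;      // those lanes are done
    }
}
```

With the texture example, the callback would receive one filename and wrap mode per pass plus the mask of lanes to sample, so a batch that references two textures costs two texture calls instead of sixteen.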
Components in SIMD OSL Render-time
Render-time optimization with LLVM* JIT (just-in-time compilation), producing optimized x86
• Wide Library
• Divergent Control Flows
• Vectorized IR Generation
• “For-each-unique” algorithm
• SIMD OSL built-ins
SIMD OSL’s Perlin Noise
Batched version, using explicit outer loop vectorization (Intel® C++ Compiler) (Clang 5+):
template<int WidthT> void operator() (MaskedAccessor<float, WidthT> wresult,
                                      ConstWideAccessor<Vec3, WidthT> wp) const {
    #pragma forceinline recursive
    {
        #pragma omp simd simdlen(WidthT)
        for(int l=0; l< WidthT; ++l) {
            Vec3 p = wp[l];
            float perlinResult;
            HashScalar h;
            perlin_scalar(perlinResult, h, p.x, p.y, p.z);
            float scaledResult = 0.5f * (perlinResult + 1.0f);
            wresult[l] = scaledResult;
        }
    }
}
Scalar version:
inline void operator() (float &result, const Vec3 &p) const
{
    HashScalar h;
    perlin(result, h, p.x, p.y, p.z);
    result = 0.5f * (result + 1.0f);
}
Callouts: scalar computation with scalar data types; block vectorization with intrinsics
OSL Microbenchmarks: Speedup of SIMD AVX-512 OSL over Scalar OSL
[Chart: speedup per built-in function, axis ticks from 0.125 to 16 in powers of two. Functions measured: null, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, atan2, sincos, log, log2, log10, logb, exp, exp2, expm1, pow, erf, erfc, radians, degrees, sqrt, inversesqrt, hypot, abs, fabs, sign, floor, ceil, round, trunc, mod, min, max, clamp, mix, isnan, isfinite, select, dot, cross, length, distance, normalize, reflect, fresnel, rotate, transform, transform_matrix, matrix_object_camera, determinant, transpose, linearstep, smooth_linearstep, noise_perlin, noise_cell, noise_simplex, noise_gabor, pnoise_perlin, pnoise_cell, pnoise_gabor, spline_bezier, spline_bspline, spline_catmull-rom, spline_hermite, spline_linear, spline_constant]
48 threads on Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2)
Average: 6.9x
Geomean: 6.14x
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
OSL SIMD Performance at Maximum Batch Utilization
OSL’s testshade running Intel® AVX-512 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (config 1)
[Chart: speedup at max batch size] leopard 5.2x, concrete 6x, diamond 10x, oak 12x, marble 15x
SIMD OSL Intel® AVX-512 vs AVX2
OSL’s testshade running Intel® AVX-512 and AVX2 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (config 1)
[Chart: speedup per shader for leopard, concrete, diamond plate, oak, marble, thread, and donut; bar labels range from 1.1x to 1.9x (1.1x, 1.3x, 1.3x, 1.4x, 1.6x, 1.8x, 1.9x)]
Evolution of SIMD OSL: Proof of Concept to Production, 2016-2019
Stages: SIMD OSL Library, SIMD OSL Framework, SIMD OSL Performance
• Intel® AVX-512, AVX2, AVX-specific libraries
• Masking and scatter-gather
• 17k+ tests
• Improved performance on built-in functions
• Compiler + platform support
• Reduction in JIT time
• Coverage for built-in function variants
• Handling treacherous control flows
• Noise functions with options
• LLVM optimization passes to improve AVX2
Open Shading Language: https://github.com/imageworks/OpenShadingLanguage
SIMD Open Shading Language: https://gitlab.com/intel-osl/BatchedOSL
Intel® AVX-512 Performance vs Batch Utilization
Performance gain with increased batch utilization
[Chart: speedup from batching versus batch utilization, batch 1 through batch 16, one series per shader. Labeled speedups: marble 15x, oak 12x, diamond 10x, concrete 6x, leopard 5.2x]
OSL’s testshade running Intel® AVX-512 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (config 1)
22.4 Shading Speedup with SIMD OSL
[Chart: shading speedup on CLX8260L (24c, 2.3GHz)] Bonnie’s room 1.26x, Fillmore 1.37x, Bonnie 2.06x
Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2)
22.4’s Overall Rendering Speedup with SIMD OSL
[Chart: overall rendering speedup on CLX8260L (24c, 2.3GHz)] Bonnie’s room 1.11x, Fillmore 1.17x, Bonnie 1.27x
Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2)
Bonnie
• Real production character with 55 shader networks
• 85663 shader operations on 67680 symbols (post-optimization)
Single Point vs Batched: Amdahl’s Law
• 66.64% batch utilization
• 2.05x shading speedup
Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2)
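Amdahl's law connects the shading speedup to the overall rendering speedup. The sketch below shows the formula and its inverse. Feeding it the Bonnie figures quoted in this deck (2.05x shading, 1.27x overall) is an assumption that both numbers describe the same render, and the resulting shading fraction is an estimate that does not appear on the slides.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction p of the runtime
// is accelerated by a factor s and the rest is unchanged.
inline double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Inverse: the fraction p implied by an observed overall speedup,
// given the speedup s of the accelerated part.
inline double implied_fraction(double overall, double s) {
    assert(overall > 0.0 && s > 0.0 && s != 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Under that assumption, `implied_fraction(1.27, 2.05)` comes out near 0.415, meaning shading would account for roughly 40% of the original render time, and everything outside shading limits how much of the 2.05x reaches the final frame time.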
Performance Progression
3 factors at play:
● Efficiency of the generated vectorized shader code
● Effective vectorization of the shading interface
● How effectively the renderer takes advantage of the vectorized shading language
Efficiency in the Shading Language
Most effort up to now has gone into the quality of the shader code generation
● Masked control flow for vectorized execution
● Optimization of noise and math functions
● Optimization of texture calls
Efficiency in the Shading API
The shading language calls into the renderer
● To access data: primvars, transforms, etc.
● To compute things: texture interpolation, ray tracing, etc.
● To return values
● All of the above is nicely vectorized (batched)
● We call across the API boundaries fewer times
Efficiency in the Renderer
We started with a vectorized renderer
● RIS is one of the few vectorized renderers in the industry that works on ray batches
● It turns out that our batch granularity does not enable effective vectorization
● Results we see today are a fraction of the potential benefit
Efficiency in the Renderer
What is efficient?
● Portions of the renderer where execution is coherent
● Displacement shading
● Camera ray hits
What is inefficient?
● Indirect illumination
● Deep bounces
Efficiency in the Renderer
[Chart: % of batches submitted, broken down by points per batch (1 point through 16 points), for 1, 2, 3, 5, and 9 bounces. Labeled values: 7.3%, 13.9%, 18.9%, 22.3%, 25.4%; and 76.6%, 67.1%, 60.9%, 56.5%, 52.6%]
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @ 2.4GHz (config 4)
Efficiency in the Renderer
How do we currently accommodate low occupancy?
● We switch over to single-point evaluation for small batches.
● We use a heuristic to determine when to switch.
● A threshold of 4 active lanes tends to be a decent starting point.
● This may change as more optimizations are done.
● However, it would be best to guarantee high SIMD occupancy.
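The occupancy heuristic amounts to counting active lanes and comparing against a threshold. This is a minimal sketch, assuming a 16-lane mask and assuming that exactly 4 active lanes already qualifies for the batched path (the slide does not say which side of the threshold 4 falls on); `ShadePath`, `active_lanes`, and `choose_path` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

enum class ShadePath { SinglePoint, Batched };

constexpr int kMinActiveLanes = 4;  // starting threshold quoted on the slide

// Number of lanes whose mask bit is set.
inline int active_lanes(std::uint16_t mask) {
    return static_cast<int>(std::bitset<16>(mask).count());
}

// Pick the execution path for one batch based on its occupancy.
inline ShadePath choose_path(std::uint16_t mask, int threshold = kMinActiveLanes) {
    return active_lanes(mask) >= threshold ? ShadePath::Batched
                                           : ShadePath::SinglePoint;
}
```

Keeping the threshold as a parameter reflects the slide's caveat that the best switch point may move as the batched code path gets faster.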
Towards a new Rendering Architecture
Batches are currently determined by the bucket size used for rendering
● Computational workload is uneven throughout the image
● Larger buckets give more points and higher occupancy
● Larger buckets mean one thread may be stuck rendering a single heavy bucket for a long time, reducing thread scaling
● A decent bucket size for good thread load balancing is 8x8 or 16x16.
● This is a batch size of 64-256.
● We would need a batch size of at least 2k-8k.
Towards a new Rendering Architecture
Different options at hand
● Wavefront rendering
● Shading queues
● Non image-space decomposition scheduling
● The new architecture is being implemented in Pixar’s RenderMan® XPU
● Stay tuned
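One of the listed options, shading queues, can be pictured as a container that collects shading points from anywhere in the image and only releases them in full batches. The sketch below is purely illustrative and is not a description of how RenderMan XPU works; `ShadePoint`, `ShadingQueue`, and their members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one point waiting to be shaded.
struct ShadePoint { int pixel; int bounce; };

// Collects points that share a shading network and releases them in full batches,
// decoupling SIMD occupancy from the bucket that produced each point.
class ShadingQueue {
public:
    explicit ShadingQueue(std::size_t batch_width) : width_(batch_width) {
        assert(width_ > 0);
    }

    void push(const ShadePoint& p) { pending_.push_back(p); }

    // Removes and returns one full batch, or an empty vector if fewer than
    // batch_width points are waiting.
    std::vector<ShadePoint> pop_full_batch() {
        if (pending_.size() < width_) return {};
        auto first = pending_.end() - static_cast<std::ptrdiff_t>(width_);
        std::vector<ShadePoint> batch(first, pending_.end());
        pending_.erase(first, pending_.end());
        return batch;
    }

    // Returns whatever is left (possibly a partial batch), e.g. at end of frame.
    std::vector<ShadePoint> drain() {
        std::vector<ShadePoint> rest;
        rest.swap(pending_);
        return rest;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::size_t width_;
    std::vector<ShadePoint> pending_;
};
```

The design trade-off is latency for occupancy: points wait in the queue until enough similar work has accumulated, and only the final drain runs at partial width.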
OSL Shaders
• Concrete - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/concrete.osl
• Modifications:
< float grain=noise("gabor",p,8,"bandwidth",4,"anisotropic",2,"direction",vector(SandDensity,0,0));
---
> float grain=noise("gabor",p,8);
• Leopard - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/leopard.osl
• Diamond plate - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/diamondplateshader.osl
• Thread - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/Threads.osl
• Donut - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/TheDonutShader.osl
• Oak - https://renderman.pixar.com/forum/download.php
• Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/oak.osl
• Marble - https://renderman.pixar.com/forum/download.php
• Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/marble.osl
Configurations

| | Config 1 | Config 2 | Config 3 | Config 4 |
|---|---|---|---|---|
| Model name | Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz | Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz | Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz |
| Core(s) per socket | 24 | 24 | 18 | 20 |
| Socket(s) | 2 | 2 | 2 | 2 |
| Memory | 192GB, DDR4-2933 MHz (12 x 16GB) | 192GB, DDR4-2933 MHz (12 x 16GB) | 128GB, DDR4-2400 MHz (8 x 16GB) | 192GB, DDR4-2666 MHz (12 x 16GB) |
| CPU Power Policy | Performance | Performance | Performance | Powersave |
| Hyperthreading | Disabled | Enabled | Enabled | Enabled |
| Turbo Boost Tech | Enabled | Enabled | Enabled | Enabled |
| L1d cache | 32K | 32K | 32K | 32K |
| L1i cache | 32K | 32K | 32K | 32K |
| L2 cache | 1024K | 1024K | 256K | 1024K |
| L3 cache | 36608K | 33792K | 46080K | 28160K |
| Operating System | Fedora release 27 (Twenty Seven) | CentOS Linux release 7.6.1810 (Core) | Red Hat Enterprise Linux Server release 7.2 (Maipo) | CentOS Linux release 7.3.1611 (Core) |
| Bios Version | SE5C620.86B.0D.01.0286.011120190816 | SE5C620.86B.0D.01.0395.022720191340 | GRRFSDP1.86B0271.R00.1510301446 | SE5C620.86B.01.00.0412.020920172159 |

RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vector Extensions | SIGGRAPH 2019 Technical Sessions

  • 1. From RenderMan 22.0® to Next-Gen RenderMan XPU and Beyond: Role of Open Shading Language (OSL) with Intel® Advanced Vector Extensions (Intel® AVX-512). Presenters: Steena Monteiro (Intel) and Max Liani (Pixar Animation Studios). Contributors: Alex M. Wells (Intel), Steena Monteiro (Intel), Louis Feng (Intel), Max Liani (Pixar Animation Studios), Stephen Friedman (Pixar Animation Studios), Larry Gritz (Sony Pictures Imageworks)
  • 2. Legal Disclaimers and Optimization Notices • This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. • Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks • Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. • Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. • Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. • Intel, Xeon and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. • *Other names and brands may be claimed as the property of others. • © Intel Corporation.
  • 3. Shading in Physically Based Rendering 3 Image credit Sony Pictures Imageworks
  • 4. Shading Network • Multiple reusable shading nodes • Connect nodes to define complex materials • Production shading networks can grow very large, to hundreds or thousands of nodes.
  • 5. C++ Shader Limitations • Lack of context at compile time • Input parameters unknown • Geometry being shaded unknown • Mode of shading unknown • Surrounding shading network unknown • Branchy testing required • Lack of portability • Requires “Performance Ninjas” Image Credit: Ninja Working AT Desk from Vector.me (by Hector Gomez) 5
  • 6. Open Shading Language • Developed by Sony Pictures Imageworks* • C-like DSL for programmable shading • API to connect shaders into networks • Open source • http://github.com/imageworks/OpenShadingLanguage • Sci-Tech Award* in 2017 (award logo owned by the Academy of Motion Picture Arts and Sciences) *Other names and brands may be claimed as the property of others.
  • 7. Poster images (c) Sony Pictures*, Paramount*, Warner Brothers*, Disney*, Fox*, Universal* 7
  • 8. Example OSL Shader
    shader marble (color Cin = .5, float freq = 1.0, output color Cout = 0)
    {
        float sum = 0;
        float freqVal = freq;
        point Pshad = transform ("object", P);
        for (int i = 0; i < 6; i++) {
            sum = sum + 1/freqVal * abs(.5 - noise(4 * freqVal * Pshad));
            freqVal = 2 * freqVal;
        }
        Cout = Cin * sum;
    }
    Callouts: Shader Globals (input set by renderer), here P; Library Calls, here transform, abs, and noise.
  • 9. Motivation for SIMD Open Shading Language • In its native form, OSL is unable to leverage Intel® Advanced Vector Extensions 512 (Intel® AVX-512) on Intel® Xeon® processors. • Intel has been leading the re-architecture of OSL since 2016. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 10. OSL Framework • oslc offline compiler: shader written in OSL → intermediate OSO (instructions + operands) • Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): scene management, ray tracing/path tracing, light integration • OSL Runtime: build shading network; render time optimization with LLVM* JIT (Just In Time Compilation); pre-compiled library functions; execute shading network (per point) as optimized x86-64; callbacks; query outputs *Other names and brands may be claimed as the property of others.
  • 11. SIMD OSL’s Batched Interface (Renderer ↔ Shading System) • Single-point interface: execute(ShaderGlobals, …) submits a single point; symbol_address(…) queries results • New “batched” interface: execute_batch(ShaderGlobalsBatch, …) submits a batch of points; Wide<T>(symbol_address) queries a batch of results • ShaderGlobalsBatch holds uniform data (context pointers, raytype, …) and a queue of varying data (surface position, incident ray, surface normal, …)
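The uniform/varying split in the batched globals can be pictured with a small sketch. Everything here is hypothetical and only illustrates the layout idea: `ToyShaderGlobals`, `ToyShaderGlobalsBatch`, `WideVec3`, `pack_batch`, and the width of 16 are invented names, not OSL's declarations, and the sketch assumes every point in a batch shares one ray type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical batch width: 16 floats fill one 512-bit register.
constexpr std::size_t kWidth = 16;

struct Vec3 { float x, y, z; };

// Single-point globals (array-of-structures): one shading point per record.
struct ToyShaderGlobals {
    Vec3 P;       // surface position
    Vec3 I;       // incident ray
    Vec3 N;       // surface normal
    int raytype;  // assumed identical for every point in a batch
};

// Structure-of-arrays storage: each component is contiguous across lanes.
struct WideVec3 {
    std::array<float, kWidth> x{}, y{}, z{};
    void set(std::size_t lane, const Vec3& v) { x[lane] = v.x; y[lane] = v.y; z[lane] = v.z; }
    Vec3 get(std::size_t lane) const { return Vec3{x[lane], y[lane], z[lane]}; }
};

struct ToyShaderGlobalsBatch {
    int raytype = 0;   // uniform: one value shared by all lanes
    WideVec3 P, I, N;  // varying: one value per lane
};

// Packs up to kWidth points into one batch and returns a bit mask of the
// lanes that were filled (bit i set means lane i holds a real point).
inline std::uint32_t pack_batch(const ToyShaderGlobals* points, std::size_t count,
                                ToyShaderGlobalsBatch& batch) {
    assert(count <= kWidth);
    std::uint32_t mask = 0;
    for (std::size_t lane = 0; lane < count; ++lane) {
        batch.P.set(lane, points[lane].P);
        batch.I.set(lane, points[lane].I);
        batch.N.set(lane, points[lane].N);
        mask |= std::uint32_t{1} << lane;
    }
    if (count > 0) batch.raytype = points[0].raytype;
    return mask;
}
```

Packing two points yields the mask `0x3`; the remaining fourteen lanes stay zeroed and would be ignored by masked execution.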
  • 12. SIMD OSL Framework • Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): scene management, ray tracing/path tracing, light integration • SIMD OSL Runtime: render time optimization with LLVM* Wide JIT (Just In Time Compilation); pre-compiled library functions for Intel® AVX-512, Intel® AVX2, and Intel® AVX; execute shading network (per point) as optimized Intel® AVX-512, AVX2, or AVX; callbacks; query outputs *Other names and brands may be claimed as the property of others.
  • 13. Components in SIMD OSL Render-time: Render Time Optimization With LLVM* JIT (Just In Time Compilation); Optimized x86-64; Wide Library. Wizard of Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html *Other names and brands may be claimed as the property of others.
  • 17. SIMD OSL’s Wide Library
    void my_callback(void *wS, void *wM, void *wV, void *wVS, void *wVT,
                     unsigned int mask_value)
    {
        Mask mask (mask_value);
        ASSERT(mask.any_on());
        // Accessors: transparent AOS view of SOA
        Wide<const float> wScale (wS);
        Wide<const Vec3> wVec (wV);
        Wide<const Matrix44> wMat (wM);
        Masked<Vec3> wVT_result (wVT, mask);
        Masked<Vec3> wVS_result (wVS, mask);
        for (int lane = 0; lane < __OSL_WIDTH; ++lane) {
            // Extract data from a lane of the SOA
            Vec3 V = wVec[lane];
            float F = wScale[lane];
            Matrix44 M = wMat[lane];
            // Array subscript returns a proxy object to that lane;
            // the proxy skips the assignment if the lane is masked off
            wVS_result[lane] = V*F;
            wVT_result[lane] = transform(M,V);
        }
    }
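How a subscript can return a proxy that honors the mask is easier to see in a stripped-down form. The sketch below is an assumption-laden toy, not the OSL Wide library: `ToyMask`, `ToyWideVec3`, `ToyMaskedVec3`, `Vec3Block`, and `scale_callback` are invented names, and only `Vec3` data is handled.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 16;  // hypothetical batch width

struct Vec3 { float x, y, z; };

// SOA block: all x values, then all y values, then all z values.
struct Vec3Block { float x[kWidth]; float y[kWidth]; float z[kWidth]; };

class ToyMask {
public:
    explicit ToyMask(std::uint32_t bits) : bits_(bits) {}
    bool any_on() const { return bits_ != 0; }
    bool is_on(int lane) const { return ((bits_ >> lane) & 1u) != 0; }
private:
    std::uint32_t bits_;
};

// Read-only accessor: operator[] gathers one lane into an AOS Vec3.
class ToyWideVec3 {
public:
    explicit ToyWideVec3(const void* data) : block_(static_cast<const Vec3Block*>(data)) {}
    Vec3 operator[](int lane) const {
        return Vec3{block_->x[lane], block_->y[lane], block_->z[lane]};
    }
private:
    const Vec3Block* block_;
};

// Masked writer: operator[] returns a proxy whose assignment scatters the
// value back into the SOA block only when the lane is active.
class ToyMaskedVec3 {
public:
    class LaneProxy {
    public:
        LaneProxy(Vec3Block* block, int lane, bool on) : block_(block), lane_(lane), on_(on) {}
        void operator=(const Vec3& v) const {
            if (!on_) return;  // lane masked off: skip the store
            block_->x[lane_] = v.x;
            block_->y[lane_] = v.y;
            block_->z[lane_] = v.z;
        }
    private:
        Vec3Block* block_;
        int lane_;
        bool on_;
    };
    ToyMaskedVec3(void* data, ToyMask mask)
        : block_(static_cast<Vec3Block*>(data)), mask_(mask) {}
    LaneProxy operator[](int lane) const { return LaneProxy(block_, lane, mask_.is_on(lane)); }
private:
    Vec3Block* block_;
    ToyMask mask_;
};

// Callback in the style of the slide: scale every active lane.
inline void scale_callback(const void* wIn, float scale, void* wOut, std::uint32_t mask_value) {
    ToyMask mask(mask_value);
    assert(mask.any_on());
    ToyWideVec3 in(wIn);
    ToyMaskedVec3 out(wOut, mask);
    for (int lane = 0; lane < kWidth; ++lane) {
        Vec3 v = in[lane];
        out[lane] = Vec3{v.x * scale, v.y * scale, v.z * scale};
    }
}
```

The loop body reads like scalar code over `Vec3` values, while the storage stays in SOA form, which is what lets a compiler vectorize across lanes.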
  • 18. Components in SIMD OSL Render-time: Render Time Optimization With LLVM* JIT (Just In Time Compilation); Wide Library; Divergent Control Flows; Optimized x86-64. Wizard of Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html *Other names and brands may be claimed as the property of others.
  • 19. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks Effective mask (result of combining stack) Divergent Control Flows 19
  • 20. Stack of masks PUSH Effective mask (result of combining stack) if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Divergent Control Flows 20
  • 21. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks PUSH Effective mask (result of combining stack) Divergent Control Flows 21
  • 22. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks PUSH Effective mask (result of combining stack) Divergent Control Flows 22
  • 23. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks POP Effective mask (result of combining stack) Divergent Control Flows 23
  • 24. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x NEGATE Stack of masks Effective mask (result of combining stack) PUSH Divergent Control Flows 24
  • 25. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks POP Effective mask (result of combining stack) Divergent Control Flows 25
  • 26. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks POP Effective mask (result of combining stack) Divergent Control Flows 26
  • 27. if (x > 0.5) { ... if (y > 0.5) { … if (powB > 0.23) { … } else { … } } //y } //x Stack of masks POP Effective mask (result of combining stack) Divergent Control Flows 27
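The PUSH, POP, and NEGATE steps walked through on these slides can be mimicked with a short sketch. It assumes that "combining the stack" means AND-ing every pushed condition with the mask of lanes active on entry; `ToyMaskStack` is a hypothetical name, and the generated LLVM IR in SIMD OSL is not shown in the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy mask stack: one bit per lane, bit set means the lane is active.
class ToyMaskStack {
public:
    explicit ToyMaskStack(std::uint32_t initial_mask) : initial_(initial_mask) {}

    // Entering a conditional block: remember which lanes passed the test.
    void push(std::uint32_t condition) { stack_.push_back(condition); }

    // Leaving a conditional block.
    void pop() {
        assert(!stack_.empty());
        stack_.pop_back();
    }

    // Effective mask: a lane runs only if it was active on entry and
    // passed every condition currently on the stack.
    std::uint32_t effective() const {
        std::uint32_t mask = initial_;
        for (std::uint32_t condition : stack_) mask &= condition;
        return mask;
    }

private:
    std::uint32_t initial_;
    std::vector<std::uint32_t> stack_;
};
```

For the nested example, the caller pushes the `x`, `y`, and `powB` conditions in turn, pops the `powB` condition at the end of the `if` branch, pushes its bitwise negation for the `else` branch, and then pops three times on the way out.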
  • 28. Components in SIMD OSL Render-time: Render Time Optimization With LLVM* JIT (Just In Time Compilation); Wide Library; Divergent Control Flow; Vectorized IR Generation; Optimized x86-64. Wizard of Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html *Other names and brands may be claimed as the property of others.
  • 29. General LLVM Code Flow for OSL Operations OSL Retrieve symbols for Operands Emit LLVM-defined operations OR Call appropriate functions Store Result 29
  • 30. What changes in SIMD OSL. OSL: retrieve symbols for operands; load values; initialize values; emit LLVM-defined operations OR call appropriate functions; store result. Four cases: Operands → Uniform, Results → Uniform; Operands → Uniform, Results → Varying; Operands → Varying, Results → Uniform; Operands → Varying, Results → Varying.
  • 31. What changes in SIMD OSL (Operands → Uniform, Results → Uniform). SIMD OSL: retrieve symbols for operands; call uniform function; store result.
  • 32. What changes in SIMD OSL (Operands → Uniform, Results → Varying). SIMD OSL: retrieve symbols for operands; call uniform function; widen result; store result.
  • 33. What changes in SIMD OSL (Operands → Varying, Results → Varying). SIMD OSL: retrieve symbols for operands; add effective mask to arguments; call varying function; add address for results to arguments.
  • 34. What changes in SIMD OSL (Operands → Uniform and Varying, Results → Varying). SIMD OSL: retrieve symbols for operands; add effective mask to all arguments; call varying function; add address for results to arguments; allocate a varying temp; widen uniform operands and store to varying temp.
  • 35. What changes in SIMD OSL (Operands → Varying, Results → Uniform): unreachable.
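The case analysis on these slides can be condensed into one decision function. This is only a sketch: `Kind` and `plan_op` are hypothetical names, the steps are returned as text instead of emitted IR, and the order of the steps in the varying path is an assumption because the transcript flattens the slide diagrams.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Uniform, Varying };

// Returns the code-generation steps for one operation, given the
// uniformity of its operands and of its result.
inline std::vector<std::string> plan_op(const std::vector<Kind>& operands, Kind result) {
    bool any_varying = false;
    bool any_uniform = false;
    for (Kind k : operands) {
        if (k == Kind::Varying) any_varying = true;
        else any_uniform = true;
    }
    // A varying operand feeding a uniform result is the "unreachable" case.
    assert(!(any_varying && result == Kind::Uniform));

    std::vector<std::string> steps;
    steps.push_back("retrieve symbols for operands");
    if (!any_varying) {
        // All operands uniform: compute once, then broadcast if needed.
        steps.push_back("call uniform function");
        if (result == Kind::Varying) steps.push_back("widen result");
        steps.push_back("store result");
        return steps;
    }
    if (any_uniform) {
        // Mixed operands: promote the uniform ones so every argument is wide.
        steps.push_back("allocate a varying temp");
        steps.push_back("widen uniform operands into the temp");
    }
    steps.push_back("add address for results to arguments");
    steps.push_back("add effective mask to arguments");
    steps.push_back("call varying function");
    return steps;
}
```

Keeping all-uniform operations on the scalar path means a value shared by every lane is computed once per batch, not once per lane.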
  • 36. Components in SIMD OSL Render-time: Render Time Optimization With LLVM* JIT (Just In Time Compilation); Wide Library; Divergent Control Flow; Vectorized IR Generation; “For-each-unique” algorithm; Optimized x86-64. Wizard of Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html *Other names and brands may be claimed as the property of others.
  • 40. For-Each-Unique Algorithm
    if (layer == 1) file = "r.tex";
    if (layer == 2) file = "g.tex";
    if (layer == 3) file = "r.tex";
    if (layer == 4) file = "g.tex";
    wrap_mode = (layer%2==0) ? "clamp" : "mirror";
    texture(file, u, v, "wrap", wrap_mode);
    Figure: per-lane rows for layer, file, wrap, and Mask (extracted lane values: 3 3 1 2 1 1 2 1 2 2 2 2 3 3 3 1 4).
    JIT’d binning: 1st pass texture("r.tex", "mirror", …); 2nd pass texture("g.tex", "clamp", …).
    Full flexibility: BatchedRendererServices.
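The binning idea can be sketched as a loop that repeatedly picks the first unprocessed lane and collects every other lane holding the same values. The names `Bin` and `for_each_unique` are hypothetical, strings are compared by value for simplicity, and the JIT-generated binning code in SIMD OSL is not shown in the slides.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr int kWidth = 16;  // hypothetical batch width

// One unique (file, wrap) combination and the lanes that share it.
struct Bin {
    std::string file;
    std::string wrap;
    std::uint32_t lanes;  // bit i set means lane i belongs to this bin
};

// Groups the active lanes by unique (file, wrap) pair so that the texture
// call runs once per unique pair, each time under that bin's lane mask.
inline std::vector<Bin> for_each_unique(const std::array<std::string, kWidth>& file,
                                        const std::array<std::string, kWidth>& wrap,
                                        std::uint32_t active_mask) {
    assert((active_mask >> kWidth) == 0);
    std::vector<Bin> bins;
    std::uint32_t remaining = active_mask;
    while (remaining != 0) {
        // The lowest remaining lane leads the next bin.
        int lead = 0;
        while (((remaining >> lead) & 1u) == 0) ++lead;
        Bin bin{file[lead], wrap[lead], 0};
        for (int lane = lead; lane < kWidth; ++lane) {
            const bool pending = ((remaining >> lane) & 1u) != 0;
            if (pending && file[lane] == bin.file && wrap[lane] == bin.wrap) {
                bin.lanes |= std::uint32_t{1} << lane;
            }
        }
        remaining &= ~bin.lanes;
        bins.push_back(std::move(bin));
    }
    return bins;
}
```

With the shader above, a full batch collapses to two bins, so the renderer sees two texture calls with uniform file and wrap arguments and complementary lane masks.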
  • 41. Components in SIMD OSL Render-time: Render Time Optimization With LLVM* JIT (Just In Time Compilation); Wide Library; Divergent Control Flows; Vectorized IR Generation; “For-each-unique” algorithm; SIMD OSL built-ins; Optimized x86. Wizard of Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html *Other names and brands may be claimed as the property of others.
  • 42. SIMD OSL’s Perlin Noise
    Scalar computation with scalar data types:
    inline void operator() (float &result, const Vec3 &p) const {
        HashScalar h;
        perlin(result, h, p.x, p.y, p.z);
        result = 0.5f * (result + 1.0f);
    }
    Explicit outer loop vectorization (Intel® C++ Compiler) (Clang 5+):
    template<int WidthT>
    void operator() (MaskedAccessor<float, WidthT> wresult,
                     ConstWideAccessor<Vec3, WidthT> wp) const {
        #pragma forceinline recursive
        {
            #pragma omp simd simdlen(WidthT)
            for (int l = 0; l < WidthT; ++l) {
                Vec3 p = wp[l];
                float perlinResult;
                HashScalar h;
                perlin_scalar(perlinResult, h, p.x, p.y, p.z);
                float scaledResult = 0.5f * (perlinResult + 1.0f);
                wresult[l] = scaledResult;
            }
        }
    }
    Also labeled on the slide: block vectorization with intrinsics.
  • 43. OSL Microbenchmarks: Speedup of SIMD AVX-512 OSL over Scalar OSL. Chart: per-function speedup on a logarithmic axis from 0.125 to 16. Average: 6.9x; geomean: 6.14x. Functions measured: null, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, atan2, sincos, log, log2, log10, logb, exp, exp2, expm1, pow, erf, erfc, radians, degrees, sqrt, inversesqrt, hypot, abs, fabs, sign, floor, ceil, round, trunc, mod, min, max, clamp, mix, isnan, isfinite, select, dot, cross, length, distance, normalize, reflect, fresnel, rotate, transform, transform_matrix, matrix_object_camera, determinant, transpose, linearstep, smooth_linearstep, noise_perlin, noise_cell, noise_simplex, noise_gabor, pnoise_perlin, pnoise_cell, pnoise_gabor, spline_bezier, spline_bspline, spline_catmull-rom, spline_hermite, spline_linear, spline_constant. 48 threads on Intel(R) Xeon(R) Platinum 8260L CPU @2.30GHz (config 2). For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 44. OSL SIMD Performance at Maximum Batch Utilization. Chart: speedup at max batch size (axis 0 to 16): leopard 5.2x, concrete 6x, diamond 10x, oak 12x, marble 15x. OSL’s testshade running Intel® AVX-512 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @2.40 GHz (config 1). *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 45. SIMD OSL: Intel® AVX-512 vs. AVX2. Chart: speedup (axis 0 to 2) for the shaders leopard, concrete, diamond plate, oak, marble, thread, and donut. Speedup labels as extracted (per-shader mapping not recoverable from the transcript): 1.6x, 1.9x, 1.1x, 1.3x, 1.3x, 1.4x, 1.8x. OSL’s testshade running Intel® AVX-512 and AVX2 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @2.40 GHz (config 1). *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 46. Evolution of SIMD OSL: Proof of Concept to Production, 2016‒2019. Tracks: SIMD OSL Library, SIMD OSL Framework, SIMD OSL Performance. Milestones: Intel® AVX-512, AVX2, AVX-specific libraries; masking and scatter-gather; 17k+ tests; improved performance on built-in functions; compiler + platform support; reduction in JIT time; coverage for built-in function variants; handling treacherous control flows; noise functions with options; LLVM optimization passes to improve AVX2.
  • 47. SIMD Open Shading Language Open Shading Language https://github.com/imageworks/OpenShadingLanguage https://gitlab.com/intel-osl/BatchedOSL 47
  • 49. Intel® AVX-512 Performance vs. Batch Utilization. Chart: speedup from batching (axis 0 to 15) for batch sizes 1 through 16, showing performance gain with increased batch utilization. Labeled speedups: marble 15x, oak 12x, diamond 10x, concrete 6x, leopard 5.2x. OSL’s testshade running Intel® AVX-512 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @2.40 GHz (config 1). *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 50. 22.4 Shading Speedup with SIMD OSL. Chart: speedup (axis 1 to 2.2) on CLX8260L (24c, 2.3GHz): Bonnie’s room 1.26x, Fillmore 1.37x, Bonnie 2.06x. Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2). Image © Disney/Pixar *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 51. 22.4’s Overall Rendering Speedup with SIMD OSL. Chart: speedup (axis 1 to 1.3) on CLX8260L (24c, 2.3GHz): Bonnie’s room 1.11x, Fillmore 1.17x, Bonnie 1.27x. Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2). *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 52. Bonnie • Real production character with 55 shader networks • 85663 shader operations on 67680 symbols (post-optimization) • Single point vs. batched (Amdahl’s Law): 66.64% batch utilization, 2.05x shading speedup. Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2). Image © Disney/Pixar *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
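The Amdahl's Law reference can be made concrete with the numbers reported for Bonnie. The worked step below is an illustration, not a measurement from the deck: it assumes shading is the only part of the render that speeds up, takes the shading speedup s = 2.06 and the overall speedup of 1.27 from the two 22.4 charts, and solves for the implied fraction f of render time spent in shading.

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + \dfrac{f}{s}}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S_{\text{overall}}}{1 - 1/s}
  = \frac{1 - 1/1.27}{1 - 1/2.06}
  \approx \frac{0.213}{0.515}
  \approx 0.41
\qquad
\lim_{s \to \infty} S_{\text{overall}} = \frac{1}{1 - f} \approx 1.7
```

Under those assumptions shading would account for roughly 41% of render time, and even an arbitrarily fast shader would cap the overall gain at about 1.7x.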
  • 53. Performance Progression 3 factors at play: ● Efficiency of the generated vectorized shader code ● Effective vectorization of the shading interface ● How effective is the renderer in taking advantage of the vectorized shading language 53 Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 54. Efficiency in the shading language. Most effort so far has gone into the quality of shader code generation ● Masked control flow for vectorized execution ● Optimization of noises and math functions ● Optimization of texture calls. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 55. Efficiency in the Shading API. The shading language calls into the renderer ● To access data: primvars, transforms, etc. ● To compute things: texture interpolation, ray tracing, etc. ● To return values ● All of the above is nicely vectorized (batched) ● We call across the API boundaries fewer times. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 56. Efficiency in the Renderer 56 We started with a vectorized renderer ● RIS is one of the few vectorized renderers in the industry that works on ray batches ● It turns out that our batch granularity is not enabling effective vectorization ● Results we see today are a fraction of the benefit we would get. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 57. Efficiency in the Renderer. What is efficient? ● Portions of the renderer where execution is coherent ● Displacement shading ● Camera ray hits. What is inefficient? ● Indirect illumination ● Deep bounces. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
  • 58. Efficiency in the Renderer. Chart: % of batches submitted (axis 0 to 80), broken down by batch occupancy from 1 point to 16 points, for 1, 2, 3, 5, and 9 bounces. Two labeled series in bounce order (series identity not recoverable from the transcript): 7.3%, 13.9%, 18.9%, 22.3%, 25.4% and 76.6%, 67.1%, 60.9%, 56.5%, 52.6%. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4GHz (config 4). *Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 59. Efficiency in the Renderer. How do we currently accommodate low occupancy? ● We switch over to single-point evaluation for small batches. ● We use a heuristic to determine when to switch. ● A threshold of 4 active lanes tends to be a decent starting point. ● This may change as more optimizations are done. ● However, it would be best to guarantee high SIMD occupancy. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
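The occupancy heuristic amounts to counting active lanes and comparing against a threshold. In the sketch below, `count_active`, `choose_path`, and `kMinActiveLanes` are hypothetical names, and treating exactly 4 active lanes as enough for the batched path is an assumption; the slide does not say which side of the threshold that case falls on.

```cpp
#include <cassert>
#include <cstdint>

// Threshold taken from the slide as a starting point; tune per workload.
constexpr int kMinActiveLanes = 4;

enum class Path { SinglePoint, Batched };

// Counts the set bits in the lane mask (one bit per active lane).
inline int count_active(std::uint32_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Picks the batched path only when enough lanes are active to pay for
// the overhead of wide execution.
inline Path choose_path(std::uint32_t mask, int threshold = kMinActiveLanes) {
    assert(threshold >= 1);
    return count_active(mask) >= threshold ? Path::Batched : Path::SinglePoint;
}
```

A renderer would call `choose_path` once per submitted batch and fall back to looping over the active lanes with the single-point `execute` call when it returns `Path::SinglePoint`.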
  • 60. Towards a new Rendering Architecture. Batches are currently determined by the size of bucket rendering ● Computational workload is uneven throughout the image ● Larger buckets give more points and higher occupancy ● Larger buckets mean one thread may be stuck rendering a single heavy bucket for a long time, reducing thread scaling ● A decent bucket size for good thread load balancing is 8x8 or 16x16. ● This is a batch size of 64-256. ● We would need a batch size of at least 2k-8k. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
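The gap between the two batch sizes is easy to quantify. The arithmetic below assumes one shading point per pixel of a square bucket and reads 2k and 8k as 2048 and 8192; both are assumptions made for illustration.

```latex
8 \times 8 = 64, \qquad 16 \times 16 = 256, \qquad
\sqrt{2048} \approx 45, \qquad \sqrt{8192} \approx 91
```

Reaching the desired batch size through bucket size alone would therefore take buckets of roughly 45x45 to 91x91 pixels, well beyond the sizes that balance thread load.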
  • 61. Towards a new Rendering Architecture. Different options at hand ● Wavefront rendering ● Shading queues ● Non-image-space decomposition scheduling ● The new architecture is being implemented in Pixar’s RenderMan® XPU ● Stay tuned. Image © Disney/Pixar *Other names and brands may be claimed as the property of others.
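One of the listed options, shading queues, can be illustrated generically: hits are sorted into per-material queues as rays land, and shading drains the fullest queue in batches of the SIMD width, independent of which bucket the hits came from. This is a textbook-style sketch with hypothetical names (`Hit`, `ToyShadingQueues`), not a description of RenderMan XPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

constexpr std::size_t kWidth = 16;  // hypothetical SIMD batch width

struct Hit {
    int material;  // which shading network to run
    int pixel;     // where the result goes
};

class ToyShadingQueues {
public:
    // Hits from any bucket or bounce land in the queue for their material.
    void enqueue(const Hit& hit) { queues_[hit.material].push_back(hit); }

    // Removes and returns up to kWidth hits that share a material, taken
    // from the fullest queue. Returns an empty vector when nothing is left.
    std::vector<Hit> next_batch() {
        auto fullest = queues_.end();
        for (auto it = queues_.begin(); it != queues_.end(); ++it) {
            if (fullest == queues_.end() || it->second.size() > fullest->second.size()) {
                fullest = it;
            }
        }
        if (fullest == queues_.end()) return {};
        std::vector<Hit>& queue = fullest->second;
        assert(!queue.empty());
        const std::size_t n = std::min(kWidth, queue.size());
        const auto first = queue.end() - static_cast<std::ptrdiff_t>(n);
        std::vector<Hit> batch(first, queue.end());
        queue.erase(first, queue.end());
        if (queue.empty()) queues_.erase(fullest);
        return batch;
    }

private:
    std::map<int, std::vector<Hit>> queues_;
};
```

With 20 hits on one material and 3 on another, the batches come out with 16, 4, and 3 points; a production scheduler would likely hold back the small remainders until more hits arrive, which is where occupancy is won.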
  • 62. OSL Shaders • Concrete - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/concrete.osl • Modifications: • Leopard - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/leopard.osl • Diamond plate - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/diamondplateshader.osl • Thread - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/Threads.osl • Donut - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/TheDonutShader.osl • Oak - https://renderman.pixar.com/forum/download.php • Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/oak.osl • Marble - https://renderman.pixar.com/forum/download.php • Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/marble.osl
    Diff shown on the slide:
    < float grain=noise("gabor",p,8,"bandwidth",4,"anisotropic",2,"direction",vector(SandDensity,0,0));
    ---
    > float grain=noise("gabor",p,8);
    *Other names and brands may be claimed as the property of others.
  • 63. Configurations
    Field | Config 1 | Config 2 | Config 3 | Config 4
    Model name | Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz | Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz | Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
    Core(s) per socket | 24 | 24 | 18 | 20
    Socket(s) | 2 | 2 | 2 | 2
    Memory | 192GB, DDR4-2933 MHz (12 x 16GB) | 192GB, DDR4-2933 MHz (12 x 16GB) | 128GB, DDR4-2400 MHz (8 x 16GB) | 192GB, DDR4-2666 MHz (12 x 16GB)
    CPU Power Policy | Performance | Performance | Performance | Powersave
    Hyperthreading | Disabled | Enabled | Enabled | Enabled
    Turbo Boost Tech | Enabled | Enabled | Enabled | Enabled
    L1d cache | 32K | 32K | 32K | 32K
    L1i cache | 32K | 32K | 32K | 32K
    L2 cache | 1024K | 1024K | 256K | 1024K
    L3 cache | 36608K | 33792K | 46080K | 28160K
    Operating System | Fedora release 27 (Twenty Seven) | CentOS Linux release 7.6.1810 (Core) | Red Hat Enterprise Linux Server release 7.2 (Maipo) | CentOS Linux release 7.3.1611 (Core)
    Bios Version | SE5C620.86B.0D.01.0286.011120190816 | SE5C620.86B.0D.01.0395.022720191340 | GRRFSDP1.86B0271.R00.1510301446 | SE5C620.86B.01.00.0412.020920172159