Advanced Scenegraph Rendering Pipeline
Markus Tavenrath – NVIDIA – matavenrath@nvidia.com
Christoph Kubisch – NVIDIA – ckubisch@nvidia.com
2
SceneGraph Rendering
• The traditional approach is to render while traversing a SceneGraph
• Scene complexity increases
– Deep hierarchies make traversal expensive
– Large objects are split into many small pieces, increasing the draw call count
– Unsorted rendering causes many state changes
• The CPU becomes the bottleneck when rendering those scenes
models courtesy of PTC
Introduction SceneGraph SceneTree ShapeList Renderer
3
Overview
[Figure: pipeline overview. A SceneGraph (G0, G1, T0 to T3, S0 to S2) is expanded into a SceneTree in which the shared subgraph appears twice (G1', T2', T3', S1', S2'). The shapes of the SceneTree form the ShapeList (S0, S1, S2, S1', S2'), which the Renderer mirrors in its RendererCache.]
Legend: Gi = Group, Ti = Transform, Si = Shape
4
SceneGraph
[Figure: example SceneGraph with groups G0, G1, transforms T0 to T3 and shapes S0 to S2]
• A SceneGraph is a DAG
• No unique path to a node
– Cannot efficiently cache path-dependent data per node
• Traversal visits 14 nodes for rendering
• 6 Transform nodes are processed
– 6 matrix/matrix multiplications and inversions
• Nodes are usually 'large' and not stored linearly in memory
– Each node access generates at least one cache miss, most likely more
Legend: Gi = Group, Ti = Transform, Si = Shape
5
SceneTree construction
[Figure: the SceneGraph (left) is expanded into a SceneTree (right); the shared subgraph below G1 is instantiated twice, as G1, T2, T3, S1, S2 and as G1', T2', T3', S1', S2']
Mapping from SceneGraph node to SceneTree instances, one instance per node:
G0 -> (G0)
T0 -> (T0)
T1 -> (T1)
S0 -> (S0)
G1 -> (G1)
T2 -> (T2)
S1 -> (S1)
T3 -> (T3)
S2 -> (S2)
Mapping with the shared subgraph instantiated twice:
G0 -> (G0)
T0 -> (T0)
T1 -> (T1)
S0 -> (S0)
G1 -> (G1,G1')
T2 -> (T2,T2')
S1 -> (S1,S1')
T3 -> (T3,T3')
S2 -> (S2,S2')
• Observer-based synchronization
Legend: Gi = Group, Ti = Transform, Si = Shape
6
SceneTree
• A SceneTree has a unique path to each node
• Store accumulated attributes such as transforms or visibility in each node
• Trade memory for performance
– 64 bytes per node, 100k nodes ~6 MB
– Transforms are stored in a separate vector
• Traversal still processes 14 nodes
[Figure: SceneTree with the duplicated subtree G1', T2', T3', S1', S2']
Legend: Gi = Group, Ti = Transform, Si = Shape
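The expansion of a DAG into a tree with one instance per path can be sketched as follows. This is a minimal sketch, not the authors' implementation: the edge layout of the example graph is an assumption chosen so that it reproduces the counts on the slides (9 graph nodes, 14 tree nodes, 6 processed transforms), and `GraphNode`, `TreeNode`, `ExampleGraph` and the 1D `translateX` stand-in for a matrix are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SceneGraph node: may be referenced by several parents (DAG).
struct GraphNode {
    std::string name;
    bool isTransform;
    float translateX;  // 1D stand-in for a full 4x4 matrix
    std::vector<const GraphNode*> children;

    GraphNode(std::string n, bool transform = false, float x = 0.0f)
        : name(std::move(n)), isTransform(transform), translateX(x) {}
};

// SceneTree node: one instance per path, stored linearly in a vector.
struct TreeNode {
    const GraphNode* source;  // SceneGraph node this instance mirrors
    int parent;               // index of the parent instance, -1 for the root
    float worldX;             // accumulated, path-dependent transform
};

// Depth-first expansion: every path to a graph node yields its own instance.
inline void expand(const GraphNode& node, int parent, float parentWorldX,
                   std::vector<TreeNode>& tree) {
    const float worldX =
        parentWorldX + (node.isTransform ? node.translateX : 0.0f);
    const int self = static_cast<int>(tree.size());
    tree.push_back({&node, parent, worldX});
    for (const GraphNode* child : node.children) {
        expand(*child, self, worldX, tree);
    }
}

// Example graph with an ASSUMED topology: G1 is reachable via T0 and via T1.
// The nodes point at each other, so do not copy this object.
struct ExampleGraph {
    GraphNode g0{"G0"}, g1{"G1"};
    GraphNode t0{"T0", true, 1.0f}, t1{"T1", true, 10.0f};
    GraphNode t2{"T2", true, 100.0f}, t3{"T3", true, 200.0f};
    GraphNode s0{"S0"}, s1{"S1"}, s2{"S2"};

    ExampleGraph() {
        g0.children = {&t0, &t1};
        t0.children = {&s0, &g1};
        t1.children = {&g1};  // second path to G1 makes the graph a DAG
        g1.children = {&t2, &t3};
        t2.children = {&s1};
        t3.children = {&s2};
    }
};
```

Calling `expand(graph.g0, -1, 0.0f, tree)` on an `ExampleGraph` yields 14 tree nodes; the two instances of S1 carry different accumulated values, which is exactly the path-dependent data that cannot be cached per node in the DAG.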
7
SceneTree invalidate attributes cache
• Keep dirty flags per node
• Keep one dirty vector per flag
• SceneGraph change notifications invalidate nodes
– If a node is not dirty yet, mark it dirty and add it to the dirty vector
– O(1) operation, no sorting required upon changes
• Before rendering a frame, process the dirty vector
[Figure: SceneTree with the dirty nodes T3, T3' and T1 highlighted; the dirty vector contains T3, T3' and T1]
8
SceneTree validate attribute cache
• Walk through the dirty vector
– Node marked dirty -> search for the topmost dirty node on the path to the root
– Validate the subtree starting at that topmost dirty node
• Validation example
– T3 dirty, traverse up to the root node
  • T3 is the top dirty node, validate the T3 subtree
– T3' dirty, traverse up to the root node
  • T1 is the top dirty node, validate the T1 subtree
– T1 no longer dirty (already validated together with T3')
  • No work to do
[Figure: SceneTree with the dirty vector containing T3, T3' and T1]
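The invalidate/validate cycle described on the last two slides can be sketched like this. It is a minimal sketch under stated assumptions: a single dirty flag, a 1D `local`/`world` value instead of matrices, and the hypothetical names `SceneTree`, `setLocal` and `validate`; the real system keeps one dirty vector per flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int parent;                 // -1 for the root
    std::vector<int> children;
    float local;                // local transform (1D stand-in)
    float world;                // cached accumulated transform
    bool dirty;
};

class SceneTree {
public:
    int addNode(int parent, float local) {
        const int index = static_cast<int>(nodes_.size());
        const float parentWorld = parent >= 0 ? nodes_[parent].world : 0.0f;
        nodes_.push_back(Node{parent, {}, local, parentWorld + local, false});
        if (parent >= 0) nodes_[parent].children.push_back(index);
        return index;
    }

    // Change notification: O(1), the dirty vector is never sorted.
    void setLocal(int index, float local) {
        nodes_[index].local = local;
        if (!nodes_[index].dirty) {
            nodes_[index].dirty = true;
            dirty_.push_back(index);
        }
    }

    // Called once before rendering; returns the number of revalidated nodes.
    std::size_t validate() {
        std::size_t validated = 0;
        for (int index : dirty_) {
            if (!nodes_[index].dirty) continue;  // handled via an ancestor
            int top = index;                     // search the topmost dirty node
            for (int p = nodes_[index].parent; p >= 0; p = nodes_[p].parent) {
                if (nodes_[p].dirty) top = p;
            }
            validated += validateSubtree(top);
        }
        dirty_.clear();
        return validated;
    }

    float world(int index) const { return nodes_[index].world; }

private:
    std::size_t validateSubtree(int index) {
        Node& node = nodes_[index];
        const float parentWorld =
            node.parent >= 0 ? nodes_[node.parent].world : 0.0f;
        node.world = parentWorld + node.local;
        node.dirty = false;
        std::size_t count = 1;
        for (int child : node.children) count += validateSubtree(child);
        return count;
    }

    std::vector<Node> nodes_;
    std::vector<int> dirty_;
};
```

Marking T3, T3' and T1 dirty and then calling `validate()` follows the example from the slide: T3 validates its own subtree, T3' finds T1 as its topmost dirty ancestor and validates from there, and the later T1 entry is skipped because its flag is already cleared.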
9
SceneTree to ShapeList
• Add events for ShapeList generation
– addShape(Shape)
– removeShape(Shape)
[Figure: the shapes of the SceneTree are collected in the ShapeList: S0, S1, S2, S1', S2']
Legend: Gi = Group, Ti = Transform, Si = Shape
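One way to keep a flat ShapeList in sync through the two events is sketched below. The swap-and-pop removal and the slot map are assumptions made for illustration (the slides only name the events), and `ShapeId` is a hypothetical handle type.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using ShapeId = int;  // hypothetical handle of a SceneTree shape instance

class ShapeList {
public:
    // Event from the SceneTree: a shape instance became available.
    void addShape(ShapeId shape) {
        if (slot_.count(shape) != 0) return;  // already listed
        slot_[shape] = shapes_.size();
        shapes_.push_back(shape);
    }

    // Event from the SceneTree: a shape instance was removed.
    // Swap-and-pop keeps the list dense without shifting elements.
    void removeShape(ShapeId shape) {
        const auto it = slot_.find(shape);
        if (it == slot_.end()) return;
        const std::size_t index = it->second;
        const ShapeId last = shapes_.back();
        shapes_[index] = last;  // move the last shape into the freed slot
        slot_[last] = index;
        shapes_.pop_back();
        slot_.erase(shape);
    }

    const std::vector<ShapeId>& shapes() const { return shapes_; }

private:
    std::vector<ShapeId> shapes_;
    std::unordered_map<ShapeId, std::size_t> slot_;
};
```

With this layout the renderer iterates a dense vector and never has to traverse the SceneTree to find the shapes; the trade-off is that removal changes the order of the list.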
10
Summary
• SceneGraph to SceneTree synchronization
– Store accumulated data per node instance
• SceneTree to ShapeList synchronization
– Avoid SceneTree traversal
• Next: efficient data structure for the renderer based on the ShapeList
11
Renderer Data Structures
Program
– Shader Vertex
– Shader Fragment
– ParameterDescription Camera
– ParameterDescription Light
– ParameterDescription Matrices
– ParameterDescription Material

ParameterDescription Material
Name      Type     Arraysize
ambient   vec3     2
diffuse   vec3     2
specular  vec3     2
texture   Sampler  0

Shape
– Program 'colored'
– Geometry Shape1
– ParameterData Camera1
– ParameterData Lightset 1
– ParameterData Transform 1
– ParameterData red
12
Example Parameter Grouping
Parameters
• Shader-independent globals, e.g. camera
• Object parameters, e.g. position/rotation/scaling
• Material handles, e.g. textures and buffers
• Material raw values, e.g. float, int and bool
• Light, e.g. light sources and shadow maps
• Shader-dependent globals, e.g. environment map
Frequency (of change): always, frequent, constant, rare
13
Rendering structures
ShapeList: 'colored' 'textured' 'colored' 'colored' 'textured'
Group by Program
– Shapes colored: 'colored' 'colored' 'colored'
– Shapes textured: 'textured' 'textured'
Sort by ParameterData
14
ParameterData Cache
• The cache is one big char[] holding all ParameterData
• ParameterData are sorted by first usage
• Parameters are converted to the target-API datatype, e.g.
– int8 to int32, TextureHandle to bindless texture...
• Updating parameters is only a playback of data in memory, no conditionals
• Filter for used parameters to reduce the cache size
[Figure: Parameters 'colored': red, blue; Parameters 'textured': wood, marble]
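A byte-array cache with conversion at insertion time and conditional-free playback might look like the sketch below. `ParameterCache` and its member functions are hypothetical names, and the playback target is a plain memory block standing in for whatever the target API consumes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class ParameterCache {
public:
    // Stores floats unchanged; returns the byte offset inside the cache.
    std::size_t appendFloats(const float* values, std::size_t count) {
        const std::size_t offset = bytes_.size();
        bytes_.resize(offset + count * sizeof(float));
        if (count != 0) {
            std::memcpy(bytes_.data() + offset, values, count * sizeof(float));
        }
        return offset;
    }

    // Converts int8 to int32 once, when the parameter enters the cache
    // (int32 is the assumed target-API type here).
    std::size_t appendInt8AsInt32(const std::int8_t* values, std::size_t count) {
        const std::size_t offset = bytes_.size();
        bytes_.resize(offset + count * sizeof(std::int32_t));
        for (std::size_t i = 0; i < count; ++i) {
            const std::int32_t converted = values[i];
            std::memcpy(bytes_.data() + offset + i * sizeof(converted),
                        &converted, sizeof(converted));
        }
        return offset;
    }

    // Playback: a plain copy of already converted bytes, no per-parameter
    // conditionals or conversions at render time.
    void playback(std::size_t offset, std::size_t size, void* destination) const {
        std::memcpy(destination, bytes_.data() + offset, size);
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Because entries are appended in the order of first usage, parameters that are consumed one after another during rendering also sit next to each other in memory.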
15
Vertex Attribute Cache
• Big char[] with vertex attribute pointers
– Bindless pointers, VBOs or VAB streams
• Each set of attributes is stored only once
• Ordered by first usage
• Attributes required by the program are known
– Store only used attributes in the cache
– Useful for special passes such as a depth pass, where only pos is required
[Figure: Attributes 'colored': three sets of (pos, normal)]
16
Renderer Cache complete
[Figure: complete renderer cache for program 'colored': Parameters (Phong red, Phong blue), Shapes ('colored', 'colored', 'colored'), Attributes (three sets of pos, normal)]
foreach (shape) {
  if (visible(shape)) {
    if (changed(parameters)) render(parameters);
    if (changed(attributes)) render(attributes);
    render(shape);
  }
}
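The loop above benefits from sorted input, which the following sketch makes measurable by counting state changes instead of issuing API calls. `Shape`, `RenderStats`, `sortForRendering` and `render` are hypothetical names; the integer indices stand in for entries of the program, parameter and attribute caches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

struct Shape {
    int program;     // index of the shader program
    int parameters;  // index into the parameter cache
    int attributes;  // index into the vertex attribute cache
    bool visible;
};

struct RenderStats {
    std::size_t programBinds = 0;
    std::size_t parameterUpdates = 0;
    std::size_t attributeUpdates = 0;
    std::size_t draws = 0;
};

// Group by program, then sort by parameter data and attributes.
inline void sortForRendering(std::vector<Shape>& shapes) {
    std::stable_sort(shapes.begin(), shapes.end(),
                     [](const Shape& a, const Shape& b) {
                         return std::tie(a.program, a.parameters, a.attributes) <
                                std::tie(b.program, b.parameters, b.attributes);
                     });
}

// Counts the work the render loop would submit; state is only touched
// when it differs from what is currently set.
inline RenderStats render(const std::vector<Shape>& shapes) {
    RenderStats stats;
    int program = -1;
    int parameters = -1;
    int attributes = -1;
    for (const Shape& shape : shapes) {
        if (!shape.visible) continue;
        if (shape.program != program) {
            program = shape.program;
            parameters = -1;  // a new program invalidates dependent state
            attributes = -1;
            ++stats.programBinds;
        }
        if (shape.parameters != parameters) {
            parameters = shape.parameters;
            ++stats.parameterUpdates;
        }
        if (shape.attributes != attributes) {
            attributes = shape.attributes;
            ++stats.attributeUpdates;
        }
        ++stats.draws;
    }
    return stats;
}
```

Rendering the same list before and after `sortForRendering` gives the same number of draws but fewer program binds and parameter updates, which is the effect the grouping and sorting steps aim for.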
17
Achievements
• CPU boundedness improved (application)
– Recomputation of attributes (transforms)
– Deep hierarchies: traversal expensive
– Unsorted rendering, many state changes
• CPU boundedness remaining (OpenGL usage)
– Large objects split into many small pieces, increased draw call count
[Figure: pipeline stages SceneTree, ShapeList, Renderer, RendererCache, OpenGL implementation]
18
Enabling Hardware Scalability
• Avoid data redundancy
– Data stored once, referenced multiple times
– Update only once (fewer host-to-GPU transfers)
• Increase batching potential
– Further cuts API calls
– Less driver CPU work
• Minimize CPU/GPU interaction
– Allow the GPU to update its own data
– Lower API usage when the scene changes only a little
– E.g. GPU-based culling
19
OpenGL Research Framework
• Avoids classic SceneGraph design
• Geometry
– Vertex/IndexBuffer
– BoundingBox
– Divided into parts (CAD features)
• Material
• Node hierarchy
• Object
– Node and Geometry reference
– For each Geometry part
  • Material reference
  • Enabled state
Test scene (model courtesy of PTC)
• 99,000 total parts, 3.8 Mtris, 5.1 Mverts
• 700 geometries, 128 materials
• 2,000 objects
20
Performance baseline
• Kepler Quadro K5000, i7
• VBO bind and drawcall per part, i.e. 99,000 drawcalls
– Scene draw time > 38 ms (CPU bound)
• VBO bind per geometry, drawcalls per part
– Scene draw time > 14 ms (CPU bound)
• All subsequent techniques raise performance significantly
– Scene draw time < 6 ms
– 1.8 ms with occlusion culling
21
Drawcall Reduction
• MultiDraw (1.x)
– Render ranges from the current VBO/IBO
– Single drawcall for many distinct objects
– Reduces overhead for low-complexity objects
• ARB_draw_indirect (4.x)
• ARB_multi_draw_indirect
– Store drawcall information on the GPU or the host
– Let the GPU create/modify GPU buffers
DrawElementsIndirect
{
  GLuint count;
  GLuint instanceCount;
  GLuint firstIndex;
  GLint  baseVertex;
  GLuint baseInstance;
}
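To show how such commands are filled on the host, here is a minimal sketch that builds one command per part. The struct uses the name `DrawElementsIndirectCommand` from the OpenGL specification and fixed-width integers in place of `GLuint`/`GLint` so that it compiles without GL headers; `Part` and `buildCommands` are hypothetical, as is the choice of using the part index as `baseInstance`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same five 32-bit fields as on the slide.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // 0 skips the draw, 1 draws it once
    std::uint32_t firstIndex;     // offset into the index buffer
    std::int32_t  baseVertex;     // added to each fetched index
    std::uint32_t baseInstance;   // offset for instanced attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 5 * sizeof(std::uint32_t),
              "commands must be tightly packed");

// Hypothetical description of one part inside shared vertex/index buffers.
struct Part {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// One command per part; the part index is stored in baseInstance so that a
// per-draw attribute (e.g. matrix/material assignment) can be looked up.
inline std::vector<DrawElementsIndirectCommand> buildCommands(
    const std::vector<Part>& parts) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(parts.size());
    for (std::size_t i = 0; i < parts.size(); ++i) {
        commands.push_back({parts[i].indexCount, 1u, parts[i].firstIndex,
                            parts[i].baseVertex,
                            static_cast<std::uint32_t>(i)});
    }
    return commands;
}
```

The resulting vector can be uploaded into a buffer bound to `GL_DRAW_INDIRECT_BUFFER` and consumed by `glMultiDrawElementsIndirect`, or passed from host memory where the implementation allows it.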
22
Drawing Techniques
– All use multidraw capabilities to render across gaps
– BATCHED uses a CPU-generated list of combined parts with the same state
  • The object's part cache must be rebuilt based on material/enabled state
– INDIVIDUAL stays on the per-part level
  • No caches, can update assignment or cmd buffers directly
[Figure: parts a, b, c with different materials in a geometry. BATCHED: grouped and "grown" drawcalls (a+b, c). INDIVIDUAL: single call, material/matrix assignment encoded via vertex attribute.]
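The "grown" drawcalls of the BATCHED technique can be sketched as a merge of adjacent index ranges. This is an illustration under assumptions: parts are given in index-buffer order, state is reduced to a material id plus an enabled flag, and `Part`, `DrawRange` and `buildBatches` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Part {
    std::size_t firstIndex;  // start of the part in the shared index buffer
    std::size_t indexCount;
    int material;
    bool enabled;
};

struct DrawRange {
    std::size_t firstIndex;
    std::size_t indexCount;
    int material;
};

// Rebuilds the part cache of one object: enabled parts that are adjacent in
// the index buffer and share a material are merged into one range.
// Assumes `parts` is ordered by firstIndex.
inline std::vector<DrawRange> buildBatches(const std::vector<Part>& parts) {
    std::vector<DrawRange> batches;
    for (const Part& part : parts) {
        if (!part.enabled) continue;
        if (!batches.empty()) {
            DrawRange& last = batches.back();
            const bool adjacent =
                last.firstIndex + last.indexCount == part.firstIndex;
            if (adjacent && last.material == part.material) {
                last.indexCount += part.indexCount;  // grow the previous range
                continue;
            }
        }
        batches.push_back({part.firstIndex, part.indexCount, part.material});
    }
    return batches;
}
```

Ranges that remain separated by a disabled part still share buffers, so they can be submitted together through a multidraw call; the cache has to be rebuilt whenever a material or enabled state changes, which is the cost the INDIVIDUAL technique avoids.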
23
Parameters
• Group parameters by frequency of change
• Generating shader strings allows different storage backends for "uniforms"

Effect "Phong" {
  Group "material" (many) {
    vec4 "ambient"
    vec4 "diffuse"
    vec4 "specular"
  }
  Group "view" (few) {
    vec4 "viewProjTM"
  }
  Group "object" (many) {
    mat4 "worldTM"
  }
  ... Code ...
}

• OpenGL 2 uniforms
• OpenGL 3, 4 buffers
• NVIDIA bindless technology...
24
Parameters
• GL2 approach:
– Avoid many small uniforms
– Arrays of uniforms, grouped by frequency of update, tightly packed

uniform mat4 worldMatrices[2];
uniform vec4 materialData[8];
#define matrix_world      worldMatrices[0]
#define matrix_worldIT    worldMatrices[1]
#define material_diffuse  materialData[0]
#define material_emissive materialData[1]
#define material_gloss    materialData[2].x
// GL3 can use floatBitsToInt and friends
// for free reinterpret casts within
// macros
...
wPos = matrix_world * oPos;
...
// in fragment shader
color = material_diffuse + material_emissive;
...
25
Parameters
• GL4 approach:
– TextureBufferObject (TBO) for matrices
– UniformBufferObject (UBO) with array data to save costly binds
– Assignment indices passed as vertex attribute

in vec4 oPos;
uniform samplerBuffer matrixBuffer;
uniform materialBuffer {
  Material materials[512];
};
in ivec2 vAssigns;
flat out ivec2 fAssigns;
// in vertex shader
fAssigns = vAssigns;
worldTM = getMatrix (matrixBuffer, vAssigns.x);
wPos = worldTM * oPos;
...
// in fragment shader
color = materials[fAssigns.y].color;
...
26
OpenGL 4.x approach
setupSceneMatrixAndMaterialBuffer (scene);
foreach (obj in scene) {
  if ( isVisible(obj) ) {
    setupDrawGeometryVertexBuffer (obj);
    // iterate over different materials used
    foreach ( batch in obj.materialCaches) {
      // x = matrix index, y = material index, as read from vAssigns in the shader
      glVertexAttribI2i (indexAttr, matrixIndex, batch.materialIndex);
      glMultiDrawElements (GL_TRIANGLES, batch.counts, GL_UNSIGNED_INT,
                           batch.offsets, batch.numUsed);
    }
  }
}
27
Per drawcall vertex attribute
glVertexAttribDivisor == 0 : VArray[ gl_VertexID + baseVertex ]
glVertexAttribDivisor != 0 : VArray[ gl_InstanceID / VDivisor + baseInstance ]
With instanceCount = 1 and divisor 1: VArray[ 0 / 1 + baseInstance ]
[Figure: a MultiDrawIndirect buffer with two commands (instanceCount = 1, baseInstance = 0 and instanceCount = 1, baseInstance = 1). Position & Normal come from a vertex buffer with divisor 0, Material & Matrix Index from a vertex buffer with divisor 1. The vertex attributes fetched for the last vertex in the second drawcall use baseInstance = 1.]
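The two fetch rules can be written down as a small function to check the arithmetic. This only models the index computation from the slide; `attributeIndex` and its parameter names are hypothetical, and `vertexId` here means the index value before `baseVertex` is added.

```cpp
#include <cassert>
#include <cstdint>

// Returns the element of the vertex array that is fetched for one vertex.
//   divisor == 0: per-vertex attribute, advances with the vertex index
//   divisor != 0: per-instance attribute, advances every `divisor` instances
inline std::uint32_t attributeIndex(std::uint32_t divisor,
                                    std::uint32_t vertexId,
                                    std::uint32_t instanceId,
                                    std::int32_t baseVertex,
                                    std::uint32_t baseInstance) {
    if (divisor == 0) {
        return static_cast<std::uint32_t>(
            static_cast<std::int64_t>(vertexId) + baseVertex);
    }
    return instanceId / divisor + baseInstance;
}
```

With `instanceCount = 1` the instance id is always 0, so an attribute with divisor 1 reads exactly element `baseInstance`: every drawcall in the indirect buffer can select its own material and matrix index without any API call in between.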
28
OpenGL 4.2+ indirect approach
...
foreach ( obj in scene.objects ) {
  ...
  // instead of glVertexAttribI2i calls and a loop
  // we use the baseInstance for the attribute
  // bind special assignment buffer as vertex attribute
  glBindBuffer ( GL_ARRAY_BUFFER, obj->assignBuffer);
  glVertexAttribIPointer (indexAttr, 2, GL_INT, . . . );
  // draw everything in one go
  glMultiDrawElementsIndirect ( GL_TRIANGLES, GL_UNSIGNED_INT,
                                obj->indirectOffset, obj->numIndirects, 0 );
}
29
Vertex Setup
• ARB_vertex_attrib_binding (VAB)
– Avoids many buffer changes
– Separates format from data
– Bind multiple vertex attributes to one buffer
• NV_vertex_buffer_unified_memory (VBUM)
– Allows very fast switching through GPU pointers

/* setup once, similar to glVertexAttribPointer
   but with relative offset last */
glVertexAttribFormat (ATTR_NORMAL, 3, GL_FLOAT, GL_TRUE, offsetof(Vertex,normal));
glVertexAttribFormat (ATTR_POS, 3, GL_FLOAT, GL_FALSE, offsetof(Vertex,pos));
// bind to stream
glVertexAttribBinding (ATTR_NORMAL, 0);
glVertexAttribBinding (ATTR_POS, 0);
// switch single stream buffer
glBindVertexBuffer (0, bufID, 0, sizeof(Vertex));

// NV_vertex_buffer_unified_memory
// enable once and set stride
glEnableClientState (GL_VERTEX...NV);...
glBindVertexBuffer (0, 0, 0, sizeof(Vertex));
// switch single buffer via pointer
glBufferAddressRangeNV (GL_VERTEX...,0,bufADDR, bufSize);
30
Effect on CPU time
[Figure: bar chart of CPU time in microseconds [us] (axis 0 to 1000, lower is better) for ~2400 drawcalls in GL4 BATCHED style. Bars: VBO, VAB, VAB+VBUM, BINDLESS INDIRECT HOST (VAB+VBUM), BINDLESS INDIRECT GPU (VAB+VBUM).]
• NV_bindless_multidraw_indirect
– Vertex/index setup inside the MultiDrawIndirect command
– One GL call to draw the entire scene
– GPU benefit depends on triangles per drawcall (> ~500)
31
[Figure: bar chart of GPU and CPU time in microseconds [us] (axis 0 to 6000, lower is better). Techniques: GL4 INDIRECT HOST INDIVIDUAL, GL4 INDIRECT GPU INDIVIDUAL, GL4 BATCHED, GL2 BATCHED, each measured as K and KB. Hardware drawcalls: 99,000 (INDIVIDUAL), 2,400 (BATCHED). Software drawcalls: 2,000 (INDIVIDUAL), 2,400 (BATCHED).]
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
Bindless (green) always reduces CPU time, and may help framerate/GPU a bit
32
[Figure: bar chart of GPU and CPU time in microseconds [us] (axis 0 to 6000, lower is better). Techniques: GL4 INDIRECT HOST INDIVIDUAL, GL4 INDIRECT GPU INDIVIDUAL, GL4 BATCHED, GL2 BATCHED, each measured as K and KB. Hardware drawcalls: 99,000 (INDIVIDUAL), 2,400 (BATCHED). Software drawcalls: 2,000 (INDIVIDUAL), 2,400 (BATCHED).]
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
MultiDrawIndirect achieves almost 20 million drawcalls per second (2,000 VBO changes, "only" 1/3 of the performance lost).
GPU-buffered commands save a lot of CPU time.
Scene-dependent! INDIVIDUAL could be as fast if there is enough work per drawcall.
33
[Figure: bar chart of GPU and CPU time in microseconds [us] (axis 0 to 6000, lower is better). Techniques: GL4 INDIRECT HOST INDIVIDUAL, GL4 INDIRECT GPU INDIVIDUAL, GL4 BATCHED, GL2 BATCHED, each measured as K and KB. Hardware drawcalls: 99,000 (INDIVIDUAL), 2,400 (BATCHED). Software drawcalls: 2,000 (INDIVIDUAL), 2,400 (BATCHED).]
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
GL2 uniforms beat the paletted UBO a bit on the GPU, but are slower on the CPU side (1 glUniform call with 8x vec4, vs. indexed UBO).
Scene-dependent! GL4 is better when more materials change per object.
34
Recap
• Share geometry buffers for batching
• Group parameters for fast updating
• MultiDraw/Indirect to keep objects independent or to remove additional loops
– baseInstance provides a unique index/assignment per drawcall
• Bindless to reduce validation overhead and add flexibility
35
GPU Culling Basics
• GPU-friendly processing
– Matrix and bbox buffer, object buffer
– XFB/Compute or "invisible" rendering
– Vs. old techniques: single GPU job for ALL objects!
• Results
– "Readback" GPU to host
  • Can use the GPU to pack the results into a bit stream (0,1,0,1,1,1,0,0,0)
– "Indirect" GPU to GPU
  • Set DrawIndirect's instanceCount to 0 or 1

buffer cmdBuffer {
  Command cmds[];
};
...
cmds[obj].instanceCount = visible;
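Both result paths can be emulated on the CPU to make the data flow concrete. In the sketch below `Command` mirrors the indirect draw command, while `applyVisibility` (the "indirect" path) and `packBits` (the "readback" path, 32 objects per word) are hypothetical helper names; on the GPU the same writes would happen in a shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Command {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// "Indirect": the culling result is written straight into the command
// buffer, a command with instanceCount == 0 draws nothing.
inline void applyVisibility(std::vector<Command>& cmds,
                            const std::vector<std::uint8_t>& visible) {
    assert(cmds.size() == visible.size());
    for (std::size_t obj = 0; obj < cmds.size(); ++obj) {
        cmds[obj].instanceCount = visible[obj] ? 1u : 0u;
    }
}

// "Readback": one bit per object keeps the transfer to the host small.
inline std::vector<std::uint32_t> packBits(
    const std::vector<std::uint8_t>& visible) {
    std::vector<std::uint32_t> words((visible.size() + 31) / 32, 0u);
    for (std::size_t obj = 0; obj < visible.size(); ++obj) {
        if (visible[obj]) words[obj / 32] |= 1u << (obj % 32);
    }
    return words;
}
```

For the visibility pattern shown on the slide, objects 1, 3, 4 and 5 keep `instanceCount = 1`, and the packed stream fits into a single 32-bit word.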
36
Occlusion Culling
• OpenGL 4.2+
– Depth pass
– Raster "invisible" bounding boxes
  • Disable color/depth writes
  • Geometry shader creates the three visible box sides
  • Depth buffer discards occluded fragments (earlyZ...)
  • Fragment shader writes output: visible[objindex] = 1
[Figure: bounding boxes tested against the depth buffer; passing bbox fragments enable the object]
Algorithm by Evgeny Makarov, NVIDIA

// GLSL fragment shader
// from ARB_shader_image_load_store
layout(early_fragment_tests) in;
buffer visibilityBuffer {
  int visibility[];
};
flat in int objID;
void main() {
  visibility[objID] = 1;
}
// buffer would have been cleared
// to 0 before
37
Temporal Coherence
• Exploit that the majority of objects don't change much relative to the camera
• Draw each object only once (vertex/drawcall-bound)
– Render last visible, fully shaded: (last)
– Test all against current depth: (visible)
– Render newly added visible: (~last) & (visible); none, if no spatial changes were made
– (last) = (visible)
[Figure: frame f-1 and frame f after the camera moved, showing last visible objects, bboxes that are occluded, bboxes that pass depth (visible), new visible and invisible objects]
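The bit operations of the temporal scheme can be sketched on the CPU as follows. `Bits`, `FramePlan` and `planFrame` are hypothetical names, and the `visible` bits are passed in as input even though in the real pipeline they are only produced by the occlusion test that runs after the first draw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::vector<std::uint32_t>;  // one bit per object

struct FramePlan {
    Bits drawFirst;  // objects visible in the last frame, drawn fully shaded
    Bits drawNew;    // objects that became visible in this frame
};

// `last` is the visibility of the previous frame and is updated in place.
// `visible` is the result of testing all bounding boxes against the depth
// buffer that was filled by drawing `drawFirst`.
inline FramePlan planFrame(Bits& last, const Bits& visible) {
    assert(last.size() == visible.size());
    FramePlan plan;
    plan.drawFirst = last;
    plan.drawNew.resize(last.size());
    for (std::size_t i = 0; i < last.size(); ++i) {
        plan.drawNew[i] = ~last[i] & visible[i];  // (~last) & (visible)
    }
    last = visible;  // (last) = (visible)
    return plan;
}
```

As long as the camera and the objects stay put, `drawNew` is all zeros and each visible object is drawn exactly once per frame; objects that were visible before but are now occluded simply drop out of `last` for the next frame.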
38
Culling Readback vs Indirect
[Figure: bar chart of GPU and CPU time in microseconds [us] (axis 0 to 2500, lower is better), GL4 BATCHED style, for readback, indirect and NVindirect]
• readback: 37% faster with culling
– For readback results, the CPU has to wait for GPU idle
• indirect: 33% faster with culling
– In the "draw new visible" phase indirect cannot benefit from "nothing to setup/draw" in advance and still processes "empty" lists
• NVindirect: 37% faster with culling
– NV_bindless_multidraw_indirect saves CPU and a bit of GPU time
• Scene-dependent, i.e. triangles per drawcall and # of "invisible"
39
Culling Results
• Temporal culling is very useful for object/vertex-boundedness
– Can also be applied to the Z-pass...
• Readback vs Indirect
– The readback variant is "easier" to make faster (no setups...), but syncs!
– The NV_bindless_multidraw benefit depends on the scene (VBO changes and primitives per drawcall)
• Working towards a GPU-autonomous system
– (NV_bindless)/ARB_multidraw_indirect as mechanism for the GPU creating its own work; research and feature work in progress
40
• Thank you!
– Contact
  • ckubisch@nvidia.com
  • matavenrath@nvidia.com
glFinish();
41
NVIDIA Bindless Technology
• Family of extensions to use native handles/addresses
• NV_vertex_buffer_unified_memory
• NV_bindless_multidraw_indirect
• NV_shader_buffer_load/store
– Pointers in GLSL
• NV_bindless_texture
– No more unit restrictions
– References inside buffers

// GLSL with true pointers
uniform MyStruct* mystructs;
// API
glUniformui64NV (bufferLocation, bufferADDR);
texHDL = glGetTextureHandleNV (tex);
// later instead of glBindTexture
glUniformHandleui64NV (texLocation, texHDL);
// GLSL
// can also store textures in resources
uniform materialBuffer {
  sampler2D manyTextures [LARGE];
};
42
Culling: Readback vs Indirect
[Figure: stacked bar chart of time in microseconds (axis 0 to 2500) for Readback GPU, Readback CPU, Indirect GPU, Indirect CPU, NVIndirect GPU and NVIndirect CPU, split into the phases 1. Frustum Cull, 2. Update Internals, 3. Draw Last Visible, 4. Occlusion Cull, 5. Update Internals, 6. Draw New Visible]
• Readback: 432 fps with culling, 315 without
– For readback results, the CPU has to wait for GPU idle
• Indirect: 387 fps with culling, 289 without
– Nothing "new" to draw, but the CPU doesn't know and still sets things up; the GPU runs through an "empty" cmd buffer
• NVIndirect: 429 fps with culling, 313 without
– The special bindless indirect version can save a lot of CPU and a bit of GPU cost for drawing the scene with a single big cmd buffer

More Related Content

What's hot

A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...Electronic Arts / DICE
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space MarinePope Kim
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3stevemcauley
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The SurgeMichele Giacalone
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingElectronic Arts / DICE
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Johan Andersson
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The SurgePhilip Hammer
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3Electronic Arts / DICE
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017Mark Kilgard
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Tiago Sousa
 
Beyond porting
Beyond portingBeyond porting
Beyond portingCass Everitt
 

What's hot (20)

A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space Marine
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 

Viewers also liked

Game engine introduction and approach
Game engine introduction and approachGame engine introduction and approach
Game engine introduction and approachDuy Tan Geek
 
Game Engine Overview
Game Engine OverviewGame Engine Overview
Game Engine OverviewSharad Mitra
 
Game Engine Architecture
Game Engine ArchitectureGame Engine Architecture
Game Engine ArchitectureAttila Jenei
 
What Is A Game Engine
What Is A Game EngineWhat Is A Game Engine
What Is A Game EngineSeth Sivak
 
Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4jrouwe
 
Design your 3d game engine
Design your 3d game engineDesign your 3d game engine
Design your 3d game engineDaosheng Mu
 
GRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and RadiosityGRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and RadiosityMichael Heron
 
Unity - Game Engine
Unity - Game EngineUnity - Game Engine
Unity - Game EngineGeeks Anonymes
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and ProgrammingSumit Jain
 
Ray tracing
 Ray tracing Ray tracing
Ray tracingLinh Nguyen
 
Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYAnaya Medias Swiss
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMark Kilgard
 

Viewers also liked (15)

Game engine introduction and approach
Game engine introduction and approachGame engine introduction and approach
Game engine introduction and approach
 
Game Engine Overview
Game Engine OverviewGame Engine Overview
Game Engine Overview
 
Game Engine Architecture
Game Engine ArchitectureGame Engine Architecture
Game Engine Architecture
 
What Is A Game Engine
What Is A Game EngineWhat Is A Game Engine
What Is A Game Engine
 
Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4
 
Design your 3d game engine
Design your 3d game engineDesign your 3d game engine
Design your 3d game engine
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 
GRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and RadiosityGRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and Radiosity
 
Unity - Game Engine
Unity - Game EngineUnity - Game Engine
Unity - Game Engine
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and Programming
 
Ray tracing
 Ray tracing Ray tracing
Ray tracing
 
Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONY
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to Vulkan
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 

Similar to Advanced Scenegraph Rendering Pipeline

D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
Advanced Scenegraph Rendering Pipeline

  • 1. Advanced Scenegraph Rendering Pipeline
      Markus Tavenrath – NVIDIA – matavenrath@nvidia.com
      Christoph Kubisch – NVIDIA – ckubisch@nvidia.com
  • 2. SceneGraph Rendering
      – The traditional approach is to render while traversing a SceneGraph
      – Scene complexity increases
          – Deep hierarchies make traversal expensive
          – Large objects are split into many small pieces, which increases the draw call count
          – Unsorted rendering causes many state changes
      – The CPU becomes the bottleneck when rendering those scenes
      (models courtesy of PTC)
  • 3. Overview
      [Diagram: pipeline SceneGraph -> SceneTree -> ShapeList -> Renderer (RendererCache)]
      – Legend: Gi = Group, Ti = Transform, Si = Shape
      – SceneGraph nodes: G0, G1, T0, T1, T2, T3, S0, S1, S2
      – SceneTree nodes: the same nodes plus the copies G1', T2', T3', S1', S2'
      – ShapeList and RendererCache entries: S0, S1, S2, S1', S2'
  • 4. SceneGraph
      – The SceneGraph is a DAG
      – There is no unique path to a node
          – Path-dependent data cannot be cached efficiently per node
      – Traversal runs over 14 nodes for rendering
      – 6 Transform nodes are processed
          – 6 matrix/matrix multiplications and inversions
      – Nodes are usually 'large' and not linear in memory
          – Each node access generates at least one cache miss, most likely more
      [Diagram: SceneGraph with G0, G1, T0, T1, T2, T3, S0, S1, S2; Gi = Group, Ti = Transform, Si = Shape]
  • 5. SceneTree construction
      – Mapping SceneGraph node -> SceneTree node(s), first listing on the slide:
          G0 -> (G0), T0 -> (T0), T1 -> (T1), S0 -> (S0), G1 -> (G1), T2 -> (T2), S1 -> (S1), T3 -> (T3), S2 -> (S2)
      – Second listing on the slide, with the copies in the SceneTree:
          G0 -> (G0), T0 -> (T0), T1 -> (T1), S0 -> (S0), G1 -> (G1, G1'), T2 -> (T2, T2'), S1 -> (S1, S1'), T3 -> (T3, T3'), S2 -> (S2, S2')
      – Observer-based synchronization
      [Diagram: SceneGraph next to the resulting SceneTree; Gi = Group, Ti = Transform, Si = Shape]
  • 6. SceneTree
      – The SceneTree has a unique path to each node
      – Store accumulated attributes like transforms or visibility in each node
      – Trade memory for performance
          – 64 bytes per node, 100k nodes ~6 MB
          – Transforms are stored in a separate vector
      – Traversal still processes 14 nodes
      [Diagram: SceneTree with G0, G1, G1', T0, T1, T2, T2', T3, T3', S0, S1, S1', S2, S2']
  • 7. SceneTree: invalidate attribute cache
      – Keep dirty flags per node
      – Keep a dirty vector per flag
      – SceneGraph change notifications invalidate nodes
          – If the node is not dirty, mark it dirty and add it to the dirty vector
          – O(1) operation, no sorting required upon changes
      – Before rendering a frame, process the dirty vector
      [Diagram: SceneTree; the dirty vector in the example contains T1, T3 and T3']
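The invalidation step on slide 7 can be sketched in a few lines of C++. This is a minimal illustration, not the authors' implementation: the type `SceneTreeSketch`, its members, and the use of plain node indices are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SceneTree storage; all names are illustrative, not from the talk.
struct SceneTreeSketch {
    std::vector<std::uint8_t> dirty;         // one dirty flag per node
    std::vector<std::uint32_t> dirtyVector;  // nodes marked dirty since the last frame

    explicit SceneTreeSketch(std::size_t nodeCount) : dirty(nodeCount, 0) {}

    // Called from a SceneGraph change notification.
    // Amortized O(1): a flag test plus at most one push_back, no sorting.
    void markDirty(std::uint32_t node) {
        assert(node < dirty.size());
        if (!dirty[node]) {
            dirty[node] = 1;
            dirtyVector.push_back(node);
        }
    }
};
```

Because the flag is tested before the push, a node that is notified many times per frame still appears only once in the dirty vector.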
  • 8. SceneTree: validate attribute cache
      – Walk through the dirty vector
          – Node marked dirty -> search for the top dirty node on its path to the root
          – Validate the subtree starting at the top dirty node
      – Validation example
          – T3 dirty, traverse up to the root node: T3 is the top dirty node, validate the T3 subtree
          – T3' dirty, traverse up to the root node: T1 is the top dirty node, validate the T1 subtree
          – T1 no longer dirty: no work to do
      [Diagram: SceneTree; dirty vector containing T1, T3 and T3']
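The validation walk on slide 8 can be sketched as follows. This is a simplified model under stated assumptions: the `Node` and `Tree` types are hypothetical, and a single float offset stands in for the accumulated transform so that the example stays short.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node layout; a 1D offset stands in for a transform matrix.
struct Node {
    int parent = -1;            // -1 marks the root
    std::vector<int> children;
    float local = 0.0f;         // local attribute
    float world = 0.0f;         // cached accumulated attribute
    bool dirty = false;
};

struct Tree {
    std::vector<Node> nodes;
    std::vector<int> dirtyVector;
    int validatedNodes = 0;     // counts work done, for illustration only

    int addNode(int parent, float local) {
        int id = static_cast<int>(nodes.size());
        assert(parent < id);    // parents must be created before children
        nodes.push_back(Node{});
        nodes[id].parent = parent;
        nodes[id].local = local;
        if (parent >= 0) nodes[parent].children.push_back(id);
        return id;
    }

    void markDirty(int n) {
        if (!nodes[n].dirty) { nodes[n].dirty = true; dirtyVector.push_back(n); }
    }

    // Recompute the cached value for n and everything below it.
    void validateSubtree(int n) {
        Node& node = nodes[n];
        float parentWorld = node.parent < 0 ? 0.0f : nodes[node.parent].world;
        node.world = parentWorld + node.local;
        node.dirty = false;
        ++validatedNodes;
        for (int c : node.children) validateSubtree(c);
    }

    // Process the dirty vector once per frame, before rendering.
    void validate() {
        for (int n : dirtyVector) {
            if (!nodes[n].dirty) continue;        // already validated via an ancestor
            int top = n;
            for (int p = nodes[n].parent; p >= 0; p = nodes[p].parent)
                if (nodes[p].dirty) top = p;      // highest dirty node towards the root
            validateSubtree(top);
        }
        dirtyVector.clear();
    }
};
```

Searching for the top dirty node first means each subtree is recomputed once, even when several of its nodes were marked dirty in the same frame.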
  • 9. SceneTree to ShapeList
      – Add events for ShapeList generation
          – addShape(Shape)
          – removeShape(Shape)
      – ShapeList in the example: S0, S1, S2, S1', S2'
      [Diagram: SceneTree feeding the ShapeList; Gi = Group, Ti = Transform, Si = Shape]
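One way to keep a flat ShapeList in sync through the two events on slide 9 is sketched below. The event names `addShape` and `removeShape` come from the slide; the `ShapeList` class, the integer `ShapeId`, and the swap-and-pop removal are assumptions chosen for the example, not something the talk specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using ShapeId = int;  // hypothetical handle type for a SceneTree shape instance

// Flat list of shapes, updated by events instead of by traversing the SceneTree.
class ShapeList {
public:
    void addShape(ShapeId s) {
        assert(slot_.count(s) == 0);   // each shape instance is added once
        slot_[s] = shapes_.size();
        shapes_.push_back(s);
    }

    // Swap-and-pop removal (an assumption): O(1), but does not preserve order.
    void removeShape(ShapeId s) {
        auto it = slot_.find(s);
        if (it == slot_.end()) return;
        std::size_t i = it->second;
        ShapeId last = shapes_.back();
        shapes_[i] = last;             // move the last shape into the freed slot
        slot_[last] = i;
        shapes_.pop_back();
        slot_.erase(s);
    }

    const std::vector<ShapeId>& shapes() const { return shapes_; }

private:
    std::vector<ShapeId> shapes_;
    std::unordered_map<ShapeId, std::size_t> slot_;  // shape -> index in shapes_
};
```

The renderer can then iterate `shapes()` directly each frame, which is the point of the slide: no SceneTree traversal is needed to find what to draw.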
  • 10. Summary
      – SceneGraph to SceneTree synchronization
          – Store accumulated data per node instance
      – SceneTree to ShapeList synchronization
          – Avoid SceneTree traversal
      – Next: efficient data structure for the renderer based on the ShapeList
  • 11. Renderer Data Structures
      – Program: Vertex Shader, Fragment Shader, and the ParameterDescriptions Camera, Light, Matrices, Material
      – ParameterDescription Material:
          Name       Type      Arraysize
          ambient    vec3      2
          diffuse    vec3      2
          specular   vec3      2
          texture    Sampler   0
      – Shape: Program 'colored', Geometry Shape1, ParameterData Camera1, ParameterData Lightset 1, ParameterData Transform 1, ParameterData red
  • 12. Example Parameter Grouping
      – Parameters (column 'Parameters'):
          – Shader-independent globals, e.g. camera
          – Light, e.g. light sources and shadow maps
          – Shader-dependent globals, e.g. environment map
          – Material raw values, e.g. float, int and bool
          – Material handles, e.g. textures and buffers
          – Object parameters, e.g. position/rotation/scaling
      – Change frequencies named on the slide (column 'Frequency'): always, frequent, constant, rare
  • 13. Rendering structures
      – ShapeList, in arrival order: 'colored', 'textured', 'colored', 'colored', 'textured', ...
      – Group by Program: 'Shapes colored' holds the 'colored' shapes, 'Shapes textured' holds the 'textured' shapes
      – Within each group, sort by ParameterData
  • 14. ParameterData Cache
      – The cache is a big char[] with all ParameterData
      – ParameterData are sorted by first usage
      – Parameters are converted to the target API datatype, e.g.
          – int8 to int32, TextureHandle to bindless texture...
      – Updating parameters is only a playback of data in memory, no conditionals
      – Filter for used parameters to reduce the cache size
      – Example: 'Parameters colored' holds red, blue; 'Parameters textured' holds wood, marble
  • 15. Vertex Attribute Cache
      – Big char[] with vertex attribute pointers
          – Bindless pointers, VBOs or VAB streams
      – Each set of attributes is stored only once
      – Ordered by first usage
      – Attributes required by the program are known
          – Store only used attributes in the cache
          – Useful for special passes like a depth pass, where only pos is required
      – Example: 'Attributes colored' holds (pos, normal) per shape
  • 16. Renderer Cache complete
      – Parameters colored: Phong red, Phong blue
      – Shapes colored: 'colored', 'colored', 'colored'
      – Attributes colored: (pos, normal) per shape
      – Render loop (pseudocode from the slide):
          foreach (shape) {
            if (visible(shape)) {
              if (changed(parameters)) render(parameters);
              if (changed(attributes)) render(attributes);
              render(shape);
            }
          }
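The effect of grouping and sorting (slide 13) on the render loop of slide 16 can be made concrete with a small CPU-only sketch. The types `ShapeEntry` and `Stats` and both function names are hypothetical; integer indices stand in for cache entries, and counters replace the actual API calls.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical ShapeList entry; the indices refer to cache slots.
struct ShapeEntry {
    int program;     // which Program the shape uses
    int parameters;  // slot in the ParameterData cache
    int attributes;  // slot in the vertex attribute cache
    bool visible;
};

struct Stats {
    int parameterUpdates = 0;
    int attributeUpdates = 0;
    int draws = 0;
};

// Group by program, then sort by parameters and attributes.
void sortForRendering(std::vector<ShapeEntry>& shapes) {
    std::stable_sort(shapes.begin(), shapes.end(),
                     [](const ShapeEntry& a, const ShapeEntry& b) {
                         return std::tie(a.program, a.parameters, a.attributes) <
                                std::tie(b.program, b.parameters, b.attributes);
                     });
}

// Mirrors the slide's loop: state is only re-sent when it changed.
Stats renderSketch(const std::vector<ShapeEntry>& shapes) {
    Stats stats;
    int currentParameters = -1;
    int currentAttributes = -1;
    for (const ShapeEntry& s : shapes) {
        assert(s.parameters >= 0 && s.attributes >= 0);
        if (!s.visible) continue;
        if (s.parameters != currentParameters) {
            currentParameters = s.parameters;
            ++stats.parameterUpdates;   // stands in for render(parameters)
        }
        if (s.attributes != currentAttributes) {
            currentAttributes = s.attributes;
            ++stats.attributeUpdates;   // stands in for render(attributes)
        }
        ++stats.draws;                  // stands in for render(shape)
    }
    return stats;
}
```

Running `renderSketch` before and after `sortForRendering` on the same list shows the same number of draws but fewer parameter updates, which is the CPU saving the slides attribute to sorted rendering.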
  • 17. Achievements
      – CPU boundedness improved (application)
          – Recomputation of attributes (transforms)
          – Deep hierarchies: traversal expensive
          – Unsorted rendering, many state changes
      – CPU boundedness remaining (OpenGL usage)
          – Large objects split up into many small pieces, increased draw call count
      [Diagram labels: SceneTree, ShapeList, Renderer, RendererCache, OpenGL implementation]
  • 18. Enabling Hardware Scalability
      – Avoid data redundancy
          – Data stored once, referenced multiple times
          – Update only once (fewer host-to-GPU transfers)
      – Increase batching potential
          – Further cuts API calls
          – Less driver CPU work
      – Minimize CPU/GPU interaction
          – Allow the GPU to update its own data
          – Lower API usage when the scene changes only a little
          – E.g. GPU-based culling
  • 19. OpenGL Research Framework
      – Avoids classic SceneGraph design
      – Geometry
          – Vertex/IndexBuffer
          – BoundingBox
          – Divided into parts (CAD features)
      – Material
      – Node Hierarchy
      – Object
          – Node and Geometry reference
          – For each Geometry part: Material reference, Enabled state
      – Test scene (model courtesy of PTC)
          – 99,000 total parts, 3.8 Mtris, 5.1 Mverts
          – 700 geometries, 128 materials
          – 2,000 objects
  • 20. Performance baseline
      – Kepler Quadro K5000, i7
      – VBO bind and drawcall per part, i.e. 99,000 drawcalls: scene draw time > 38 ms (CPU bound)
      – VBO bind per geometry, drawcalls per part: scene draw time > 14 ms (CPU bound)
      – All subsequent techniques raise performance significantly: scene draw time < 6 ms, 1.8 ms with occlusion culling
  • 21. Drawcall Reduction
      – MultiDraw (1.x)
          – Render ranges from the current VBO/IBO
          – Single drawcall for many distinct objects
          – Reduces overhead for low-complexity objects
      – ARB_draw_indirect (4.x), ARB_multi_draw_indirect
          – Store drawcall information on GPU or HOST
          – Let the GPU create/modify GPU buffers
      – Command layout:
          DrawElementsIndirect {
            GLuint count;
            GLuint instanceCount;
            GLuint firstIndex;
            GLint  baseVertex;
            GLuint baseInstance;
          }
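The indirect command layout from slide 21 can be mirrored on the host as a plain struct. The sketch below fills one command per part on the CPU without calling OpenGL; the `Part` type and `buildCommands` are hypothetical helpers, fixed-width integer types stand in for `GLuint`/`GLint`, and using the part index as `baseInstance` is an assumption in the spirit of the later slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side mirror of the DrawElementsIndirect layout shown on the slide.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // 0 skips the draw, 1 draws once
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // added to each fetched index
    std::uint32_t baseInstance;   // offset for instanced (divisor != 0) attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

// Hypothetical description of one part inside a shared vertex/index buffer.
struct Part {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Build one command per part; the part index doubles as baseInstance so that
// each drawcall can later fetch its own per-draw assignment (assumption).
std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<Part>& parts) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(parts.size());
    for (std::size_t i = 0; i < parts.size(); ++i) {
        cmds.push_back({parts[i].indexCount, 1u, parts[i].firstIndex,
                        parts[i].baseVertex, static_cast<std::uint32_t>(i)});
    }
    return cmds;
}
```

The resulting vector is what would be uploaded into the buffer bound for indirect drawing; the upload and the draw call itself are left out to keep the sketch free of a GL context.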
  • 22. Drawing Techniques
      – All techniques use multidraw capabilities to render across gaps
      – BATCHED: uses a CPU-generated list of combined parts with the same state
          – The object's part cache must be rebuilt based on material/enabled state
      – INDIVIDUAL: stays on per-part level
          – No caches, can update assignment or cmd buffers directly
      [Diagram: parts a, b, c with different materials in a geometry; grouped and "grown" drawcalls a+b, c; single call that encodes the material/matrix assignment via vertex attribute]
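The "grown" drawcalls of the BATCHED technique can be illustrated with a small merge pass. This is a sketch under assumptions: `PartRange`, `Batch` and `growBatches` are hypothetical names, parts are assumed to be sorted by `firstIndex`, and only directly adjacent ranges with the same material are merged (ranges separated by a gap stay separate entries, which a multidraw call can still submit together).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-part record inside one geometry's index buffer.
struct PartRange {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    int material;
    bool enabled;
};

struct Batch {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    int material;
};

// Merge adjacent enabled parts that share a material into one larger range.
// Assumes 'parts' is sorted by firstIndex.
std::vector<Batch> growBatches(const std::vector<PartRange>& parts) {
    std::vector<Batch> batches;
    for (const PartRange& p : parts) {
        if (!p.enabled) continue;  // disabled parts leave a gap
        if (!batches.empty()) {
            Batch& last = batches.back();
            bool contiguous = last.firstIndex + last.indexCount == p.firstIndex;
            if (contiguous && last.material == p.material) {
                last.indexCount += p.indexCount;  // grow the previous drawcall
                continue;
            }
        }
        batches.push_back({p.firstIndex, p.indexCount, p.material});
    }
    return batches;
}
```

This list is the cache that has to be rebuilt whenever a part's material or enabled state changes, which is the cost the slide contrasts with the INDIVIDUAL technique.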
  • 23. Parameters
      – Group parameters by frequency of change
      – Generating shader strings allows different storage backends for "uniforms"
          – OpenGL 2 uniforms
          – OpenGL 3, 4 buffers
          – NVIDIA bindless technology...
      – Effect description from the slide:
          Effect "Phong" {
            Group "material" (many) {
              vec4 "ambient"
              vec4 "diffuse"
              vec4 "specular"
            }
            Group "view" (few) {
              vec4 "viewProjTM"
            }
            Group "object" (many) {
              mat4 "worldTM"
            }
            ... Code ...
          }
  • 24. Parameters: GL2 approach
      – Avoid many small uniforms
      – Arrays of uniforms, grouped by frequency of update, tightly packed
      – Shader code from the slide:
          uniform mat4 worldMatrices[2];
          uniform vec4 materialData[8];
          #define matrix_world      worldMatrices[0]
          #define matrix_worldIT    worldMatrices[1]
          #define material_diffuse  materialData[0]
          #define material_emissive materialData[1]
          #define material_gloss    materialData[2].x
          // GL3 can use floatBitsToInt and friends
          // for free reinterpret casts within macros
          ...
          wPos = matrix_world * oPos;
          ...
          // in fragment shader
          color = material_diffuse + material_emissive;
          ...
  • 25. Parameters: GL4 approach
      – TextureBufferObject (TBO) for matrices
      – UniformBufferObject (UBO) with array data to save costly binds
      – Assignment indices passed as vertex attribute
      – Shader code from the slide:
          in vec4 oPos;
          uniform samplerBuffer matrixBuffer;
          uniform materialBuffer {
            Material materials[512];
          };
          in ivec2 vAssigns;
          flat out ivec2 fAssigns;
          // in vertex shader
          fAssigns = vAssigns;
          worldTM = getMatrix (matrixBuffer, vAssigns.x);
          wPos = worldTM * oPos;
          ...
          // in fragment shader
          color = materials[fAssigns.y].color;
          ...
  • 26. OpenGL 4.x approach
          setupSceneMatrixAndMaterialBuffer (scene);
          foreach (obj in scene) {
            if ( isVisible(obj) ) {
              setupDrawGeometryVertexBuffer (obj);
              // iterate over different materials used
              foreach ( batch in obj.materialCaches ) {
                glVertexAttribI2i (indexAttr, batch.materialIndex, matrixIndex);
                glMultiDrawElements (GL_TRIANGLES, batch.counts, GL_UNSIGNED_INT,
                                     batch.offsets, batch.numUsed);
              }
            }
          }
  • 27. Per-drawcall vertex attribute
      – glVertexAttribDivisor == 0 : VArray[ gl_VertexID + baseVertex ]
      – glVertexAttribDivisor != 0 : VArray[ gl_InstanceID / VDivisor + baseInstance ]
      – Position & Normal VertexBuffer uses divisor 0; Material & Matrix Index VertexBuffer uses divisor 1
      – MultiDrawIndirect buffer in the example: first command { ..., instanceCount = 1, baseInstance = 0, ... }, second command { ..., instanceCount = 1, baseInstance = 1, ... }
      – Vertex attributes fetched for the last vertex in the second drawcall (baseInstance = 1): VArray[ 0 / 1 + baseInstance ]
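The two fetch rules on slide 27 can be written as a small pure function, which makes it easy to see why `baseInstance` acts as a per-drawcall index when `instanceCount` is 1. The function name and parameter names are hypothetical; the formulas follow the slide as written.

```cpp
#include <cassert>
#include <cstdint>

// Index used to fetch a vertex attribute, following the slide's two rules.
// divisor == 0: per-vertex attribute, indexed by vertex ID plus baseVertex.
// divisor != 0: per-instance attribute, indexed by instance ID / divisor plus baseInstance.
std::uint32_t attributeFetchIndex(std::uint32_t divisor,
                                  std::uint32_t vertexId,
                                  std::int32_t baseVertex,
                                  std::uint32_t instanceId,
                                  std::uint32_t baseInstance) {
    if (divisor == 0) {
        return static_cast<std::uint32_t>(static_cast<std::int64_t>(vertexId) + baseVertex);
    }
    return instanceId / divisor + baseInstance;
}
```

With a single instance per command, `instanceId` is always 0, so a divisor-1 attribute is fetched from slot `baseInstance`: every vertex of the drawcall sees the same material and matrix index, and each command in the indirect buffer can select a different one.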
  • 28. OpenGL 4.2+ indirect approach
          ...
          foreach ( obj in scene.objects ) {
            ...
            // instead of glVertexAttribI2i calls and a loop
            // we use the baseInstance for the attribute
            // bind special assignment buffer as vertex attribute
            glBindBuffer ( GL_ARRAY_BUFFER, obj->assignBuffer );
            glVertexAttribIPointer ( indexAttr, 2, GL_INT, . . . );
            // draw everything in one go
            glMultiDrawElementsIndirect ( GL_TRIANGLES, GL_UNSIGNED_INT,
                                          obj->indirectOffset, obj->numIndirects, 0 );
          }
  • 29. Vertex Setup
      – ARB_vertex_attrib_binding (VAB)
          – Avoids many buffer changes
          – Separates format from data
          – Bind multiple vertex attributes to one buffer
      – NV_vertex_buffer_unified_memory (VBUM)
          – Allows very fast switching through GPU pointers
      – Code from the slide:
          /* setup once, similar to glVertexAttribPointer
             but with relative offset last */
          glVertexAttribFormat (ATTR_NORMAL, 3, GL_FLOAT, GL_TRUE, offsetof(Vertex,normal));
          glVertexAttribFormat (ATTR_POS, 3, GL_FLOAT, GL_FALSE, offsetof(Vertex,pos));
          // bind to stream
          glVertexAttribBinding (ATTR_NORMAL, 0);
          glVertexAttribBinding (ATTR_POS, 0);
          // switch single stream buffer
          glBindVertexBuffer (0, bufID, 0, sizeof(Vertex));
          // NV_vertex_buffer_unified_memory
          // enable once and set stride
          glEnableClientState (GL_VERTEX...NV);...
          glBindVertexBuffer (0, 0, 0, sizeof(Vertex));
          // switch single buffer via pointer
          glBufferAddressRangeNV (GL_VERTEX..., 0, bufADDR, bufSize);
  • 30. Effect on CPU time
      [Chart: CPU time in microseconds [us], axis 0 to 1000, lower is better; bars for VBO, VAB, VAB+VBUM, BINDLESS INDIRECT HOST (VAB+VBUM), BINDLESS INDIRECT GPU (VAB+VBUM); ~2400 drawcalls, GL4 BATCHED style]
      – NV_bindless_multidraw_indirect
          – Vertex/index setup inside the MultiDrawIndirect command
          – One GL call to draw the entire scene
          – GPU benefit depends on triangles per drawcall (> ~500)
  • 31. [Chart: GPU and CPU time in microseconds [us], axis 0 to 6000, lower is better; K and KB bars for GL4 INDIRECT HOST INDIVIDUAL, GL4 INDIRECT GPU INDIVIDUAL, GL4 BATCHED, GL2 BATCHED]
      – K = Kepler 5000, regular VBO; KB = Kepler 5000, VBUM + VAB
      – hw drawcalls: 99,000 / 2,400; sw drawcalls: 2,000 / 2,400
      – Bindless (green) always reduces CPU time, and may help framerate/GPU a bit
  • 32. [Chart: same measurements as on the previous slide; GPU and CPU time in microseconds [us], axis 0 to 6000, lower is better]
      – K = Kepler 5000, regular VBO; KB = Kepler 5000, VBUM + VAB
      – hw drawcalls: 99,000 / 2,400; sw drawcalls: 2,000 / 2,400
      – MultiDrawIndirect achieves almost 20 million drawcalls per second (2000 VBO changes, "only" 1/3 of performance lost)
      – GPU-buffered commands save lots of CPU time
      – Scene-dependent! INDIVIDUAL could be as fast if there is enough work per drawcall
  • 33. [Chart: same measurements as on the previous slides; GPU and CPU time in microseconds [us], axis 0 to 6000, lower is better]
      – K = Kepler 5000, regular VBO; KB = Kepler 5000, VBUM + VAB
      – hw drawcalls: 99,000 / 2,400; sw drawcalls: 2,000 / 2,400
      – GL2 uniforms beat the paletted UBO a bit on the GPU, but are slower on the CPU side (1 glUniform call with 8x vec4, vs indexed UBO)
      – Scene-dependent! GL4 is better when more materials change per object
  • 34. Recap
      – Share geometry buffers for batching
      – Group parameters for fast updating
      – MultiDraw/Indirect for keeping objects independent or removing additional loops
          – baseInstance to provide unique index/assignments for a drawcall
      – Bindless to reduce validation overhead and add flexibility
  • 35. GPU Culling Basics
      – GPU-friendly processing
          – Matrix and bbox buffer, object buffer
          – XFB/Compute or "invisible" rendering
          – Vs. old techniques: single GPU job for ALL objects!
      – Results, e.g. a visibility list 0,1,0,1,1,1,0,0,0
          – "Readback" GPU to host: can use the GPU to pack into a bit stream
          – "Indirect" GPU to GPU: set DrawIndirect's instanceCount to 0 or 1
      – Shader code from the slide:
          buffer cmdBuffer{
            Command cmds[];
          };
          ...
          cmds[obj].instanceCount = visible;
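Both result paths on slide 35 can be emulated on the CPU to show what the GPU job writes. This is only an illustration of the data flow, with hypothetical names (`Command`, `applyVisibility`, `packBits`); in the talk this work runs on the GPU via transform feedback, compute, or "invisible" rendering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order as the DrawElementsIndirect command.
struct Command {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// "Indirect" path: one command per object, instanceCount becomes 0 or 1.
void applyVisibility(const std::vector<int>& visible, std::vector<Command>& cmds) {
    assert(visible.size() == cmds.size());
    for (std::size_t obj = 0; obj < cmds.size(); ++obj) {
        cmds[obj].instanceCount = visible[obj] ? 1u : 0u;
    }
}

// "Readback" path: pack the visibility list into 32-bit words, one bit per object.
std::vector<std::uint32_t> packBits(const std::vector<int>& visible) {
    std::vector<std::uint32_t> words((visible.size() + 31) / 32, 0u);
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (visible[i]) words[i / 32] |= 1u << (i % 32);
    }
    return words;
}
```

In the indirect path the command buffer never leaves the GPU, so the CPU submits the same draw every frame and culled objects simply execute as zero-instance commands.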
  • 36. Occlusion Culling
      – OpenGL 4.2+
          – Depth pass
          – Raster "invisible" bounding boxes
              – Disable color/depth writes
              – Geometry shader to create the three visible box sides
              – Depth buffer discards occluded fragments (earlyZ...)
              – Fragment shader writes output: visible[objindex] = 1
      – Passing bbox fragments enable the object
      – Algorithm by Evgeny Makarov, NVIDIA
      – Shader code from the slide:
          // GLSL fragment shader
          // from ARB_shader_image_load_store
          layout(early_fragment_tests) in;
          buffer visibilityBuffer{
            int visibility[];
          };
          flat in int objID;
          void main(){
            visibility[objID] = 1;
          }
          // buffer would have been cleared
          // to 0 before
  • 37. Temporal Coherence
      – Exploit that the majority of objects don't change much relative to the camera
      – Draw each object only once (vertex/drawcall-bound)
          – Render last visible, fully shaded: (last)
          – Test all bboxes against the current depth: (visible)
          – Render newly added visible: (~last) & (visible); none if no spatial changes were made
          – (last) = (visible)
      [Diagram: frame f – 1 and frame f after the camera moved; last visible objects, occluded bboxes, bboxes that pass the depth test (visible), new visible, invisible]
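The per-frame bookkeeping of the temporal coherence scheme can be sketched with two visibility lists. This models only the set logic on the CPU; the names `FrameResult` and `temporalCoherenceFrame` are hypothetical, and the `visible` input stands in for the result of the GPU bounding-box test against the depth buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FrameResult {
    std::vector<bool> drawnFirst;  // objects drawn in the "last visible" pass
    std::vector<bool> drawnNew;    // objects drawn in the "newly visible" pass
};

// One frame of the scheme:
//   1. draw (last), 2. test all bboxes -> (visible),
//   3. draw (~last) & (visible), 4. (last) = (visible).
FrameResult temporalCoherenceFrame(std::vector<bool>& last,
                                   const std::vector<bool>& visible) {
    assert(last.size() == visible.size());
    FrameResult result;
    result.drawnFirst = last;
    result.drawnNew.resize(last.size());
    for (std::size_t i = 0; i < last.size(); ++i) {
        result.drawnNew[i] = !last[i] && visible[i];
    }
    last = visible;  // becomes the first pass of the next frame
    return result;
}
```

No object is drawn in both passes, and when nothing moves the second pass is empty, which matches the slide's note that no newly visible objects are drawn if no spatial changes were made.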
  • 38. Culling Readback vs Indirect
      [Chart: GPU and CPU time in microseconds [us], axis 0 to 2500, lower is better; bars for readback, indirect, NVindirect; GL4 BATCHED style]
      – Faster with culling, in slide order: readback 37%, indirect 33%, NVindirect 37%
      – For readback results, the CPU has to wait for the GPU to become idle
      – In the "draw new visible" phase, indirect cannot benefit from "nothing to setup/draw" in advance and still processes "empty" lists
      – NV_bindless_multidraw_indirect saves CPU time and a bit of GPU time
      – Scene-dependent, i.e. triangles per drawcall and number of "invisible" objects
  • 39. Culling Results
      – Temporal culling is very useful for object/vertex-boundedness
          – Can also apply to a Z-pass...
      – Readback vs Indirect
          – The readback variant is "easier" to make faster (no setups...), but it syncs!
          – The NV_bindless_multidraw benefit depends on the scene (VBO changes and primitives per drawcall)
      – Working towards a GPU-autonomous system
          – (NV_bindless)/ARB_multidraw_indirect as mechanism for the GPU creating its own work; research and feature work in progress
  • 40. Thank you!
      – Contact
          – ckubisch@nvidia.com
          – matavenrath@nvidia.com
      glFinish();
  • 41. NVIDIA Bindless Technology
      – Family of extensions to use native handles/addresses
      – NV_vertex_buffer_unified_memory
      – NV_bindless_multidraw_indirect
      – NV_shader_buffer_load/store
          – Pointers in GLSL
      – NV_bindless_texture
          – No more unit restrictions
          – References inside buffers
      – Code from the slide:
          // GLSL with true pointers
          uniform MyStruct* mystructs;
          // API
          glUniformui64NV (bufferLocation, bufferADDR);
          texHDL = glGetTextureHandleNV (tex);
          // later instead of glBindTexture
          glUniformHandleui64NV (texLocation, texHDL);
          // GLSL
          // can also store textures in resources
          uniform materialBuffer {
            sampler2D manyTextures [LARGE];
          }
  • 42. Culling Readback vs Indirect
      [Chart: time in microseconds, axis 0 to 2500; GPU and CPU bars for Readback, Indirect, NVIndirect, split into the stages 1. Frustum Cull, 2. Update Internals, 3. Draw Last Visible, 4. Occlusion Cull, 5. Update Internals, 6. Draw New Visible]
      – Frame rates, in slide order: 432 fps with culling vs 315 without; 387 fps with culling vs 289 without; 429 fps with culling vs 313 without
      – For readback results, the CPU has to wait for the GPU to become idle
      – Indirect: nothing "new" to draw, but the CPU doesn't know and still sets things up; the GPU runs through an "empty" cmd buffer
      – The special bindless indirect version can save lots of CPU and a bit of GPU cost for drawing the scene with a single big cmd buffer