SlideShare a Scribd company logo
1 of 35
HOLY SMOKE!
FASTER PARTICLE RENDERING USING DIRECTCOMPUTE
AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM
GARETH THOMAS
2ND JUNE 2014
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM2
PLAN FOR TODAY
 Simulation Overview
 Collisions
 Sorting
 Tiled Rendering
 Conclusions
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM3
OVERVIEW
Why use the gpu for simulation?
‒Highly parallel workload
‒Free your CPU to do other cool stuff
‒Leverage compute
‒ Take advantage of the Local Data Store (LDS)
‒ Asynchronous compute on some platforms
MOTIVATION
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM4
OVERVIEW
 Emit
 Simulate
 Sort
 Render
‒ Rasterize billboards
‒ Tiled Rendering using DirectCompute
HOW TO BUILD A GPU PARTICLE SYSTEM
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM5
SIMULATION OVERVIEW
HOW THE SIMULATION FITS TOGETHER
Simulate Compute Shader
Update Particles. Add alive ones to Alive List, add dead ones to Dead List
Dead List
Persistent list of particle indices
Alive List
List of alive particle indices. Rebuilt each frame by Simulation
CS
Emit Compute Shader
Reads free indices from dead list. Writes new
particle data into global array
Particle Array
Persistent list of particle indices
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM6
COLLISIONS
 Can no longer use CPU-side physics engine for collisions
 Use depth buffer [Tchou11]
‒ Project particle into screen space and read depth buffer
‒ Project particle into view space
‒ Transform depth buffer value into view space and compare depths
 Generate collision response
‒ Use G-buffer normals
‒ Or take multiple depth samples to reconstruct the normal
A GPU-BASED SOLUTION
view space
P(n)
P(n+1)
thickness
Z
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM7
COLLISIONS
 Only collides against geometry in the depth buffer
 Particles would collide against depth buffer even if they
are behind the geometry
‒ Use a thickness value to assume particles are in free space
behind geometry
 Particles don’t collide when they are off screen
‒ Causes issues when particles that are at rest on the floor have
gone off-screen and have now disappeared
‒ Put particles to sleep in the simulation once they have come to
rest
‒ Use G-buffer to mark parts of the scene that particles can sleep
on (static objects)
 Not Multi-GPU Friendly!
‒ Switch off depth buffer collisions in MGPU mode
PROBLEMS WITH USING THE DEPTH BUFFER
Fallen through world! 
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM8
7 3 6 8 1 4 2 5
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2)
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2)
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM9
2 51 46 87 3
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 2
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 1)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM10
3 7 8 6 1 4 5 2
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 2)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM11
3 6 8 7 5 4 1 2
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 3)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM12
3 6 7 8 5 4 2 1
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 4
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 4)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM13
3 4 2 1 5 6 7 8
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 5)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM14
2 1 3 4 5 6 7 8
for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8
{
for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1
{
// Begin: GPU part of the sort
for each element n
n = selectBitonic(n, n^compareDist);
// End: GPU part of the sort
}
}
BITONIC SORT (PASS 6)
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM15
Sorted Alive List
Vertex Shader
Read Particle Buffer
Geometry Shader
Expand one point to four. Billboard in view space.
Pixel Shader
Texturing and tinting. Depth fade for soft particles.
Particle Pool
RENDERING
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM16
Sorted Alive List
Vertex Shader
Read particle buffer and billboard in view space
Pixel Shader
Texturing and tinting. Depth fade for soft particles.
Particle Pool
Index Buffer
RENDERING
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM17
RENDERING
 The alive particle count is only available on the GPU
‒ Use Indirect API
 DrawInstancedIndirect( GPU-args ) for Geometry Shader billboards
‒ D3DPT_POINTLIST with no VB, IB or IA
‒ VertexId = Particle index
‒ VertexCountPerInstance = NumParticles
‒ InstanceCount = 1
‒ Geometry Shader expands the point into four vertices and a 2 triangle strip per billboard
 Or better still……. DrawIndexedInstancedIndirect( GPU-args )
‒ D3DPT_TRIANGLELIST, use IB
‒ VertexId / 4 = Particle index
‒ VertexId % 4 = Billboard corner index
‒ IndexCountPerInstance = NumParticles * 6
‒ InstanceCount = 1
RASTERIZATION – FOR OLD SCHOOL GPU PARTICLE SYSTEMS 
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM18
RENDERING
 Overdraw from large particles kills game performance!
‒ Get artists to throttle back on the VFX 
 Optimizations
‒ Tightly fit polygons around texture [Persson09]
‒ Render to smaller buffer [Cantlay07]
‒ Sorting issues
‒ Loss of fidelity
PROBLEMS WITH RASTERIZATION 
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM19
TILED RENDERING
 Inspired by Forward+ [Harada12]
‒ Screen-space binning of particles instead of
point lights!
 Use a 32x32 thread group to shade a 32x32
pixel tile in screen space
‒ Cull particles (just like Forward+)
‒ Sort particles
‒ Per pixel/thread
‒ Evaluate colour of each particle
‒ Blend together
‒ Composite back onto scene
OVERVIEW
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM20
TILED RENDERING
1
2
3
[1] [1,2,3] [2,3]
 Divide screen into tiles
 Build index lists of intersecting
particles per tile
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM21
TILED RENDERING
 View space asymmetric frustum
generated per tile
 Use camera’s near plane
 Use camera’s far plane
 Or calculate far plane from depth
buffer
Tile0 Tile1 Tile2 Tile3
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM22
TILED RENDERING
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM23
TILED RENDERING
 numthreads[ 32,32,1]
 Culling 1024 particles in parallel
 Add to LDS index list
 Write out to memory
‒ Particle count
‒ Particle indices
THREAD GROUP VIEW
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM24
TILED RENDERING
TILE COMPLEXITY
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM25
TILED RENDERING
 Cannot sort global list of particles
‒ Because 1024 particles get culled in parallel they get
added to visible list in arbitrary order
 Need to sort particles per-tile
‒ This is a good thing!
‒ Only need to sort a subset of the global list
‒ Sorting particles in single pass in LDS vs main memory
and in multiple passes
PER TILE BITONIC SORT
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM26
TILED RENDERING
 numthreads[ 32, 32, 1 ] 1 thread = 1 pixel in screen space
 Set accumulation colour to float4( 0, 0, 0, 0 )
 For each particle in tile (back to front)
‒ Evaluate particle contribution
‒ UV generation & radius check
‒ Texture lookup
‒ Normal generation and lighting
‒ Manually blend
‒ Colour = ( srcA x srcCol ) + ( invSrcA x destCol )
‒ Alpha = srcA + ( invSrcA x destA )
‒ Write result to screen size UAV
EVALUATING TILE COLOUR
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM27
TILED RENDERING
 numthreads[ 32, 32, 1 ] 1 thread = 1 pixel in screen space
 Set accumulation colour to float4( 0, 0, 0, 0 )
 For each particle in tile (front to back)
‒ Evaluate particle contribution
‒ UV generation & radius check
‒ Texture lookup
‒ Normal generation and lighting
‒ Manually blend [Bavoil08]
‒ Colour = ( invDestA x srcA x srcCol ) + destCol
‒ Alpha = srcA + ( invSrcA x destA )
‒ if ( accumulation alpha > threshold )
accumulation alpha = 1 and bail
‒ Write result to screen size UAV
EVALUATING TILE COLOUR – IMPROVED!!!
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM28
TILED RENDERING
 Bin particles into 8x8 grid
 For each particle
‒ For each bin
‒ Test particle against bin
‒ Add particle if visible
 UAV0 for particle indices (size = 8 x 8 x maxparticles)
‒ Array split into 64 bins using offsets
 UAV1 for storing particle count per bin (size = 8 x 8)
‒ 1 element per bin
‒ Use InterlockedAdd() to bump bin’s counter
COARSE CULLING
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM29
TILED RENDERING
COMPUTE SHADER SETUP
Per-bin particle indices
Per-tile sorted particle indices
Screen space colour buffer
Per-bin frustum planes
Per-tile particle indices and
distances
Particle data (position, radius,
colour etc)
Compute ShadersLDS Shader Output
Updated particle dataSimulation
numthreads[256, 1, 1], 1 thread per particle
Coarse Culling
numthreads[256, 1, 1], 1 thread per particle
Tile Culling and Sorting
numthreads[32, 32, 1], 1 thread per particle
Tile Rendering
numthreads[32, 32, 1], 1 thread per pixel
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM30
mode frame time (ms)*
Rasterization 5.2
Tiled 3.4
*AMD Radeon R9 290X @ 1080p
Breakdown frame time (ms)*
Simulation 0.50
Coarse Culling 0.06
Tile Culling and Sorting 0.37
Tiled Rendering 1.86
PERFORMANCE RESULTS
Default View, ~35K particles
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM31
mode frame time (ms)*
Rasterization 27.3
Tiled 6.2
*AMD Radeon R9 290X @ 1080p
PERFORMANCE RESULTS
In Smoke View, ~35K particles
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM32
CONCLUSIONS
 Depth buffer collisions
‒ Great bang-for-buck
‒ Not perfect!
 Bitonic sort
‒ Good fit for sorting on the GPU
 Tiled Rendering
‒ Faster than rasterization
‒ Great for combatting heavy overdraw
‒ More predictable behaviour
 Future work
‒ Add arbitrary geometry for OIT
‒ Volume tracing
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM33
QUESTIONS?
 Demo with full source coming soon
 http://developer.amd.com/tools/graphics-development/amd-radeon-sdk/
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM34
REFERENCES
 [Tchou11] Chris Tchou, “Halo Reach Effects Tech”, GDC 2011
 [Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266
 [Cantlay07] Iain Cantlay, “High-Speed, Off-Screen Particles”, GPU Gems 3 2007
 [Harada12] Takahiro Harada et al, “Forward+: Bringing Deferred Lighting to the Next Level”, Short Papers,
Eurographics 2012
 [Bavoil08] Louis Bavoil et al, “Order Independent Transparency with Dual Depth Peeling”, 2008
| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM35
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

More Related Content

What's hot

Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbiteElectronic Arts / DICE
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The SurgePhilip Hammer
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017Mark Kilgard
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsSeongdae Kim
 
Checkerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOCCheckerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOCQLOC
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
 
Z Buffer Optimizations
Z Buffer OptimizationsZ Buffer Optimizations
Z Buffer Optimizationspjcozzi
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Johan Andersson
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-renderingmistercteam
 
Screen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space MarineScreen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space MarinePope Kim
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Unity Technologies
 

What's hot (20)

Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
 
Checkerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOCCheckerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOC
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Z Buffer Optimizations
Z Buffer OptimizationsZ Buffer Optimizations
Z Buffer Optimizations
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
Screen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space MarineScreen Space Decals in Warhammer 40,000: Space Marine
Screen Space Decals in Warhammer 40,000: Space Marine
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite ...
 

Similar to Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas

GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauAMD Developer Central
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86Droidcon Berlin
 
PlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge TechniquesPlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge TechniquesSlide_N
 
Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Alexander Dolbilov
 
Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Matthias Trapp
 
Foveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUsFoveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUsTakahiro Harada
 
Efficient Image Processing - Nicolas Roard
Efficient Image Processing - Nicolas RoardEfficient Image Processing - Nicolas Roard
Efficient Image Processing - Nicolas RoardParis Android User Group
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
Example uses of gpu compute models
Example uses of gpu compute modelsExample uses of gpu compute models
Example uses of gpu compute modelsPedram Mazloom
 
Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019Unity Technologies
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 

Similar to Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas (20)

Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
 
PlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge TechniquesPlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge Techniques
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)
 
Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)
 
Praseed Pai
Praseed PaiPraseed Pai
Praseed Pai
 
Foveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUsFoveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUs
 
Efficient Image Processing - Nicolas Roard
Efficient Image Processing - Nicolas RoardEfficient Image Processing - Nicolas Roard
Efficient Image Processing - Nicolas Roard
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
Example uses of gpu compute models
Example uses of gpu compute modelsExample uses of gpu compute models
Example uses of gpu compute models
 
Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 

More from AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas

  • 1. HOLY SMOKE! FASTER PARTICLE RENDERING USING DIRECTCOMPUTE AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM GARETH THOMAS 2ND JUNE 2014
  • 2. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM2 PLAN FOR TODAY  Simulation Overview  Collisions  Sorting  Tiled Rendering  Conclusions
  • 3. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM3 OVERVIEW Why use the gpu for simulation? ‒Highly parallel workload ‒Free your CPU to do other cool stuff ‒Leverage compute ‒ Take advantage of the Local Data Store (LDS) ‒ Asynchronous compute on some platforms MOTIVATION
  • 4. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM4 OVERVIEW  Emit  Simulate  Sort  Render ‒ Rasterize billboards ‒ Tiled Rendering using DirectCompute HOW TO BUILD A GPU PARTICLE SYSTEM
  • 5. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM5 SIMULATION OVERVIEW HOW THE SIMULATION FITS TOGETHER Simulate Compute Shader Update Particles. Add alive ones to Alive List, add dead ones to Dead List Dead List Persistent list of particle indices Alive List List of alive particle indices. Rebuilt each frame by Simulation CS Emit Compute Shader Reads free indices from dead list. Writes new particle data into global array Particle Array Persistent list of particle indices
  • 6. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM6 COLLISIONS  Can no longer use CPU-side physics engine for collisions  Use depth buffer [Tchou11] ‒ Project particle into screen space and read depth buffer ‒ Project particle into view space ‒ Transform depth buffer value into view space and compare depths  Generate collision response ‒ Use G-buffer normals ‒ Or take multiple depth samples to reconstruct the normal A GPU-BASED SOLUTION view space P(n) P(n+1) thickness Z
  • 7. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM7 COLLISIONS  Only collides against geometry in the depth buffer  Particles would collide against depth buffer even if they are behind the geometry ‒ Use a thickness value to assume particles are in free space behind geometry  Particles don’t collide when they are off screen ‒ Causes issues when particles that are at rest on the floor have gone off-screen and have now disappeared ‒ Put particles to sleep in the simulation once they have come to rest ‒ Use G-buffer to mark parts of the scene that particles can sleep on (static objects)  Not Multi-GPU Friendly! ‒ Switch off depth buffer collisions in MGPU mode PROBLEMS WITH USING THE DEPTH BUFFER Fallen through world! 
  • 8. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM8 7 3 6 8 1 4 2 5 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT
  • 9. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM9 2 51 46 87 3 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 2 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 1)
  • 10. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM10 3 7 8 6 1 4 5 2 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 2)
  • 11. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM11 3 6 8 7 5 4 1 2 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 4 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 3)
  • 12. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM12 3 6 7 8 5 4 2 1 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 4 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 4)
  • 13. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM13 3 4 2 1 5 6 7 8 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 2 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 5)
  • 14. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM14 2 1 3 4 5 6 7 8 for( subArraySize=2; subArraySize<ArraySize; subArraySize*=2) // subArraySize == 8 { for( compareDist=subArraySize/2; compareDist>0; compareDist/=2) // compareDist == 1 { // Begin: GPU part of the sort for each element n n = selectBitonic(n, n^compareDist); // End: GPU part of the sort } } BITONIC SORT (PASS 6)
  • 15. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM15 Sorted Alive List Vertex Shader Read Particle Buffer Geometry Shader Expand one point to four. Billboard in view space. Pixel Shader Texturing and tinting. Depth fade for soft particles. Particle Pool RENDERING
  • 16. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM16 Sorted Alive List Vertex Shader Read particle buffer and billboard in view space Pixel Shader Texturing and tinting. Depth fade for soft particles. Particle Pool Index Buffer RENDERING
  • 17. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM17 RENDERING  The alive particle count is only available on the GPU ‒ Use Indirect API  DrawInstancedIndirect( GPU-args ) for Geometry Shader billboards ‒ D3DPT_POINTLIST with no VB, IB or IA ‒ VertexId = Particle index ‒ VertexCountPerInstance = NumParticles ‒ InstanceCount = 1 ‒ Geometry Shader expands the point into four vertices and a 2 triangle strip per billboard  Or better still……. DrawIndexedInstancedIndirect( GPU-args ) ‒ D3DPT_TRIANGLELIST, use IB ‒ VertexId / 4 = Particle index ‒ VertexId % 4 = Billboard corner index ‒ IndexCountPerInstance = NumParticles * 6 ‒ InstanceCount = 1 RASTERIZATION – FOR OLD SCHOOL GPU PARTICLE SYSTEMS 
  • 18. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM18 RENDERING  Overdraw from large particles kills game performance! ‒ Get artists to throttle back on the VFX   Optimizations ‒ Tightly fit polygons around texture [Persson09] ‒ Render to smaller buffer [Cantlay07] ‒ Sorting issues ‒ Loss of fidelity PROBLEMS WITH RASTERIZATION 
  • 19. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM19 TILED RENDERING  Inspired by Forward+ [Harada12] ‒ Screen-space binning of particles instead of point lights!  Use a 32x32 thread group to shade a 32x32 pixel tile in screen space ‒ Cull particles (just like Forward+) ‒ Sort particles ‒ Per pixel/thread ‒ Evaluate colour of each particle ‒ Blend together ‒ Composite back onto scene OVERVIEW
  • 20. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM20 TILED RENDERING 1 2 3 [1] [1,2,3] [2,3]  Divide screen into tiles  Build index lists of intersecting particles per tile
  • 21. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM21 TILED RENDERING  View space asymmetric frustum generated per tile  Use camera’s near plane  Use camera’s far plane  Or calculate far plane from depth buffer Tile0 Tile1 Tile2 Tile3
  • 22. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM22 TILED RENDERING
  • 23. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM23 TILED RENDERING  numthreads[ 32,32,1]  Culling 1024 particles in parallel  Add to LDS index list  Write out to memory ‒ Particle count ‒ Particle indices THREAD GROUP VIEW
  • 24. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM24 TILED RENDERING TILE COMPLEXITY
  • 25. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM25 TILED RENDERING  Cannot sort global list of particles ‒ Because 1024 particles get culled in parallel they get added to visible list in arbitrary order  Need to sort particles per-tile ‒ This is a good thing! ‒ Only need to sort a subset of the global list ‒ Sorting particles in single pass in LDS vs main memory and in multiple passes PER TILE BITONIC SORT
  • 26. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM26 TILED RENDERING  numthreads[ 32, 32, 1 ] 1 thread = 1 pixel in screen space  Set accumulation colour to float4( 0, 0, 0, 0 )  For each particle in tile (back to front) ‒ Evaluate particle contribution ‒ UV generation & radius check ‒ Texture lookup ‒ Normal generation and lighting ‒ Manually blend ‒ Colour = ( srcA x srcCol ) + ( invSrcA x destCol ) ‒ Alpha = srcA + ( invSrcA x destA ) ‒ Write result to screen size UAV EVALUATING TILE COLOUR
  • 27. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM27 TILED RENDERING  numthreads[ 32, 32, 1 ] 1 thread = 1 pixel in screen space  Set accumulation colour to float4( 0, 0, 0, 0 )  For each particle in tile (front to back) ‒ Evaluate particle contribution ‒ UV generation & radius check ‒ Texture lookup ‒ Normal generation and lighting ‒ Manually blend [Bavoil08] ‒ Colour = ( invDestA x srcA x srcCol ) + destCol ‒ Alpha = srcA + ( invSrcA x destA ) ‒ if ( accumulation alpha > threshold ) accumulation alpha = 1 and bail ‒ Write result to screen size UAV EVALUATING TILE COLOUR – IMPROVED!!!
  • 28. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM28 TILED RENDERING  Bin particles into 8x8 grid  For each particle ‒ For each bin ‒ Test particle against bin ‒ Add particle if visible  UAV0 for particle indices (size = 8 x 8 x maxparticles) ‒ Array split into 64 bins using offsets  UAV1 for storing particle count per bin (size = 8 x 8) ‒ 1 element per bin ‒ Use InterlockedAdd() to bump bin’s counter COARSE CULLING
  • 29. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM29 TILED RENDERING COMPUTE SHADER SETUP Per-bin particle indices Per-tile sorted particle indices Screen space colour buffer Per-bin frustum planes Per-tile particle indices and distances Particle data (position, radius, colour etc) Compute ShadersLDS Shader Output Updated particle dataSimulation numthreads[256, 1, 1], 1 thread per particle Coarse Culling numthreads[256, 1, 1], 1 thread per particle Tile Culling and Sorting numthreads[32, 32, 1], 1 thread per particle Tile Rendering numthreads[32, 32, 1], 1 thread per pixel
  • 30. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM30 mode frame time (ms)* Rasterization 5.2 Tiled 3.4 *AMD Radeon R9 290X @ 1080p Breakdown frame time (ms)* Simulation 0.50 Coarse Culling 0.06 Tile Culling and Sorting 0.37 Tiled Rendering 1.86 PERFORMANCE RESULTS Default View, ~35K particles
  • 31. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM31 mode frame time (ms)* Rasterization 27.3 Tiled 6.2 *AMD Radeon R9 290X @ 1080p PERFORMANCE RESULTS In Smoke View, ~35K particles
  • 32. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM32 CONCLUSIONS  Depth buffer collisions ‒ Great bang-for-buck ‒ Not perfect!  Bitonic sort ‒ Good fit for sorting on the GPU  Tiled Rendering ‒ Faster than rasterization ‒ Great for combatting heavy overdraw ‒ More predictable behaviour  Future work ‒ Add arbitrary geometry for OIT ‒ Volume tracing
  • 33. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM33 QUESTIONS?  Demo with full source coming soon  http://developer.amd.com/tools/graphics-development/amd-radeon-sdk/
  • 34. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM34 REFERENCES  [Tchou11] Chris Tchou, “Halo Reach Effects Tech”, GDC 2011  [Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266  [Cantlay07] Iain Cantlay, “High-Speed, Off-Screen Particles”, GPU Gems 3 2007  [Harada12] Takahiro Harada et al, “Forward+: Bringing Deferred Lighting to the Next Level”, Short Papers, Eurographics 2012  [Bavoil08] Louis Bavoil et al, “Order Independent Transparency with Dual Depth Peeling”, 2008
  • 35. | FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM35 DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

Editor's Notes

  1. Add pic