2. ⶠCurrent working title
ⶠSci-Fi third person action RPG
ⶠExoskeletons!
ⶠIn-house engine "Fledge"
ⶠMultiplatform
ⶠPC (D3D11)
ⶠXbox One
ⶠPS4
Deck13 - The Surge
3. What this talk is NOT about:
ⶠNovel rendering technique
ⶠAccurate physically based approach
ⶠHeavy math formulas
Disclaimer
4. What this talk IS about:
ⶠWhat worked for us
ⶠHow we approached the problem
ⶠShare ideas that can be used for other techniques
Disclaimer
10. ⶠPerformance
ⶠNot that much frame time to spare
ⶠMaximum budget allowed < 2 ms in worst case
ⶠStill other features to implement
ⶠParticularly true on Xbox One platform
ⶠQuality
ⶠPlausible BRDF match with our IBL
ⶠContact hardening reflections
ⶠAmbient specular occlusion approximation
ⶠNo aggressive masking based on roughness
Overview - What we wanted
11. ⶠCompute reflection vector from view direction
ⶠUse GBuffer normals
ⶠRay marching against depth buffer
ⶠIterate until ray "intersects" the depth buffer
ⶠUse hit coordinate to resolve reflection color
ⶠReproject from previous frame
Overview - Screen Space Reflections
Hit Point
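The steps on this slide can be sketched on the CPU as follows. This is a minimal illustration, not the shader from the talk: the function names, the screen-space (x, y, depth) ray representation, and the "ray depth at or behind stored depth" hit test are all assumptions made for the example.

```python
import math

def reflect(view, normal):
    """Reflect the incident direction `view` about the unit `normal`: R = V - 2 (N.V) N."""
    d = sum(v * n for v, n in zip(view, normal))
    return tuple(v - 2.0 * d * n for v, n in zip(view, normal))

def ray_march(depth, origin, step, max_steps):
    """Naive linear march in (x, y, depth) screen space.

    `depth` is a 2D list indexed as depth[y][x]. Returns the first texel
    where the ray is at or behind the stored depth, or None if the ray
    leaves the screen or runs out of steps (assumed hit test).
    """
    x, y, z = origin
    dx, dy, dz = step
    height, width = len(depth), len(depth[0])
    for _ in range(max_steps):
        x, y, z = x + dx, y + dy, z + dz
        px, py = math.floor(x), math.floor(y)
        if not (0 <= px < width and 0 <= py < height):
            return None  # screen space data ends here
        if z >= depth[py][px]:
            return (px, py)  # hit coordinate used to fetch the reflection color
    return None
```

The returned texel coordinate plays the role of the hit point: it is where the reflection color would be looked up, for example in the reprojected previous frame.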
13. ⶠTile classification
ⶠRay Marching
ⶠConvolve Scene
ⶠResolve Reflections
ⶠDeinterleave and Reproject
ⶠAsync Compute
Rendering - Overview
14. Rendering - Tile Classification
ⶠSome texels are not contributing
ⶠOther texels might require extra marching steps
ⶠDivide screen in 16x16 texel tiles
ⶠFast ray march
ⶠSparse ray distribution [Wronski14]
ⶠ64 rays in 16x16 texel
ⶠNon-uniform jittered
ⶠDifferent each frame to maximize coverage
ⶠEstimate tile ray hit variance
ⶠDiscard non contributing tiles
ⶠProduce GPU job queue
ⶠEncode tile data into uint32
ⶠAppend to GPU job queue
ⶠConsume later on with DispatchIndirect
GPU job queue: (0, 4) (0, 5) (0, 6) (0, 8) ...
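A CPU-side sketch of the job queue idea follows. The 16/16-bit packing layout, the hit-count threshold used to keep a tile, and all function names are assumptions for illustration; the talk only states that tile data is encoded into a uint32, appended to a queue, and consumed with DispatchIndirect.

```python
TILE_SIZE = 16  # tile edge in texels, as on the slide

def encode_tile(tile_x, tile_y):
    """Pack tile coordinates into one 32-bit value (assumed layout: y in the high 16 bits)."""
    assert 0 <= tile_x < 0x10000 and 0 <= tile_y < 0x10000
    return (tile_y << 16) | tile_x

def decode_tile(packed):
    """Inverse of encode_tile; returns (tile_x, tile_y)."""
    return packed & 0xFFFF, packed >> 16

def build_job_queue(hit_counts, min_hits=1):
    """Keep only tiles whose probe rays hit something.

    hit_counts[ty][tx] is the number of probe rays that hit in that tile.
    On the GPU the append would be an atomic counter increment, so the
    queue order would not be deterministic as it is here.
    """
    queue = []
    for ty, row in enumerate(hit_counts):
        for tx, hits in enumerate(row):
            if hits >= min_hits:
                queue.append(encode_tile(tx, ty))
    return queue

def indirect_args(queue):
    """Thread group counts for an indirect dispatch: one group per queued tile."""
    return (len(queue), 1, 1)
```

With one thread group per queued tile, later passes only launch work for tiles that survived classification.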
15. ⶠNaive approach is simple but it is also slow
ⶠHi-Z is sexy but might have too much overhead
ⶠDepth sample distribution is a serious thing [McGuire14]
ⶠDon't forget you're bound to screen space data
ⶠWhat about depth thickness?
ⶠAnd sampling coherency?
ⶠWhat else?
ⶠ(ノಠ益ಠ)ノ彡┻━┻
Rendering - Ray Marching Overview
16. ⶠRay march at lower resolution (720p, 900p)
ⶠInterleaved rendering
ⶠEven/Odd checkerboard pattern [El Mansouri16]
ⶠSuccessive passes work with interleaved data
ⶠUse low resolution depth buffer
ⶠLess bandwidth, better cache usage
ⶠNo big impact on quality
ⶠImportance sampling (GGX distributed rays)
ⶠFixed ray step count
ⶠLine segment intersection [Valient14][Timonen15]
ⶠJitter ray start time, reduce banding artifacts
ⶠNoise filtered out with temporal reprojection
ⶠProcess 4 depth values at a time to hide VMEM latency (GCN)
ⶠOutput hit coordinate in a R10G10B10A2_UNORM target
Rendering - Ray Marching
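The "GGX distributed rays" bullet can be illustrated with the widely used GGX importance sampling formula. This is a sketch under assumptions: alpha = roughness squared is a common convention that the talk does not confirm, the vectors are in tangent space with the normal along +Z, and the function names are hypothetical.

```python
import math

def sample_ggx_half_vector(u1, u2, roughness):
    """Sample a tangent-space half vector from the GGX distribution.

    u1, u2 are uniform random numbers in [0, 1).
    Assumes alpha = roughness^2.
    """
    alpha = roughness * roughness
    cos_theta = math.sqrt((1.0 - u1) / (1.0 + (alpha * alpha - 1.0) * u1))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * u2
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)

def reflect_about_half(view, half):
    """Ray direction for a view vector pointing away from the surface: L = 2 (V.H) H - V."""
    d = sum(v * h for v, h in zip(view, half))
    return tuple(2.0 * d * h - v for v, h in zip(view, half))
```

A roughness of 0 always returns the surface normal, so the ray degenerates to the mirror direction; higher roughness spreads the rays, and the resulting noise is what the temporal reprojection is expected to filter out.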
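Figure: Checkerboard Pattern. A 4x4 texel block is split into two interleaved sets, traced on alternating frames (Odd Frame, Even Frame).
Full block:
A B C D
E F G H
I J K L
M N O P
First set:
B D
E G
J L
M O
Second set:
A C
F H
I K
N P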
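The split of the A..P block in the checkerboard figure can be reproduced with a parity test on the texel coordinates. Which parity belongs to the even or odd frame is an assumption here, as is the function name.

```python
def interleaved_texels(width, height, frame_index):
    """Full-resolution texels traced this frame.

    Assumed rule: a texel is traced when (x + y + frame parity) is even,
    so consecutive frames cover complementary halves of the screen.
    """
    parity = frame_index & 1
    return [(x, y)
            for y in range(height)
            for x in range(width)
            if (x + y + parity) % 2 == 0]

# Usage: map the selected texels of a 4x4 block back to the figure's letters.
letters = "ABCDEFGHIJKLMNOP"
even_set = "".join(letters[y * 4 + x] for x, y in interleaved_texels(4, 4, 0))
odd_set = "".join(letters[y * 4 + x] for x, y in interleaved_texels(4, 4, 1))
```

Each frame traces half of the texels, and two consecutive frames together cover the full block, which is what lets later passes reconstruct the missing samples from neighbors and history.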
17. Ray Hit Point (Interleaved) Attenuation mask (Interleaved)
18. ⶠBased on "Screen-Space Cone-Traced Reflections" [Uludag14]
ⶠCreate convolved scene buffer mip chain
ⶠUse previous frame buffer
ⶠIncludes reflections
ⶠAccumulate multiple bounces
ⶠ7x7 separable blur in a single dispatch
ⶠDerive cone angle from roughness
ⶠBest fit to match IBL
ⶠAccumulate samples
ⶠUse roughness as weight factor
ⶠOn Consoles
ⶠCompute mip chain on same resource
ⶠAvoid unnecessary copies
ⶠSaves ~0.1 ms
Rendering - Convolve Scene And Resolve Reflections
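The link between roughness, cone angle, and the convolved mip chain can be sketched as below. The talk derives the cone angle from a best fit against the IBL, which is not given, so the linear roughness-to-angle mapping and its 30 degree maximum are purely hypothetical placeholders. The footprint-to-mip rule (log2 of the cone diameter in texels) is likewise a simplification of the cone tracing in [Uludag14].

```python
import math

def cone_half_angle(roughness, max_half_angle=math.radians(30.0)):
    """Hypothetical placeholder mapping: half angle grows linearly with roughness."""
    return min(max(roughness, 0.0), 1.0) * max_half_angle

def mip_for_cone(ray_length_px, half_angle, mip_count):
    """Pick the mip whose texel size roughly matches the cone footprint.

    ray_length_px is the screen-space distance to the hit point in texels.
    """
    diameter = 2.0 * ray_length_px * math.tan(half_angle)
    mip = math.log2(max(diameter, 1.0))  # footprints under one texel use mip 0
    return min(max(mip, 0.0), mip_count - 1)
```

Rougher surfaces and longer rays both widen the footprint, so they read from blurrier mips; a short ray on a rough surface still samples a sharp mip, which is one way to obtain contact hardening.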
22. ⶠDeinterleave samples into LDS (Local Data Share)
ⶠLoad samples into LDS
ⶠExtra samples required to reconstruct neighbor data
ⶠCombine reads with gather
ⶠReconstruct missing samples using neighbors
ⶠTemporal Reprojection
ⶠNeighbor color data already available in LDS ☺
ⶠClamp history with 3x3 neighborhood AABB [Karis14]
ⶠUse reversible tone map operator to reduce fireflies [Karis13]
ⶠLocal Data Share (Grandma's Home Remedy)
ⶠ"Careful With That Axe, Eugene"
ⶠStore separate RGB channels
ⶠPack two color channels into a single slot
Rendering - Deinterleave and Reproject
Loaded Samples into LDS
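The two temporal reprojection bullets can be combined in a short sketch: history is clamped to the bounding box of the neighborhood colors, then blended with the current sample in a reversible tone-mapped space. The Rec. 709 luma weights, the 0.9 history weight, the order of the two steps, and the function names are assumptions; only the neighborhood AABB clamp [Karis14] and the luma-weighted reversible operator [Karis13] come from the slide.

```python
def luma(c):
    """Rec. 709 luma (assumed weights)."""
    r, g, b = c
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def tonemap(c):
    """Reversible operator: c / (1 + luma(c))."""
    w = 1.0 / (1.0 + luma(c))
    return tuple(ch * w for ch in c)

def tonemap_inverse(c):
    """Inverse of tonemap: c / (1 - luma(c)); valid while luma(c) < 1."""
    w = 1.0 / (1.0 - luma(c))
    return tuple(ch * w for ch in c)

def clamp_history(history, neighborhood):
    """Clamp the history color to the per-channel min/max (AABB) of the neighborhood."""
    lo = tuple(min(n[i] for n in neighborhood) for i in range(3))
    hi = tuple(max(n[i] for n in neighborhood) for i in range(3))
    return tuple(min(max(history[i], lo[i]), hi[i]) for i in range(3))

def reproject(current, history, neighborhood, history_weight=0.9):
    """Blend clamped history with the current sample in tone-mapped space."""
    h = tonemap(clamp_history(history, neighborhood))
    c = tonemap(current)
    blended = tuple(history_weight * a + (1.0 - history_weight) * b
                    for a, b in zip(h, c))
    return tonemap_inverse(blended)
```

Blending in tone-mapped space compresses very bright samples before they are averaged, so a single firefly in the current frame shifts the result far less than a linear blend would.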
25. Async Compute - Dependencies
Diagram: passes (Tile Classification, Ray Marching, Convolve Scene, Resolve Reflections, Deinterleave And Reproject) and their inputs (Depth Buffer, Prev Frame Buffer).
Main dependencies:
ⶠDepth Buffer
ⶠAvailable after GBuffer
ⶠPrevious Frame Buffer
ⶠAvailable after scene combine
26. ⶠStart computing data in previous frame directly ☺
ⶠAsync dispatch Convolve Scene right after scene is resolved
ⶠOverlaps mostly SAT and Post Process
ⶠBandwidth intensive, limit occupancy
ⶠAsync dispatch Tile Classification right after GBuffer
ⶠOverlaps Decal Rendering
ⶠHelps filling the holes in the pipeline
ⶠAsync dispatch Ray Marching
ⶠRemaining Passes
ⶠAsync Dispatch while Shadow Rendering
ⶠFind the right balance with Compute Lighting
ⶠDo not use CS if you can use PS instead!
ⶠOn PC D3D11, no async dispatch available
ⶠOn GCN, going through CB cache is generally faster [Persson14]
Async Compute - Dispatch
28. ⶠUsually few depth samples are enough
ⶠLine segment intersection works great!
ⶠThin objects require more samples
ⶠUse hybrid tracing algorithms [Stachowiak15]
ⶠInterleaved rendering is awesome!
ⶠEasy to use with other passes (e.g. SSAO)
ⶠGPU work queues can be useful
ⶠDispatch only required threads
ⶠCan overlap other Compute jobs (Console, D3D12, Vulkan, etc.)
ⶠReality check!
ⶠProblems inherited from screen space data
ⶠExtremely easy to break
ⶠMaybe invest GPU time in something else? [Pettineo11]
Conclusions - What we learnt
29. Conclusions - Performance Table
Pass                         Time
Tile Classification          0.07 ms
Ray Marching                 0.21 ms
Convolve Scene               0.43 ms
Resolve Reflections          0.41 ms
Deinterleave and Reproject   0.27 ms
Total                        1.39 ms
Xbox One, SSR @ 720p (no ESRAM, no Async Compute)
30. References
[El Mansouri16] Jalal El Mansouri, "Rendering Rainbow Six Siege", GDC, 2016
[Stachowiak15] Tomasz Stachowiak, "Stochastic Screen-Space Reflections", SIGGRAPH, 2015
[Timonen15] Ari Silvennoinen and Ville Timonen, "Multi-Scale Global Illumination in Quantum Break", SIGGRAPH, 2015
[McGuire14] Morgan McGuire and Michael Mara, "Efficient GPU Screen-Space Ray Tracing", JCGT, 2014
[Uludag14] Yasin Uludag, "Hi-Z Screen-Space Cone-Traced Reflections", in GPU Pro 5, 2014
[Valient14] Michal Valient, "Reflections and Volumetrics of Killzone: Shadow Fall", SIGGRAPH, 2014
[Karis14] Brian Karis, "High-Quality Temporal Supersampling", SIGGRAPH, 2014
[Wronski14] Bart Wronski, "Assassin's Creed 4: Road to Next-gen Graphics", GDC, 2014
[Persson14] Emil Persson, "Low-Level Shader Optimization for Next-Gen and DX11", GDC, 2014
[Pettineo11] Matt Pettineo, "10 Things that need to die for Next-Gen", https://mynameismjp.wordpress.com/2011/12/06/things-that-need-to-die/
[Karis13] Brian Karis, "Tone Mapping", http://graphicrants.blogspot.de/2013/12/tone-mapping.html