The Rendering Technologies of Crysis 3
Tiago Sousa, R&D Principal Graphics Engineer
Carsten Wenzel, R&D Lead Software Engineer
Chris Raine, R&D Senior Software Engineer
Crytek
Thin G-Buffer 2.0
● For Crysis 3, wanted:
● Minimize redundant drawcalls
● Alpha-blended (AB) details on G-Buffer with proper glossiness
● Tons of vegetation => Deferred translucency
● Multiplatform friendly
Thin G-Buffer 2.0
Channels                                            Format
Depth / AmbID, Decals                               D24S8
N.x / N.y / Gloss, Zsign / Translucency             A8B8G8R8
Albedo Y / Albedo Cb,Cr / Specular Y / Per-Project  A8B8G8R8
Target Image
Depth
RG: Normals
B: Glossiness
A: Translucency
R: Albedo Y
G: Albedo CbCr (interleaved)
B: Specular intensity
G-Buffer Packing
 World space normal packed into 2 components (WIKI00)
 Stereographic projection worked ok in practice (also cheap)
 Glossiness + Normal Z sign packed together
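$$(X, Y) = \left(\frac{x}{1 - z},\ \frac{y}{1 - z}\right)$$
$$(x, y, z) = \left(\frac{2X}{1 + X^2 + Y^2},\ \frac{2Y}{1 + X^2 + Y^2},\ \frac{-1 + X^2 + Y^2}{1 + X^2 + Y^2}\right)$$
$$\mathit{GlossZsign} = (\mathit{Gloss} \cdot \mathit{Zsign}) \cdot 0.5 + 0.5$$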
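A minimal CPU-side sketch of the normal and glossiness packing on this slide, written in Python for readability rather than as shader code. The function names are hypothetical. Projecting the mirrored normal (z' = -|z|) so that the stored Z sign can restore the hemisphere is an assumption about how the sign bit is used, not something the slide states.

```python
def encode_normal_gloss(normal, gloss):
    """Pack a unit normal and gloss in [0, 1] into three channels in [0, 1]."""
    x, y, z = normal
    zsign = 1.0 if z >= 0.0 else -1.0
    # Assumption: project the mirrored normal (z' = -|z|), so 1 - z' >= 1
    # and the projected point (X, Y) stays inside the unit disk.
    denom = 1.0 + abs(z)
    X, Y = x / denom, y / denom
    # Gloss and Z sign share one channel: (Gloss * Zsign) * 0.5 + 0.5.
    return (X * 0.5 + 0.5, Y * 0.5 + 0.5, gloss * zsign * 0.5 + 0.5)


def decode_normal_gloss(packed):
    """Inverse of encode_normal_gloss; returns (normal, gloss)."""
    X = packed[0] * 2.0 - 1.0
    Y = packed[1] * 2.0 - 1.0
    t = packed[2] * 2.0 - 1.0
    zsign = 1.0 if t >= 0.0 else -1.0  # ambiguous when gloss == 0
    d = X * X + Y * Y
    normal = (2.0 * X / (1.0 + d),
              2.0 * Y / (1.0 + d),
              zsign * (1.0 - d) / (1.0 + d))
    return normal, abs(t)
```

Note that a glossiness of exactly 0 loses the sign in this scheme, so a real implementation would presumably clamp gloss to a small minimum.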
G-Buffer Packing (2)
 Albedo in Y’CbCr color space (WIKI01)
 Stored in 2 channels via Chrominance Subsampling (WIKI02)
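$$Y' = 0.299\,R + 0.587\,G + 0.114\,B$$
$$C_B = 0.5 + (-0.168\,R - 0.331\,G + 0.5\,B)$$
$$C_R = 0.5 + (0.5\,R - 0.418\,G - 0.081\,B)$$

$$R = Y' + 1.402\,(C_R - 0.5)$$
$$G = Y' - 0.344\,(C_B - 0.5) - 0.714\,(C_R - 0.5)$$
$$B = Y' + 1.772\,(C_B - 0.5)$$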
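The Y'CbCr conversion used for albedo can be sketched as below, using the coefficients from this slide. `pack_albedo` is a hypothetical helper. Its checkerboard layout (Cb on even pixels, Cr on odd ones) is only an assumption about what "interleaved" means here.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward transform with the coefficients shown on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (-0.168 * r - 0.331 * g + 0.5 * b)
    cr = 0.5 + (0.5 * r - 0.418 * g - 0.081 * b)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform with the coefficients shown on the slide."""
    r = y + 1.402 * (cr - 0.5)
    g = y - 0.344 * (cb - 0.5) - 0.714 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    return r, g, b


def pack_albedo(rgb, px, py):
    """Two-channel albedo: luma always, one chroma component per pixel."""
    y, cb, cr = rgb_to_ycbcr(*rgb)
    # Assumption: checkerboard interleave, Cb on even pixels and Cr on odd ones.
    return (y, cb if (px + py) % 2 == 0 else cr)
```

Because the coefficients are rounded to three decimals, a round trip is only accurate to roughly 0.002 per channel, which is below the precision of an 8-bit target.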
Hybrid Deferred Rendering
 Deferred lighting still processed as usual (SOUSA11)
 L-Buffers now using BW friendlier R11G11B10F formats
 Precision was sufficient, since material properties not applied yet
 Deferred shading composited via fullscreen pass
 For more complex shading such as Hair or Skin, process forward passes
 Allowed us to drop almost all opaque forward passes
 Fewer drawcalls, but G-Buffer passes now have a higher cost
 Fast Double-Z Prepass for some of the closest geometry helps slightly
 Overall a nice win, on all platforms*
Hybrid Deferred Rendering (2)
Deferred (Red) + Forward (Green)
Thin G-Buffer Benefits
 Unified solution across all platforms
 Deferred Rendering for less BW/Memory than vanilla
 Good for MSAA + avoiding tiled rendering on Xbox360
 Tackle glossiness for transparent geometry on G-Buffer
 Alpha blended cases, e.g. Decals, Deferred Decals, Terrain Layers
 Can composite all such cases directly into G-Buffer
 Avoid need for multipass
 Deferred sub-surface scattering
 Visual + performance win, in particular for vegetation rendering
Thin G-Buffer Hindsights
 Why not pack G-Buffer directly?
 Because we need to be able to blend details into G-Buffer
 Would need to decode –> blend –> encode
 Or could blend such cases into separate targets (bad for MSAA/Consoles)
 Programmable blending would have been nice
 Transparent cases can’t use alpha channel for storage*
 sRGB output only for couple channels or all
 Would allow for more interesting and optimal packing schemes
 While at it, stencil write from fragment shader would also be handy
Volumetric Fog Updates
 Density calculation based on fog model established for Crysis 1 (WENZEL06)
 Deferred pass for opaque geometry
 Per-Vertex approximation for transparent geometry
Volumetric Fog Updates
 Little tuning: Artist controllable gradients (via ToD tool)
 Height based: Density and color for specified top and bottom height
 Radial based: Size, color and lobe around sun position
Volumetric Fog Shadows
 Based on TÓTH09: Don’t accumulate in-scattered light, but shadow contribution along view ray instead
Volumetric Fog Shadows (2)
 Interleave pass distributes 1024 shadow samples on an 8x8 grid shared by neighboring pixels
 Half resolution destination target
 Gather pass computes final shadow value
 Bilateral filtering was used to minimize ghosting and halos
 Shadow stored in alpha, 8 bit depth in red channel
 Used 8 taps to compare against center full resolution depth
 Max sample distance configurable (~150-200m in C3 levels)
 Cloud shadow texture baked into final result
 Final result modifies fog height and radial color
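The depth-aware gather described above can be sketched as follows. This is only an illustration: the inverse-depth-difference weight and the `eps` parameter are assumptions, since the slide gives only the tap count and the comparison against the full-resolution center depth.

```python
def bilateral_upscale(center_depth, taps, eps=1e-3):
    """Blend half-resolution shadow taps, favoring taps at a similar depth.

    center_depth: full-resolution depth of the pixel being shaded.
    taps: iterable of (shadow, depth) pairs read from the half-res target.
    """
    total_weight = 0.0
    total_shadow = 0.0
    for shadow, depth in taps:
        # Taps whose depth differs from the center contribute very little,
        # which suppresses halos across depth discontinuities.
        weight = 1.0 / (eps + abs(center_depth - depth))
        total_weight += weight
        total_shadow += shadow * weight
    return total_shadow / total_weight
```

With taps at equal depth this degenerates to a plain average. A tap across a depth edge is almost ignored.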
[Figure: Naive Upscale vs. Bilateral Upscale]
Silhouette POM
 Alternative to tessellation based displacement mapping
 Looked into various approaches, most weren’t practical for production
 Current implementation is based on principle of barycentric correspondence (JESCHKE07)
Silhouette POM: Steps
 Transform vertices and extrude - VS
 Generate prisms (do not split into tetrahedra) and set up clip planes - GS
 Generally prism sides are bilinear patches; we approximate them by a conservative plane
 Note to IHVs: Emitting per-triangle constants would be nice!
 In theory, on DX11.1, we could emit via UAV output?
 Ray marching - PS
 Compute intersection of view ray with prism in WS, translate to texture space via barycentric correspondence (JESCHKE07)
 Use resulting texture uv and height for entry and exit to trace height field
 Compute final uv and selectively discard pixel (viewer below height map; view ray leaving prism before hitting terrain)
 Lots of pressure on PS, yet GS is the bottleneck (prism gen)
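The height field trace in the ray marching step can be sketched as a linear search between the texture-space entry and exit points. The function name, the fixed step count, and the `height_at` callback are hypothetical. The prism intersection and the barycentric mapping that produce the entry and exit points are not shown.

```python
def trace_height_field(entry, exit_, height_at, steps=32):
    """Linear search along the ray in texture space.

    entry, exit_: (u, v, h) where the view ray enters and leaves the prism.
    height_at: callable (u, v) -> height, standing in for the height map fetch.
    Returns the hit (u, v), or None when the pixel should be discarded.
    """
    for i in range(steps + 1):
        t = i / steps
        u = entry[0] + (exit_[0] - entry[0]) * t
        v = entry[1] + (exit_[1] - entry[1]) * t
        h = entry[2] + (exit_[2] - entry[2]) * t
        if h <= height_at(u, v):
            return (u, v)
    return None  # ray left the prism without hitting the surface
```

A production shader would likely refine the hit, for example with a secant or binary search step. This sketch stops at the first sample at or below the surface.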
Massive Grass
Massive Grass: Simulation
 Grass blade instance:
 A chain of points held together by constraints
 Distance + bending constraints try to maintain the local-space rest pose angle per particle
 Physics collision geometry converted into small sphere set
 Collisions handled as plane constraints
 No stable collision handling: overdamp the instance
 Applied to vegetation meshes via software-skinning
 Exposed parameters per group:
 Stiffness, damping, wind force factor, random variance
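A minimal 2D sketch of a blade as a chain of points held together by distance constraints, using damped Verlet integration. Bending constraints, collisions and the per-group parameters are left out. The function name, the integration scheme and the constraint weighting are assumptions for illustration, not the shipped solver.

```python
import math


def step_blade(pos, prev, rest_len, accel, dt, damping=0.9, iterations=8):
    """Advance one blade (list of (x, y) points, root first) by one time step."""
    new = [pos[0]]  # the root stays pinned to the ground
    for (px, py), (qx, qy) in zip(pos[1:], prev[1:]):
        # Damped Verlet integration: velocity is implied by the previous position.
        new.append((px + (px - qx) * damping + accel[0] * dt * dt,
                    py + (py - qy) * damping + accel[1] * dt * dt))
    for _ in range(iterations):
        for i in range(1, len(new)):
            (ax, ay), (bx, by) = new[i - 1], new[i]
            dx, dy = bx - ax, by - ay
            dist = math.hypot(dx, dy) or 1e-9
            corr = (dist - rest_len) / dist
            w_a = 0.0 if i == 1 else 0.5  # never move the pinned root
            w_b = 1.0 - w_a
            new[i - 1] = (ax + dx * corr * w_a, ay + dy * corr * w_a)
            new[i] = (bx - dx * corr * w_b, by - dy * corr * w_b)
    return new, pos  # pos becomes the previous state for the next step
```

Usage: start with `prev` equal to `pos`, then call `pos, prev = step_blade(pos, prev, rest_len, wind, dt)` once per simulated frame.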
Massive Grass: Mesh Merging
 One patch results in N-Meshes
 N is number of materials used
 Instances grouped into 16x16x16 meter patches (yes, volumetric)
 Typical Numbers:
 50k – 70k visible instances on consoles. PC > 100k
 Instances have 18 to 3.6k vertices depending on mesh complexity
 Closest instances simulated every frame
 Based on distance: simulation and time sliced skinning
 Instances removed further away
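Grouping instances into 16x16x16 meter volumetric patches can be sketched as a simple spatial hash. The function names are hypothetical, and the per-material split into N meshes is not shown.

```python
import math
from collections import defaultdict

PATCH_SIZE = 16.0  # meters per patch edge, from the slide


def patch_key(position):
    """Integer 3D cell that contains a world-space position."""
    return tuple(int(math.floor(c / PATCH_SIZE)) for c in position)


def group_into_patches(positions):
    """Map each patch cell to the indices of the instances inside it."""
    patches = defaultdict(list)
    for index, position in enumerate(positions):
        patches[patch_key(position)].append(index)
    return dict(patches)
```

Using `floor` rather than truncation keeps negative coordinates in the correct cell.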
Massive Grass: Update Loop
 Culling process (for each visible patch):
 Mark visible instances
 Compute LOD
 Check if instance should be skipped in distance
 After culling:
 Allocate (from pool) dynamic VB/IB memory for each patch
 Sample force fields into per-patch buffer (coarse discretization 4x4x4)
 Sample physics for potential colliders, extract collider geometry
 Dispatch sim & skin jobs for each patch
Massive Grass: Challenges
 Efficient buffer management
 Resulting meshes can vary in size per frame
 Naive implementation (Crysis 2) resulted in bad performance on PC and out-of-VRAM conditions on consoles due to fragmentation
 Current implementation inspired by “Don’t Throw it all Away” (McDONALD12)
 Large pools for dynamic IB/VB
 Each maintains two free lists (usable and pending)
 Each item in pending list is moved to main free list as soon as GPU query guarantees GPU is done with pool
 Consoles: 1.3 MB of main memory; PC: 16 MB
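The two-free-list scheme can be sketched as below. The class and method names are hypothetical. A monotonically increasing fence value stands in for the GPU query, which is an assumption about how completion is tracked.

```python
class DynamicBufferPool:
    """Fixed-size block pool with a usable and a pending free list."""

    def __init__(self, block_count):
        self.free = list(range(block_count))  # blocks safe to hand out
        self.pending = []                     # (block, fence) still in flight

    def allocate(self):
        """Return a free block index, or None when the pool is exhausted."""
        return self.free.pop() if self.free else None

    def release(self, block, fence):
        """Queue a block; it is reusable once the GPU has passed `fence`."""
        self.pending.append((block, fence))

    def on_gpu_progress(self, completed_fence):
        """Move every block the GPU has finished with to the usable list."""
        still_pending = []
        for block, fence in self.pending:
            if fence <= completed_fence:
                self.free.append(block)
            else:
                still_pending.append((block, fence))
        self.pending = still_pending
```

The point of the pending list is that a released block is never overwritten while a draw call that reads it may still be executing.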
Massive Grass: Challenges (2)
 Efficient scheduling:
 Patch instances are divided into small groups
 Sim job kicked off for each group in main thread
 DP in render thread has blocking wait for sim job
 Job considered low-priority
 Important:
 Avoid unnecessary copies, skin directly to final destination
 Reduce throughput and memory requirements (used half & fixed point precision everywhere)
 PC: ~15 ms, 300 to 600 jobs in worst-case scenarios
 Xbox360: ~16 ms, 800 jobs; PS3: ~10 ms, 100-400 jobs
Massive Grass: Challenges (3)
 Alpha tested geometry, literally everywhere
 Massive overdraw, also troublesome for MSAA
 Literally worst case scenario for RSX due to poor z-cull
 Prototyped alternatives (e.g. geometry based)
 Art was not happy with these unfortunately
 End solution: keep it simple
 G-Buffer stage minimalistic
 Consoles: Mostly outputting vertex data
 Art side surface coverage minimization
Anti-aliasing
 Subjective topic: Sharp VS Blurry
 Some PC gamers hate blurry, some hate sharp.
 Some even love 800x600 and no AA
DX11 Deferred MSAA: 101
 The problem:
 Multiple passes and reading/writing from Multisampled Render Targets
 SV_SampleIndex / SV_Coverage system value semantics allow solving this via multipass for pixel/sample frequency passes (THIBIEROZ08)
 SV_SampleIndex
 Forces pixel shader execution for each sub-sample
 SV_SampleIndex provides index of the sub-sample currently executed
 Index can be used to fetch sub-sample from your Multisampled RT
 E.g. FooMS.Load( UnnormScreenCoord, nCurrSample)
 SV_Coverage
 Indicates to pixel shader which sub-samples covered during raster stage
 Can also modify sub-sample coverage for custom coverage mask
DX11 Deferred MSAA
 Foundation for almost all our supported AA techniques
 Simple theory => troublesome practice
 At least with fairly complex and deferred based engines
 Disclaimer:
 Non-MSAA friendly code accumulates fast
 Breaks regularly as new techniques added with no care for MSAA
 Pinpoint non-MSAA friendly techniques, and update them one by one.
 Rinse and repeat and you’ll get there eventually.
 Will be enforced by default on our future engine versions
Custom Resolve & Per-Sample Mask
 Post G-Buffer, perform a custom MSAA resolve:
 Outputs sample 0 for lighting/other MSAA dependent passes
 Creates sub-sample mask on same pass, rejecting similar samples
 Tag stencil with sub-sample mask
 How to combine with existing complex techniques that might be using Stencil Buffer already?
 Reserve 1 bit from stencil buffer
 Update it with sub-sample mask
 Use stencil read/write bitmasks to avoid overriding the bit
 Restore whenever a stencil clear occurs
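The custom resolve and per-sample mask can be sketched on the CPU as follows. The scalar comparison and `threshold` are assumptions, since the slide does not say how "similar samples" are detected. Both function names are hypothetical.

```python
SAMPLE_BIT = 0x80  # the one stencil bit reserved for the sub-sample mask


def resolve_pixel(samples, threshold=0.01):
    """samples: per-sub-sample scalar G-Buffer values (e.g. depth) of one pixel.

    Returns (resolved value, needs_per_sample). Sample 0 is the resolved
    output; the flag is set only when another sample differs noticeably.
    """
    resolved = samples[0]
    needs_per_sample = any(abs(s - resolved) > threshold for s in samples[1:])
    return resolved, needs_per_sample


def tag_stencil(stencil, needs_per_sample):
    """Update only the reserved bit and keep the other seven bits intact."""
    if needs_per_sample:
        return stencil | SAMPLE_BIT
    return stencil & ~SAMPLE_BIT & 0xFF
```

Pixels whose samples all match are later shaded once, and only the tagged pixels pay for sample-frequency shading.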
[Figure panels: SV_Coverage, Custom Per-Sample Mask, Final Result]
Pixel/Sample Frequency Passes
 Ensure disabling sample bit override via stencil write mask
 StencilWriteMask = 0x7F
 Pixel Frequency Passes
 Set stencil read mask to reserved bits for per-pixel regions (~0x80)
 Bind pre-resolved (non-multisampled) targets SRVs
 Render pass as usual
 Sample Frequency Passes
 Set stencil read mask to reserved bit for per-sample regions (0x80)
 Bind multisampled targets SRVs
 Index current sub-sample via SV_SampleIndex
 Render pass as usual
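How the stencil masks interact can be illustrated with the standard D3D11 stencil semantics, where both the reference value and the stored value are ANDed with the read mask before comparison, and only bits in the write mask are updated. The helper names are hypothetical. Selecting regions with an EQUAL test against reference 0x00 or 0x80 is one plausible reading of the slide, not a confirmed setup.

```python
SAMPLE_BIT = 0x80


def stencil_write(old, new, write_mask):
    """Only bits set in write_mask are replaced by the new value."""
    return (old & ~write_mask & 0xFF) | (new & write_mask)


def stencil_test_equal(ref, stencil, read_mask):
    """EQUAL comparison after masking both operands with the read mask."""
    return (ref & read_mask) == (stencil & read_mask)


def is_sample_frequency_region(stencil):
    """Assumed setup: read mask 0x80, reference 0x80, EQUAL comparison."""
    return stencil_test_equal(SAMPLE_BIT, stencil, SAMPLE_BIT)
```

With `StencilWriteMask = 0x7F`, any pass that writes stencil leaves the reserved bit untouched. For example, `stencil_write(0x85, 0x03, 0x7F)` keeps bit 0x80 set.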
Alpha Test Super-Sampling
● Alpha testing is a special case
● Default SV_Coverage only applies to triangle edges
● Create your own sub-sample coverage mask
● E.g. check if current sub-sample AT or not and set bit
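// 2 thumbs up for standardized MSAA offsets on DX11 (and even documented!)
// Standard 2x MSAA sample positions, in pixels relative to the pixel center.
static const float2 vMSAAOffsets[2] = { float2(0.25, 0.25), float2(-0.25, -0.25) };
// Screen-space texture coordinate derivatives, used to reach each sub-sample position.
const float2 vDDX = ddx(vTexCoord.xy);
const float2 vDDY = ddy(vTexCoord.xy);
uint uCoverageMask = 0;
// nSampleCount must match the size of vMSAAOffsets (2 here).
[unroll] for (int s = 0; s < nSampleCount; ++s)
{
    float2 vTexOffset = vMSAAOffsets[s].x * vDDX + vMSAAOffsets[s].y * vDDY;
    float fAlpha = tex2D(DiffuseSmp, vTexCoord.xy + vTexOffset).w;
    // Set bit s when sub-sample s passes the alpha test.
    uCoverageMask |= ((fAlpha - fAlphaRef) >= 0) ? (uint(0x1) << s) : 0;
}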
[Figure: Alpha Test Super-Sampling, SSAA Disabled vs. SSAA Enabled]
Corner Cases
 Cascaded sun shadow maps:
 Doing it “by the book” gets expensive quickly
 Render shadows as usual at pixel frequency
 Bilateral upscale during deferred shading composite pass
Corner Cases
 Soft particles (or similar techniques accessing depth):
 Recommended approach of tackling this at per-sample frequency is quite slow in real-world scenarios
 Max Depth instead works quite ok for most cases and N-times faster
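The max-depth shortcut can be sketched as below. The linear fade is the common soft-particle formulation and is an assumption here, as are the function and parameter names.

```python
def soft_particle_fade(scene_depth_samples, particle_depth, fade_distance):
    """Fade factor in [0, 1] from the farthest sub-sample depth of the pixel."""
    # One depth value per pixel instead of shading at sample frequency.
    scene_depth = max(scene_depth_samples)
    fade = (scene_depth - particle_depth) / fade_distance
    return min(max(fade, 0.0), 1.0)
```

Taking the maximum means a particle in front of a silhouette edge fades against the background rather than the foreground, which avoids dark or bright outlines at the cost of slightly less accurate fading.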
[Figure: Bad vs. Good]
MSAA Friendliness
 MSAA unfriendly techniques, the usual suspects:
 No AA at all or noticeable bright/dark silhouettes
[Figure: Bad vs. Good]
MSAA Friendliness
 Rules of thumb:
 Accessing and/or rendering to Multisampled Render Targets?
 Then you’ll need to care about accessing/outputting correct sub-sample
 Obviously, always minimize BW – avoid fat formats
 The latter is always valid, but even more so for MSAA cases
MSAA Correctness vs Performance
 Our goal was correctness and quality over performance
 You can always cut some corners, as most games do:
 Alpha to Coverage instead of Alpha Test Super-Sampling
 Or even no Alpha Test AA
 Render only opaque with MSAA
 Then render alpha blended passes without MSAA
 Assuming HDR rendering: note that tone mapping is implicitly done post-resolve, resulting in loss of detail in high contrast regions
 Note to IHVs: Having explicit access to HW capabilities
such as EQAA/CSAA would be nice
 Smarter AA combos
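The HDR note above can be made concrete with a small worked example: an edge pixel with one dark and one very bright sub-sample. The Reinhard curve is only a stand-in tone mapping operator, and the function names are hypothetical.

```python
def tonemap(x):
    """Reinhard operator, used here only as a stand-in tone mapper."""
    return x / (1.0 + x)


def resolve_then_tonemap(samples):
    """What happens implicitly when tone mapping runs after the MSAA resolve."""
    return tonemap(sum(samples) / len(samples))


def tonemap_then_resolve(samples):
    """Tone map each sub-sample first, then average the displayable values."""
    return sum(tonemap(s) for s in samples) / len(samples)
```

For sub-samples `[0.0, 100.0]`, resolving first gives about 0.98, which is almost as bright as the bright side, so the edge gradient disappears. Tone mapping each sub-sample first gives about 0.495, a proper halfway blend.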
Conclusion
● What’s next for CryENGINE ?
● A Big Next Generation leap is finally upon us
● In 2 years’ time, GPUs will be at ~16 TFLOPS, with a ridiculous amount of available memory.
● Extrapolate results from there, without >8 year old consoles slowing progress :)
● 4k resolution will bring some interesting challenges/opportunities
● Call to arms - still a lot of problems to solve
● IHVs/Microsoft: PC GPU profilers have a lot to evolve! How about a
unified GPU Profiler, working great for all IHVs?
● Microsoft: Sup with DX11 (lack of) documentation? Where’s DX12?
● You: No great realtime GI / realtime reflections solution yet!
Special Thanks
● Nicolas Thibieroz
● Chris Auty, Carsten Wenzel, Chris Raine, Chris Bolte,
Baldur Karlsson, Andrew Khan, Michael Kopietz, Ivo Zoltan
Frey, Desmond Gayle, Marco Corbetta, Jake Turner, Pierre-
Ives Donzallaz, Magnus Larbrant, Nicolas Schulz, Nick
Kasyan, Vladimir Kajalin..
Uff… let’s just make it shorter:
Thanks to the entire Crytek Team ^_^
Questions?
● Tiago@Crytek.com / Twitter: Crytek_Tiago
● Carsten@Crytek.com
● ChristopherR@Crytek.com / Twitter: Cry_Raine
We are hiring!
References
 WENZEL06 - Wenzel, C. “Real-time Atmospheric Effects in Games”, 2006
 JESCHKE07 - Jeschke, S. et al. “Interactive Smooth and Curved Shell Mapping”, 2007
 THIBIEROZ08 - Thibieroz, N. “Deferred Shading with Multisampling Anti-Aliasing in DirectX 10”, 2008
 TÓTH09 - Tóth, B. et al. “Real-time Volumetric Lighting in Participating Media”, 2009
 SOUSA11 - Sousa, T. “CryENGINE 3 Rendering Techniques”, 2011
 McDONALD12 - McDonald, J. “Don’t Throw it all Away”, 2012
 WIKI00 - “Stereographic projection”, http://en.wikipedia.org/wiki/Stereographic_projection
 WIKI01 - “Y’CbCr”, http://en.wikipedia.org/wiki/YCbCr
 WIKI02 - “Chroma subsampling”, http://en.wikipedia.org/wiki/Chroma_subsampling
Extra Slides
Massive Grass: Challenges
 Trick: Updating an allocation is done with Copy-On-Write in case GPU is still using the original location
 Consoles: incrementally defragment pools with GPU memory copies
 Also possible on PC, but more expensive due to CopySubresourceRegion limitations (need scratchpad memory, since it won’t allow copies where Dst/Src are the same resource)
 Note to IHVs: Being able to copy within the same Dst/Src resource, if memory regions are non-overlapping, would be handy
 Ended up using allocation & usage scheme for static geometry as well

 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 

Recently uploaded (20)

Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 

Rendering Technologies from Crysis 3 (GDC 2013)

  • 1. The Rendering Technologies of Crysis 3 ● Tiago Sousa, R&D Principal Graphics Engineer ● Carsten Wenzel, R&D Lead Software Engineer ● Chris Raine, R&D Senior Software Engineer ● Crytek
  • 2. Thin G-Buffer 2.0 ● For Crysis 3, wanted: ● Minimize redundant drawcalls ● AB details on G-Buffer with proper glossiness ● Tons of vegetation => Deferred translucency ● Multiplatform friendly
  • 3. Thin G-Buffer 2.0 (Channels -> Format) ● Depth | AmbID, Decals -> D24S8 ● N.x | N.y | Gloss, Zsign | Translucency -> A8B8G8R8 ● Albedo Y | Albedo Cb,Cr | Specular Y | Per-Project -> A8B8G8R8
  • 10. G: Albedo CbCr (interleaved)
  • 12. G-Buffer Packing ● World space normal packed into 2 components (WIKI00) ● Stereographic projection worked ok in practice (also cheap) ● Glossiness + Normal Z sign packed together
      $(X, Y) = \left( \frac{x}{1 - z}, \frac{y}{1 - z} \right)$
      $(x, y, z) = \left( \frac{2X}{1 + X^2 + Y^2}, \frac{2Y}{1 + X^2 + Y^2}, \frac{-1 + X^2 + Y^2}{1 + X^2 + Y^2} \right)$
      $GlossZsign = (Gloss \cdot Zsign) \cdot 0.5 + 0.5$
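A minimal CPU-side sketch of the normal and gloss packing from slide 12, written in Python for readability rather than as shader code. It assumes the normal is folded onto one hemisphere (z replaced by -|z|) before the stereographic projection listed under WIKI00, so that X and Y stay inside the unit disc and the sign of z travels with the glossiness value. That folding and all function names are assumptions for illustration, not something the talk states.

```python
def encode_normal(n):
    """Project a unit normal to 2 components plus a separate z sign."""
    x, y, z = n
    z_sign = 1.0 if z >= 0.0 else -1.0
    denom = 1.0 + abs(z)  # equals 1 - z for the folded normal (z = -|z|)
    return x / denom, y / denom, z_sign


def decode_normal(X, Y, z_sign):
    """Inverse stereographic projection, then restore the stored z sign."""
    r2 = X * X + Y * Y
    d = 1.0 + r2
    return 2.0 * X / d, 2.0 * Y / d, z_sign * (1.0 - r2) / d


def pack_gloss_zsign(gloss, z_sign):
    """Map gloss in [0, 1] and z sign in {-1, +1} into one [0, 1] channel."""
    return (gloss * z_sign) * 0.5 + 0.5


def unpack_gloss_zsign(packed):
    """Recover (gloss, z_sign) from the shared channel."""
    v = packed * 2.0 - 1.0
    return abs(v), (1.0 if v >= 0.0 else -1.0)
```

In this scheme a gloss of exactly 0 cannot carry a sign (the packed value is 0.5 either way), so a real encoder would presumably clamp gloss to a small minimum. Quantizing to 8 bits leaves roughly 7 bits for glossiness, which matches the figure given in the speaker notes.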
  • 13. G-Buffer Packing (2) ● Albedo in Y’CbCr color space (WIKI01) ● Stored in 2 channels via Chrominance Subsampling (WIKI02)
      $Y' = 0.299 R + 0.587 G + 0.114 B$
      $C_B = 0.5 - 0.168 R - 0.331 G + 0.5 B$
      $C_R = 0.5 + (0.5 R - 0.418 G - 0.081 B)$
      $R = Y' + 1.402 (C_R - 0.5)$
      $G = Y' - 0.344 (C_B - 0.5) - 0.714 (C_R - 0.5)$
      $B = Y' + 1.772 (C_B - 0.5)$
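The colour conversion on slide 13 can be sketched on the CPU as below, using the rounded coefficients from the slide. The interleaving part is only one possible scheme and is an assumption: the talk says chrominance is stored "interleaved" but does not spell out the pattern, so this sketch stores Cb on even pixels and Cr on odd pixels of a row and borrows the missing component from a horizontal neighbour. All function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward transform with the coefficients listed on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 - 0.168 * r - 0.331 * g + 0.5 * b
    cr = 0.5 + (0.5 * r - 0.418 * g - 0.081 * b)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform with the coefficients listed on the slide."""
    r = y + 1.402 * (cr - 0.5)
    g = y - 0.344 * (cb - 0.5) - 0.714 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    return r, g, b


def pack_row(rgb_row):
    """Store (Y, C) per pixel: C is Cb on even pixels and Cr on odd pixels."""
    packed = []
    for i, (r, g, b) in enumerate(rgb_row):
        y, cb, cr = rgb_to_ycbcr(r, g, b)
        packed.append((y, cb if i % 2 == 0 else cr))
    return packed


def unpack_row(packed):
    """Rebuild RGB; the missing chroma comes from a neighbour (needs >= 2 pixels)."""
    n = len(packed)
    out = []
    for i, (y, c) in enumerate(packed):
        j = i + 1 if i + 1 < n else i - 1  # neighbour holds the other component
        other = packed[j][1]
        cb, cr = (c, other) if i % 2 == 0 else (other, c)
        out.append(ycbcr_to_rgb(y, cb, cr))
    return out
```

Luminance is kept per pixel, so hard edges survive; only the chroma is shared between neighbours. A production decoder would likely pick the neighbour based on luminance similarity to limit colour bleeding at edges.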
  • 14. Hybrid Deferred Rendering ● Deferred lighting still processed as usual (SOUSA11) ● L-Buffers now using BW friendlier R11G11B10F formats ● Precision was sufficient, since material properties not applied yet ● Deferred shading composited via fullscreen pass ● For more complex shading such as Hair or Skin, process forward passes ● Allowed us to drop almost all opaque forward passes ● Fewer drawcalls, but G-Buffer passes now with higher cost ● Fast Double-Z Prepass for some of the closest geometry helps slightly ● Overall was nice win, on all platforms*
  • 15. Hybrid Deferred Rendering (2) Deferred (Red) + Forward (Green)
  • 16. Thin G-Buffer Benefits ● Unified solution across all platforms ● Deferred Rendering for less BW/Memory than vanilla ● Good for MSAA + avoiding tiled rendering on Xbox360 ● Tackle glossiness for transparent geometry on G-Buffer ● Alpha blended cases, e.g. Decals, Deferred Decals, Terrain Layers ● Can composite all such cases directly into G-Buffer ● Avoid need for multipass ● Deferred sub-surface scattering ● Visual + performance win, in particular for vegetation rendering
  • 17. Thin G-Buffer Hindsights ● Why not pack G-Buffer directly? ● Because we need to be able to blend details into G-Buffer ● Would need to decode -> blend -> encode ● Or could blend such cases into separate targets (bad for MSAA/Consoles) ● Programmable blending would have been nice ● Transparent cases can’t use alpha channel for storage* ● sRGB output only for couple channels or all ● Would allow for more interesting and optimal packing schemes ● While at it, stencil write from fragment shader would also be handy
  • 18. Volumetric Fog Updates ● Density calculation based on fog model established for Crysis 1 (WENZEL06) ● Deferred pass for opaque geometry ● Per-Vertex approximation for transparent geometry
  • 19. Volumetric Fog Updates ● Little tuning: Artist controllable gradients (via ToD tool) ● Height based: Density and color for specified top and bottom height ● Radial based: Size, color and lobe around sun position
  • 20. Volumetric Fog Shadows ● Based on TÓTH09: Don’t accumulate in-scattered light but shadow contribution along view ray instead
  • 21. Volumetric fog shadows ● Interleave pass distributes 1024 shadow samples on an 8x8 grid shared by neighboring pixels ● Half resolution destination target ● Gather pass computes final shadow value ● Bilateral filtering was used to minimize ghosting and halos ● Shadow stored in alpha, 8 bit depth in red channel ● Used 8 taps to compare against center full resolution depth ● Max sample distance configurable (~150-200m in C3 levels) ● Cloud shadow texture baked into final result ● Final result modifies fog height and radial color
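The interleave pass described on slides 20 and 21 can be illustrated with a small CPU-side sketch. It assumes that each pixel of an 8x8 tile takes 16 taps along its view ray and that the taps of neighbouring pixels are offset so the 64 pixels together cover 1024 distinct distances. The exact offset pattern, the `in_shadow` callback (standing in for a shadow map lookup) and all names are assumptions for illustration.

```python
GRID = 8                                       # 8x8 pixel tile shares samples
TAPS_PER_PIXEL = 16                            # taps marched per pixel
TOTAL_SAMPLES = GRID * GRID * TAPS_PER_PIXEL   # 1024 samples per tile


def tap_distances(px, py, max_distance):
    """Distances along the view ray sampled by pixel (px, py)."""
    slot = (py % GRID) * GRID + (px % GRID)  # this pixel's offset inside the tile
    return [
        (k * GRID * GRID + slot + 0.5) / TOTAL_SAMPLES * max_distance
        for k in range(TAPS_PER_PIXEL)
    ]


def fog_shadow(origin, direction, px, py, max_distance, in_shadow):
    """Fraction of taps along the ray that are lit (1.0 = fully unshadowed)."""
    lit = 0
    for t in tap_distances(px, py, max_distance):
        point = tuple(o + d * t for o, d in zip(origin, direction))
        if not in_shadow(point):
            lit += 1
    return lit / TAPS_PER_PIXEL
```

A gather pass would then combine the per-pixel values of a neighbourhood, weighted by depth similarity (the bilateral filter on the slide), so that each output pixel effectively benefits from far more than its own 16 taps.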
  • 25. Silhouette POM ● Alternative to tessellation based displacement mapping ● Looked into various approaches, most weren’t practical for production ● Current implementation is based on principle of barycentric correspondence (JESCHKE07)
  • 26. Silhouette POM: Steps ● Transform vertices and extrude - VS ● Generate prisms (do not split into tetrahedra) and setup clip planes - GS ● Generally prism sides are bilinear patches, we approximate by a conservative plane ● Note to IHVs: Emitting per-triangle constants would be nice! ● In theory, on DX11.1, we could emit via UAV output? ● Ray marching - PS ● Compute intersection of view ray with prism in WS, translate to texture space via (JESCHKE07) barycentric correspondence ● Use resulting texture uv and height for entry and exit to trace height field ● Compute final uv and selectively discard pixel (viewer below height map; view ray leaving prism before hitting terrain) ● Lots of pressure on PS, yet GS is the bottleneck (prism gen)
  • 30. Massive Grass: Simulation ● Grass blade instance: ● A chain of points held together by constraints ● Distance + bending constraints to try to maintain local space rest pose angle per-particle ● Physics collision geometry converted into small sphere set ● Collisions handled as plane constraints ● No stable collision handling, overdamp the instance ● Applied to vegetation meshes via software-skinning ● Exposed parameters per group: ● Stiffness, damping, wind force factor, random variance
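As a rough illustration of "a chain of points held together by constraints", the sketch below advances one blade in 2D. It assumes a Verlet-style position update and a root-to-tip distance constraint solve; the talk names neither, and the bending and plane (collision) constraints from the slide are left out for brevity. The function name and parameters are hypothetical.

```python
import math


def step_blade(points, prev_points, rest_length, wind, dt, damping=0.5):
    """Advance one grass blade (a chain of 2D points) by one time step.

    Returns (new_points, new_prev_points). points[0] is the pinned root.
    """
    moved = [points[0]]  # the root particle stays attached to the ground
    for (px, py), (qx, qy) in zip(points[1:], prev_points[1:]):
        # Implied velocity is (current - previous); damp it heavily, as the
        # slide suggests over-damping instead of stable collision handling.
        vx = (px - qx) * damping
        vy = (py - qy) * damping
        moved.append((px + vx + wind[0] * dt * dt, py + vy + wind[1] * dt * dt))

    # Distance constraints, solved root to tip: pull each particle back onto
    # a circle of radius rest_length around its already constrained parent.
    for i in range(1, len(moved)):
        ax, ay = moved[i - 1]
        bx, by = moved[i]
        dx, dy = bx - ax, by - ay
        dist = math.hypot(dx, dy)
        if dist > 0.0:
            scale = rest_length / dist
            moved[i] = (ax + dx * scale, ay + dy * scale)
    return moved, points
```

Because only the child particle moves in each constraint, a single root-to-tip pass restores every segment length exactly. A symmetric solver that moves both endpoints would need several iterations.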
  • 35. Massive Grass: Mesh Merging ● One patch results in N-Meshes ● N is number of materials used ● Instances grouped into 16x16x16 meter patches (yes, volumetric) ● Typical Numbers: ● 50k – 70k visible instances on consoles. PC > 100k ● Instances have 18 to 3.6k vertices depending on mesh complexity ● Closest instances simulated every frame ● Based on distance: simulation and time sliced skinning ● Instances removed further away
  • 37. Massive Grass: Update Loop ● Culling process (for each visible patch): ● Mark visible instances ● Compute LOD ● Check if instance should be skipped in distance ● After culling: ● Allocate (from pool) dynamic VB/IB memory for each patch ● Sample force fields into per-patch buffer (coarse discretization 4x4x4) ● Sample physics for potential colliders, extract collider geometry ● Dispatch sim & skin jobs for each patch
  • 38. Massive Grass: Challenges ● Efficient buffer management ● Resulting meshes can vary in size per frame ● Naive implementation (C2) resulted in bad perf on PC and out of vram on consoles due to fragmentation ● Current implementation inspired by “Don’t Throw it all Away” (McDONALD12) ● Large pools for dynamic IB/VB ● Each maintains two free lists (usable and pending) ● Each item in pending list is moved to main free list as soon as GPU query guarantees GPU done with pool ● 1.3 MB consoles main memory and PC 16 MB
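The two-free-list pool from slide 38 can be sketched as follows. The GPU query is modelled as a monotonically increasing fence value (an assumption; a real implementation would poll an event query or fence object), allocation is plain first-fit, and free blocks are never merged. The class and method names are hypothetical.

```python
class DynamicBufferPool:
    """Sub-allocates one large buffer; freed ranges wait until the GPU is done."""

    def __init__(self, size):
        self.free = [(0, size)]  # (offset, length) ranges usable right now
        self.pending = []        # (offset, length, fence) ranges the GPU may still read

    def allocate(self, length):
        """First-fit allocation from the usable list; None if nothing fits."""
        for i, (offset, available) in enumerate(self.free):
            if available >= length:
                if available == length:
                    self.free.pop(i)
                else:
                    self.free[i] = (offset + length, available - length)
                return offset
        return None

    def release(self, offset, length, fence):
        """Queue a range for reuse once the GPU has passed `fence`."""
        self.pending.append((offset, length, fence))

    def collect(self, completed_fence):
        """Move every pending range whose fence has completed to the usable list."""
        still_pending = []
        for offset, length, fence in self.pending:
            if fence <= completed_fence:
                self.free.append((offset, length))
            else:
                still_pending.append((offset, length, fence))
        self.pending = still_pending
```

Calling `collect` once per frame with the last fence the GPU reported keeps the CPU from overwriting vertex data that an in-flight draw call still references, without ever stalling on the GPU.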
  • 39. Massive Grass: Challenges (2) ● Efficient scheduling: ● Patch instances are divided into small groups ● Sim job kicked off for each group in main thread ● DP in render thread has blocking wait for sim job ● Job considered low-priority ● Important: ● Avoid unnecessary copies, skin directly to final destination ● Reduce throughput and memory requirements (used half & fixed point precision everywhere) ● PC: ~15 ms, 300 to 600 jobs on worst case scenarios ● Xbox360 ~16ms, 800 jobs; PS3 ~10ms, 100-400 jobs
  • 40. Massive Grass: Challenges (3) ● Alpha tested geometry, literally everywhere ● Massive overdraw, also troublesome for MSAA ● Literally worst case scenario for RSX due to poor z-cull ● Prototyped alternatives (e.g. geometry based) ● Art was not happy with these unfortunately ● End solution: keep it simple ● G-Buffer stage minimalistic ● Consoles: Mostly outputting vertex data ● Art side surface coverage minimization
  • 41. Anti-aliasing ● Subjective topic: Sharp VS Blurry ● Some PC gamers hate blurry, some hate sharp. ● Some even love 800x600 and no AA
  • 42. DX11 Deferred MSAA: 101 ● The problem: ● Multiple passes and reading/writing from Multisampled Render Targets ● SV_SampleIndex / SV_Coverage system value semantics allow to solve via multipass for pixel/sample frequency passes (THIBIEROZ08) ● SV_SampleIndex ● Forces pixel shader execution for each sub-sample ● SV_SampleIndex provides index of the sub-sample currently executed ● Index can be used to fetch sub-sample from your Multisampled RT ● E.g. FooMS.Load( UnnormScreenCoord, nCurrSample) ● SV_Coverage ● Indicates to pixel shader which sub-samples covered during raster stage ● Can also modify sub-sample coverage for custom coverage mask
  • 43. DX11 Deferred MSAA ● Foundation for almost all our supported AA techniques ● Simple theory => troublesome practice ● At least with fairly complex and deferred based engines ● Disclaimer: ● Non-MSAA friendly code accumulates fast ● Breaks regularly as new techniques added with no care for MSAA ● Pinpoint non-MSAA friendly techniques, and update them one by one. ● Rinse and repeat and you’ll get there eventually. ● Will be enforced by default on our future engine versions
  • 44. Custom Resolve & Per-Sample Mask ● Post G-Buffer, perform a custom MSAA resolve: ● Outputs sample 0 for lighting/other MSAA dependent passes ● Creates sub-sample mask on same pass, rejecting similar samples ● Tag stencil with sub-sample mask ● How to combine with existing complex techniques that might be using Stencil Buffer already? ● Reserve 1 bit from stencil buffer ● Update it with sub-sample mask ● Make usage of stencil read/write bitmask to avoid bit override ● Restore whenever a stencil clear occurs
  • 48. Pixel/Sample Frequency Passes ● Ensure disabling sample bit override via stencil write mask ● StencilWriteMask = 0x7F ● Pixel Frequency Passes ● Set stencil read mask to reserved bits for per-pixel regions (~0x80) ● Bind pre-resolved (non-multisampled) targets SRVs ● Render pass as usual ● Sample Frequency Passes ● Set stencil read mask to reserved bit for per-sample regions (0x80) ● Bind multisampled targets SRVs ● Index current sub-sample via SV_SampleIndex ● Render pass as usual
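The stencil bookkeeping from slides 44 and 48 comes down to a few bit operations, sketched below on the CPU. The reserved bit (0x80) and write mask (0x7F) come from the slides. The similarity test, which compares sub-sample depths against sample 0 with a tolerance, is an assumption, since the talk does not say which attributes are compared. Function names are hypothetical.

```python
SAMPLE_BIT = 0x80          # stencil bit reserved for the per-sample mask
STENCIL_WRITE_MASK = 0x7F  # other passes may only write the low 7 bits


def needs_per_sample_shading(depths, tolerance=1e-3):
    """True if any sub-sample differs noticeably from sample 0 (assumed criterion)."""
    first = depths[0]
    return any(abs(d - first) > tolerance for d in depths[1:])


def tag_stencil(stencil, per_sample):
    """Set or clear the reserved bit, leaving the other 7 bits untouched."""
    if per_sample:
        return stencil | SAMPLE_BIT
    return stencil & ~SAMPLE_BIT & 0xFF


def masked_stencil_write(old, new, write_mask=STENCIL_WRITE_MASK):
    """Emulate a stencil write under a write mask: masked-out bits keep `old`."""
    return (old & ~write_mask & 0xFF) | (new & write_mask)


def runs_in_sample_pass(stencil):
    """Sample frequency passes shade pixels whose reserved bit is set."""
    return (stencil & SAMPLE_BIT) != 0
```

With the write mask in place, existing techniques can keep using the lower stencil bits without clobbering the per-sample tag. Only a full stencil clear requires the tag to be restored, as slide 44 notes.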
  • 49. Alpha Test Super-Sampling ● Alpha testing is a special case ● Default SV_Coverage only applies to triangle edges ● Create your own sub-sample coverage mask ● E.g. check if current sub-sample AT or not and set bit
      // 2 thumbs up for standardized MSAA offsets on DX11 (and even documented!)
      static const float2 vMSAAOffsets[2] = {float2(0.25, 0.25), float2(-0.25, -0.25)};
      const float2 vDDX = ddx(vTexCoord.xy);
      const float2 vDDY = ddy(vTexCoord.xy);
      uint uCoverageMask = 0; // one bit per sub-sample that passes the alpha test
      [unroll] for (int s = 0; s < nSampleCount; ++s)
      {
        // Shift the texture coordinate to the sub-sample position
        float2 vTexOffset = vMSAAOffsets[s].x * vDDX + vMSAAOffsets[s].y * vDDY;
        float fAlpha = tex2D(DiffuseSmp, vTexCoord.xy + vTexOffset).w;
        // Set bit s (the loop index) when this sub-sample passes
        uCoverageMask |= ((fAlpha - fAlphaRef) >= 0) ? (uint(0x1) << s) : 0;
      }
  • 50. Alpha Test Super-Sampling Alpha Test SSAA Disabled
  • 51. Alpha Test Super-Sampling Alpha Test SSAA Enabled
  • 52. Corner Cases ● Cascaded sun shadow maps: ● Doing it “by the book” gets expensive quickly ● Render shadows as usual at pixel frequency ● Bilateral upscale during deferred shading composite pass
  • 53. Corner Cases ● Soft particles (or similar techniques accessing depth): ● Recommendation to tackle via per-sample frequency is quite slow on real world scenarios ● Max Depth instead works quite ok for most cases and N-times faster Bad Good
  • 54. MSAA Friendliness ● MSAA unfriendly techniques, the usual suspects: ● No AA at all or noticeable bright/dark silhouettes Bad Good
  • 55. MSAA Friendliness ● MSAA unfriendly techniques, the usual suspects: ● No AA at all or noticeable bright/dark silhouettes Bad Good
  • 56. MSAA Friendliness ● Rules of thumb: ● Accessing and/or rendering to Multisampled Render Targets? ● Then you’ll need to care about accessing/outputting correct sub-sample ● Obviously, always minimize BW – avoid fat formats ● The latter is always valid, but even more for MSAA cases
  • 57. MSAA Correctness vs Performance ● Our goal was correctness and quality over performance ● You can always cut some corners as most games do: ● Alpha to Coverage instead of Alpha Test Super-Sampling ● Or even no Alpha Test AA ● Render only opaque with MSAA ● Then render alpha blended passes without MSAA ● Assuming HDR rendering: note that tone mapping is implicitly done post-resolve, resulting in loss of detail on high contrast regions ● Note to IHVs: Having explicit access to HW capabilities such as EQAA/CSAA would be nice ● Smarter AA combos
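The remark on slide 57 about tone mapping being done after the resolve can be made concrete with a small numeric sketch. It uses the simple Reinhard curve x / (1 + x) purely as a stand-in (the talk does not say which operator the engine uses) and a two-sample pixel on a high contrast edge; function names are hypothetical.

```python
def tonemap(x):
    """Reinhard-style curve, used here only as an illustrative operator."""
    return x / (1.0 + x)


def resolve_then_tonemap(samples):
    """Average the HDR sub-samples first, then tone map the result."""
    return tonemap(sum(samples) / len(samples))


def tonemap_then_resolve(samples):
    """Tone map each sub-sample, then average the displayable values."""
    return sum(tonemap(s) for s in samples) / len(samples)


# One pixel straddling a dark surface (0.1) and a very bright one (16.0).
edge_pixel = [0.1, 16.0]
post_resolve = resolve_then_tonemap(edge_pixel)  # about 0.89, almost as bright as the bright side
pre_resolve = tonemap_then_resolve(edge_pixel)   # about 0.52, roughly halfway between both sides
```

When the average is taken in HDR, the bright sample dominates and the edge pixel ends up nearly as bright as its bright neighbour, so the anti-aliased gradient disappears. Tone mapping per sample keeps the intermediate value.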
  • 58. Conclusion ● What’s next for CryENGINE? ● A Big Next Generation leap is finally upon us ● In 2 years time, GPUs will be at ~16 TFLOPS and ridiculous amount of available memory. ● Extrapolate results from there, without >8 year old consoles slowing progress ● 4k resolution will bring some interesting challenges/opportunities ● Call to arms - still a lot of problems to solve ● IHVs/Microsoft: PC GPU profilers have a lot to evolve! How about a unified GPU Profiler, working great for all IHVs? ● Microsoft: Sup with DX11 (lack of) documentation? Where’s DX12? ● You: No great realtime GI / realtime reflections solution yet!
  • 59. Special Thanks ● Nicolas Thibieroz ● Chris Auty, Carsten Wenzel, Chris Raine, Chris Bolte, Baldur Karlsson, Andrew Khan, Michael Kopietz, Ivo Zoltan Frey, Desmond Gayle, Marco Corbetta, Jake Turner, Pierre-Ives Donzallaz, Magnus Larbrant, Nicolas Schulz, Nick Kasyan, Vladimir Kajalin.. Uff… let’s just make it shorter: Thanks to the entire Crytek Team ^_^
  • 60. Questions? ● Tiago@Crytek.com / Twitter: Crytek_Tiago ● Carsten@Crytek.com ● ChristopherR@Crytek.com / Twitter: Cry_Raine
  • 62. References ● WENZEL06 - Wenzel, C. “Real-time Atmospheric Effects in Games”, 2006 ● JESCHKE07 - Jeschke, S. et al. “Interactive Smooth and Curved Shell Mapping”, 2007 ● THIBIEROZ08 - Thibieroz, N. “Deferred Shading with Multisampling Anti-Aliasing in DirectX10”, 2008 ● TÓTH09 - Tóth, B. et al. “Real-time Volumetric Lighting in Participating Media”, 2009 ● SOUSA11 - Sousa, T. “CryENGINE 3 Rendering Techniques”, 2011 ● McDONALD12 - McDonald, J. “Don’t Throw it all Away”, 2012 ● WIKI00 - “Stereographic projection”, http://en.wikipedia.org/wiki/Stereographic_projection ● WIKI01 - “Y’CbCr”, http://en.wikipedia.org/wiki/YCbCr ● WIKI02 - “Chroma subsampling”, http://en.wikipedia.org/wiki/Chroma_subsampling
  • 64. Massive Grass: Challenges ● Trick: Updating allocation done with Copy-On-Write in case GPU still using original location ● Consoles: incrementally defragment pools with GPU memory copies ● Also possible on PC, but more expensive due to CopySubResource limitations (need scratchpad memory, since CSR won’t allow copies where Dst/Src are same resource) ● Note to IHVs: Being able to copy from same Dst/Src resource, if non-overlapping memory regions, would be handy ● Ended up using allocation & usage scheme for static geometry as well

Editor's Notes

  1. Hi everyone! Welcome to “The Rendering Technologies of Crysis 3”, our latest game, which, as I’m sure you’ve heard, has a lot of GRAPHICS! My name is Tiago Sousa, I’m Crytek’s R&D Principal Graphics Engineer. Unfortunately Carsten and Chris couldn’t be with me on stage today, but I’ll do my best to present some of their great work. During the past year we’ve made quite a few multiplatform and DX11 related updates to our CryENGINE 3. I’ve picked 5 topics from these updates for you today, which I hope you’ll like: Deferred Rendering, Volumetric Fog, Silhouette POM, Massive Grass, and Anti-Aliasing. Each topic would deserve a separate and detailed lecture of its own, but I’ll try to clearly share the foundations and concepts behind the work we did. Before we start, a heads up that I’m assuming most of you are familiar with CryENGINE 3 rendering; if not, please check out our previous GDC/Siggraph/Gamefest talks after this lecture. So, without further ado, let’s quickly start: we have a lot of ground to cover!
  2. Thin G-Buffer 2.0. The first topic we’ll cover is deferred rendering and what changed here. For Crysis 3 there were 4 areas we wanted to improve. Minimize redundant drawcalls: one big flaw of deferred lighting is the requirement for the additional shading drawcall, and we wanted to get rid of it. This is particularly important for MSAA support. Alpha blended details on the G-Buffer (decals, deferred decals and similar) with proper glossiness: on Crysis 2 (in case you didn’t notice) most decals had a fixed glossiness factor, and we wanted art to be able to use nice gloss maps and such. Tons of vegetation on screen: this meant we needed to somehow tackle translucency for all deferred light types, including the sun. Multiplatform friendly: last but not least, Crysis 3 had the smallest full-time tech development team ever (2 rendering guys in Frankfurt), so we aimed for generalized solutions that work either on all platforms or just on DX11, to minimize QA effort.
  3. This was our final G-Buffer layout. Essentially a 64-bit MRT setup + 32 bits for Z-buffer & stencil.
  4. Let’s break it down into bits for easier visualization. We start with our final target image, where essentially everything is done (shadows, shading, tone mapping, etc).
  5. Depth & Stencil. The usual. The only thing is that for stencil we do some magic: 1 bit is reserved to tag dynamic geometry (for masking out deferred decals; a real fix for deferred decals is tricky/expensive), and 7 bits are for tagging ambient areas, so that art can specify a different ambient for some geometry while avoiding leaking. (We have a couple of different techniques for art convenience.)
  6. 2 channels for world space normals storage
  7. For the second target, we have additional material properties. On the red channel, albedo luminance is stored.
  8. On the green channel, albedo chrominance is stored, packed via chrominance subsampling; more details soon.
  9. The blue channel stores specular intensity. As you know, color for specular is mostly needed just for certain metals; for us this was an acceptable compromise.
  10. G-Buffer packing. As mentioned: normals are stored in 2 channels. Stereographic projection worked OK in practice for us. We packed Z-sign together with 7 bits of glossiness. Important: these little tricks are what allowed us to have glossiness support for alpha blended cases and to free 1 channel for storing translucency.
  11. Albedo is stored using the Y’CbCr color space. It might look like quite a few instructions, but it is actually fairly cheap in practice, a couple of ALU ops. This is stored in 2 channels via chrominance subsampling. Important: the concept here is that the Human Visual System has much lower acuity for color differences; we are actually much better at perceiving luminance differences. This means that in practice we can store chrominance at a lower frequency. Several packing schemes exist.
  12. Hybrid Deferred Rendering. This is an old idea from the beginning of Crysis 2 times (way back in 2008), but back then we didn’t notice much benefit, likely due to much simpler levels. Important: the concept here is to use deferred rendering for everything that is “deferred compatible”; the rest is still processed using forward rendering. Step by step: deferred lighting accumulation is still processed as usual (SOUSA11 - Sousa, T. “CryENGINE 3 Rendering Techniques”, 2011). L-Buffers now use BW friendly R11G11B10F formats; consoles still use the same formats. Precision was sufficient because material properties are not applied yet; you need the precision mostly when applying material properties. Deferred shading is composited via a fullscreen pass. This is where material properties are applied, and it still uses the R16G16B16A16F format. In theory it could use lower precision + range scaling as we do on consoles (didn’t try). For more complex shading such as Hair or Skin, we still process forward. This allowed us to drop almost all opaque forward passes. Fewer drawcalls, but G-Buffer passes with higher cost. Z-Prepass for the few nearest geometry. Important: *a win of up to 10 ms on consoles on fairly heavy scenes, and also a fairly nice win for MSAA (regular deferred lighting + MSAA work fairly poorly together).
  13. Here we can see the behaviour: red is for all pixels processed via deferred, green for all pixels still forward rendered.
  14. To recap what was said: a unified solution for all platforms. Deferred rendering using 25% less BW than vanilla deferred, which is good for MSAA and for avoiding tiled rendering on Xbox 360. It allows us to tackle glossiness for transparent geometry on the G-Buffer, and also sub-surface scattering for all deferred lights.
  15. Thin G-Buffer Hindsights: why not pack the G-Buffer directly into a 64-bit target? Because we need to be able to blend details into the G-Buffer. We would need to decode -> blend -> encode, or could blend such cases into separate targets (bad for MSAA/consoles). Programmable blending would have been nice: AB cases can’t use the alpha channel for storage (for all MRTs!) without resorting to multipass. sRGB output only for a couple of channels or all. It would allow for more interesting and optimal packing schemes. While at it, stencil write from the fragment shader would also be handy.
  16. Volumetric Fog Updates: mostly the same since Crysis 1 times, with a couple of updates. The fog density calculation is still the same model that Carsten introduced in his “Real-time Atmospheric Effects in Games” in 2006. It is still rendered in deferred fashion as a fullscreen pass for opaque geometry. One little optimization here was computing the distance at which fog starts to contribute and setting minZ accordingly for depth bounds checking (you could also achieve the same by rendering a quad at that depth + depth test). For transparents we still do a per-vertex approximation, unless it is some visually important/low tessellation case such as water; for such cases we compute it per-pixel.
  17. One update we made was exposing artist controllable gradients. Height based gradients allow controlling color and density for the top and bottom height. The radial gradient allows art to control color, size and lobe around the sun position. Not super physically based, but it was one of those things art kept requesting for artistic control.
  18. Volumetric Fog Shadows: something new we introduced for Crysis 3. Our work is based on “Real-time Volumetric Lighting in Participating Media” by Tóth et al., 2009. Important: the concept here is to not accumulate in-scattered light; we only accumulate the shadow contribution along the view ray. It is fairly simple: imagine you have a volume, discretize it, say divide it into 16 points, and for each point sample the shadow map to check whether that location is in shadow or not.
  19. The technique is fairly simple: we interleave 1k samples on an 8x8 grid, so for each pixel we use 16 taps. This is of course done at half resolution. Then a fullscreen composite pass computes the final shadow value. Bilateral filtering was used to minimize artifacts. In our case, we used 8 taps from a low resolution depth buffer to compare with the full resolution depth. All data for the composite step is stored in the same target. 8-bit precision for depth sufficed to tackle the most obvious artifacts. Extra: max sample distance is configurable (~150-200m in C3 levels). Cloud shadow texture is baked into the final result. The final result modifies the height and radial color components of the fog.
  20. Alternative to tessellation based displacement mappingLooked into various approaches, most weren’t practical for productionCurrent implementation is based on principle of barycentric correspondence introduced (afawk) by JESCHKE07 - Jeschke, S. et al. “Interactive Smooth and Curved Shell Mapping”, 2007
  21. JESCHKE07 - Jeschke, S. et al. “Interactive Smooth and Curved Shell Mapping”, 2007Alternative to tessellation based displacement mappingLooked into various approaches, most weren’t practical for productione.g. needed obj space normal maps, separate shader for fins and shells, very expensive ray prism intersection costs, etcCurrent implementation is based on principle of barycentric correspondence (JES07) Allows tracing ray in obj space and map it back into texture space
  22. Transform vertices and extrude (VS): output current vertex + extruded version (position, view vector). Generate prisms (do not split into tetrahedra) and set up clip planes (GS): generally prism sides are bilinear patches; we approximate them by a conservative plane. Note to IHVs: emitting per-triangle constants would be nice! Ray marching (PS): compute intersection of view ray with prism in WS, translate to texture space via barycentric correspondence. Use the resulting texture uv and height for entry and exit to trace the height field. Compute final uv and selectively discard pixel (viewer below height map; view ray leaving prism before hitting terrain). Lots of pressure on PS, yet GS is the bottleneck (prism gen).
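The "translate to texture space via barycentric correspondence" step can be illustrated with a small sketch. It assumes the hit point lies on the triangular slice of the prism at a known normalized height `h` (0 = base triangle, 1 = extruded triangle); the function names and this slice-based simplification are assumptions for illustration, not the actual pixel shader.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def barycentric(p: Vec3, a: Vec3, b: Vec3, c: Vec3) -> Tuple[float, float, float]:
    """Barycentric weights (wa, wb, wc) of point p with respect to triangle abc."""
    v0, v1, v2 = sub(b, a), sub(c, a), sub(p, a)
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    wb = (d11 * d20 - d01 * d21) / denom
    wc = (d00 * d21 - d01 * d20) / denom
    return (1.0 - wb - wc, wb, wc)


def prism_point_to_texture_space(p: Vec3, base: Sequence[Vec3], top: Sequence[Vec3],
                                 h: float, uvs: Sequence[Vec2]) -> Tuple[float, float, float]:
    """Map a point on the prism slice at height h to (u, v, height).

    The same weights that locate p in the slice triangle are applied to the
    per-vertex texture coordinates.
    """
    a, b, c = (lerp(base[i], top[i], h) for i in range(3))
    wa, wb, wc = barycentric(p, a, b, c)
    u = wa * uvs[0][0] + wb * uvs[1][0] + wc * uvs[2][0]
    v = wa * uvs[0][1] + wb * uvs[1][1] + wc * uvs[2][1]
    return (u, v, h)
```

Applying this to the ray's entry and exit points gives the two texture-space endpoints between which the height field is traced.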
  23. Currently we don’t fix up the depth buffer for correct intersections. We do fix up depth in a separate target though, which is used for deferred passes (shadows, fog, deferred decals, screen space occlusion, etc). Uses the same self-shadow algorithm that also runs atop of OBM and POM. Next projects will make better usage of such tech.
  24. Initial goals: everything moving on the screen, e.g. grass, vegetation, cloth.
  25. Red: simulated every frame, highest detail. Green: time-sliced update, lower detail (no shadows and such).
  26. MCD12 - McDonald, J. “Don’t Throw it all Away”, 2012. Efficient buffer management: resulting meshes can vary in size per frame, e.g. player walking/looking in different directions can result in more/less vegetation visible. Large pools for dynamic IB/VB. Each maintains two free lists (usable and pending). Each item in the pending list is moved to the main free list as soon as a GPU query guarantees the GPU is done with the pool (done with rendering).
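A rough sketch of the two-free-list idea from note 26: released ranges wait in a pending list until the GPU is known to be done with them, and only then become usable again. The class and method names are hypothetical, and the integer `fence` values stand in for the GPU query; the real pool also has to deal with fragmentation, which this sketch ignores.

```python
from typing import List, Optional, Tuple


class DynamicBufferPool:
    """Toy allocator over one large buffer with usable and pending free lists."""

    def __init__(self, capacity: int) -> None:
        self.usable: List[Tuple[int, int]] = [(0, capacity)]   # (offset, size)
        self.pending: List[Tuple[int, int, int]] = []          # (fence, offset, size)

    def allocate(self, size: int) -> Optional[int]:
        """First-fit allocation from the usable list; returns an offset or None."""
        for i, (offset, free_size) in enumerate(self.usable):
            if free_size >= size:
                if free_size == size:
                    self.usable.pop(i)
                else:
                    self.usable[i] = (offset + size, free_size - size)
                return offset
        return None

    def release(self, offset: int, size: int, fence: int) -> None:
        """The CPU is done with the range, but the GPU may still read it."""
        self.pending.append((fence, offset, size))

    def reclaim(self, completed_fence: int) -> None:
        """Move ranges whose fence the GPU has passed back to the usable list."""
        still_pending = []
        for fence, offset, size in self.pending:
            if fence <= completed_fence:
                self.usable.append((offset, size))  # no coalescing in this sketch
            else:
                still_pending.append((fence, offset, size))
        self.pending = still_pending
```

Calling `reclaim` once per frame with the latest completed query keeps the CPU from overwriting data a draw call is still reading, without stalling on the GPU.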
  27. Efficient scheduling: patch instances are divided into small groups. Sim job kicked off for each group in main thread. DP in render thread has blocking wait for sim job (gives full frame of time). Job considered low-priority (= higher priority jobs run before it in work queue). *No copies at all, store directly. Important: avoid unnecessary copies, skin directly to final destination. Reduce throughput and memory requirements (used half & fixed point precision everywhere). *E.g. velocity for sim.
  28. Alpha tested geometry. Literally everywhere. Worst case scenario for RSX due to fairly poor z-cull. Xbox 360 outperformed PS3 here 2x. Also troublesome for MSAA. Prototyped alternatives (e.g. geometry based) but art hated them. End solution: keep it simple. G-Buffer stage minimalistic. Consoles: mostly outputting vertex data. Minimize surface coverage. 1 cycle fragment program on RSX + extra cycle due to clip requirement.
  29. Just gave a combo of options; let gamers pick their favorite
  30. *Alpha tested geometry included. *Custom coverage mask allows for nifty tricks, e.g. selective alpha test super-sampling, custom ATOC, fancier LOD dissolves.
  31. *If nothing else works due to already crazy stencil usage, you’ll have to use the poor man’s version via clip.
  32. Custom per-sample mask rejecting similar samples, via depth/normal threshold. One additional little trick we also do: tag the entire quad instead of just the pixel. From our profiling this helps stencil culling efficiency (due to better spatial coherency => entire quad rejected/accepted); on average about 1 ms saved.
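The mask generation and quad tagging from note 32 might look roughly like this on the CPU. The threshold values and function names are made up for illustration, and the real version runs on the GPU and writes stencil; here a pixel is flagged for per-sample shading only if its MSAA samples differ noticeably in depth or normal, and the flag is then spread over each 2x2 quad.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_needs_per_sample(depths: Sequence[float], normals: Sequence[Vec3],
                           depth_eps: float = 0.01,
                           normal_cos_min: float = 0.9) -> bool:
    """True if any sample differs from sample 0 beyond the (hypothetical) thresholds."""
    d0, n0 = depths[0], normals[0]
    for d, n in zip(depths[1:], normals[1:]):
        if abs(d - d0) > depth_eps:
            return True
        cos_angle = n[0] * n0[0] + n[1] * n0[1] + n[2] * n0[2]
        if cos_angle < normal_cos_min:
            return True
    return False


def tag_quads(mask: List[List[bool]]) -> List[List[bool]]:
    """Tag a whole 2x2 quad if any pixel in it is tagged.

    Assumes the mask has even width and height.
    """
    height, width = len(mask), len(mask[0])
    out = [[False] * width for _ in range(height)]
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            tagged = (mask[y][x] or mask[y][x + 1]
                      or mask[y + 1][x] or mask[y + 1][x + 1])
            for dy in (0, 1):
                for dx in (0, 1):
                    out[y + dy][x + dx] = tagged
    return out
```

Tagging per quad costs some extra per-sample work on pixels that did not strictly need it, in exchange for the more coherent stencil rejection the note describes.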
  33. (Tip from Thibieroz) EvaluateAttributeAtSample vs DDX/DDY: DDX/DDY are TEX instructions, so using EvaluateAttributeAtSample will likely perform better.
  34. Motion blur and Depth of Field: both done at pixel frequency, composited into the MSAA buffer after.
  35. Motion blur and Depth of Field: both done at pixel frequency, composited into the MSAA buffer after.