SlideShare a Scribd company logo
1 of 154
4K Checkerboard
in Battlefield 1 and
Mass Effect Andromeda
Graham Wihlidal
Rendering Engineer
Frostbite Labs
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
1080p
1800p
2160p
1080p 4K
4K1080p
4K1080p
4K1080p
Motivation
► Over the past few years...
► Visual fidelity and complexity has greatly increased
► More memory (order of magnitude)
► Primitive rate (order of magnitude)
► Computational demands (PBR, Dynamic GI, SSR, etc.)
Motivation
► Over the past few years...
► Increases in rendering resolution has not
► Majority of current titles are 900p or 1080p
► It’s time that ‘perceived’ resolution got a bump!
Motivation
► Detailed geometry results in aliasing artifacts
► Lack of high resolution geometry information
► Image quality is reduced in favor of temporal stability
► Frostbite had deferred MSAA support previously
► Maintenance nightmare (i.e. shader permutations)
► Removed to reduce complexity
► Sony added hardware features to PS4™Pro
► Easier and faster decoupling of geometry and shading rate
► Less risk for initial adoption
Motivation
► Reduce shading cost in majority of graphics pipeline
► Shade only a subset of pixels
► Compute geometry information for all pixels
► Some information is lost
► 4k sampling rate  adjacent pixels are strongly correlated
► Assuming they belong to the same surface
► High quality geometry-aware resolve to reconstruct
History
► SIE (WWS ATG): PS3™ Edge MLAA (’09)
► Used Object IDs to drive MLAA edge detection in SOCOM4 [5]
► SIE (WWS ATG): PS4™ AA Prototype (’13, unreleased)
► Use EQAA for higher resolution depth
► Reconstruct full-resolution image, then resolve down
► Layer further AA techniques on resulting image
► Guerrilla Games: ’Killzone: Shadowfall’ (’13)
► Temporal super-resolution with alternating pixel-column
► Difference blend operator [3]
History
► SIE: PS4™Pro Architecture (’13-15)
► Alternating Packed Checkerboard Sampling [15]
► Texture Gradient Adjustment(*)
► Pixel Shader Invocation Control(*)
► High-Resolution Object and Primitive ID Buffers(*)
► SIE (WWS ICE/ATG): 4KCB / 4KG Demos (’15)
► First implementations of these techniques in game titles
► inFamous: First Light, Knack, Uncharted 4 - on PS4™Pro hardware [16,17]
► Ubisoft: Rainbow Six | Siege (‘15)
► MSAA checkerboard implementation [1]
► EA Frostbite | Labs : PS4™Pro Support for Frostbite ('16)
► ‘Battlefield 1’ and ‘Mass Effect Andromeda’
(*) PS4™Pro custom hardware features
Exploration
► Tried a number of high resolution techniques
► Variety of resolutions
► 50” TV at living room distance
► Super-sampling  native 4K
► Looks great!
► Perf timers don’t…
► Reduced resolution to 1800p
► Still too expensive @ 60Hz
720
900
1080
1440
1800
2160
/ 2 =
2160p
29.07ms
PS4™Pro
1800p
21.07ms
PS4™Pro
Exploration
► Variable shading rate is popular
► Checkerboard is a practical idea
► Greatly increase resolution
► Without much performance cost
Exploration
► Stencil out 1x1 blocks
► Using stencil is terrible, no performance gain
► GPU shades in 2x2 quads
► Same cost as rendering native 4K
► Throw away half of the work
► 50% inactive lanes
Exploration
► Stencil out 2x2 blocks
► Sampling distribution is bad; less correlation
► Great coverage in a 2x2 block, then a huge hole
► Blurry dilated color bits every 2nd quad
► End up blurring even more to solve
► What’s the point?
Exploration
► 2x color and 4x depth checkerboard
► Available on any MSAA platform
► Requires all shaders to load from MS textures (lots of changes)
► Sub-optimal – more on this later
► Alternatively, 2x color and 2x depth [1]
Exploration
► 1x color and 4x depth geometry resolve (4K Geometry)
► 1080p to 4K
► Was a good idea, and is similar to SRAA [14]
► Single Pass + IDs + custom reconstruction
► Quality wasn’t high enough
► Abandoned early in favor of 4K CB
► Comparable implementation cost
► More research would improve concept
Exploration
► Settled on “packed checkerboard” technique
► Started with PS4™Pro reference implementation
► Customized + optimized further, and incorporated our own TAA
Frame N Frame N+1 Resolved
Spatial + Temporal Resolve
1800p CB
15.99ms
PS4™Pro
1800p 21.07msPS4™Pro
1800p CB 15.99msPS4™Pro
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
EQAA – What is it?
► EQAA [7] (AMD) is a superset of MSAA
► Possible to store fewer color fragments than depth fragments
► Color <= ID <= Depth
► 4K checkerboard exploits this configuration
► 1x color fragment
► 2x ID fragments
► 2x depth fragments
Shading Quads – 2x Color : 4x Depth
8 Shaded, 2 Stored
ddx
ddx
ddy
ddy
Shading Quads – 1x Color : 2x Depth
4 Shaded, 2 Stored
• 2x method: 2 steps and sqrt(2) steps.
• 4x method: 2 steps and sqrt(8) steps.
ddx
ddy
EQAA Checkerboard Layout
Center
Sample 0
Sample 1
-1/4 0 +1/4
-1/4 0 +1/4
Note: Positions are specified per-quad
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
Object and Primitive ID Buffer
► Generated at a resolution higher than shading resolution
► Each visible geometry sample
► Identifier stored in image buffer
► Uniquely identifies object and primitive
► Instanced draws provide a separate object ID per instance
► Draws take pointer to an array of object IDs, one for each instance
► Tessellated draws are given a primitive ID per input patch
Object and Primitive ID Buffer
► Written during depth-only passes, or main scene render
► Depth/ID-only
► Consumes less memory bandwidth than shading passes
► Performed without shader involvement (on PS4™Pro)
► Allows asynchronous compute jobs running in parallel
► CPU and GPU overhead
► We don’t use full pre-pass
► Alpha tested objects + dominant occluders
► Main shading
► Easier to integrate if no existing depth-only pass
► Competes with memory bandwidth
► ID-only samples don’t pay for shading
Object and Primitive ID Buffer
► 32-bit, 16-bit, or 8-bit
► We use 31-bits wide
► MSB ignored by hardware
► 14-bit object ID + 17-bit primitive ID
► Primitive ID reset to 0 at the end of each instance or draw
Object and Primitive ID Buffer
► Color target bound as 8th MRT
► Uses CB’s data paths and color target tile modes
► Safe to use null pixel shader
► Not treated as a pixel shader target during ID propagation
► IDs can also be exported from vertex shader
► Could reduce number of VS wavefronts active on chip
► Many other interesting use cases
► Not covered in this presentation 
► Can be used in resolve to improve quality of final image [5]
Gradient Adjust
► Horizontal gradient is twice the expected value
► Vertical gradient is stretched and rotated 45˚
► Resembles bad anisotropic filtering
Expected Gradient Real Gradient
ddx
ddy
ddx
ddy
Gradient Adjust
► Need to apply non-uniform rescale
► Rectangular pixels
► LOD Bias
► Doesn’t work – scales uniformly
► Use SampleGrad fetch:
Expected Gradient
Real Gradient
Gradient Adjust
𝑑 𝑢𝑣𝑤 𝑑𝑥
′
𝑑 𝑢𝑣𝑤 𝑑𝑦
′ =
𝑓𝑎𝑐𝑡𝑜𝑟00 𝑓𝑎𝑐𝑡𝑜𝑟01
𝑓𝑎𝑐𝑡𝑜𝑟10 𝑓𝑎𝑐𝑡𝑜𝑟11
∗
𝑑 𝑢𝑣𝑤 𝑑𝑥
𝑑 𝑢𝑣𝑤 𝑑𝑦
Identity (No Adjustment):
𝑑 𝑢𝑣𝑤 𝑑𝑥
′
𝑑 𝑢𝑣𝑤 𝑑𝑦
′ =
1.0 0.0
0.0 1.0
∗
𝑑 𝑢𝑣𝑤 𝑑𝑥
𝑑 𝑢𝑣𝑤 𝑑𝑦
Gradient Adjust
Frame N + 1:
Frame N + 0:
𝑑 𝑢𝑣𝑤 𝑑𝑥
′
𝑑 𝑢𝑣𝑤 𝑑𝑦
′ =
0.5 0.0
−0.5 1.0
∗
𝑑 𝑢𝑣𝑤 𝑑𝑥
𝑑 𝑢𝑣𝑤 𝑑𝑦
𝑑 𝑢𝑣𝑤 𝑑𝑥
′
𝑑 𝑢𝑣𝑤 𝑑𝑦
′ =
0.5 0.0
0.5 1.0
∗
𝑑 𝑢𝑣𝑤 𝑑𝑥
𝑑 𝑢𝑣𝑤 𝑑𝑦
Gradient Adjust
► Manual gradient correction with shader ALU
► ~10% extra cost in main shading
► SampleGrad issue is more expensive than normal fetch (in cy)
► No change in bandwidth / latency
► Fetch and filter are unaffected
► Increased register pressure
► Keep derivatives around
► Need to be careful!
Gradient Adjust
► However…
► Only shade half the pixels
► Not all instructions in the shader are a fetch
► Overall win
Gradient Adjust
► PS4™Pro has special hardware and compiler support!
► Hardware can perform this in the texture unit
► Affine transform stored in texture unit
► Just turn it on  ALU portion is free
Gradient Adjust
Gradient Adjust
► Most cases can be automatically adjusted
► Explicit gradients (ddx/ddy) need to be manually corrected
► Custom filtering
► Virtual texturing
► etc.
Broken Gradients
Programmer Art 
Adjusted Gradients
Programmer Art 
Barycentric Evaluation
Pixel Positions  Sample Positions
Images courtesy of SIE [15]
Barycentric Evaluation
Pixel Positions
Images courtesy of SIE [15]
Barycentric Evaluation
 Sample Positions
Images courtesy of SIE [15]
Alpha Unrolling
► Alpha test computes depth/coverage inside pixel shader
► Instead of relying on scan-converter, like opaque
► By default, pixel shader runs at pixel rate
► All samples of a pixel share output of single shader invocation
► IDs at shading rate instead of full rate
► Serious problem with hole reconstruction
Alpha Unrolling
► Solution: Run samples at coverage rate!
► Generate full resolution depth and IDs
► Each pixel quad is unrolled
► Shading quad created per sample
► Large increase in pixel shader work
► Important to switch off when not needed
Alpha Unrolling
► Run minimal pass to calculate coverage
► Computes coverage/depth (Clip / Depth Write)
► Coverage rate (2x)
► Run expensive shading pass
► Computes color values (Depth Equals)
► Benefits from maximal hidden surface removal
► Pixel rate (1x)
Alpha Unrolling
► Positions need to be invariant!
► Positions written by coverage and shading need to match
► Subtle differences in computation can lead to z-fighting
► Disable “fast math” for everything that goes into the position
Before
Programmer Art 
After
Programmer Art 
Before
Programmer Art 
After
Programmer Art 
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
PS Invoke
► Every sample position…
► If covered by a triangle, can trigger pixel shading
► Checkerboard uses 1x color and 2x depth/IDs
► Half of the shading does not contribute to final image
► How can we prevent over-shading?
► Pixel Shader Invocation Control (PS Invoke)
► Can make samples "non-shading”
► Get the shading work back to what it would be with a single sample
FP16
► PS4™Pro has support for FP16 GCN instructions
► Used throughout checkerboard resolve shader
► 30% performance improvement
EQAA Depth Resolve
► Many passes do not require higher-resolution depth and IDs
► Don’t waste depth block (DB) bandwidth! Can be ½ instead
► Transparent objects + post-processing are candidates
► Do not write depth
► Read from single-sample!
► Resolve to single-sample depth and stencil
► Occurs after partial pre-pass + gbuffer laydown
EQAA Depth Resolve
► Simple approach is to copy depth with compute shader
► Requires depthHTILE decompression 
► Slow!
► Use color block (CB) trick to resolve without decompression!
► AMD Evergreen Acceleration document [8] describes:
► CB copy of depth to color target
► No decompression needed
► Great! But we want a usable depth surface in the end, not color
► Buckle up.... 
EQAA Depth Resolve
► Dummy shader that writes R32F 0.0f
► exp mrt0, 0.0, off, off, off vm done
► Alias the destination depth target as a color target
► 2d non-displayable thin and 1xAA depth micro tiling are the same
► Set DEPTH_COPY bit on DB_RENDER_CONTROL
► PS puts a dummy value on mrt0.x
► DB replaces it later
► The CB writes the depth to the destination Z surface
► Without HTILE!
EQAA Depth Resolve
► Stencil is done in a similar manner
► Set STENCIL_COPY bit in DB_RENDER_CONTROL
► CB writes stencil to G channel
► Destination is a color target aliasing stencil
► Only R channel present
► Adjust CB color info register to swizzle
► G to R on write
► As per COMP_SWAP [4] to configure as SWAP_ALT
EQAA Depth Resolve
EQAA Depth Resolve
► Original HTILE has correct ranges
► but has compressed ZMask codes
► Patch up the HTILE by copying it
► Fast compute shader! (6µs)
► Force ZMask bits to 0xF (expanded)
► Store patched meta data
► HTILE acceleration works on the 1xAA destination
► Any further writes will compress as expected
EQAA Depth Resolve
► HTILE copy:
EQAA Depth Resolve
► This saved us 1ms over basic copy!
► Leaves the source depth and stencil fully compressed
► DBs and CBs do the work
► Which understand compression
► Instead of shader cores doing it
► Would require decompression
► Technique is completely bandwidth-bound 
► ~0.1ms for depth, ~0.1ms for stencil
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
Post Processing
► Trade-off between performance and quality
► Move as much post-processing before CB resolve as possible
► Very time consuming and painful work
► Evolving codebase breaks checkerboard often
► New concept to many
► Auto-test checkerboard pipeline if possible
Post Processing
► Operations that ignore geometry provide little value at 4K
► i.e. SSAO, luminance estimation, etc..
► Limiting color propagation over an edge - run at lower res
► Need to account for checkerboard ”pixel grid”
► Linear sampling of checkerboard surface is problematic
► Aspect ratio of the buffer is different from normal 16:9
► Most cases you can ½ the horizontal filter width
Test Scene - Broken
Test Scene - Fixed
Post Processing
Corrected with:
Offsets UV used for clip-space position reconstruction
1600x1800
• Clear (IDs and Depth)
• Partial Z-Pass
• G-Buffer Laydown
• Resolve AA Depth
• G-Buffer Decals
• HBAO + Shadows
• Tiled Lighting + SSS
• Emissive
• Sky
• Transparency
• Velocity Vectors
3200x1800
• CB Resolve + Temporal AA
• Motion Blur
• Foreground Transparency
• Gaussian Pyramid
• Final Post-Processing
• Silhouette Outlines
3840x2160 • Display Mapping + Resample
BF1 : PS4™Pro
1600x1800
• Clear (IDs and Depth)
• Partial Z-Pass
• G-Buffer Laydown
• Resolve AA Depth
• G-Buffer Decals
• HBAO + Shadows
• Tiled Lighting + SSS
• Emissive
• Sky
• Transparency
• Velocity Vectors
3200x1800
• CB Resolve + Temporal AA
• Sprite Depth-of-Field
• Motion Blur
• Foreground Transparency
• Gaussian Pyramid
• Final Post-Processing
• Silhouette Outlines
3840x2160 • Display Mapping + Resample
MEA : PS4™Pro
Checkerboard Resolve
► Spatial Component
► Color Bounding Box
► Differential Blend Operator
► Object and Primitive IDs
► Temporal Component
► Sub-Pixel Jittering
► Velocity-Based Reprojection
► Neighborhood Clamping
Spatial Component
► Color Bounding Box
► Differential Blend Operator
► Object and Primitive IDs
Color Bounding Box
► Checkerboard resolve does not use depth
► Use color to determine if edges are soft
► Avoid contribution to a pixel from different objects
► If all neighbors are different objects, average is used
Color Bounding Box
A
B
C
Image courtesy of SIE [15]
Color Bounding Box
Copy Single Sample  Comparison Heuristic
Color Bounding Box
Copy Single Sample
Color Bounding Box
 Comparison Heuristic
Differential Blend Operator
Neighborhood Average  Differential Blending
Images courtesy of SIE [15]
Differential Blend Operator
Differential Blending  Differential Blending + IDs
Images courtesy of SIE [15]
Differential Blend Operator
Images courtesy of SIE [15]
Neighborhood Average
Differential Blend Operator
Images courtesy of SIE [15]
 Differential Blending + IDs
Differential Blend Operator
Object and Primitive IDs
Object ID
Primitive ID


Object ID
Primitive ID


Object ID
Primitive ID


Object ID
Primitive ID


Images courtesy of SIE [15]
Temporal Component
► Similar to existing research - [2][11][12]
► Not covered here
► Sub-Pixel Jittering
► Neighborhood Clamping
► Velocity-Based Reprojection
► Updated for checkerboard
Velocity-Based Reprojection
► Objects and surfaces are moving
► Pixel grid is stationary
► Need to correlate 3D surfaces in motion
► Surfaces write per-pixel velocity vectors
► Dilate velocities to keep anti-aliased edges
► Use front-most velocity in 3x3 window
► Depth is expensive to fetch!
Dilate Velocity Separately
For Each Pixel
2 Pixels Per Thread
Pass-through and Filtered
Estimate Velocity Variation
Use Average
Velocity
Is
Similar?
YES
NO
X
Pass-through Pixel
Filtered Pixel
X
Checkerboard Input
LDS
Half Res (CB)
Spatial Resolve
LDS
Full Res (No CB)
Fetch History
Temporal AA
LDS Sharpen Filter
History (Full)
Final Frame (Full)
2 Pixels Per Thread
Pass-through and Filtered
Fetch ID
(Not in LDS)
Reprojection
Checkerboard Resolve
Sharpen Filter
► Blurred image lacks high-frequency components
► Hard to discern objects, visual cortex lacks information
► Ganglion cells respond acutely to high-frequency components [6]
Sharpen Filter
► Amplify (recover) high-frequency components
► Greatly enhance visual quality
► Performed after image reconstruction in temporal AA
► Be careful to not reintroduce aliasing
► (false high-frequency)
► Be careful to not introduce ringing
► (over-sharpening)
Before
After
Before
After
Before
After
Post Processing
► Once checkerboard is in enabled, brace yourself
► Every single artifact now looks like a checkerboard artifact
► Add lots of debug overlays to help debug
► Teach others how to diagnose!
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Future Work
Render Target Aliasing
► Biggest pain of 4K transition was memory!
► Majority of our render targets were to blame
► At the time, no fancy memory management of them
► Extra effort spent here impacted other 4K improvements
► “11th hour” explicit aliasing to ship – saved ~230mb
► Future Titles
► See: FrameGraph: Extensible Rendering Architecture in Frostbite
Dynamic Resolution Scaling
► Developed by DICE and Microsoft
► Plays nicely with 4K checkerboard
► Checkerboard always active
► Dynamic scaled initial frame resolution
► Determined with running performance heuristic
Dynamic Resolution Scaling
► BF1 contains a number of infrequently running GPU tasks
► Caused resolution to slightly adjust almost every frame
► Not an issue
► Worked with jitter to provide variation in subpixel detail
► Tried preventing upscale if camera wasn’t moving
► Resulted in noticeably lower quality image
Dynamic Resolution Scaling
►No one complained about it!
Agenda
► Motivation
► Configuration
► Features
► Optimizations
► Post Processing
► Pipeline
► Conclusion
1800p CB
15.99ms
PS4™Pro
1800p
21.07ms
Performance
► Total: 15.99 ms
► Clear IDs: 0.57 ms
► Copy Expanded HTile [3x]: 0.02 ms
► Resolve EQAA Depth [3x]: 0.69 ms
► CB Resolve + Temporal AA: 1.15 ms
► Everything Else: 13.56 ms
► 1800p CB vs 1800p (saves 5.08 ms)
PS4™Pro
Performance
Timer 1800p 1800p CB
G-Buffer 4.25 ms 3.11 ms
Shadows 4.13 ms 2.63 ms
Lighting 2.31 ms 1.61 ms
Sky 0.63 ms 0.32 ms
Half Res 1.54 ms 0.77 ms
Velocity 0.57 ms 0.25 ms
HBAO 2.47 ms 1.08 ms
PS4™Pro
1800p CB
23.42ms
PS4™Pro
1800p
36.82ms
Performance
► Total: 23.42 ms
► Clear IDs: 0.41 ms
► Copy Expanded HTile [3x]: 0.02 ms
► Resolve EQAA Depth [3x]: 0.80 ms
► CB Resolve + Temporal AA: 2.00 ms
► Everything Else: 20.19 ms
► 1800p CB vs 1800p (saves 13.40 ms)
PS4™Pro
Performance
Timer 1800p 1800p CB
G-Buffer 5.11 ms 4.41 ms
Shadows 3.70 ms 3.32 ms
Lighting 13.5 ms 7.32 ms
Sky 0.50 ms 0.29 ms
Half Res 2.53 ms 1.48 ms
Velocity 0.32 ms 0.27 ms
HBAO 3.11 ms 1.27 ms
PS4™Pro
Future Work
► Further improvements to checkerboard resolve
► Alternative approaches to alpha testing
► Alpha mask export
► Pixel rate coverage (not sample)
► More uses for IDs
► Temporally stable object and primitive IDs?
► Replace stencil-based tagging [9]
► Help eliminate ghosting with temporal re-projection
► Remove heuristics in favor of comparisons
Future Work
► Packed checkerboard on other platforms
► Hardware features like ID buffer only exist on PS4™Pro
► Reasonable workarounds exist for Xbox™ and base PS4™.
► Need EQAA + programmable sample locations
► Vulkan + DirectX12
► Driver support for efficient EQAA depth resolve
► Leave source compressed
► Bandwidth-bound
Future Work
► Filtered visibility buffer [10] + decoupled shading
► G-buffers are challenging at high resolutions
► Classic deferred shading isn’t the answer
► Decouple g-buffer from screen resolution
Special Thanks
► Tobias Berghoff
► Nicolas Serres
► Tim Dann
► Tomasz Stachowiak
► Yasin Uludag
► Keven Cantin
► Mark Cerny
► Johan Andersson
► Christina Coffin
► Chris Brennan
► Timothy Lottes
► Rob Srinivasiah
► Jason Scanlin
► Dave Simpson
► Bart Wronski
► Martin Fuller
► James Stanard
► Ivan Nevraev
► Colin Barré-Brisebois
► Matthäus Chajdas
► Frostbite Rendering
► Sucker Punch
References
► [1] Rendering ‘Rainbox Six | Siege’ – Jalal El Mansouri
► [2] High Quality Temporal Supersampling – Brian Karis
► [3] Taking Killzone Shadow Fall Image Quality Into The Next Generation – Michal Valient
► [4] Radeon Southern Islands 3D/Compute Register Reference Guide – AMD
► [5] MLAA on PS3 – Tobias Berghoff, Cedric Perthuis
► [6] Hubel, D. H., and T. N. Wiesel. 1979. "Brain Mechanisms of Vision." Scientific American 241(3), pp. 150–162.
► [7] EQAA Modes for AMD 6900 Series Graphics Cards – AMD
► [8] Radeon Evergreen/Northern Islands Acceleration - AMD
► [9] Temporal Antialiasing in Uncharted 4 – Naughty Dog
► [10] The Filtered and Culled Visibility Buffer – Wolfgang Engel
► [11] Real-Time Global Illumination and Reflections in Dust 514 – Hugh Malan
► [12] Anti-Aliasing Methods in CryENGINE 3 – Tiago Sousa
► [13] Dynamic Resolution Rendering – Doug Binks
► [14] Subpixel Reconstruction Antialiasing – NVIDIA
► [15] "PlayStation®4 SDK / PS4™Pro High Resolution Technologies " - SIE (Tobias Berghoff, Tim Dann)
► [16] "Implementing 4K Checkerboard - Lessons from 4K Knack and U4 Demos", Jason Scanlin - SIE WWS ICE, May 2016
► [17] "Making the inFAMOUS 4K Demo", Tobias Berghoff - SIE WWS ATG, May 2016
Questions?
Graham Wihlidal
graham@frostbite.com
Twitter - @gwihlidal
Thank You!

More Related Content

What's hot

Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Johan Andersson
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
guest11b095
 

What's hot (20)

Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...
More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...
More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
Light prepass
Light prepassLight prepass
Light prepass
 

Viewers also liked

Deferred shading
Deferred shadingDeferred shading
Deferred shading
Frank Chao
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
Electronic Arts / DICE
 

Viewers also liked (7)

Practical Occlusion Culling in Killzone 3
Practical Occlusion Culling in Killzone 3Practical Occlusion Culling in Killzone 3
Practical Occlusion Culling in Killzone 3
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 

Similar to 4K Checkerboard in Battlefield 1 and Mass Effect Andromeda

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
changehee lee
 
NVIDIA effects GDC09
NVIDIA effects GDC09NVIDIA effects GDC09
NVIDIA effects GDC09
IGDA_London
 
Alex_Vlachos_Advanced_VR_Rendering_GDC2015
Alex_Vlachos_Advanced_VR_Rendering_GDC2015Alex_Vlachos_Advanced_VR_Rendering_GDC2015
Alex_Vlachos_Advanced_VR_Rendering_GDC2015
Alex Vlachos
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
Droidcon Berlin
 

Similar to 4K Checkerboard in Battlefield 1 and Mass Effect Andromeda (20)

A Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering SystemA Scalable Real-Time Many-Shadowed-Light Rendering System
A Scalable Real-Time Many-Shadowed-Light Rendering System
 
Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Данило Ульянич “C89 OpenGL for ARM microcontrollers on Cortex-M. Basic functi...
Данило Ульянич “C89 OpenGL for ARM microcontrollers on Cortex-M. Basic functi...Данило Ульянич “C89 OpenGL for ARM microcontrollers on Cortex-M. Basic functi...
Данило Ульянич “C89 OpenGL for ARM microcontrollers on Cortex-M. Basic functi...
 
Killzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo PostmortemKillzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo Postmortem
 
4,000 Adams at 90 Frames Per Second | Yi Fei Boon
4,000 Adams at 90 Frames Per Second | Yi Fei Boon4,000 Adams at 90 Frames Per Second | Yi Fei Boon
4,000 Adams at 90 Frames Per Second | Yi Fei Boon
 
Gpu microprocessors
Gpu microprocessorsGpu microprocessors
Gpu microprocessors
 
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
 
Developing Next-Generation Games with Stage3D (Molehill)
Developing Next-Generation Games with Stage3D (Molehill) Developing Next-Generation Games with Stage3D (Molehill)
Developing Next-Generation Games with Stage3D (Molehill)
 
Technologies Used In Graphics Rendering
Technologies Used In Graphics RenderingTechnologies Used In Graphics Rendering
Technologies Used In Graphics Rendering
 
NVIDIA effects GDC09
NVIDIA effects GDC09NVIDIA effects GDC09
NVIDIA effects GDC09
 
Alex_Vlachos_Advanced_VR_Rendering_GDC2015
Alex_Vlachos_Advanced_VR_Rendering_GDC2015Alex_Vlachos_Advanced_VR_Rendering_GDC2015
Alex_Vlachos_Advanced_VR_Rendering_GDC2015
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360
 
Edge detection-based post-processing in Warlords of Draenor
Edge detection-based post-processing in Warlords of DraenorEdge detection-based post-processing in Warlords of Draenor
Edge detection-based post-processing in Warlords of Draenor
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
 
Realtime Per Face Texture Mapping (PTEX)
Realtime Per Face Texture Mapping (PTEX)Realtime Per Face Texture Mapping (PTEX)
Realtime Per Face Texture Mapping (PTEX)
 
Xbox
XboxXbox
Xbox
 
201801 SER332 Lecture 02
201801 SER332 Lecture 02201801 SER332 Lecture 02
201801 SER332 Lecture 02
 

More from Electronic Arts / DICE

More from Electronic Arts / DICE (20)

GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game Development
 
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's EdgeSIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
 
SEED - Halcyon Architecture
SEED - Halcyon ArchitectureSEED - Halcyon Architecture
SEED - Halcyon Architecture
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time RaytracingCEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
 
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and ProceduralismCEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
 
Creativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural SystemsCreativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural Systems
 
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEEDShiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-Graphics
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
Rendering Battlefield 4 with Mantle
Rendering Battlefield 4 with MantleRendering Battlefield 4 with Mantle
Rendering Battlefield 4 with Mantle
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
Battlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + MantleBattlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + Mantle
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)
 
Modular Rigging in Battlefield 3
Modular Rigging in Battlefield 3Modular Rigging in Battlefield 3
Modular Rigging in Battlefield 3
 

Recently uploaded

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

4K Checkerboard in Battlefield 1 and Mass Effect Andromeda

  • 1. 4K Checkerboard in Battlefield 1 and Mass Effect Andromeda Graham Wihlidal Rendering Engineer Frostbite Labs
  • 2. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 3.
  • 4.
  • 5.
  • 6.
  • 7. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 8.
  • 10. 1800p
  • 11. 2160p
  • 12.
  • 14.
  • 16.
  • 18.
  • 20. Motivation ► Over the past few years... ► Visual fidelity and complexity has greatly increased ► More memory (order of magnitude) ► Primitive rate (order of magnitude) ► Computational demands (PBR, Dynamic GI, SSR, etc.)
  • 21. Motivation ► Over the past few years... ► Increases in rendering resolution has not ► Majority of current titles are 900p or 1080p ► It’s time that ‘perceived’ resolution got a bump!
  • 22.
  • 23. Motivation ► Detailed geometry results in aliasing artifacts ► Lack of high resolution geometry information ► Image quality is reduced in favor of temporal stability ► Frostbite had deferred MSAA support previously ► Maintenance nightmare (i.e. shader permutations) ► Removed to reduce complexity ► Sony added hardware features to PS4™Pro ► Easier and faster decoupling of geometry and shading rate ► Less risk for initial adoption
  • 24. Motivation ► Reduce shading cost in majority of graphics pipeline ► Shade only a subset of pixels ► Compute geometry information for all pixels ► Some information is lost ► 4k sampling rate  adjacent pixels are strongly correlated ► Assuming they belong to the same surface ► High quality geometry-aware resolve to reconstruct
  • 25. History ► SIE (WWS ATG): PS3™ Edge MLAA (’09) ► Used Object IDs to drive MLAA edge detection in SOCOM4 [5] ► SIE (WWS ATG): PS4™ AA Prototype (’13, unreleased) ► Use EQAA for higher resolution depth ► Reconstruct full-resolution image, then resolve down ► Layer further AA techniques on resulting image ► Guerrilla Games: ’Killzone: Shadowfall’ (’13) ► Temporal super-resolution with alternating pixel-column ► Difference blend operator [3]
  • 26. History ► SIE: PS4™Pro Architecture (’13-15) ► Alternating Packed Checkerboard Sampling [15] ► Texture Gradient Adjustment(*) ► Pixel Shader Invocation Control(*) ► High-Resolution Object and Primitive ID Buffers(*) ► SIE (WWS ICE/ATG): 4KCB / 4KG Demos (’15) ► First implementations of these techniques in game titles ► inFamous: First Light, Knack, Uncharted 4 - on PS4™Pro hardware [16,17] ► Ubisoft: Rainbow Six | Siege (‘15) ► MSAA checkerboard implementation [1] ► EA Frostbite | Labs : PS4™Pro Support for Frostbite ('16) ► ‘Battlefield 1’ and ‘Mass Effect Andromeda’ (*) PS4™Pro custom hardware features
  • 27. Exploration ► Tried a number of high resolution techniques ► Variety of resolutions ► 50” TV at living room distance ► Super-sampling native 4K ► Looks great! ► Perf timers don’t… ► Reduced resolution to 1800p ► Still too expensive @ 60Hz 720 900 1080 1440 1800 2160 / 2 =
  • 30. Exploration ► Variable shading rate is popular ► Checkerboard is a practical idea ► Greatly increase resolution ► Without much performance cost
  • 31. Exploration ► Stencil out 1x1 blocks ► Using stencil is terrible, no performance gain ► GPU shades in 2x2 quads ► Same cost as rendering native 4K ► Throw away half of the work ► 50% inactive lanes
  • 32. Exploration ► Stencil out 2x2 blocks ► Sampling distribution is bad; less correlation ► Great coverage in a 2x2 block, then a huge hole ► Blurry dilated color bits every 2nd quad ► End up blurring even more to solve ► What’s the point?
  • 33.
  • 34. Exploration ► 2x color and 4x depth checkerboard ► Available on any MSAA platform ► Requires all shaders to load from MS textures (lots of changes) ► Sub-optimal – more on this later ► Alternatively, 2x color and 2x depth [1]
  • 35. Exploration ► 1x color and 4x depth geometry resolve (4K Geometry) ► 1080p to 4K ► Was a good idea, and is similar to SRAA [14] ► Single Pass + IDs + custom reconstruction ► Quality wasn’t high enough ► Abandoned early in favor of 4K CB ► Comparable implementation cost ► More research would improve concept
  • 36. Exploration ► Settled on “packed checkerboard” technique ► Started with PS4™Pro reference implementation ► Customized + optimized further, and incorporated our own TAA Frame N Frame N+1 Resolved
  • 37.
  • 38.
  • 39.
  • 44. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 45. EQAA – What is it? ► EQAA [7] (AMD) is a superset of MSAA ► Possible to store fewer color fragments than depth fragments ► Color <= ID <= Depth ► 4K checkerboard exploits this configuration ► 1x color fragment ► 2x ID fragments ► 2x depth fragments
  • 46. Shading Quads – 2x Color : 4x Depth 8 Shaded, 2 Stored ddx ddx ddy ddy
  • 47. Shading Quads – 1x Color : 2x Depth 4 Shaded, 2 Stored • 2x method: 2 steps and sqrt(2) steps. • 4x method: 2 steps and sqrt(8) steps. ddx ddy
  • 48. EQAA Checkerboard Layout Center Sample 0 Sample 1 -1/4 0 +1/4 -1/4 0 +1/4 Note: Positions are specified per-quad
  • 49. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 50. Object and Primitive ID Buffer ► Generated at a resolution higher than shading resolution ► Each visible geometry sample ► Identifier stored in image buffer ► Uniquely identifies object and primitive ► Instanced draws provide a separate object ID per instance ► Draws take pointer to an array of object IDs, one for each instance ► Tessellated draws are given a primitive ID per input patch
  • 51. Object and Primitive ID Buffer ► Written during depth-only passes, or main scene render ► Depth/ID-only ► Consumes less memory bandwidth than shading passes ► Performed without shader involvement (on PS4™Pro) ► Allows asynchronous compute jobs running in parallel ► CPU and GPU overhead ► We don’t use full pre-pass ► Alpha tested objects + dominant occluders ► Main shading ► Easier to integrate if no existing depth-only pass ► Competes with memory bandwidth ► ID-only samples don’t pay for shading
  • 52. Object and Primitive ID Buffer ► 32-bit, 16-bit, or 8-bit ► We use 31-bits wide ► MSB ignored by hardware ► 14-bit object ID + 17-bit primitive ID ► Primitive ID reset to 0 at the end of each instance or draw
  • 53. Object and Primitive ID Buffer ► Color target bound as 8th MRT ► Uses CB’s data paths and color target tile modes ► Safe to use null pixel shader ► Not treated as a pixel shader target during ID propagation ► IDs can also be exported from vertex shader ► Could reduce number of VS wavefronts active on chip ► Many other interesting use cases ► Not covered in this presentation  ► Can be used in resolve to improve quality of final image [5]
  • 54.
  • 55.
  • 56.
  • 57. Gradient Adjust ► Horizontal gradient is twice the expected value ► Vertical gradient is stretched and rotated 45˚ ► Resembles bad anisotropic filtering Expected Gradient Real Gradient ddx ddy ddx ddy
  • 58. Gradient Adjust ► Need to apply non-uniform rescale ► Rectangular pixels ► LOD Bias ► Doesn’t work – scales uniformly ► Use SampleGrad fetch: Expected Gradient Real Gradient
  • 59. Gradient Adjust 𝑑 𝑢𝑣𝑤 𝑑𝑥 ′ 𝑑 𝑢𝑣𝑤 𝑑𝑦 ′ = 𝑓𝑎𝑐𝑡𝑜𝑟00 𝑓𝑎𝑐𝑡𝑜𝑟01 𝑓𝑎𝑐𝑡𝑜𝑟10 𝑓𝑎𝑐𝑡𝑜𝑟11 ∗ 𝑑 𝑢𝑣𝑤 𝑑𝑥 𝑑 𝑢𝑣𝑤 𝑑𝑦 Identity (No Adjustment): 𝑑 𝑢𝑣𝑤 𝑑𝑥 ′ 𝑑 𝑢𝑣𝑤 𝑑𝑦 ′ = 1.0 0.0 0.0 1.0 ∗ 𝑑 𝑢𝑣𝑤 𝑑𝑥 𝑑 𝑢𝑣𝑤 𝑑𝑦
  • 60. Gradient Adjust Frame N + 1: Frame N + 0: 𝑑 𝑢𝑣𝑤 𝑑𝑥 ′ 𝑑 𝑢𝑣𝑤 𝑑𝑦 ′ = 0.5 0.0 −0.5 1.0 ∗ 𝑑 𝑢𝑣𝑤 𝑑𝑥 𝑑 𝑢𝑣𝑤 𝑑𝑦 𝑑 𝑢𝑣𝑤 𝑑𝑥 ′ 𝑑 𝑢𝑣𝑤 𝑑𝑦 ′ = 0.5 0.0 0.5 1.0 ∗ 𝑑 𝑢𝑣𝑤 𝑑𝑥 𝑑 𝑢𝑣𝑤 𝑑𝑦
  • 61. Gradient Adjust ► Manual gradient correction with shader ALU ► ~10% extra cost in main shading ► SampleGrad issue is more expensive than normal fetch (in cy) ► No change in bandwidth / latency ► Fetch and filter are unaffected ► Increased register pressure ► Keep derivatives around ► Need to be careful!
  • 62. Gradient Adjust ► However… ► Only shade half the pixels ► Not all instructions in the shader are a fetch ► Overall win
  • 63. Gradient Adjust ► PS4™Pro has special hardware and compiler support! ► Hardware can perform this in the texture unit ► Affine transform stored in texture unit ► Just turn it on  ALU portion is free
  • 65. Gradient Adjust ► Most cases can be automatically adjusted ► Explicit gradients (ddx/ddy) need to be manually corrected ► Custom filtering ► Virtual texturing ► etc.
  • 68. Barycentric Evaluation Pixel Positions  Sample Positions Images courtesy of SIE [15]
  • 70. Barycentric Evaluation  Sample Positions Images courtesy of SIE [15]
  • 71.
  • 72. Alpha Unrolling ► Alpha test computes depth/coverage inside pixel shader ► Instead of relying on scan-converter, like opaque ► By default, pixel shader runs at pixel rate ► All samples of a pixel share output of single shader invocation ► IDs at shading rate instead of full rate ► Serious problem with hole reconstruction
  • 73. Alpha Unrolling ► Solution: Run samples at coverage rate! ► Generate full resolution depth and IDs ► Each pixel quad is unrolled ► Shading quad created per sample ► Large increase in pixel shader work ► Important to switch off when not needed
  • 74. Alpha Unrolling ► Run minimal pass to calculate coverage ► Computes coverage/depth (Clip / Depth Write) ► Coverage rate (2x) ► Run expensive shading pass ► Computes color values (Depth Equals) ► Benefits from maximal hidden surface removal ► Pixel rate (1x)
  • 75. Alpha Unrolling ► Positions need to be invariant! ► Positions written by coverage and shading need to match ► Subtle differences in computation can lead to z-fighting ► Disable “fast math” for everything that goes into the position
  • 80. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 81. PS Invoke ► Every sample position… ► If covered by a triangle, can trigger pixel shading ► Checkerboard uses 1x color and 2x depth/IDs ► Half of the shading does not contribute to final image ► How can we prevent over-shading? ► Pixel Shader Invocation Control (PS Invoke) ► Can make samples "non-shading” ► Get the shading work back to what it would be with a single sample
  • 82. FP16 ► PS4™Pro has support for FP16 GCN instructions ► Used throughout checkerboard resolve shader ► 30% performance improvement
  • 83. EQAA Depth Resolve ► Many passes do not require higher-resolution depth and IDs ► Don’t waste depth block (DB) bandwidth! Can be ½ instead ► Transparent objects + post-processing are candidates ► Do not write depth ► Read from single-sample! ► Resolve to single-sample depth and stencil ► Occurs after partial pre-pass + gbuffer laydown
  • 84. EQAA Depth Resolve ► Simple approach is to copy depth with compute shader ► Requires depthHTILE decompression  ► Slow! ► Use color block (CB) trick to resolve without decompression! ► AMD Evergreen Acceleration document [8] describes: ► CB copy of depth to color target ► No decompression needed ► Great! But we want a usable depth surface in the end, not color ► Buckle up.... 
  • 85. EQAA Depth Resolve ► Dummy shader that writes R32F 0.0f ► exp mrt0, 0.0, off, off, off vm done ► Alias the destination depth target as a color target ► 2d non-displayable thin and 1xAA depth micro tiling are the same ► Set DEPTH_COPY bit on DB_RENDER_CONTROL ► PS puts a dummy value on mrt0.x ► DB replaces it later ► The CB writes the depth to the destination Z surface ► Without HTILE!
  • 86. EQAA Depth Resolve ► Stencil is done in a similar manner ► Set STENCIL_COPY bit in DB_RENDER_CONTROL ► CB writes stencil to G channel ► Destination is a color target aliasing stencil ► Only R channel present ► Adjust CB color info register to swizzle ► G to R on write ► As per COMP_SWAP [4] to configure as SWAP_ALT
  • 88.
  • 89.
  • 90.
  • 91. EQAA Depth Resolve ► Original HTILE has correct ranges ► but has compressed ZMask codes ► Patch up the HTILE by copying it ► Fast compute shader! (6µs) ► Force ZMask bits to 0xF (expanded) ► Store patched meta data ► HTILE acceleration works on the 1xAA destination ► Any further writes will compress as expected
  • 92. EQAA Depth Resolve ► HTILE copy:
  • 93. EQAA Depth Resolve ► This saved us 1ms over basic copy! ► Leaves the source depth and stencil fully compressed ► DBs and CBs do the work ► Which understand compression ► Instead of shader cores doing it ► Would require decompression ► Technique is completely bandwidth-bound  ► ~0.1ms for depth, ~0.1ms for stencil
  • 94. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 95. Post Processing ► Trade-off between performance and quality ► Move as much post-processing before CB resolve as possible ► Very time consuming and painful work ► Evolving codebase breaks checkerboard often ► New concept to many ► Auto-test checkerboard pipeline if possible
  • 96. Post Processing ► Operations that ignore geometry provide little value at 4K ► i.e. SSAO, luminance estimation, etc.. ► Limiting color propagation over an edge - run at lower res ► Need to account for checkerboard ”pixel grid” ► Linear sampling of checkerboard surface is problematic ► Aspect ratio of the buffer is different from normal 16:9 ► Most cases you can ½ the horizontal filter width
  • 97. Test Scene - Broken
  • 98. Test Scene - Fixed
  • 99. Post Processing Corrected with: Offsets UV used for clip-space position reconstruction
  • 100.
  • 101. 1600x1800 • Clear (IDs and Depth) • Partial Z-Pass • G-Buffer Laydown • Resolve AA Depth • G-Buffer Decals • HBAO + Shadows • Tiled Lighting + SSS • Emissive • Sky • Transparency • Velocity Vectors 3200x1800 • CB Resolve + Temporal AA • Motion Blur • Foreground Transparency • Gaussian Pyramid • Final Post-Processing • Silhouette Outlines 3840x2160 • Display Mapping + Resample BF1 : PS4™Pro
  • 102. 1600x1800 • Clear (IDs and Depth) • Partial Z-Pass • G-Buffer Laydown • Resolve AA Depth • G-Buffer Decals • HBAO + Shadows • Tiled Lighting + SSS • Emissive • Sky • Transparency • Velocity Vectors 3200x1800 • CB Resolve + Temporal AA • Sprite Depth-of-Field • Motion Blur • Foreground Transparency • Gaussian Pyramid • Final Post-Processing • Silhouette Outlines 3840x2160 • Display Mapping + Resample MEA : PS4™Pro
  • 103. Checkerboard Resolve ► Spatial Component ► Color Bounding Box ► Differential Blend Operator ► Object and Primitive IDs ► Temporal Component ► Sub-Pixel Jittering ► Velocity-Based Reprojection ► Neighborhood Clamping
  • 104. Spatial Component ► Color Bounding Box ► Differential Blend Operator ► Object and Primitive IDs
  • 105. Color Bounding Box ► Checkerboard resolve does not use depth ► Use color to determine if edges are soft ► Avoid contribution to a pixel from different objects ► If all neighbors are different objects, average is used
  • 106. Color Bounding Box A B C Image courtesy of SIE [15]
  • 107. Color Bounding Box Copy Single Sample  Comparison Heuristic
  • 108. Color Bounding Box Copy Single Sample
  • 109. Color Bounding Box  Comparison Heuristic
  • 110. Differential Blend Operator Neighborhood Average  Differential Blending Images courtesy of SIE [15]
  • 111. Differential Blend Operator Differential Blending  Differential Blending + IDs Images courtesy of SIE [15]
  • 112. Differential Blend Operator Images courtesy of SIE [15] Neighborhood Average
  • 113. Differential Blend Operator Images courtesy of SIE [15]  Differential Blending + IDs
  • 115. Object and Primitive IDs Object ID Primitive ID   Object ID Primitive ID   Object ID Primitive ID   Object ID Primitive ID   Images courtesy of SIE [15]
  • 116. Temporal Component ► Similar to existing research - [2][11][12] ► Not covered here ► Sub-Pixel Jittering ► Neighborhood Clamping ► Velocity-Based Reprojection ► Updated for checkerboard
  • 117. Velocity-Based Reprojection ► Objects and surfaces are moving ► Pixel grid is stationary ► Need to correlate 3D surfaces in motion ► Surfaces write per-pixel velocity vectors ► Dilate velocities to keep anti-aliased edges ► Use front-most velocity in 3x3 window ► Depth is expensive to fetch!
  • 118. Dilate Velocity Separately For Each Pixel 2 Pixels Per Thread Pass-through and Filtered Estimate Velocity Variation Use Average Velocity Is Similar? YES NO X Pass-through Pixel Filtered Pixel X
  • 119.
  • 120. Checkerboard Input LDS Half Res (CB) Spatial Resolve LDS Full Res (No CB) Fetch History Temporal AA LDS Sharpen Filter History (Full) Final Frame (Full) 2 Pixels Per Thread Pass-through and Filtered Fetch ID (Not in LDS) Reprojection Checkerboard Resolve
  • 121. Sharpen Filter ► Blurred image lacks high-frequency components ► Hard to discern objects, visual cortex lacks information ► Ganglion cells respond acutely to high-frequency components [6]
  • 122. Sharpen Filter ► Amplify (recover) high-frequency components ► Greatly enhance visual quality ► Performed after image reconstruction in temporal AA ► Be careful to not reintroduce aliasing ► (false high-frequency) ► Be careful to not introduce ringing ► (over-sharpening)
  • 123. Before
  • 124. After
  • 125. Before
  • 126. After
  • 127. Before
  • 128. After
  • 129.
  • 130. Post Processing ► Once checkerboard is in enabled, brace yourself ► Every single artifact now looks like a checkerboard artifact ► Add lots of debug overlays to help debug ► Teach others how to diagnose!
  • 131.
  • 132.
  • 133.
  • 134.
  • 135.
  • 136. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Future Work
  • 137. Render Target Aliasing ► Biggest pain of 4K transition was memory! ► Majority of our render targets were to blame ► At the time, no fancy memory management of them ► Extra effort spent here impacted other 4K improvements ► “11th hour” explicit aliasing to ship – saved ~230mb ► Future Titles ► See: FrameGraph: Extensible Rendering Architecture in Frostbite
  • 138.
  • 139. Dynamic Resolution Scaling ► Developed by DICE and Microsoft ► Plays nicely with 4K checkerboard ► Checkerboard always active ► Dynamic scaled initial frame resolution ► Determined with running performance heuristic
  • 140. Dynamic Resolution Scaling ► BF1 contains a number of infrequently running GPU tasks ► Caused resolution to slightly adjust almost every frame ► Not an issue ► Worked with jitter to provide variation in subpixel detail ► Tried preventing upscale if camera wasn’t moving ► Resulted in noticeably lower quality image
  • 141. Dynamic Resolution Scaling ►No one complained about it!
  • 142. Agenda ► Motivation ► Configuration ► Features ► Optimizations ► Post Processing ► Pipeline ► Conclusion
  • 144. Performance ► Total: 15.99 ms ► Clear IDs: 0.57 ms ► Copy Expanded HTile [3x]: 0.02 ms ► Resolve EQAA Depth [3x]: 0.69 ms ► CB Resolve + Temporal AA: 1.15 ms ► Everything Else: 13.56 ms ► 1800p CB vs 1800p (saves 5.08 ms) PS4™Pro
  • 145. Performance Timer 1800p 1800p CB G-Buffer 4.25 ms 3.11 ms Shadows 4.13 ms 2.63 ms Lighting 2.31 ms 1.61 ms Sky 0.63 ms 0.32 ms Half Res 1.54 ms 0.77 ms Velocity 0.57 ms 0.25 ms HBAO 2.47 ms 1.08 ms PS4™Pro
  • 147. Performance ► Total: 23.42 ms ► Clear IDs: 0.41 ms ► Copy Expanded HTile [3x]: 0.02 ms ► Resolve EQAA Depth [3x]: 0.80 ms ► CB Resolve + Temporal AA: 2.00 ms ► Everything Else: 20.19 ms ► 1800p CB vs 1800p (saves 13.40 ms) PS4™Pro
  • 148. Performance Timer 1800p 1800p CB G-Buffer 5.11 ms 4.41 ms Shadows 3.70 ms 3.32 ms Lighting 13.5 ms 7.32 ms Sky 0.50 ms 0.29 ms Half Res 2.53 ms 1.48 ms Velocity 0.32 ms 0.27 ms HBAO 3.11 ms 1.27 ms PS4™Pro
  • 149. Future Work ► Further improvements to checkerboard resolve ► Alternative approaches to alpha testing ► Alpha mask export ► Pixel rate coverage (not sample) ► More uses for IDs ► Temporally stable object and primitive IDs? ► Replace stencil-based tagging [9] ► Help eliminate ghosting with temporal re-projection ► Remove heuristics in favor of comparisons
  • 150. Future Work ► Packed checkerboard on other platforms ► Hardware features like ID buffer only exist on PS4™Pro ► Reasonable workarounds exist for Xbox™ and base PS4™. ► Need EQAA + programmable sample locations ► Vulkan + DirectX12 ► Driver support for efficient EQAA depth resolve ► Leave source compressed ► Bandwidth-bound
  • 151. Future Work ► Filtered visibility buffer [10] + decoupled shading ► G-buffers are challenging at high resolutions ► Classic deferred shading isn’t the answer ► Decouple g-buffer from screen resolution
  • 152. Special Thanks ► Tobias Berghoff ► Nicolas Serres ► Tim Dann ► Tomasz Stachowiak ► Yasin Uludag ► Keven Cantin ► Mark Cerny ► Johan Andersson ► Christina Coffin ► Chris Brennan ► Timothy Lottes ► Rob Srinivasiah ► Jason Scanlin ► Dave Simpson ► Bart Wronski ► Martin Fuller ► James Stanard ► Ivan Nevraev ► Colin Barré-Brisebois ► Matthäus Chajdas ► Frostbite Rendering ► Sucker Punch
  • 153. References ► [1] Rendering ‘Rainbox Six | Siege’ – Jalal El Mansouri ► [2] High Quality Temporal Supersampling – Brian Karis ► [3] Taking Killzone Shadow Fall Image Quality Into The Next Generation – Michal Valient ► [4] Radeon Southern Islands 3D/Compute Register Reference Guide – AMD ► [5] MLAA on PS3 – Tobias Berghoff, Cedric Perthuis ► [6] Hubel, D. H., and T. N. Wiesel. 1979. "Brain Mechanisms of Vision." Scientific American 241(3), pp. 150–162. ► [7] EQAA Modes for AMD 6900 Series Graphics Cards – AMD ► [8] Radeon Evergreen/Northern Islands Acceleration - AMD ► [9] Temporal Antialiasing in Uncharted 4 – Naughty Dog ► [10] The Filtered and Culled Visibility Buffer – Wolfgang Engel ► [11] Real-Time Global Illumination and Reflections in Dust 514 – Hugh Malan ► [12] Anti-Aliasing Methods in CryENGINE 3 – Tiago Sousa ► [13] Dynamic Resolution Rendering – Doug Binks ► [14] Subpixel Reconstruction Antialiasing – NVIDIA ► [15] "PlayStation®4 SDK / PS4™Pro High Resolution Technologies " - SIE (Tobias Berghoff, Tim Dann) ► [16] "Implementing 4K Checkerboard - Lessons from 4K Knack and U4 Demos", Jason Scanlin - SIE WWS ICE, May 2016 ► [17] "Making the inFAMOUS 4K Demo", Tobias Berghoff - SIE WWS ATG, May 2016