RENDERING BATTLEFIELD 4
WITH MANTLE
Johan Andersson – Electronic Arts
2
3
DX11 vs. Mantle (Core i7-3970x, AMD Radeon R9 290x, 1080p ULTRA)
 DX11: Avg 78 fps, Min 42 fps
 Mantle: Avg 120 fps, Min 94 fps
 +58%!
4
BF4 MANTLE GOALS
Goals:
– Significantly improve CPU performance
– More consistent & stable performance
– Improve GPU performance where possible
– Add support for a new Mantle rendering
backend in a live game
 Minimize changes to engine interfaces
 Compatible with built PC content
– Work on wide set of hardware
 APU to quad-GPU
 But x64 only (32-bit Windows needs to die)
Non-goals:
– Design new renderer from scratch for Mantle
– Take advantage of asymmetric MGPU
(APU+discrete)
– Optimize video memory consumption
5
BF4 MANTLE STRATEGIC GOALS
 Prove that low-level graphics APIs work outside of consoles
 Push the industry towards low-level graphics APIs everywhere
 Build a foundation for the future that we can build great games on
6
SHADERS
7
SHADERS
 Shader resource bind points replaced with a resource table object - descriptor set
– This is how the hardware accesses the shader resources
– Flat list of images, buffers and samplers used by any of the shader stages
– Vertex shader streams converted to vertex shader buffer loads
 Engine assigns each shader resource to a specific slot in the descriptor set(s)
– Can share slots between shader stages = smaller descriptor sets
– The mapping takes a while to wrap one’s head around
8
SHADER CONVERSION
 DX11 bytecode shaders get converted to AMDIL & mapping applied using ILC tool
– Done at load time
– Don’t have to change our shaders!
 Have full source & control over the process
 Could write AMDIL directly or use other frontends if wanted
9
DESCRIPTOR SETS
 Very simple usage in BF4: for each draw call write flat list of resources
– Essentially direct replacement of SetTexture/SetConstantBuffer/SetInputStream
 Single dynamic descriptor set object per frame
 Sub-allocate for each draw call and write list of resources
 ~15000 resource slots written per frame in BF4, still very fast
10
DESCRIPTOR SETS
11
DESCRIPTOR SETS – FUTURE OPTIMIZATIONS
 Use static descriptor sets when possible
 Reduce resource duplication by reusing & sharing more across shader stages
 Nested descriptor sets
12
COMPUTE PIPELINES
 1:1 mapping between pipeline & shader
 No state built into pipeline
 Can execute in parallel with rendering
 ~100 compute pipelines in BF4
13
GRAPHICS PIPELINES
 All graphics shader stages combined to a single pipeline object together with important graphics state
 ~10000 graphics pipelines in BF4 on a single level, ~25 MB of video memory
 Could use smaller working pool of active state objects to keep reasonable amount in memory
– Has not been required for us
14
PRE-BUILDING PIPELINES
 Graphics pipeline creation is an expensive operation, do it at load time instead of runtime!
– Each of our graphics pipelines takes ~10-60 ms to create
– Pre-build using N parallel low-priority jobs
– Avoid 99.9% of runtime stalls caused by pipeline creation!
 Requires knowing the graphics pipeline state that will be used with the shaders
– Primitive type
– Render target formats
– Render target write masks
– Blend modes
 Not fully trivial to know all state, may require engine changes / pre-defining use cases
– Important to design for!
15
PIPELINE CACHE
 Cache built pipelines both in memory cache and disk cache
– Improved loading times
– Max 300 MB
– Simple LRU policy
– LZ4 compressed (free)
 Database signature:
– Driver version
– Vendor ID
– Device ID
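As an illustration of the in-memory half of such a cache, here is a minimal sketch of a size-bounded LRU cache guarded by a driver/vendor/device signature. All names (`PipelineCache`, `CacheSignature`, the 64-bit key) are assumptions made for this example; disk persistence and LZ4 compression are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Signature stored with the cache; a mismatch means the cached binaries are stale.
struct CacheSignature {
    std::string driverVersion;
    std::uint32_t vendorId;
    std::uint32_t deviceId;
    bool operator==(const CacheSignature& o) const {
        return driverVersion == o.driverVersion && vendorId == o.vendorId && deviceId == o.deviceId;
    }
};

class PipelineCache {
public:
    using Blob = std::vector<std::uint8_t>;  // hypothetical serialized pipeline

    PipelineCache(CacheSignature sig, std::size_t maxBytes)
        : sig_(std::move(sig)), maxBytes_(maxBytes), bytes_(0) {}

    // Drop everything if the driver or device changed since the cache was written.
    void validate(const CacheSignature& current) {
        if (!(sig_ == current)) {
            lru_.clear();
            index_.clear();
            bytes_ = 0;
            sig_ = current;
        }
    }

    // Returns the cached blob and marks it most recently used, or nullptr on a miss.
    const Blob* find(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        lru_.splice(lru_.begin(), lru_, it->second);
        return &it->second->second;
    }

    void insert(std::uint64_t key, Blob blob) {
        if (blob.size() > maxBytes_) return;  // can never fit in the budget
        auto it = index_.find(key);
        if (it != index_.end()) {             // replace an existing entry
            bytes_ -= it->second->second.size();
            lru_.erase(it->second);
            index_.erase(it);
        }
        bytes_ += blob.size();
        lru_.emplace_front(key, std::move(blob));
        index_[key] = lru_.begin();
        while (bytes_ > maxBytes_) {          // evict least recently used entries
            bytes_ -= lru_.back().second.size();
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }

    std::size_t bytes() const { return bytes_; }

private:
    using Entry = std::pair<std::uint64_t, Blob>;
    CacheSignature sig_;
    std::size_t maxBytes_, bytes_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> index_;
};
```

The key would typically be a hash of the shader bytecode plus the pipeline state listed on the previous slide; that choice is also an assumption here.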
16
MEMORY
17
MEMORY MANAGEMENT
 Mantle devices expose multiple memory heaps with different characteristics
– Can be different between devices, drivers and OSes
 User explicitly places resources in wanted heaps
– Driver suggests preferred heaps when creating objects, not a requirement
Type    Size      Page   CPU access                                              GPU Read  GPU Write  CPU Read  CPU Write
Local   256 MB    65535  CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined  130       170        0.0058    2.8
Local   4096 MB   65535  -                                                       130       180        0         0
Remote  16106 MB  65535  CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined  2.6       2.6        0.1       3.3
Remote  16106 MB  65535  CpuVisible|CpuGpuCoherent                               2.6       2.6        3.2       2.9
18
FROSTBITE MEMORY HEAPS
 System Shared Mapped
– CPU memory that is GPU visible.
– Write combined & persistently mapped = easy
& fast to write to in parallel at any time
 System Shared Pinned
– CPU cached for readback.
– Not used much
 Video Shared
– GPU memory accessible by CPU. Used for
descriptor sets and dynamic buffers
– Max 256 MB (legacy constraint)
– Avoid keeping it persistently mapped, as WDDM
doesn’t like this and can decide to move it back
to CPU memory ☹
 Video Private
– GPU private memory.
– Used for render targets, textures and other
resources CPU does not need to access
19
MEMORY REFERENCES
 WDDM needs to know which memory allocations are referenced for each command buffer
– In order to make sure they are resident and not paged out
– Max ~1700 memory references are supported
– Overhead with having lots of references
 Engine needs to keep track of what memory is referenced while building the command buffers
– Easy & fast to do
– Each reference is either read-only or read/write
– We use a simple global list of references shared for all command buffers.
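A simple global reference list of this kind might look like the sketch below. `MemoryReferenceList` and `MemoryHandle` are hypothetical names, the limit is passed in rather than hard-coded, and the sketch is single-threaded; parallel command buffer building would need a lock or per-thread lists merged at submit time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using MemoryHandle = std::uint64_t;  // hypothetical opaque allocation id

class MemoryReferenceList {
public:
    explicit MemoryReferenceList(std::size_t maxRefs) : maxRefs_(maxRefs) {}

    // Record that an allocation is used by some command buffer this frame.
    // A read/write use upgrades an earlier read-only reference; a read-only
    // use never downgrades an existing read/write one.
    // Returns false if the list is already at its limit.
    bool add(MemoryHandle mem, bool readOnly) {
        auto it = refs_.find(mem);
        if (it != refs_.end()) {
            it->second = it->second && readOnly;
            return true;
        }
        if (refs_.size() >= maxRefs_) return false;
        refs_.emplace(mem, readOnly);
        return true;
    }

    bool isReadOnly(MemoryHandle mem) const { return refs_.at(mem); }
    std::size_t size() const { return refs_.size(); }

private:
    std::size_t maxRefs_;
    std::unordered_map<MemoryHandle, bool> refs_;  // value: true = read-only
};
```

Because resources are sub-allocated from pooled chunks, many resources collapse into one reference, which helps stay under the reference limit.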
20
MEMORY POOLING
 Pooling memory allocations was required for us
– Sub-allocate within larger 1–32 MB chunks
– All resources store memory handle + offset
– Not as elegant as just void* on consoles
– Fragmentation can be a concern, but has not been much of an issue for us in practice
 GPU virtual memory mapping is fully supported, can simplify & optimize management
21
OVERCOMMITTING VIDEO MEMORY
 Avoid overcommitting video memory!
– Will lead to severe stalls as VidMM moves blocks of memory back and forth
– VidMM is a black box ☹
– One of the biggest issues we ran into during development
 Recommendations
– Balance memory pools
– Make sure to use read-only memory references
– Use memory priorities
22
MEMORY PRIORITIES
 Setting priorities on the memory allocations helps VidMM choose what to page out when it has to
 5 priority levels
– Very high = Render targets with MSAA
– High = Render targets and UAVs
– Normal = Textures
– Low = Shader & constant buffers
– Very low = vertex & index buffers
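The priority table above maps naturally onto a small lookup function. The enum and function names below are hypothetical, and "Shader & constant buffers" is read here as shaders plus constant buffers, which is an interpretation of the slide rather than something it states.

```cpp
#include <cassert>

enum class MemoryPriority { VeryLow, Low, Normal, High, VeryHigh };

enum class ResourceKind {
    VertexBuffer, IndexBuffer, Shader, ConstantBuffer, Texture, RenderTarget, Uav
};

// Mirrors the priority table above; `msaa` only matters for render targets.
inline MemoryPriority priorityFor(ResourceKind kind, bool msaa = false) {
    switch (kind) {
        case ResourceKind::RenderTarget:
            return msaa ? MemoryPriority::VeryHigh : MemoryPriority::High;
        case ResourceKind::Uav:
            return MemoryPriority::High;
        case ResourceKind::Texture:
            return MemoryPriority::Normal;
        case ResourceKind::Shader:
        case ResourceKind::ConstantBuffer:
            return MemoryPriority::Low;
        case ResourceKind::VertexBuffer:
        case ResourceKind::IndexBuffer:
            return MemoryPriority::VeryLow;
    }
    return MemoryPriority::Normal;  // not reached for valid enum values
}
```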
23
MEMORY RESIDENCY FUTURE
 For best results manage which resources are in video memory yourself & keep only ~80% used
– Avoid all stalls
– Can async DMA in and out
 We are thinking of redesigning to fully avoid possibility of overcommitting
 Hoping WDDM’s memory residency management can be simplified & improved in the future
24
RESOURCE MANAGEMENT
25
RESOURCE LIFETIMES
 App manages lifetime of all resources
– Have to make sure GPU is not using an object or memory while we are freeing it on the CPU
– How we’ve always worked with GPUs on the consoles
– Multi-GPU adds some additional complexity that consoles do not have
 We keep track of lifetimes on a per frame granularity
– Queues for object destruction & free memory operations
– Add to queue at any time on the CPU
– Process queues when GPU command buffers for the frame are done executing
– Tracked with command buffer fences
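Per-frame deferred destruction can be sketched as a queue of callbacks tagged with the frame that last used the object. `DeferredDestroyQueue` and its methods are hypothetical names; the caller is assumed to learn from a command buffer fence which frame the GPU has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class DeferredDestroyQueue {
public:
    // Callable from any CPU thread: run `fn` once the GPU has finished `frame`.
    void enqueue(std::uint64_t frame, std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back({frame, std::move(fn)});
    }

    // Call after the fence for `completedFrame` has signaled.
    // Runs everything queued for that frame or earlier; returns how many ran.
    std::size_t process(std::uint64_t completedFrame) {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::deque<Item> keep;
            for (auto& item : pending_) {
                if (item.frame <= completedFrame) ready.push_back(std::move(item.fn));
                else keep.push_back(std::move(item));
            }
            pending_.swap(keep);
        }
        for (auto& fn : ready) fn();  // run outside the lock
        return ready.size();
    }

private:
    struct Item {
        std::uint64_t frame;
        std::function<void()> fn;
    };
    std::mutex mutex_;
    std::deque<Item> pending_;
};
```

With multiple GPUs, one such queue per GPU (or a frame counter per GPU) would be needed, since each GPU finishes its frames independently.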
26
LINEAR FRAME ALLOCATOR
 We use multiple linear allocators with Mantle for both transient buffers & images
– Used for huge amount of small constant data and other GPU frame data that CPU writes
– Easy to use and very low overhead
– Don’t have to care about lifetimes or state
 Fixed memory buffers for each frame
– Super cheap sub-allocation from any thread
– If full, use heap allocation (also fast due to pooling)
 Alternative: ring buffers
– Requires being able to stall & drain pipeline at any allocation if full, additional complexity for us
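A thread-safe linear allocator over a fixed per-frame buffer can be as small as an atomic bump pointer. The sketch below hands out offsets only; `LinearFrameAllocator` and `kFull` are hypothetical names, and the backing memory is assumed to be a persistently mapped, CPU-writable buffer owned elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

class LinearFrameAllocator {
public:
    static constexpr std::size_t kFull = static_cast<std::size_t>(-1);

    explicit LinearFrameAllocator(std::size_t capacity) : capacity_(capacity), head_(0) {}

    // Lock-free bump allocation. `alignment` must be a power of two.
    // Returns the offset, or kFull so the caller can fall back to a heap allocation.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t old = head_.load(std::memory_order_relaxed);
        for (;;) {
            const std::size_t start = (old + alignment - 1) & ~(alignment - 1);
            if (start > capacity_ || size > capacity_ - start) return kFull;
            // On failure `old` is refreshed with the current head and we retry.
            if (head_.compare_exchange_weak(old, start + size, std::memory_order_relaxed)) {
                return start;
            }
        }
    }

    // Call when the GPU is done with the frame that used this buffer.
    void reset() { head_.store(0, std::memory_order_relaxed); }

private:
    std::size_t capacity_;
    std::atomic<std::size_t> head_;
};
```

Using one allocator instance per in-flight frame avoids the stall-and-drain logic that a ring buffer would need when it fills up.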
27
TILING
 Textures should be tiled for performance
– Explicitly handled in Mantle, user selects linear or tiled
– Some formats (BC) can’t be accessed as linear by the GPU
 On consoles we handle tiling offline as part of our data processing pipeline
– We know the exact tiling formats and have separate resources per platform
 For Mantle
– Tiling formats are opaque, can be different between GPU architectures and image types
– Tile textures with DMA image upload from SystemShared to VideoPrivate
 Linear source, tiled destination
 Free
28
COMMAND BUFFERS
29
COMMAND BUFFERS
 Command buffers are the atomic unit of work dispatched to the GPU
– Separate creation from execution
– No “immediate context” a la DX11 that can execute work at any call
– Makes resource synchronization and setup significantly easier & faster
 Typical BF4 scenes have ~50 command buffers per frame
– Reasonable tradeoff for us with submission overhead vs CPU load-balancing
30
COMMAND BUFFER SOURCES
 Frostbite has 2 separate sources of command buffers
– World rendering
 Rendering the world with tons of objects, lots of draw calls. Have all frame data up front
 All resources except for render targets are read-only
 Generated in parallel up front each frame
– Immediate rendering (“the rest”)
 Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc
 Managing resource state, memory and running on different queues (graphics, compute, DMA)
 Sequentially generated in a single job, simulate an immediate context by splitting the command buffer
 Both are very important and have different requirements
31
RESOURCE TRANSITIONS
 Key design in Mantle to significantly lower driver overhead & complexity
– Explicit hazard tracking by the app/engine
– Drives architecture-specific caches & compression
– AMD: FMASK, CMASK, HTILE
– Enables explicit memory management
 Examples:
– Optimal render target writes → Graphics shader read-only
– Compute shader write-only → DrawIndirect arguments
 Mantle has a strong validation layer that tracks transitions which is a major help
32
MANAGING RESOURCE TRANSITIONS
 Engines need a clear design on how to handle state transitions
 Multiple approaches possible:
– Sequential in-order command buffers
 Generate one command buffer at a time, in order
 Transition resources on-demand when doing operation on them, very simple
 Recommendation: start with this
– Out-of-order multiple command buffers
 Track state per command buffer, fix up transitions when order of command buffers is known
– Hybrid approaches & more
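The recommended starting point, sequential in-order command buffers with on-demand transitions, could be modeled like this. The state names and the `StateTracker` class are made up for the example and are not Mantle enums; a real tracker would emit the transition into the command buffer instead of a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified set of resource states.
enum class ResourceState { Undefined, RenderTargetWrite, ShaderRead, ComputeWrite, IndirectArgs };

using ResourceId = std::uint32_t;

struct Transition {
    ResourceId resource;
    ResourceState from;
    ResourceState to;
};

class StateTracker {
public:
    // Call before each operation that touches `res`.
    // Records a transition only when the tracked state actually changes.
    void require(ResourceId res, ResourceState needed) {
        ResourceState& current = states_[res];  // new entries start as Undefined
        if (current == needed) return;
        recorded_.push_back({res, current, needed});
        current = needed;
    }

    const std::vector<Transition>& recorded() const { return recorded_; }

private:
    std::unordered_map<ResourceId, ResourceState> states_;  // one state per resource
    std::vector<Transition> recorded_;
};
```

This only works when command buffers are generated in submission order; the out-of-order approach needs per-command-buffer state and a fix-up pass once the order is known.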
33
MANAGING RESOURCE TRANSITIONS IN FROSTBITE
 Current approach in Frostbite is quite basic:
– We keep track of a single state for each resource (not subresource)
– The “immediate rendering” transitions resources as needed depending on operation
– The out of order “world rendering” command buffers don’t need to transition states
 Already have write access to MRTs and read-access to all resources setup outside them
 Avoids the problem of them not knowing the state during generation
 Works now but as we do more general parallel rendering it will have to change
– Track resource state for each command buffer & fixup between command buffers
34
DYNAMIC STATE OBJECTS
 Graphics state is only set with the pipeline object and 5 dynamic state objects
– State objects: color blend, raster, viewport, depth-stencil, MSAA
– No other parameters such as in DX11 with stencil ref or SetViewport functions
 Frostbite use case:
– Pre-create when possible
– Otherwise on-demand creation (hash map)
– Only ~100 state objects!
 Still possible to end up with lots of state objects
– Esp. with state object float & integer values (depth bounds, depth bias, viewport)
– But no need to store all permutations in memory, objects are fast to create & app manages lifetimes
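On-demand creation through a hash map might look like the following sketch for viewport state. `ViewportDesc`, `ViewportStateCache` and the integer `StateObject` handle are hypothetical; the counter stands in for whatever driver call actually creates the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>

struct ViewportDesc {
    float x, y, width, height, minDepth, maxDepth;
    bool operator==(const ViewportDesc& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height &&
               minDepth == o.minDepth && maxDepth == o.maxDepth;
    }
};

struct ViewportDescHash {
    std::size_t operator()(const ViewportDesc& v) const {
        std::size_t h = 0;
        for (float f : {v.x, v.y, v.width, v.height, v.minDepth, v.maxDepth}) {
            h = h * 31 + std::hash<float>()(f);  // simple combine, enough for a sketch
        }
        return h;
    }
};

class ViewportStateCache {
public:
    using StateObject = std::uint32_t;  // hypothetical handle

    // Returns the existing state object for `desc`, creating one on first use.
    StateObject get(const ViewportDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;
        const StateObject created = next_++;  // stands in for the real create call
        cache_.emplace(desc, created);
        return created;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<ViewportDesc, StateObject, ViewportDescHash> cache_;
    StateObject next_ = 1;
};
```

Since the objects are cheap to create, rarely used entries could simply be destroyed and recreated later rather than kept forever.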
35
QUEUES
36
QUEUES
 Universal queue can do graphics, compute and presents
 We also use additional queues to parallelize GPU operations:
– DMA queue – Improve perf with faster transfers & avoid idling graphics while transferring
– Compute queue - Improve perf by utilizing idle ALU and update resources simultaneously with gfx
 More GPUs = more queues!
37
QUEUES SYNCHRONIZATION
 Order of execution within a queue is sequential
 Synchronize multiple queues with GPU semaphores (signal & wait)
 Also works across multiple GPUs
[Figure: Compute and Graphics queue timelines synchronized with semaphore signals (S) and waits (W)]
38
QUEUES SYNCHRONIZATION CONT
 Started out with explicit semaphores
– Error prone to handle when having lots of different semaphores & queues
– Difficult to visualize & debug
 Switched to a representation more similar to a job graph
 Just a model on top of the semaphores
39
GPU JOB GRAPH
 Each GPU job has list of dependencies (other command buffers)
 Dependencies have to finish first before job can run on its queue
 The dependencies can be from any queue
 Was easier to work with, debug and visualize
 Really extendable going forward
[Figure: GPU job graph with nodes Graphics 1, DMA, Compute and Graphics 2]
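Since the job graph is just a model on top of semaphores, one way to picture it is a function that turns job dependencies into signal/wait pairs. The types and `buildSemaphores` below are hypothetical, and the sketch assumes jobs are listed in submission order with dependencies pointing only at earlier jobs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Queue { Graphics, Compute, Dma };

struct GpuJob {
    Queue queue;
    std::vector<std::size_t> dependencies;  // indices of jobs that must finish first
};

// One semaphore: signaled after job `signalAfter`, waited on before job `waitBefore`.
struct SemaphoreEdge {
    std::size_t signalAfter;
    std::size_t waitBefore;
};

// Same-queue dependencies need no semaphore because a queue executes sequentially;
// only dependencies that cross queues are turned into signal/wait pairs.
std::vector<SemaphoreEdge> buildSemaphores(const std::vector<GpuJob>& jobs) {
    std::vector<SemaphoreEdge> edges;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        for (std::size_t dep : jobs[i].dependencies) {
            assert(dep < i && "dependencies must refer to earlier jobs");
            if (jobs[dep].queue != jobs[i].queue) edges.push_back({dep, i});
        }
    }
    return edges;
}
```

Keeping the dependencies as data also makes the graph easy to dump and visualize, which is harder with hand-placed semaphores.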
40
ASYNC DMA
 AMD GPUs have dedicated hardware DMA engines, let’s use them!
– Uploading through DMA is faster than on universal queue, even if blocking
– DMA has alignment restrictions, have to support falling back to copies on universal queue
 Use case: Frame buffer & texture uploads
– Used by resource initial data uploads and our UpdateSubresource
– Guaranteed to be finished before the GPU universal queue starts rendering the frame
 Use case: Multi-GPU frame buffer copy
– Peer-to-peer copy of the frame buffer to the GPU that will present it
41
ASYNC COMPUTE
 Frostbite has lots of compute shader passes that could run in parallel with graphics work
– HBAO, blurring, classification, tile-based lighting, etc
 Running as async compute can improve GPU performance by utilizing “free” ALU
– For example while doing shadowmap rendering (ROP bound)
42
ASYNC COMPUTE – TILE-BASED LIGHTING
 3 sequential compute shaders
– Input: zbuffer & gbuffer
– Output: HDR texture/UAV
 Runs in parallel with graphics pipeline that renders to other targets
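[Figure: Graphics queue timeline (Gbuffer, Shadowmaps, Reflection, Distort, Transp) with a parallel Compute queue (TileZ, Cull lights, Lighting), synchronized by semaphore signals (S) and waits (W)]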
43
ASYNC COMPUTE – TILE-BASED LIGHTING
 We manually prepare the resources for the async compute
– Important to not access the resources on other queues at the same time (unless read-only state)
– Have to transition resources on the queue that last used it
 Up to 80% faster in our initial tests, but not fully reliable
– But is a pretty small part of the frame time
– Not in BF4 yet
44
MULTI-GPU
45
MULTI-GPU
 Multi-GPU alternatives:
– AFR – Alternate Frame Rendering (1-4 GPUs of the same power)
– Heterogeneous AFR – 1 small + 1 big GPU (APU + Discrete)
– SFR – Split Frame Rendering
– Multi-GPU Job Graph – Primary strong GPU + slave GPUs helping
 Frostbite supports AFR natively
– No synchronization points within the frame
– For resources that are not rendered every frame: re-render resources for each GPU
 Example: sky envmap update on weather change
 With Mantle multi-GPU is explicit and we have to build support for it ourselves
46
MULTI-GPU AFR WITH MANTLE
 All resources explicitly duplicated on each GPU with async DMA
– Hidden internally in our rendering abstraction
 Every frame, alternate which GPU we build command buffers for and use resources from
 Our UpdateSubresource has to make sure it updates resources on all GPUs
 Presenting the screen has to, in some modes, copy the frame buffer to the GPU that owns the display
 Bonus:
– Can simulate multi-GPU mode even with single GPU!
– Multi-GPU works in windowed mode!
47
MULTI-GPU ISSUES
 GPUs are independently rendering & presenting to the screen – can cause micro-stuttering
– Frames are not presented at regular intervals
– Frame rate can be high but presentation & gameplay is not smooth
– FCAT is a good tool to analyse this
[Figure: GPU0/GPU1 timelines of frames 0-3 with presents (P), showing an irregular presentation interval]
48
MULTI-GPU ISSUES
 We need to introduce dependency & dampening between the GPUs to alleviate this – frame pacing
[Figure: GPU0/GPU1 timelines of frames 0-3 with presents (P), showing the ideal presentation interval]
49
FRAME PACING
 Measure average frame rate on each GPU
– Short history (10-30 frames)
– Filter out spikes
 Insert delay on the GPU before each present
– Force the frame times to become more regular and GPUs to align
– Delay value is based on the calculated avg frame rate
[Figure: GPU0/GPU1 timelines of frames 0-3 with presents (P), and the same timelines with a delay (D) inserted before each present]
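The slides do not spell out the pacing formula, so the following is only one plausible sketch: keep a short history of frame times, clamp spikes against the running average, and delay each present until the time since the previous one reaches the average frame time divided by the number of GPUs. `FramePacer`, `spikeFactor` and that target interval are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>

class FramePacer {
public:
    // historySize: number of frames to average over (the slides suggest 10-30).
    // spikeFactor: samples above average * spikeFactor are clamped to that value.
    FramePacer(std::size_t historySize, double spikeFactor)
        : historySize_(historySize), spikeFactor_(spikeFactor) {}

    // Record one GPU frame time in milliseconds.
    void addFrameTime(double ms) {
        if (!history_.empty()) ms = std::min(ms, average() * spikeFactor_);
        history_.push_back(ms);
        if (history_.size() > historySize_) history_.pop_front();
    }

    double average() const {
        if (history_.empty()) return 0.0;
        const double sum = std::accumulate(history_.begin(), history_.end(), 0.0);
        return sum / static_cast<double>(history_.size());
    }

    // Delay (ms) to insert before a present so that presents are spaced
    // roughly average() / gpuCount apart (assumed target interval for AFR).
    double delayBeforePresent(double msSinceLastPresent, int gpuCount) const {
        assert(gpuCount > 0);
        const double target = average() / gpuCount;
        return std::max(0.0, target - msSinceLastPresent);
    }

private:
    std::size_t historySize_;
    double spikeFactor_;
    std::deque<double> history_;
};
```

Clamping rather than discarding spikes lets the average still follow a sustained change in frame time, at the cost of reacting to it over several frames.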
50
CONCLUSION
51
MANTLE DEV RECOMMENDATIONS
 The validation layer is a critical friend!
 You’ll end up with a lot of object & memory management code, try to share it with console code
 Make sure you have control over memory usage and can avoid overcommitting video memory
 Build a robust solution for resource state management early
 Figure out how to pre-create your graphics pipelines, can require engine design changes
 Build for multi-GPU support from the start, easier than to retrofit
52
FUTURE
 Second wave of Frostbite Mantle titles
 Adapt Frostbite core rendering layer based on learnings from Mantle
– Refine binding & buffer updates to further reduce overhead
– Virtual memory management
– More async compute & async DMAs
– Multi-GPU job graph R&D
 Linux
– Would like to see how our Mantle renderer behaves with different memory management & driver model
53
QUESTIONS?
Email: johan@frostbite.com
Web: http://frostbite.com
Twitter: @repi

More Related Content

What's hot

Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The SurgeMichele Giacalone
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-renderingmistercteam
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The SurgePhilip Hammer
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbiteElectronic Arts / DICE
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationGuerrilla
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lightingozlael ozlael
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbiteElectronic Arts / DICE
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War IIISlide_N
 

What's hot (20)

Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Masked Occlusion Culling
Masked Occlusion CullingMasked Occlusion Culling
Masked Occlusion Culling
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next GenerationTaking Killzone Shadow Fall Image Quality Into The Next Generation
Taking Killzone Shadow Fall Image Quality Into The Next Generation
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 
The Unique Lighting of Mirror's Edge
The Unique Lighting of Mirror's EdgeThe Unique Lighting of Mirror's Edge
The Unique Lighting of Mirror's Edge
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
 

Viewers also liked

FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontElectronic Arts / DICE
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)Electronic Arts / DICE
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsElectronic Arts / DICE
 
Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)Johan Andersson
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game EngineJohan Andersson
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
 
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...Electronic Arts / DICE
 
Executable Bloat - How it happens and how we can fight it
Executable Bloat - How it happens and how we can fight itExecutable Bloat - How it happens and how we can fight it
Executable Bloat - How it happens and how we can fight itElectronic Arts / DICE
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive RenderingElectronic Arts / DICE
 

Viewers also liked (20)

Lighting the City of Glass
Lighting the City of GlassLighting the City of Glass
Lighting the City of Glass
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Battlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + MantleBattlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + Mantle
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)
 
Rendering Battlefield 4 with Mantle

  • 1. RENDERING BATTLEFIELD 4 WITH MANTLE Johan Andersson – Electronic Arts
  • 3. DX11 vs. Mantle (Core i7-3970x, AMD Radeon R9 290x, 1080p ULTRA)
      - DX11: avg 78 fps, min 42 fps
      - Mantle: avg 120 fps, min 94 fps
      - +58%!
  • 4. BF4 MANTLE GOALS
      Goals:
      - Significantly improve CPU performance
      - More consistent & stable performance
      - Improve GPU performance where possible
      - Add support for a new Mantle rendering backend in a live game
          - Minimize changes to engine interfaces
          - Compatible with built PC content
      - Work on wide set of hardware
          - APU to quad-GPU
          - But x64 only (32-bit Windows needs to die)
      Non-goals:
      - Design new renderer from scratch for Mantle
      - Take advantage of asymmetric MGPU (APU+discrete)
      - Optimize video memory consumption
  • 5. BF4 MANTLE STRATEGIC GOALS
      - Prove that low-level graphics APIs work outside of consoles
      - Push the industry towards low-level graphics APIs everywhere
      - Build a foundation for the future that we can build great games on
  • 7. SHADERS
      - Shader resource bind points replaced with a resource table object: the descriptor set
          - This is how the hardware accesses the shader resources
          - Flat list of images, buffers and samplers used by any of the shader stages
          - Vertex shader streams converted to vertex shader buffer loads
      - Engine assigns each shader resource to a specific slot in the descriptor set(s)
          - Can share slots between shader stages = smaller descriptor sets
          - The mapping takes a while to wrap one's head around
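The slot mapping described on this slide can be illustrated with a small sketch. It is not the Frostbite implementation: resources are identified by made-up names, and the function `assignSlots` is hypothetical. It only shows how giving each distinct resource one slot lets several shader stages share that slot, which keeps the descriptor set smaller than one list per stage.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Resources referenced by one shader stage, identified by name.
// The names are illustrative assumptions, not engine identifiers.
using StageResources = std::vector<std::string>;

// Assigns each distinct resource one slot in a flat descriptor set layout.
// A resource used by several stages keeps the slot it was given first.
std::map<std::string, std::size_t> assignSlots(const std::vector<StageResources>& stages) {
    std::map<std::string, std::size_t> slots;
    for (const StageResources& stage : stages) {
        for (const std::string& name : stage) {
            // emplace is a no-op when the name already has a slot.
            slots.emplace(name, slots.size());
        }
    }
    return slots;
}
```

With a vertex stage using two resources and a pixel stage using three, one of them shared, the layout needs four slots instead of five.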
  • 8. SHADER CONVERSION
      - DX11 bytecode shaders get converted to AMDIL & mapping applied using ILC tool
          - Done at load time
          - Don't have to change our shaders!
      - Have full source & control over the process
      - Could write AMDIL directly or use other frontends if wanted
  • 9. DESCRIPTOR SETS
      - Very simple usage in BF4: for each draw call write flat list of resources
          - Essentially direct replacement of SetTexture/SetConstantBuffer/SetInputStream
      - Single dynamic descriptor set object per frame
      - Sub-allocate for each draw call and write list of resources
      - ~15000 resource slots written per frame in BF4, still very fast
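A minimal sketch of the per-frame sub-allocation pattern on this slide, assuming a plain CPU-side array in place of the real descriptor set object. `FrameDescriptorSet`, `ResourceHandle` and `writeDraw` are hypothetical names, not Mantle or Frostbite API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU resource handle (image, buffer or sampler).
using ResourceHandle = std::uint64_t;

// One large descriptor set that is reused every frame. Each draw call
// sub-allocates a contiguous range of slots and writes its resources there.
class FrameDescriptorSet {
public:
    explicit FrameDescriptorSet(std::size_t slotCount) : slots_(slotCount, 0) {}

    // Start of a new frame: all slots become available again.
    void beginFrame() { used_ = 0; }

    // Writes the flat resource list for one draw call and returns the offset
    // of its first slot, which the caller would use when binding the set.
    std::size_t writeDraw(const std::vector<ResourceHandle>& resources) {
        assert(used_ + resources.size() <= slots_.size() && "descriptor set too small");
        const std::size_t offset = used_;
        for (std::size_t i = 0; i < resources.size(); ++i) {
            slots_[offset + i] = resources[i];
        }
        used_ += resources.size();
        return offset;
    }

    std::size_t slotsUsed() const { return used_; }
    ResourceHandle slot(std::size_t i) const { return slots_[i]; }

private:
    std::vector<ResourceHandle> slots_;
    std::size_t used_ = 0;
};
```

Allocation is a single counter bump, which is one plausible reason writing many slots per frame stays cheap. A real implementation would also have to make sure the GPU has finished reading a frame's slots before they are overwritten.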
  • 11. DESCRIPTOR SETS – FUTURE OPTIMIZATIONS
      - Use static descriptor sets when possible
      - Reduce resource duplication by reusing & sharing more across shader stages
      - Nested descriptor sets
  • 12. COMPUTE PIPELINES
      - 1:1 mapping between pipeline & shader
      - No state built into pipeline
      - Can execute in parallel with rendering
      - ~100 compute pipelines in BF4
  • 13. 13 GRAPHICS PIPELINES  All graphics shader stages combined to a single pipeline object together with important graphics state  ~10000 graphics pipelines in BF4 on a single level, ~25 MB of video memory  Could use smaller working pool of active state objects to keep reasonable amount in memory – Have not been required for us
  • 14. 14 PRE-BUILDING PIPELINES  Graphics pipeline creation is an expensive operation, do it at load time instead of runtime! – Creating one of our graphics pipelines takes ~10-60 ms – Pre-build using N parallel low-priority jobs – Avoid 99.9% of runtime stalls caused by pipeline creation!  Requires knowing the graphics pipeline state that will be used with the shaders – Primitive type – Render target formats – Render target write masks – Blend modes  Not fully trivial to know all state, may require engine changes / pre-defining use cases – Important to design for!
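A rough sketch of pre-building pipelines with parallel jobs, under stated assumptions: `PipelineKey`, `create_pipeline` and `prebuild` are hypothetical names, `create_pipeline` is a stand-in for the expensive driver call, and a standard thread pool replaces the engine's low-priority job system. The key fields mirror the state listed on the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineKey:
    """Everything that must be known up front to build a pipeline."""
    shader: str
    primitive_type: str
    rt_formats: tuple
    rt_write_masks: tuple
    blend_mode: str


def create_pipeline(key):
    # Stand-in for the expensive (~10-60 ms) pipeline creation call.
    return f"pipeline<{key.shader}:{key.blend_mode}>"


def prebuild(keys, workers=4):
    """Build every unique pipeline at load time using parallel jobs."""
    unique = list(dict.fromkeys(keys))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(create_pipeline, unique)))


opaque = PipelineKey("mesh", "triangles", ("rgba16f",), (0xF,), "opaque")
alpha = PipelineKey("mesh", "triangles", ("rgba16f",), (0xF,), "alpha")
pipelines = prebuild([opaque, alpha, opaque])
```

At draw time the renderer would look up `pipelines[key]`. A miss means the state combination was not predicted at load time, which is the case the slide warns needs designing for.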
  • 15. 15 PIPELINE CACHE  Cache built pipelines both in memory cache and disk cache – Improved loading times – Max 300 MB – Simple LRU policy – LZ4 compressed (free)  Database signature: – Driver version – Vendor ID – Device ID
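As an illustration of the cache policy on this slide, here is a small in-memory LRU cache with a byte budget and a database signature. It is a sketch only: `PipelineCache` and its methods are hypothetical, the signature values are made up, and LZ4 compression and the disk layer are left out.

```python
from collections import OrderedDict


class PipelineCache:
    """LRU cache of built pipeline blobs, bounded by total size in bytes."""

    def __init__(self, signature, max_bytes):
        self.signature = signature  # (driver version, vendor ID, device ID)
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> blob, least recently used first

    def matches(self, signature):
        """A cache built for another driver or device must be discarded."""
        return signature == self.signature

    def get(self, key):
        blob = self.entries.get(key)
        if blob is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return blob

    def put(self, key, blob):
        if key in self.entries:
            self.used_bytes -= len(self.entries.pop(key))
        self.entries[key] = blob
        self.used_bytes += len(blob)
        while self.used_bytes > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)


# Made-up signature and a tiny budget so eviction is easy to see.
cache = PipelineCache(signature=("13.350", 0x1002, 0x67B0), max_bytes=10)
cache.put("a", b"AAAA")
cache.put("b", b"BBBB")
cache.get("a")           # "a" is now more recently used than "b"
cache.put("c", b"CCCC")  # over budget, so "b" is evicted
```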
  • 17. 17 MEMORY MANAGEMENT  Mantle devices expose multiple memory heaps with different characteristics – Can be different between devices, drivers and OSes  User explicitly places resources in wanted heaps – Driver suggests preferred heaps when creating objects, not a requirement
    Example heap list (columns: Type, Size, Page, CPU access, GPU Read, GPU Write, CPU Read, CPU Write):
    Local, 256 MB, 65535, CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined, 130, 170, 0.0058, 2.8
    Local, 4096 MB, 65535, (none listed), 130, 180, 0, 0
    Remote, 16106 MB, 65535, CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined, 2.6, 2.6, 0.1, 3.3
    Remote, 16106 MB, 65535, CpuVisible|CpuGpuCoherent, 2.6, 2.6, 3.2, 2.9
  • 18. 18 FROSTBITE MEMORY HEAPS  System Shared Mapped – CPU memory that is GPU visible. – Write combined & persistently mapped = easy & fast to write to in parallel at any time  System Shared Pinned – CPU cached for readback. – Not used much  Video Shared – GPU memory accessible by CPU. Used for descriptor sets and dynamic buffers – Max 256 MB (legacy constraint) – Avoid keeping persistently mapped as WDDM doesn’t like this and can decide to move it back to CPU memory  Video Private – GPU private memory. – Used for render targets, textures and other resources CPU does not need to access
  • 19. 19 MEMORY REFERENCES  WDDM needs to know which memory allocations are referenced for each command buffer – In order to make sure they are resident and not paged out – Max ~1700 memory references are supported – Overhead with having lots of references  Engine needs to keep track of what memory is referenced while building the command buffers – Easy & fast to do – Each reference is either read-only or read/write – We use a simple global list of references shared for all command buffers.
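The reference tracking described here can be sketched as a single table that command buffer recording adds to. The names are hypothetical, and the merge rule (a reference stays read-only only while every use of it is read-only) is an assumption about how such a global list would behave, not something the slide states.

```python
MAX_REFERENCES = 1700  # approximate limit quoted on the slide


class MemoryReferences:
    """Global list of memory allocations referenced by the frame's command buffers."""

    def __init__(self):
        self.refs = {}  # allocation id -> True if every use so far is read-only

    def add(self, allocation, read_only):
        previous = self.refs.get(allocation, True)
        # One read/write use makes the whole reference read/write.
        self.refs[allocation] = previous and read_only
        if len(self.refs) > MAX_REFERENCES:
            raise RuntimeError("too many memory references for one submission")


refs = MemoryReferences()
refs.add("texture_pool_0", read_only=True)
refs.add("render_target_pool", read_only=False)
refs.add("texture_pool_0", read_only=True)
refs.add("render_target_pool", read_only=True)
```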
  • 20. 20 MEMORY POOLING  Pooling memory allocations was required for us – Sub-allocate within larger 1 – 32 MB chunks – All resources stored as memory handle + offset – Not as elegant as just void* on consoles – Fragmentation can be a concern, not much of an issue for us in practice  GPU virtual memory mapping is fully supported, can simplify & optimize management
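A minimal sketch of chunk-based pooling where every allocation is identified by a (chunk handle, offset) pair. `Pool` is a hypothetical name, and the first-fit policy without coalescing is chosen for brevity rather than taken from the talk. The chunk size is shrunk to 100 bytes so the behaviour is easy to follow.

```python
class Pool:
    """Sub-allocates from fixed-size chunks; returns (chunk handle, offset)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.free_ranges = []  # per chunk: list of (offset, size) free ranges

    def alloc(self, size):
        if size > self.chunk_size:
            raise ValueError("allocation larger than a chunk")
        for handle, ranges in enumerate(self.free_ranges):
            for i, (offset, free) in enumerate(ranges):
                if free >= size:  # first fit
                    if free == size:
                        del ranges[i]
                    else:
                        ranges[i] = (offset + size, free - size)
                    return handle, offset
        # No existing chunk has room: create a new chunk and allocate at offset 0.
        rest = self.chunk_size - size
        self.free_ranges.append([(size, rest)] if rest else [])
        return len(self.free_ranges) - 1, 0

    def free(self, handle, offset, size):
        # No coalescing of neighbours, so fragmentation is possible.
        self.free_ranges[handle].append((offset, size))


pool = Pool(chunk_size=100)
a = pool.alloc(60)  # first chunk
b = pool.alloc(60)  # does not fit in the first chunk, so a second one is created
c = pool.alloc(40)  # fills the tail of the first chunk
pool.free(*a, 60)
d = pool.alloc(50)  # reuses the range freed by `a`
```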
  • 21. 21 OVERCOMMITTING VIDEO MEMORY  Avoid overcommitting video memory! – Will lead to severe stalls as VidMM moves blocks and moves memory back and forth – VidMM is a black box  – One of the biggest issues we ran into during development  Recommendations – Balance memory pools – Make sure to use read-only memory references – Use memory priorities
  • 22. 22 MEMORY PRIORITIES  Setting priorities on the memory allocations helps VidMM choose what to page out when it has to  5 priority levels – Very high = Render targets with MSAA – High = Render targets and UAVs – Normal = Textures – Low = Shader & constant buffers – Very low = vertex & index buffers
  • 23. 23 MEMORY RESIDENCY FUTURE  For best results manage which resources are in video memory yourself & keep only ~80% used – Avoid all stalls – Can async DMA in and out  We are thinking of redesigning to fully avoid possibility of overcommitting  Hoping WDDM’s memory residency management can be simplified & improved in the future
  • 25. 25 RESOURCE LIFETIMES  App manages lifetime of all resources – Have to make sure GPU is not using an object or memory while we are freeing it on the CPU – How we’ve always worked with GPUs on the consoles – Multi-GPU adds some additional complexity that consoles do not have  We keep track of lifetimes on a per frame granularity – Queues for object destruction & free memory operations – Add to queue at any time on the CPU – Process queues when GPU command buffers for the frame are done executing – Tracked with command buffer fences
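Per-frame lifetime tracking with destruction queues and fences might look like the sketch below. `Fence` and `DeferredDeleter` are hypothetical stand-ins, and the fence is signalled by hand here where a real renderer would query the GPU.

```python
class Fence:
    """Stand-in for a command buffer fence."""

    def __init__(self):
        self.signaled = False

    def is_signaled(self):
        return self.signaled


class DeferredDeleter:
    def __init__(self):
        self.current = []    # destroy callbacks queued during this frame
        self.in_flight = []  # (fence, callbacks) per submitted frame, oldest first

    def queue(self, destroy):
        """Can be called at any time on the CPU; nothing is freed yet."""
        self.current.append(destroy)

    def end_frame(self, fence):
        self.in_flight.append((fence, self.current))
        self.current = []

    def collect(self):
        """Run the queues of every frame the GPU has finished executing."""
        while self.in_flight and self.in_flight[0][0].is_signaled():
            _, callbacks = self.in_flight.pop(0)
            for destroy in callbacks:
                destroy()


freed = []
deleter = DeferredDeleter()
deleter.queue(lambda: freed.append("old_texture"))
frame_fence = Fence()
deleter.end_frame(frame_fence)
deleter.collect()
freed_before_signal = list(freed)  # GPU may still be using the texture
frame_fence.signaled = True
deleter.collect()
```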
  • 26. 26 LINEAR FRAME ALLOCATOR  We use multiple linear allocators with Mantle for both transient buffers & images – Used for huge amount of small constant data and other GPU frame data that CPU writes – Easy to use and very low overhead – Don’t have to care about lifetimes or state  Fixed memory buffers for each frame – Super cheap sub-allocation from any thread – If full, use heap allocation (also fast due to pooling)  Alternative: ring buffers – Requires being able to stall & drain pipeline at any allocation if full, additional complexity for us
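A linear frame allocator with a heap fallback could be sketched like this. The names and the 256-byte default alignment are assumptions made for the example. A lock keeps the example simple, whereas an engine would more likely bump the offset with an atomic add.

```python
import threading


class LinearFrameAllocator:
    """Bump allocator over a fixed per-frame buffer, reset once per frame."""

    def __init__(self, size):
        self.size = size
        self.offset = 0
        self.lock = threading.Lock()
        self.heap_fallbacks = []  # sizes served by the pooled heap instead

    def alloc(self, nbytes, align=256):
        with self.lock:
            start = (self.offset + align - 1) // align * align  # round up
            if start + nbytes <= self.size:
                self.offset = start + nbytes
                return ("frame", start)
            # Frame buffer is full: fall back to a regular pooled allocation.
            self.heap_fallbacks.append(nbytes)
            return ("heap", len(self.heap_fallbacks) - 1)

    def reset(self):
        """Called when the GPU is done with the frame; no per-object frees."""
        with self.lock:
            self.offset = 0
            self.heap_fallbacks.clear()


frame_alloc = LinearFrameAllocator(size=1024)
first_block = frame_alloc.alloc(100)
second_block = frame_alloc.alloc(100)
overflow = frame_alloc.alloc(600)
```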
  • 27. 27 TILING  Textures should be tiled for performance – Explicitly handled in Mantle, user selects linear or tiled – Some formats (BC) can’t be accessed as linear by the GPU  On consoles we handle tiling offline as part of our data processing pipeline – We know the exact tiling formats and have separate resources per platform  For Mantle – Tiling formats are opaque, can be different between GPU architectures and image types – Tile textures with DMA image upload from SystemShared to VideoPrivate  Linear source, tiled destination  Free
  • 29. 29 COMMAND BUFFERS  Command buffers are the atomic unit of work dispatched to the GPU – Separate creation from execution – No “immediate context” a la DX11 that can execute work at any call – Makes resource synchronization and setup significantly easier & faster  Typical BF4 scenes have around 50 command buffers per frame – Reasonable tradeoff for us with submission overhead vs CPU load-balancing
  • 30. 30 COMMAND BUFFER SOURCES  Frostbite has 2 separate sources of command buffers – World rendering  Rendering the world with tons of objects, lots of draw calls. Have all frame data up front  All resources except for render targets are read-only  Generated in parallel up front each frame – Immediate rendering (“the rest”)  Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc  Managing resource state, memory and running on different queues (graphics, compute, DMA)  Sequentially generated in a single job, simulate an immediate context by splitting the command buffer  Both are very important and have different requirements
  • 31. 31 RESOURCE TRANSITIONS  Key design in Mantle to significantly lower driver overhead & complexity – Explicit hazard tracking by the app/engine – Drives architecture-specific caches & compression – AMD: FMASK, CMASK, HTILE – Enables explicit memory management  Examples: – Optimal render target writes → Graphics shader read-only – Compute shader write-only → DrawIndirect arguments  Mantle has a strong validation layer that tracks transitions which is a major help
  • 32. 32 MANAGING RESOURCE TRANSITIONS  Engines need a clear design on how to handle state transitions  Multiple approaches possible: – Sequential in-order command buffers  Generate one command buffer at a time in order  Transition resources on-demand when doing operation on them, very simple  Recommendation: start with this – Out-of-order multiple command buffers  Track state per command buffer, fix up transitions when order of command buffers is known – Hybrid approaches & more
  • 33. 33 MANAGING RESOURCE TRANSITIONS IN FROSTBITE  Current approach in Frostbite is quite basic: – We keep track of a single state for each resource (not subresource) – The “immediate rendering” transitions resources as needed depending on operation – The out of order “world rendering” command buffers don’t need to transition states  Already have write access to MRTs and read-access to all resources set up outside them  Avoids the problem of them not knowing the state during generation  Works now but as we do more general parallel rendering it will have to change – Track resource state for each command buffer & fix up between command buffers
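The simple sequential approach (one tracked state per resource, transition on demand) can be sketched as below. The state names, `CommandBuffer` and `ResourceStateTracker` are hypothetical, and transitions are recorded as plain tuples instead of real barrier commands.

```python
class CommandBuffer:
    def __init__(self):
        self.commands = []


class ResourceStateTracker:
    """Tracks a single state per resource and transitions on demand."""

    def __init__(self):
        self.state = {}  # resource -> current state

    def require(self, cmd, resource, needed):
        current = self.state.get(resource, "undefined")
        if current != needed:
            # Record the transition just before the operation that needs it.
            cmd.commands.append(("transition", resource, current, needed))
            self.state[resource] = needed


cmd = CommandBuffer()
tracker = ResourceStateTracker()
tracker.require(cmd, "hdr_target", "render_target_write")
cmd.commands.append(("draw", "scene"))
tracker.require(cmd, "hdr_target", "render_target_write")  # already in state
cmd.commands.append(("draw", "particles"))
tracker.require(cmd, "hdr_target", "shader_read")
cmd.commands.append(("draw", "post_fx"))
```

This only works while command buffers are generated in execution order. With out-of-order generation the state at the start of each command buffer is unknown, which is the limitation the slide describes.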
  • 34. 34 DYNAMIC STATE OBJECTS  Graphics state is only set with the pipeline object and 5 dynamic state objects – State objects: color blend, raster, viewport, depth-stencil, MSAA – No other parameters such as in DX11 with stencil ref or SetViewport functions  Frostbite use case: – Pre-create when possible – Otherwise on-demand creation (hash map) – Only ~100 state objects!  Still possible to end up with lots of state objects – Esp. with state object float & integer values (depth bounds, depth bias, viewport) – But no need to store all permutations in memory, objects are fast to create & app manages lifetimes
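On-demand creation through a hash map, as mentioned for the Frostbite use case, can be sketched as a small cache keyed by the state description. `StateObjectCache` is a hypothetical name and the `create` callback stands in for the API call that builds a state object.

```python
class StateObjectCache:
    """Creates dynamic state objects on first use and reuses them afterwards."""

    def __init__(self, create):
        self.create = create
        self.objects = {}

    def get(self, **desc):
        key = tuple(sorted(desc.items()))  # hashable, order-independent key
        obj = self.objects.get(key)
        if obj is None:
            obj = self.objects[key] = self.create(desc)
        return obj


created = []


def create_viewport_state(desc):
    created.append(desc)
    return ("viewport_state", len(created))


viewports = StateObjectCache(create_viewport_state)
full = viewports.get(width=1920, height=1080)
same = viewports.get(height=1080, width=1920)
half = viewports.get(width=960, height=540)
```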
  • 36. 36 QUEUES  Universal queue can do graphics, compute and presents  We also use additional queues to parallelize GPU operations: – DMA queue – Improve perf with faster transfers & avoid idling graphics while transferring – Compute queue - Improve perf by utilizing idle ALU and update resources simultaneously with gfx  More GPUs = more queues!
  • 37. 37 QUEUES SYNCHRONIZATION  Order of execution within a queue is sequential  Synchronize multiple queues with GPU semaphores (signal & wait)  Also works across multiple GPUs [Diagram: Compute and Graphics queue timelines with semaphore signal (S) and wait (W) points]
  • 38. 38 QUEUES SYNCHRONIZATION CONT  Started out with explicit semaphores – Error prone to handle when having lots of different semaphores & queues – Difficult to visualize & debug  Switched to a representation more similar to a job graph  Just a model on top of the semaphores
  • 39. 39 GPU JOB GRAPH  Each GPU job has a list of dependencies (other command buffers)  Dependencies have to finish first before the job can run on its queue  The dependencies can be from any queue  Was easier to work with, debug and visualize  Really extendable going forward [Diagram: job graph with nodes Graphics 1, Graphics 2, DMA and Compute]
  • 40. 40 ASYNC DMA  AMD GPUs have dedicated hardware DMA engines, let’s use them! – Uploading through DMA is faster than on universal queue, even if blocking – DMA has alignment restrictions, have to support falling back to copies on universal queue  Use case: Frame buffer & texture uploads – Used by resource initial data uploads and our UpdateSubresource – Guaranteed to be finished before the GPU universal queue starts rendering the frame  Use case: Multi-GPU frame buffer copy – Peer-to-peer copy of the frame buffer to the GPU that will present it
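One way to read "a model on top of the semaphores" is that the job graph is lowered to signal/wait pairs at submission. The sketch below assumes jobs are listed in submission order and that work on the same queue already runs sequentially, so only cross-queue dependencies need a semaphore. `GpuJob`, `plan_semaphores` and the dependencies in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GpuJob:
    name: str
    queue: str
    deps: list = field(default_factory=list)  # names of jobs that must finish first


def plan_semaphores(jobs):
    """Return (signalling job, waiting job) pairs for cross-queue dependencies."""
    by_name = {job.name: job for job in jobs}
    pairs = []
    for job in jobs:
        for dep_name in job.deps:
            dep = by_name[dep_name]
            if dep.queue != job.queue:  # same-queue order is already sequential
                pairs.append((dep.name, job.name))
    return pairs


jobs = [
    GpuJob("Graphics 1", "graphics"),
    GpuJob("DMA", "dma"),
    GpuJob("Compute", "compute", deps=["Graphics 1"]),
    GpuJob("Graphics 2", "graphics", deps=["Graphics 1", "Compute", "DMA"]),
]
semaphores = plan_semaphores(jobs)
```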
  • 41. 41 ASYNC COMPUTE  Frostbite has lots of compute shader passes that could run in parallel with graphics work – HBAO, blurring, classification, tile-based lighting, etc  Running as async compute can improve GPU performance by utilizing ”free” ALU – For example while doing shadowmap rendering (ROP bound)
  • 42. 42 ASYNC COMPUTE – TILE-BASED LIGHTING  3 sequential compute shaders – Input: zbuffer & gbuffer – Output: HDR texture/UAV  Runs in parallel with graphics pipeline that renders to other targets [Diagram: Compute and Graphics queue timelines with passes TileZ, Gbuffer, Shadowmaps, Reflection, Distort, Transp, Cull lights and Lighting, plus semaphore signal/wait points]
  • 43. 43 ASYNC COMPUTE – TILE-BASED LIGHTING  We manually prepare the resources for the async compute – Important to not access the resources on other queues at the same time (unless read-only state) – Have to transition resources on the queue that last used them  Up to 80% faster in our initial tests, but not fully reliable – But is a pretty small part of the frame time – Not in BF4 yet [Diagram: same Compute and Graphics queue timelines as on the previous slide]
  • 45. 45 MULTI-GPU  Multi-GPU alternatives: – AFR – Alternate Frame Rendering (1-4 GPUs of the same power) – Heterogeneous AFR – 1 small + 1 big GPU (APU + Discrete) – SFR – Split Frame Rendering – Multi-GPU Job Graph – Primary strong GPU + slave GPUs helping  Frostbite supports AFR natively – No synchronization points within the frame – For resources that are not rendered every frame: re-render resources for each GPU  Example: sky envmap update on weather change  With Mantle multi-GPU is explicit and we have to build support for it ourselves
  • 46. 46 MULTI-GPU AFR WITH MANTLE  All resources explicitly duplicated on each GPU with async DMA – Hidden internally in our rendering abstraction  Every frame alternate which GPU we build command buffers for and are using resources from  Our UpdateSubresource has to make sure it updates resources on all GPUs  Presenting the screen has to in some modes copy the frame buffer to the GPU that owns the display  Bonus: – Can simulate multi-GPU mode even with single GPU! – Multi-GPU works in windowed mode!
  • 47. 47 MULTI-GPU ISSUES  GPUs are independently rendering & presenting to the screen – can cause micro-stuttering – Frames are not presented at regular intervals – Frame rate can be high but presentation & gameplay is not smooth – FCAT is a good tool to analyse this [Diagram: GPU0 and GPU1 timelines, frames 0-3 with presents (P) at an irregular presentation interval]
  • 48. 48 MULTI-GPU ISSUES  GPUs are independently rendering & presenting to the screen – can cause micro-stuttering – Frames are not presented at regular intervals – Frame rate can be high but presentation & gameplay is not smooth – FCAT is a good tool to analyse this  We need to introduce dependency & dampening between the GPUs to alleviate this – frame pacing [Diagram: GPU0 and GPU1 timelines, frames 0-3 with presents (P) at the ideal presentation interval]
  • 49. 49 FRAME PACING  Measure average frame rate on each GPU – Short history (10-30 frames) – Filter out spikes  Insert delay on the GPU before each present – Force the frame times to become more regular and GPUs to align – Delay value is based on the calculated avg frame rate [Diagram: GPU0 and GPU1 timelines, frames 0-3, with a delay (D) inserted before each present (P)]
  • 51. 51 MANTLE DEV RECOMMENDATIONS  The validation layer is a critical friend!  You’ll end up with a lot of object & memory management code, try to share with console code  Make sure you have control over memory usage and can avoid overcommitting video memory  Build a robust solution for resource state management early  Figure out how to pre-create your graphics pipelines, can require engine design changes  Build for multi-GPU support from the start, easier than to retrofit
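The pacing steps on this slide can be sketched as follows. The slide does not say how spikes are filtered or how the delay is derived, so both are assumptions here: frames longer than twice the median are ignored, and the target present interval is the average per-GPU frame time divided by the number of GPUs. `FramePacer` is a hypothetical name.

```python
from collections import deque
from statistics import median


class FramePacer:
    def __init__(self, history=20, spike_factor=2.0):
        self.times = deque(maxlen=history)  # short history of frame times (ms)
        self.spike_factor = spike_factor

    def record(self, frame_ms):
        self.times.append(frame_ms)

    def average_ms(self):
        """Average frame time with spikes filtered out (assumed: > 2x median)."""
        if not self.times:
            return 0.0
        limit = median(self.times) * self.spike_factor
        kept = [t for t in self.times if t <= limit]
        return sum(kept) / len(kept)

    def delay_before_present(self, since_last_present_ms, gpu_count=2):
        """Delay so presents land roughly average / gpu_count apart (assumed)."""
        target = self.average_ms() / gpu_count
        return max(0.0, target - since_last_present_ms)


pacer = FramePacer(history=20)
for _ in range(19):
    pacer.record(20.0)
pacer.record(100.0)  # a spike that should not skew the average
delay = pacer.delay_before_present(since_last_present_ms=4.0)
```

With two GPUs each taking about 20 ms per frame, presents would ideally be about 10 ms apart, so a present arriving 4 ms after the previous one is delayed by the remainder.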
  • 52. 52 FUTURE  Second wave of Frostbite Mantle titles  Adapt Frostbite core rendering layer based on learnings from Mantle – Refine binding & buffer updates to further reduce overhead – Virtual memory management – More async compute & async DMAs – Multi-GPU job graph R&D  Linux – Would like to see how our Mantle renderer behaves with different memory management & driver model