Bringing AAA graphics to mobile platforms
Niklas Smedberg
Senior Engine Programmer, Epic Games
Who Am I
● A.k.a. “Smedis”
● Platform team at Epic Games
● Unreal Engine
● 15 years in the industry
● 30 years of programming
● C64 demo scene
Content
● Hardware
● How it works under the hood
● Case study: ImgTec SGX GPU
● Software
● How to apply this knowledge to bring console
graphics to mobile platforms
Mobile Graphics Processors
● The feature support is there:
● Shaders
● Render to texture
● Depth textures
● MSAA
● But is the performance there?
● Yes. And it keeps getting better!
Mobile GPU Architecture
● Tile-based deferred rendering GPU
● Very different from desktop or consoles
● Common on smartphones and tablets
● ImgTec SGX GPUs fall into this category
● There are other tile-based GPUs (e.g. ARM Mali)
● Other mobile GPU types
● NVIDIA Tegra is more traditional
Tile-Based Mobile GPU
TLDR Summary:
● Split the screen into tiles
● E.g. 16x16 or 32x32 pixels
● The GPU fits an entire tile on chip
● Process all drawcalls for one tile
● Repeat for each tile to fill the screen
● Each tile is written to RAM as it finishes
(For illustration purposes only)
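The tiling summary above can be sketched in a few lines of Python. This is a simplified illustration, not the SGX hardware algorithm: it assumes 16x16 tiles, bins each triangle conservatively by its screen-space bounding box, and the helper names `tiles_for_screen` and `bin_triangles` are hypothetical.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels; the slides mention 16x16 or 32x32


def tiles_for_screen(width, height, tile=TILE):
    """Return (columns, rows) of tiles needed to cover the screen."""
    cols = (width + tile - 1) // tile
    rows = (height + tile - 1) // tile
    return cols, rows


def bin_triangles(triangles, width, height, tile=TILE):
    """Map (tile_x, tile_y) -> list of triangle indices touching that tile.

    Each triangle is a sequence of three (x, y) screen-space points.
    Binning uses the bounding box, so a tile may list a triangle that
    only overlaps its box, not its actual area (an assumption made here
    for brevity).
    """
    cols, rows = tiles_for_screen(width, height, tile)
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the tile grid.
        x0 = max(int(min(xs)) // tile, 0)
        x1 = min(int(max(xs)) // tile, cols - 1)
        y0 = max(int(min(ys)) // tile, 0)
        y1 = min(int(max(ys)) // tile, rows - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return bins
```

With these assumptions a 960x640 screen splits into 60x40 tiles, and each tile ends up with its own short list of primitives to process on chip.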
ImgTec Process
Diagram: Software Command Buffer → Vertex Frontend → Vertex Processing → Tiling Parameter Buffer → Pixel Frontend → Pixel Processing → Frame Buffer
Vertex Processing
Vertex Frontend
● Vertex Frontend reads from GPU command buffer
● Distributes vertex primitives to all GPU cores
● Splits drawcalls into fixed chunks of vertices
● GPU cores process vertices independently
● Continues until the end of the scene
Diagram: Software Command Buffer → Vertex Frontend → Vertex Processing (two instances shown)
Vertex processing (Per GPU Core)
Diagram: Vertex Setup (VDM) → Pre-Shader → Shader (Vertex) (USSE) → Tiling (TA) → Parameter Buffer (RAM)
Vertex Setup
Receives commands from Vertex Frontend
Vertex Pre-Shader
Fetches input data (attributes and uniforms)
Vertex Shader
Universal Scalable Shader Engine
Executes the vertex shader program, multithreaded
Tiling
Optimizes vertex shader output
Bins resulting primitives into tile data
Parameter Buffer
Stored in system memory
You don’t want to overflow this buffer!
Pixel Processing
Pixel Frontend
● Reads Parameter Buffer
● Distributes pixel processing to all cores
● One whole tile at a time
● A tile is processed in full on one GPU core
● Tiles are processed in parallel on multi-core GPUs
Diagram: Parameter Buffer → Pixel Frontend → Pixel Processing → Frame Buffer
Pixel processing (Per GPU Core)
Diagram: Pixel Setup (PDM) → Pre-Shader → Shader (Pixel) (USSE) → Pixel Back-end → Frame Buffer (RAM)
Pixel Setup
Receives tile commands from Pixel Frontend
Fetches vertex shader output from Parameter Buffer
Triangle rasterization; Calculate interpolator values
Depth/stencil test; Hidden Surface Removal
Pixel Pre-Shader
Fills in interpolator and uniform data
Kicks off non-dependent texture reads
Pixel Shader
Multithreaded ALUs
Each thread can be vertices or pixels
Can have multiple USSEs in each GPU core
Pixel Back-end
Triggered when all pixels in the tile are finished
Performs data conversions, MSAA-downsampling
Writes finished tile color/depth/stencil to memory
Shader Unit Caveats
● Shader programs without dynamic flow-control:
● 4 vertices/pixels per instruction
● Shader programs with dynamic flow-control:
● 1 vertex/pixel per instruction
● Alpha-blending is in the shader
● Not separate specialized hardware
● Shader patching may occur when you switch state
● (More on how to avoid shader patching later)
Rendering Techniques
● How to take advantage of this GPU?
Mobile is the new PC
● Wide feature and performance range
● Scalable graphics are back
● User graphics settings are back
● Low/medium/high/ultra
● Render buffer size scaling
● Testing 100 SKUs is back
Graphics Settings
Render target is on die
● MSAA is cheap and uses less memory
● Only the resolved data is stored in RAM
● Have seen 0-5 ms cost for MSAA
● Be wary of buffer restores (color or depth)
● No bandwidth cost for alpha-blend
● Cheap depth/stencil testing
“Free” hidden surface removal
● Specific to ImgTec SGX GPU
● Eliminates all background pixels
● Eliminates overdraw
● Only for opaque
Mobile vs Console
● Very large CPU overhead for OpenGL ES API
● Max CPU usage at 100-300 drawcalls
● Avoid too much data per scene
● Parameter buffer between vertex & pixel processing
● Save bandwidth and GPU flushes
● Shader patching
● Some render states cause the shader to be modified and
recompiled by the driver
● E.g. alpha-blend settings, vertex input, color write masks, etc
Alpha-test / Discard
● Conditional z-writes can be very slow
● Instead of writing out Z ahead of time,
the “Pixel setup” (PDM) won’t submit more
fragments until the pixel shader has
determined visibility for current pixels.
● Use alpha-blend instead of alpha-test
● Fit the geometry to visible pixels
Alpha-blended, form-fitted geometry
Render Buffer Management (1 of 2)
● Each render target is a whole new scene
● Avoid switching render target back and forth!
● Can cause a full restore:
● Copies full color/depth/stencil from RAM into Tile
Memory at the beginning of the scene
● Can cause a full resolve:
● Copies full color/depth/stencil from Tile Memory into
RAM at the end of the scene
Render Buffer Management (2 of 2)
● Avoid buffer restore
● Clear everything! Color/depth/stencil
● A clear just sets some dirty bits in a register
● Avoid buffer resolve
● Use discard extension (GL_EXT_discard_framebuffer)
● See usage case for shadows
● Avoid unnecessarily different FBO combos
● Don’t let the driver think it needs to start resolving and
restoring any buffers!
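To get a feel for why restores and resolves matter, the RAM traffic per scene can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions (32-bit color, 32-bit packed depth/stencil, no MSAA, no compression); the function name `scene_traffic_bytes` and its parameters are hypothetical, and real drivers may behave differently.

```python
def scene_traffic_bytes(width, height,
                        restore=True,
                        resolve_color=True,
                        resolve_depth_stencil=True,
                        color_bytes=4,
                        depth_stencil_bytes=4):
    """Estimate bytes moved between RAM and tile memory for one scene.

    restore: the scene starts by copying color/depth/stencil from RAM
             into tile memory (avoided by clearing everything first).
    resolve_*: the scene ends by copying that buffer from tile memory
             to RAM (avoided for buffers that are discarded).
    """
    pixels = width * height
    traffic = 0
    if restore:
        # RAM -> tile memory at the beginning of the scene.
        traffic += pixels * (color_bytes + depth_stencil_bytes)
    if resolve_color:
        # Tile memory -> RAM at the end of the scene.
        traffic += pixels * color_bytes
    if resolve_depth_stencil:
        traffic += pixels * depth_stencil_bytes
    return traffic


# Worst case: full restore plus full resolve.
worst = scene_traffic_bytes(1024, 768)

# Best case for a color pass: clear first, discard depth/stencil at the end.
best = scene_traffic_bytes(1024, 768, restore=False,
                           resolve_depth_stencil=False)
```

Under these assumptions a 1024x768 scene moves about 12 MB in the worst case and about 3 MB in the best case, a 4x difference per render target switch.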
Texture Lookups
● Don’t perform texture lookups in the pixel shader!
● Let the “pre-shader” queue them up ahead of time
● I.e. avoid dependent texture lookups
● Don’t manipulate texture coordinates with math
● Move all math to vertex shader and pass down
● Don't use .zw components for texture coordinates
● Will be handled as a dependent texture lookup
● Only use .xy and pass other data in .zw
Mobile Material System
● Full Unreal Engine materials are too complicated
Mobile Material System
● Initial idea:
● Pre-render into a single texture
Mobile Material System
● Current solution:
● Pre-render components into
separate textures
● Add mobile-specific settings
● Feature support driven by artists
Mobile Material Shaders
● One hand-written ubershader
● Lots of #ifdef for all features
● Exposed as fixed settings in the artist UI
● Checkboxes, lists, values, etc
Material Example: Rim Lighting
Material Example: Vertex Animation
Shader Offline Processing
● Run C pre-processor offline
● Reduces in-game compile time
● Eliminates duplicates at off-line time
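The offline step can be illustrated with a toy preprocessor: expand the ubershader once per material, then hash the results so that identical expansions are stored only once. This is a minimal sketch, not Epic's tool. It only understands `#ifdef`, `#else` and `#endif`, and the shader text, material names and define names are all made up for the example.

```python
import hashlib

# Hypothetical fragment of an ubershader; only the #ifdef structure matters.
UBERSHADER = """\
#ifdef USE_RIM_LIGHTING
color.rgb += rimColor * rimTerm;
#endif
#ifdef USE_FOG
color.rgb = mix(color.rgb, fogColor, fogAmount);
#endif
gl_FragColor = color;
"""

# Hypothetical per-material feature sets chosen by artists.
MATERIALS = {
    "rock": {"USE_FOG"},
    "flag": {"USE_FOG", "USE_VERTEX_ANIMATION"},  # not used by this shader
    "hero": {"USE_RIM_LIGHTING", "USE_FOG"},
    "sky": set(),
}


def preprocess(source, defines):
    """Expand #ifdef/#else/#endif blocks for the given set of defines."""
    out = []
    stack = []  # one flag per open #ifdef: is its current branch active?
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            stack.append(stripped.split()[1] in defines)
        elif stripped.startswith("#else"):
            stack[-1] = not stack[-1]
        elif stripped.startswith("#endif"):
            stack.pop()
        elif all(stack):
            out.append(line)
    return "\n".join(out)


def dedupe(materials, source):
    """Return (unique shaders by hash, hash used by each material)."""
    unique = {}
    key_for = {}
    for name, defines in materials.items():
        text = preprocess(source, defines)
        key = hashlib.sha1(text.encode("utf-8")).hexdigest()
        unique.setdefault(key, text)
        key_for[name] = key
    return unique, key_for
```

In this example "rock" and "flag" expand to the same text because the extra define is never referenced, so four materials need only three compiled shaders.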
Shader Compiling
● Compile all shaders at startup
● Avoids hitching at run-time
● Compile on the GL thread, while loading on Game thread
● Compiling is not enough
● Must issue dummy drawcalls!
● Remember how certain states affect shaders!
● May need experimenting to avoid shader patching
● E.g. alpha-blend states, color write masks
God Rays
● Initially ported the Xbox version straight to PS Vita
● Worked, but was very slow
● But for Infinity Blade II, on a cell phone!?
● We first thought it was impossible
● But let’s have a deeper look
God Rays
● Port to OpenGL ES 2.0
● Use fewer texture lookups
● Worse quality
● And still very slow
Optimizations For Mobile
● Move all math to vertex shader
● No dependent texture reads!
● Pass down data through interpolators
● But, now we’re out of interpolators
● Split radial filter into 4 draw calls
● 4 x 8 = 32 texture lookups total (equiv. 256)
● Went from 30 ms to 5 ms
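The "32 lookups, equivalent to 256" arithmetic follows from blurring in two stages: each final pixel averages 16 texels of an already blurred image, and each of those averaged 16 source texels. The sketch below models this in one dimension. The sample spacing is an assumption chosen so that all combined offsets are distinct (second-stage step equal to 16 first-stage steps); the slides do not state the actual spacing.

```python
from collections import Counter

LOOKUPS_PER_DRAW = 8  # one .xy interpolator per lookup
DRAWS_PER_BLUR = 2    # one opaque draw call plus one additive draw call
BLUR_STAGES = 2       # unblurred source -> blurred source -> final result


def fetches_per_pixel():
    """Texture lookups actually issued for each output pixel."""
    return LOOKUPS_PER_DRAW * DRAWS_PER_BLUR * BLUR_STAGES


def effective_taps(first_step=1):
    """Weights of the source texels that reach one final pixel.

    Offsets are measured in first-stage steps along the blur vector.
    Assumption: the second stage steps 16x further than the first,
    which makes the two-stage blur match one wide box filter.
    """
    taps = LOOKUPS_PER_DRAW * DRAWS_PER_BLUR  # 16 samples per stage
    second_step = first_step * taps
    weights = Counter()
    for i in range(taps):        # samples taken by the second stage
        for j in range(taps):    # samples behind each blurred texel
            weights[j * first_step + i * second_step] += 1.0 / (taps * taps)
    return weights
```

With that spacing, 32 fetches per pixel contribute 256 equally weighted source samples, which is what a single-pass filter would need 256 lookups to reproduce.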
Figure: Original Shader and Mobile Shader
God Rays
● Original Scene
● No God Rays
1st Pass
● Downsample Scene
● Identify pixels
● RGB: Scene color
● A: Occlusion factor
● Resolve to texture:
● “Unblurred source”
2nd Pass
● Average 8 lookups
● From “Unblurred source”
● 1st quarter vector
● Uses 8 .xy interpolators
● Opaque draw call
3rd Pass
● Average +8 lookups
● From “Unblurred source”
● 2nd quarter vector
● Uses 8 .xy interpolators
● Additive draw call
● Resolve to texture:
● “Blurred source”
4th Pass
● Average 8 lookups
● From “Blurred source”
● 1st half vector
● Uses 8 .xy interpolators
● Opaque draw call
5th Pass
● Average +8 lookups
● From “Blurred source”
● 2nd half vector
● Uses 8 .xy interpolators
● Additive draw call
● Resolve final result
6th Pass
● Clear the final buffer
● Avoids buffer restore
● Opaque fullscreen
● Screenblend apply
● Blend in pixel shader
Character Shadows
● Ported one type of shadows from Xbox:
● Projected, modulated dynamic shadows
● Fairly standard method
● Generate shadow depth buffer
● Stencil potential pixels
● Compare shadow depth and scene depth
● Darken affected pixels
Character Shadows
1. Project character depth from light view
2. Reproject into camera view
3. Compare with SceneDepth and modulate
4. Draw character on top (no self-shadow)
Shadow Optimizations (1 of 2)
● Shadow depth first in the frame
● Avoids a rendertarget switch (resolve & restore!)
● Resolve SceneDepth just before shadows*
● Write out tile depth to RAM to read as texture
● Keep rendering in the same tile
● Unfortunately no API for this in OpenGL ES
Shadow Optimizations (2 of 2)
● Optimize color buffer usage for shadow
● We only need the depth buffer!
● Unnecessary buffer, but required in OpenGL ES
● Clear (avoid restore) and disable color writes
● Use glDiscardFramebufferEXT() to avoid resolve
● Could encode depth in F16 / RGBA8 color instead
● Draw screen-space quad instead of cube
● Avoids a dependent texture lookup
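The idea of encoding depth in an RGBA8 color can be modeled on the CPU with fixed-point packing. This is only a sketch of the principle: it assumes depth is normalized to [0, 1] and splits a 32-bit integer across four 8-bit channels. A real shader would do the equivalent with floating-point math, and the function names here are hypothetical.

```python
MAX_U32 = 0xFFFFFFFF


def pack_depth_rgba8(depth):
    """Pack a depth value in [0, 1] into four 8-bit channels (r, g, b, a)."""
    depth = min(max(depth, 0.0), 1.0)   # clamp to the valid range
    q = round(depth * MAX_U32)          # 32-bit fixed-point depth
    return ((q >> 24) & 0xFF,           # most significant byte in red
            (q >> 16) & 0xFF,
            (q >> 8) & 0xFF,
            q & 0xFF)                   # least significant byte in alpha


def unpack_depth_rgba8(rgba):
    """Recover the depth value from four 8-bit channels."""
    r, g, b, a = rgba
    q = (r << 24) | (g << 16) | (b << 8) | a
    return q / MAX_U32
```

Round-tripping a value through `pack_depth_rgba8` and `unpack_depth_rgba8` loses at most about half a step of the 32-bit range, far finer than a single 8-bit channel could store.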
Tool Tips:
● Use an OpenGL ES wrapper on PC
● Almost “WYSIWYG”
● Debug in Visual Studio
● Apple Xcode GL debugger, iOS 5
● Full capture of one frame
● Shows each drawcall, states in separate pane
● Shows all resources used by each drawcall
● Shows shader source code + all uniform values
Next Generation
ImgTec “Rogue” (6xxx series): 20x
ImgTec 6xxx series
● 100+ GFLOPS (scalable to TFLOPS range)
● DirectX 10, OpenGL ES “Halti”
● PVRTC 2
● Improved memory bandwidth usage
● Improved latency hiding
Questions?

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Smedberg niklas bringing_aaa_graphics

  • 1. Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games
  • 2. Who Am I ● A.k.a. “Smedis” ● Platform team at Epic Games ● Unreal Engine ● 15 years in the industry ● 30 years of programming ● C64 demo scene
  • 3. Content ● Hardware ● How it works under the hood ● Case study: ImgTec SGX GPU ● Software ● How to apply this knowledge to bring console graphics to mobile platforms
  • 7. Mobile Graphics Processors ● The feature support is there: ● Shaders ● Render to texture ● Depth textures ● MSAA ● But is the performance there? ● Yes. And it keeps getting better!
  • 8. Mobile GPU Architecture ● Tile-based deferred rendering GPU ● Very different from desktop or consoles ● Common on smartphones and tablets ● ImgTec SGX GPUs fall into this category ● There are other tile-based GPUs (e.g. ARM Mali) ● Other mobile GPU types ● NVIDIA Tegra is more traditional
  • 9. Tile-Based Mobile GPU TLDR Summary: ● Split the screen into tiles ● E.g. 16x16 or 32x32 pixels ● The GPU fits an entire tile on chip ● Process all drawcalls for one tile ● Repeat for each tile to fill the screen ● Each tile is written to RAM as it finishes (For illustration purposes only)
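To make the tile arithmetic above concrete, here is a minimal sketch that counts how many tiles a render target is split into. The `TileGrid` struct and `computeTileGrid` function are hypothetical names introduced for illustration; the actual tile size depends on the GPU and is not something the application chooses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: how a render target is cut into square tiles.
struct TileGrid {
    std::uint32_t columns;
    std::uint32_t rows;
    std::uint32_t count;
};

// Computes the tile grid for a width x height target and a square tile size.
TileGrid computeTileGrid(std::uint32_t width, std::uint32_t height, std::uint32_t tileSize) {
    assert(tileSize > 0);
    // Round up so partially covered tiles on the right and bottom edges are counted.
    const std::uint32_t columns = (width + tileSize - 1) / tileSize;
    const std::uint32_t rows = (height + tileSize - 1) / tileSize;
    return TileGrid{columns, rows, columns * rows};
}
```

For a 1024x768 target, 32x32 tiles give 32 x 24 = 768 tiles, and 16x16 tiles give 64 x 48 = 3072 tiles. Each tile is processed on chip and written to RAM once.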
  • 10. ImgTec Process Software Command Buffer Vertex Frontend Vertex Processing Tiling Parameter Buffer Pixel Frontend Pixel Processing Frame Buffer
  • 11. Vertex Processing Vertex Frontend ● Vertex Frontend reads from GPU command buffer ● Distributes vertex primitives to all GPU cores ● Splits drawcalls into fixed chunks of vertices ● GPU cores process vertices independently ● Continues until the end of the scene Software Command Buffer Vertex Frontend Vertex Processing Vertex Processing
  • 12. Vertex processing (Per GPU Core) Vertex Setup (VDM) Pre-Shader Shader (Vertex) (USSE) Parameter Buffer (RAM) Tiling (TA) Software Command Buffer Vertex Frontend Vertex Processing
  • 13. Vertex Setup ● Receives commands from Vertex Frontend
  • 14. Vertex Pre-Shader ● Fetches input data (attributes and uniforms)
  • 15. Vertex Shader ● Universal Scalable Shader Engine ● Executes the vertex shader program, multithreaded
  • 17. Parameter Buffer ● Stored in system memory ● You don’t want to overflow this buffer!
  • 18. Pixel Processing Pixel Processing Pixel Frontend ● Reads Parameter Buffer ● Distributes pixel processing to all cores ● One whole tile at a time ● A tile is processed in full on one GPU core ● Tiles are processed in parallel on multi-core GPUs Parameter Buffer Pixel Frontend Pixel Processing Frame Buffer
  • 19. Pixel processing (Per GPU Core) Pixel Setup (PDM) Pre-Shader Shader (Pixel) (USSE) Frame Buffer (RAM) Pixel Back-end Parameter Buffer Pixel Frontend Pixel Processing Frame Buffer
  • 20. Pixel Setup ● Receives tile commands from Pixel Frontend ● Fetches vertex shader output from Parameter Buffer ● Triangle rasterization; calculates interpolator values ● Depth/stencil test; Hidden Surface Removal
  • 21. Pixel Pre-Shader ● Fills in interpolator and uniform data ● Kicks off non-dependent texture reads
  • 22. Pixel Shader ● Multithreaded ALUs ● Each thread can be vertices or pixels ● Can have multiple USSEs in each GPU core
  • 23. Pixel Back-end ● Triggered when all pixels in the tile are finished ● Performs data conversions, MSAA-downsampling ● Writes finished tile color/depth/stencil to memory
  • 24. Shader Unit Caveats ● Shader programs without dynamic flow-control: ● 4 vertices/pixels per instruction ● Shader programs with dynamic flow-control: ● 1 vertex/pixel per instruction ● Alpha-blending is in the shader ● Not separate specialized hardware ● Shader patching may occur when you switch state ● (More on how to avoid shader patching later)
  • 25. Rendering Techniques ● How to take advantage of this GPU?
  • 26. Mobile is the new PC ● Wide feature and performance range ● Scalable graphics are back ● User graphics settings are back ● Low/medium/high/ultra ● Render buffer size scaling ● Testing 100 SKUs is back
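Render buffer size scaling can be sketched as a small helper that derives the off-screen buffer size from the native resolution and a user-facing quality scale. The names `RenderSize` and `scaledRenderSize`, and the 0.25 to 1.0 clamp range, are assumptions made for this example and are not taken from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct RenderSize {
    int width;
    int height;
};

// Hypothetical helper: scales the native resolution by a quality factor.
// The 0.25..1.0 clamp range is an assumption, not a value from the slides.
RenderSize scaledRenderSize(int nativeWidth, int nativeHeight, float scale) {
    assert(nativeWidth > 0 && nativeHeight > 0);
    const float s = std::max(0.25f, std::min(scale, 1.0f));
    const int w = std::max(1, static_cast<int>(std::lround(nativeWidth * s)));
    const int h = std::max(1, static_cast<int>(std::lround(nativeHeight * s)));
    return RenderSize{w, h};
}
```

The scene would then be rendered at the reduced size and stretched to the display, trading sharpness for fill rate on slower devices.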
  • 28. Render target is on die ● MSAA is cheap and uses less memory ● Only the resolved data in RAM ● Have seen 0-5 ms cost for MSAA ● Be wary of buffer restores (color or depth) ● No bandwidth cost for alpha-blend ● Cheap depth/stencil testing
  • 29. “Free” hidden surface removal ● Specific to ImgTec SGX GPU ● Eliminates all background pixels ● Eliminates overdraw ● Only for opaque
  • 30. Mobile vs Console ● Very large CPU overhead for OpenGL ES API ● Max CPU usage at 100-300 drawcalls ● Avoid too much data per scene ● Parameter buffer between vertex & pixel processing ● Save bandwidth and GPU flushes ● Shader patching ● Some render states cause the shader to be modified and recompiled by the driver ● E.g. alpha-blend settings, vertex input, color write masks, etc
  • 31. Alpha-test / Discard ● Conditional z-writes can be very slow ● Instead of writing out Z ahead of time, the “Pixel setup” (PDM) won’t submit more fragments until the pixel shader has determined visibility for current pixels. ● Use alpha-blend instead of alpha-test ● Fit the geometry to visible pixels
  • 34. Render Buffer Management (1 of 2) ● Each render target is a whole new scene ● Avoid switching render target back and forth! ● Can cause a full restore: ● Copies full color/depth/stencil from RAM into Tile Memory at the beginning of the scene ● Can cause a full resolve: ● Copies full color/depth/stencil from Tile Memory into RAM at the end of the scene
  • 35. Render Buffer Management (2 of 2) ● Avoid buffer restore ● Clear everything! Color/depth/stencil ● A clear just sets some dirty bits in a register ● Avoid buffer resolve ● Use discard extension (GL_EXT_discard_framebuffer) ● See usage case for shadows ● Avoid unnecessarily different FBO combos ● Don’t let the driver think it needs to start resolving and restoring any buffers!
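The restore/resolve rules from the two slides above can be captured in a toy cost model. This is a simplified sketch and not how a driver actually decides: it treats each render target switch as one scene and assumes a scene needs a restore unless everything was cleared at its start, and a resolve unless its contents were discarded at its end. The `Scene`, `TransferCount` and `countTransfers` names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One "scene" is everything rendered between two render target switches.
struct Scene {
    bool clearedAtStart;  // color/depth/stencil all cleared before the first draw
    bool discardedAtEnd;  // contents discarded, e.g. via GL_EXT_discard_framebuffer
};

struct TransferCount {
    int restores;  // full copies from RAM into tile memory
    int resolves;  // full copies from tile memory into RAM
};

// Toy model: counts the full-buffer transfers a frame would trigger.
TransferCount countTransfers(const std::vector<Scene>& frame) {
    TransferCount total{0, 0};
    for (const Scene& scene : frame) {
        if (!scene.clearedAtStart) ++total.restores;
        if (!scene.discardedAtEnd) ++total.resolves;
    }
    return total;
}
```

Running a planned frame through a model like this makes avoidable transfers visible: a scene that is neither cleared nor discarded pays for both copies.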
  • 36. Texture Lookups ● Don’t perform texture lookups in the pixel shader! ● Let the “pre-shader” queue them up ahead of time ● I.e. avoid dependent texture lookups ● Don’t manipulate texture coordinate with math ● Move all math to vertex shader and pass down ● Don't use .zw components for texture coordinates ● Will be handled as a dependent texture lookup ● Only use .xy and pass other data in .zw
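As an illustration of the advice above, the following sketch holds a pair of GLSL ES 1.00 shaders as C++ string constants. All texture coordinate math is done in the vertex shader, and the pixel shader samples with unmodified `vec2` varyings. The attribute, uniform and varying names (`aPosition`, `aUV`, `uBlurOffset`, `uTexture`, `vUV0`, `vUV1`) are made up for this example; whether a given lookup is treated as dependent is up to the GPU and driver.

```cpp
#include <cassert>
#include <string>

// Vertex shader: computes every texture coordinate the pixel shader will need.
const std::string kVertexShader = R"(
attribute vec4 aPosition;
attribute vec2 aUV;
uniform vec2 uBlurOffset;
varying vec2 vUV0;  // plain vec2 varyings, so only .xy carries coordinates
varying vec2 vUV1;
void main() {
    gl_Position = aPosition;
    vUV0 = aUV;
    vUV1 = aUV + uBlurOffset;
}
)";

// Pixel shader: uses the interpolated coordinates as they arrive.
const std::string kFragmentShader = R"(
precision mediump float;
uniform sampler2D uTexture;
varying vec2 vUV0;
varying vec2 vUV1;
void main() {
    // No math and no swizzle on the coordinates before the lookups.
    gl_FragColor = 0.5 * (texture2D(uTexture, vUV0) + texture2D(uTexture, vUV1));
}
)";
```

Because the coordinates are known before the pixel shader runs, the lookups can be queued up ahead of time, as the slide recommends.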
  • 37. Mobile Material System ● Full Unreal Engine materials are too complicated
  • 38. Mobile Material System ● Initial idea: ● Pre-render into a single texture
  • 39. Mobile Material System ● Current solution: ● Pre-render components into separate textures ● Add mobile-specific settings ● Feature support driven by artists
  • 40. Mobile Material Shaders ● One hand-written ubershader ● Lots of #ifdef for all features ● Exposed as fixed settings in the artist UI ● Checkboxes, lists, values, etc
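One way to picture the ubershader approach is a function that turns the artist's settings into a block of `#define` lines placed in front of the shared shader body. This is only a sketch: the `buildShaderSource` function and the feature names in the usage note are hypothetical, and it assumes the body does not start with a `#version` directive.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: prepends one #define per enabled feature to the
// ubershader body, so the #ifdef blocks in the body select the right code.
std::string buildShaderSource(const std::vector<std::pair<std::string, bool>>& features,
                              const std::string& ubershaderBody) {
    std::string prefix;
    for (const auto& feature : features) {
        assert(!feature.first.empty());
        if (feature.second) {
            prefix += "#define " + feature.first + " 1\n";
        }
    }
    return prefix + ubershaderBody;
}
```

For example, enabling a made-up `USE_NORMAL_MAP` feature and disabling `USE_FOG` yields a source that begins with `#define USE_NORMAL_MAP 1` followed by the unchanged body.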
  • 43. Shader Offline Processing ● Run C pre-processor offline ● Reduces in-game compile time ● Eliminates duplicates at off-line time
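Eliminating duplicates offline can be sketched as follows, assuming the preprocessor has already expanded every material's shader into plain text. Identical texts are stored once, and each material keeps an index into the unique list. The `DedupResult` and `deduplicateShaders` names are hypothetical, and exact string comparison is an assumption about how duplicates are detected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct DedupResult {
    std::vector<std::string> uniqueSources;  // each distinct preprocessed shader once
    std::vector<std::size_t> remap;          // remap[i] indexes uniqueSources for input i
};

// Hypothetical helper: collapses identical preprocessed shader sources.
DedupResult deduplicateShaders(const std::vector<std::string>& preprocessed) {
    DedupResult result;
    std::unordered_map<std::string, std::size_t> seen;
    for (const std::string& source : preprocessed) {
        auto it = seen.find(source);
        if (it == seen.end()) {
            // First time this exact text appears: record its new index.
            it = seen.emplace(source, result.uniqueSources.size()).first;
            result.uniqueSources.push_back(source);
        }
        result.remap.push_back(it->second);
    }
    assert(result.remap.size() == preprocessed.size());
    return result;
}
```

Only the unique sources then need to be compiled at startup, which is where the reduced in-game compile time would come from.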
  • 44. Shader Compiling ● Compile all shaders at startup ● Avoids hitching at run-time ● Compile on the GL thread, while loading on Game thread ● Compiling is not enough ● Must issue dummy drawcalls! ● Remember how certain states affect shaders! ● May need experimenting to avoid shader patching ● E.g. alpha-blend states, color write masks
  • 46. God Rays ● Initially ported Xbox straight to PS Vita ● Worked, but was very slow ● But for Infinity Blade II, on a cell phone!? ● We first thought it was impossible ● But let’s have a deeper look
  • 47. God Rays ● Port to OpenGL ES 2.0 ● Use fewer texture lookups ● Worse quality ● And still very slow
  • 48. Optimizations For Mobile ● Move all math to vertex shader ● No dependent texture reads! ● Pass down data through interpolators ● But now we’re out of interpolators ● Split radial filter into 4 draw calls ● 4 x 8 = 32 texture lookups total (equiv. 256) ● Went from 30 ms to 5 ms
  • 51. God Rays ● Original Scene ● No God Rays
  • 52. 1st Pass ● Downsample Scene ● Identify pixels ● RGB: Scene color ● A: Occlusion factor ● Resolve to texture: ● “Unblurred source”
  • 53. 2nd Pass ● Average 8 lookups ● From “Unblurred source” ● 1st quarter vector ● Uses 8 .xy interpolators ● Opaque draw call
  • 54. 3rd Pass ● Average +8 lookups ● From “Unblurred source” ● 2nd quarter vector ● Uses 8 .xy interpolators ● Additive draw call ● Resolve to texture: ● “Blurred source”
  • 55. 4th Pass ● Average 8 lookups ● From “Blurred source” ● 1st half vector ● Uses 8 .xy interpolators ● Opaque draw call
  • 56. 5th Pass ● Average +8 lookups ● From “Blurred source” ● 2nd half vector ● Uses 8 .xy interpolators ● Additive draw call ● Resolve final result
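Passes 2 through 5 each need eight texture coordinates per vertex, spread along part of the vector from the vertex toward the light. The sketch below computes those coordinates on the CPU to show the arithmetic that would live in the vertex shader. The even spacing, the `Vec2` type and the function names are assumptions. Reading "equiv. 256" as 16 x 16 is also an interpretation: two 8-tap passes give a 16-tap blurred source, which is then sampled 16 more times.

```cpp
#include <array>
#include <cassert>

struct Vec2 {
    float x;
    float y;
};

// Returns 8 coordinates covering [start, start + length) of the vector from
// uv toward lightPos. Passes 2 and 3 would use (0, 0.25) and (0.25, 0.25);
// passes 4 and 5 would use (0, 0.5) and (0.5, 0.5).
std::array<Vec2, 8> radialSampleCoords(Vec2 uv, Vec2 lightPos, float start, float length) {
    std::array<Vec2, 8> coords{};
    const Vec2 toLight{lightPos.x - uv.x, lightPos.y - uv.y};
    for (int i = 0; i < 8; ++i) {
        const float t = start + length * (static_cast<float>(i) / 8.0f);
        coords[i] = Vec2{uv.x + toLight.x * t, uv.y + toLight.y * t};
    }
    return coords;
}

// 4 draw calls with 8 lookups each.
constexpr int totalLookups() { return 4 * 8; }

// Each stage accumulates 8 + 8 taps, and the second stage samples the first.
constexpr int equivalentSamples() { return (8 + 8) * (8 + 8); }
```

Each of the eight results maps to one `.xy` interpolator, so the pixel shader can sample without doing any coordinate math of its own.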
  • 57. 6th Pass ● Clear the final buffer ● Avoids buffer restore ● Opaque fullscreen ● Screenblend apply ● Blend in pixel shader
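The screen blend applied in this pass follows the standard formula result = 1 - (1 - a)(1 - b) per channel. Below is a minimal CPU-side sketch of that formula, with a hypothetical `Rgb` type. In the talk's setup the same arithmetic runs in the pixel shader, so the draw call itself can stay opaque.

```cpp
#include <cassert>

struct Rgb {
    float r;
    float g;
    float b;
};

// Standard screen blend for one channel, inputs expected in [0, 1].
float screenChannel(float base, float blend) {
    assert(base >= 0.0f && base <= 1.0f && blend >= 0.0f && blend <= 1.0f);
    return 1.0f - (1.0f - base) * (1.0f - blend);
}

// Screen-blends the god ray color over the scene color.
Rgb screenBlend(Rgb scene, Rgb rays) {
    return Rgb{screenChannel(scene.r, rays.r),
               screenChannel(scene.g, rays.g),
               screenChannel(scene.b, rays.b)};
}
```

Screen blending can only brighten: black rays leave the scene unchanged, and a channel that is already fully white stays white.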
  • 59. Character Shadows ● Ported one type of shadows from Xbox: ● Projected, modulated dynamic shadows ● Fairly standard method ● Generate shadow depth buffer ● Stencil potential pixels ● Compare shadow depth and scene depth ● Darken affected pixels
  • 60. Character Shadows 1. Project character depth from light view
  • 61. Character Shadows 2. Reproject into camera view
  • 62. Character Shadows 3. Compare with SceneDepth and modulate
  • 63. Character Shadows 4. Draw character on top (no self-shadow)
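Step 3 above, the compare and modulate, reduces to a depth test per pixel. The sketch below shows one common form of that test. The `shadowModulation` name, the bias term and the darkness parameter are assumptions made for illustration and are not details taken from the talk.

```cpp
#include <cassert>

// Returns the factor to multiply the scene color by.
// pixelDepthFromLight: depth of the scene pixel as seen from the light.
// shadowMapDepth: character depth stored in the shadow depth buffer.
// bias: small offset to avoid false shadowing from precision errors.
// darkness: multiplier for shadowed pixels (1.0 means no darkening).
float shadowModulation(float pixelDepthFromLight, float shadowMapDepth,
                       float bias, float darkness) {
    assert(darkness >= 0.0f && darkness <= 1.0f);
    const bool inShadow = pixelDepthFromLight > shadowMapDepth + bias;
    return inShadow ? darkness : 1.0f;
}
```

A pixel that lies behind the character from the light's point of view gets darkened; everything else is left as it was.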
  • 64. Shadow Optimizations (1 of 2) ● Shadow depth first in the frame ● Avoids a rendertarget switch (resolve & restore!) ● Resolve SceneDepth just before shadows* ● Write out tile depth to RAM to read as texture ● Keep rendering in the same tile ● Unfortunately no API for this in OpenGL ES
  • 65. Shadow Optimizations (2 of 2) ● Optimize color buffer usage for shadow ● We only need the depth buffer! ● Unnecessary buffer, but required in OpenGL ES ● Clear (avoid restore) and disable color writes ● Use glDiscardFramebufferEXT() to avoid resolve ● Could encode depth in F16 / RGBA8 color instead ● Draw screen-space quad instead of cube ● Avoids a dependent texture lookup
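Encoding depth in an RGBA8 color can be sketched as splitting a normalized depth value across four 8-bit channels. This is a CPU-side illustration using 32-bit fixed point; a shader would do the equivalent with floating-point math, and the talk does not say which encoding was considered. The names `Rgba8`, `packDepth` and `unpackDepth` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Rgba8 = std::array<std::uint8_t, 4>;

// Packs a depth in [0, 1] into four bytes, most significant byte in red.
Rgba8 packDepth(double depth) {
    assert(depth >= 0.0 && depth <= 1.0);
    const auto q = static_cast<std::uint32_t>(std::llround(depth * 4294967295.0));
    return Rgba8{static_cast<std::uint8_t>(q >> 24),
                 static_cast<std::uint8_t>((q >> 16) & 0xFFu),
                 static_cast<std::uint8_t>((q >> 8) & 0xFFu),
                 static_cast<std::uint8_t>(q & 0xFFu)};
}

// Reverses packDepth, returning a depth in [0, 1].
double unpackDepth(const Rgba8& color) {
    const std::uint32_t q = (static_cast<std::uint32_t>(color[0]) << 24) |
                            (static_cast<std::uint32_t>(color[1]) << 16) |
                            (static_cast<std::uint32_t>(color[2]) << 8) |
                            static_cast<std::uint32_t>(color[3]);
    return static_cast<double>(q) / 4294967295.0;
}
```

Spreading the value over all four channels keeps far more precision than a single 8-bit channel would, which matters when the result is compared against scene depth.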
  • 66. Tool Tips: ● Use an OpenGL ES wrapper on PC ● Almost “WYSIWYG” ● Debug in Visual Studio ● Apple Xcode GL debugger, iOS 5 ● Full capture of one frame ● Shows each drawcall, states in separate pane ● Shows all resources used by each drawcall ● Shows shader source code + all uniform values
  • 68. ImgTec 6xxx series ● 100+ GFLOPS (scalable to TFLOPS range) ● DirectX 10, OpenGL ES “Halti” ● PVRTC 2 ● Improved memory bandwidth usage ● Improved latency hiding