Next-generation mobile GPUs and rendering techniques - Niklas Smedberg

This session provides a detailed analysis of next-generation mobile graphics hardware and points out special caveats and opportunities that are specific to mobile. This is followed by an in-depth explanation of advanced rendering techniques that were previously only considered for high-end PCs or dedicated gaming consoles, but which have now been brought over to mobile devices in Unreal Engine 4 - including features such as physically-based rendering, HDR and image-based lighting.

Published in: Technology

  1. 1. Next-gen Mobile GPUs and Rendering Techniques Game Connection, October 29-31, 2014 Niklas Smedberg Senior Engine Programmer, Epic Games
  2. 2. Game Connection, October 29-31, 2014 Introduction • Niklas Smedberg, a.k.a. “Smedis” – 17 years in the game industry – Graphics programmer since the C64 demo scene
  3. 3. Game Connection, October 29-31, 2014 Content • Next-gen Hardware – Tile-based GPU vs direct-rendering GPU • Next-gen Rendering Techniques – Unreal Engine 4 – Previous goal: Bringing AAA Console graphics to mobile (DONE!) – New goal: Bringing AAA PC graphics to mobile (DONE!??)
  4. 4. Next Generation Mobile Hardware Game Connection, October 29-31, 2014 • Big leap in features and performance • Full-featured (e.g. OpenGL ES 3.1+AEP, DirectX 10/11) • Peak performance now comparable to consoles (Xbox 360/PS3) – About 300+ GFLOPS and 26 GB/s • New goal: Bring AAA PC graphics to mobile
  5. 5. Performance Trends (FP16 GFLOPS) Game Connection, October 29-31, 2014 • 2010: SGX 535, 6.4 • 2011: SGX 543MP2, 12.8 • 2012: SGX 543MP3, 25.5 • 2013: G6430, 154 • 2014: Adreno, K1, GX6650, 300+
  6. 6. Game Connection, October 29-31, 2014 iOS 8 Metal • New rendering API for iOS • Better match for the hardware – Like a “game console API” – Much more efficient on the CPU – Exposes hardware features • 20x faster on our rendering thread – OpenGL ES API: 30% of renderthread – Metal API: 1.6% of renderthread • All power and perf responsibility is now in the hands of the developer!
  7. 7. Tech Demo: Zen Garden Game Connection, October 29-31, 2014
  8. 8. Game Connection, October 29-31, 2014 Tile-based Mobile GPU • Mobile GPUs are usually tile-based (next-gen too) – Tile-based: ImgTec, Qualcomm*, ARM – Direct: NVIDIA, Intel, Vivante * Qualcomm Adreno can render either tile-based or direct to frame buffer – Extension: GL_QCOM_binning_control
  9. 9. Game Connection, October 29-31, 2014 Tile-Based Mobile GPU Summary: • Split the screen into tiles – E.g. 32x32 pixels (ImgTec) or 300x300 (Qualcomm) • The whole tile fits within GPU, on chip • Process all drawcalls for one tile – Write out final tile results to RAM • Repeat for each tile to fill the image in RAM
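As a rough illustration of the tile split summarized on the slide above, the following Python sketch counts how many tiles are needed to cover a screen. The 1136x640 resolution is borrowed from the iPhone 5S figure later in the deck, the function name `tile_grid` is made up for this example, and real hardware may pad or align tiles differently.

```python
import math

def tile_grid(width, height, tile_w, tile_h):
    """Return (columns, rows, total) of tiles needed to cover the screen."""
    cols = math.ceil(width / tile_w)   # partial tiles at the edge still count
    rows = math.ceil(height / tile_h)
    return cols, rows, cols * rows

# 32x32 tiles (the ImgTec example size from the slide)
small_tiles = tile_grid(1136, 640, 32, 32)
# 300x300 tiles (the Qualcomm example size from the slide)
large_tiles = tile_grid(1136, 640, 300, 300)
print(small_tiles, large_tiles)
```

Under these assumptions the small tile size yields hundreds of tiles per frame while the large one yields about a dozen, which is why per-tile overhead and on-chip memory size trade off against each other.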
  10. 10. ImgTec Tile-based Rendering Process Game Connection, October 29-31, 2014 [Diagram labels: Game, Cmd Buffer (RAM), Vertex Processing, Tile Data (RAM); Per Tile: Hidden Surface Removal, Pixel Processing (Top-most only), Tile Memory, Frame Buffer (RAM)]
  11. 11. Game Connection, October 29-31, 2014 ImgTec Series 6 G6430 • One GPU core • Four shader units (USC) – 16-way scalar • FP16: Two SOP per clock – (a*b + c*d) • FP32: Two MADD per clock – (a*b + c) • 154 GFLOPS @ 400 MHz – 16-bit floating point
  12. 12. Game Connection, October 29-31, 2014 FP16 Is Faster Than FP32 • ImgTec Series 6: 50% faster – FP16 pipeline: Two SOP per clock – FP32 pipeline: Two MADD per clock • ImgTec Series 6XT: 100% faster – FP16 pipeline: Four MADD per clock – FP32 pipeline: Two MADD per clock • Qualcomm Snapdragon: 100% faster
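The peak figures on the two slides above can be reproduced with simple arithmetic. This is a sketch under stated assumptions: "16-way scalar" is read as 16 lanes per shader unit, a SOP (a*b + c*d) is counted as 3 floating-point operations, and a MADD (a*b + c) as 2. The helper name `peak_gflops` is invented for the example.

```python
def peak_gflops(units, lanes, ops_per_clock, flops_per_op, clock_mhz):
    """Peak throughput in GFLOPS for a simple units x lanes x ops model."""
    return units * lanes * ops_per_clock * flops_per_op * clock_mhz / 1000.0

SOP_FLOPS = 3   # a*b + c*d: two multiplies and one add (assumed counting)
MADD_FLOPS = 2  # a*b + c: one multiply and one add (assumed counting)

# Series 6 G6430: four shader units, 16 lanes each, 400 MHz
g6430_fp16 = peak_gflops(4, 16, 2, SOP_FLOPS, 400)
g6430_fp32 = peak_gflops(4, 16, 2, MADD_FLOPS, 400)
series6_speedup = g6430_fp16 / g6430_fp32

# Series 6XT: four FP16 MADD vs two FP32 MADD per clock
series6xt_speedup = (4 * MADD_FLOPS) / (2 * MADD_FLOPS)

print(g6430_fp16, series6_speedup, series6xt_speedup)
```

With this counting, the FP16 result rounds to the 154 GFLOPS quoted for the G6430, and the FP16-to-FP32 ratios come out as 1.5 and 2.0, matching the "50% faster" and "100% faster" claims.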
  13. 13. Game Connection, October 29-31, 2014 ImgTec Rendering Tips • Hidden Surface Removal – For opaque only – Don’t keep alpha-test enabled all the time – Don’t keep “discard” keyword in shader source, even if it’s not used • Group opaque drawcalls together • Sort on state, not distance
  14. 14. Game Connection, October 29-31, 2014 Framebuffer Resolve/Restore • Expensive to switch Frame Buffer Object on Tile-based GPUs – Saves the current FBO to RAM – Reloads the new FBO from RAM • Best performance: – A single rendertarget for the entire frame – No post-processing passes • Does not apply to NVIDIA Tegra GPUs! – This made it simpler for us to make our “Rivalry” tech demo for K1
  15. 15. Game Connection, October 29-31, 2014 Framebuffer Resolve/Restore • Clear ALL FBO attachments after new frame/rendertarget – Clear after eglSwapBuffers / glBindFramebuffer – Avoids reloading FBO from RAM – NOTE: Do NOT clear unnecessarily on non-tile-based GPUs (e.g. NVIDIA) • Discard unused attachments before new frame/rendertarget – Discard before eglSwapBuffers / glBindFramebuffer – Avoids saving unused FBO attachments to RAM – glDiscardFramebufferEXT / glInvalidateFramebuffer
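To get a feel for why clearing and discarding matter, here is a toy cost model in Python. It is not taken from the talk and does not call any graphics API: it simply assumes that every attachment that is not cleared must be restored from RAM at the start of a pass, and every attachment that is not discarded must be resolved to RAM at the end. The attachment formats (4 bytes per pixel each) and the 1136x640 size are assumptions chosen for illustration.

```python
def attachment_bytes(width, height, bytes_per_pixel):
    """Size of one full-screen attachment in bytes."""
    return width * height * bytes_per_pixel

def pass_traffic(width, height, attachments, cleared, discarded):
    """Estimate (restore_bytes, resolve_bytes) for one render pass.

    attachments: dict mapping attachment name -> bytes per pixel
    cleared:     names cleared at the start of the pass (no restore needed)
    discarded:   names discarded at the end of the pass (no resolve needed)
    """
    restore = sum(attachment_bytes(width, height, bpp)
                  for name, bpp in attachments.items() if name not in cleared)
    resolve = sum(attachment_bytes(width, height, bpp)
                  for name, bpp in attachments.items() if name not in discarded)
    return restore, resolve

# Hypothetical FBO: RGBA8 color plus a 32-bit depth/stencil buffer
fbo = {"color": 4, "depth_stencil": 4}

naive = pass_traffic(1136, 640, fbo, cleared=set(), discarded=set())
tuned = pass_traffic(1136, 640, fbo,
                     cleared={"color", "depth_stencil"},
                     discarded={"depth_stencil"})
print(naive, tuned)
```

In this model the tuned pass restores nothing and resolves only the color buffer, a quarter of the naive traffic. Real drivers may already skip some of these transfers, so treat the numbers as an upper bound on the savings.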
  16. 16. Game Connection, October 29-31, 2014 iOS Performance Profiling • Screenshot from Xcode, which shows: – How we clear FBO at the beginning of every render pass – Other important performance info
  17. 17. Game Connection, October 29-31, 2014 Programmable Blending • GL_EXT_shader_framebuffer_fetch (gl_LastFragData) • Reads current pixel background “for free” • Potential uses: – Custom color blending – Blend by background depth value (depth in alpha) • E.g. Soft intersection against world geometry for particles – Tone-mapping within tile-memory – Deferred shading without resolving G-buffer • Stay on GPU and avoid expensive round-trip to RAM • See also: GL_EXT_shader_pixel_local_storage
  18. 18. Soul: Programmable Blending Game Connection, October 29-31, 2014
  19. 19. Core Optimization: Opaque Draw Ordering Game Connection, October 29-31, 2014 • All platforms, 1. Group draws by material (shader) to reduce state changes • Then for all platforms except ImgTec, 2. Skybox last: 5 ms/frame savings (vs drawing skybox first) 3. Sort groups nearest first : extra 3 ms/frame savings 4. Sort inside groups nearest first : extra 7 ms/frame savings • (Timings from a UE4 map on a Qualcomm-based device)
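The four ordering steps on the slide above can be sketched as a small sort routine. This is an illustrative Python sketch, not UE4 code: the draw-call dictionaries, the `skybox` flag, and the choice to rank a group by its nearest member are all assumptions made for the example.

```python
from collections import defaultdict

def order_opaque_draws(draws):
    """Order opaque draws: grouped by material, nearest first, skybox last.

    draws: list of dicts with 'name', 'material', 'distance' and an
           optional 'skybox' flag (hypothetical structure).
    """
    skybox = [d for d in draws if d.get("skybox")]
    regular = [d for d in draws if not d.get("skybox")]

    # Step 1: group draws by material (shader) to reduce state changes
    groups = defaultdict(list)
    for d in regular:
        groups[d["material"]].append(d)

    # Step 4: inside each group, nearest first
    for group in groups.values():
        group.sort(key=lambda d: d["distance"])

    # Step 3: order the groups by their nearest member (assumed metric)
    ordered_groups = sorted(groups.values(), key=lambda g: g[0]["distance"])

    # Step 2: skybox goes last
    return [d for g in ordered_groups for d in g] + skybox

draws = [
    {"name": "sky", "material": "sky", "distance": 1000.0, "skybox": True},
    {"name": "rock_far", "material": "rock", "distance": 50.0},
    {"name": "tree", "material": "foliage", "distance": 10.0},
    {"name": "rock_near", "material": "rock", "distance": 5.0},
]
order = [d["name"] for d in order_opaque_draws(draws)]
print(order)
```

Per the slide, steps 2 to 4 would be skipped on ImgTec hardware, where hidden surface removal makes distance sorting of opaque draws unnecessary.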
  20. 20. Game Connection, October 29-31, 2014 Optimization: Resolution • Resolution does not match GPU performance! • Use custom resolution to match perf/pixel • Examples of same GPU: – iPhone 5S = 0.7 Mpix (1136x640) – iPad Air = 3.1 Mpix (2048x1536), more than 4 times slower! • Other examples: – Prev-Gen Console = 0.9 Mpix (1280x720) – Current-Gen Console = 2.1 Mpix (1920x1080)
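The megapixel figures above, and one way to pick a custom render resolution, can be checked with a few lines of Python. The scaling rule (uniform scale to hit a pixel budget) is an assumption for illustration; the slide only says to match performance per pixel, and `scaled_resolution` is a made-up helper name.

```python
import math

def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1e6

def scaled_resolution(width, height, target_pixels):
    """Uniformly scale a resolution down to roughly target_pixels."""
    scale = min(1.0, math.sqrt(target_pixels / (width * height)))
    return round(width * scale), round(height * scale)

iphone_5s = megapixels(1136, 640)    # about 0.7 Mpix
ipad_air = megapixels(2048, 1536)    # about 3.1 Mpix
ratio = ipad_air / iphone_5s         # a bit over 4x the pixels

# Render the iPad Air at roughly the iPhone 5S pixel budget
budget_res = scaled_resolution(2048, 1536, 1136 * 640)
print(iphone_5s, ipad_air, ratio, budget_res)
```

The rendered image would then be upscaled to the native display size, trading sharpness for a per-pixel cost closer to what the GPU can sustain.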
  21. 21. Single Content, Multiple Platforms Game Connection, October 29-31, 2014 • Core motivating factor in designing UE4 – Authoring consistency between PC and Mobile • Authoring environment for both platforms – Physically-based shading model – High dynamic range linear color space – High quality post processing
  22. 22. Game Connection, October 29-31, 2014 Comparison: PC
  23. 23. Comparison: Mobile Game Connection, October 29-31, 2014
  24. 24. Game Connection, October 29-31, 2014 Consistency: Material Editor • One material for many platforms – Artist-authored feature levels to scale shader perf from PC to mobile • Using a cross-compiler tool to retarget HLSL into GLSL
  25. 25. Directional Light + SDF Shadows Game Connection, October 29-31, 2014
  26. 26. Consistency: HDR Directional Lightmaps Game Connection, October 29-31, 2014 • Two compressed + One uncompressed textures: 1. HDR color with log luma encoding 2. World space 1st order spherical harmonic luma directionality 3. SDF shadow texture • Optimized for Mobile and PVRTC compression: – PVRTC (ImgTec) lacks separately compressed encoding for alpha – PC color: RGB/Luma, LogLuma in Alpha – Mobile color: RGB/Luma * LogLuma, no Alpha (LogLuma derived with ALU) – Mobile: A single dynamic directional light as the “sun”
  27. 27. Image-Based Lighting Game Connection, October 29-31, 2014
  28. 28. Game Connection, October 29-31, 2014 Image-Based Lighting • Selects a mip-level in an IBL cubemap based on per-pixel roughness – Same as PC, filtering same as PC • PC: FP16 • Mobile: Decode(RGBM)^2 • PC: Blend multiple cubemaps per surface and parallax correction • Mobile: One infinite-distance cubemap per object
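One plausible reading of "Decode(RGBM)^2" is that the cubemap stores the square root of the HDR color in RGBM form, and the shader squares the decoded value to recover linear color. The Python sketch below illustrates that reading only; the multiplier range `MAX_RANGE = 16.0`, the function names, and the upward quantization of the multiplier are assumptions, not details from the talk.

```python
import math

MAX_RANGE = 16.0  # hypothetical multiplier range; not stated in the talk

def encode_rgbm_sqrt(rgb):
    """Encode a linear HDR color as RGBM holding the square root of the color."""
    s = [math.sqrt(max(c, 0.0)) for c in rgb]
    # Shared multiplier, kept within (0, 1] and rounded up to an 8-bit step
    m = min(max(max(s) / MAX_RANGE, 1.0 / 255.0), 1.0)
    m = math.ceil(m * 255.0) / 255.0
    return tuple(min(c / (m * MAX_RANGE), 1.0) for c in s) + (m,)

def decode_rgbm_squared(rgbm):
    """Decode RGBM, then square the result to get back to linear color."""
    r, g, b, m = rgbm
    scale = m * MAX_RANGE
    return tuple((c * scale) ** 2 for c in (r, g, b))

original = (4.0, 1.0, 0.25)
encoded = encode_rgbm_sqrt(original)
decoded = decode_rgbm_squared(encoded)
print(encoded, decoded)
```

Storing the square root spends more of the limited 8-bit precision on darker values. This sketch leaves the RGB channels unquantized, so the round trip is exact up to floating-point error; a real 8-bit texture would add quantization error.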
  29. 29. Mobile Post Processing Pipeline Game Connection, October 29-31, 2014 [Diagram stages: FP16 input (RGB=Color, A=Depth); A=Depth to A=CoC+Sun (conversion done on chip if possible); 1/2 DOF Downsample; 1/2 DOF Filter; 1/4 DOF Near Dilation; 1/4 Smart Reduction; Light Shaft (Sun) Mask; 1/4 Lightshaft Filter, Pass 1; 1/4 Lightshaft Filter, Pass 2; Bloom Filter Tree (1/8, 1/16, 1/32, 1/64); 1/4 Final Filter Passes and Merge {Light Shafts, Bloom, Vignette}; Tonemapping; Anti-Aliasing]
  30. 30. Mobile Depth of Field Game Connection, October 29-31, 2014
  31. 31. Packed Circle of Confusion and Sun Intensity Game Connection, October 29-31, 2014 • Optimization done for Depth of Field and Light Shafts – Represent sun intensity and circle of confusion (CoC) in one FP16 value – Saves needing an extra render target • Value ranges: Depth (0 to 65504), CoC (0 to 1), Sun Intensity (1 to 65504) – 0 = Max Near Bokeh – 0.5 = In Focus – 1 = Max Far Bokeh and No Sun – 65504 = Max Sun (still at Max Far Bokeh)
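The packing described on the slide above can be sketched as follows. This is a guess at the mapping rather than the shader from the talk: values from 0 to 1 carry the circle of confusion, and values above 1 carry sun intensity on pixels that are already at maximum far bokeh. The offset of 1.0 for the sun term and the function names are assumptions, and Python floats do not model FP16 precision.

```python
FP16_MAX = 65504.0  # largest finite half-precision value

def pack_coc_sun(coc, sun):
    """Pack CoC (0..1) and sun intensity (>= 0) into one value.

    Sun intensity is only representable at max far bokeh (CoC = 1),
    so any pixel with sun > 0 is stored as 1 + sun (assumed mapping).
    """
    if sun > 0.0:
        return min(1.0 + sun, FP16_MAX)
    return min(max(coc, 0.0), 1.0)

def unpack_coc_sun(value):
    """Recover (coc, sun) from the packed value."""
    coc = min(value, 1.0)          # 0 = max near, 0.5 = in focus, 1 = max far
    sun = max(value - 1.0, 0.0)    # anything above 1 is sun intensity
    return coc, sun

in_focus = unpack_coc_sun(pack_coc_sun(0.5, 0.0))
sunny_sky = unpack_coc_sun(pack_coc_sun(1.0, 100.0))
print(in_focus, sunny_sky)
```

The trick relies on the sun only being visible in far-distance sky pixels, where the CoC is already saturated, so the two quantities never need to vary independently.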
  32. 32. Vignette + Bloom + Light Shafts Game Connection, October 29-31, 2014
  33. 33. Vignette + Bloom + Light Shafts: Optimized Game Connection, October 29-31, 2014 • Composited at quarter resolution – Using pre-multiplied alpha FP16 – Also performs the final filtering passes for bloom and light shafts • Optimized to minimize rendertarget switches • 3 effects applied to scene with one full-res TEX fetch – Applied in the tonemap pass
  34. 34. Game Connection, October 29-31, 2014 Mobile Light Shafts • Filtered at quarter resolution • Monochromatic with artist controlled tint on final composite – Bloom and light shaft down-sample pass shared – RGB = color for bloom, A = light shaft intensity • Light shaft filter runs in tonemapped space (8-bit/channel) – Applied in linear (reverse tonemap before tint and composite) • Filtering done by 3 passes of 8 taps
  35. 35. Game Connection, October 29-31, 2014 Mobile Bloom
  36. 36. Game Connection, October 29-31, 2014 Mobile Bloom • Lower quality version to minimize resolves – Limited effect radius and fewer passes • Turns out to be faster and higher-quality – Standard hierarchical algorithm with some optimizations – Down-sample from 1:1/4 res first (shared with light shaft) – Then down-sample in 1:1/2 resolution passes – Single pass circle-based filter (instead of two-pass Gaussian) • 15 taps on circle during down-sampling • 7 taps for both circles during up-sample+merge pass
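The down-sample hierarchy described above (quarter resolution first, then successive halves) can be sketched by listing the render target sizes. The number of levels follows the 1/4 to 1/64 steps shown on the post-processing pipeline slide; the round-up rule and the 1136x640 starting size are assumptions for illustration, and `bloom_chain` is a made-up name.

```python
def bloom_chain(width, height, levels=5):
    """Return bloom target sizes: 1/4 resolution first, then repeated halves."""
    def div_up(value, divisor):
        # Round up so odd sizes never lose the last row or column
        return max(1, -(-value // divisor))

    w, h = div_up(width, 4), div_up(height, 4)
    sizes = [(w, h)]
    for _ in range(levels - 1):
        w, h = div_up(w, 2), div_up(h, 2)
        sizes.append((w, h))
    return sizes

chain = bloom_chain(1136, 640)
total_pixels = sum(w * h for w, h in chain)
print(chain, total_pixels)
```

Because each level has a quarter of the pixels of the previous one, the whole chain costs only slightly more than the first quarter-resolution pass, which is what makes wide-radius bloom affordable on mobile.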
  37. 37. Film Post Tonemapping Game Connection, October 29-31, 2014
  38. 38. Film Post Tonemapping: Mobile Shader Game Connection, October 29-31, 2014 [Diagram labels: Inputs: RGBA16F HDR Linear Color; 1/2 DOF; 1/4 Pre-multiplied Alpha; Film Grain (optional). Steps: Blend in DOF (optional); Film Grain Jitter (optional); Blend in {Bloom, Vignette, Light Shafts} (optional). Film Post: Tint, Tonemap Curve; Color Matrix {Saturation, Channel Mixer} (optional); Shadow Tint (optional); Linear to sRGB; Grain (optional); Quantization. Output: RGBA8 LDR sRGB]
  39. 39. Game Connection, October 29-31, 2014 Anti-Aliasing
  40. 40. Game Connection, October 29-31, 2014 Anti-Aliasing on Mobile • Using a spatial & temporal anti-aliasing filter – 2x temporal super-sampling (higher quality surface shading) – Blending two jittered frames after tonemapping (32 bpp, perceptual color space) – Some special logic to remove ghosting/judder (not using motion re-projection) • Not currently using MSAA – Needed super-sampled shading for HDR physically based shading model
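A minimal sketch of the "blend two jittered frames" idea, in Python for readability. It covers only the basic 2x temporal super-sampling: the quarter-pixel jitter pattern and the plain 50/50 average are assumptions, and the ghosting/judder logic mentioned on the slide is not reproduced here.

```python
def jitter_offset(frame_index):
    """Sub-pixel camera offset that alternates every frame (assumed pattern)."""
    return (0.25, 0.25) if frame_index % 2 == 0 else (-0.25, -0.25)

def blend_frames(current, previous):
    """Average two tonemapped 8-bit frames, rounding to nearest.

    Each frame is a list of (r, g, b) tuples with values in 0..255.
    """
    return [
        tuple((c + p + 1) // 2 for c, p in zip(cur_px, prev_px))
        for cur_px, prev_px in zip(current, previous)
    ]

frame_a = [(255, 0, 10), (100, 100, 100)]   # rendered with jitter_offset(0)
frame_b = [(0, 0, 11), (100, 100, 100)]     # rendered with jitter_offset(1)
blended = blend_frames(frame_a, frame_b)
print(blended)
```

Blending after tonemapping keeps the average in a perceptual 8-bit space, so a single very bright HDR sample cannot dominate the result the way it would in linear space.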
  41. 41. All Techniques Combined Game Connection, October 29-31, 2014
  42. 42. Game Connection, October 29-31, 2014 Android Extension Pack • Released with Android “L” • OpenGL ES 3.1 plus a specific set of extensions – Compute shaders – Tessellation – ASTC – Detailed spec on www.khronos.org: GL_ANDROID_extension_pack_es31a • Full UE4 desktop rendering pipeline on Android! – Deferred rendering with G-buffer – Physically-based shading – Image-based lighting – Solid 30 FPS on NVIDIA K1 mobile GPU
  43. 43. Tech Demo: Rivalry Game Connection, October 29-31, 2014
  44. 44. Game Connection, October 29-31, 2014 Unreal Engine 4 Full source code available! unrealengine.com Includes all C++, shaders, tools, content $19/mo
