Ray tracing technology has matured significantly in recent years. With Vulkan officially adopting ray tracing in its specification and mobile GPUs beginning to support it, the landscape is evolving rapidly. This session covers the foundational principles of ray tracing, the integration of ray tracing into Vulkan, and the essential rendering pipeline of Lumen in UE5. It also shares insights from content creators on the most effective strategies for maximizing the performance of Lumen on mobile.
4. Arm’s Most Efficient GPUs Ever
All improvements are compared to the same configuration of Immortalis-G715, implemented on the same silicon process
● System-level Efficiency: up to 40% less memory bandwidth usage
● GPU Efficiency: average 15% more performance per Watt
● Highest Performance: average 15% more peak performance
● HDR Rendering: 2x architectural throughput for 64bpp texturing
● Hardware Ray Tracing
5. Steel Arms is the latest Immortalis-based demo from Arm to pioneer the new
frontier of next-gen graphics technology.
Created with Unreal Engine 5.3, the demo brings desktop-level Bloom, Motion
Blur and DOF effects, alongside PBR, to smartphones. With the power of
Immortalis, Steel Arms unleashes the full potential of Ray Tracing for shadows
and Lumen, opening a new era of mobile graphics beyond rasterization.
7. Rasterization
● Object by object
● Triangles projected onto screen
● Check pixel coverage
● Use Z-Buffer for visibility
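The rasterization steps above can be sketched in a few lines of C++. This is a minimal illustration, not how a GPU implements it: the `Framebuffer` type, the `edge` helper and the brute-force loop over every pixel are assumptions made for the example, and the triangle is assumed to be already projected to pixel coordinates.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };  // x, y in pixels; z is depth (smaller = closer)

// 2D cross product of (b - a) and (p - a); its sign says which side of edge a->b p is on.
float edge(const Vec3& a, const Vec3& b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

struct Framebuffer {
    int width, height;
    std::vector<float> depth;  // Z-buffer, initialised to "infinitely far"
    std::vector<int> id;       // visible triangle per pixel, -1 = background
    Framebuffer(int w, int h)
        : width(w), height(h),
          depth(w * h, std::numeric_limits<float>::infinity()),
          id(w * h, -1) {
        assert(w > 0 && h > 0);
    }
};

// Rasterize one projected triangle: test coverage per pixel, then resolve
// visibility with the Z-buffer.
void rasterize(Framebuffer& fb, const std::array<Vec3, 3>& t, int triId) {
    const float area = edge(t[0], t[1], t[2].x, t[2].y);
    if (area == 0.0f) return;  // degenerate triangle covers nothing
    for (int y = 0; y < fb.height; ++y) {
        for (int x = 0; x < fb.width; ++x) {
            const float px = x + 0.5f, py = y + 0.5f;  // pixel centre
            // Barycentric weights; dividing by area makes the test winding-independent.
            const float w0 = edge(t[1], t[2], px, py) / area;
            const float w1 = edge(t[2], t[0], px, py) / area;
            const float w2 = edge(t[0], t[1], px, py) / area;
            if (w0 < 0 || w1 < 0 || w2 < 0) continue;  // pixel not covered
            const float z = w0 * t[0].z + w1 * t[1].z + w2 * t[2].z;
            const int i = y * fb.width + x;
            if (z < fb.depth[i]) {  // closer than what is stored: overwrite
                fb.depth[i] = z;
                fb.id[i] = triId;
            }
        }
    }
}
```

Because visibility is resolved per pixel by the depth comparison, triangles can be submitted in any order.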
8. Ray Tracing
● Pixel by pixel
● Cast a ray from camera to pixel
● Check triangle intersection
● Use closest-hit for visibility
● More rays for more complex rendering
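As a counterpart to rasterization, the per-ray steps above can be sketched as a brute-force closest-hit search. The intersection test used here is the well-known Möller-Trumbore algorithm; the slide does not name a specific test, so that choice, along with the `Hit` and `closestHit` names, is an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Triangle { Vec3 v0, v1, v2; };

struct Hit {
    int triangle = -1;  // -1 means the ray missed everything
    float t = std::numeric_limits<float>::infinity();
};

// Moller-Trumbore ray/triangle test. Returns true and writes the hit
// distance to t when the ray hits inside the open interval (tMin, tMax).
bool intersect(Vec3 o, Vec3 d, const Triangle& tri, float tMin, float tMax, float& t) {
    assert(tMin < tMax);
    const Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    const Vec3 p = cross(d, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;  // ray parallel to triangle plane
    const float inv = 1.0f / det;
    const Vec3 s = sub(o, tri.v0);
    const float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > tMin && t < tMax;
}

// Visibility by closest hit: test every triangle and keep the nearest one.
Hit closestHit(Vec3 o, Vec3 d, const std::vector<Triangle>& tris) {
    Hit best;
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        float t;
        // Passing best.t as tMax rejects anything behind the current winner.
        if (intersect(o, d, tris[i], 1e-4f, best.t, t)) {
            best.triangle = i;
            best.t = t;
        }
    }
    return best;
}
```

Testing every triangle for every ray is what an acceleration structure is meant to avoid.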
10. Vulkan Ray Query
● Ray queries can be used to perform ray traversal and get a result back in any shader stage
● Other than requiring acceleration structures, ray queries are performed using only a set of new shader instructions
11. Acceleration Structure
● Optimised data structure
● Minimises intersection tests
● Quickly find what a ray has hit
● User can control the topology
● Bottom Level (BLAS): contains index and vertex data, organised as hierarchical bounding volumes
● Top Level (TLAS): BLAS grouped in instances, each with transform data (animations) and a custom ID (materials)
[Diagram: a TLAS containing several instances, each instance referencing a BLAS]
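The two-level layout can be illustrated with a simplified CPU-side sketch. This is not the Vulkan API or a driver's internal format: the `Blas`, `Instance`, `hitsBox` and `candidateInstances` names are hypothetical, the instance transform is reduced to a translation, and each BLAS keeps only its root bounding box rather than a full hierarchy.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };

// Bottom level: index and vertex data plus bounds (root box only in this sketch).
struct Blas {
    std::vector<Vec3> vertices;
    std::vector<unsigned> indices;
    Aabb bounds;
};

// Top-level entry: which BLAS, where it is placed, and a user-defined ID.
struct Instance {
    int blasIndex;
    Vec3 translation;  // stand-in for a full transform matrix
    int customId;      // e.g. a key for looking up the material
};

Blas buildBlas(std::vector<Vec3> vertices, std::vector<unsigned> indices) {
    assert(!vertices.empty());
    Aabb box{vertices[0], vertices[0]};
    for (const Vec3& v : vertices) {
        box.lo = {std::min(box.lo.x, v.x), std::min(box.lo.y, v.y), std::min(box.lo.z, v.z)};
        box.hi = {std::max(box.hi.x, v.x), std::max(box.hi.y, v.y), std::max(box.hi.z, v.z)};
    }
    return {std::move(vertices), std::move(indices), box};
}

// Slab test: does the ray o + t*d (t >= 0) pass through the box?
bool hitsBox(const Aabb& b, Vec3 o, Vec3 d) {
    const float os[3] = {o.x, o.y, o.z}, ds[3] = {d.x, d.y, d.z};
    const float lo[3] = {b.lo.x, b.lo.y, b.lo.z}, hi[3] = {b.hi.x, b.hi.y, b.hi.z};
    float tNear = 0.0f, tFar = std::numeric_limits<float>::infinity();
    for (int a = 0; a < 3; ++a) {
        if (ds[a] == 0.0f) {  // ray parallel to this slab
            if (os[a] < lo[a] || os[a] > hi[a]) return false;
            continue;
        }
        float t0 = (lo[a] - os[a]) / ds[a], t1 = (hi[a] - os[a]) / ds[a];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false;
    }
    return true;
}

// Walk the top level: move the ray into each instance's local space and keep
// only instances whose bounds it crosses. Triangle tests would follow for these.
std::vector<int> candidateInstances(const std::vector<Instance>& tlas,
                                    const std::vector<Blas>& blases,
                                    Vec3 origin, Vec3 dir) {
    std::vector<int> ids;
    for (const Instance& inst : tlas) {
        const Vec3 local{origin.x - inst.translation.x,
                         origin.y - inst.translation.y,
                         origin.z - inst.translation.z};
        if (hitsBox(blases[inst.blasIndex].bounds, local, dir)) ids.push_back(inst.customId);
    }
    return ids;
}
```

Two instances can share one BLAS, which is why instancing reduces acceleration structure memory: only the transform and custom ID are stored per instance.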
12. GLSL Sample
Ray queries are initialized with an acceleration structure to query against, ray flags determining properties of the traversal, a cull mask, and a geometric description of the ray being traced.
    rayQueryEXT rq;
    rayQueryInitializeEXT(rq, accStruct,
        gl_RayFlagsTerminateOnFirstHitEXT | gl_RayFlagsOpaqueEXT,
        cullMask,
        origin, tMin, direction, tMax);
    // Traverse the acceleration structure
    rayQueryProceedEXT(rq);
    // Check intersections (if any)
    if (rayQueryGetIntersectionTypeEXT(rq, true) !=
        gl_RayQueryCommittedIntersectionNoneEXT)
    {
        // In shadow
    }
14. Lumen Lighting Pipeline
●Update surface cache
●Lumen scene lighting
● Direct lighting
● Indirect lighting trace
● Generate final lighting
●Lumen screen probe gather
● Place screen space probes
● Probe trace
● Screen trace
● Near lighting trace
● Distant lighting trace
●Lumen reflection trace
●Direct lighting
[Diagram: pipeline stages Update Surface Cache, Lumen Scene Lighting, Last Frame Radiance Cache, Screen Probe Gather, Reflection Trace]
15. Lumen Scene
●The Lumen scene is a simplified scene description
●Two kinds of data are used to describe the scene
●Signed Distance Field
● Only used for software ray tracing
● Hardware ray tracing uses the acceleration structure instead
●Surface Cache
● Used to cache material data
● Lets material data be sampled quickly when a ray hits
● Used for both software and hardware ray tracing
19. Lumen Scene Lighting
●Direct Lighting
● Tiled deferred shading in surface space
● Each tile can control the max number of light sources
● Can control lighting update rate
●Indirect Lighting from radiosity
● Use last frame cache data as radiosity source
● Place hemispherical probes on top of surface cache to gather radiosity
● Can control the number of probes and gather rays
●Finally, direct and indirect lighting are stored in the final lighting atlas
20. Lumen Screen Probe Gather
●Place probes on pixels using the GBuffer
●Adaptive downsampling
●Trace from probes and sample the radiance cache atlas to generate the screen space radiance cache
●Screen probes only trace up to 2.0 meters
●Place world probes around screen probes to gather distant lighting
[Figure labels: 16, 4, 8]
Source: SIGGRAPH 2022 - Lumen: Real-time Global Illumination in Unreal Engine 5
21. Lumen Screen Probe Gather
Screen Trace + Near Lighting Trace + Distant Lighting Trace
= Screen Space Radiance
22. Lumen Reflection Trace
●When roughness is higher than MaxRoughnessToTrace, reuse the screen space radiance cache
●When roughness is lower than MaxRoughnessToTrace, trace extra reflection rays
●Uses the same tracing pipeline as screen probe tracing
●MaxRoughnessToTrace can be customized
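The roughness rule above amounts to a single threshold comparison. The following sketch is illustrative only and is not taken from the engine source: the `ReflectionPath` enum and `chooseReflectionPath` function are hypothetical names, and treating a roughness exactly equal to the threshold as "trace" is an assumption, since the slide only covers the strictly higher and strictly lower cases.

```cpp
#include <cassert>

enum class ReflectionPath {
    TraceDedicatedRays,             // smooth surface: extra reflection rays
    ReuseScreenSpaceRadianceCache   // rough surface: reuse screen probe gather result
};

// Pick the reflection path for a pixel from its material roughness.
// Assumption: roughness equal to the threshold still traces.
ReflectionPath chooseReflectionPath(float roughness, float maxRoughnessToTrace) {
    assert(roughness >= 0.0f && roughness <= 1.0f);
    return roughness <= maxRoughnessToTrace
               ? ReflectionPath::TraceDedicatedRays
               : ReflectionPath::ReuseScreenSpaceRadianceCache;
}
```

Lowering the threshold moves more pixels onto the cheaper reuse path, at the cost of less sharp reflections on moderately rough surfaces.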
25. How To Enable Hardware Ray Tracing on Mobile
●Enable SM5 shader format
●r.Android.DisableVulkanSM5Support=0
●Enable deferred shading mode
●Enable Support Hardware Ray Tracing
●Enable Use Hardware Ray Tracing when available
●r.RayTracing.AllowInline=1
27. BVH optimization
●Exclude objects that do not contribute to lighting from ray tracing
●Reduce the overlap of meshes
●Use instanced static meshes to reduce the memory usage of BLAS
●Skinned meshes need their BLAS updated at run time
●Use a higher LOD level of the skinned mesh for ray tracing
●This may cause artifacts when using hardware ray traced shadows
30. Ray Query Shader Optimization
    FLumenMinimalRayResult TraceLumenMinimalRay(
        in RaytracingAccelerationStructure TLAS,
        FRayDesc Ray,
        inout FRayTracedLightingContext Context)
    {
        FLumenMinimalPayload Payload = (FLumenMinimalPayload)0;
        FLumenMinimalRayResult MinimalRayResult = InitLumenMinimalRayResult();
        //uint RayFlags = RAY_FLAG_FORCE_NON_OPAQUE; // Run any-hit shader
        uint RayFlags = 0;
In the shader LumenHardwareRayTracingCommon.ush, the ray query flag is set to RAY_FLAG_FORCE_NON_OPAQUE, which takes the slow path of ray traversal on mobile. Changing it to 0 can speed up ray traversal by up to 32% on Immortalis-G720.
In the Steel Arms case, the frame rate went from 30 fps to 40 fps.
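To put the frame rate numbers in terms of a frame budget, it helps to convert fps to milliseconds per frame. The helpers below are a small arithmetic sketch with hypothetical names; they describe whole-frame timing only and say nothing about how the traversal speed-up itself was measured.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Relative frame rate improvement, in percent.
double fpsGainPercent(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Going from 30 fps to 40 fps takes the frame from about 33.3 ms to 25 ms, a saving of roughly 8.3 ms per frame, or about a 33% higher frame rate.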
31. Lumen General Setting Optimization
●Lumen Scene Detail
● Higher values ensure that smaller objects also contribute to Lumen lighting, but increase GPU cost
●Final Gather Quality
● Controls the density of the screen probes; higher values increase GPU cost
● 1.0 should give a good balance between performance and quality for mobile games
●Max Trace Distance
● Controls how far rays are traced; keeping it small decreases GPU cost
● Don’t set it larger than the size of the scene
32. Lumen General Setting Optimization
●Scene Capture Cache Resolution Scale
● Controls the surface cache resolution; smaller values save memory
●Lumen Scene Lighting Update Speed
● Can be kept low to save GPU cost if lighting changes slowly
● 0.5 ~ 1.0 should give a good balance between performance and quality for mobile games
●Final Gather Lighting Update Speed
● Can be kept low if slow lighting propagation is acceptable
● 0.5 ~ 1.0 should give a good balance between performance and quality for mobile games
33. Lumen General Setting Optimization
●Reflection Quality
● Controls the reflection tracing quality
●Ray Lighting Mode
● Hit Lighting is available when using hardware ray tracing; it evaluates direct lighting instead of using the surface cache
● Hit Lighting mode has higher quality at a higher GPU cost
● Hit Lighting mode can reflect the direct lighting of skinned meshes
●Max Reflection Bounces
● Controls the number of reflection bounces; higher values have a higher GPU cost
34. Lumen Scene Lighting Optimization
●r.LumenScene.DirectLighting.MaxLightsPerTile
● Controls the maximum number of lights per tile for direct lighting evaluation
●r.LumenScene.DirectLighting.UpdateFactor
● Controls the per-frame update area of direct lighting; higher values improve performance
●r.LumenScene.Radiosity.UpdateFactor
● Controls the per-frame update area of indirect lighting; higher values improve performance
35. Lumen Scene Lighting Optimization
●r.LumenScene.Radiosity.ProbeSpacing
● Controls the density of probes; higher values improve performance by placing fewer probes
●r.LumenScene.Radiosity.HemisphereProbeResolution
● The resolution of each probe; lower values save memory
●r.LumenScene.FarField
● Set it to 0 if you don’t need far-field hardware ray tracing
●r.DistanceFields.SupportEvenIfHardwareRayTracingSupported
● Set it to 0 if you don’t need software Lumen support; this saves memory and scene update cost
36. Lumen Screen Probe Gather Optimization
●r.Lumen.ScreenProbeGather.RadianceCache.ProbeResolution
● Controls the probe atlas texture size; lower values save memory
●r.Lumen.ScreenProbeGather.RadianceCache.NumProbesToTraceBudget
● Controls the number of probes updated per frame; lower values improve performance
●r.Lumen.ScreenProbeGather.DownsampleFactor
● Factor by which the GI resolution is downsampled; higher values improve performance
37. Lumen Screen Probe Gather Optimization
●r.Lumen.ScreenProbeGather.TracingOctahedronResolution
● Controls the number of rays per screen probe; lower values improve performance
●r.Lumen.ScreenProbeGather.ScreenTraces
● Enables or disables screen traces
●r.Lumen.ScreenProbeGather.ScreenTraces.HZBTraversal.FullResDepth
● Whether screen traces use full resolution depth; set to 0 to improve performance
●r.Lumen.ScreenProbeGather.ShortRangeAO
● Enables or disables short range ambient occlusion
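A rough way to reason about DownsampleFactor and TracingOctahedronResolution together is to estimate the screen probe ray count per frame. The sketch below is a back-of-the-envelope model, not engine code: it assumes one probe per DownsampleFactor x DownsampleFactor pixel tile (ignoring adaptively placed probes and temporal reuse) and N x N rays for an octahedron resolution of N. The `ProbeBudget` type and `estimateProbeBudget` function are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ProbeBudget {
    std::uint64_t probes;        // uniformly placed screen probes
    std::uint64_t raysPerProbe;  // octahedron resolution squared
    std::uint64_t totalRays;     // probes * raysPerProbe
};

// Estimate the per-frame screen probe ray count under the assumptions above.
ProbeBudget estimateProbeBudget(int width, int height,
                                int downsampleFactor, int octahedronResolution) {
    assert(width > 0 && height > 0);
    assert(downsampleFactor > 0 && octahedronResolution > 0);
    // Round up so partially covered tiles at the screen edge still get a probe.
    const std::uint64_t probesX = (width + downsampleFactor - 1) / downsampleFactor;
    const std::uint64_t probesY = (height + downsampleFactor - 1) / downsampleFactor;
    const std::uint64_t rays =
        static_cast<std::uint64_t>(octahedronResolution) * octahedronResolution;
    return {probesX * probesY, rays, probesX * probesY * rays};
}
```

Under this model, a 1920x1080 target with a factor of 16 and a resolution of 8 gives 8160 probes and 522240 rays, while doubling the factor to 32 cuts that to 2040 probes and 130560 rays. Both settings scale the cost quadratically, which is why small changes to either have a large effect.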
39. Lumen Reflection Optimization
●r.Lumen.Reflections.RadianceCache
● Whether reflections reuse the radiance cache; set to 1 to speed up ray tracing
●r.Lumen.Reflections.DownsampleFactor
● Downsample factor for reflections; higher values improve performance
●r.Lumen.Reflections.MaxRoughnessToTrace
● Sets the max roughness value for which dedicated reflection rays should be traced
● Above this value, reflections reuse the screen space radiance cache