The document discusses rendering techniques for high-quality characters in an unannounced game project called A1. It covers skin rendering using subsurface scattering with single and multiple scattering approximations. It also covers hair rendering using order-independent transparency (OIT) with a per-pixel linked list integrated into UE4, as well as a physically based shading model for hair. Future work discussed includes improvements to subsurface scattering, lighting, and shadowing for transparent and translucent materials.
2. • Unannounced project of Nexon / New IP
In Development
By game development experts
Target Platform: High-end PC
- 60 FPS at Ultra quality; low-spec PCs should run well at Medium quality
• We are announcing the project for the first time at this conference
A1
3. • Programming Session: Character Rendering
Including resources and implementation in development
The world shown in this talk is temporary
All content is subject to change during development
A1@NDC 2016
• 04/27 15:20 Art Session by Art Director
• 04/27 17:05 Programming Session by Senior Gameplay Programmer
4. • Competitive, stunning visuals
Unprecedented quality in Korea
We show current results and technology at this conference
• UE4 + @@@
Use powerful rendering features of UE4
Plus, new rendering/animation/VFX features made by our team
Visual of A1
7. • Multiple scattering
Dipole diffusion approximation
• Included in UE4
Integration of Jimenez’s implementation
We use it without any changes
SSSSS: Screen Space SubSurface Scattering
Activision R&D. Property of Activision Publishing. Not Actual Gameplay
12. • No transmission
Ignoring irradiance from outside of visible surfaces
A limitation of the screen-space approach
There’s a solution but it isn’t included in UE4
• Low frequency lighting
Can’t handle strong scattering at a short distance
Limit: UE4 SSSSS
Burley, “Extending the Disney BRDF to a BSDF with Integrated Subsurface Scattering”, SIGGRAPH 2015
13. • Transmission (Backlit)
• Back scattering
• Higher frequency lighting
Single Scattering
[Figure panels: Single Scattering | Multiple Scattering (dipole)]
15. [Figure: single scattering + multiple scattering = combined result]
Single + Multiple Scattering
* Single scattering is exaggerated to show different looks on the presentation screen.
16. [Figure panels: ground truth | Ki09]
• Introduced in ShaderX7
Hyunwoo Ki, “Real-Time Subsurface Scattering using Shadow Maps”
Store translucent irradiance into multiple shadow maps (RSM/TSM style)
Approximate scattering distance by ray marching with shadow map projection
Estimate radiance using the stored irradiance and the scattering distance
• Integrated this technique into UE4 with small changes
Deferred Single Scattering
Single Scattering using Shadow Maps
17. • Shoot rays from the camera
Refract rays on the surfaces
• Draw stepping distance samples
Exploit Quasi Monte Carlo sampling
• Project xp onto shadow maps
To approximate incident scattering distance, Si
Review: [Ki09] Ray Marching
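Outgoing radiance (BSSRDF form):
$$L_o(x_o,\omega_o) = \int_A \int_{2\pi} S(x_i,\omega_i;x_o,\omega_o)\, L_i(x_i,\omega_i)\,(n\cdot\omega_i)\, d\omega_i\, dA(x_i)$$
Single scattering term:
$$L_o^{(1)}(x_o,\omega_o) = \sigma_s(x_o)\,F_t(\omega_o) \int_{2\pi}\int_0^{\infty} e^{-s_o\,\sigma_t(x_o)}\, p(\omega_i,\omega_o)\, e^{-s_i\,\sigma_t(x_i)}\, F_t(\omega_i)\, L_i(x_i,\omega_i)\, ds\, d\omega_i$$
Discretized with stored irradiance E:
$$L_o^{(1)}(x_o,\omega_o) \approx \sigma_s(x_o)\,F_t(\omega_o) \sum_{i=0}^{N}\sum_{j=0}^{M} e^{-s_o\,\sigma_t(x_o)}\, p(\omega_i,\omega_o)\, e^{-s_i\,\sigma_t(x_i)}\, E(x_i,\omega_i)$$
Stepping distance samples, with \xi a uniform sample in (0, 1):
$$s_o = -\log(\xi)/\sigma_t$$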
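The stepping-distance sampling above can be illustrated on the CPU. This is a minimal Python sketch, not the shader from the talk: it assumes a base-2 van der Corput sequence as the quasi-Monte Carlo source and the usual exponential inversion `s = -log(xi) / sigma_t`. The function names and the choice of sequence are hypothetical.

```python
import math

def van_der_corput(index, base=2):
    """Radical inverse of `index` in the given base, a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

def stepping_distances(num_samples, sigma_t):
    """Distances along the refracted ray, distributed like exp(-sigma_t * s)."""
    distances = []
    for i in range(1, num_samples + 1):  # start at 1 so xi is never 0
        xi = van_der_corput(i)
        distances.append(-math.log(xi) / sigma_t)
    return distances

samples = stepping_distances(4, sigma_t=2.0)
```

A larger `sigma_t` (denser medium) pulls the samples toward the surface, which is where most of the single scattering contribution comes from.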
18. 1) Deferred Shadows Pass:
Ki’s ray marching (QMC + shadow projection)
Assume that irradiance and normal are equal for all sampling steps -> Only use the shadow depth map (no additional buffers!)
Constant scattering parameters: a limitation imposed by the G-buffer layout and by performance
Output: scalar scattering transfer instead of shadow amount (single channel)
2) Deferred Lighting Pass:
SS = intensity * (scattering transfer * SSS color) * bidirectional Fresnel transmittance * HG phase function
Output: direct lighting + single scattering
• Physically incorrect, but plausible-looking
Deferred Single Scattering
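$$L_o^{(1)}(x_o,\omega_o) \approx \underbrace{\sigma_s\, F_t(\omega_o)\, p(\omega_i,\omega_o)\, E(x_i,\omega_i)}_{\text{2) Lighting Pass}} \; \underbrace{\sum_{i=0}^{N}\sum_{j=0}^{M} e^{-(s_i+s_o)\,\sigma_t}}_{\text{1) Shadowing Pass}}$$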
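The two-pass split can be sketched on the CPU as follows. This is an illustrative Python sketch under assumptions, not the UE4 integration: the transfer is averaged over the marching steps (the normalization is an assumption), the Henyey-Greenstein function is the standard closed form, and all function and parameter names are hypothetical.

```python
import math

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function, normalized over the sphere."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)

def scattering_transfer(camera_distances, light_distances, sigma_t):
    """Pass 1: attenuation averaged over the marching steps (one scalar)."""
    total = 0.0
    for s_o, s_i in zip(camera_distances, light_distances):
        total += math.exp(-(s_o + s_i) * sigma_t)
    return total / len(camera_distances)

def single_scattering(intensity, transfer, sss_color, fresnel_t, cos_theta, g):
    """Pass 2: combine the stored scalar with the per-pixel terms."""
    phase = hg_phase(cos_theta, g)
    return tuple(intensity * transfer * c * fresnel_t * phase for c in sss_color)
```

Because pass 1 outputs a single channel, it can reuse the slot that would otherwise hold the shadow amount, which is why no additional buffers are needed.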
20. • Transmission on thin body parts
Backlit effects
• Brighter skin surfaces
With texture masking / artistic choice
• Added approx. 1 ms per light
At a closeup view (worst case)
Rendering Results
21. * Single scattering is exaggerated to show different looks on the presentation screen.
22. • Using dithered, temporal sampling
Like volumetric lighting: Killzone: Shadow Fall, Lords of the Fallen, INSIDE, etc.
Temporal reprojection
• Replace shadow rendering for skin
Currently we use conventional PCF (UE4 default)
But we expect that volumetric attenuation by SS can represent shadowing
Future Work: Single Scattering
24. • Layered card mesh with alpha texture
• + Hair strands mesh
• Optional textures:
Color, normal, roughness, AO, specular noise, etc.
• Similar to Destiny and The Order: 1886
Modeling and Texturing
25. • Transparency
Alpha blending: per-pixel drawing order
Lighting: deferred lighting? (incompatible)
Shadowing: deferred shadowing? (incompatible)
• Physically based shading model
Hair does not fit GGX
Problem
28. • Need per-pixel, sorted alpha blending
• Choice: K-Buffer style
Per-Pixel Linked List (PPLL)
DX11 Unordered Access View (UAV)
Based on an AMD TressFX 2.0 sample
We integrated this into UE4
Order Independent Transparency (OIT)
29. • Head UAV: RWTexture2D<uint>
Head indices for each linked list of pixels on the screen
• PPLL UAV: RWStructuredBuffer<FOitLinkedListDataElement>
Container to store all fragments being shaded
An element is added when each fragment is drawn
Review: PPLL OIT
uint PixelCount = OitLinkedListPPLLDataUAV.IncrementCounter();
int2 UAVTargetIndex = int2(SVPosition.xy);
uint OldStartOffset;
InterlockedExchange(OitLinkedListHeadAddrUAV[UAVTargetIndex], PixelCount, OldStartOffset);
OitLinkedListPPLLDataUAV[PixelCount] = NewElement;
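The insertion above can be mimicked on the CPU to see how the lists are threaded. This is a minimal Python sketch, assuming each stored element carries the previous head index as its "next" link and that `0xFFFFFFFF` marks the end of a list; the class and field names are hypothetical and are not taken from the TressFX sample.

```python
INVALID = 0xFFFFFFFF  # end-of-list marker (assumed convention)

class PerPixelLinkedLists:
    def __init__(self, width, height):
        self.width = width
        self.head = [INVALID] * (width * height)  # plays the role of the head UAV
        self.nodes = []                           # plays the role of the PPLL UAV

    def add_fragment(self, x, y, color, depth):
        index = len(self.nodes)            # like IncrementCounter()
        pixel = y * self.width + x
        old_head = self.head[pixel]        # like InterlockedExchange()
        self.head[pixel] = index
        self.nodes.append((color, depth, old_head))  # link to the previous head

    def fragments(self, x, y):
        """Walk one pixel's list from the most recently added fragment."""
        out, index = [], self.head[y * self.width + x]
        while index != INVALID:
            color, depth, index = self.nodes[index]
            out.append((color, depth))
        return out
```

The list comes back in reverse submission order rather than depth order, which is why a separate sorting step is needed before blending.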
30. • K-Buffer
Do manual alpha blending for the front-most K fragments: sorted
Do manual alpha blending for the remainder fragments: unsorted
• See the references for details
Review: PPLL OIT
float4 BlendTransparency(float4 FragmentColor, float4 FinalColor)
{
    float4 OutColor;
    OutColor.xyz = mad(-FinalColor.xyz, FragmentColor.w, FinalColor.xyz) + FragmentColor.xyz * FragmentColor.w;
    OutColor.w = mad(-FinalColor.w, FragmentColor.w, FinalColor.w);
    return OutColor;
}
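The K-buffer resolve can be sketched on the CPU with a direct port of `BlendTransparency`. This Python sketch is an illustration under assumptions: the remainder is blended first in list order, then the front-most K fragments are blended from far to near, and the background starts with full transmittance in the alpha channel. `resolve_pixel` and its parameters are hypothetical names.

```python
def blend_transparency(fragment, final):
    """Port of BlendTransparency: blend `fragment` (r, g, b, a) over `final`."""
    r, g, b, a = fragment
    fr, fg, fb, fw = final
    return (fr * (1 - a) + r * a, fg * (1 - a) + g * a,
            fb * (1 - a) + b * a, fw * (1 - a))

def resolve_pixel(fragments, k, background=(0.0, 0.0, 0.0, 1.0)):
    """fragments: list of (depth, rgba); a smaller depth is closer to the camera."""
    order = sorted(range(len(fragments)), key=lambda i: fragments[i][0])
    front_ids = set(order[:k])
    final = background
    for i, (_, rgba) in enumerate(fragments):   # remainder: unsorted, list order
        if i not in front_ids:
            final = blend_transparency(rgba, final)
    for i in reversed(order[:k]):               # front-most K: far to near
        final = blend_transparency(fragments[i][1], final)
    return final
```

Only the K nearest fragments pay for sorting; the error introduced by blending the remainder unsorted is hidden behind them.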
33. • Severe pixel overdraw due to layered geometry
• Optimization: single-pass Opacity Thresholding using a UAV
Observation: Alpha blending of hair is not for transparency but for rendering thin strands with clean silhouettes, and hair textures have almost opaque texels (especially on inner layers) except at the hair tips
Goal: Avoid drawing invisible pixels occluded by almost opaque pixels from the same geometry
Unordered Opacity Thresholding
Reducing Pixel Over Draws
34. • Set the opacity threshold for each hair material, e.g., 0.95
• Do manual Z test with an additional UAV Z buffer RWTexture2D<uint> in PS
Try to write Z if opacity of a fragment is higher than the opacity threshold
Always do Z test with the UAV Z buffer
const uint DepthAsUint = asuint(SVPosition.w);
const uint DepthToWrite = (Opacity > OpacityThreshold) ? DepthAsUint : INVALID_UINT;
uint OldMinDepthAsUint;
InterlockedMin(OitOpacityThresholdingDepthUAV[int2(SVPosition.xy)], DepthToWrite, OldMinDepthAsUint);
if (DepthAsUint > OldMinDepthAsUint)
{
    discard;
}
Unordered Opacity Thresholding
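The manual Z test can be replayed on the CPU for a single pixel. This is a minimal Python sketch assuming positive float depths (so the bit pattern preserves ordering) and `0xFFFFFFFF` as the cleared value of the UAV Z buffer; `surviving_fragments` is a hypothetical helper, not code from the talk.

```python
import struct

INVALID_UINT = 0xFFFFFFFF

def as_uint(value):
    """Bit-cast a float32 to uint32, like HLSL asuint()."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def surviving_fragments(fragments, opacity_threshold):
    """fragments: (depth, opacity) pairs for one pixel, in rasterization order."""
    min_depth = INVALID_UINT            # one texel of the UAV Z buffer
    survivors = []
    for depth, opacity in fragments:
        depth_bits = as_uint(depth)
        to_write = depth_bits if opacity > opacity_threshold else INVALID_UINT
        old_min = min_depth             # InterlockedMin returns the old value
        min_depth = min(min_depth, to_write)
        if depth_bits > old_min:
            continue                    # discard: behind an almost opaque fragment
        survivors.append((depth, opacity))
    return survivors
```

Submitting the same three fragments front to back leaves one survivor, while back to front leaves all three, which illustrates why the gain depends on rasterization order.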
35. • Reading and writing the UAV adds cost, but the total rendering cost goes down
• 5~10% faster rendering and 15% less memory usage
Reduced heavy lighting costs
Reduced PPLL size
No additional draw calls
The performance gain is unpredictable because it depends on the rasterization order
Rendering Results
36. • More efficient unordered opacity thresholding
Exploit the order of triangle indices and locality?
Multiple meshes: adding per-geometry draw calls but reducing per-pixel over draws
• Use ROV for DX12
Future Work: OIT
37. • Per-pixel lighting for transparent materials
• Based on an experimental feature of UE4
• Limited usage
For hair / For glass and others in the future
• Supports approximated shadows
Transparent Deferred Shadows
Forward+ Lighting
38. • 16x16 tiled culling
• Forward light data
Constant buffer: faster than Structured Buffer (NV)
128 bit stride: cache efficiency, under 64KB
SOA? AOS?
Forward+ Lighting
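The idea of 16x16 tiled culling can be illustrated with a simplified 2D version. This Python sketch assumes each light has already been projected to a screen-space circle and uses a conservative bounding-box test; a production implementation would typically test tile frusta in a compute shader, and the names here are hypothetical.

```python
TILE_SIZE = 16

def cull_lights_to_tiles(width, height, lights):
    """lights: (center_x, center_y, radius) in pixels. Returns per-tile index lists."""
    tiles_x = (width + TILE_SIZE - 1) // TILE_SIZE
    tiles_y = (height + TILE_SIZE - 1) // TILE_SIZE
    tiles = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for light_index, (cx, cy, radius) in enumerate(lights):
        # Tile range overlapped by the light's screen-space bounding box.
        x0 = max(int((cx - radius) // TILE_SIZE), 0)
        x1 = min(int((cx + radius) // TILE_SIZE), tiles_x - 1)
        y0 = max(int((cy - radius) // TILE_SIZE), 0)
        y1 = min(int((cy + radius) // TILE_SIZE), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles[ty][tx].append(light_index)
    return tiles
```

Each pixel then loops only over the lights listed for its tile instead of over every light in the scene.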
41. • Problem of forward shadowing
Adds complex nested dynamic branches and GPR pressure
Increases forward light data (CB): shadow matrices and other parameters
Uses a lot of VRAM simultaneously (CSM and cube shadow maps)
• Transparent shadow approximation:
Volumetric attenuation on the front-most transparent pixels
Integrated with the deferred shadow pass
Inaccurate but acceptable looks
Transparent Deferred Shadows
42. 1) Transparent Z Pre Pass
Using depth rendering material proxy if needed
The buffer is used for post processing as well
2) Shadow Depth and Deferred Shadows Passes
Shadow Depth Pass: Also using depth rendering material proxy
Deferred Shadows Pass: Compute volume lighting attenuation if transparent Z is less than opaque Z
Resolve the deferred shadowing buffer into a texture array for each light
3) Forward+ Lighting Pass
Fetching shadow amount from the texture array and weighting according to opacity
Transparent Deferred Shadows
float DeferredShadow = DeferredShadowsTextureArray.SampleLevel(PointSampler, float3(ScreenUV, LightIndex), 0).x;
DeferredShadow = lerp(1, DeferredShadow, Opacity);
45. • Many lights and various materials for forward+ lighting
• Faster reads of forward light data
• Better shadowing for inner layers
Attenuation by screen Z?
• Solve conflict of storage for transparent shadows and single scattering
Currently single scattering is ignored when hair strands are on the skin
Future Work: Lighting and Shadowing
46. • Marschner’s: hair strand = cylinder
• Three components of specular
Primary reflection: R
Secondary reflection: TRT
Transmission: TT
• + fake light scattering
Physically Based Shading Model
51. • Longitudinal Scattering
Using ALU
Gaussian function instead of lookup table
• Azimuthal Scattering
Very complex math
2D function -> lookup table with constant material properties
Texture 2D Array: CosPhi, CosTheta, HairProfileID
ALU : Lookup Table
float3 ComputeLongitudinalScattering(float3 Theta, float Roughness)
{
    const float bR = DecodeHairLongitudinalWidth(Roughness);
    const float3 Beta3 = float3(bR, bR * 0.5, bR * 2);
    return exp(-0.5 * Square(Theta) / Square(Beta3)) / (sqrt(2.0 * PI) * Beta3);
}
[Figure: lookup texture array coordinates: U = CosPhi, V = CosTheta, Index = HairProfileID]
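The ALU path for longitudinal scattering is a normalized Gaussian per lobe. The Python sketch below mirrors `ComputeLongitudinalScattering`, but takes the lobe width directly because `DecodeHairLongitudinalWidth` is not shown in the talk; the `width` parameter is therefore an assumption.

```python
import math

def longitudinal_scattering(theta, width):
    """Three Gaussian lobes with widths width, width / 2 and width * 2."""
    betas = (width, width * 0.5, width * 2.0)
    return tuple(
        math.exp(-0.5 * (t * t) / (b * b)) / (math.sqrt(2.0 * math.pi) * b)
        for t, b in zip(theta, betas)
    )

lobes = longitudinal_scattering((0.0, 0.0, 0.0), 0.1)
```

At zero offset the narrowest lobe has the highest peak, since each Gaussian integrates to one regardless of its width.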
52. • Define per-hair properties
Create a lookup table for azimuthal scattering according to this asset
G buffer-free
Reusable
• Lookup table
Scalar light transfer for TRT and TT, for performance reasons
R: R, G: TRT, B: TT, A: Unused
Hair Profile Asset
53. • To give scalar TRT/TT spectral color
• Physically, volumetric attenuation by transmission
Angle: light, camera and tangent
Thickness: 1 - opacity (to brighten hair tips)
Travel distance of light: 2 * TT = TRT
Color Shift
const float Thickness = (1.0 - Opacity);
const float ColorTintFactor = saturate(CosThetaD) + Square(Thickness) + 1e-4;
const float2 BaseColorTintPower = float2(0.6, 1.2) / ColorTintFactor;
float3 TTColorTint = pow(BaseColor, BaseColorTintPower.x);
float3 TRTColorTint = pow(BaseColor, BaseColorTintPower.y);
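The color shift can be checked numerically with a direct Python port of the snippet above. Only the function name and the tuple-based color representation are assumptions; the constants come from the shader code.

```python
def hair_color_tints(base_color, opacity, cos_theta_d):
    """Return (TT tint, TRT tint) for an RGB base color in [0, 1]."""
    thickness = 1.0 - opacity
    saturated = min(max(cos_theta_d, 0.0), 1.0)
    tint_factor = saturated + thickness ** 2 + 1e-4
    tt_power = 0.6 / tint_factor
    trt_power = 1.2 / tint_factor
    tt = tuple(c ** tt_power for c in base_color)
    trt = tuple(c ** trt_power for c in base_color)
    return tt, trt

tt_tint, trt_tint = hair_color_tints((0.5, 0.3, 0.2), opacity=1.0, cos_theta_d=1.0)
```

Because the TRT exponent is twice the TT exponent, the TRT tint equals the TT tint squared, which matches the note that light travels twice as far for TRT. Lower opacity (hair tips) raises the tint factor and therefore brightens both tints.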
54. • Fake Scattering: similar to UE 4.11
Volumetric attenuation and color shift using transparent deferred shadows
Phase function with camera and light vectors: from forward to backward scattering
Per-light scattering effect
Scattering: Direct Lighting
float3 ScatteringLighting = 0;
if (bShadowed)
{
    float HGPhaseFunc = HGPhaseFunctionSchlick(VoL, ScatteringAnisotropy);
    float3 ScatteringColor = GBuffer.BaseColor * Shadow;
    float3 ScatterAttenuation = saturate(pow(ScatteringColor / Luminance(ScatteringColor), 1.0 - Shadow));
    ScatteringLighting = ScatteringColor * ScatterAttenuation * HGPhaseFunc * ScatteringIntensity;
}
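The fake scattering term can be ported to Python for experimentation. This sketch makes several assumptions that the talk does not confirm: `HGPhaseFunctionSchlick` is taken to be Schlick's rational approximation with `k = 1.55g - 0.55g^3`, the sign of the cosine term is chosen so that `cos_theta = 1` is forward scattering for positive `g`, the luminance weights are (0.3, 0.59, 0.11), and a guard is added for zero luminance.

```python
import math

def hg_phase_schlick(cos_theta, g):
    """Schlick's rational approximation of the Henyey-Greenstein phase function."""
    k = 1.55 * g - 0.55 * g ** 3
    denom = 1.0 - k * cos_theta          # sign convention is an assumption
    return (1.0 - k * k) / (4.0 * math.pi * denom * denom)

def saturate(x):
    return min(max(x, 0.0), 1.0)

def fake_scattering(base_color, shadow, cos_theta, g, intensity):
    color = [c * shadow for c in base_color]
    luminance = 0.3 * color[0] + 0.59 * color[1] + 0.11 * color[2]  # assumed weights
    if luminance <= 0.0:
        return (0.0, 0.0, 0.0)           # guard added for the CPU sketch
    phase = hg_phase_schlick(cos_theta, g)
    out = []
    for c in color:
        attenuation = saturate((c / luminance) ** (1.0 - shadow))
        out.append(c * attenuation * phase * intensity)
    return tuple(out)
```

As the shadow term falls, the exponent `1 - Shadow` grows, so channels darker than the luminance are attenuated more strongly and the scattered light shifts toward the dominant hue of the base color.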
60. • Specular parameters
Noise: fuzzy highlight
Roughness: width and strength of highlight
Shift: position of highlight peak
+ binding a hair profile asset
• Diffuse, fake scattering and textures
• Other hair parameters
-> To create hair in various styles
Hair Material
61. • Better scattering
• Material LOD
• Finding the best method for a certain hair style
Future Work: Hair Shading
65. • For a character viewer and a lobby
EVSM: Exponential Variance Shadow Maps
• For the game scene
Sun: PCSS
Other types of lights: PCF / No changes from UE4
Additional shadows: Screen Space Inner Shadows (new feature)
Scene-Specific Shadows
66. • Experimental
• Pre-filtered shadows
No changes from the original algorithm
Nice look but light leaks and low performance
• Optimization
CSM split limits: maximum 2
Scissor test: larger than screen space shadow bounds
EVSM
67. • For Sun in the game scene
• One of the effects of time-of-day lighting changes
Day and night cycle in the game play
Different blur size by time: sharp at noon, soft at sunrise and sunset
Different blur size by distance between occluders and receivers
• Optimization
CSM split limits: maximum 2
• Temporal reprojection
Reduce flickering due to moving Sun, and sampling artifacts
Lerp according to difference between prev and current frame
PCSS
float2 Attenuation = Texture2DSample(SunLightAttenuationTexture, TextureSamplerPoint, PrevScreenUV).xy;
float2 ShadowAndSSSTransmission = float2(Shadow, SSSTransmission);
const float2 TAAFactor = Square(1.0 - abs(ShadowAndSSSTransmission - Attenuation));
ShadowAndSSSTransmission = lerp(ShadowAndSSSTransmission, Attenuation, 0.5 * TAAFactor);
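The history blend above can be traced for a single channel with a small Python port. Only the function name is hypothetical; the weights follow the shader snippet.

```python
def temporal_blend(current, history):
    """Blend toward the previous frame only when the two values agree."""
    difference = abs(current - history)
    taa_factor = (1.0 - difference) ** 2
    weight = 0.5 * taa_factor            # at most half of the history is used
    return current + (history - current) * weight
```

When the current and previous values differ by 1 (for example, a pixel that has just become shadowed), the history weight drops to zero, which limits ghosting. When they agree, up to half of the history is blended in, which damps flicker.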
71. • Shadows in shadows:
Darker shadows on environment occluded by characters
Better looks when a character is on the shadowed surfaces
Directionality: difference from AO
• Using scene depth
No preprocessing or asset settings
Compare with capsule-based shadows: The Order: 1886 or UE 4.11
Screen Space Inner Shadows
72. • G-buffer changes: add caster and receiver bit masks
1) Stencil Masking Pass
Write stencil at shadowed pixels by Sun
Ignore unlit pixels
2) Shadow Rendering Pass: SSR styled ray marching
Shoot rays to the half vector between Sun and Sky
Limit max tracing distance: artistic choice and performance win
Temporal reprojection to the previous frame’s buffer
Screen Space Inner Shadows
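[Figure: rays are traced along H, the half vector between the Sun direction L and the Sky direction; panels: Sun Shadow | Inner Shadow]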
73. • ½ resolution buffer
• Separable Gaussian blur
• Approximately 0.5 ms
• Shadow amount is applied to sky lighting and GI with a receiver mask
Screen Space Inner Shadows
74. • Important for game visuals
Look
Day and night cycle
• How to reduce costs and flickering?
High draw call counts
Moving Sun
Future Work: Shadow Rendering
76. • Movement of short hair
• Trembling body fat and cloth wrinkles
• To reduce work load of artists
Goal
77. • Define simulator assets in the editor
Currently we support spring simulation only
• Group vertices
According to vertex colors roughly painted by artists, and the simulator assets
• Sample simulator bones
In the bounds of a vertex group / according to a density setting / snap to the nearest vertex
Poisson distribution / deterministic sampling
• Rig the simulator bones
Simulator bone -> become a child of the nearest bone in a character
Vertex -> rigged with the nearest simulator bone
Distance based skinning weights
Algorithm
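The rigging step can be sketched as a nearest-bone search with a distance-based weight. This Python sketch assumes a linear falloff and a single simulator bone per vertex; the falloff shape, the `falloff_radius` parameter, and the function name are assumptions, since the talk only states that skinning weights are distance based.

```python
import math

def rig_vertices(vertices, simulator_bones, falloff_radius):
    """Return (nearest simulator bone index, weight) for every vertex."""
    rig = []
    for vertex in vertices:
        distances = [math.dist(vertex, bone) for bone in simulator_bones]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        # Linear falloff: weight 1 at the bone, 0 at falloff_radius and beyond.
        weight = max(0.0, 1.0 - distances[nearest] / falloff_radius)
        rig.append((nearest, weight))
    return rig

example_rig = rig_vertices(
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    falloff_radius=2.0,
)
```

The remaining weight would go to the character's regular skeleton, so vertices far from any simulator bone keep following the base animation.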
78. • Enriching animation with various methods
Module based animation
Procedural animation
Physics simulation
• These may be presented at other conferences
e.g., NDC 2017?
Animation Techniques of A1
80. • Using the GPU profiler of UE4
Checking rendering costs and doing high level shader optimization
• Using RenderDoc, Intel GPA, and AMD GPU PerfStudio
Debugging shader code and doing low level shader optimization
• Rewriting shader code by hand with optimization references
Approach
81. • Only critical parts of the shaders written by Epic Games
So that we can keep upgrading to new engine versions
We will check all shader code before shipping
• Currently we focus on the shaders we wrote ourselves
• This talk shows examples of our optimization
Target Code
82. Before / After
Static Branch with Preprocessor
// Before
float3 L = (LightPositionAndIsDirectional.w == 1)
    ? -LightPositionAndIsDirectional.xyz
    : normalize(LightPositionAndIsDirectional.xyz - OpaqueWorldPosition);
// After
#if USE_FADE_PLANE // CSM case = directional light
    float3 L = -LightPositionAndIsDirectional.xyz;
#else
    float3 L = (LightPositionAndIsDirectional.w == 1)
        ? -LightPositionAndIsDirectional.xyz
        : normalize(LightPositionAndIsDirectional.xyz - OpaqueWorldPosition);
#endif
84. Before / After
Share Preceding Computation
// Before
for (int X = 0; X < NumSamplesSqrt; ++X)
{
    for (int Y = 0; Y < NumSamplesSqrt; Y++)
    {
        float2 ShadowOffset = TexelSize * StepSize * float2(X, Y);
// After
const float2 BaseTexelSize = TexelSize * StepSize;
…
for (int X = 0; X < NumSamplesSqrt; ++X)
{
    for (int Y = 0; Y < NumSamplesSqrt; Y++)
    {
        float2 ShadowOffset = BaseTexelSize * float2(X, Y);
85. Before / After
Rearrange Scalar/Vector Operation
// Before
float3 DiffuseLighting = (TangentDiffuse * GBuffer.DiffuseColor)
    / PI * NoLWrapped * ShadowColor;
// After
float3 DiffuseLighting = GBuffer.DiffuseColor
    * (TangentDiffuse / PI * NoLWrapped * ShadowColor);
86. Before / After
Use Modifiers as Input
// Before
Force += -normalize(Position) * Simulation.GravityStrength;
// After
Force += normalize(-Position) * Simulation.GravityStrength;
87. Before / After
Rearrange Code Lines
// Before
// Some long work…
//
clip(OpacityMask);
// After
// As early as possible after the shader main starts…
//
clip(OpacityMask);
88. Use termination if possible
Early-Z, Stencil, and Discard
if (GBuffer.ShadingModelID != 0)
{
    discard;
}
float Attenuation = Texture2DSample(SunLightAttenuationTexture, TextureSamplerPoint, ScreenUV).x;
if (Attenuation > 0.99)
{
    discard;
}
-------
[EARLYDEPTHSTENCIL]
void OrderIndependentTransparencyCompositePixelMain(
    float2 InScreenUV: TexCoord0, float4 InSVPosition: SV_Position, out float4 OutColor: SV_Target0)
{
…
90. • Continuous work
• Optimization of shader written by Epic Games
• Although the aggressive and smart optimization of shader compilers is amazing, it sometimes produces unwanted results.
• We need to write GPU-friendly code explicitly and check the disassembly
Future Work: Low Level Shader Optimization
91. • New IP of Nexon
• AAA-Quality Graphics
• With Cutting Edge Graphics Technology
• In Development
Summary of This Talk
92. • Colleagues of A1
• Support Team of Epic Korea
• Authors of References
Acknowledgement