With the advent of world-class engines like Unity, game development has never been easier. Developers can deploy to multiple platforms quickly and easily, and optimize for all of them. Come learn to identify performance issues and their sources using Unity tools and Intel Graphics Performance Analyzers (Intel GPA). Along the way, we will cover some key optimization tips and Unity game development methods to keep your game fast and fantastic.
FPS (Frametime)
The amount of time taken to process and render one frame. Includes only the time to update and render the Game view, not the editor Scene view, Inspector updates, or other editor-only processing.
Batches
The rendering of multiple objects combined into one chunk of memory, in order to reduce the CPU overhead caused by resource switching between draw calls
Saved By Batching
Number of batches that were combined. For best batching, share materials between as many objects as possible.
Tris and Verts
The number of triangles and vertices drawn
Screen
The screen dimensions + anti-aliasing level and memory usage
SetPass
The number of rendering passes. Each pass requires the Unity runtime to bind a new shader (which may introduce CPU overhead)
Visible Skinned Meshes
Number of skinned meshes rendered
Animations
Number of animations playing
Must be a Development Build
Current Data + last several hundred frames
Sub Profilers
CPU usage, GPU usage, Rendering, Memory, Audio, Physics, and Physics 2D
CPU
Metrics: Rendering, Scripts, Physics, GarbageCollector, Vsync, and Others sections.
Deep Profiling – all script code profiled down to the function level. Large overhead, lots of memory
Manual deep profiling with Profiler.BeginSample
CPU render time: Camera.Render
Self – time in function, not subfunctions
GC Alloc – memory allocated in the current frame that is later collected by the garbage collector. Keep at 0 to avoid framerate hiccups
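As a minimal sketch of the manual profiling mentioned above: wrapping a block of script code in Profiler.BeginSample / Profiler.EndSample makes it show up as a named entry in the CPU profiler without the overhead of Deep Profiling. The class name, method name, and sample label below are hypothetical, and the using directive assumes a Unity version where Profiler lives in UnityEngine.Profiling (older releases expose it directly under UnityEngine).

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical component; class, method, and sample names are illustrative.
public class EnemySpawner : MonoBehaviour
{
    void Update()
    {
        // Everything between BeginSample and EndSample is reported
        // as one named entry in the CPU profiler hierarchy.
        Profiler.BeginSample("EnemySpawner.UpdateSpawns");
        UpdateSpawns();
        Profiler.EndSample();
    }

    void UpdateSpawns()
    {
        // Placeholder for the script work being measured.
    }
}
```

Every BeginSample needs a matching EndSample on the same code path, otherwise the profiler hierarchy for that frame becomes unbalanced.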
Memory
Memory in Unity, heap size, gfxDriver, audio driver, profiler, object count (number of objects created; if it keeps increasing, objects are never being destroyed)
Detailed view is not real-time
Audio
Physics
Active/Sleeping rigidbodies, number of contacts, static/dynamic colliders
No dev build needed
Metrics:
CPU, GPU, Memory, Power, Rasterizer, Vertex-Shader, IA, Output-Merger, Device IO, EUs
Contains:
All state changes, resources, timing info, and much more
Format
List of render targets
Draw calls/clears/compute shaders – whatever takes time
Details Panel
Detail Per Draw
Frame Overview: timing/stat broken down by GPU pipeline
Details: the same timing/stats, per draw call
Texture: list of currently bound textures
Shaders: view compiled shaders. Edit and see the effects immediately.
Geometry
Experiments
Null Hardware: Infinitely fast GPU Hardware
Disable Draw Calls: Infinitely fast GPU & Driver
2x2 Textures: replaces all textures with simple 2x2 textures
Simple Pixel Shader
Use when: the Update function on a script is particularly expensive and doesn't need to be called every frame
Co-routines: functions with the ability to pause and resume execution.
Start the coroutine from Start() and give it its own update interval
Disable the MeshRenderer when an object is fully transparent
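The two tips above can be sketched together, under some loudly stated assumptions: a coroutine started from Start() does the work on its own interval instead of in Update(), and it disables the MeshRenderer once the object is fully transparent. The class name, field names, and the 0.25 second interval are hypothetical, and the fade assumes the material's shader exposes a main color and is set up for transparency.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical component; names and default values are illustrative.
[RequireComponent(typeof(MeshRenderer))]
public class FadeAndHide : MonoBehaviour
{
    public float interval = 0.25f;   // seconds between ticks
    public float fadePerTick = 0.1f; // alpha removed on each tick

    MeshRenderer meshRenderer;

    void Start()
    {
        meshRenderer = GetComponent<MeshRenderer>();
        StartCoroutine(FadeOut());
    }

    IEnumerator FadeOut()
    {
        // Cache the wait object so the loop does not allocate every tick.
        var wait = new WaitForSeconds(interval);
        Color color = meshRenderer.material.color;

        while (color.a > 0f)
        {
            color.a = Mathf.Max(0f, color.a - fadePerTick);
            meshRenderer.material.color = color;
            yield return wait; // pause here, resume after `interval` seconds
        }

        // Fully transparent: stop submitting this mesh for rendering.
        meshRenderer.enabled = false;
    }
}
```

Reusing one WaitForSeconds instance is a small design choice that keeps the per-tick GC Alloc at zero.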
Occlusion culling reduces visible geometry and draw calls in complex static scenes with lots of occlusion. Design levels with occlusion in mind.
Goes through the scene with a virtual camera
Builds a hierarchy of potentially visible sets (PVS) of objects
Data is composed of cells: view cells for static objects, target cells for moving objects
The occlusion culling process goes through the scene using a virtual camera to build a hierarchy of potentially visible sets of objects. Each camera uses this data at runtime to determine what's visible.
Avoid overdraw: render queue ordering
Every draw call incurs significant graphics API overhead, largely due to state changes between calls (e.g. switching to a different material), which cause expensive validation and translation steps in the graphics driver.
Static batching works by transforming the static objects into world space and building one big vertex and index buffer for them. Then, for the visible objects within a batch, a series of "cheap" draw calls requiring no state changes is performed. (This does not actually reduce the number of draw calls, only the state changes between them, which are the expensive part anyway.)
This requires additional memory to store the combined geometry. Even if the geometry is identical, a copy is created for each object. So static batching is not ideal for something like a dense forest (better rendering performance, but a massive memory footprint).
When using lots of pixel lights in the forward rendering path, combining may not be ideal: if meshes that are far enough apart to be affected by different pixel lights are combined, the combined mesh must be rendered once for each of those lights.
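Static batching is normally set up in the editor by marking objects as static. For objects that only exist at runtime, a rough sketch using StaticBatchingUtility.Combine could look like the following. The component name is hypothetical, the script is assumed to sit on the parent of the props to combine, and the meshes involved may need Read/Write enabled in their import settings.

```csharp
using UnityEngine;

// Hypothetical component placed on the parent of runtime-spawned props.
public class CombineStaticProps : MonoBehaviour
{
    void Start()
    {
        // Prepares the children of this GameObject for static batching.
        // The children must not be moved afterwards.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```

The memory cost described above still applies: each combined object gets its own copy of the geometry in the batch.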
Use compressed textures, or 16-bit textures, instead of uncompressed 32-bit RGBA: smaller memory footprint, faster load times
Enable Generate Mipmaps in 3D scenes (lets the GPU use a lower-resolution texture for smaller triangles)
UNLESS texels map 1:1 to rendered screen pixels (2D games, UI elements)
Trade-off: mipmaps use about 33% more memory per texture. Worth it unless memory is a real limiting factor (mobile)
Use pixel shader or texture combiners to mix textures instead of multiple passes
Where possible, just use fewer textures
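The figure of 33% extra memory for mipmaps follows from a geometric series: each mip level has half the width and half the height of the previous one, so a quarter of its size. The 1024x1024 example is an illustrative assumption, not a measurement from the talk.

```latex
% Total size of a full mip chain relative to the base texture of size M
M_{\text{total}} = M \sum_{k=0}^{\infty} \left(\tfrac{1}{4}\right)^{k}
                 = \frac{M}{1 - \tfrac{1}{4}}
                 = \tfrac{4}{3} M \approx 1.33\, M

% Example: 1024 x 1024 texels at 4 bytes per texel (uncompressed 32-bit RGBA)
M = 1024 \cdot 1024 \cdot 4\ \text{bytes} = 4\ \text{MiB},
\qquad
M_{\text{total}} \approx 5.33\ \text{MiB}
```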
FORWARD
Pass Breakdown
Base Pass
First per-pixel light reserved for brightest directional light.
Next, up to 3 other per-pixel lights that are marked as important are drawn. If no lights are marked as important, the next 3 brightest from the scene are chosen. If there are more lights marked as important that exceed the “per-pixel light count” setting value in Project->Quality, then these are done in additional passes.
Next, up to 4 lights are rendered per-vertex.
Finally, remaining lights are drawn using spherical harmonic calculations (these values are always calculated, so essentially free on GPU).
Per-pixel Lighting Pass
An additional pass done for each per-pixel light remaining after the base pass.
Semi-transparent Object Pass
An additional pass done for semi-transparent objects.
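Since each extra per-pixel light in the forward path costs an additional pass, one option is to cap the pixel light count and demote minor lights to per-vertex. The sketch below assumes the built-in render pipeline. The component name, the minorLights array, and the value 2 are hypothetical and would need tuning per project.

```csharp
using UnityEngine;

// Hypothetical component; field names and values are illustrative.
public class LightBudget : MonoBehaviour
{
    public Light[] minorLights;    // fill lights that can be rendered cheaply
    public int maxPixelLights = 2; // cap on per-pixel lights per object

    void Start()
    {
        // Same setting as the pixel light count in the Quality settings.
        QualitySettings.pixelLightCount = maxPixelLights;

        foreach (Light minorLight in minorLights)
        {
            // ForceVertex corresponds to "Not Important" in the Inspector.
            minorLight.renderMode = LightRenderMode.ForceVertex;
        }
    }
}
```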
DEFERRED
Bake all scene light into light map
As long as you have memory/sampler headroom
Use light probes for dynamic objects
Non-Directional
Flat Diffuse. Single lightmap storing info about how much light the surface emits assuming it's purely diffuse. Normalmaps and specularity aren't used.
One texture, one texture sample, a few extra shader instructions
Directional
Normal-mapped diffuse. Adds a secondary lightmap storing the dominant light direction and a factor proportional to how much of the light in the first lightmap comes from light along the dominant direction.
Two textures, two texture samples, a few more extra shader instructions
Directional with Specular
Full shading. Uses two lightmaps like Directional, but split in halves. Left side stores direct light, right stores indirect. Light is stored as incoming intensity. This allows the shader to run the BRDF usually reserved for realtime lights for a full-featured material appearance.
Two textures (twice the size of Directional), four texture samples, high extra shader cost (about equivalent to two un-shadowed lights)
Shadow Filtering – Method used to filter shadows
Hard - When sampling from the shadow map, Unity takes the nearest shadow map pixel
Soft – Averages several shadow map pixels to create smoother shadows. This option is more expensive, but creates a more natural-looking shadow
Shadow Resolution – Resolution of the generated shadow map
Can significantly affect performance if using many point / spot lights
Shadow Projection – Method used to project shadows
Stable – Renders lower resolution shadows that do not cause wobbling if the camera moves
Close Fit – Renders higher resolution shadow maps that can wobble slightly if the camera moves
Shadow Cascades – The number of parallel splits used in cascaded shadow maps (cascades closer to the viewer have higher resolution for improved quality)
Can significantly impact directional light performance
Shadow Distance – Max distance from the camera at which shadows are drawn
Can significantly affect fragment shader performance if using directional light
Can be changed on the fly via script
Performance results will vary as the GPU usage is dependent on the scene and how many objects are casting/receiving shadows. As always, it is important to use the lowest quality settings required to achieve the desired look. It is generally recommended to change the default shadow distance to a lower value.
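As a minimal sketch of changing the shadow distance on the fly, the script below switches between two distances through QualitySettings.shadowDistance. It assumes the built-in render pipeline. The component name, the SetIndoors method, and both distance values are hypothetical.

```csharp
using UnityEngine;

// Hypothetical component; field names and values are illustrative.
public class ShadowDistanceTuner : MonoBehaviour
{
    public float indoorDistance = 20f;
    public float outdoorDistance = 60f;

    // Call from gameplay code when the player changes area.
    public void SetIndoors(bool indoors)
    {
        QualitySettings.shadowDistance = indoors ? indoorDistance : outdoorDistance;
    }
}
```

A tight interior can usually get away with a much shorter distance than an open landscape, which is why the value is worth setting per area rather than once for the whole game.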