Based on a real-world use case, Will Armstrong, from Unity's Spotlight team, walks users through the process of finding out where performance and memory issues are coming from, how to use tools to track these issues down, and how to make UI that doesn't generate garbage.
Speakers:
William Armstrong - Unity Technologies
3. Test Case
• International Font Support
• Changing UI
• Added to the Shooter Demo
• Dynamic uGUI elements
4. Test Case
• International Font Support
• Changing UI
• Based on the Shooter Demo
• Dynamic uGUI elements
• That… allocates ~1 KB per frame
5. Spoiler Alert
• Limit regular UI alterations that dirty the layout
• Pool anything you spawn
• Have different pools for different kinds of things
Good news! You don’t have to keep living with your garbage.
Earlier this year we took a trip to Seoul.
We were talking to lots of clients and potential clients, seeing what they were doing, trying to find the next great Unity Game to shine our Spotlight on.
Almost every team we talked to had the same problem.
They were displaying lots of text on screen, mostly player names floating above their characters' heads. Nametags in an MMO.
These were causing lots of performance issues, mostly having to do with Garbage Collection. As names changed and players wandered into and out of view, the changing UI would generate garbage, cause Garbage Collection, and cause major performance hits.
I found myself giving the same advice to all of these teams. Best practices, how to think of this problem, what specific tools they should look at.
Over and over this same advice, broken down into small bits. Back and forth over the specific details. Memory Pools, TextMeshPro, Signed Distance Fields, how to profile.
At every company.
For every game.
Through a translator.
Thousands of dollars to send our team out there, thousands more in salary sitting there in these meetings, taking a huge chunk of time to fix the same problem over and over again.
When I got back to SF, I decided to build that use case myself, and go through the steps of solving it. Mostly because I wanted to make sure that I had told everyone the right thing. But I also wanted to go through the process, make myself slow down and think about how I learned to solve garbage issues, and how to teach our users, you, here, now, to do this for themselves.
I am going to walk us through how I broke this use case down and show you how to find and destroy your own memory allocation issues. I’m going to cover some technical specifics of the problem, but I am going to focus on the methods for tracking the problem area down and how to think of solutions, not just how to render a nametag efficiently.
Though, you will all be able to do that too.
First things first, I needed a test case.
Given the sensitive nature of many of our projects, I had to make one myself. To start, I did things the most obvious way.
I took our old friend the Sample Shooter, and added nametags to all the enemies.
I added these to the Prefabs we were already spawning, each with their own canvas.
To simulate new players, with different names, coming into and out of vision, I decided to swap out the language of the enemy names rapidly, switching between Korean and English.
This hit all the problems that our clients were facing.
I glanced at the Profiler briefly to make sure we were seeing the same problems that our clients were reporting. Sure enough, this solution was allocating 22.2 KB every time we spawned something, and at a glance 12.7 KB of that seemed to be coming from UI. Worse, we were also seeing over 1 KB of per-frame allocations when we weren't spawning anything. That generates a ton of garbage that will have to be collected.
I don’t know if you’ve heard this, but Unity’s Garbage Collector… could be better.
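To put the per-frame figure in perspective, here is a back-of-the-envelope Python sketch. It assumes a steady 60 frames per second (the talk does not state a frame rate) and rounds "over 1 KB" down to exactly 1 KB; the function names are made up for illustration.

```python
BYTES_PER_KB = 1024
BYTES_PER_MB = 1024 * 1024


def garbage_per_second(bytes_per_frame: float, fps: float = 60.0) -> float:
    """Bytes of garbage produced each second by a steady per-frame allocation."""
    return bytes_per_frame * fps


def garbage_per_minute_mb(bytes_per_frame: float, fps: float = 60.0) -> float:
    """Megabytes of garbage produced each minute at the same steady rate."""
    return garbage_per_second(bytes_per_frame, fps) * 60 / BYTES_PER_MB


if __name__ == "__main__":
    # Assumption: ~1 KB per frame at 60 FPS, with nothing being spawned.
    per_frame = 1 * BYTES_PER_KB
    print(f"{garbage_per_second(per_frame) / BYTES_PER_KB:.0f} KB per second")
    print(f"{garbage_per_minute_mb(per_frame):.2f} MB per minute")
```

Under those assumptions that is roughly 60 KB per second, or about 3.5 MB every minute, before a single enemy spawns.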
Spoiler Alert – we are going to get rid of our per-frame allocations entirely.
The bulk of the allocations came from a UI Layout Group recalculating every frame. Avoid regularly altering the RectTransform of anything that resizes your Layout Group or has a parent with a Content Size Fitter.
Introduce self and the spotlight team
Best Practices for Memory
Instantly, I set off doing exactly the wrong thing. I know a bunch of tricks to save memory and performance, so I just started doing them.
TextMeshPro
It performs better and looks better than old-school atlas-based fonts, and it only requires a single signed distance field (SDF) atlas to generate good results at all resolutions. I hooked that up, fed in a nice free Adobe font with support for the Korean character set, rasterized it to an SDF, and had it generate a single texture for all my characters. It ended up being a 4K texture because a lot of characters are needed.
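A quick way to estimate what a "4K" atlas costs is width × height × bytes per pixel. The Python sketch below assumes the atlas is 4096×4096, stored uncompressed, and holds the distance field in a single 8-bit channel. Check the actual format in the texture's Inspector, since that assumption drives the result. The function name is hypothetical.

```python
MB = 1024 * 1024


def texture_bytes(width: int, height: int, bytes_per_pixel: int, mipmaps: bool = False) -> int:
    """Uncompressed size of a 2D texture, optionally including its full mip chain."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            return total
        # Each mip level halves both dimensions, never going below 1 pixel.
        w, h = max(1, w // 2), max(1, h // 2)


if __name__ == "__main__":
    # Assumption: the SDF atlas uses one 8-bit channel (1 byte per pixel).
    print(texture_bytes(4096, 4096, 1) / MB)        # single-channel atlas
    print(texture_bytes(4096, 4096, 4) / MB)        # same size as 32-bit RGBA, for comparison
    print(texture_bytes(4096, 4096, 1, True) / MB)  # single channel with mipmaps
```

Under that assumption the atlas is 16 MB uncompressed. It would be four times that if it were stored as 32-bit RGBA.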
Pooling
I made a pool to spawn all the nametags as children of a shared canvas. I remembered that any time anything on a Canvas dirties, the whole thing has to update, so you want to group your UI into dynamic and static Canvases or Canvas Groups. Spawning things from a prefab is slow, so let's pre-allocate a pool. This also lets me show nametags only for characters on screen, rather than needing a nametag for every character everywhere.
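The reason for splitting static and dynamic UI can be shown with a toy model in Python. It assumes the simplified rule stated above, that a dirty canvas rebuilds every element on it. It also uses made-up counts of 200 static HUD elements and 20 nametags. ToyCanvas and rebuild_cost are hypothetical names, and real uGUI rebuilds are more granular than this.

```python
from dataclasses import dataclass


@dataclass
class ToyCanvas:
    element_count: int
    dirty: bool = False
    rebuilt: int = 0  # running total of elements rebuilt on this canvas

    def mark_dirty(self) -> None:
        self.dirty = True

    def flush(self) -> None:
        # Simplified rule: a dirty canvas rebuilds ALL of its elements.
        if self.dirty:
            self.rebuilt += self.element_count
            self.dirty = False


def rebuild_cost(static_elems: int, dynamic_elems: int, frames: int, split: bool) -> int:
    """Total elements rebuilt when one nametag changes every frame."""
    if split:
        dynamic = ToyCanvas(dynamic_elems)
        canvases = [ToyCanvas(static_elems), dynamic]
    else:
        dynamic = ToyCanvas(static_elems + dynamic_elems)  # everything on one canvas
        canvases = [dynamic]
    for _ in range(frames):
        dynamic.mark_dirty()
        for canvas in canvases:
            canvas.flush()
    return sum(canvas.rebuilt for canvas in canvases)


if __name__ == "__main__":
    print(rebuild_cost(200, 20, frames=60, split=False))
    print(rebuild_cost(200, 20, frames=60, split=True))
```

Over 60 frames the shared canvas rebuilds 13,200 elements. The split version rebuilds only the 1,200 that live on the dynamic canvas.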
If you don’t know the term, Pooling is building a shared resource ahead of time, your pool, and then pulling resources out and returning them as they are needed. This lets you load and instantiate a type of object well ahead of needing it. In this case, I made a simple class that just pre-allocates a list of GameObjects that holds all my UI, and has some simple logic for keeping track of which GameObject is my next Deactivated one. There are lots of good Asset Store solutions you can buy, and great examples of this code online.
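Here is a minimal sketch of that kind of pool. It pre-allocates a fixed list up front and keeps a cursor to the next deactivated item. NameTag is a hypothetical stand-in for the pooled GameObject, with its active flag playing the role of SetActive. In Unity this would be C#; Python is used here only for illustration.

```python
from typing import Optional


class NameTag:
    """Hypothetical stand-in for a pooled nametag GameObject."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.active = False
        self.text = ""


class NameTagPool:
    def __init__(self, size: int) -> None:
        # Pre-allocate everything up front so spawning later allocates nothing new.
        self._items = [NameTag(i) for i in range(size)]
        self._next = 0

    def acquire(self, text: str) -> Optional[NameTag]:
        # Scan at most once around the list for the next deactivated item.
        for _ in range(len(self._items)):
            item = self._items[self._next]
            self._next = (self._next + 1) % len(self._items)
            if not item.active:
                item.active = True
                item.text = text
                return item
        return None  # pool exhausted; the caller decides whether to grow or skip

    def release(self, item: NameTag) -> None:
        item.active = False


if __name__ == "__main__":
    pool = NameTagPool(size=32)  # allocated once, at load time
    tag = pool.acquire("병사")    # a character walks on screen
    pool.release(tag)            # the character walks off; the tag goes back
```

Returning None when the pool is exhausted keeps the pool from silently allocating. Whether to grow it or simply skip the nametag is left as an explicit decision for the caller.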
I highly recommend taking a moment and getting comfortable with a nice generic pooling solution early in every project. If you pool things by default, you will have much greater control of your memory usage as your game grows.
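A generic version, with one pool per kind of thing as the spoiler slide recommends, might look like the following Python sketch. Pool, PoolRegistry, the kind keys, and the created counter are illustrative assumptions rather than any particular Asset Store package. The counter simply makes a pool that ran dry, and therefore allocated at runtime, easy to spot.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    def __init__(self, factory: Callable[[], T], prewarm: int = 0) -> None:
        self._factory = factory
        # Allocate up front so normal gameplay only reuses existing objects.
        self._free: List[T] = [factory() for _ in range(prewarm)]
        self.created = prewarm

    def get(self) -> T:
        if self._free:
            return self._free.pop()
        # Pool ran dry: fall back to allocating, but count it so it shows up.
        self.created += 1
        return self._factory()

    def put(self, item: T) -> None:
        self._free.append(item)


class PoolRegistry:
    """One pool per kind of pooled object (nametags, projectiles, ...)."""

    def __init__(self) -> None:
        self._pools: Dict[str, Pool] = {}

    def register(self, kind: str, factory: Callable[[], object], prewarm: int) -> None:
        self._pools[kind] = Pool(factory, prewarm)

    def get(self, kind: str) -> object:
        return self._pools[kind].get()

    def put(self, kind: str, item: object) -> None:
        self._pools[kind].put(item)

    def created(self, kind: str) -> int:
        return self._pools[kind].created


if __name__ == "__main__":
    pools = PoolRegistry()
    pools.register("nametag", factory=dict, prewarm=3)
    tag = pools.get("nametag")
    pools.put("nametag", tag)
    print(pools.created("nametag"))  # 3: nothing allocated after prewarm
```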
I did all of that, it wasn't very hard, and it should be good now. So I fire up the Profiler, sort by allocation and… it's still bad. I really should have looked at this first. While everything I did helped, it was not the most bang for the buck. As you are trying to optimize your own projects, every change counts. Always profile first. Don't be like me.
Quick Trick – turn off VSync to get more readable profiling results for CPU Usage; otherwise the time spent waiting for the next vertical sync dominates the CPU graph.
In this case I am concerned with per-frame allocations, not total memory. But I might as well take a look and make sure there isn't anything crazy. You get to the memory profiler by selecting the Memory graph up top, picking a frame you want to look into, switching to Detailed mode, and then hitting Gather Object References.
Here is my one giant TextMeshPro texture. Hmm, it takes up more memory than the default texture atlas solution. It looks better, though, and supports more features. If I made the default look this good, it would take up more memory too.
All my pooling didn't touch the total memory usage, but it did lower the per-frame allocations, so I will take that win.
Back to our per frame allocation problem
You can see here we are getting these allocations from a Unity function, Canvas.SendWillRenderCanvas(). I know Canvas is a Unity class; I didn't write it, and I don't know what this code does.
Let's fix that.
Now that I know I have a problem, and know some information about where it is coming from, I can try to track down the source of the issue. Literally.
There are several places you can find Unity source code these days; these are just some examples. Lots of good stuff gets added to our GitHub.
As an example, I just made a new small tool available on GitHub myself. A little utility that lets you set up GUIDs for game objects and then reference them even if they are in another scene.
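The idea behind cross-scene references by GUID can be sketched in a few lines of Python. Objects register themselves under a stable ID when their scene loads, and a reference stores only that ID and resolves it on demand. The names here (GuidRegistry, GuidReference) are hypothetical and are not the API of the tool mentioned above.

```python
import uuid
from typing import Dict, Optional


class GuidRegistry:
    """Global lookup from a stable GUID to whatever object is currently loaded."""

    def __init__(self) -> None:
        self._objects: Dict[uuid.UUID, object] = {}

    def register(self, guid: uuid.UUID, obj: object) -> None:
        self._objects[guid] = obj  # called when the owning scene loads

    def unregister(self, guid: uuid.UUID) -> None:
        self._objects.pop(guid, None)  # called when the owning scene unloads

    def resolve(self, guid: uuid.UUID) -> Optional[object]:
        return self._objects.get(guid)


class GuidReference:
    """Stores only the GUID, so it can live in a different scene from its target."""

    def __init__(self, guid: uuid.UUID) -> None:
        self.guid = guid

    def resolve(self, registry: GuidRegistry) -> Optional[object]:
        # Returns None while the target's scene is not loaded.
        return registry.resolve(self.guid)
```

A reference returns None until the scene holding its target loads and registers it, so callers have to handle the missing case.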
LayoutGroup.OnRectTransformDimensionsChange() looks like the problem. To the code! base.OnRect… is empty, so that's not it. The problem must be in SetDirty(). That already tells us we might be able to work around this problem by not changing RectTransform dimensions or doing anything else to dirty the LayoutGroup. But let's dig in and see if the memory is going somewhere sensible.
I see a StartCoroutine, and those always allocate memory, but only a little.
I dig in a bit deeper.
Hmm, an Initialize() function. Those quite often allocate memory.
In this case, it is copying a RectTransform struct, and looks like it is doing some boxing. Not great, but not obviously wrong either.
Looking into the rest of the function calls, we have an internal registration function.
That is doing a couple of searches on a Queue and some more boxing. Again, not ideal, but not obviously broken either.
So this memory usage seems appropriate. There is no major bug in here that we can fix in the uGUI code to remove all the allocations.
So we need to work around it.
We know that all this allocating code gets called from the OnRectTransformDimensionsChange() callback, and that is nicely named.
What is causing us to change our transform dimensions? I am pooling all the UI, so it could be when I first turn one on, but that would lead to spikes, not constant per-frame allocation.
So it must be something I am doing every frame, which, for this test case, is translating the nameplates between English and Korean every frame. That causes the text to resize.
LayoutGroup is the big clue! I have a Content Size Fitter on there, resizing the backing faceplate to match the text. Since these are names, and we already set a fixed size on those, there is no reason not to just standardize the backing image. I only did it this way on a lark anyway.
So let's turn off the Content Size Fitter, set our text to best fit, and see what we get.
No allocations!
OK, almost no allocations. 90 bytes are coming from some other UI code, but that could be a GetComponent in the Editor or a string concatenation or something. 90 bytes is a huge improvement, though!
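Why removing the Content Size Fitter mattered can be illustrated with a toy Python model. It assumes, purely for illustration, that text width is 10 units per character and that only a change in the rect's width dirties the layout. Best fit is taken as given: the text is assumed to scale inside the fixed rect without changing it. ToyNameplate and run are hypothetical, not uGUI code.

```python
from dataclasses import dataclass


@dataclass
class ToyNameplate:
    fit_to_content: bool        # True mimics a Content Size Fitter on the backing
    fixed_width: float = 160.0  # hypothetical standard backing width
    width: float = 0.0
    layout_rebuilds: int = 0

    def set_text(self, text: str, char_width: float = 10.0) -> None:
        preferred = len(text) * char_width
        new_width = preferred if self.fit_to_content else self.fixed_width
        if new_width != self.width:
            # The rect's dimensions changed, which is what dirties the layout.
            self.width = new_width
            self.layout_rebuilds += 1


def run(fit_to_content: bool, frames: int) -> int:
    plate = ToyNameplate(fit_to_content)
    names = ["Soldier", "병사"]  # swap languages every frame, as in the test case
    for frame in range(frames):
        plate.set_text(names[frame % 2])
    return plate.layout_rebuilds


if __name__ == "__main__":
    print(run(fit_to_content=True, frames=100))   # rebuilds every frame
    print(run(fit_to_content=False, frames=100))  # rebuilds once, on first use
```

With the fitter on, swapping names changes the rect every frame. With a fixed-size backing, the rect settles once and stays put.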
All of this was done in our old friend Unity 5.6. How many here are still on the 5.x family?
On the spotlight team we see all sorts of projects, on every version of Unity.
I wanted to double-check my findings and make sure they were consistent across more modern versions of Unity. They hold steady: in 2017 and 2018, altering a uGUI layout's bounds causes the same kind of allocations.
This also gave me a chance to look at my solution in the UI Profiler, added in 2017.
The UI Profiler is a special-purpose profiler, similar to the Frame Debugger but just for UI.
Looking at my solution it looks… not great.
You can see here, we are getting 2 draw calls per nameplate.
The UI Profiler will show you how your UI is batching and give you hints about why a batch gets broken.
In this case, it is telling me that the objects I am rendering have different Material Instances. That is true, since it is drawing the nametag backings and the text in order, switching between those two materials.
This is because they are able to stack on top of each other. Canvas Renderers are drawn in order from top to bottom in the Hierarchy, and that is the order they are sorted in for occlusion. Since we want the names to be occluded by nametags in front of them, we need to leave this ordering alone.
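The batching cost follows from a simple rule of thumb: walking the draw order, a new batch starts every time the material changes. The Python sketch below counts batches that way. It is a simplification that ignores any reordering the real uGUI batcher may do for elements that don't overlap, and the material names are placeholders.

```python
from itertools import groupby
from typing import List


def count_batches(draw_order: List[str]) -> int:
    """One batch per run of consecutive draws that share a material."""
    return sum(1 for _material, _run in groupby(draw_order))


def hierarchy_order(nameplates: int) -> List[str]:
    """Each nameplate draws its backing, then its text, in Hierarchy order."""
    order: List[str] = []
    for _ in range(nameplates):
        order += ["backing_material", "text_material"]
    return order


if __name__ == "__main__":
    order = hierarchy_order(10)
    print(count_batches(order))          # alternating materials: 2 per nameplate
    print(count_batches(sorted(order)))  # grouped by material: 2 total, but wrong occlusion
```

Sorting by material would collapse everything into two batches. However, every name would then draw on top of every backing, which breaks the occlusion we want to keep.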
So while I don’t see a great way to improve this draw call performance, I at least know where my batches are going.
All of our built-in Profiling tools are great for finding the low-hanging fruit. If you are allocating kilobytes a frame, it is easy to do better than that. You should always drop into these tools first: they are integrated, easy to use, and will let you know where to focus your early efforts.
However, none of our generic tools can really tell you what is going on on your specific hardware. For that, you need more specialized tools: PIX on Xbox, Razor on Sony, and every platform has its own version. Our generic tools are great for knowing, relatively, which areas of your code are the problem. But they are not built to get accurate absolute measurements. Don't focus too much on how many milliseconds our profiler says you spend. Just find the places that are the most expensive and make them cheaper.
Here is where we started: ~12 KB of UI allocations on every monster spawn, and ~1 KB of garbage generated every frame.
Here is where we are now. 0 and 0.
This still isn't perfect. Ideally, we would be pooling our AI before spawning them, just like we pool their nametags. When pooling complex prefabs, like an AI, always be sure to profile as you go. Some internal systems perform better when disabled rather than deactivated. For example, Animators free all their scratch memory when the GameObject they are part of is deactivated, but not if the Animator is merely disabled. If you are pooling and unpooling Animators rapidly, you might be better off disabling every Component rather than deactivating the GameObject entirely.
The reason I care so much about per-frame allocation is that Garbage Collection is quite expensive. While work is ongoing under the hood to improve that, it will never be free, or even fast. As your game scales, the GC algorithm has to look at every Object to make sure that everything you have is still needed. By keeping your per frame allocations low, or zero, you can increase the time between GC hitches and have your game keep performing well as it scales up.
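To see why driving per-frame allocations toward zero stretches the time between hitches, consider a deliberately simplified Python model. In it, a collection runs each time a fixed amount of headroom on the managed heap is used up. The 1 MB headroom and 60 FPS below are assumptions for illustration. Unity's actual collection trigger is more complicated than this.

```python
import math


def seconds_between_collections(headroom_bytes: float, bytes_per_frame: float,
                                fps: float = 60.0) -> float:
    """Time until the assumed headroom is used up at a steady allocation rate."""
    if bytes_per_frame <= 0:
        # In this model, zero per-frame garbage means gameplay alone never triggers a collection.
        return math.inf
    return headroom_bytes / (bytes_per_frame * fps)


if __name__ == "__main__":
    headroom = 1024 * 1024  # assumed 1 MB of free heap before a collection
    for label, per_frame in [("~1 KB/frame", 1024), ("90 B/frame", 90), ("0 B/frame", 0)]:
        print(label, seconds_between_collections(headroom, per_frame))
```

Under these assumptions, ~1 KB per frame exhausts the headroom in about 17 seconds. 90 bytes per frame stretches that to a little over three minutes, and zero per-frame allocation means gameplay on its own never triggers a collection.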
Use the Profiler. Don’t just assume you know where your memory is coming from.
Look at our code. Between Bitbucket, GitHub, and Package Manager, more and more of our code is available to you. Use it.
Don’t alter the Rect bounds of a Layout Group every frame. It allocates memory.
Make a pool. Don’t allocate on demand, allocate ahead of time.