Introducing MicroProfiler memory profiling, flame graphs, diffs, and much more

ZenMa1n · October 24, 2024, 9:26pm

Hello Creators,

We’ve recently made many improvements to the MicroProfiler. The latest update introduces detailed memory profiling, flamegraphs, and dump comparisons.

Key features include:

X-Ray mode for visualizing memory allocations and identifying areas where memory operations are unusually intense.
Summary FlameGraphs for an aggregated view of CPU and memory usage, aiding in understanding overall performance.
Diff FlameGraphs for easy comparison of two dumps, highlighting performance changes.

Performance and usability improvements:

Five times more compact dumps.
Improved browser scaling support and frame time bottleneck information.
Added Re-capture and Save To File buttons, along with a Ctrl+F hotkey to find a timer by name.

These changes make MicroProfiler a more powerful and easier-to-use tool for performance analysis and optimization.

For a more detailed guide on using the new features, see the full article.
If you’re just getting started with the MicroProfiler, check out the documentation for an overview.

Watch the video below for an overview of our updates:

If you have any ideas for further additions, please let us know in the comments. Thank you for using MicroProfiler and caring about performance!

Happy building!
ZenMa1n

Appendix

X-Ray Mode for Memory Profiling

A detailed picture of memory allocations and deallocations is now available for individual scopes. We aimed to keep the interface simple, so allocation intensity is displayed as an overlay in shades of gray, hence the name X-Ray mode.

The two gray bars at the top let you spot areas with excessive memory allocation, even if you were profiling CPU rather than memory. If you notice such areas in a frame, switch to X-Ray mode by pressing the X key or by selecting X-Ray → Main View from the top menu. CPU scopes will turn grayscale, and the scopes with the most allocations, by count or by total size, will appear brighter than the others.
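To make the shading idea concrete, here is a minimal Python sketch that maps per-scope allocation totals to gray levels. It assumes a simple linear scale relative to the busiest scope; the `Scope` type, its fields, the scope names, and the 0 to 255 range are all hypothetical and are not the MicroProfiler's actual data model or rendering rule.

```python
from dataclasses import dataclass


@dataclass
class Scope:
    # Hypothetical per-scope allocation totals for one capture.
    name: str
    alloc_count: int  # number of allocations inside the scope
    alloc_bytes: int  # total bytes allocated inside the scope


def xray_shades(scopes, by="count"):
    """Map each scope name to a gray level in [0, 255].

    Assumption: brightness scales linearly with the chosen metric
    ("count" or "size"), relative to the busiest scope.
    """
    field = {"count": "alloc_count", "size": "alloc_bytes"}[by]
    values = {s.name: getattr(s, field) for s in scopes}
    peak = max(values.values(), default=0)
    if peak == 0:
        # No allocations at all: every scope stays dark.
        return {name: 0 for name in values}
    return {name: round(255 * v / peak) for name, v in values.items()}


scopes = [
    Scope("Heartbeat", alloc_count=10, alloc_bytes=2048),
    Scope("Physics", alloc_count=2, alloc_bytes=8192),
    Scope("Render", alloc_count=0, alloc_bytes=0),
]
by_count = xray_shades(scopes)             # brightest = most allocations
by_size = xray_shades(scopes, by="size")   # brightest = most bytes allocated
```

Note how the brightest scope changes with the metric: a scope that allocates often in small chunks and one that allocates rarely but in large chunks can both stand out, which is why the view offers both "by number" and "by total size".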

Summary FlameGraph for CPU and Memory

In detailed mode, it’s not always immediately clear what the overall CPU or memory usage looks like, especially when a particular task consumes few resources but is called very frequently, subtly affecting the entire frame. For these situations, we created a mode that aggregates all individual call stacks into one large graph: the Summary FlameGraph.

You can find them in the top menu under Export → CPU Flamegraph and Memory Flamegraph. When you click one of them, a new browser tab will open (or your browser will prompt you to download a file). The CPU and memory Flamegraphs look similar. When you hover over a scope, detailed information about it is displayed at the bottom, such as total CPU time, number of memory allocations, or total allocation size, depending on the selected mode. Click a scope to zoom in.

It’s important to note that in these Flamegraphs, the statistics are shown only for the frames included in the dump, and not for the entire duration since the start of the experience.
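The aggregation behind a summary flame graph can be sketched as merging every frame's call stacks into one tree keyed by call path. This Python sketch is illustrative only: the input format (a tuple of scope names plus the scope's self cost) and the scope names are hypothetical, not the dump format. Because it only sums the frames it is given, its totals cover just the captured frames, in line with the note above.

```python
from collections import defaultdict


def summarize(frames):
    """Merge the call stacks of every captured frame into one summary tree.

    frames: a list of frames; each frame is a list of (stack, self_cost)
    pairs, where stack is a tuple of scope names from the root down to the
    scope that spent self_cost (ms or bytes). This format is hypothetical.
    Returns {call_path: inclusive_cost}; a parent's cost includes its children.
    """
    totals = defaultdict(float)
    for frame in frames:
        for stack, self_cost in frame:
            # Charge the cost to the scope and to every ancestor on its path.
            for depth in range(1, len(stack) + 1):
                totals[stack[:depth]] += self_cost
    return dict(totals)


frames = [
    [(("Frame", "Physics"), 2.0), (("Frame", "Script", "Heartbeat"), 1.0)],
    [(("Frame", "Script", "Heartbeat"), 1.5)],
]
summary = summarize(frames)
```

A cheap task that runs in every frame accumulates into a single wide block here, which is exactly the case that is hard to notice when looking at individual frames.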

Diff FlameGraphs

When we make changes to the experience, it’s nice to understand whether they lead to performance improvements or degradations without squinting at numbers, right? For this, we’ve introduced Diff FlameGraphs.

This is where viewing captures in the browser is more powerful than working in the Roblox Player or Studio window. Since dumps are saved to disk, you can track progress in improving performance over time (across weeks, months, versions, and so on).

Open one dump and drag the second dump (HTML file) directly into the browser window. A popup will appear where you can drop it (you can also open this window from the top menu via Export → Diff / Combine).

In the Left section (marked in Green), the name of the currently opened dump appears automatically; drop the second dump in the Right section (marked in Blue). Click Combine & Compare, and in a new tab (or as a downloaded file) you’ll get a comparison of the two dumps.

The color of the scopes here depends on which dump (Left/Green or Right/Blue) consumes more CPU/Memory resources. The brighter the color, the greater the difference.

The displayed values are averaged per frame. This is because the dumps being compared can contain a different number of frames (for example, 32 and 128), and we still want to compare them, so we calculate the values for one averaged frame from both sides before comparing.
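The per-frame averaging can be shown with a small Python sketch: divide each dump's totals by its own frame count, then subtract. The dictionaries of per-scope totals, the scope names, and the sign convention (positive means Right/Blue is heavier) are assumptions made for illustration, not the viewer's actual implementation.

```python
def per_frame(totals, frame_count):
    """Convert whole-dump totals into values for one averaged frame."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return {scope: value / frame_count for scope, value in totals.items()}


def diff_dumps(left_totals, left_frames, right_totals, right_frames):
    """Return {scope: right_avg - left_avg}.

    Positive values mean the Right (Blue) dump is heavier for that scope,
    negative values mean the Left (Green) dump is; the magnitude could drive
    how bright the color is. Scopes missing from one side count as 0.
    """
    left = per_frame(left_totals, left_frames)
    right = per_frame(right_totals, right_frames)
    return {
        scope: right.get(scope, 0.0) - left.get(scope, 0.0)
        for scope in left.keys() | right.keys()
    }


# A 32-frame dump compared against a 128-frame dump (totals in ms).
delta = diff_dumps({"Physics": 64.0, "Script": 32.0}, 32,
                   {"Physics": 128.0, "Script": 384.0, "Pathfinding": 64.0}, 128)
```

Comparing raw totals here would wrongly suggest that Physics got twice as expensive (64 ms vs 128 ms). Per frame, it actually dropped from 2 ms to 1 ms.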

Five Times More Compact Dumps

We’ve revamped the data representation format (though it’s still an .html page), and a 512-frame dump may now take around 17MB on disk, compared to 90MB previously.

Re-capture and Save To File Buttons

When opening a capture via HTTP (directly from a mobile device), you will see two new buttons in the top menu. The first button, Re-capture, allows you to take a new capture without issues related to the browser cache (pressing F5 may sometimes show the old page from the cache instead of the new dump). The second button, Save to File, lets you save the open capture as an HTML file while preserving its full name, including the timestamp, without worrying about the browser distorting the original page upon saving.

Improved Browser Scaling Support

Previously, the MicroProfiler interface would break when zooming the page. Now, it remains proportional and usable. This improvement is especially beneficial when demonstrating dumps to others, such as during video calls or presentations.

Frame Time Bottleneck Cause

Some time ago, we introduced color-coded highlighting for frames in the top part of the Profiler to indicate their performance characteristics. The purpose is to distinguish frames categorized as CPU-heavy (orange), GPU-heavy (red), or Render-heavy (blue).

When you hover over a frame, a tooltip is displayed. We have added detailed timing information to these tooltips to clarify how a frame is categorized. The value that most significantly contributes to the frame’s classification is highlighted in red.
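The exact classification rule isn't spelled out here. As a purely illustrative assumption, the Python sketch below labels a frame by whichever timing component is largest, which would also be the value a tooltip highlights. The component keys and example timings are hypothetical; only the three categories and their colors come from the post.

```python
# Colors from the post; the component keys are hypothetical.
CATEGORY_COLORS = {"cpu": "orange", "gpu": "red", "render": "blue"}


def classify_frame(timings_ms):
    """Return (category, color) for the component that dominates the frame.

    Assumption: the category is simply the largest per-frame timing.
    """
    unknown = timings_ms.keys() - CATEGORY_COLORS.keys()
    if unknown:
        raise ValueError(f"unexpected timing keys: {sorted(unknown)}")
    category = max(timings_ms, key=timings_ms.get)
    return category, CATEGORY_COLORS[category]


gpu_bound = classify_frame({"cpu": 12.0, "gpu": 20.5, "render": 9.0})
```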

Ctrl+F (Cmd+F)

We now have the ability to search for scopes by name using a more familiar hotkey. When you press Ctrl+F on Windows (or Cmd+F on Mac), a search box will appear in the lower-left corner. Type the name of the scope there, press Enter, and you’ll be taken to the instance of the scope that takes up the most time in that dump.
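As a rough Python sketch of what "jump to the instance that takes the most time" means, the search can be thought of as a filter by name followed by a maximum over duration. The `ScopeInstance` record, the scope names, and the exact, case-sensitive name match are assumptions for illustration; the viewer's actual matching rules are not described in the post.

```python
from dataclasses import dataclass


@dataclass
class ScopeInstance:
    # Hypothetical record for one timed occurrence of a scope in a dump.
    name: str
    frame: int
    duration_ms: float


def find_longest(instances, query):
    """Return the instance named `query` with the longest duration, or None.

    Assumption: the match is an exact, case-sensitive comparison on the name.
    """
    matches = (s for s in instances if s.name == query)
    return max(matches, key=lambda s: s.duration_ms, default=None)


dump = [
    ScopeInstance("Heartbeat", frame=0, duration_ms=1.2),
    ScopeInstance("Heartbeat", frame=7, duration_ms=4.8),
    ScopeInstance("Physics", frame=7, duration_ms=6.0),
]
hit = find_longest(dump, "Heartbeat")
```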

Reference Frame Time Adjustment

When profiling a game on a slow device, the vertical bars representing frame times at the top of the page can sometimes shoot up, making it unclear which frame is faster and which is slower (since technically they’re all slow). Now, if you hover over these bars and scroll your mouse wheel, you can adjust the upper limit of frame time so that all values fit on the screen without being cut off at the top. This is a faster method than navigating through the top menu and selecting Reference → 50 ms (100 ms, etc.).
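The effect of the reference limit can be sketched in Python: bars are scaled against the limit and clipped above it, and scrolling steps to a neighboring preset. The preset list (only 50 ms and 100 ms are named in the post), the pixel height, and the mapping of one wheel notch to one preset step are hypothetical.

```python
# Hypothetical preset limits; only 50 ms and 100 ms are named in the post.
REFERENCE_STEPS_MS = [16.0, 33.0, 50.0, 100.0, 200.0, 500.0]


def bar_height(frame_ms, reference_ms, max_height_px):
    """Scale a frame time to a bar height, clipping anything above the limit."""
    return min(frame_ms, reference_ms) * max_height_px / reference_ms


def scroll_reference(step_index, wheel_steps):
    """Move to a neighboring preset limit, clamped to the available range."""
    return max(0, min(len(REFERENCE_STEPS_MS) - 1, step_index + wheel_steps))


# On a slow device with 80 ms frames, a 50 ms limit clips every bar at the top;
# one scroll step up to 100 ms makes the bars fit and become comparable again.
clipped = bar_height(80.0, REFERENCE_STEPS_MS[2], 100)
fitted = bar_height(80.0, REFERENCE_STEPS_MS[scroll_reference(2, 1)], 100)
```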

Group Highlighting

A new option has been added to the top menu: Highlight → [group_names]. When you select a group (like Physics or Script), the scopes that belong to that group will continue to display as usual, while all other scopes will turn gray, reducing distractions. Selecting Highlight → None will return everything to its original state.
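Conceptually, group highlighting is a display filter: scopes in the selected group keep their color and everything else is grayed out. In this Python sketch the (name, group, color) tuples and the scope names are hypothetical; `group=None` stands in for Highlight → None.

```python
def apply_highlight(scopes, group=None):
    """Return display colors after choosing Highlight -> group.

    scopes: list of (name, group_name, color) tuples (hypothetical format).
    group=None corresponds to Highlight -> None and keeps original colors.
    """
    return [
        (name, g, color if group is None or g == group else "gray")
        for name, g, color in scopes
    ]


scopes = [("StepPhysics", "Physics", "green"), ("Heartbeat", "Script", "yellow")]
physics_only = apply_highlight(scopes, "Physics")
```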

Scope Labels Improvements/Fixes

In the past, large captures (256 or 512 frames) with many scripts would occasionally overflow the internal buffer for labels containing script names. As a result, earlier frames in the capture displayed scope names as “Script_null”, while later frames appeared correctly. We have optimized how this buffer works, which should significantly reduce overflows so that script names are displayed correctly.

Help Tooltip

We’ve introduced a small tooltip that provides basic information about the viewer’s functions. This tooltip appears in the bottom right corner when you open a dump and disappears once the user interacts with the interface, ensuring it doesn’t get in the way.

ffrostfall · October 24, 2024, 9:37pm

Do these allocations include C-based allocations (Like creating a new part would be an allocation)?
Will we get a way to isolate a specific tag?
Are there any plans for a Luau-sided API around the microprofiler and the data it collects, for things like analytics?
Will we be able to see all side effects of a specific event in the microprofiler? For example, when a property change occurs, it can fire listeners. We cannot actually see how many listeners there are in the microprofiler, and sometimes it can be very confusing when a raycast occurs due to a CFrame change. It can be hard to track down the causes of those kinds of things.

V_ChampionSSR · October 24, 2024, 9:37pm

fantastic and much needed updates to the microprofiler, these changes will really help with narrowing down performance issues

Dekkonot · October 24, 2024, 9:41pm

Funny story, earlier today a coworker noticed that you could diff microprofiler dumps and we were all wondering how long it’s been a feature. Good to know that it’s not been one for very long, so we aren’t crazy!

These changes are all very welcome. Especially the one that makes it not break when you zoom in.

JunisL0L · October 24, 2024, 10:30pm

Oh WOW… this update sounds like a game changer. The addition of detailed memory profiling, flamegraphs, and the ability to compare dumps will make tracking down performance issues way more efficient.

benpinpop · October 24, 2024, 11:03pm

Holy-freaking-moly.

I’ve been advocating for improvements to current diagnostic tools for a very long time, so I’m glad to see these changes being made! It’s extremely important that developers have the tools to diagnose what’s causing issues in their places.

When should we expect these changes to be put into official documentation?

romefalls · October 25, 2024, 12:03am

This is a lifesaver and has already proven incredibly useful for me, namely the find feature. Thank you Mr. Engineers.

Rayxu333 · October 25, 2024, 3:29am

great features!

my ears didn't like how the music was louder in one ear than the other D:

arxweb · October 25, 2024, 5:58am

I’m happy this has been added! It will make improving performance way easier and I will use it very often

Steeq · October 25, 2024, 7:18am

Whoever designed these needs to be fired, ASAP!

I get it, cool, now we can see it for memory, and export, but the way it’s graphed is terrible. So hard to read.

Had me looking at it with a facial expression of:

The user experience design and controls are cool, though. Gotta give props where props are due!

devSparkle · October 25, 2024, 7:55am

I really appreciate this update, but ever since it came out (originally, with the x-rays) all the MicroProfiler logs in one of my games come out like this:

[Screenshot 2024-10-25 at 08.54.41]

My work depends on this feature, I can’t have it be broken for this long.

PysephDEV · October 25, 2024, 10:12am

Would be super appreciated if the deleteDeferred label would provide the objects that were destroyed; our team had to spend four days identifying the performance leak because the microprofiler would only show a 60ms-long “deleteDeferred” label, with no further information.

ffrostfall · October 25, 2024, 3:02pm

It’s not graphed terribly. Flamegraphs are very common in performance, and microprofilers styled like that are also very common.

Just because you don’t understand it doesn’t mean it’s bad…

CDFII4 · October 25, 2024, 3:18pm

Came to read about new update, left with a great banger

WillyEdison · October 25, 2024, 8:04pm

This new MicroProfiler looks awesome!! But I’ve opened up my game’s place in Studio, and I only see what looks like an older version. All the tabs at the top look very different from the tabs shown on all the new previews. I can’t find the X-Ray tab to try out the new features. I’ve attached an image below of what I see.

Also, the MicroProfiler (at least the version I have) has some tabs obscured on the right side when using the device resolution emulator. Here’s an image of what it looks like.

Glaring · October 25, 2024, 9:07pm

This is amazing, I was just talking with my team last week about how nice it would be to have better visibility for memory profiling. Will try this out for sure.

ZenMa1n · October 25, 2024, 11:15pm

Is this related to server dumps or client dumps? If it’s about server dumps, I’m commenting in this thread.
In short, the workaround is to create a server dump and, as soon as that first (non-functional) dump completes, immediately start recording a second one. The second dump will be valid.

Regarding when the issue appeared - X-Ray mode was added in July (except for Windows), but I think the bug actually started later?

ZenMa1n · October 25, 2024, 11:35pm

This new MicroProfiler looks awesome!! But I’ve opened up my game’s place in Studio, and I only see what looks like an older version.

All the features described in this update are only implemented when opening captures in the browser. This is mentioned in the full article, but unfortunately, it was removed from the announcement post to make it shorter.

In short:
Press the tiny Dump → NN frames button in the MicroProfiler on-screen UI to save the capture as an HTML file in the same folder as the logs (on Windows, it’s C:\Users\<Username>\AppData\Local\Roblox\logs).

ZenMa1n · October 26, 2024, 12:15am

Do these allocations include C-based allocations (Like creating a new part would be an allocation)?

All memory allocations are displayed, including internal C++ Engine allocations, regardless of whether they occurred independently or were triggered by Lua scripts.

Will we get a way to isolate a specific tag?

Are you suggesting isolating it during the dump creation phase or during the display phase in the on-screen UI or browser UI? If it’s the latter, at least you can currently use Ctrl+F to find the desired tag.

Are there any plans for a Luau-sided API around the microprofiler and the data it collects, for things like analytics?

We are discussing this internally. The foundation of the MicroProfiler was developed on the assumption that it would be continuously enabled only for developers, not for all users, because of the slight overhead it introduces. Adding such an API without changing that foundation would let us ship the feature quickly, but we might not be satisfied with the result. Therefore, as a preliminary step, we first need to create an always-on, near-zero-overhead mode for the MicroProfiler. This part of the work is invisible to users, but it is very significant.

Will we be able to see all side effects of a specific event in the microprofiler?

That’s a good idea. We have considered tracking other side effects, such as how much memory a model will ultimately consume upon loading. These tasks can be labor-intensive, but I believe some of them could be implemented. We need to explore this topic further.