Analytics: Diagnose server crashes with server memory snapshots


Hi Creators,

Today, we’re excited to launch server memory snapshots to help you identify the root cause of crashes and improve your experience’s stability. You can find the new page in Creator Dashboard for your experiences under Monitoring > Crashes.

Here’s what’s new:

Server Crashes

We added a new crash count chart to give you visibility into your server crashes. Game servers crash for two primary reasons:

  1. Out-of-Memory crashes: These are errors you can take direct action to stop. A spike in this line means servers are crashing due to high memory usage in your experience. This post explains how to identify and diagnose these root causes using the memory snapshots and provides links to more information on how to further debug and fix them in Studio.
  2. Platform crashes: These crashes are much less frequent, and unfortunately, they’re rarely directly actionable by a creator. They occur when the server enters an unrecoverable “bad” state, often due to engine-level errors. We work hard to monitor and automatically fix these errors platform-wide, but if you notice a significant uptick in these crashes, please help us out by filing a bug report so our team can investigate and resolve the issue faster.

Data is available for the most recent 7 days for games with 100+ DAU. If you see “No data for selected period”, that means you had 0 server crashes in that date range.

Server Out-of-Memory Snapshots

When a server detects it is running out of memory, the engine automatically captures a compact summary of the DataModel before the server shuts down. This summary shows where the memory is going so that you can identify and diagnose unexpectedly large memory consumers.

Example Workflow: From Crash to Fix

Here’s a quick example of how you can use the above dashboard to find and fix an acute memory spike:

1. Identify the spike: You notice Out-of-Memory crashes spiking on your server crashes chart. You filter by Place version to confirm the crashes started immediately after your recent update.

2. Select a snapshot: You scroll down to the memory snapshots and sort the list by latest timestamp or ‘Server uptime’. You select a snapshot from a server that crashed within minutes of starting up, which suggests a massive, acute memory spike during initialization.

3. Visualize the memory: The treemap shows nodes weighted by memory usage, and you can click into each node to see which services, folders, or assets dominated memory at crash time. Opening the snapshot in the treemap viewer highlights Clouds as the largest node, and clicking into Clouds shows that ParticleEmitter makes up 50% of that node. This suggests that heavy assets are being cloned into memory at startup but never properly parented or destroyed.


4. Investigate and fix: You use the viewer’s breadcrumbs to trace the context of the bloated node. Since certain assets are dynamically generated at runtime, you use this path as a starting point to locate the VFX system in Studio’s Explorer, add cleanup logic, and monitor your next update to verify the fix.

You can also download the raw CSV summary of the memory snapshots for your own analysis, including feeding the data into your favorite LLM or writing your own parsing scripts.
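
If you go the parsing-script route, here is a minimal sketch of what that could look like. It assumes a hypothetical CSV layout with `path`, `class_name`, and `bytes` columns and uses made-up sample rows; the real export's column names and structure may differ, so adjust the field names to match your download. It totals memory per node at a chosen depth and computes how much of one node a given class accounts for, mirroring the treemap drill-down described above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The real export's columns and layout may differ.
SAMPLE_CSV = """path,class_name,bytes
Workspace/Clouds/Emitter1,ParticleEmitter,300
Workspace/Clouds/Emitter2,ParticleEmitter,200
Workspace/Clouds/Mesh,MeshPart,500
Workspace/Map/Floor,Part,250
ReplicatedStorage/Assets/Sword,Model,750
"""


def memory_by_prefix(csv_text, depth):
    """Sum the bytes column for each path truncated to `depth` segments."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        prefix = "/".join(row["path"].split("/")[:depth])
        totals[prefix] += int(row["bytes"])
    return dict(totals)


def class_share(csv_text, prefix, class_name):
    """Return the fraction of memory under `prefix` used by `class_name`."""
    total = 0
    matching = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row["path"]
        if path == prefix or path.startswith(prefix + "/"):
            size = int(row["bytes"])
            total += size
            if row["class_name"] == class_name:
                matching += size
    return matching / total if total else 0.0


by_node = memory_by_prefix(SAMPLE_CSV, depth=2)
largest = max(by_node, key=by_node.get)
share = class_share(SAMPLE_CSV, largest, "ParticleEmitter")
print(f"Largest node: {largest} ({by_node[largest]} bytes)")
print(f"ParticleEmitter share of {largest}: {share:.0%}")
```

To run this against a real download, replace `SAMPLE_CSV` with the contents of your file (for example, read with `open(...).read()`) and rename the columns in the two functions to whatever the export actually uses.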

We hope these changes give you a better understanding of your server performance. Please let us know in the comments if you have any feedback or questions.


FAQs

What about client memory snapshots?

  • We hear you! We are actively exploring similar diagnostic tools for client-side memory but prioritized server memory first to help you address widespread stability and session disconnect issues.

Are platform crashes my fault or Roblox’s?

  • These are Roblox’s engine-level bugs where the server intentionally crashes to capture a report. If you see a sharp, sudden spike or a high sustained volume of runtime errors, file a bug report.

Where can I learn more about managing memory and reducing OOM crashes?

Do high server-side crashes mean high client-side crashes too?

  • Not necessarily. Server Out-of-Memory crashes happen when the game server exceeds its memory limits, disconnecting players from the session. Client crashes happen locally on a player’s device, though heavy assets can negatively impact both.
151 Likes


Insanely useful tool, you’re a lifesaver. We were tracking down a memory leak just now.

20 Likes

I just saw this pop up in my analytics! Thank you roblox! This feature has been needed for a long time.

2 Likes

Can we get better client crash analytics now? :face_holding_back_tears:

11 Likes

get this guy a truth nuke! :pleading_face: (chars chars chars)

3 Likes

Can we get the ability to see a Luau memory snapshot too? In our game, every server crash we have seems to be caused by scripts, because instances make up almost nothing in memory usage.

7 Likes

Amazing tool! Very helpful to be able to actually get a grip on how frequent these crashes are and have some tools to diagnose.

I want to echo this request. Right now the tool is incomplete without the memory being used by scripts. This could cause teams to waste a lot of effort optimizing instances when a script-related memory leak is the dominant cause.

It would also be great if we could have some mechanism to capture these data snapshots manually, obviously heavily rate limited. That way we can more easily compare what a crashed server looks like vs a server that may be halfway through its lifespan to see what specific aspect of memory is growing.

10 Likes

Amazing job to the analytics team, this is a great addition. But please add more insight into client-side crashes, as these are more important and more likely to happen. There aren’t enough tools to properly debug client-side crashes at the moment.

1 Like

I’m not a Roblox engineer, but I have worked on a game engine with memory tracking tools before. I imagine it’d be quite difficult for Roblox to track memory on the client, because tracking that memory itself takes RAM to store the data. In our case we didn’t care very much since we were only targeting 4GB+ devices, but Roblox targets low-end mobile devices with ~1.3GB of addressable memory, which can make it difficult to poll memory usage. In fact, I think I recall hearing that they actually disabled memory tracking on lower-end devices recently for precisely this reason.

Also, great addition, nice to see more analytics whenever I’ve done something wrong! :sweat_smile:

5 Likes

i thought that was noelle deltarune for a second am i insane
On topic i think this is cool


when the crash report crashes…

1 Like

You’re a lifesaver. Roblox needed this update.


I guess this is fine, but my current issue when trying to debug an engine issue for Roblox (or see if my own systems are messing up) is that the microprofiler buffer just fills up and there’s nothing to dump when this happens.

1 Like

I would really like to be able to force a snapshot manually. It’s the best memory visualizer I’ve ever seen on Roblox, and being able to trigger it manually would help a lot!

4 Likes

U give us so much data for free :smiling_face_with_three_hearts:

This is a lifesaver. Hunting down silent leaks that slowly eat up server memory until an OOM crash has always been blind trial and error. Getting an exact snapshot of the bloated nodes right at the time of the crash completely changes the debugging workflow. Great update.

1 Like

Are these the “Unknown” and “RobloxMaintenance” crashes?

Awesome update btw

1 Like

Yes this is perfect and what we needed!

1 Like

Hi HumanCat222, those look like close reasons, which are different from crashes. Generally, for crashes you won’t get any indication of the crash or a specific reason.

1 Like