Roblox internal memory leak causes my game servers to undiagnosably crash every 3 hours

Reproduction Steps
I have spent the last few months trying to narrow the problem down to a script in our game, without much luck; we have rewritten about 50% of our codebase in search of this issue, to no avail.

SCI - Pathos III - Roblox - The issue occurs in our live game, where servers reach roughly 6GB before crashing, and we occasionally get the cap increased to 12GB.

Expected Behavior
Servers should either not crash, or if there is a script-caused memory leak that memory leak should be attributed to a specific script label in the developer console.

Actual Behavior
13 minutes into game server start:

Our in-game graphing for server memory and nil instance count over time

Screenshot of Untracked Memory

PlayFab statistics for server lifetime, with server memory on the Y axis and time on the X. Each line in the key is a JobId.

The obvious pattern is that, driven by some unknown factor (we have tested against player count, joins, deaths, respawns, vehicle spawns, streaming loads, weaponry, and remote traffic, and none of them correlate with the rate of increase), the server memory sharply increases to some point, then either plateaus for a while or keeps climbing until it hits the ~6.25GB limit for Roblox servers and the server crashes.

Screenshot of our server memory

How I calculate Nil Instances
local function getNilInstances()
	-- Instances the engine is still tracking minus instances actually parented under game;
	-- the difference approximates instances that are nil-parented but not yet collected
	return stats().InstanceCount - #game:GetDescendants()
end

This compares the number of descendants of game against the stats() instance count to estimate how many instances exist in memory but are no longer parented to the DataModel, i.e. nil-parented instances that have not been garbage collected.

The memory is distributed, apparently randomly, between Untracked (implying a memory leak somewhere on our end), network/megaReplicationData (which I have been unable to find any information on), internal/RCCService, network/replicator and network/raknet.
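
For anyone who wants to reproduce these numbers, the same counters the developer console shows can be polled from a server script roughly like this (a minimal sketch, not our actual telemetry; treating Untracked as total memory minus the sum of all tags is an assumption about how the console derives it):

local Stats = game:GetService("Stats")

local function logMemory()
	local total = Stats:GetTotalMemoryUsageMb()
	local tracked = 0
	-- Sum every developer memory tag the API exposes
	for _, tag in ipairs(Enum.DeveloperMemoryTag:GetEnumItems()) do
		tracked += Stats:GetMemoryUsageMbForTag(tag)
	end
	local nilInstances = Stats.InstanceCount - #game:GetDescendants()
	print(("total %.1f MB | untracked ~%.1f MB | nil instances %d"):format(total, total - tracked, nilInstances))
end

while true do
	logMemory()
	task.wait(60) -- sample once a minute
end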

Memory Profile of game:
memProfStorage44428.json (5.6 KB)
Our game has approximately 100,000 parts (it fluctuates due to building needs). That is obviously very large; however, if this were to blame I'd expect the memory to stay at a constant level rather than constantly increasing.

Issue Area: Engine
Issue Type: Crashing
Impact: Very High
Frequency: Constantly
Date First Experienced: 2021-05-01 00:05:00 (+01:00)
Date Last Experienced: 2021-09-10 00:09:00 (+01:00)
A private message is associated with this bug report

52 Likes

I would like to provide additional information

By looking through the forums I managed to solve the issue of 'nil instances' increasing over time by calling game.Debris:AddItem(Player, Time) from a connection to the PlayerRemoving event, but unfortunately this did nothing to stop the memory from steadily increasing.
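
For anyone else hitting the same thing, the workaround amounts to roughly this (a minimal sketch; the five-second lifetime is arbitrary):

local Players = game:GetService("Players")
local Debris = game:GetService("Debris")

Players.PlayerRemoving:Connect(function(player)
	-- Hand the leaving Player object to Debris so it is destroyed even if
	-- another script still holds a reference to it
	Debris:AddItem(player, 5)
end)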

The fact that your nil instance count never scales down over time, even on average, is also a bad sign.
It could indicate that your code is holding references to those Instance objects, so instances that should have been garbage collected can't be: the collector will not touch anything that is still referenced, the engine keeps tracking them, and memory grows over time.
This is pure speculation and I don't understand anything about Roblox internals, but if someone can take this idea further, that would be very much appreciated.

4 Likes

Thanks for the report! Weā€™ve filed a ticket to our internal database and weā€™ll follow up when we have an update for you.

5 Likes

Some additional information I would like to provide

I created a test place that acts as a stress test for memory: a script creates 100k parts inside Lighting every ~25 seconds and then clears Lighting a few seconds later, all in a loop. These were the results I got:
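
The loop amounts to roughly the following (a simplified sketch of the test place; the exact part properties and delays in the real place may differ):

local Lighting = game:GetService("Lighting")

while true do
	-- Create 100k parts under Lighting so they are neither rendered nor simulated
	for _ = 1, 100000 do
		local part = Instance.new("Part")
		part.Anchored = true
		part.Parent = Lighting
	end
	task.wait(5) -- let them sit for a few seconds
	Lighting:ClearAllChildren()
	task.wait(20) -- roughly 25 seconds per iteration overall
end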


Memory stats sourced from a custom-made memory tracker that is also located in the place.

As seen in the graph, the total memory almost never went down, while the untracked memory accumulated on almost every loop.

The place: BUG REPORTING - Roblox

3 Likes

Seems suspiciously similar to what's been ruining one of my games?

Memory rises initially as things are loaded up on the server, then plateaus between sudden large jumps.
There's no correlation with player count or any in-game event that I could find, but I haven't gone as in-depth as you. I just can't figure out how I could possibly be triggering sudden GB-scale spikes in server memory within the span of a minute.

4 Likes

Yep, this is pretty much the exact profile I get too. The closest thing I could tie it to is player respawns, but I haven't been able to narrow it down past that. Part of my problem was obscured by a now-fixed HumanoidDescription memory leak.

Have you tried plotting respawns against server memory?

5 Likes

It's funny you should say that, actually, because I asked @Orlando777 to try and find any clues as to what was causing the memory leak, and he said it happens when people reset. The thing is, I couldn't replicate it, and there was very little code involved in character death and loading (it was easily reviewed and ruled out), so I wrote it off as coincidence.

I'll put in some code to plot respawns and memory when I have some time and report back.
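
Something along these lines is what I have in mind (just a rough sketch; the one-minute window is arbitrary):

local Players = game:GetService("Players")
local Stats = game:GetService("Stats")

local respawns = 0

Players.PlayerAdded:Connect(function(player)
	-- CharacterAdded fires on every spawn and respawn
	player.CharacterAdded:Connect(function()
		respawns += 1
	end)
end)

while true do
	task.wait(60)
	print(("respawns/min %d | total memory %.1f MB"):format(respawns, Stats:GetTotalMemoryUsageMb()))
	respawns = 0
end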

5 Likes

I've done some long-term plotting of the various memory items against uptime for one of my places that has been most affected by this.

Max Players: 50

In all three instances, there are one or more sudden spikes in Core Memory that climb uncontrollably and for no explainable reason, increasing by several hundred MB before falling sharply back to roughly the previous level, while the total and untracked memory retain some of that amount and never fall.

Having been fortunate enough to be in the first server when the memory spike began, I was able to determine that the increase is almost solely caused by network/replicator:

It is odd that it increases so rapidly while everything else that is not core-related stays the same, instance count especially.

5 Likes

I believe the same thing has been happening to my game constantly for the past few months. I posted about it before and thought I had figured it out, but entire servers continue to crash due to random spikes of untracked memory, just as you say.

It's also incredibly hard for me to replicate.

2 Likes

Any update to this reported bug at this time?

Our games are still suffering from this unsolvable and unfixable bug

2 Likes

If you want a (temporary) fix: max out your game at 700 players and then limit it with a custom server browser. That raises your server memory cap to 12GB, which has mitigated the issue for us.
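
The rough idea, if you don't already have a server browser, is something like this (a minimal sketch; SOFT_CAP and the redirect behaviour are assumptions, and an actual server browser is more involved):

-- MaxPlayers is set to 700 in the game settings, which is what raises the memory cap;
-- this keeps the real population near the old limit by bouncing late joiners to a fresh reserved server
local Players = game:GetService("Players")
local TeleportService = game:GetService("TeleportService")

local SOFT_CAP = 50

Players.PlayerAdded:Connect(function(player)
	if #Players:GetPlayers() > SOFT_CAP then
		local accessCode = TeleportService:ReserveServer(game.PlaceId)
		TeleportService:TeleportToPrivateServer(game.PlaceId, accessCode, { player })
	end
end)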

7 Likes

Are there any updates on this? I was about to start a new thread but figured I'd add here, as our symptoms sound very similar.
We went live with our game a month ago, and we've been trying to track down the server crashes with no luck. Same thing: nothing else seems to be leaking, but Internal/Total Memory increases and has huge unexplained spikes. When Total Memory reaches 6GB the server eventually crashes.

Some servers have uptime as low as an hour, others may stay up for a whole day.

Our game is here: Wings-of-Fire-Seven-Thrones

We have custom character models (skinned mesh dragons) and a large world, but it is not a building game, so there's no accumulation of parts over time.
However, there is a constant flux of parts (fire breathing, items, etc.) that get spawned and destroyed (usually using the Debris service). It is a survival/PvP game, so characters get killed and respawned.

@unix_system - could you explain your workaround for getting the 12GB servers? Is that only for private-server games, as ours is free-to-play and public?

If anyone, or any staff, can give some guidance on what the internal memory leak/spikes may be, it would help us understand what is going on. If this is in fact a confirmed Roblox leak, any information on its status or how to work around it would be helpful too.
Thanks.


Figure 1 - Each line represents a unique JobId and is tracking the Internal Memory value as obtained with:
game:GetService("Stats"):GetMemoryUsageMbForTag(Enum.DeveloperMemoryTag.Internal)


Figure 2 - Server Total Memory over the same timeframe, from Stats:GetTotalMemoryUsageMb()
(Lines that reach the right edge are servers that are still up)


Figure 3 - Players per server over the same timeframe (the x-axis is just UTC on this one)

3 Likes

This is exactly the same profile I am experiencing. We have not had any resolution on it (but getting our server memory capped at 12GB has at least delayed the issue, so we can get a good 6-12 hours of server uptime).

Hi gigagiele. Could we please get a quick status update on this ticket? Even just knowing whether or not this is being looked at would be helpful! The crashes have been very frustrating for both the team and the user base. Any technical information you could provide would be greatly appreciated to help understand what the nature of the leak may be. I don't mind adjusting to work around it, but I feel like I'm taking shots in the dark at the moment.
Thanks

7 Likes

Adding my voice to this thread since I'm experiencing similar symptoms.
My game Dragon Blade also runs for a few hours and then crashes. I am as yet unable to pinpoint the cause, but I also see over 1GB used by "megaReplicationData" in particular, and 2GB+ of Core Memory.
My game also uses custom skinned-mesh avatars and features a very large smooth terrain (8k by 8k studs).
Any update on this would be very helpful. At the moment a server can run for 3-5 hours before crashing.

5 Likes

Did you guys ever figure out this problem? I've been reading through the forums trying to get an answer to this long-standing issue with my game and saw your posts in this thread.

4 Likes

Still waiting on an update to this myself.

3 Likes

Can confirm this is also happening to me.

4 Likes

This is happening to me as well. Untracked memory reaches 3GB+ and touchReplication is at 1GB+ for some reason.

5 Likes