Server network/raknet Memory Leak?

Okay, I need to monitor the state of memory for a while so that I can give an accurate result. I’ll try to answer on Saturday or Sunday.

5 Likes

I think the problem has not been solved, because the memory is still increasing. I recorded the memory state over a period of time and got this list:

06/01/2023 7:15 AM - Server Runtime: 4.09 hrs, raknet memory: 623 MB, Players: 104
06/01/2023 3:19 PM - Server Runtime: 11.9 hrs, raknet memory: 905 MB, Players: 110
06/01/2023 10:52 PM - Server Runtime: 18.8 hrs, raknet memory: 1133 MB, Players: 105
06/02/2023 1:15 AM - Server Runtime: 20.8 hrs, raknet memory: 1206 MB, Players: 110
06/02/2023 6:46 AM - Server Runtime: 25.5 hrs, raknet memory: 1420 MB, Players: 41
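For context, a minimal periodic logger along these lines can produce this kind of snapshot automatically (just a sketch, not the exact setup used for the numbers above; as far as I know the network/raknet category is not exposed to scripts, so this only records total server memory):

local Players = game:GetService("Players")
local Stats = game:GetService("Stats")

task.spawn(function()
   while true do
      -- uptime in hours, total server memory in MB, current player count
      print(("%s | uptime: %.1f h | total memory: %d MB | players: %d"):format(
         os.date("%x %X"),
         time() / 3600,
         math.floor(Stats:GetTotalMemoryUsageMb()),
         #Players:GetPlayers()
      ))
      task.wait(600) -- log every 10 minutes
   end
end)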

Newer servers experience the same problem.

2 Likes

Thanks for providing that. Sorry the patch did not fix the issue for you; we will keep investigating and I will let you know when I have another update!

3 Likes

@pozzzy333 Still investigating. Any idea when you first noticed this issue? And does this issue show up in any other memory categories, or just CoreMemory/network/raknet?

Thank you for your patience!

2 Likes

I noticed this recently, but it seems to me that the problem has been around for a long time: I have seen performance problems for quite a while but could not determine their cause. I also have problems with UntrackedMemory. I can’t say exactly how it is related to raknet, but UntrackedMemory can reach 4-5 GB, and the server will crash after 50-60 hours of runtime.

3 Likes

Hi PlumJar,

Thank you for the insight here - my game (link here) has been experiencing a very similar increase (under network/megaReplicationData, network/raknet, and network/replicationCoalescing, as well as UntrackedMemory) for the last two years. Servers were crashing at 6.5 GB of memory after an hour of playtime. Our solution, after a few months of trying to debug it, was to increase the server player cap to >600, which gave us 12 GB of memory as breathing room.

Here’s the bug report filed at the time: Roblox internal memory leak causes my game servers to undiagnosably crash every 3 hours - Bug Reports / Engine Bugs - Developer Forum | Roblox. Considering the pattern here, this appears to be related to that issue.

I have noticed, however, that this memory leak doesn’t seem to be passive - instead, it looks to be actively caused by some event in the game. I’ve tried narrowing it down to a particular action but haven’t had any luck. @pozzzy333, have you noticed the same? (The rate of increase isn’t constant.)

4 Likes

Hi @unix_system, thank you for sharing your report. Sorry that the previous time you reported this issue, the report went stale without resolution. The more information I can get, the better - I will keep you all updated.

7 Likes

Yes, the growth rate is not constant, and the server may crash at different times; I noticed this when I started getting different memory stats after the same amount of runtime. Most likely it depends on what the players are doing - perhaps some players perform a certain action more often at some times than at others - and most likely these are RemoteEvents and RemoteFunctions. My game uses a large number of RemoteEvents, and each player sends and receives them at different times and frequencies. Perhaps requests often synchronize, with the maximum number of players sending requests at the same moment and creating a mini DDoS; dynamic systems can fall into sync for a certain period of time. This is typical of servers with a large number of players. (This is just my guess and may not be accurate.)
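One rough way to test this guess (purely illustrative - the RemoteEvent name below is hypothetical) is to count how many requests a given remote receives per window on the server and look for bursts that line up with the memory jumps:

local ReplicatedStorage = game:GetService("ReplicatedStorage")
local remote = ReplicatedStorage:WaitForChild("SomeRemoteEvent") -- hypothetical name

local count = 0
remote.OnServerEvent:Connect(function()
   count += 1
end)

task.spawn(function()
   while true do
      task.wait(10)
      print(("[remote traffic] %d requests in the last 10 s"):format(count))
      count = 0
   end
end)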

3 Likes

Just wanted to post a quick update before the long weekend. I have been unable to repro using some test places I prepared which exercise things like remote events/functions, but of course these test places are less sophisticated than your live games. This, along with the reports that the growth rate is not constant, leads me to agree that the leak may be related to some specific behavior. Next week I will be monitoring memory stack traces to hopefully see exactly where this memory is going.

Thanks for your patience!

3 Likes

We’re also noticing a similar issue with raknet and untracked memory growing to high levels (although not as pronounced) in our game with 65-player servers:


2 Likes

I’ve also noticed this in my game, which has 100-player servers, for at least a couple of months now. network/raknet is consistently over 1 GB on older servers.

Here is a screenshot from a server that is 22 hours old:


2 Likes

I just looked at my old notes, where I monitored this issue in November 2021. Most likely I encountered it even earlier, but I don’t have much information about that; it was a long time ago.

1 Like

Just out of curiosity, does your game use any networking/replication libraries?

1 Like

No, I use the basic Roblox functions: FireServer, InvokeServer, FireClient, and FireAllClients.

3 Likes

I think my game is also impacted by this issue. Although I’m not fully able to confirm that this is a network/raknet issue, following the recent Performance tab upgrade to the Creator Dashboard I was able to see that, during some testing sessions on my game, the server memory suddenly jumps to 6.25 GB before crashing. All of this is unexpected: there is no particular action performed by the players, it doesn’t seem to happen during a specific event (like map loading), and I remember noting in the past that raknet memory was unusually high at some point in the live game.

I’m glad to see that our issues with high server memory are finally being addressed, especially since it doesn’t feel like anything within our scope. But I also feel that the hard cap of 6.25 GB is very limiting, especially as games get more complex, with more maps trying to fully utilize PBR textures and Future is Bright; with every texture, part, and mesh being cached and never fully freed, memory usage ends up very high even though only one map is loaded at a time.

Roblox should start to consider increasing that memory cap, or periodically and partially freeing cached instances (which are, by the way, categorized as UntrackedMemory, making it harder for us to tell memory leaks apart from regular engine caching). I personally feel that my project will soon reach the server limits in terms of assets and that I won’t be able to add any more maps.

5 Likes

Thanks for your guys’ patience. Just wanted to share that there are some patches in the works and more investigations are ongoing.

8 Likes

Thank you so much!

For some extra information, we’ve observed that the raknet memory is directly correlated with an unexplained increase in PhysicsParts, Signals, and other place memory statistics, particularly when players join. We don’t create any instances other than the standard character model when a player joins, and the raknet increase occurs around the same time.

Unfortunately I’m unable to narrow it down further, as the increase is only clearly identifiable (rather than just incremental) when many players join at the same time (e.g. when a new server starts, or when an old server closes and its players mass-migrate to a new one). This makes me think there is some overhead from character models when someone spawns that should be removed from memory but, for whatever reason, isn’t, and is being replicated.

As a side pointer, our game makes heavy use of HumanoidDescription. I’m unsure if this is correlated, and somewhat doubtful, as successive respawns of already-joined players don’t seem to show an abnormal increase in memory the way player joins do.
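In case it helps the investigation, here is roughly how those categories could be snapshotted around joins (a sketch only; it assumes the Stats service’s GetMemoryUsageMbForTag API and that the PhysicsParts and Signals tags correspond to the console categories mentioned above):

local Players = game:GetService("Players")
local Stats = game:GetService("Stats")

local TAGS = {
   Enum.DeveloperMemoryTag.PhysicsParts,
   Enum.DeveloperMemoryTag.Signals,
}

Players.PlayerAdded:Connect(function(player)
   task.delay(5, function() -- wait a few seconds so the character has replicated
      local readings = {}
      for _, tag in ipairs(TAGS) do
         table.insert(readings, ("%s=%d MB"):format(tag.Name, math.floor(Stats:GetMemoryUsageMbForTag(tag))))
      end
      print(("[mem] %s joined | total=%d MB | %s"):format(
         player.Name,
         math.floor(Stats:GetTotalMemoryUsageMb()),
         table.concat(readings, ", ")
      ))
   end)
end)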

Hope this helps!

6 Likes

Since my code had an artificial delay in processing requests, the memory accumulated more slowly. I removed this delay, so now all requests are sent instantly, and because of that the servers began to crash even more often.

3 GB in 4 hours of runtime with 110 players is definitely something new.


In 4 hours of runtime, network/raknet went up by only 480 MB.


I can’t say for sure why UntrackedMemory rises so much; maybe it’s a mistake in my code, or maybe a mistake in the Roblox code. Most likely I am using all of the functions correctly - they are just used very often and very quickly, due to the large number of players.

Hopefully PlumJar’s investigation will figure out what’s wrong. :smiley:

4 Likes

Hey @PozziDev, thanks for sharing. Can you please elaborate on this artificial delay when processing requests? Are these requests RemoteEvents or RemoteFunctions initiated on the Client and answered on the server? What kind of delay did you introduce? Any other details you can share about the kind of work that is being done (processing)?

1 Like

I have a custom FireClients() function that calls FireClient() for every client listed in a table. It looks something like this:

function FireClients(clients, event, ...)
   for _, client in pairs(clients) do
      event:FireClient(client, ...)
   end
end

I did this because I needed to send the request only to certain players, rather than to everyone as FireAllClients() does. I can assume that frequent use of FireClient() in a loop may leak UntrackedMemory and network/raknet. Due to how FireClient() works internally, it is possible that some variables are initialized inside it and are not cleaned up properly; this is inside the engine and I cannot be sure, but it would mean the engine has to initialize that internal state again and again while FireClient() is called in a for loop.


The artificial delay is the wait() function in the for loop. It looks something like this:

function FireClients(clients, event, ...)
   for _, client in pairs(clients) do
      event:FireClient(client, ...)
      wait() -- artificial delay
   end
end

With the delay in place, UntrackedMemory grows more slowly and server uptime increases. I can’t say exactly how long a delay has to be to effectively stop this problem, but there is clearly a difference, and it depends on the delay.
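If a wait() after every single call slows the broadcast down too much, one possible middle ground (just a sketch, not a confirmed fix) is to yield once per small batch instead of after every FireClient():

local BATCH_SIZE = 10 -- tune for your game

function FireClientsBatched(clients, event, ...)
   for i, client in ipairs(clients) do
      event:FireClient(client, ...)
      if i % BATCH_SIZE == 0 then
         task.wait() -- spread the remaining sends over later frames
      end
   end
end

The idea is the same as the delay above, but the whole broadcast no longer takes one frame per player.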


Perhaps this could be solved with an engine function such as FireCertainClients() for RemoteEvent and RemoteFunction, as is done in TeleportService, but most likely I need to create a post in the feature-requests section for that. Either way, it seems to me that the problem is related to FireClient() and how frequently it is called.

3 Likes