Server OOM events cause nil Instance references

We’ve investigated reports of in-game features, such as drinking water and fighting other players, suddenly not working in our game. After much debugging, we found a correlation between server memory usage, server uptime, and whether and how the garbage collector is reclaiming memory.

In two separate examples, the server lost references to a Model and to a RemoteEvent, respectively, that are crucial to these features. In both cases the server was over 6 days old and its memory usage was reported as 5.5GB (out of 6.25GB). Based on the analytics provided by Roblox in our dashboard, we can see that our server memory consistently reaches the 6.25GB cap after a server has been alive for around a week. We are left to assume that in these scenarios, servers are releasing memory to prevent an all-out crash.

The way we’ve tested this is:

  1. Isolate the system(s) that players are having problems with.
  2. Connect to the remote in question (it still exists in the DOM, the server has just lost the reference to it) and print any contact from clients via the developer console.
  3. Insert a LocalScript uploaded as a website Model into our PlayerGui from the developer console and contact that remote, sending the lost Instance in the RemoteEvent and attempting to print it on the server.
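
Steps 2 and 3 can be sketched roughly as below. This is a minimal illustration rather than our actual code: the names `DrinkWater` and `WaterBottle` and their locations (ReplicatedStorage and Workspace) are hypothetical placeholders, and the `describeContact` helper exists only to make a nil argument obvious in the console output.

```lua
-- Sketch only. "DrinkWater" and "WaterBottle" are hypothetical placeholder names,
-- and their locations (ReplicatedStorage, Workspace) are assumptions.

-- Formats one contact so that a nil argument stands out in the console.
local function describeContact(playerName, received)
	if received == nil then
		return playerName .. " fired the remote; argument arrived as nil"
	end
	return playerName .. " fired the remote; argument = " .. tostring(received)
end

-- `game` only exists inside Roblox; the guard keeps the helper usable elsewhere.
if game then
	local remote = game:GetService("ReplicatedStorage"):FindFirstChild("DrinkWater")
	if not remote then
		warn("DrinkWater remote not found")
	elseif game:GetService("RunService"):IsServer() then
		-- Step 2: listen on the server and print any contact from clients.
		remote.OnServerEvent:Connect(function(player, received)
			print(describeContact(player.Name, received))
		end)
	else
		-- Step 3: from a LocalScript, send the Instance the server appears to have lost.
		local model = workspace:FindFirstChild("WaterBottle")
		print("client sees:", model)
		remote:FireServer(model)
	end
end
```

The same snippet is meant to be run once from the server side of the developer console and once from the inserted LocalScript; comparing the client's print with the server's print shows whether the Instance survived the trip.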

In the scenario where the Model was observed as lost but the remote was intact and still firing, the model reference prints as nil on the server, even though the Model is identical to the client’s copy and was never streamed out. It is worth noting again that this game system worked fine for 6 days of uptime until the server started exhibiting this behavior.

In the scenario where the RemoteEvent was observed as lost, the client would fire the RemoteEvent and the server would not receive any contact, as if the RBXScriptSignal connection had been disconnected.

This issue occurs in:

Expected behavior

I understand the complexity of the memory management and recycling that the garbage collector has to do. I am hoping for heuristics to be developed that help the engine determine which Instance references to keep and which to drop when garbage collection has to be done to prevent a server from crashing. At the very least, it should not lose references to remotes or models.

A private message is associated with this bug report

Thanks for the report! Just to confirm, we have a ticket filed and we’ll follow up when we have an update for you.

Use the tools described in “Introducing the Luau VM memory exploration tools” to help you find memory leaks on the server (try in Studio/private RCC first!).

When the server is running low on memory, there may be a period before it crashes outright in which many kinds of operations fail because there is not enough memory to perform them.