Server stops 'working' after ~2 min - no error message or anything on client

Since around yesterday, my place has been consistently breaking after about 2 minutes of being up.

The symptoms are as follows:

  1. Physics stops moving
  2. RemoteEvents/RemoteFunctions stop responding (or whatever they respond with never reaches the client)
  3. Player GUIs disappear
  4. Player instances disappear (stuff inside the Player object, like some NumberValues I have)
  5. Cannot see server log or script stats (the pages are empty)
  6. The client receives no error message of any kind. I can still turn the camera, but cannot move or do anything else, really.

Those don’t happen all at once, but over the course of a few seconds (I first notice something subtle, like not being able to deselect a tool in my custom tool system, then the other stuff happens).

It’s this place (friends only :>):

(That’s the lobby, you have to join a server through it)

The place works fine with Start Server + Start Player in Studio.
The only changes I recall making are adding another 2 sound effects, changing a small table from a regular one to a weak table, and some basic reordering of script flow. None of these should suddenly cause a mysterious server death after 2 minutes, and only in online mode.

The server is reserved using ReserveServer, then I teleport the player there.

Can you make a minimal reproduction place? It’s pretty difficult to debug something like this in a complex place.

I’ll try disabling features to see if it starts working.

Honestly, that’s probably your best bet if there’s no obvious error coming out. I’ll let you know if I think of anything in particular that might cause this.

Last time I had this happen, it was because I had a GUI that made copies of a few image frames and I never had it delete them.

I didn’t notice because they slid off screen.

It took my game a bit longer to freeze, but basically the same thing happened.

If that were the case, I would expect to see a rise in the number of instances over time, which is not happening.

I looked at the server stats before it goes down, but could not see a sudden rise or anything else suspicious. I also print Lua memory use along with the size of some of my tables, and those are fine as well (I might expand the coverage and rate of logging there if nothing else works).
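For reference, a minimal sketch of the kind of logging described above, written in plain Lua. The names `countEntries` and `formatStats` are made up for illustration, and `fishData` here is just a sample table, not the actual one from the place.

```lua
-- Count every key in a table (array part and hash part alike).
local function countEntries(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

-- Build one log line: Lua heap size plus the entry count of each tracked table.
-- `tables` maps a display name to the table being watched.
local function formatStats(tables)
    local names = {}
    for name in pairs(tables) do
        names[#names + 1] = name
    end
    table.sort(names) -- stable order so log lines are easy to compare

    local kb = collectgarbage("count") -- Lua memory in kilobytes
    local parts = { string.format("lua memory: %.1f KB", kb) }
    for _, name in ipairs(names) do
        parts[#parts + 1] = string.format("%s: %d entries", name, countEntries(tables[name]))
    end
    return table.concat(parts, " | ")
end

-- Sample data standing in for a real table.
local fishData = { a = {}, b = {}, c = {} }
print(formatStats({ fishData = fishData }))
```

Calling something like this on a timer and comparing lines shortly before the freeze would show whether any tracked table is growing.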

I put it in a separate place without the lobby, and NOW it gives me a proper red “You have been disconnected” message. Could that be indicative of something? (Or at least a bug, since it doesn’t show up for the actual place?)
@0xBAADF00D

Can confirm: being kicked in a place that is not the start place doesn’t show the Shutdown/Disconnected GUI, but when kicked in the start place, the Shutdown message is shown.

So what is happening for you is that you are being disconnected, but since the place is not the start place, there is no indication that you lost connection.

Ok I think I narrowed it down to a single event @0xBAADF00D

I spawn some fish that then swim around, and once they reach a certain distance from origin, I parent the fish to nil and remove its data from a table:

fishData[fish] = nil  -- drop the fish's entry from the data table
fish.Parent = nil     -- remove the fish from the game hierarchy

The place freezes when the first fish goes far enough and is removed (actually it managed to print that two fish are removed, so there is at least a tiny delay).

Now, as I said, I had changed a table (the one that has stuff, including the fish, as keys) to a weak table. It seems that changing it back to a regular table fixes the issue.

So it’s something about the weak table causing issues when the only remaining reference to an instance is a key in the weak table (which should result in that key being removed/collected automatically, if I understood how weak tables work).
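To make the expected weak-table behaviour concrete, here is a minimal sketch in plain Lua. It assumes a plain table can stand in for the fish instance, and it forces a collection with `collectgarbage("collect")`, which is a standard Lua call and may not be available in the same form on Roblox. `countEntries` is a made-up helper.

```lua
-- Weak-keyed table: a key stored here does not keep its object alive.
local fishData = setmetatable({}, { __mode = "k" })

-- Count the entries currently in a table.
local function countEntries(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

-- A plain table stands in for the fish Instance (assumption for illustration).
local fish = { name = "fish1" }
fishData[fish] = { speed = 4, heading = 90 }
local before = countEntries(fishData)

fish = nil                 -- drop the last strong reference to the key
collectgarbage("collect")  -- force a full garbage collection cycle
local after = countEntries(fishData)

print(before, after)
```

The entry only disappears once the collector actually runs, which would fit with a delay between removing the fish and anything going wrong.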

Because of the delay in removing fish and the place freezing, it could be that it happens when Lua does its garbage collection. Or it could just happen next frame from the removal of fish.

EDIT:
Do weak tables work if the value contains references to the child instances of the key? Or does such a cycle prevent it from being collected? That’s not the case here (it only happens in specific conditions), but I might need to reorganize things a bit otherwise.

Also, I have another weak table with part keys and that hasn’t had any issues…
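One way to check the cycle question is a small experiment like the sketch below, in plain Lua. It models the situation with a value that holds a direct reference back to its own key, which is an assumption standing in for a reference to a child instance. As far as I know the outcome depends on the Lua version: with Lua 5.1 style weak tables the value is a strong reference, so it keeps the key alive and the entry stays, while Lua 5.2 and later treat weak-keyed tables as ephemeron tables and can collect such entries. `countEntries` and `survivorsWithCycle` are made-up names.

```lua
-- Count the entries currently in a table.
local function countEntries(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

-- Returns how many entries survive a full collection when the value
-- holds a strong reference back to its own key.
local function survivorsWithCycle()
    local weak = setmetatable({}, { __mode = "k" })
    local key = {}
    weak[key] = { owner = key } -- value -> key reference forms the cycle
    key = nil                   -- the weak table is now the only route to the key
    collectgarbage("collect")
    return countEntries(weak)
end

local remaining = survivorsWithCycle()
print("entries left after collection:", remaining)
```

If this prints 1, the cycle is keeping the entry alive and the table will need manual cleanup (or values that do not point back at their key).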

The fish have a BodyVelocity and a BodyGyro, they’re UnionParts (box collision fidelity), and they have a bunch of decals on them, if that changes anything.

Just a follow-up:
Within the past 1-2 days I added another weak table, and that crashed/froze the server too.
But today, with both weak tables enabled, or only the original one that caused issues, I’m no longer experiencing the bug.
EDIT: NVM, it just takes a few minutes longer. The server still breaks.

Looking at what separated the problematic weak tables from the one that worked fine, the problematic ones had map-style tables as values, while the working one had a simple array. Also the problematic ones had function-type values in their table, and those functions referenced ‘captured’ variables outside the function itself (idk what the Lua terminology is).
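Variables captured by a function from an enclosing scope are called upvalues in Lua. If a function stored in the value captures the key as an upvalue, the value refers back to the key, which is the same kind of cycle as asked about earlier. A possible restructuring, sketched below in plain Lua, passes the key in as an argument instead of capturing it. This is only an illustration under assumptions: `swim`, `addFish` and the fields are made up, a plain table stands in for the fish, and whether this has anything to do with the server freeze is not established.

```lua
-- Count the entries currently in a table.
local function countEntries(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

local fishData = setmetatable({}, { __mode = "k" })

-- Shared behaviour: the fish and its data arrive as arguments, so this
-- function has no upvalue pointing at any particular fish.
local function swim(fish, data, dt)
    data.distance = data.distance + data.speed * dt
    return data.distance
end

-- Map-style value with a function field, but no reference back to the key.
local function addFish(fish)
    fishData[fish] = { speed = 4, distance = 0, update = swim }
end

local fish = { name = "fish1" } -- stand-in for the fish Instance
addFish(fish)
local travelled = fishData[fish].update(fish, fishData[fish], 0.5)

fish = nil                 -- drop the last strong reference to the key
collectgarbage("collect")  -- force a full garbage collection cycle
local remaining = countEntries(fishData)

print(travelled, remaining)
```

Because nothing in the stored value points back at the key, the entry can be collected regardless of how the Lua version treats cycles through weak-keyed tables.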

Was there a recent update/bugfix that might’ve solved this?