My game Apocalypse Rising 2 (Apocalypse Rising 2 - Roblox) has seen a massive increase in client crash reports in the last 24 hours. We have not updated our game since the 6th of November, and we believe a Roblox change is causing the issue.
Our developer stats confirm the “can only play for about 15 minutes” user stories we’re receiving from our players:
Our game has historically had very high memory usage and has been prone to memory leaks in the past that caused crashing. Usually these crashes don't affect our players until the ~1 hour mark.
Expected Behavior
We expect players not to crash within 15 minutes of pressing Play.
Actual Behavior
Players are crashing within ~15 minutes of playing due to out-of-memory issues. Users who have been able to record their sessions report “Out of memory” errors in the client log and a number of visual symptoms of running out of memory (invisible meshes, textures not loading, game features breaking).
Workaround
No known workaround, we cannot diagnose the issue ourselves.
Issue Area: Engine
Issue Type: Crashing
Impact: Critical
Frequency: Constantly
Date First Experienced: 2022-11-09 00:11:00 (-05:00)
Date Last Experienced: 2022-11-10 00:11:00 (-05:00)
At ~4:50PM EST we set EnableDynamicHeads and LoadCharacterLayeredClothing from Default to Disabled to see whether the crashing persists. Our game does not use either feature, and we don’t have any reason to assume they’re related to the crashing. I will update this post if we see any improvement.
Update @ 8:17PM EST: no improvement after setting EnableDynamicHeads and LoadCharacterLayeredClothing to Disabled.
We will be taking steps to reduce memory usage over the next 24 hours. We’re not sure how this will affect this issue on the Roblox side of it. The version of the game that is crashing can be privately made available upon request.
I work for a game called Pembroke Pines and we have had the exact same issue with a random increase in crash rates. We have been forced to lower the player count per server until we fix the issue.
When should we start seeing improvements? After ~19 hours we’re still getting a lot of crash reports and our average play time hasn’t recovered at all.
We’ll be testing the changes we’ve made to the game since posting this bug report to confirm if it’s an issue on our end or not - I’ll update this post once we have results.
Update: We tested the build that was live when this bug report was made and we still get crashes after the ~13 minute mark. Running the same test with our updated production build gives us similar results.
It doesn’t appear that the rollback was effective in solving the issue for us. If anything it’s getting worse:
Our investigation is still in progress - I hope to have narrowed down the specific change that caused the regression shortly. The memory/instance/gui and memory/instance/object counters are the culprits here.
We found the root cause, but it will require a patch to fix. What I’ve noticed is that the fewer players there are in a server, the longer it takes to crash. I’ll see if we can find a better strategy to use.
We have narrowed down the cause: the crash increase is caused by a memory leak that makes the client run out of memory. The leak happens when an Instance is destroyed from an event handler attached to one of its own Events.
We have found a temporary workaround: attaching and immediately disconnecting a dummy event handler from the same Event prior to destroying the Instance will prevent the leak.
Here is a minimal reproduction of the issue, along with a demo of the workaround: Event leak demo - Roblox
Here’s how to use it:
Launch the place using either the PC or Mac client.
Open the Micro Profiler: Roblox button → Settings → Micro Profiler → On
In the Micro Profiler menu in the top left, hover over “Mode” and click “Counters”.
Expand/watch the counter under memory/instance/gui. (You can view a graph by left-clicking twice on the right side of the “gui” row.)
Click the “Leak 500” button to reproduce the problem. You’ll notice that the counter increases and does not decrease even after waiting a bit.
Click the “Leak 500 w/ workaround” button to test the same code with the workaround inserted. Notice that the memory counter temporarily increases, but eventually decreases after waiting a bit.
Here’s what the Micro Profiler should look like after pressing the “Leak 500” button a few times:
This place is copy-enabled, so you can open it in Studio to see what’s going on. Here’s the code that’s used for the “workaround” version:
local function leak2()
    local testButton = Instance.new("TextButton")

    -- These 500 connections are never called; they're just here to make the leak worse.
    for _ = 1, 500 do
        testButton.Changed:Connect(function()
            print("Not printed")
        end)
    end

    -- This is where the leak actually happens.
    testButton.Changed:Connect(function()
        print("Destroying testButton")

        -- Begin workaround
        -- Note that the Event used here (Changed) is the same as the one we're subscribed to.
        local c = testButton.Changed:Connect(function()
            assert(false, "Not reached")
        end)
        c:Disconnect()
        -- End workaround

        testButton:Destroy() -- Without the above workaround, this would leak all event handlers.
        print("Button destroyed")
    end)

    -- Trigger testButton.Changed
    testButton.Text = "Delete me"
end
We are working on an engine change to resolve this leak, but expect it to take a while longer to finish.
While investigating this, we also found this to be the most efficient set of steps to reproduce this issue in your Apocalypse Rising 2 experience, @LMH_Hutch:
Join an AR2 instance with at least 25 players. (Fewer players work too, but it seems to worsen with more players.)
Open the Micro Profiler counters view and expand memory/instance/gui (so you can see when the issue is occurring).
Hit “Join”.
Run around in the world for a couple seconds.
After completing these steps, you can watch the memory/instance/gui graph continue to grow until the client eventually runs out of memory and crashes. Here’s what the graph looks like standing idle for a while:
This issue also reproduces in Studio, so you can use Studio to test implementing the workaround.
We will be patching the engine to resolve this leak, but we expect the patch to take a while to be released. This workaround is only intended to be temporary and can be removed once the engine is patched.
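Since the workaround is temporary, it may help to package it in a small helper so every call site can be found and removed once the engine patch ships. This is only a sketch - the `destroyFromHandler` name and shape are my own, not part of any official API:

```lua
-- Hypothetical helper (not an official API): applies the dummy
-- connect/disconnect workaround before destroying an Instance.
-- `event` must be the same Event whose handler is calling this.
local function destroyFromHandler(instance, event)
    local connection = event:Connect(function() end)
    connection:Disconnect()
    instance:Destroy()
end

-- Usage, from inside a handler connected to button.MouseButton1Click:
-- destroyFromHandler(button, button.MouseButton1Click)
```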
This issue occurs with all object types and all events: it happens whenever an event handler destroys the object whose event it is connected to.
The event you connect/disconnect from in the workaround should be the same event that you’re about to destroy the object from. You don’t need to connect/disconnect all events on the object: just the one that you’re about to destroy it from. For example:
local button = script.Parent -- assume it's a TextButton

button.Changed:Connect(function()
    print("Button changed") -- doesn't destroy the button; no workaround needed
end)

button.MouseButton1Click:Connect(function()
    print("Clicked")
    button:Destroy() -- this handler destroys the object it's connected to, so it needs the workaround
end)
And in this case, since the MouseButton1Click Event is the one calling Destroy(), the workaround would look like this:
-- Edited version of the above event handler to apply the workaround.
button.MouseButton1Click:Connect(function()
    print("Clicked")

    -- Connect to (and immediately disconnect from) the same event as this handler.
    local connection = button.MouseButton1Click:Connect(function()
        -- This function is never called; its body doesn't matter.
        assert(false, "Not reached")
    end)
    connection:Disconnect()

    button:Destroy()
end)
You can even make the workaround a one-liner if you want:
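The one-liner itself did not survive in this copy of the post; presumably it collapses the connect-and-disconnect into a single statement, along these lines:

```lua
-- Connect a no-op handler to the same Event and disconnect it immediately,
-- all in one statement, right before calling Destroy().
button.MouseButton1Click:Connect(function() end):Disconnect()
button:Destroy()
```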