Server Crash With 277 Disconnect

Servers in REV// have been crashing intermittently since the game released a few months ago. Each server crashes roughly once every 1-3 days, but because a crash inevitably loses player progress, this has proven to be a real problem.

I have checked the server code for faulty logic that could cause infinite loops or recursion, checked any remotes that could be exploitable, and added rate limiting to any remotes that may use a lot of resources. Memory leaks were my main suspicion, but reviewing the server code for possible leaks does not appear to have had any impact. There are often a few GB of untracked memory, but this doesn’t appear to be unusual, and there is no gradual server slowdown as would be expected from a memory leak.

I would ask if the server crash logs could be checked to see if there is any other possible cause for this issue, as past posts here have previously found unusual engine bugs causing the problem.


The problem comes from the Roblox servers then.

Does your game delete vehicles or assemblies that rely on physics / constraints at any time?

I’ve noticed my game SCP: Roleplay has had the same server crashing problem ever since we implemented a Blackhawk helicopter that flies to a helipad on the map and gets deleted 30 seconds later. The crash doesn’t occur every single time, but based on user reports and analytics for the event where the Blackhawk is present, we’ve determined that the server crashes always happen when the model is deleted (unsure why, though).

Not much we can do about this as this does seem to be an engine problem (if it wasn’t, then the server would crash every single time). My game has no gradual slowdown either, this all only occurs when the model is deleted.


This is sad to hear :sweat:

I would like to say this: you should strongly consider auto saving.
Only saving data when a player is removed doesn’t guarantee it will be saved. If a server crashes, the data is lost. So I would highly recommend implementing such a system, except if you’re using DS2, in which case I’m pretty sure you can’t do that.
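As a rough illustration of the autosave idea, here is a minimal sketch for a custom DataStore setup (not DS2). The store name, the 120 second interval, and the getPlayerData helper are all assumptions made up for this example; the interval in particular has to be tuned against your DataStore request budget, since saving too often gets throttled.

```lua
-- Sketch only: store name, interval, and getPlayerData are hypothetical.
local Players = game:GetService("Players")
local DataStoreService = game:GetService("DataStoreService")

local store = DataStoreService:GetDataStore("PlayerProgress") -- hypothetical store name
local AUTOSAVE_INTERVAL = 120 -- seconds; assumed value, tune to your request budget

-- Hypothetical helper: return a serializable table holding this player's progress.
local function getPlayerData(player)
	return { coins = 0 } -- placeholder data for the sketch
end

-- Save one player, catching DataStore errors so one failure doesn't stop the loop.
local function savePlayer(player)
	local ok, err = pcall(function()
		store:SetAsync("player_" .. player.UserId, getPlayerData(player))
	end)
	if not ok then
		warn("Save failed for " .. player.Name .. ": " .. tostring(err))
	end
end

-- Normal exit path: save when the player leaves.
Players.PlayerRemoving:Connect(savePlayer)

-- Clean shutdown path: save everyone still in the server.
game:BindToClose(function()
	for _, player in ipairs(Players:GetPlayers()) do
		savePlayer(player)
	end
end)

-- Periodic autosave: the only path that helps when the server hard-crashes.
while true do
	task.wait(AUTOSAVE_INTERVAL)
	for _, player in ipairs(Players:GetPlayers()) do
		savePlayer(player)
	end
end
```

With this shape, a hard crash costs at most one interval of progress per player, because neither PlayerRemoving nor BindToClose gets a chance to run when the server process dies.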

It also might be because of ‘DDoSing’ (not sure if it is that, but some exploitation of servers that leads to crashing).

To help the engineers examine your specific case, you should obtain a log file from a server that crashed, send it to the log file group, and link the PM in your post. The engineers need the IP of a server that has crashed so they can determine the cause, whether it be an engine bug, a memory leak, or unusual network activity that would indicate a DDoS attack.

I was using the latest DS2, which wasn’t saving. I switched to a custom system, but autosaving often enough to make a real difference quickly caused throttling problems.


Thanks for the report! We’ve filed a ticket to our internal database and we’ll follow up when we have an update for you.


@MetatableIndex If you can reproduce the issue and believe it’s an engine bug, please file a report!


I have the same error 277 disconnect problem in Naval Warfare. 100% of servers crash and kick everyone with a 277 disconnect around 40 minutes to 2 hours after the server is created. The game’s average play time has declined from 14 minutes to 11 minutes because of this. I’ve tried optimizing my code but nothing changes. There is no gradual increase in lag or ping prior to the disconnection. This is not caused by an exploit or by scripts from the game; I was in the game recording the log file until the error 277 disconnection. This has been going on for over a month now.

Edit: I just found out it was a memory leak problem; the memory keeps growing over time.

Funny. For the past 2 days I’ve been getting back into Plane Crazy and I can’t play for 5 minutes without getting 277. I noticed in the Dev Console that it puts a wait on something, then once it hits its limit, it kicks me. It only happens to me and so far none of my friends have experienced this.

Edit: My ping also spikes before it happens, averaging at 750 before a crash.


I’m very interested when you say you recorded the file log - did you manage to record the server memory too?

I was informed that the crash in REV is a memory issue, so it may well just be a memory leak on my end, but it doesn’t confirm whether there’s a slow leak or something causing a sudden increase.
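One way to tell a slow leak from a sudden spike is to sample server memory on a timer and compare the readings over the life of a server. The sketch below assumes a 60 second sample interval and simply prints each sample; where the samples should actually go (an analytics call, an external log) is left open, since output printed on a server that later crashes may not be easy to get back.

```lua
-- Sketch only: the interval and the choice of tags to sample are assumptions.
local Stats = game:GetService("Stats")

local SAMPLE_INTERVAL = 60 -- seconds; assumed value
local startTime = os.time()

while true do
	-- Total memory used by this server process, in MB.
	local total = Stats:GetTotalMemoryUsageMb()
	-- Per-category readings to narrow down what is growing.
	local instances = Stats:GetMemoryUsageMbForTag(Enum.DeveloperMemoryTag.Instances)
	local luaHeap = Stats:GetMemoryUsageMbForTag(Enum.DeveloperMemoryTag.LuaHeap)
	local signals = Stats:GetMemoryUsageMbForTag(Enum.DeveloperMemoryTag.Signals)

	print(string.format(
		"[mem] uptime=%ds total=%.1fMB instances=%.1fMB luaHeap=%.1fMB signals=%.1fMB",
		os.time() - startTime, total, instances, luaHeap, signals
	))

	task.wait(SAMPLE_INTERVAL)
end
```

A steady climb across samples points to a slow leak, while flat readings followed by a crash would suggest a sudden allocation instead.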


I didn’t check the server memory before making that post, but it was a memory leak on my part.


I found the code causing the memory leak that led to the error 277 disconnection in Naval Warfare, but I can’t fix it no matter what I try.

The game has a lot of AA guns being fired rapidly, and here is a minimal code example simulating that.

local Debris = game:GetService("Debris")

local bullet = Instance.new("Part")

for i = 1,30 do
	spawn(function()
		while wait(0) do
			local partCopy = bullet:Clone()
			Debris:AddItem(partCopy,0.5)
			partCopy.Touched:Connect(function()end) -- This line causes memory leak
			partCopy.Parent = workspace
		end
	end)
end

I ran this code as a server script in an empty baseplate and found that the server uses more and more memory over time and eventually crashes. If you remove the Touched:Connect line there’s no memory leak and everything works fine, but a bullet needs that Touched connection. Can someone help me fix this code to avoid the memory leak?


Memory leaks should not really slow down your game afaik, at least not much. Memory has no CPU time cost unless code is accessing or modifying it. A memory leak is just memory that isn’t being collected by the GC for some reason, so there isn’t much CPU cost, at least nothing that should be noticeable or consistent.

Most memory leaks will introduce some small cost, since the GC keeps checking whether it can collect that data, and most memory leaks happen because the data is still considered in use and so is unsafe to delete. The garbage collector only checks values periodically, I believe about once every 5-10 seconds last time I checked, so you’d really only see a small increase in frame time every 5-10 seconds, probably spread out over a second or so.

Several GB of untracked memory is not normal afaik. A typical amount of untracked memory should be under a few hundred MB at most, and that’s consistent across a few large games I have worked on.

Error code 277 refers to DisconnectConnectionLost on the ConnectionError Enum page; in other words, this would imply a hard crash on the server side. (P.S. This Enum page is surprisingly little known. It gives some really good insight into the different error codes and a lot of useful context.)

If I had to guess, your problems are likely due to a memory-related crash. I know that on some OSes (Linux’s out-of-memory killer, for example), when the system runs out of memory, programs can be silently and instantly terminated. (If you’ve ever seen an out of memory error, iirc that is actually raised by the program itself, which is different from the system running out of memory. Java, for example, is given a set amount of memory, and once that runs out it gives you an out of memory error. I believe it also refuses to allocate too much memory once the system gets to a certain point.)

@MetatableIndex

This isn’t actually true, I believe you’re misunderstanding the bug report you linked (or maybe you just worded it wrong)


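:Destroy() does disconnect all connections; you can test this easily with a BindableEvent:

local bindable = Instance.new("BindableEvent")

bindable.Event:Connect(print)
bindable:Fire("Before destroy") -- the connected print runs and outputs "Before destroy"

bindable:Destroy() -- disconnects every connection on the instance

bindable:Fire("After destroy") -- nothing is printed; the connection is gone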

The problem discussed in the linked post is that when :Destroy()ing an object from the server, only the ancestry change replicates (so it’s like setting Parent = nil), meaning the object isn’t cleaned up on the client. This is actually a super interesting issue that I hadn’t thought about.

Additionally, after thoroughly investigating @l_11I’s code, I determined it’s probably not a memory leak caused by the Touched connection not being disconnected; rather, it’s a memory leak due to the use of Debris (a direct consequence of the thread scheduler being completely overloaded). If you remove the Touched connection the memory leak definitely still happens; it’s just about 60-70x slower, because with the connection an extra instance (a TouchTransmitter) is created, and the connection itself, the physics data, and the extra data the TouchTransmitter adds for replication are a lot, relatively speaking. You are creating about 450 parts and doing 450 schedules per second. That is a lot of schedules per second.

The following code still produces the memory leak:

local Debris = game:GetService("Debris")

local bullet = Instance.new("Part")

for i = 1,30 do
	spawn(function()
		while wait(0) do
			local partCopy = bullet:Clone()
			Debris:AddItem(partCopy,0)
			--partCopy.Touched:Connect(function()end) -- This line causes memory leak
			partCopy.Parent = workspace
			--partCopy:Destroy()
			--print(#workspace:GetChildren())
		end
	end)
end

The following code does not:

local Debris = game:GetService("Debris")

local bullet = Instance.new("Part")

for i = 1,30 do
	spawn(function()
		while wait(0) do
			local partCopy = bullet:Clone()
			--Debris:AddItem(partCopy,0)
			partCopy.Touched:Connect(function()end) -- Touched connection kept; no leak in this version
			partCopy.Parent = workspace
			partCopy:Destroy()
			--print(#workspace:GetChildren())
		end
	end)
end

And finally, this code does not either. (I use coroutines because, unlike spawn, they do not rely on the thread scheduler; spawn would make it extremely slow, and the thread scheduler appears to be the underlying issue.)

local Debris = game:GetService("Debris")

local bullet = Instance.new("Part")

for i = 1,30 do
	spawn(function()
		while wait(0) do
			local partCopy = bullet:Clone()
			--Debris:AddItem(partCopy,0)
			coroutine.resume(coroutine.create(function()
				wait()
				partCopy:Destroy()
			end))
			partCopy.Touched:Connect(function()end) -- Touched connection kept; no leak in this version
			partCopy.Parent = workspace
			--partCopy:Destroy()
			--print(#workspace:GetChildren())
		end
	end)
end

If you try the prior snippet with spawn instead of the coroutine, you will notice the memory leak reappear. (You also have to remove the wait() call to match the speed of the Debris example, because spawn has a minimum delay, which is another reason not to use it or other thread scheduler based things.)

Additionally, wait relies on the thread scheduler. I went into detail on why using wait for non-delay related things is bad in another post somewhere, but I’ll summarize it here. Basically, wait is a lot more lightweight than spawn or, in this case, Debris, and it’s synchronous with your script, so it’s not as bad. But wait can still throttle, and using it for non-delay related code (e.g. an infinite loop) can cause gradually worsening lag and memory problems as it slowly throttles more and more.

You should only use wait (and thread scheduler things) for what they’re meant for: delays. You can use wait in quite a few loops before you start seeing any problems, and it’s about 30x less impactful per second you’re waiting for when used in a loop, but it’s not great to do that, and with too many loops you’ll hit a point where your game gets inexplicably laggy and memory intensive on the server (or client).

Instead of while wait() do, use one of the RunService events, for example RunService.Heartbeat:Wait(). Despite being a loop that’s twice as fast, its impact is significantly less than that of wait when used in bulk.

That goes for spawn too.

The complexity of thread scheduler use scales up very quickly, unlike using Heartbeat. Additionally, you’ll even see that your game is smoother when not using wait(), because Heartbeat happens before data is sent from the server to the client; it’s kind of like RenderStepped, except for data going to the client.
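To make the Heartbeat suggestion concrete for the bullet case, here is a rough sketch that replaces the 30 spawn/wait loops and the per-bullet Debris calls with a single Heartbeat connection and a queue of expiry times. The BULLETS_PER_FRAME value and the queue layout are my own assumptions for illustration, and I have not measured this exact version, so treat it as a starting point rather than a confirmed fix.

```lua
-- Sketch only: BULLETS_PER_FRAME and the queue layout are assumptions for illustration.
local RunService = game:GetService("RunService")

local BULLET_LIFETIME = 0.5 -- seconds, same lifetime as the Debris example
local BULLETS_PER_FRAME = 15 -- assumed; roughly matches 30 loops running at ~30 Hz

local bullet = Instance.new("Part")

-- FIFO queue of live bullets. Every bullet has the same lifetime,
-- so the oldest entry always expires first.
local queue = {}
local head, tail = 1, 0

local function fireBullet()
	local partCopy = bullet:Clone()
	partCopy.Touched:Connect(function() end) -- Destroy() disconnects this later
	partCopy.Parent = workspace
	tail = tail + 1
	queue[tail] = { part = partCopy, expiresAt = os.clock() + BULLET_LIFETIME }
end

-- One connection handles both spawning and cleanup; nothing is scheduled per bullet.
RunService.Heartbeat:Connect(function()
	for _ = 1, BULLETS_PER_FRAME do
		fireBullet()
	end

	-- Destroy every bullet whose lifetime has elapsed, oldest first.
	local now = os.clock()
	while head <= tail and queue[head].expiresAt <= now do
		queue[head].part:Destroy()
		queue[head] = nil
		head = head + 1
	end
end)
```

The design choice is that cleanup cost is paid once per frame in a plain loop, instead of handing hundreds of individual delays per second to the thread scheduler.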

Lastly, here’s a list of things that use the thread scheduler and should be avoided in bulk (using these is not a problem; using them a lot is):
wait - Not too impactful since it’s fairly lightweight; it only pertains to the thread it was called from. Can cause really bad throttling issues in bulk (hundreds of wait calls happening per second).
spawn & delay - Very impactful. Both are basically equivalent. Each creates an extra thread, and for some reason is just ridiculously slow. They have a huge impact on throttling, probably the worst.
Debris - I’m not really sure what the impact of this is, but based on my testing it seems to be roughly 0.7x as impactful as spawn and delay (based on memory usage). It likely just doesn’t create threads internally, and if it doesn’t, it probably skips some unnecessary work related to Lua contexts or something.

Certain things like wait, spawn, and delay (and probably Debris since it appears to behave very close to delay) will throttle, meaning usage of them creates “fake” lag.

Additional information:
Task Scheduler (roblox.com)


This happens in my game, where 18 NASCARs fall off the map at once (and are therefore deleted), over and over, every 5-8 minutes.

The 277 disconnected message is getting annoying and leading to less playtime.

Maybe it is connected to deleting vehicles. Possibly a memory leak.

The group Pinewood Builders, which I moderate for, has had these crashes too, and they’re getting quite frequent. We believe it is a DDoS, as the server ping goes to 50k+, then the chat filter fails, and after a while the server crashes. We’ve found no other cause for this problem, but it always seems to happen when we host an event, so someone could be targeting game servers. Hopefully this can be fixed as soon as possible, as it’s getting quite annoying.


It’s happening in my group’s game as well. It’s been going on for possibly 2 years now. Roblox has still not put out a solution to the DDoS attacks on their servers.