Roblox servers being unresponsive to all clients ingame

The issue is either that the Roblox client stops receiving network updates from the Roblox server, or that some new, undetected exploit script is throttling the server’s ability to send network updates to all the clients in-game.

This makes the game look ‘frozen’ for all the clients. Players who are walking keep playing their walking animation while staying still, all the chats are frozen, and players who try to join the server are stuck on the ‘Joining game’ page forever.

This has been happening a lot on servers with max players set over 100 since it’s intended for everyone to be in a single server anyway.

I’ve seen this issue happen in the following games:
Main Campus - Roblox
🧚Brooklyn, New York - Roblox
Rae's real name is Sahir Taliaferro - Roblox
BMS | Main Campus - Roblox

The server itself, however, still seems perfectly fine: I can request a microprofile in the middle of this lag and it comes back looking normal. The Roblox server even still sends information through HttpService for logging with no issue. I’m thinking the server isn’t able to receive my request to even start a microprofile until after the lag goes away.
Edit: Apparently the server doesn’t receive the request to start recording a microprofile until after the ‘lag’ stops. So I’ve just been spamming the ‘start recording’ button; my latest post shows some unusual tasks making a long frame?

log_E957C_microprofile_20200517-165828.html (451.9 KB)
log_E957C_microprofile_20200517-165917.html (450.1 KB)

This issue always seems to happen out of the blue: the server performs just fine until about 30 minutes in, when a lot of people are in-game. The issue does go away with my ‘hacky patch’, where all the clients are set to ping the server every 2-3 seconds and, if the server hasn’t received a ping for 6 seconds, it starts kicking the most recently joined player. The server always came back to life after it kicked 5-12 people, which leads me to conclude that this is an undetected exploit.

I can’t reproduce this myself as I’m still looking into this issue and don’t know what is causing the Roblox server to become ‘unresponsive’ network wise to all clients (Chats, Character movement) but yet still be responsive for everything else.

This is one video that I’ve finally managed to take. This server had been working perfectly fine for 15 minutes.
I’ll start recording more videos on other games if I’m able to catch them being lagged out also.

External Media

The current ‘hacky patch’, which is what convinces me that this is an exploit, has all the clients ping the server through a RemoteEvent every 2-5 seconds. If the median of all the in-game players’ pings goes over 7 seconds on the server side, I kick the last player that joined the server, check the ping again, and repeat until the ping comes back down, which it always seems to do.
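
The decision logic of the patch described above could be sketched roughly as follows. This is not the actual script from the affected games: the helper names `median`, `serverLooksFrozen`, and `latestJoined` are hypothetical, and the code is plain Lua covering only the median check and the choice of who to kick. The Roblox wiring (recording ping times from the RemoteEvent, join times from `Players.PlayerAdded`, and calling `player:Kick`) is assumed and left out.

```lua
-- Hypothetical helper names; only the logic of the patch is sketched here.
local PING_LIMIT_SECONDS = 7 -- threshold taken from the description above

-- Returns the median of a list of numbers, or nil for an empty list.
local function median(values)
	local sorted = {}
	for i, v in ipairs(values) do
		sorted[i] = v
	end
	table.sort(sorted)
	local n = #sorted
	if n == 0 then
		return nil
	end
	local mid = math.floor(n / 2)
	if n % 2 == 1 then
		return sorted[mid + 1]
	end
	return (sorted[mid] + sorted[mid + 1]) / 2
end

-- secondsSincePing: map of userId -> seconds since that player's last ping.
-- Returns true when the median client has been silent for too long.
local function serverLooksFrozen(secondsSincePing)
	local ages = {}
	for _, age in pairs(secondsSincePing) do
		ages[#ages + 1] = age
	end
	local m = median(ages)
	return m ~= nil and m > PING_LIMIT_SECONDS
end

-- joinTimes: map of userId -> join timestamp.
-- Returns the userId that joined most recently, or nil if nobody is in-game.
local function latestJoined(joinTimes)
	local newestId, newestTime = nil, -math.huge
	for userId, joinedAt in pairs(joinTimes) do
		if joinedAt > newestTime then
			newestId, newestTime = userId, joinedAt
		end
	end
	return newestId
end
```

Using the median rather than the maximum means a single player with a bad connection does not trigger any kicks; only a freeze that affects most clients does.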

Same issue for me, except it’s Simulation lag, which your profiler shows is very high as well, like mine. Your profiler: image. This started a week or two ago after an update.

I’m still having this bug, though. Roblox needs to fix this.

It still happens when I join games.

Valley Community High School, as well as other groups that have large servers such as Park East Middle School and Desoto High School which I have personally seen this issue with, are all experiencing this issue during our sessions. Our player cap is currently 120 players, however we have also had the issue when the cap was as low as 50.

Specifically, the issue we are having is that when the ping skyrockets, each local player can see themselves moving and jumping around with no issue but cannot see anybody else moving. They also cannot see any chats they type or anybody else’s chat messages until the spike is over. The local player also still receives content from the server in some cases, such as our Bell script, which notifies players when the next class has begun. When these ping spikes occur, they last for about 3-8 minutes at a time, and then the server recovers instantly with a normal ping. Then, a few minutes later, it happens again.

To test to see if the issue has been due to one of our own scripts or systems, we tried running a session with NO scripts or local scripts other than essential modules such as the chat. During this test, we experienced the same spikes that we have been seeing in our normal sessions.

We believe that this issue is due to either exploiters or a Roblox engine issue.

https://www.roblox.com/games/457996978/Valley-Community-High-School

This is also happening on low player servers such as

https://www.roblox.com/games/185655149/Welcome-to-Bloxburg-BETA?refPageId=07731ff3-1d4b-4c29-9124-4508feb31072

So I had to wait for a few hours and it let me in.

I already listed two of those affected games in the OP; you just repeated them. Please read my entire post before posting, thanks.

I read your entire post. Just confirming it with what I have seen and adding new information so that this issue is resolved, thanks.

As you saw in the video, when this happens the updates appear to be getting “buffered” (like the chat messages) and then everything flushes through all at once.

The most likely cause for this is when the server has an abnormally long frame (hundreds of milliseconds), which can have a variety of causes including exploits. The next time this happens, can you try opening up the explorer to the folder where microprofiles are stored, so that you can manually estimate how long it takes to get the microprofiles after you request them? If the server is having these very slow frames, you might not get the microprofiles back until after everything “snaps” back in to place.

Speaking from experience, the last time this happened in which I recorded microprofiles, the page did not confirm that the microprofiler was taken, meaning the “Start” button was locked for minutes and the profile was not added to my logs folder until after the server recovered.

Here is the microprofile that was added to my folder after the server had recovered from the lag spike.
log_F2DA9_microprofile_20200525-165432.html (405.2 KB)

Your game has a ‘Spread’ script so I probably would consider removing that.
Be sure to check through any models you use!

Thank you for communicating with us about this issue. To give you more information on the question you asked, I have recorded a video during one of our lag spikes.

In addition to my other response, here is a summary of this video, in which I recorded two microprofiles. The first one, which was requested at approximately 4:49 PM CST, was added to my logs folder at 4:54 PM CST, which was when the server began to recover. The second microprofile was requested at 4:54 PM CST and was added to my folder at 4:55 PM CST, which was the exact moment that the server returned to normal. You can see this occurring in the video below.

And here are the two microprofiles that it recorded.
log_ACD3E_microprofile_20200527-165443.html (385.2 KB)
log_ACD3E_microprofile_20200527-165522.html (374.2 KB)

About a week or two ago Roblox made some update that has basically caused a lot of lag spikes I hadn’t seen before either. The most common thing I see with people having the issue is that Simulation, Run Job and Heartbeat are usually the biggest bars. I have yet to hear of a fix or any investigation being done.

Here is my latest dump of multiple micro-profiles of the server.

All I notice is a spike in Replicator SendData and MegaJobs when this issue surfaces. I’ll keep spamming the microprofile ‘start’ button if this issue happens again.

log_F14EC_microprofile_20200603-164002.html (372.6 KB)
log_F14EC_microprofile_20200603-164406.html (829.4 KB)
log_F14EC_microprofile_20200603-164352.html (389.6 KB)
log_F14EC_microprofile_20200603-164213.html (388.3 KB)
log_F14EC_microprofile_20200603-164209.html (392.1 KB)
log_F14EC_microprofile_20200603-164139.html (393.0 KB)
log_F14EC_microprofile_20200603-164127.html (403.6 KB)
log_F14EC_microprofile_20200603-164121.html (406.4 KB)
log_F14EC_microprofile_20200603-164116.html (358.1 KB)

Here is another video clip showing exactly what’s been happening.

https://streamable.com/tpj52p

Edit:
Here are some more microprofile dumps on another large server frame after I changed the ‘Frames per second’ etc.

log_F14EC_microprofile_20200603-165332.html (731.6 KB)
log_F14EC_microprofile_20200603-165654.html (774.7 KB)
log_F14EC_microprofile_20200603-165612.html (788.0 KB)
log_F14EC_microprofile_20200603-165606.html (822.8 KB)
log_F14EC_microprofile_20200603-165556.html (738.3 KB)
log_F14EC_microprofile_20200603-165526.html (696.9 KB)
log_F14EC_microprofile_20200603-165420.html (684.0 KB)

I’m noticing quite frequent chunks of these taking a long frame.

Is this something to worry about? I do notice TimeScript will need optimizations and will work on that ASAP.

Here are a few server MicroProfiles that I think may need a look.

log_27891_microprofile_20200604-172040.html (3.3 MB)
log_27891_microprofile_20200604-171940.html (4.7 MB)
log_27891_microprofile_20200604-171932.html (4.5 MB)
log_27891_microprofile_20200604-171903.html (4.3 MB)

ServerJoinSnapshot?
log_27891_microprofile_20200604-171817.html (3.4 MB)
log_27891_microprofile_20200604-171759.html (4.2 MB)
log_27891_microprofile_20200604-161958.html (1.7 MB)

PhysicsSteppedSpike?
log_27891_microprofile_20200604-171656.html (7.4 MB)

Server with lots of PhysicsStepped and goes back to ‘healthy server’?
log_27891_microprofile_20200604-165332.html (5.1 MB)

‘Healthy server’ with PhysicsStepped starting again?
log_27891_microprofile_20200604-165339.html (3.7 MB)

Lots of PhysicsStepped and DisconnectCleanup?
log_27891_microprofile_20200604-165405.html (5.9 MB)

I’m going to try one fix that I believe would resolve this kind of ‘Physics’ abuse and report back if that solves my issue.


Disconnect Cleanup / Write Marshalled long frame

Some more MicroProfiles from today…

log_E7A5F_microprofile_20200608-175718.html (1.7 MB)

Another Video of the issue, server has been frozen for the past 3 minutes.
heads up, discord sound
https://streamable.com/ra55fh

Will add more later today, life got suddenly busy.

Not sure if this is related, but coincidentally at the same time, I’m getting a lot of HTTP 503 (Service Unavailable) errors causing servers in my games to come to a halt.

It started occurring about a day ago, and at random intervals. I have done nothing that would’ve caused this on my end, but the game is consistently struggling with broken servers due to failed HTTP requests. It just stops working.

Is there anything you could suggest to try to resolve this issue, or anything for us to test? Are any of my microprofiles showing anything that would let the engineering team suggest something for us to toggle? We’ve been having this issue for the past few weeks and it seems we’re running out of solutions.

Logging the use of Click Detectors for click spamming isn’t working.
Logging the use of the Touched event isn’t working.
Making another Click Detector as a honeypot isn’t working.
Making another Touched event as a honeypot isn’t working.
Calling SetNetworkOwner(nil) on all unanchored/non-character parts so that they are handled by the server only isn’t working.
Lowering MaximumMessageLength in ChatSettings and kicking players that go over this lowered limit isn’t working.
Disabling public admin commands from being used by non-staff users isn’t working.
Making sure that there are 0 unanchored parts in the game except for characters isn’t working.
Renaming the ChatService ‘SayMessageRequest’ event and leaving a honeypot under the old name, to throw off exploits that call FireServer on the event by name, isn’t working.

I’m currently going to add server-side detection to kick players who say :lag all, since apparently some people are saying that and the server always seems to stop responding for a bit afterwards. That should help show whether it’s some LocalScript admin commands added by an exploit.

Detecting when clients haven’t pinged the server for over 30 seconds used to work but doesn’t anymore, so I need another method of detecting whether the server is lagging, because I can’t read Avg. Ping from the developer console in a server script, as it’s CoreScript-only?

It’s getting tiring that Roblox doesn’t provide an option in the Developer Console to automatically start a microprofile if avg. ping goes high. I’m having to spam-click the ‘start recording’ button because the request to start a server microprofile HAS to arrive BEFORE this ‘exploiter/lagger’ ‘freezes’ the server; otherwise the microprofile results are just ‘normal’, because the server only gets my request AFTER the lag has stopped.
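
One possible server-side alternative to reading Avg. Ping, assuming the freezes coincide with the abnormally long server frames suggested earlier in the thread, is to watch frame delta times directly. The sketch below is plain Lua; `newFrameWatch` and the 0.2-second threshold are hypothetical, and the Roblox wiring in the trailing comment (feeding it the delta time that `RunService.Heartbeat` passes to its listeners) is an assumption rather than a tested fix.

```lua
local LONG_FRAME_SECONDS = 0.2 -- assumed threshold ("hundreds of milliseconds")

-- Creates a tracker that counts frames at or above the threshold
-- and remembers the longest frame seen so far.
local function newFrameWatch(threshold)
	local watch = { longFrames = 0, worst = 0 }

	-- Call once per frame with that frame's delta time in seconds.
	-- Returns true if the frame counts as a long frame.
	function watch.step(dt)
		if dt > watch.worst then
			watch.worst = dt
		end
		if dt >= threshold then
			watch.longFrames = watch.longFrames + 1
			return true
		end
		return false
	end

	return watch
end

-- Assumed Roblox wiring (only runs inside Roblox, so left as a comment):
-- local watch = newFrameWatch(LONG_FRAME_SECONDS)
-- game:GetService("RunService").Heartbeat:Connect(function(dt)
-- 	if watch.step(dt) then
-- 		warn(("Long server frame: %.0f ms"):format(dt * 1000))
-- 	end
-- end)
```

If the server microprofiles really are normal during the freeze, this tracker would stay quiet too, which would itself point away from long server frames as the cause.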

RemoteEvents are sometimes targeted by exploiters, so adding logs for remote events (especially remote events related to chat) may provide useful clues.

Also, if you can retain the game server ip & game instance id from a session where there was a lot of lag, I can check the internal server logs to see if there are any other clues. The information will be in a client log that looks like this:

1591719232.50393,7670,6 ! Joining game '77bc3959-c1bd-4b5b-8009-f78aa071e57e' place 606849621 at 128.116.54.198
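
To pull the game instance id, place id, and server IP out of a line like the one above, a small Lua pattern is enough. This is a minimal sketch: `parseJoinLine` is a hypothetical helper, and it assumes the log uses straight single quotes around the instance id, as in the sample line.

```lua
-- Extracts the game instance id, place id, and server IP from a client log line.
-- Returns nil if the line is not a "Joining game" entry.
local function parseJoinLine(line)
	local instanceId, placeId, ip =
		line:match("Joining game '([%x%-]+)' place (%d+) at ([%d%.]+)")
	if not instanceId then
		return nil
	end
	return { instanceId = instanceId, placeId = tonumber(placeId), ip = ip }
end

local sample = "1591719232.50393,7670,6 ! Joining game "
	.. "'77bc3959-c1bd-4b5b-8009-f78aa071e57e' place 606849621 at 128.116.54.198"
local info = parseJoinLine(sample)
print(info.instanceId, info.placeId, info.ip)
```

Lines copied through a forum editor may have had their quotes converted to curly quotes, in which case the pattern will not match until they are changed back.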

Will edit this reply to include other days this has happened once I get onto my other computer, but for now, this is one that I do have on my current computer.
[Jun 15] Again today…

1592260852.65039,4f10,10 RakPeer has distributed 894 packets to plugins since last debug time
1592260852.65039,4f10,10 PacketReturnQueue is empty, no work to do

We’re getting a LOT of

1592260852.65039,4f10,10 RakPeer has distributed 894 packets to plugins since last debug time
1592260852.65039,4f10,10 PacketReturnQueue is empty, no work to do

in our clientlogs…

[Jun 10] Just today, lagged again.

1591825071.25897,1f7c,6 ! Joining game '775b9260-07a7-4395-9056-c8eb835c439f' place 2698066019 at 128.116.42.70

Other days

1591223427.53896,2178,6 ! Joining game '7c516762-0873-4afa-a681-5ca99d9de10d' place 2698066019 at 128.116.54.78

1591392902.27006,19d4,6 ! Joining game '7b5eea3b-e821-4b99-bfb7-c336390c7553' place 2698066019 at 128.116.32.75

1591220395.36002,17ac,6 ! Joining game '19d9a285-a007-42ae-8e8d-7b5769b81535' place 2698066019 at 128.116.43.156

I created this script to monitor the default chat’s remote events.
It was running during a laggy session and did not reveal any abuse of the chat remotes.

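-- Counts how often each player fires the default chat remotes and kicks
-- anyone who already has MAX_FIRES_PER_SECOND fires within the last second.
local Players = game:GetService("Players")
local ReplicatedStorage = game:GetService("ReplicatedStorage")

local MAX_FIRES_PER_SECOND = 6

-- Fires per UserId during the last second.
local numFires = {}

local function track(player)
	numFires[player.UserId] = 0
end

Players.PlayerAdded:Connect(track)
-- Cover players who joined before this script started running.
for _, player in ipairs(Players:GetPlayers()) do
	track(player)
end

-- Drop the counter when a player leaves so the table does not keep growing.
Players.PlayerRemoving:Connect(function(player)
	numFires[player.UserId] = nil
end)

local folder = ReplicatedStorage:WaitForChild("DefaultChatSystemChatEvents")

-- Adds one fire to the player's counter and removes it again a second later.
local function count(userId)
	numFires[userId] = numFires[userId] + 1
	wait(1)
	-- The player may have left (or rejoined) during the wait.
	local current = numFires[userId]
	if current and current > 0 then
		numFires[userId] = current - 1
	end
end

local function watch(remote)
	if not remote:IsA("RemoteEvent") then
		return
	end
	remote.OnServerEvent:Connect(function(player)
		local fires = numFires[player.UserId]
		if fires == nil then
			return
		end
		if fires >= MAX_FIRES_PER_SECOND then
			player:Kick("Kicked for chat spam!")
			print(player.Name .. " was kicked for chat spam!")
			return
		end
		count(player.UserId)
	end)
end

-- Watch the remotes that exist now and any that are added later, instead of
-- waiting for exactly 14 children (which hangs if the count ever differs).
for _, child in ipairs(folder:GetChildren()) do
	watch(child)
end
folder.ChildAdded:Connect(watch)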