Speaking from experience, the last time I recorded microprofiles during one of these spikes, the page did not confirm that the microprofile had been taken: the “Start” button stayed locked for minutes, and the profile was not added to my logs folder until after the server recovered.
Thank you for communicating with us about this issue. To give you more information on the question you asked, I have recorded a video during one of our lag spikes.
In addition to my other response, here is a summary of the video, in which I recorded two microprofiles. The first, requested at approximately 4:49 PM CST, was added to my logs folder at 4:54 PM CST, when the server began to recover. The second was requested at 4:54 PM CST and added to my folder at 4:55 PM CST, the exact moment the server returned to normal. You can see this occurring in the video below.
About a week or two ago, Roblox made some update that has caused a lot of lag spikes I haven’t seen before either. The most common thing I see among people having the issue is that Simulation, Run Job and Heartbeat are usually the biggest bars. I have yet to hear of a fix or any investigation.
Here is my latest dump of multiple micro-profiles of the server.
All I notice is a spike in Replicator SendData and MegaJobs when this issue surfaces. I’ll keep spamming the microprofiler ‘Start’ button if this issue happens again.
Not sure if this is related, but coincidentally at the same time, I’m getting a lot of HTTP 503 (Service Unavailable) errors causing servers in my games to come to a halt.
It started occurring about a day ago, and at random intervals. I have done nothing that would’ve caused this on my end, but the game is consistently struggling with broken servers due to failed HTTP requests. It just stops working.
Is there anything you could suggest to try to resolve this issue, or anything for us to test? Are any of my microprofiles showing anything useful to the engineering team, or anything we could toggle? We’ve been having this issue for the past few weeks and it seems we’re running out of solutions.
Logging the use of Click Detectors for click spamming isn’t working.
Logging the use of the Touched event isn’t working.
Making another Click Detector as a honeypot isn’t working.
Making another Touched event as a honeypot isn’t working.
Making all unanchored/non-character parts SetNetworkOwner to nil so they are handled by the server only isn’t working.
Lowering MaximumMessageLength in ChatSettings and kicking players that go over this lowered limit isn’t working.
Disabling public admin commands from being used by non-staff users isn’t working.
Making sure that there are 0 unanchored parts in the game except for characters isn’t working.
Renaming the ChatService ‘SayMessageRequest’ event (with a honeypot under the old name) to throw off exploits that call FireServer on an event by name isn’t working.
I’m currently adding a server-side detection to kick players who say :lag all. Apparently some people are saying that, and the server always seems to stop responding for a bit afterwards, so this should help tell whether it’s some LocalScript admin commands added by an exploit.
It’s getting tiring that Roblox doesn’t provide an option in the Developer Console to automatically start a microprofile when average ping goes high. I’m having to spam-click the ‘start recording’ button, because the request to start a server microprofile HAS to arrive BEFORE this ‘exploiter/lagger’ ‘freezes’ the server. Otherwise the microprofile results are just ‘normal’, because the server only receives my request AFTER the lag has stopped.
RemoteEvents are sometimes targeted by exploiters; adding logs for remote events (especially remote events related to chat) may provide useful clues.
Also, if you can retain the game server IP and game instance ID from a session where there was a lot of lag, I can check the internal server logs to see if there are any other clues. The information will be in a client log line that looks like this:
1591719232.50393,7670,6 ! Joining game '77bc3959-c1bd-4b5b-8009-f78aa071e57e' place 606849621 at 128.116.54.198
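If you have many client logs to go through, pulling these fields out by hand gets tedious. Below is a minimal sketch in plain Lua that extracts the instance ID, place ID and server IP from a line like the one above. The function name `parseJoinLine` and the returned field names are made up for illustration, and the pattern only assumes the layout seen in the sample line, so treat it as a starting point rather than a description of the official log format.

```lua
-- Parse a client log "Joining game" line into its parts.
-- Returns nil if the line does not match the assumed layout.
local function parseJoinLine(line)
    -- Forum software often rewrites ' into typographic quotes,
    -- so normalize those back before matching.
    local normalized = line:gsub("‘", "'"):gsub("’", "'")
    local timestamp, jobId, placeId, ip = normalized:match(
        "^(%d+%.%d+),%x+,%d+ ! Joining game '([%x%-]+)' place (%d+) at ([%d%.]+)"
    )
    if not timestamp then
        return nil
    end
    return {
        timestamp = tonumber(timestamp), -- Unix time in seconds
        jobId = jobId,                   -- game instance ID
        placeId = tonumber(placeId),
        ip = ip,                         -- game server IP
    }
end

-- Usage: feed it one log line at a time and keep the matches.
local sample = "1591719232.50393,7670,6 ! Joining game "
    .. "'77bc3959-c1bd-4b5b-8009-f78aa071e57e' place 606849621 at 128.116.54.198"
local info = parseJoinLine(sample)
if info then
    print(info.jobId, info.placeId, info.ip)
end
```

Lines that are not join lines (for example the RakPeer debug lines) simply return nil, so the same function can be run over a whole log file.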
Will edit this reply to include other days this has happened once I get onto my other computer, but for now, this is one that I do have on my current computer.
[Jun 15] Again today…
1592260852.65039,4f10,10 RakPeer has distributed 894 packets to plugins since last debug time
1592260852.65039,4f10,10 PacketReturnQueue is empty, no work to do
We’re getting a LOT of
1592260852.65039,4f10,10 RakPeer has distributed 894 packets to plugins since last debug time
1592260852.65039,4f10,10 PacketReturnQueue is empty, no work to do
in our clientlogs…
[Jun 10] Just today, lagged again.
1591825071.25897,1f7c,6 ! Joining game '775b9260-07a7-4395-9056-c8eb835c439f' place 2698066019 at 128.116.42.70
Other days
1591223427.53896,2178,6 ! Joining game '7c516762-0873-4afa-a681-5ca99d9de10d' place 2698066019 at 128.116.54.78
1591392902.27006,19d4,6 ! Joining game '7b5eea3b-e821-4b99-bfb7-c336390c7553' place 2698066019 at 128.116.32.75
1591220395.36002,17ac,6 ! Joining game '19d9a285-a007-42ae-8e8d-7b5769b81535' place 2698066019 at 128.116.43.156
I created this script to monitor the default chat’s remote events.
It was running during a laggy session and did not reveal any abuse of the chat remotes.
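local Players = game:GetService("Players")
local ReplicatedStorage = game:GetService("ReplicatedStorage")

-- Chat remote fires per player (keyed by UserId) within the last second.
local numFires = {}

local function track(player)
    numFires[player.UserId] = 0
end
Players.PlayerAdded:Connect(track)
-- Cover players who joined before this script started running.
for _, player in ipairs(Players:GetPlayers()) do
    track(player)
end
Players.PlayerRemoving:Connect(function(player)
    numFires[player.UserId] = nil
end)

local folder = ReplicatedStorage:WaitForChild("DefaultChatSystemChatEvents")

-- Count one fire, then forget it a second later.
local function count(userId)
    numFires[userId] = numFires[userId] + 1
    wait(1)
    -- The player may have left during the wait.
    if numFires[userId] then
        numFires[userId] = numFires[userId] - 1
    end
end

local function watch(remote)
    if not remote:IsA("RemoteEvent") then
        return
    end
    remote.OnServerEvent:Connect(function(player)
        local fires = numFires[player.UserId]
        if fires == nil then
            return
        end
        if fires >= 6 then
            player:Kick("Kicked for chat spam!")
            print(player.Name .. " was kicked for chat spam!")
            return
        end
        count(player.UserId)
    end)
end

-- Hook the remotes already present and any added later. The original polled
-- until the folder had exactly 14 children, which hangs on any other count.
for _, child in ipairs(folder:GetChildren()) do
    watch(child)
end
folder.ChildAdded:Connect(watch)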
I’m unable to tell, since I don’t think there is any Lua access on a Roblox client to check whether isLimitedByCongestionControl is even true.
If you truly want to get rid of exploiters using DoS through your game (application layer), there is no way around auditing your entire server code for things that can be spammed and are expensive to run.
Look out for:
Backdoors (anything requiring a module you don’t know about)
OnServerEvent connections
OnServerInvoke connections
Touched events
InvokeClient occurrences (never do this, delete them)
ClickDetector events
GuiButton mouse events which are connected on the server (yes that works)
Scripts that interact with Instance changes inside characters or player backpacks
Scripts that interact with Humanoid properties and events including animations
Scripts that interact with Accessories and Tools which are children of workspace
Sound playback if RespectFilteringEnabled is disabled
Add debounces, usage trackers and/or rewrite badly performing code. It’s important to note that “spamming” a signal is not necessary to cause a DoS, as it is possible to send malformed data which can lead to very long/infinite loops or similar.
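As a rough illustration of the “usage tracker” idea, here is a minimal sliding-window limiter in plain Lua. The `RateLimiter` name, its methods and the injectable `clock` parameter are all assumptions made for this sketch, not part of any Roblox API; it only shows one way to cap how often a given player can trigger an expensive handler.

```lua
-- Sliding-window usage tracker, keyed by any player identifier.
local RateLimiter = {}
RateLimiter.__index = RateLimiter

-- maxCalls: calls allowed per window.
-- window:   window length in seconds.
-- clock:    function returning the current time in seconds. Defaults to
--           os.clock, which is CPU time in plain Lua, so pass your own
--           clock if you need wall time there.
function RateLimiter.new(maxCalls, window, clock)
    return setmetatable({
        maxCalls = maxCalls,
        window = window,
        clock = clock or os.clock,
        calls = {}, -- key -> array of timestamps of accepted calls
    }, RateLimiter)
end

-- Returns true if the call is allowed, false if the key is over budget.
function RateLimiter:allow(key)
    local now = self.clock()
    local kept = {}
    -- Keep only the timestamps that are still inside the window.
    for _, t in ipairs(self.calls[key] or {}) do
        if now - t < self.window then
            kept[#kept + 1] = t
        end
    end
    local allowed = #kept < self.maxCalls
    if allowed then
        kept[#kept + 1] = now
    end
    self.calls[key] = kept
    return allowed
end

-- Forget a key, for example when the player leaves.
function RateLimiter:forget(key)
    self.calls[key] = nil
end
```

In a server script you would create one limiter per remote or per feature, call `limiter:allow(player.UserId)` as the first thing inside the OnServerEvent or Touched handler, and return early when it is false. To cover the malformed-data case as well, check argument types (for instance with `type(arg) == "string"`) and cap lengths before doing any work with them.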
The server microprofiler shows a completely healthy server in the HUNDREDS of profiles I’ve recorded, other than random occasions of big SendData and Disconnect frames. The server has also stayed entirely healthy by another measure: it pings a Discord webhook, timed with tick(), every 60 seconds sharp.
I’m betting this is a RakNet bug, or some kind of RakNet issue being abused, that’s causing this insane queueing.
Our lag has been happening over… and over… and over… It’s starting to be a daily occurrence and there’s nothing we can do because we don’t have any RakNet logs. We can’t tell who’s sending the most RakNet data to the Roblox server because we simply can’t get that data…
Here’s yet another video, showing that everyone is complaining about lag: the server freezing like crazy, character positions not normal, chat being chunky.
https://streamable.com/7okxeq
Apparently Roblox Screen Recorder doesn’t record the Network Connection Health stats on CTRL+SHIFT+F4/F6 so I’ll have to record it again next time.
I’ve completely modified ChatService to not rely on Plr:Chatted by removing the C++ legacy fire event. I’ve completely renamed a few of the ChatService remote events so this isn’t a chat spam or command issue.
I’ve checked the game to have 0 unanchored parts. I’ve checked and made a script to force all non-character parts to SetNetworkOwner nil so they are server only. I’ve checked Server Scripts in the DevConsole and clicked on Rate (/s) to sort and see what’s firing the most, and it’s nothing abnormal. I’ve reduced my use of RemoteEvents even though the Server Logs don’t say anything about Remotes being fired too much by a player. I’ve even swapped between using only Adonis or only Basic Admin Essentials; neither makes a difference.
I’ve exhausted all my efforts, and I’m blaming this lag issue on a RakNet buffer queue.
I really wish I could slap [ROBL CRITICAL] on this issue because this is happening to multiple games still.
We can’t do anything about it because I have a strong suspicion that this is either some DoS or RakNet exploit. The Roblox server is just fine and isn’t even having a long frame over 10 seconds but every client ingame is just frozen for upwards of 10 minutes, completely unable to play the game…
If any Roblox staff would like information about the packets I captured from the game host IP, please DM me within 10 hours of this post; otherwise I’ll just dump the zip files of all the captured RakNet packets here.
This issue is very negatively affecting my game, making it almost unplayable due to these exploits. Please fix this as soon as possible. Exploiters are able to shut down servers in my game through this and it seems there’s nothing I can do.
Having exhausted every effort, I can finally conclude that this is someone using a DoS attack or RakNet exploit against the game servers that host the games we’re playing on.
The only fixes for this are to wait for Roblox engineers to add DoS prevention measures like Cloudflare Spectrum or to find a way to work around this unwanted traffic jam, or for game developers to work on split-server cross gameplay: a method of splitting players between multiple Roblox servers and using either MessagingService or HttpService with your own server to relay data between all the other servers, so that each one can show all the characters and game data.
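To make the split-server idea a bit more concrete, here is a minimal sketch in plain Lua of the bookkeeping each server would need: it accepts player snapshots relayed from other servers and drops servers that stop reporting. The `Roster` name, the message shape (`serverId`, `players`) and the `staleAfter` parameter are all hypothetical choices for this example. The transport is deliberately left out, so nothing here depends on how the relay is implemented.

```lua
-- Roster of players reported by other servers, keyed by server ID.
local Roster = {}
Roster.__index = Roster

-- localServerId: this server's own ID (used to ignore its own echoes).
-- staleAfter:    seconds after which a silent server is dropped.
function Roster.new(localServerId, staleAfter)
    return setmetatable({
        localServerId = localServerId,
        staleAfter = staleAfter,
        servers = {}, -- serverId -> { players = {...}, seenAt = number }
    }, Roster)
end

-- Build the message this server would broadcast (hypothetical format).
function Roster:snapshot(players)
    return { serverId = self.localServerId, players = players }
end

-- Record a snapshot received from the relay. Malformed messages and
-- our own echoes are rejected. Returns true if the snapshot was stored.
function Roster:receive(message, now)
    if type(message) ~= "table" then return false end
    if type(message.serverId) ~= "string" then return false end
    if type(message.players) ~= "table" then return false end
    if message.serverId == self.localServerId then return false end
    self.servers[message.serverId] = { players = message.players, seenAt = now }
    return true
end

-- Return the players of all servers heard from recently, pruning stale ones.
function Roster:remotePlayers(now)
    local result = {}
    for serverId, entry in pairs(self.servers) do
        if now - entry.seenAt > self.staleAfter then
            self.servers[serverId] = nil
        else
            for _, p in ipairs(entry.players) do
                result[#result + 1] = p
            end
        end
    end
    return result
end
```

On Roblox, one way to wire this up would be to use `game.JobId` as the server ID, send `roster:snapshot(...)` with `MessagingService:PublishAsync`, and call `roster:receive(message.Data, os.time())` from the `MessagingService:SubscribeAsync` callback. MessagingService has message size and rate limits, so only compact state is realistic to relay this way.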
We’ve now been having this issue for over 40 days.
Here is the full list of game job IDs I have from sessions where the game lagged, for Roblox staff to look through.
1591825071.25897,1f7c,6 ! Joining game '775b9260-07a7-4395-9056-c8eb835c439f' place 2698066019 at 128.116.42.70
1592254411.40699,4f0c,6 ! Joining game '7b8c3158-4946-4e90-a8b5-2103c9a3a508' place 2698066019 at 128.116.4.103
1592254411.40699,4f0c,6 ! Joining game '7b8c3158-4946-4e90-a8b5-2103c9a3a508' place 2698066019 at 128.116.4.103
1592342524.64920,3f30,6 ! Joining game 'f8afb967-ef90-4112-84fd-24308e0d5b1e' place 2698066019 at 128.116.24.153
1592601368.21216,1dfc,6 ! Joining game '345ba639-e6f0-49a9-b726-706be203e46b' place 2698066019 at 128.116.34.25
1592860096.39523,0798,6 ! Joining game '0743b4a1-88c5-455d-a0dc-496d904e2595' place 2698066019 at 209.206.42.108
pcapng for section 1 through 3 events - 6-22-2020-1-3 lag events.zip (2.9 MB)
1592950723.75232,3f8c,6 ! Joining game '39fb1318-c90f-4aa0-8099-f5f1ece59768' place 2698066019 at 128.116.35.88
This is still a game-breaking issue, as nobody can play our game.