Server Heartbeat Fluctuating Wildly with Floating Parts

Reproduction Steps

The bug occurs here:

This doesn’t seem to happen all the time, but happens more often than not. Sometimes I switch servers to find a slow one.

  • Wait for a round to start in a populated server and view the ServerJobs tab of the developer console. Observe the Steps Per Sec as the round starts and while the round is on-going.

I can’t replicate this bug outside the game.

Expected Behavior

The server should run smoothly for the most part, without huge fluctuations.

Actual Behavior

Before Wednesday June 1st, the servers were running fine. There has been no update to the game since the 22nd of May and no major update for a long time. This is an old game and there have been no major server-side performance issues up until this point.

As boats are added to the water, the server starts to struggle and the heartbeat fluctuates between 60 and 15 steps per second. This occurs both while boats are intact and after they have been destroyed and are floating on the surface in smaller parts. You’ll notice boats start to drive slower than intended and react slowly; this is because the boats are networked to the server.
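
To put numbers on the fluctuation described above, here is a minimal sketch in Python (not Roblox code) that converts per-step durations into step rates and summarizes them. The function names and the sample durations are made up for illustration; they are not taken from the attached microprofiler exports.

```python
from statistics import mean


def steps_per_second(step_durations):
    """Convert per-step durations (in seconds) into instantaneous step rates."""
    return [1.0 / dt for dt in step_durations if dt > 0]


def summarize(step_durations):
    """Return the minimum, maximum, and mean step rate for a run of steps."""
    rates = steps_per_second(step_durations)
    return {"min": min(rates), "max": max(rates), "mean": mean(rates)}


# Hypothetical samples: four steps at 1/60 s followed by four steps at 1/15 s.
samples = [1 / 60] * 4 + [1 / 15] * 4
summary = summarize(samples)
print(summary)
```

With these invented samples the summary spans roughly 15 to 60 steps per second, which is the range reported in this thread.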

This image was taken with all boats intact. As you can see, there are not many boats on the water, yet Steps Per Sec is at 15.

More screenshots


A screenshot that shows the server state after boats get removed.


Here are some server microprofiler exports.
log_BF2AF_microprofile_20220610-103059.html (1.8 MB)
log_BF2AF_microprofile_20220610-103038.html (1.7 MB)

Issue Area: Engine
Issue Type: Performance
Impact: High
Frequency: Constantly
Date First Experienced: 2022-06-01 00:06:00 (+01:00)

We’ve been having server lag issues recently in SCP: Roleplay as well, despite no known changes that could lead to something like that.
It is worth noting that SCP: Roleplay has no water component to it, which makes it difficult for us to trace the issue.

This issue can be seen in Glacier though, which does have water physics used a lot.

The engineers are investigating the issue. We will come back with a reply when we have updates!

Thanks for flagging!

I don’t seem to be having issues right now with a mostly water based game (my boat obby).
@Focia19 has anything been changed in the last 8 hours?

Is this issue still occurring @opplo ?
We are having a hard time reproducing the bug.

Sorry for the slow response @Focia19
The issue does still seem to be occurring, but I’m having a hard time finding a reliable way to reproduce it. I’ve been getting lots of reports about the issue since the 1st of June.

It would be great to know if any physics related update was pushed live on that date.
I’m going to spend some time today looking further into it.

Edit: It’s definitely still occurring. It may take 2 - 3 game rounds to experience it.

I’ve been doing some more digging. It seems some servers lag badly and others don’t at all. The first server I joined was broken. The only differences I can see are in the server Microprofiler readings. There are very obvious differences, as you can see from the screenshots below; both were taken before any boats were destroyed:
Broken Server


Working Server




Here you can see quick videos of the server performance in a broken server and in a different, working server. The Microprofiler readings were taken during each video.

Broken:

Here you can see the boats moving very slowly. The shark moves just fine because it is networked to the client, not the server. Boats are all networked to the server.

Video

Server Microprofiler Report
log_2A6BD_microprofile_20220628-152303-[BROKEN-ROUND].html (3.0 MB)

Working Server

Video

Server Microprofiler Report
log_719E5_microprofile_20220628-153331-[WORKING].html (3.1 MB)

Sorry for the wait. We found two different problems:

  1. There was a threading bug that slows down water contact processing proportionally to however many threads the server is running on - each thread fights for the same resource, causing contention. This is why stepContacts is slow.
  2. There’s also a different low-level task routine taking too long.

We’ve disabled some flags to fix #1. Let us know if there’s any improvement for stepContacts.

Fixing #1 is likely going to fix #2 because the stepContacts performance problems likely caused the reschedule calls. Solving reschedule performance fully will take some more time and a rethink of this system, but it should be “solved” for now as long as stepContacts is fixed.
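
As an illustration of the contention described in #1, here is a minimal Python sketch, not the engine's code and not necessarily how the fix works. `count_shared` makes every thread take the same lock for each contact, so adding threads adds more waiting on that one resource. `count_sharded` shows one common way to avoid this: each thread accumulates privately and the results are merged once at the end. All names are hypothetical, and because of Python's global interpreter lock the sketch shows the structure of the problem rather than real timings.

```python
import threading


def count_shared(num_threads, contacts_per_thread):
    """Every contact update takes the same lock, so threads serialize on it."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(contacts_per_thread):
            with lock:  # contended: one acquisition per contact
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_sharded(num_threads, contacts_per_thread):
    """Each thread accumulates privately; results are merged after joining."""
    results = [0] * num_threads

    def worker(index):
        local = 0
        for _ in range(contacts_per_thread):
            local += 1  # no shared state touched inside the loop
        results[index] = local  # each thread writes only its own slot

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Both functions return the same total; the difference is that the shared version acquires the lock once per contact per thread, while the sharded version never touches shared state in its inner loop.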

Hi Subcritical, thanks for getting back to me and looking further into the issue! It’s great news to see something might have been found.

The issue is still persisting in current servers, do servers need to be restarted for the changes to take effect?

Edit:
Here’s the latest server microprofile from a live SharkBite server:
log_3C128_microprofile_20220714-212853.html (3.2 MB)

After restarting all the servers it looks as though this issue has been fixed :grinning: and the servers that I’ve been through have all been running nice and smooth. I’ll update this thread over the next few days if the problem returns but thank you to the engineers who helped identify & fix the bug!

Here’s the most recent microprofile for comparison:
log_3122E_microprofile_20220715-130220.html (2.9 MB)

@subcritical I have a new game (a sequel) coming out soon that is currently in testing. Will the appropriate flags be disabled globally, or will they need to be disabled on the testing place too?
Sequel place: SharkBite 2 Development - Roblox

We’ve turned off the flag globally at this point, so you should not need to worry about your new place.

Hi opplo! We have again enabled a new version of the code that used to cause this (this time it is optimized specifically to prevent this problem). Please let us know whether the heartbeat is still stable. Thanks!
