My game Dystovia has a huge issue with server-side lag that builds up after hours of uptime, which makes it REALLY hard to test any single change. I’ve tried everything in my scripts and in the game to solve this, and I can’t get to a solution. These are some server-side stats showing how much memory “Navigation” and “Physics” take up:
From my reading, Navigation is (and should be) mostly related to pathfinding. Through stress testing I’ve found that the engine itself adds a small number of MB to Navigation memory SIMPLY by calling Path:ComputeAsync, and that memory does not get garbage collected, even after 100% destroying the Path object, IF you make a lot of complicated calls (the game world is huge). The topic where I found this says that at around 2-3 GB of memory the engine garbage collects a whole chunk and the growth is merely visual, but from my findings the game does get laggier over time and the server does not hold up.
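One way to check the ComputeAsync claim on a live server is to log Navigation memory next to uptime instead of reading it once. A minimal sketch, assuming the readings come from `Stats:GetMemoryUsageMbForTag` with `Enum.DeveloperMemoryTag.Navigation`; the stats object and tag list are passed in so the logic can also run against a stub, and `newMemorySampler`, `sample` and `growth` are hypothetical names:

```lua
-- Sketch: record per-tag memory (e.g. Navigation) next to server uptime.
-- Assumption: `stats` is normally game:GetService("Stats") and `tags` a list such as
-- { Enum.DeveloperMemoryTag.Navigation }; both are parameters so this can run on a stub.
local function newMemorySampler(stats, tags)
	local history = {}
	local sampler = {}

	-- Takes one reading for every tag and stores it together with the given uptime.
	function sampler.sample(uptimeSeconds)
		local row = { uptime = uptimeSeconds }
		for _, tag in ipairs(tags) do
			row[tag] = stats:GetMemoryUsageMbForTag(tag)
		end
		table.insert(history, row)
		return row
	end

	-- MB gained (or lost) for one tag between the first and the latest sample.
	function sampler.growth(tag)
		if #history < 2 then
			return 0
		end
		return history[#history][tag] - history[1][tag]
	end

	return sampler
end
```

On the server, a loop calling `sampler.sample(workspace.DistributedGameTime)` every few minutes would show how fast Navigation memory grows and whether it ever drops back, as the 2-3 GB garbage-collection claim predicts.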
For more info, here are some details about the game and the bug:
Mobs are spawned and despawned based on whether a player is in range; while waiting for a player, the spawner checks every 4-5 seconds and can spawn mobs in.
LuaHeap, Scripts, etc. do not increase significantly in memory.
The number of models and mobs in the game does not increase over time.
(Therefore, from my understanding, if it isn’t kept in the heap and can’t be found in Workspace, I don’t know how there could be more pathfinding calls.)
Here are the essential parts (as pseudocode) of the mobs’ pathfinding code, including everywhere the Path object is used in the function:
```lua
local PathfindingService = game:GetService("PathfindingService")
local RunService = game:GetService("RunService")

-- Conditions..Conditions8, CallFunction1/2, NewMob, MaxPathTries, MyRoot and
-- PlayerTarget are placeholders carried over from the original pseudocode.
local function PathFind(Params, Humanoid, Origin, Target)
	-- CreatePath takes the agent-parameter dictionary itself, not {Params}
	local Path = PathfindingService:CreatePath(Params)
	local Success, ErrorMessage = pcall(function()
		Path:ComputeAsync(Origin.Position, Target.Position)
	end)

	if Success and Path.Status == Enum.PathStatus.Success then
		local WaypointList = Path:GetWaypoints()
		for PathNum, PathPoint in ipairs(WaypointList) do
			local PathPointDist = (PathPoint.Position - Target.Position).Magnitude
			if Conditions then
				break
			end
			Humanoid:MoveTo(PathPoint.Position)
			repeat
				if Conditions then Path:Destroy() return end
				NewMob.PathCounter += 1
				if NewMob.PathCounter > MaxPathTries or Conditions2 then
					Path:Destroy()
					return true
				end
				-- Yield each frame so this wait loop cannot spin without pausing
				RunService.Heartbeat:Wait()
			until Conditions3
		end
	else
		if Conditions4 then Path:Destroy() return end
		local Tries = 0
		repeat
			Tries += 1
			if Conditions5 then
				break
			end
			if Conditions6 then
				CallFunction1()
			else
				CallFunction2()
			end
			if Conditions7 then
				-- Retry on the same Path; it is destroyed once, below
				Success, ErrorMessage = pcall(function()
					Path:ComputeAsync(MyRoot.Position, PlayerTarget.Position)
				end)
			end
			RunService.Heartbeat:Wait()
		until Conditions8 or Tries >= 3
	end

	-- Normal exit; the early returns above destroy the Path themselves
	Path:Destroy()
end
```
All I can tell from these stats is that System and Navigation use the most memory on average. The analytics are nice for viewing overall game health, but they’re only partially useful for isolating problems in specific servers. How many servers are “bad” and how many are “good” in that graph? We don’t know.
Roblox’s documentation for the pathfinding system is horrible. It doesn’t specify how the system should be used, and we don’t know what’s going on under the hood. I think there is some amount of caching, or they pushed an update that broke pathfinding (this isn’t the first time).
Can you provide the following information?
1. When did it start happening?
2. What do you mean by lag? Are your players experiencing high ping? Low FPS? Are their devices/clients crashing? Is the server crashing? Is the server shutting down on players?
3. Have you actually entered a “laggy” server to gather data?
4. Have you done a server-side MicroProfiler capture in a “laggy” server?
5. Have you taken a snapshot of the Luau heap in a “laggy” server?
Looks like a memory leak somewhere. You should search through the scripts on one of those laggy servers using the Luau heap to see what is really going on.
EDIT: I forgot to add that I’ve never seen a game that uses Roblox’s pathfinding for its AI; you can tell just by looking. They usually use custom pathfinding or simple lerping. You should consider making your own pathfinding with raycasts and algorithms, so it will be 100% dependent on your own knowledge.
Noticeably, old servers have mobs that don’t move, simply because their AI is throttled so much that it runs slower: it goes from very responsive, instantaneous reactions to the player to waiting 10 seconds before performing any action whatsoever.
I can maybe provide graphs of what happens in game. Just know that Navigation memory ticks up to 10x its original usage, even with a super high number of mobs active (the number of active mobs varies), and server performance in the MicroProfiler goes down over time. I can’t test this properly, since you have to play in a server with others for 3+ hours before its performance really tanks noticeably.
1. It began a long time ago, but I thought it was related to other stuff, so it may date back a year; 6 months for sure.
2. High ping. FPS isn’t impacted as much; the client is smooth.
3. Yep, and it says Navigation is high and Physics is high, but a lot of the data is confusing and leads in different directions. The only clear thing seems to be Navigation.
4 & 5. Yes, and through that I’ve fixed potential memory leaks and found out that the model count doesn’t go up, i.e. the number of models/mobs in game doesn’t increase over time.
The Luau heap is SUPER low. And how would I figure out what the Luau heap is being spent on?
Pathfinding is tricky because I’m using Roblox’s terrain, so any other pathfinding algorithm I come up with will 100% be more expensive, worse at navigating, and take a lot of time to build. If this is the only solution then I guess so, but navigating Roblox’s navmesh isn’t easy.
OK, so mostly high ping. I can see that your server’s tick rate is dropping. You want to keep each “tick” or “frame” under 16 ms (the server targets 60 frames per second, about 16.7 ms each). The image below shows a healthy server.
Your laggy server takes anywhere from 24 to 51 ms to complete a frame; we need to figure out why.
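To see when in a server’s uptime frames start going over budget, frame times could be logged continuously rather than only inspected in one MicroProfiler dump. A minimal sketch, assuming it is fed the `deltaTime` (seconds since the previous frame) that `RunService.Heartbeat` passes to its callbacks; `newFrameMonitor`, `record` and `flush` are hypothetical names:

```lua
-- Sketch: accumulate server frame times and count frames over the ~16.7 ms budget.
-- Assumption: record(dt) is connected to RunService.Heartbeat, which passes dt in seconds.
local FRAME_BUDGET = 1 / 60

local function newFrameMonitor()
	local monitor = { count = 0, total = 0, worst = 0, overBudget = 0 }

	function monitor.record(dt)
		monitor.count = monitor.count + 1
		monitor.total = monitor.total + dt
		if dt > monitor.worst then
			monitor.worst = dt
		end
		if dt > FRAME_BUDGET then
			monitor.overBudget = monitor.overBudget + 1
		end
	end

	-- Returns a summary of the current window in milliseconds, then resets it.
	function monitor.flush()
		local summary = {
			frames = monitor.count,
			avgMs = monitor.count > 0 and (monitor.total / monitor.count) * 1000 or 0,
			worstMs = monitor.worst * 1000,
			overBudget = monitor.overBudget,
		}
		monitor.count, monitor.total, monitor.worst, monitor.overBudget = 0, 0, 0, 0
		return summary
	end

	return monitor
end
```

On the server, `RunService.Heartbeat:Connect(monitor.record)` plus a loop that prints `monitor.flush()` once a minute next to `workspace.DistributedGameTime` would show whether average and worst frame times climb steadily with uptime or jump at specific moments.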
It looks like the major contributors are this “sleep” task and marshalling. Unfortunately there’s not much information beyond this; some of the task names aren’t publicly documented.
Here’s what I recommend you try:
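1. Instead of creating a new path every time, create one path at the start.
Updated:
```lua
-- Created once, outside the function, instead of on every call
local Path = PathfindingService:CreatePath(Params)

local function PathFind(Params, Humanoid, Origin, Target)
	local Success, ErrorMessage = pcall(function()
		Path:ComputeAsync(Origin.Position, Target.Position)
	end)
	-- ...the rest of the function as before, minus the Path:Destroy() calls,
	-- since this Path is now reused
end
```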
I’m 50% sure this should fix your system.
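One caveat with a single module-level Path: `ComputeAsync` yields, and a Path only holds the result of its latest computation, so several mobs sharing one Path could overwrite each other’s `Status` and waypoints. A hedged alternative is one Path per mob, created on first use and destroyed when the mob despawns. A minimal sketch; `newPathCache`, `get` and `release` are hypothetical names, and the service is passed in so the logic can run against a stub:

```lua
-- Sketch: reuse one Path per mob instead of calling CreatePath on every PathFind call.
-- Assumption: `pathfindingService` is normally game:GetService("PathfindingService").
local function newPathCache(pathfindingService, agentParams)
	local paths = {} -- [mob] = Path; holds a strong reference until release(mob)
	local cache = {}

	-- Returns this mob's Path, creating it on first use.
	function cache.get(mob)
		local path = paths[mob]
		if path == nil then
			path = pathfindingService:CreatePath(agentParams)
			paths[mob] = path
		end
		return path
	end

	-- Call when the mob despawns: drops the entry and destroys the Path.
	function cache.release(mob)
		local path = paths[mob]
		if path ~= nil then
			paths[mob] = nil
			path:Destroy()
		end
	end

	return cache
end
```

Inside `PathFind`, `paths.get(mob)` would replace the `CreatePath` call and the `Path:Destroy()` calls would go away; the despawn routine would call `paths.release(mob)` so neither the Path nor the mob is kept alive by the cache.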
2. Insert a lot of MicroProfiler markers.
As an example, here’s my own recent debugging experience. I only had vague markers of what the process was doing, so I started injecting a lot of custom, descriptive markers into my scripts and let it run until the issue popped up. Then I was able to diagnose the problem. See debug.profilebegin() and debug.profileend().
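A minimal sketch of how such markers might be placed, assuming each label wraps code that does not yield (a label spanning a yield such as `ComputeAsync` is unlikely to give a meaningful timing). `profiled` is a hypothetical helper, not a Roblox API; because the label is just a string passed per call, a ModuleScript shared by many mobs should still be able to give each call its own name:

```lua
-- Sketch: run fn(...) inside a named MicroProfiler label and return its first result.
-- debug.profilebegin/profileend are Roblox's APIs; the no-op fallbacks only exist so
-- this helper also loads in a plain Lua interpreter.
local profileBegin = debug.profilebegin or function(_) end
local profileEnd = debug.profileend or function() end

local function profiled(label, fn, ...)
	profileBegin(label)
	-- pcall guarantees profileEnd runs even if fn errors, so labels stay balanced
	local ok, result = pcall(fn, ...)
	profileEnd()
	if not ok then
		error(result, 0) -- rethrow the original error message unchanged
	end
	return result
end
```

For example, a shared mob module could call `profiled(mobType .. ":ChooseTarget", chooseTarget, mob)`, where `mobType` and `chooseTarget` are hypothetical, so each mob type shows up under its own label.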
You can’t really fix stuff unless you know exactly what triggers it. Do everything in your power to get a fresh server to behave like the old servers. Once you can, try different fixes until you can no longer trigger it.
You did try the memory route, but I’m skeptical of it. Memory problems usually lead to the server or the client crashing. You said people were having ping issues and the MicroProfiler is showing long CPU processing times; that’s a different issue. Find the problem or ask your community for help.
I’m afraid this is the closest thing to a solution; I’ll test the path change first and let you know my findings. The annoying part is that everything I’ve tested points in 10 different directions, making me think I solved the issue when I had only made a small server optimization. I’ve tried to make it happen consistently first, but so far I can’t replicate it within 10 minutes.
I have. The graph view is bugged and links to the wrong things, so it doesn’t help at all. For example, it will say “Deer mobrunner 2”, which just calls their AI; all that really tells me is that it’s related to mobs.
Why do you think it’s broken? I use it often and it helps a lot. Also, I did some research and the pathfinding service is broken, so there are a few possible options:
Remove pathfinding and simply have mobs move toward the player with a little help from raycasts
Shut down the server every few hours to clean it up
Make your RPG round-based (many games that use pathfinding only use this format), though I don’t recommend that
Create your own custom pathfinding (can be controlled, but performance-heavy)
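If the custom route is taken, a common starting point is A* over a walkability grid. The sketch below is a plain textbook grid A*, not something from this thread; it assumes the terrain has already been sampled into `grid[y][x]` booleans (for example by raycasting down at each cell, a step not shown), uses 4-directional moves with unit cost, and `findGridPath` is a hypothetical name:

```lua
-- Sketch: grid A* with a Manhattan heuristic. grid[y][x] == true means walkable.
-- Returns a list of {x, y} cells from start to goal, or nil if unreachable.
local function findGridPath(grid, sx, sy, gx, gy)
	local height, width = #grid, #grid[1]

	-- Unique integer key per cell so tables can be indexed by position
	local function key(x, y)
		return (y - 1) * width + x
	end

	-- Manhattan distance is admissible and consistent for 4-way unit-cost moves
	local function heuristic(x, y)
		return math.abs(x - gx) + math.abs(y - gy)
	end

	local startKey = key(sx, sy)
	local open = { [startKey] = { x = sx, y = sy } }
	local gScore = { [startKey] = 0 }
	local fScore = { [startKey] = heuristic(sx, sy) }
	local cameFrom, closed = {}, {}
	local directions = { { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 } }

	while next(open) ~= nil do
		-- Pick the open cell with the lowest f-score (linear scan; a heap scales better)
		local currentKey, current
		for k, node in pairs(open) do
			if current == nil or fScore[k] < fScore[currentKey] then
				currentKey, current = k, node
			end
		end

		if current.x == gx and current.y == gy then
			-- Follow cameFrom back to the start, building the path start -> goal
			local path = {}
			local node = current
			while node ~= nil do
				table.insert(path, 1, { x = node.x, y = node.y })
				node = cameFrom[key(node.x, node.y)]
			end
			return path
		end

		open[currentKey] = nil
		closed[currentKey] = true

		for _, d in ipairs(directions) do
			local nx, ny = current.x + d[1], current.y + d[2]
			if nx >= 1 and nx <= width and ny >= 1 and ny <= height and grid[ny][nx] then
				local nk = key(nx, ny)
				local tentative = gScore[currentKey] + 1
				if not closed[nk] and (gScore[nk] == nil or tentative < gScore[nk]) then
					cameFrom[nk] = current
					gScore[nk] = tentative
					fScore[nk] = tentative + heuristic(nx, ny)
					open[nk] = { x = nx, y = ny }
				end
			end
		end
	end

	return nil -- the goal cannot be reached
end
```

The returned cells would still need converting back to world positions before being fed to `Humanoid:MoveTo`, and a world as large as the one described would likely need a heap-based open set and chunked grids.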
The game relies on complicated AI; I can’t just remove pathfinding for this type of game.
I could make some other system, but again that isn’t trivial when navigating Roblox’s navmesh, and in the end it would probably create more frustration than it solves. There’s also the chance that the issue isn’t even related to pathfinding, in which case I’d have wasted time on a system that makes the AI worse and doesn’t solve the issue.
I already shut down servers.
The game is not round-based, nor can it be; you should try it out to get an idea. This is not feasible.
See, Roblox’s pathfinding, as you know, doesn’t garbage collect stuff, so there is no other option. You can see many devs talking about this on the DevForum too. Sadly, the only thing we can do is wait for Roblox to fix it, or replace Roblox’s pathfinding with something else. Anyway, I wish you good luck with that.
Try this first, there’s a good chance it’ll fix it.
The whole memory-debugging route is the hardest one out of all your options. Take my advice and focus on placing debug markers for the MicroProfiler. There’s a good chance that if you fix the MicroProfiler problem you’ll also fix the memory problem, because they’re linked.
If you give up, try a substitution. Your code is really easy to swap out for SimplePath, which I recommend. It’s an open-source module that has been battle-tested in real games and used by multiple developers, including myself.
I can’t call debug.profilebegin and debug.profileend because the script is a ModuleScript being run by several mobs simultaneously; instead, I’ll try deactivating pathing to see if that fixes it. I can check any change ~8-12 hours later in the main game, once a server has been up for a while. I’ve also tried a few other small changes/fixes related to what I’ve seen, but I’m not expecting much. There are two major things the MicroProfiler and other stats point to: Navigation and Humanoids (either their number or something related to animation).
I’ve now disabled pathfinding, and that wasn’t the issue: server lag still accumulates over time. Memory is reduced, but that didn’t have an impact on the bigger picture:
It might actually be better. I was jumping the gun a bit, but this different issue is not as severe, and the game might actually be good. The issue is finding a complete replacement for pathfinding, as using the built-in one will not work…