Pathfinding is game-breakingly slow and blocks other processes

We recently released a game that relies heavily on pathfinding, Zombie Outbreak Tycoon. Most of our issues have been sorted out since launch, but the biggest one remains: zombies sometimes stop in place because Path::ComputeAsync takes a long time to compute.

In the theoretical worst case, this means an individual zombie may stop what it is doing for a second or so. In practice, however, pathfinding is not parallelized in the background and can’t run several jobs at once (according to the microprofiler), so one path that is difficult to compute queues all others up behind it. The cascading effect makes all zombies stop moving for an indefinite amount of time.
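To illustrate the cascading effect, here is a minimal sketch of a single worker running jobs in FIFO order. It is written in Python purely as arithmetic, not as Roblox code, and the durations are made-up numbers rather than values taken from the profiler dump.

```python
from itertools import accumulate

def completion_times(job_durations):
    """Finish time of each job when one worker runs them in FIFO order."""
    return list(accumulate(job_durations))

# Hypothetical durations in seconds: one hard path, then ten easy ones.
jobs = [1.0] + [0.01] * 10
finish = completion_times(jobs)

# Every easy job waits behind the hard one, so none finishes before 1.0 s,
# even though each would take only 0.01 s on its own.
earliest_easy_finish = min(finish[1:])
latest_easy_finish = finish[-1]
print(earliest_easy_finish, latest_easy_finish)
```

Under these assumed numbers, ten zombies that each need a trivial path still wait a full second or more, which matches the "all zombies stop" symptom described above.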

Worse yet, pathfinding also seems to randomly block work on the main thread, under labels such as buildPolyMesh, buildRegions and computePath. This is also evident from the microprofiler dumps.

From a player’s perspective, these issues make it seem like the “zombie” part of “Zombie Outbreak Tycoon”, which is presumably why they started playing the game, is either nonexistent or broken. They also cause seemingly random ping spikes while the server is busy.

Here’s a microprofiler dump from a server showing the issue:
log_94bd7_microprofile_20230626-195820.html (1.9 MB)

Expected behavior

I expect pathfinding to take a reasonable amount of time, or at the very least to error if a computation takes too long (>0.5s) instead of blocking all other pathfinding calls, since there is no way to cancel an ongoing pathfinding job.
I also expect pathfinding to truly be a background job, and never completely block the main thread.

Can you provide more details about how you are using pathfinding?

  1. How often do you issue path requests?
  2. Are you using the Blocked event?
  3. How dynamic is the environment?
  4. Do you reuse paths among NPCs?
  5. Do all NPCs use the same agent dimensions?

If you have a repro place, that would be helpful too.

  1. It depends. Zombies need to issue a path request when they spawn (and many may spawn at once), but while walking and following the path they don’t issue another unless they switch targets. We have also set a hard limit of at most one new pathfinding request per frame.
  2. Yes, but we don’t recalculate paths right away when blocked. We store the blocked waypoint index and recalculate once that waypoint has been reached.
  3. The environment is very dynamic in the sense that models appear/disappear (it’s a tycoon/building game), but we generally don’t have any unanchored or constantly moving parts that affect pathfinding. Doors may open and close and affect pathfinding, but this is also an on/off thing and any animation is done on the client.
  4. We don’t, because of the dynamic nature of the game.
  5. We have 3 or 4 different sizes for zombies and they have their agent height param adjusted accordingly to let them fit/not fit in doors the way players would expect. Would having a single set of size params help our case here?
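For readers trying to picture the setup in answers 1 and 2, here is a rough sketch of the two throttling rules: at most one new request per frame, and recomputation deferred until the blocked waypoint is reached. It is Python used only to show the control flow, not the game's actual code; `PathRequestQueue`, `PathFollower` and their methods are hypothetical names, and a real implementation would be Luau calling ComputeAsync where `step()` returns an agent.

```python
from collections import deque

class PathRequestQueue:
    """Releases at most one queued path request per frame (hypothetical)."""

    def __init__(self):
        self._pending = deque()

    def request(self, agent_id):
        # Skip duplicates so an agent never holds two slots in the queue.
        if agent_id not in self._pending:
            self._pending.append(agent_id)

    def step(self):
        """Call once per frame; returns the agent to compute for, or None."""
        return self._pending.popleft() if self._pending else None


class PathFollower:
    """Defers recomputation until the blocked waypoint is reached."""

    def __init__(self, agent_id, queue):
        self.agent_id = agent_id
        self.queue = queue
        self.blocked_index = None

    def on_blocked(self, waypoint_index):
        # Remember the earliest blocked waypoint; do not recompute yet.
        if self.blocked_index is None or waypoint_index < self.blocked_index:
            self.blocked_index = waypoint_index

    def on_waypoint_reached(self, waypoint_index):
        # Only now ask for a new path, via the shared per-frame queue.
        if self.blocked_index is not None and waypoint_index >= self.blocked_index:
            self.blocked_index = None
            self.queue.request(self.agent_id)


# A burst of three spawns is spread over three frames.
queue = PathRequestQueue()
zombies = [PathFollower(i, queue) for i in range(3)]
for zombie in zombies:
    queue.request(zombie.agent_id)
frames = [queue.step() for _ in range(4)]
print(frames)
```

Note that a limiter like this only spaces out when requests are issued. If a single computation is slow, later requests still wait behind it, which is the behavior the original report describes.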

Thanks. The profiler capture does indicate that generating the navmesh is taking longer than expected. To help diagnose the problem, could you try disconnecting from the Blocked event and also using a single agent size?

If you have a place that we can use to investigate, please DM it to me directly.

Hello, we have made some recent improvements to perf and memory management around pathfinding, though this sounds like it could be a separate problem. Are you still experiencing this issue? If so, we will need a repro case to investigate further. Otherwise we will resolve this issue for now and you can share an example if you run into it again. Thank you!

Did you end up finding a fix for this? I’m experiencing this right now as well.

I am also experiencing this issue and have not found a resolution. My NPCs randomly start experiencing extremely long ComputeAsync calls, which cause them to stand idle for 10+ seconds at a time. This typically starts happening once servers have been up for 20+ minutes, but it does not appear to be related to a memory leak, as I confirmed that server memory is not continuously increasing during that time. My game can have as many as 50 pathfinding AI alive at once, but each AI has a 3 second pathfinding cooldown that prevents it from calling ComputeAsync too often. I have also attempted to reduce the complexity of the paths by adding PassThrough modifiers to each collidable part within the AI as well as players. Any assistance in fixing this issue would be greatly appreciated.
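For reference, the per-AI cooldown described above could look roughly like the sketch below. It is Python used only to show the bookkeeping, not the actual game script; `PathCooldown` and `try_acquire` are hypothetical names, and the injectable clock exists only so the example can be checked without waiting.

```python
import time

class PathCooldown:
    """Allows one path request per agent every `cooldown_seconds`."""

    def __init__(self, cooldown_seconds=3.0, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._last_request = {}  # agent_id -> time of last allowed request

    def try_acquire(self, agent_id):
        """Return True if the agent may request a path now."""
        now = self.clock()
        last = self._last_request.get(agent_id)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down
        self._last_request[agent_id] = now
        return True


# Usage with a fake clock so the timing is deterministic.
fake_now = [0.0]
cooldown = PathCooldown(3.0, clock=lambda: fake_now[0])
first = cooldown.try_acquire("npc_1")    # allowed
fake_now[0] = 1.0
second = cooldown.try_acquire("npc_1")   # rejected, only 1 s elapsed
fake_now[0] = 3.0
third = cooldown.try_acquire("npc_1")    # allowed again after 3 s
print(first, second, third)
```

With 50 AI and a 3 second cooldown, this still permits up to roughly 17 requests per second in the worst case, so it limits spam per agent but not the total load.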