Actor:SendMessage() taking too much time

  1. What do you want to achieve?
    Fix or reduce the time of Actor:SendMessage()

  2. What is the issue?
    Actor:SendMessage() is taking so much time that it makes the performance of my game servers really bad.

  3. What solutions have you tried so far?
    I have tried to optimize, but I'm out of ideas for this problem.

For more details: I'm working on an RTS game set in the Napoleonic era, where there can be up to 400 units per game. It runs fine overall, but there is one problem. I'm using a SwarmModule for faster distance checking and Parallel Luau for multithreading.

At startup the game creates a lot of actors server-side, then sends them messages to update unit positions or make them search for targets. But because actors get their own copies of modules by default, I need to send all of them the swarm module “cells”, or info about all the units, so they can do certain things.

But sending that much info to actors seems to make performance way worse, and it is the main cause of lag in my game. What can I do to fix this? Any ideas? I'm really lost on this.

Honestly it’s hard to respond to this because it’s mostly guesswork on what the actual problems are. You suspect it’s a large amount of data you are sharing and are probably right. But without much more information there is little that can be offered besides speculation and potential ideas that might work depending on the structure of your project. That said I have some information and some ideas to share, but expect them to not accurately reflect your situation.

The first is not related to parallelism: you may need to consider some form of acceleration structure, depending on what exactly you are doing. If every NPC looks at every other NPC each step, that is a lot of processing: with 400 units, the basic approach of just looping means about 160,000 checks per step (400 × 400). If this is the type of problem you are facing, then acceleration structures like quadtrees or BVHs can speed things up dramatically, since they let you consider only the areas that are actually interesting. This does assume some sort of locality in the decisions you make. It is, however, exactly the type of problem physics engines have to solve all the time, so there is a lot written on how to do fast queries with structures like these.

With respect to Parallel Luau, it’s worth noting that, unless something has changed, the number of cores Roblox gives you is based on your max player count. I think if you have 6 or fewer players then you only get 1 core. So if your game has too few players, it’s possible that Parallel Luau is actually slower, because it takes time to switch between these tasks that you wouldn’t spend if you were running in sequence. The post I remember reading did say this could be more complicated, because the core isn’t necessarily dedicated and you may get half of 2 cores, or however their systems split up work. But with only one core allotted, it was likely that you got just one.

You could look into using shared tables which seem to be designed for distributing lots of information across cores which could help depending on the context you are attempting.

You could also probably design actors that specifically handle one type of enemy, so that when it’s their turn you just tell them and they already know everything instead of having to pull all of it in. You can potentially find other ways to reduce the amount of data you need to send, but I’m not sure whether the issue is the message length itself or just waiting for other work to complete on the cores. To find out, you’d probably have to test with the actors getting about the same amount of work but only receiving a message to wake them up. That would tell you whether it’s the data size causing the issue rather than simply having too much work.

Finally if you are just trying to speed up some decision making processes, you could also consider sending some of this work to the clients. Though if you do, keep in mind that the clients can and will lie about what values they get (because cheaters). So any data you get back should be validated. Or at least checked to be reasonable. Some things like pathfinding could benefit from this because it’s relatively quick to check if a valid path was returned. Though proving it’s the best path instead of the longest would require at least a brief search.

Anyways. That felt like a bit of rambling, but if you want more detailed information instead of just some vague ideas you’ll probably need to share some more specifics about how your systems works. At least to the point of being able to understand the actual limitations and problems.

Alright, so, first of all I'm going to share a bigger screenshot, since the last one has less info.

As we can see in the image, it is definitely the actors sending messages that are taking way too much time.

Now, to explain exactly what is sent through them: when I message the actors, I send the unit data, its position, modifier areas and, biggest of all, the “cells” of the swarm module.

You can think of the swarm module cells as a grid that has all the unit locations in it, obviously used to cut the computing time of target searching. Now, why do I do this?

The actors’ swarm modules will never be completely up to date about unit locations, since each actor gets its own copy of the module. For that reason, I need to update them frequently for several things. The way I do it right now is by sending all the cell info to the actors, which is obviously something very big, and I suspect that’s the main cause.

Maybe I could use a bindable event that fires when a cell is updated and send the info, compressed into a buffer, to all the actors, so they stay up to date without me frequently sending them all the cells at once? I guess that would be way faster, right?

Should I do it with the bindable event connected to all the actors, or just let a “central” script update them with the new cell info with Actor:SendMessage?
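
To make the "send only the changed cell" idea concrete, here is a minimal sketch of packing a single cell update into a buffer. The byte layout (a u16 cell index, a u16 unit count, then one u16 per unit id) and the function names are assumptions made up for illustration, not how the swarm module actually stores its cells.

```luau
-- Hypothetical layout for one cell update:
--   u16 cell index | u16 unit count | u16 unit id, repeated `count` times
local function encodeCellUpdate(cellIndex: number, unitIds: {number}): buffer
	local b = buffer.create(4 + 2 * #unitIds)
	buffer.writeu16(b, 0, cellIndex)
	buffer.writeu16(b, 2, #unitIds)
	for i, id in unitIds do
		buffer.writeu16(b, 4 + 2 * (i - 1), id)
	end
	return b
end

local function decodeCellUpdate(b: buffer): (number, {number})
	local cellIndex = buffer.readu16(b, 0)
	local count = buffer.readu16(b, 2)
	local unitIds = table.create(count)
	for i = 1, count do
		unitIds[i] = buffer.readu16(b, 4 + 2 * (i - 1))
	end
	return cellIndex, unitIds
end

-- Round trip: cell 37 now holds units 5, 12 and 301.
local packed = encodeCellUpdate(37, { 5, 12, 301 })
local cellIndex, unitIds = decodeCellUpdate(packed)
print(cellIndex, #unitIds, buffer.len(packed))
```

The packed buffer could then be passed as an argument to Actor:SendMessage() under whatever topic name you pick. With this layout, cell indexes and unit ids are limited to 0-65535, which should be plenty for 400 units.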

Have you tried taking a look at the SharedTable datatype? It’s designed to share large amounts of data between actors. So what you would do is create the table once, then send a message to each actor with the table and have them store it internally. The only caveat to this is that the data is shared amongst all actors, so if one actor modifies the table, they will all see the modification…in real time.
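
A rough sketch of that flow, assuming the cells can be keyed by an integer index and hold arrays of unit ids. The "SetCells" topic, the `shareCells` function and the cell contents are hypothetical names for illustration; only SharedTable.new, Actor:SendMessage, Actor:BindToMessage and script:GetActor are engine APIs. Both scripts are shown in one block for brevity.

```luau
-- Main (serial) script: build the grid once and hand every actor a reference.
local cells = SharedTable.new()
cells[37] = SharedTable.new({ 5, 12, 301 }) -- unit ids currently in cell 37

local function shareCells(actors: { Actor })
	for _, actor in actors do
		actor:SendMessage("SetCells", cells)
	end
end

-- Later writes are visible to every actor holding the reference,
-- without sending another message.
cells[38] = SharedTable.new({ 44 })

-- Script parented under each Actor: store the reference once, read it as needed.
local actor = script:GetActor()
local sharedCells

actor:BindToMessage("SetCells", function(received)
	sharedCells = received
end)
```

Because every actor sees writes immediately, anything that reads several cells in a row should be prepared for the grid to change between reads.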

Parallel Luau (and parallelism in general) can be quite slow if your CPU cores need to talk to each other.
Any method you use to send data to actors will be orders of magnitude slower than using an ordinary Luau table, so SharedTable, Bindables and :SendMessage() will all be somewhat similar.

Using Parallel Luau on Roblox is hard because of this, and because of limitations imposed by the Roblox engine (for example, you cannot have fully parallel Luau threads that run independently of the game’s FPS).
To benefit from Parallel Luau, you must have a heavy workload per actor. If you have a ton of actors each doing a tiny task, group them into a handful of actors each doing a handful of big tasks.

My recommendation would be to restructure how you are using Parallel Luau, if possible, to avoid sending that much data to actors, and also to group the work into 48 or so actors rather than multiple hundreds.

I would also recommend using the MicroProfiler to analyze performance
(as long as you don’t use the stupid performance tab in Studio, it’s OK I guess)

Right now, I have two options.

Use bindable events
Basically update individual cells when needed in each actor instead of sending them all when messaging

Use shared tables
Basically that, try to use SharedTables.

I'm not really sure which to go with, since I have heard shared tables are quite slow too.

For my system… each module is an actor: each weapon, each NPC. Because Lua was never designed as a concurrent language (it was originally developed in Brazil for people who do not hold computer science degrees to be able to write software… and it shows), Roblox had to make some provisions so that things wouldn’t clobber each other. Just because something is in an actor doesn’t mean that it’s running in a separate memory space. I have found out, through experience, that Roblox’s implementation of Parallel Luau does not use a shared memory space. Each memory space is capable of running one thread, but you can have a group of actors within a memory space. Because Roblox has chosen to keep things separate, they have instituted a number of protocols to deal with this. Under the hood, [SharedMemory] has read/write locks, so you can have many readers, but if something needs to write, that write is protected because the readers are all blocked.

I have actually implemented multi-threaded server code myself using C and C++. It’s easier to deal with, manage, and get the performance out of the code…but you can also really get into trouble with it too. The reason why Roblox implemented it this way was so that developers couldn’t easily crash the server or the client. Most Roblox game developers do not have a degree in computer science, so the process safety requirement overrides the performance requirement.

I’ve had a few conversations with Roblox engineers about Parallel Luau. You might want to take a look at them.

Here’s some topics you should read:

https://devforum.roblox.com/t/insufficient-documentation-about-actors/2619278
https://devforum.roblox.com/t/add-ability-to-suppress-warningerrors-from-accessing-thread-unsafe-instances-in-parallel-lua/2628762
https://devforum.roblox.com/t/disable-engine-not-safe-to-in-parallel-errors/2624950
https://devforum.roblox.com/t/lua-interpreter-add-locking-primitives-to-the-task-library/2571024
https://devforum.roblox.com/t/parallel-lua-how-to-get-data-back-from-an-actor/2579091/3

Hopefully some of that will help.

I don't know many details about your system, but I'm working on projectiles and tried using Parallel Luau, which turned out to be worse than serial. Here are some things you can consider:

  • Pre-calculate anything that doesn’t need to be recalculated every time. Even small things like a multiplication or exponentiation might gain you some ms, and it usually only costs 1-2 more variables.

  • Use the client for rendering, if you don’t do it already. It comes at the cost of dealing with bandwidth and binary operations to optimize it, but it’s still worth it. In case you don’t know, you can combine buffers and send only the necessary info to maximize your game’s performance.

  • Use native code. If you have pathfinding operations like A*, you should try to use native code inside a module. This might speed up your game by a few percent. It only works on the server, but it’s worth trying.

  • Optimize the grid. Grids tend to be easier to use, if you don’t mind storing a lot of data in your game of course, but sometimes devs don’t use them to their fullest potential. For instance, range checks can be reduced to 9-25 per unit, rather than hundreds.

    A great example of this is looping through every cell of the grid instead of using indexes. You can find the square of cells around your target cell from its index, where x and z are how many cells the square extends in each direction:

Min = index - x - z * number of columns
Max = index + x + z * number of columns

  • Don’t update units outside of the camera view. Your player really doesn’t need to update an enemy he doesn’t see. Players should only keep track of whether an enemy is actually on screen and reduce actions based on that.

  • Simplify your physics. If you use Roblox’s physics, you may want to use raycasts instead. RTS games are usually 2D, and even if they have 3D elements they don’t need collisions from above, so sending 5 raycasts when your units update is a possible solution.

  • Try to handle units on the client. If you don’t have to use your modules on the server, you can use Parallel Luau on the client, making it much more effective. If you perform server checks, you can still have a pretty much cheater-free experience.
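
For the grid tip above, here is a small sketch of walking only the square of cells around a target cell. It assumes a row-major flat grid where index = z * columns + x with x and z starting at 0; the function name and parameters are illustrative. The raw Min/Max formula wraps around at the grid edges, so this version clamps to the grid bounds.

```luau
-- Returns the indexes of all cells within `radius` cells of `index`.
-- Assumes a row-major layout: index = z * columns + x, both starting at 0.
local function neighbourCells(index: number, radius: number, columns: number, rows: number): {number}
	local cx = index % columns
	local cz = math.floor(index / columns)
	local result = {}
	for z = math.max(cz - radius, 0), math.min(cz + radius, rows - 1) do
		for x = math.max(cx - radius, 0), math.min(cx + radius, columns - 1) do
			table.insert(result, z * columns + x)
		end
	end
	return result
end

-- On a 10x10 grid: radius 1 visits 9 cells, radius 2 visits 25.
print(#neighbourCells(55, 1, 10, 10), #neighbourCells(55, 2, 10, 10))
```

For cell 55 with radius 1 the first and last indexes returned are 44 and 66, which matches Min = 55 - 1 - 1 * 10 and Max = 55 + 1 + 1 * 10.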

I hope this helps, good luck with your game.

@0786ideal
@Maelstorm_1973

I have actually made all the pre-calculations possible to reduce computing time, and made all the units be displayed only client-side. The server has nothing, just pure data, and I keep searching for little things to speed up computing time.

I have actually reduced the actor messaging time by a lot, to a hundred times faster than it was. But to do it I am using buffers, and using them is taking more time than I would like. Based on your experience:

Which one should be faster, bit packing with bit32 or buffers? I want to go with the fastest one to optimize the compression further.

bit32 lets you do binary math, while buffers let you store data as bytes. You can use bit32 to turn two numbers into one. Remember that every number you compress also has to be decompressed, and the combined number takes as many bits as its parts put together.

For example, if you pack two 4-bit numbers into one, you’ll get one 8-bit number.
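
A minimal sketch of that 4-bit example, showing that bit32 and buffers work together rather than competing: bit32 does the packing and the buffer stores the resulting byte. The helper names `pack4` and `unpack4` are made up for illustration.

```luau
-- Pack two 4-bit values (0..15) into one byte: high nibble, then low nibble.
local function pack4(high: number, low: number): number
	return bit32.bor(bit32.lshift(bit32.band(high, 0xF), 4), bit32.band(low, 0xF))
end

-- Split the byte back into its two 4-bit values.
local function unpack4(byte: number): (number, number)
	return bit32.rshift(byte, 4), bit32.band(byte, 0xF)
end

-- Store the packed byte in a 1-byte buffer and read it back.
local b = buffer.create(1)
buffer.writeu8(b, 0, pack4(9, 3))
local high, low = unpack4(buffer.readu8(b, 0))
print(high, low) -- 9 3
```

Whether the extra packing and unpacking pays off depends on how much smaller the message gets, so it is worth profiling both versions.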

Also, an important thing to know is that buffers are mostly used for remote event optimization. I don't know if actors have bandwidth, so I can’t tell you whether it helps here, but you should use one big buffer over many smaller ones. At least, that's what Roblox says.

Also, if you want to optimize your game, you should focus on structural optimizations rather than code-level ones, as they are the most useful. You can also try to optimize through gameplay, by making it a strategically bad idea to field 400 units.

Remember that Parallel Luau is only for big games with a lot of players; if it is used on smaller ones, it lags a lot.

If you want a number, use about 16-32 actors for all of your enemies and you should be fine. Each actor can control one player’s units, or maybe smaller groups.

I think actors do have some kind of bandwidth, since after I made them receive compressed data in buffers instead, the time the Roblox engine took to send the messages became a hundred times shorter than before. That's why I want to compress it even further.

The rest of the game structure seems to perform extremely well based on the script profiler and such, as I spent some time optimizing it as much as I could. Actor messaging is the last issue to be fixed, and compressing the data seems to improve it a lot.

All of this is well and good, but there is something you must realize. Data compression is a science in itself, which stems from information theory. There is a tradeoff here, though. Any time you execute code, it takes time. In computer science, we describe how that time grows with the size of the input using Big O notation, and it is derived from the algorithm used. With that being said, the tradeoff is the time it takes to compress and decompress the data vs. the time spent on actor signaling.

Actors allow your game to run on multiple CPUs, which is good for large numbers of players. From what I understand, the number of CPUs that your game is allocated depends directly on the number of players. So there’s that to consider. Read this bug report and look at Roblox’s response to it. The number of CPUs you get determines the number of threads you get (1 thread per CPU), which in turn is determined by the number of players in the game. Roblox does this to maximize CPU utilization. That’s good because in a server farm, you do not want idle CPUs hanging around.

Then there is another thing you can do, which may or may not benefit your game.

That is to reduce the amount of data you calculate. You should look for other algorithms, or maybe there is an option to make those calculations less frequent.

I don't know about your units, but you’re probably doing magnitude checks of some sort. In that case you can go for spatial partitioning.

If you’re not using any grid-like structure, you can also split the work across clients and use Parallel Luau there. It comes with some risk, but it will benefit performance.

The last tip is to not use Parallel Luau and try serial. Sometimes it’s faster, because the time it takes to jump between threads is very long, and doing it for 40 actors at the same time takes even longer.

Based on Roblox's response, our game should have around 4 cores and 4 threads. Going by the benchmarking tools in Roblox, data compression seems like a very good tradeoff: it takes only a very small number of milliseconds, and actor messaging needs way less time. Based on this, the game should actually benefit a lot from parallelization.

The game has been using the swarm module (spatial partitioning) for a long time. Based on Roblox's response about how many cores and threads are allocated, Parallel Luau seems to be worth it, since the tradeoff in computing time is really favorable.

To optimize it even further, I'll try to compress the data even more with the buffer bit functions. I'll let you all know how it goes.

The swarm module isn’t perfect and can be improved. Grids are OK to use, but there are much better data structures, such as quadtrees.

Roblox allocates cores per 6 players, so if your game is small, Parallel Luau will cause more lag than it saves.

Packing data is a nice idea, but still try to take stress off your script. I don't know how your units work, but if they are AI-based and players don’t control them, you can spread their simulation over many frames by using Random and time accumulators. This way, instead of large lag spikes, you’ll get smooth transitions.
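
A rough sketch of the time-accumulator idea, assuming each unit only needs to "think" every half second. THINK_INTERVAL, `newScheduler` and `step` are illustrative names, and math.random stands in for Roblox's Random so the snippet also runs outside the engine.

```luau
local THINK_INTERVAL = 0.5 -- seconds between decisions; illustrative value

-- Give every unit a random starting phase so that think ticks
-- do not all land on the same frame.
local function newScheduler(unitCount: number): {number}
	local accumulators = table.create(unitCount, 0)
	for i = 1, unitCount do
		accumulators[i] = math.random() * THINK_INTERVAL
	end
	return accumulators
end

-- Advance every accumulator by dt and return the ids of the units due this frame.
local function step(accumulators: {number}, dt: number): {number}
	local due = {}
	for id = 1, #accumulators do
		local elapsed = accumulators[id] + dt
		if elapsed >= THINK_INTERVAL then
			elapsed -= THINK_INTERVAL
			table.insert(due, id)
		end
		accumulators[id] = elapsed
	end
	return due
end

-- Simulate one 60 FPS frame for 400 units: only a fraction of them are due.
local accumulators = newScheduler(400)
print(#step(accumulators, 1 / 60))
```

In a real game, `step` would be called once per frame (for example from a RunService.Heartbeat connection) and only the returned units would run their target search.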

Another possible method I saw is to disable unit rendering when units are off screen. Also, for magnitude checks relative to a player, you can use Player:DistanceFromCharacter(), which should be about 2x faster than magnitudes.

@0786ideal @Maelstorm_1973

After a ton of time trying to improve the performance of actor messaging, I have finally improved it massively, making it take WAY less time than before. I want to thank you all for helping me and giving me advice, especially for the info about Parallel Luau cores and such.

Here is a screenshot to show the improvement:

EDIT:

For anyone who has the same problem I had with actor messaging time: it’s probably happening for the same reason it happened to me, which is sending too much data to actors.

If you need to send, for example, tables with a lot of data that is going to stay the same, prefer shared tables instead.

Another tip is to compress the data you 100% need to send with buffers, if possible.

Also, another fun fact: the Luau heap figure is not how much memory is used, but how much is allocated at the time you measure. You should use the profiler to measure time.