Best uses of Parallel Luau?

So I’ve learned how to use actors to have different cores running processes, but now I’m wondering what the best use cases for this actually are that people have found when making games. I am currently trying to improve a procedural terrain generator I’m using for a game, which takes about 30 seconds to generate 1500x1500 studs of terrain, including a large number of parts (trees, buildings, etc.). The Parallel Luau documentation has an example of using multiple actors for procedural terrain generation, but I tried this and now know that basically any operation that writes to the data model is restricted in parallel. So this doesn’t even seem useful for terrain generation or placing a ton of parts.

So, I can see the use of certain things like having a unique actor with events tied to individual clients for raycasting/other calculations, but where else have you guys found this useful, particularly for strictly server-side stuff?

I believe parallel is useful when you need more computing power. Procedural terrain generators need to do heavy computation to figure out what to build; once they have, you switch back to the main thread to write the results. Raycasts simply read the sizes and CFrames of parts and terrain to detect intersections. Some collision operations are more expensive than others, like spherecasts and blockcasts. By expensive, I mean in computing power.
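That split — compute in parallel, then synchronize before touching the data model — can be sketched like this. This is a minimal sketch only: it assumes the Script lives under an Actor instance, and the "ComputeChunk" topic name and the chunk dimensions are hypothetical.

```lua
-- Sketch: heavy math runs desynchronized; the data-model writes happen
-- only after task.synchronize() returns us to serial execution.
local actor = script:GetActor()

actor:BindToMessageParallel("ComputeChunk", function(origin)
	-- Parallel phase: pure computation, no data-model access.
	local heights = table.create(16 * 16)
	for x = 1, 16 do
		for z = 1, 16 do
			heights[(x - 1) * 16 + z] = math.noise(
				(origin.X + x) / 50,
				(origin.Z + z) / 50
			) * 25
		end
	end

	-- Serial phase: now it is safe to write to the data model.
	task.synchronize()
	for x = 1, 16 do
		for z = 1, 16 do
			local part = Instance.new("Part")
			part.Anchored = true
			part.Size = Vector3.new(4, 4, 4)
			part.Position = Vector3.new(
				origin.X + x * 4,
				heights[(x - 1) * 16 + z],
				origin.Z + z * 4
			)
			part.Parent = workspace
		end
	end
end)
```

The point is that the restriction only blocks the *write* step, so the expensive noise math still parallelizes; only the final `Instance.new`/`Parent` pass is serialized.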

1 Like

The naming is a bit frustrating. Actors are really worker nodes, and the main thread is the master in a traditional master-slave paradigm. So for your research you can google how the master-slave model is used in game dev.

basically any operation that writes to the data model is restricted in parallel. So this is not even useful for terrain generation or placing a ton of parts.

Parallel is very useful when you follow the single-master, multi-worker design pattern. In doing so, you must follow the rule of one writer and multiple readers (the actors). The main thread / master is for coordinating Roblox-specific tasks, like a queue system which then processes/modifies instance data (as you said), whereas workers are great for heavy mathematical computations or even modifying object data. I’ll explain a bit further below, but I use a chunk system to manage mobs. Because the chunk system is made of plain data objects, I can read it from the master, and I’m able to modify it using a SharedTable and a combination of mutex locks.

But workers are excellent at computing raw data. For example, rather than lerping in the master, you can offload the data and let the child compute the lerp while your master performs duties like rendering, reading events, etc. The child only requires things like timestamp, startPosition, endPosition, startRotation, endRotation, walkSpeed, turnSpeed. Once computed, the child can add the result to the queue and the master can process the queue: rather than lerping itself, the master runs a loop calling PivotTo() at each interval. What’s neat about this is you can move a little, then end the thread to free up the master to do other things, then come back next cycle to continue the loop. This gets a bit into client-prediction territory: my server updates every 500ms, whereas the client does the smoothing/animation.
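A minimal sketch of that lerp offload, assuming the worker Script sits under an Actor; the "StepMob" topic and the `step` field names (taken from the list above) are hypothetical:

```lua
-- Sketch: the worker computes an interpolated CFrame in parallel, then
-- synchronizes so the PivotTo() call runs serially.
local actor = script:GetActor()

actor:BindToMessageParallel("StepMob", function(mob, step)
	-- Parallel phase: pure math on the fields the master sent over.
	local alpha = math.clamp((os.clock() - step.timestamp) / step.duration, 0, 1)
	local position = step.startPosition:Lerp(step.endPosition, alpha)
	local rotation = step.startRotation:Lerp(step.endRotation, alpha)
	local goal = CFrame.new(position) * rotation

	-- Serial phase: apply the result, then the thread ends, freeing
	-- the core until the next cycle.
	task.synchronize()
	mob:PivotTo(goal)
end)
```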

When would you use parallel luau?

In my opinion, I would first start off with the script, then check the MicroProfiler and script performance. If optimization is needed, consider a simple task.spawn and call it a day. If you need to squeeze out more performance, that’s where you’d find actors very beneficial. The odd, confusing step is migrating from a standard script to one that self-replicates to make the worker nodes (like the procedural generation example in their documentation): the master spins up, initializes state to manage the workers, then replicates itself N times to make them. It can be inconvenient depending on your code design and requires a lot of decoupling, and if management is not done correctly you will incur a lot of overhead, which may not be worth it from the start due to the limitations of the single-writer design.
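The self-replicating master described above can be sketched roughly like this (a sketch only — the "IsWorker" attribute, worker count, and "Work" topic are all hypothetical names):

```lua
-- Sketch: one Script acts as master, then clones itself under fresh
-- Actor instances to create N workers.
local NUM_WORKERS = 8

if script:GetAttribute("IsWorker") then
	-- Worker branch: bind and wait for jobs from the master.
	local actor = script:GetActor()
	actor:BindToMessageParallel("Work", function(payload)
		-- ...heavy computation here...
	end)
	return
end

-- Master branch: initialize state, then replicate itself N times.
local workers = {}
for i = 1, NUM_WORKERS do
	local actor = Instance.new("Actor")
	local clone = script:Clone()
	clone:SetAttribute("IsWorker", true)
	clone.Parent = actor
	actor.Parent = game:GetService("ServerScriptService")
	workers[i] = actor
end

-- Dispatch jobs round-robin (a real system would first wait for each
-- worker to signal that it has bound its message handler).
for job = 1, 64 do
	workers[(job % NUM_WORKERS) + 1]:SendMessage("Work", job)
end
```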

But with actors I’m able to do the following:

  • A mob management system which uses non-Humanoids and simulates AI, movement, and mob states. It can support up to 2000 mobs on screen with great FPS and no performance drop. Pushing past 2000 is where hardware limitations begin (a server has limited memory, and you can only push this so far; similarly, your target hardware, whether PC or phone, may not support this many mobs, so it isn’t worth optimizing further). Though it required a lot of gutting and pre-planning, I’d say it was worth the effort.
  • A partitioned mob spawn manager which spawns/despawns mobs in grid chunks (similar to terrain generation), except this system supports 150 mobs per player. Uses a chunking system paired with object pooling.
2 Likes

I want to note that multiprocessing (i.e., multiple processes being executed simultaneously on individual processors, often cores) is not the same as multithreading (many threads being controlled by a single processor). Actors utilize multithreading, not multiprocessing.

EDIT: I may be wrong about this, but the documentation uses multithreading verbiage, which is not correct if it’s real multiprocessing…

Parallel computing often benefits tasks that can be “chunked,” so to speak, into individual processes that do not conflict with each other, create deadlocks, overwrite data, etc. A good explanation of this has already been given by @Harisaiyo, but it basically boils down to ensuring only one thread at a time can write (the readers-writers problem) so that the above issues don’t occur.

Chunked terrain generation is often used as an example because each chunk can be generated independently of its neighbors, provided you have set up your generation function in a way that is independent of neighboring data (Perlin noise is a good example of this). Other examples include writing to a thread-safe log (such as with Python’s logging module), processing chunks of data from a .csv file (each thread can take 10% of the total data, and since the data is independent per record, the work is theoretically done 10x as fast), etc.
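Chunk independence is worth seeing concretely. In this sketch (constants are made up), each chunk’s heightmap depends only on a global noise function evaluated at world-space coordinates, never on a neighboring chunk, so any worker can compute any chunk in any order and the seams still line up:

```lua
-- Sketch: a chunk heightmap computed purely from global noise.
local CHUNK_SIZE = 32
local SCALE = 80
local AMPLITUDE = 30

local function computeChunk(chunkX: number, chunkZ: number): {number}
	local heights = table.create(CHUNK_SIZE * CHUNK_SIZE)
	for x = 1, CHUNK_SIZE do
		for z = 1, CHUNK_SIZE do
			-- World-space coordinates make the noise seamless across
			-- chunk borders even though chunks are generated independently.
			local wx = chunkX * CHUNK_SIZE + x
			local wz = chunkZ * CHUNK_SIZE + z
			heights[(x - 1) * CHUNK_SIZE + z] =
				math.noise(wx / SCALE, wz / SCALE) * AMPLITUDE
		end
	end
	return heights
end
```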

Things that parallelism wouldn’t be good for are sequential operations, such as writing to thread-unsafe files, updating the same resource without mutex locks/semaphores/etc., or programs where two threads updating a variable at the same time could introduce issues.

To put it short: if you can “chunk” an operation into segments that can be handled individually, you can probably utilize Luau’s parallelism. That doesn’t mean it’s a good idea. Multithreading should typically be avoided unless you a) have extensive knowledge of tackling the issues multithreading introduces and b) have a good reason to use it. Otherwise, optimize your serial code first.

3 Likes

I was under the impression that using separate actors actually uses different cores, particularly because actors are placed in different Luau VMs. If using multiple actors isn’t real parallel processing, then what’s the difference between that and using the task library?

1 Like

This is definitely helpful information, thanks. What are the ways you create mutex locks between actors? By using Actor:SendMessage/BindToMessage? Also, is it possible for a child/worker actor to send information back to the master thread in the same way the master talks to the workers with Actor:SendMessage?

I could be wrong, but the documentation makes it seem like Luau is creating new threads instead of new processes. I would double-check, haha; it mentions nothing about utilizing independent cores, so /shrug

I think they are using the language around multiple threads loosely, because the Parallel Luau page says, “They work as units of execution isolation that distribute the load across multiple cores running simultaneously.” But it definitely is still more complicated, because I know that multiple actors can run on the same core, since obviously every machine only has a limited number, and they talk about using up to 64 actors. I would like to learn more about how the Luau VMs are managed and how the processes are actually split up onto the server’s and clients’ machines, but I don’t know if they’d even have very much published about that, since it might be a privacy concern about the engine.

What are the ways that you create mutex locks between actors?

I just went down a rabbit hole about an hour ago regarding this.

It seems like while you can use a mutex lock, you cannot use singletons in the actor design pattern. So you have to craft nifty workarounds using event-based methods like what I did in that link. I don’t believe this is a bug, but rather a side effect of partitioning everything into separate VMs. The global context in a VM is not shared with scripts in other VMs, so the only ways to share data are a SharedTable with timestamps, or sending messages as you mentioned using actor event-based messaging.
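A rough sketch of a cross-VM lock built on SharedTable. Caveats up front: the table name "MobChunks" and the key names are hypothetical, and this is a cooperative try-lock rather than a real mutex — there is no fairness, so contenders must retry and can in principle starve. It relies on `SharedTable.increment` being atomic across VMs and returning the new value, so exactly one contender ever observes 1:

```lua
local SharedTableRegistry = game:GetService("SharedTableRegistry")

-- Master side, at startup: create the table with the lock counter at 0.
local state = SharedTable.new()
state.lock = 0
SharedTableRegistry:SetSharedTable("MobChunks", state)

-- Worker side: a cooperative try-lock over the shared counter.
local st = SharedTableRegistry:GetSharedTable("MobChunks")

local function tryLock(): boolean
	if SharedTable.increment(st, "lock", 1) == 1 then
		return true -- we were first; the lock is ours
	end
	SharedTable.increment(st, "lock", -1) -- back off
	return false
end

local function unlock()
	SharedTable.increment(st, "lock", -1)
end

if tryLock() then
	st.chunkOwner = script:GetActor().Name -- safe while we hold the lock
	unlock()
end
```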

Also, is it possible for a child/worker actor to send information back to the master thread in the same way the master thread talks to the workers using actor:sendMessage?

No, not really. This was the weird, tricky part. Luckily we have SharedTableRegistry and SharedTable to update data, plus a combination of RemoteEvents to notify the master that a worker is done, so it can clean it up and put its id back into the process queue. You really want to use SharedTableRegistry anyway, since it’s considered parallel-safe. And to make sure you don’t break the single-writer, multi-reader rule, SharedTable has a cloneAndFreeze method. You COULD consider cloning a table and modifying it in a task.spawn, but you run the risk of reading/writing bad data; I believe it’s bad practice.

1 Like

You shouldn’t use SharedTables, as they are incredibly slow; you’d get better results using just SendMessage or even BindableEvents.
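For completeness, a message-only round trip without any SharedTable might look like the sketch below. It assumes `workerScript` is a worker Script you have prepared (hypothetical name), and that a real system would wait for the worker to signal readiness before sending; the "DoWork" topic and "Reply" event name are made up:

```lua
-- Master side (a plain server Script):
local actor = Instance.new("Actor")
local reply = Instance.new("BindableEvent")
reply.Name = "Reply"
reply.Parent = actor
workerScript:Clone().Parent = actor -- workerScript is hypothetical
actor.Parent = game:GetService("ServerScriptService")

reply.Event:Connect(function(result)
	print("worker finished:", result)
end)
actor:SendMessage("DoWork", 123)

-- Worker side (the script inside the Actor):
local myActor = script:GetActor()
myActor:BindToMessageParallel("DoWork", function(input)
	local result = input * 2 -- stand-in for heavy computation
	task.synchronize() -- firing a BindableEvent is a serial operation
	myActor.Reply:Fire(result)
end)
```

The worker-to-master direction goes through the BindableEvent because `SendMessage` only targets actors, while the master here is an ordinary serial script.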

2 Likes