Parallel Luau Developer Preview

I am super excited by this update, can’t wait to start using it.

Alright - finally been able to play around with this thing. I’m hyped!

I managed to get a simple Lua-based API working for a job-based paradigm:

The idea is pretty simple: you can pass in pure functions with a set of arguments, and they’ll be run in parallel as fast as possible. The system should be easily expandable to let you return values from those functions too, if you’re looking to use them to run computations.

I’ll most likely end up using a refined version of this in Blox to run terrain generation and greedy meshing calculations 🙂

18 Likes

Thanks for the report! As mentioned previously, any engine crash is a bug, even if you’re using multithreading. You were writing to a property that isn’t supposed to be writeable, but the check that guards against incorrect usage of the API had a typo in it. This will be fixed in the next build later this week.

4 Likes

1000 actors would make sense here. In general we see two possible patterns for use of this system:

  • Using Actors for logical entities in the game, and scripts inside these Actors to control the behavior (of course, they can share the logic either through the package mechanism or by calling out to a shared ModuleScript).
  • Using Actors to create a job distribution system, and simply call out to some scripts that process large sets of objects.

Today both could work. In the future we’re going to start using Actor hierarchy to selectively enable writes - for example, you’d be able to move the parts of the NPC from the parallel section but only if the script and parts are both part of the same Actor. So when you already have a notion of an entity we’d recommend creating an Actor per entity.
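For the entity-per-Actor pattern, here is a minimal sketch of what such a script might look like today. The layout is an assumption for illustration: the Script sits directly under an Actor together with a Part named "Body" (a hypothetical name). Since writes are not allowed from parallel code in this preview, the sketch computes in the parallel phase and switches back to serial with task.synchronize() before writing.

```lua
-- Script parented under an Actor (assumed layout: Actor > { this Script, Part "Body" }).
local RunService = game:GetService("RunService")

-- "Body" is a hypothetical part name used only for this illustration.
local body = script.Parent:WaitForChild("Body")

RunService.Heartbeat:ConnectParallel(function(deltaTime)
	-- Parallel phase: property reads and pure computation only.
	local nextPosition = body.Position + Vector3.new(0, 0, 4 * deltaTime)

	-- Return to the serial phase before writing to the instance.
	task.synchronize()
	body.Position = nextPosition
end)
```

If Actor-scoped writes arrive as described above, the synchronize step could presumably be dropped for parts inside the same Actor, but that is not how the current build behaves.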

11 Likes

If multithreading allows the client to use more of their CPU and the thread count depends on the CPU, what about the server? How many threads will the server be able to effectively use?
I know we shouldn’t aim to use a certain number of cores but it would be good to know the performance difference server-side.

5 Likes

This is somewhat separate from this release, but we plan to have official documentation on the core count for the server, which will grow based on the number of players the server supports and the historical CPU usage. This will affect both internal engine systems (a lot of them became multithreaded over the last year or so) and parallel script execution. Expect more announcements on this subject next year; for now this is all the information we have.

19 Likes

Awesome! Looking forward to this; parallel processing would be exceptional for neural network training.

3 Likes

Thanks for the feedback everyone! We’ve uploaded a new build (links in the post updated) with fixes based on the initial reports and internal testing.

Changes:

  • Property writes are now correctly disallowed in parallel code
  • Improve performance of function calls and property accesses in parallel code
  • Whenever a property or function can’t be used, the error message now includes the name of the property/function
  • debug.profilebegin/profileend are currently not safe to use from parallel code; this will change in the future, but they error in parallel code for now
  • Terrain.ReadVoxels is now white-listed for use from parallel code
  • Fix rare race conditions in parallel execution when the threads didn’t run for very long

15 Likes

What’s the intended way to install these builds?

1 Like

It’s low priority, I know, so I have no expectations but can the various (safe) CollectionService methods be whitelisted in the next build? Obviously AddTag and RemoveTag aren’t safe to use but the other ones seem like they’d be safe, at least from an outside perspective.

2 Likes

There’s no intended way to install these, but my recommendation would be to store them inside a folder called “BetaBuilds” located at %LOCALAPPDATA%/Roblox/BetaBuilds.

Of course, it’s up to you where you decide to store beta builds.

2 Likes

Did somebody say parallel voxel terrain generation? 😛

Just unzip it and run it wherever - doesn’t matter 🙂

29 Likes

GetTags, GetTagged and HasTag will be unlocked in the next build, whenever that happens.

6 Likes

Yeah. I’m assuming so.

6 Likes

Sometimes during renders it’ll just randomly stop all threads; one of my cores gets maxed out and the other ones go idle. Here’s a benchmark with me running 2 threads. Shortly after this it crashed.

I also managed to get these weird errors lol

1 Like

Is there a downside to using potentially thousands of actors? In your raytracer demo you used 1 actor per row, why not use 1 actor per pixel?

6 Likes

Likely because if the task is too small, the overhead from setting up and entering into parallel execution for each pixel would be too large for it to give you a meaningful speedup compared to doing it in parallel for each row. You should be able to benchmark this and see what happens.
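To make the granularity trade-off concrete, here is a small sketch in plain Lua that only counts how many dispatches each choice implies. `makeBatches` is a hypothetical helper, and the 320x240 image size is an assumption for illustration, not the size used in the raytracer demo.

```lua
-- Hypothetical helper: split `itemCount` work items into contiguous batches
-- of at most `batchSize` items. Each batch stands for one parallel dispatch.
local function makeBatches(itemCount, batchSize)
	local batches = {}
	for first = 1, itemCount, batchSize do
		local last = math.min(first + batchSize - 1, itemCount)
		table.insert(batches, { first = first, last = last })
	end
	return batches
end

local width, height = 320, 240 -- assumed image size, for illustration only

local perPixel = makeBatches(width * height, 1) -- one dispatch per pixel
local perRow = makeBatches(width * height, width) -- one dispatch per row

print(#perPixel, #perRow) -- 76800 dispatches versus 240
```

Whatever the fixed cost of one dispatch is, the per-pixel split pays it 320 times more often here, so the benchmark suggested above is really a search for the batch size where that cost stops dominating.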

7 Likes

Is there a way to create parallel “units of work” without spamming Actor instances? The system is cool as-is, but I somehow dislike the idea of creating a few dozen Actors and script instances when trying to dispatch a parallelized task. Stuff like the scanner camera or the voxel generation example you mentioned would be a lot cleaner if they could be done in a single script that just dispatches several parallel threads instead.

5 Likes

The script has to be inside an Actor to use that functionality.

1 Like

Yes - that’s correct. Transitioning between actors/coroutines takes some time; we’re likely going to improve this in the future, but in any system there’s some dispatch overhead. When I was writing this code I just did the first thing that was going to work reasonably well 🙂

You still need to use Actors to gain parallel execution. Right now there’s no way to run something in parallel without using Actors, and this is for two reasons:

  1. To allow instance modification in the future, we need to be able to scope code that runs in parallel to a specific hierarchy. Without this, code that runs “outside” of that hierarchy would only ever be able to read the state of the world.

  2. Somewhat more crucially, we can’t run a function outside of the VM it’s created in, and we can’t run multiple VMs in parallel (this is because many high-performance systems inside the VM, such as the garbage collector, don’t tolerate concurrent execution). For this reason, the only interface we’d be able to provide as a first-class API would be pretty awkward - e.g. we’ve discussed APIs like task.run, but they’d need to take a closure without upvalues, which means you wouldn’t be able to refer to locals from the outer scope; that significantly limits the usability and makes the feature more surprising.

This may change in the future. For now we hope that the community will experiment with this and come up with different nicer-to-use libraries for ad-hoc parallelism without us having to bake anything into the language, and use Actors + Scripts for entity-based parallelism where the API is pretty intuitive and natural.
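For readers unfamiliar with the term, here is a small plain-Lua illustration of the upvalue restriction from point 2. The function names are hypothetical, and no task.run call is shown because that API was only discussed, not shipped.

```lua
local scale = 2 -- a local in the outer scope

-- Captures `scale` as an upvalue. A closure like this is what an
-- upvalue-free API would have to reject.
local function scaledWithUpvalue(x)
	return x * scale
end

-- Upvalue-free version: everything it needs arrives through its arguments,
-- so it does not depend on locals from the enclosing scope.
local function scaledPure(x, factor)
	return x * factor
end

print(scaledWithUpvalue(21), scaledPure(21, 2)) -- both compute 42
```

The second form is always possible, but having to thread every outer local through the argument list is the usability cost being described.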

8 Likes