Parallel Luau Developer Preview

Multithreading is an exciting feature for advanced software engineers. It will certainly raise the bar for what users expect from Roblox games: games made by the best developers will reach new heights in functionality and performance.

It also raises a question that has been on my mind since I read this topic.

Multithreading is notoriously tricky; even experienced software engineers make thread-safety mistakes. With that in mind, I think about how many Roblox developers struggle with basics such as the networking relationship between the server and clients, and how even fewer understand its implications well enough to engineer servers that handle client requests securely.

Do you see multithreading becoming an error-prone challenge that new developers will have to struggle with regularly before they can ship reliable games? Do you see average developers often making multithreading mistakes that crash players’ clients, potentially hurting the overall perception of the reliability and quality of Roblox games?

6 Likes

This is probably why the feature was designed the way it is; it could have been more complex. I can already see YouTube tutorials on how to “mUltItHreAd tO maKe yOuR gAmE 10x FaSter” that just make games run worse.

2 Likes

This is an excellent question! This is part of why the API is what it is.

A few notes:

  1. If you can make the Roblox engine crash when using this feature, it’s ALWAYS an engine bug.
  2. We’re trying to establish rules for access from parallel sections that are as non-racy as feasible; for example, the chance that a function call will randomly throw errors under parallel execution is small.
  3. There are going to be cases where the Luau code devs write is racy: that is, scripts see different execution states that aren’t trivial to test for ahead of time. In these cases scripts should error (you see some of that today with network desyncs), and we’re going to look into tools that let you test the game with a simulated number of cores to help there.

But basically, if you can crash the engine by using some API incorrectly from a parallel section, it’s always a bug on our side. By the time this feature goes live, we’ll have a lot of automated test infrastructure (which we haven’t had time to put in place yet) to ensure there are few bugs like that in the engine.

18 Likes

Ok. Overall, I think this API surface is nice (although we’ll see how I scale some other tasks/services for localized actors). Would love to see the design document. :stuck_out_tongue:

I tried to get BindableFunctions to work properly, and it does seem to work, but there’s a lot of overhead.


Small error: it seems like IsDescendantOf() errors despite being whitelisted.

-- LocalScript
print(script:IsDescendantOf("Actor"))

--[[

  18:52:40.897  Unable to cast value to Object  -  Client  -  LocalScript:1
  18:52:40.897  Stack Begin  -  Studio
  18:52:40.897  Script 'Players.Quenty.PlayerGui.ScreenGui.LocalScript', Line 1  -  Studio  -  LocalScript:1
  18:52:40.897  Stack End  -  Studio
--]]

So the big thing I just verified was whether you can use bindable functions to hand off control and then get it back. Yes, you can, but it’s expensive. In most cases it costs 5-6 frames, but sometimes that number spikes up to 12 frames, so this sort of individual task dispatching is quite expensive (up to 200 ms). With desynchronize turned off, it costs more like 16 ms (one frame at 60 FPS) to send 100 requests. Of course, we’re paying a very large overhead for serialization, but apparently the parallel delay adds even more on top of that.
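Assuming a steady 60 Hz frame rate (the post's "1/60"), the frame counts above convert to milliseconds as follows. The helper names are hypothetical and only do the arithmetic: 5-6 frames is about 83-100 ms, the 12-frame spike is 200 ms, which over 100 tasks averages 2 ms per task, versus roughly 0.17 ms per task when 100 requests fit in a single frame.

```lua
-- Hypothetical helpers for checking the timings above; not part of any Roblox API.

-- Convert a frame count to milliseconds at a given frame rate.
-- Multiplying before dividing keeps whole-number results (e.g. 12 frames at 60 Hz) exact.
local function framesToMs(frames, fps)
	return frames * 1000 / fps
end

-- Average cost per task when `count` tasks take `totalMs` altogether.
local function perTaskMs(totalMs, count)
	return totalMs / count
end

print("5-6 frames:", framesToMs(5, 60), framesToMs(6, 60))
print("12-frame spike:", framesToMs(12, 60))
print("per task, spike, 100 tasks:", perTaskMs(framesToMs(12, 60), 100))
print("per task, one frame, 100 tasks:", perTaskMs(framesToMs(1, 60), 100))
```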

I suspect this number will go down once there are more parallel entry points.

Here’s the code I used to test this:

Happy case (seems to avoid delaying at all, which is good), though this does not happen very often:

  19:09:10.527  Executing on Actor_10 with 96 after 0.007881 secs  -  Server  -  TaskFarm:79
  19:09:10.527  Executing on Actor_7 with 82 after 0.008289 secs  -  Server  -  TaskFarm:79
  19:09:10.527  Executing on Actor_15 with 81 after 0.008205 secs  -  Server  -  TaskFarm:79
  19:09:10.527  Executing on Actor_23 with 89 after 0.008577 secs  -  Server  -  TaskFarm:79
  19:09:10.527  Executing on Actor_7 with 94 after 0.008716 secs  -  Server  -  TaskFarm:79
  19:09:10.620  Done executing all tasks 0.10087809996912  -  Server  -  TaskFarm:75

In general, resolving back to the final result is the problem.

If you’re spinning up your task farm ad hoc, it can be a lot worse: in some cases it takes an additional 5-6 frames after the last execution point to finish everything:

  19:04:46.981  Executing on Actor_5 with 73 after 0.081986 secs  -  Server  -  TaskFarm:79
  19:04:46.981  Executing on Actor_21 with 77 after 0.082116 secs  -  Server  -  TaskFarm:79
  19:04:46.981  Executing on Actor_5 with 78 after 0.082221 secs  -  Server  -  TaskFarm:79
  19:04:46.981  Executing on Actor_21 with 99 after 0.082325 secs  -  Server  -  TaskFarm:79
  19:04:47.087  Done executing all tasks 0.18825740000466  -  Server  -  TaskFarm:75
  19:08:06.658  Executing on Actor_1 with 55 after 0.100157 secs  -  Server  -  TaskFarm:79
  19:08:06.658  Executing on Actor_12 with 49 after 0.100198 secs  -  Server  -  TaskFarm:79
  19:08:06.658  Executing on Actor_1 with 100 after 0.100253 secs  -  Server  -  TaskFarm:79
  19:08:06.658  Executing on Actor_12 with 53 after 0.100343 secs  -  Server  -  TaskFarm:79
  19:08:06.658  Executing on Actor_12 with 54 after 0.100495 secs  -  Server  -  TaskFarm:79
  19:08:06.761  Done executing all tasks 0.20351199997822  -  Server  -  TaskFarm:75
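Gathering the final result is essentially a join: count how many tasks are still outstanding and resolve once the last one reports. The plain-Lua sketch below shows only that bookkeeping, and `newJoin` is a hypothetical name. Scripts under different Actors can't share an ordinary Lua table, so in Roblox the results would arrive through something like a BindableEvent handled by a coordinating script; that transport is left out. It also does not remove the extra frames measured above.

```lua
-- Hypothetical join helper: calls onAllDone(results) exactly once,
-- after `count` tasks have each reported a result by index.
local function newJoin(count, onAllDone)
	local results = {}
	local reported = {}
	local remaining = count

	-- Nothing to wait for: resolve immediately.
	if remaining == 0 then
		onAllDone(results)
	end

	-- Each task calls the returned reporter once with its index and result.
	return function(index, value)
		assert(remaining > 0, "all tasks already reported")
		assert(not reported[index], "task " .. tostring(index) .. " reported twice")
		reported[index] = true
		results[index] = value
		remaining = remaining - 1
		if remaining == 0 then
			onAllDone(results)
		end
	end
end

-- Usage: results may arrive in any order.
local report = newJoin(3, function(results)
	print("all done:", results[1], results[2], results[3])
end)
report(2, "b")
report(1, "a")
report(3, "c")
```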
2 Likes

IsDescendantOf takes an Instance (… our error message here could be improved)

10 Likes

Oh whoops. :FindFirstAncestorWhichIsA("Actor") is what I was looking for… -_-
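The difference comes down to each method's argument: IsDescendantOf wants the ancestor instance itself, while FindFirstAncestorWhichIsA wants a class name. In Studio that means script:FindFirstAncestorWhichIsA("Actor"), or script:IsDescendantOf(actor) once you already hold the Actor. The plain-Lua model below only illustrates those semantics using hypothetical table "instances"; it is not how the engine implements either method, and its class check compares ClassName exactly, whereas the real IsA also matches base classes.

```lua
-- Simplified model of two Roblox Instance methods (not the engine's code).
-- "Instances" here are hypothetical tables with ClassName and Parent fields.

-- Like IsDescendantOf: the argument is an ancestor *instance*, not a string.
local function isDescendantOf(inst, ancestor)
	local node = inst.Parent
	while node do
		if node == ancestor then
			return true
		end
		node = node.Parent
	end
	return false
end

-- Like FindFirstAncestorWhichIsA: the argument is a class name; returns the nearest match.
-- Real IsA also accepts base classes; this model compares ClassName only.
local function findFirstAncestorWhichIsA(inst, className)
	local node = inst.Parent
	while node do
		if node.ClassName == className then
			return node
		end
		node = node.Parent
	end
	return nil
end

local actor = { ClassName = "Actor", Name = "Actor_1" }
local folder = { ClassName = "Folder", Parent = actor }
local myScript = { ClassName = "LocalScript", Parent = folder }

print(findFirstAncestorWhichIsA(myScript, "Actor") == actor)
print(isDescendantOf(myScript, actor))
```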

Overall, this API is really nice (except that I haven’t found a good way to pragmatically separate out my execution contexts yet).

I think this may force me to reconsider how I’m programming things again. I’m already shifting toward a scripts-attached-per-actor approach, but it’ll take some more interesting code to let execution happen close to the instances. I think code written this way may scale much further in the future, which could be neat.

I really see a possible world in which Roblox’s servers are treated as one, and the world is infinite. Could be very cool.

Overall, I’m really glad this implementation avoids all of the pitfalls that a normal parallel implementation would have. It was either something like this or some sort of Rust-style semantics, and I don’t think Rust semantics really work here, because Lua is such a loosely typed language.

12 Likes

Wow, I’m incredibly happy to see how user-friendly this is. I really like the design & that the engine will distribute the load as it sees fit. I think every game I have has many use-cases for this. Thanks for all the work on this!

12 Likes

I’m working on a project right now that requires semi-heavy mathematical computations every frame. I hope this is released onto the public build sometime in Q2 of 2021. This will definitely push the bar much higher for the quality of games on Roblox!

5 Likes

Definitely exciting, and I’d love to play around with it; too bad the preview version of Studio crashes any time I try to run multithreaded code.

2 Likes

I tried to make a quick Perlin-noise blocky terrain generator with this, and Studio crashes instantly, with no warning, when I run it. There’s a good chance I’m simply not using it right, and also a good chance there’s an actual bug, so here’s the entire place file. Hopefully this will be of some help :slightly_smiling_face:

p.s. This is really cool, I’m excited for this to be eventually fully released!

PARALLELLUAU_mapgenerator.rbxl (22.3 KB)

1 Like

I’m not exactly sure what the architecture of a game server looks like. Is this something that will benefit server scripts, or is it more of a client-side thing? Do game servers have (now or in the future) multiple threads we can take advantage of with this?

6 Likes

This looks very interesting indeed. Would it be possible to take advantage of this in smooth terrain generation at all, or do the API restrictions make it unsafe?

A few years ago I messed with making an infinite chunk version of the terrain generator found in Roblox’s built-in terrain tools, but it was far too slow and memory heavy to make use of in a production game. Granted, I probably used some poor multi-threading practices at the time.

10 Likes

Second this.

I sincerely hope that this becomes a core part of the terrain editor tools, especially generation. It’s not fun waiting ages for huge terrain to generate.

7 Likes

Here’s what you can do today:

  • Run all the complex terrain generation logic, including creating the necessary 3D arrays, in the parallel section
  • Call WriteVoxels after task.synchronize. It’s relatively cheap.

We plan to whitelist both ReadVoxels and WriteVoxels for parallel execution in the future, but WriteVoxels needs some internal synchronization to make it safe so it’s not a completely trivial change.
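The compute-in-parallel, write-in-serial split described above can be sketched as a pure function that builds the arrays Terrain:WriteVoxels takes. This is a sketch under assumptions: buildVoxelArrays and the heightAt callback are hypothetical names, the Roblox-only wiring is shown as comments so the block also runs in plain Lua, and the material arguments would be Enum.Material values in Studio.

```lua
-- Hypothetical helper: build the [x][y][z] material and occupancy arrays that
-- Terrain:WriteVoxels takes, one entry per 4-stud voxel. It only creates plain
-- tables, so it is the kind of work that can run in a parallel section.
--   heightAt(x, z) -> surface height in voxel cells (hypothetical callback)
--   solid, air     -> materials to write (Enum.Material values in Roblox)
local function buildVoxelArrays(sizeX, sizeY, sizeZ, heightAt, solid, air)
	local materials, occupancy = {}, {}
	for x = 1, sizeX do
		-- Sample each column's height once, not once per cell.
		local heights = {}
		for z = 1, sizeZ do
			heights[z] = heightAt(x, z)
		end
		materials[x], occupancy[x] = {}, {}
		for y = 1, sizeY do
			materials[x][y], occupancy[x][y] = {}, {}
			for z = 1, sizeZ do
				-- Cell y spans heights [y - 1, y]; fill it by the fraction below the surface.
				local fill = math.max(0, math.min(1, heights[z] - (y - 1)))
				occupancy[x][y][z] = fill
				materials[x][y][z] = fill > 0 and solid or air
			end
		end
	end
	return materials, occupancy
end

-- Plain-Lua check: a flat surface 1.5 cells high, strings standing in for materials.
local materials, occupancy = buildVoxelArrays(2, 3, 2, function() return 1.5 end, "Grass", "Air")
print(occupancy[1][1][1], occupancy[1][2][1], occupancy[1][3][1], materials[1][3][1])

-- Roblox-only wiring (Script parented under an Actor; heightAt is your own noise function):
--   task.desynchronize()  -- parallel: build the arrays
--   local mats, occ = buildVoxelArrays(16, 16, 16, heightAt, Enum.Material.Grass, Enum.Material.Air)
--   task.synchronize()    -- serial: DataModel writes are allowed again
--   local region = Region3.new(Vector3.new(0, 0, 0), Vector3.new(64, 64, 64)) -- 16 voxels * 4 studs
--   workspace.Terrain:WriteVoxels(region, 4, mats, occ)
```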

7 Likes


It doesn’t open.

4 Likes

I’m not a Mac user, but on Windows, clicking the little question mark or "show more" icon usually gives you an option to override the warning.

1 Like

That just links me to a support page about malware.

1 Like

This might be off-topic for the thread, so it could get deleted, but I found this:

I’m not a security expert, but from my understanding this is a dev build, so it doesn’t have a security certificate.

2 Likes

I converted your raytracing example into a simple path tracer. It runs 4 times faster than my old single-threaded path tracer, and 10 times faster compared with the pre-Luau optimizations. These renders took about a minute each.



Rendering at 1 sample per pixel, it runs at around 3-4 fps at 200x200 resolution.

Edit: Here’s a more difficult render, with mostly indirect light and a higher resolution; it took about 2 hours to render.
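One common way to spread a render like this across Actors is to give each Actor a contiguous band of rows. For scale, 200x200 at 1 sample per pixel is 40,000 samples per frame, so 3-4 fps is roughly 120,000-160,000 samples per second. splitRows is a hypothetical helper, not taken from the original example; it only computes the bands, and each Actor would then render its band in a parallel section and hand the pixels back to a coordinating script.

```lua
-- Hypothetical helper: split rows 1..height into `workers` contiguous bands
-- whose sizes differ by at most one row. Empty bands are skipped.
local function splitRows(height, workers)
	local ranges = {}
	local base = math.floor(height / workers)
	local extra = height % workers -- the first `extra` bands get one extra row
	local first = 1
	for i = 1, workers do
		local size = base + (i <= extra and 1 or 0)
		if size > 0 then
			ranges[#ranges + 1] = { first = first, last = first + size - 1 }
		end
		first = first + size
	end
	return ranges
end

-- Usage: a 200-row image shared by 8 Actors.
for i, band in ipairs(splitRows(200, 8)) do
	print(i, band.first, band.last)
end
```

Contiguous bands keep each Actor's writes to one region of the output, but if some rows are much more expensive (for example, mostly indirect light), interleaving rows or using more bands than Actors may balance the load better.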

39 Likes

I want to first say thank you all for this update/preview! I’ve been working on a navigation mesh generator, and I will finally be able to implement parallel processing and make it far more performant!

15 Likes