Parallel Luau Developer Preview

plasma_node · December 15, 2020, 7:57am

@zeuxcg Studio simulation isn’t running. Is this a memory leak? Lol.

Same place you posted for raytracer except I added some lighting effects.

I also found an unrelated crash.

dogwarrior24 · December 15, 2020, 8:35am

Will there eventually be a way to separate code entirely from the main thread, so as not to slow down single-threaded workloads being performed there?

Also, is pathfinding going to be enabled for multi-threading?

VortexColor · December 15, 2020, 11:14am

Yes, finally. Been waiting for multithreading for a long time. Hope it won’t be too different a workflow.

metatablecatmaid · December 15, 2020, 11:23am

I’m a bit confused about this. Does this mean that if I create 64 actors on a 4-core system, 16 actors end up on each core, or something else?

buildthomas · December 15, 2020, 11:35am

He’s implying you should not target specific core counts or hardware, just create however many parallel tasks you need logically. The engine takes care of spreading it over cores (or not) depending on internal heuristics. Hardware that only has room to run single-threaded Lua will run your 64 actors sequentially and better hardware might run 1 actor per core, or anything in between.
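To make that concrete, here is a rough sketch in Python rather than Luau. It is not how the engine schedules Actors; it only shows the idea that the same 64 logical tasks give the same results whether the pool behind them has one worker or several. `actor_step` and `run_actors` are made-up names, and Python threads stand in for whatever the engine does internally.

```python
from concurrent.futures import ThreadPoolExecutor


def actor_step(actor_id):
    # Stand-in for the work one logical task does; a pure function of its input.
    return actor_id * actor_id


def run_actors(actor_count, workers=None):
    # workers=None lets the pool pick a size based on the machine it runs on,
    # which mirrors "don't target a specific core count".
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, however the work is spread.
        return list(pool.map(actor_step, range(actor_count)))


sequential = run_actors(64, workers=1)  # hardware with room for one thread
spread = run_actors(64, workers=4)      # e.g. a 4-core machine
automatic = run_actors(64)              # let the pool decide
```

The calling code never mentions a core count except in the two explicit comparison runs; the logical task count (64) is the only thing it needs to know.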

bobbybob2131 · December 15, 2020, 11:37am

This is great news and will raise the bar significantly. My only worry is that this seems very complex. Will games have to move to this new system with Actors, or is that only if they want multithreading?

Maastrophy · December 15, 2020, 11:46am

The general rule is that a new feature is optional unless explicitly specified otherwise. This will be optional.

Yuuwa0519 · December 15, 2020, 12:44pm

Ah yes, time to run into many race conditions whilst implementing this into my game because I have absolutely no experience with parallel execution XD. However, I am totally ready to tackle this feature and get used to it.

A few questions I have:
Since an Actor instance is required for each logical unit of parallel work, does that mean single-script architectures such as AeroGameFramework will not be able to use the full functionality?

My other question is not related to technical aspects, but I am wondering why the function is named “task.desynchronize” instead of “task.desync”? Although autocomplete exists, it still feels like a lot to type out.

Xan_TheDragon · December 15, 2020, 1:08pm

I’m glad the Raycast method is usable! A number of people have been ~~pestering~~ politely asking me to implement parallel Luau into FastCast to allow more projectiles to simulate at once, and I plan on doing just that. I anticipate great performance gains from being able to cluster various ongoing raycasts onto different cores so that only a fraction of the work is done on any single core.

plasma_node · December 15, 2020, 1:45pm

Does this apply to very large numbers of tasks?

I am trying to determine how I will work with this system.

Let’s say I want to simulate over 1,000 NPCs and constantly update them. Also let’s ignore other methods to optimize them such as only doing a certain amount each second/frame. In this case we’d basically be running 1,000 tasks in parallel at the same time.

Does this mean it would be better to create 1,000 actors or tasks, instead of using say 10 actors each doing 100 NPCs?

It seems the only potential benefit is for someone with a high number of cores.

If that is the case then I think I’d rather add some abstraction and go with the former because I could also be potentially running other intensive tasks that I’d like to prioritize.

Isocortex · December 15, 2020, 1:49pm

Super excited by this one, nice job!

I appreciate you’re still actively working on this but do you have a timeline with regard to release? Have quite a few cool ideas I’d love to throw into some projects.

Also, assuming the internal tech spec doesn’t include sensitive / trade info, any chance that’ll be released to the community? I would love to read it to get a deeper understanding of RBLX.

imalex4 · December 15, 2020, 3:04pm

In regards to module scripts, I hope that I will be able to run some functions in parallel from inside the module script while keeping the rest of the module script global. (In this hypothetical case, the script requiring the module script would not be inside an Actor or otherwise used in parallel.) Though, perhaps there could be an issue with module variables which are accessed from the parallel code and accessed from the serial code (or maybe not, I’m thinking with the general C mutex mindset where you need to lock things down).

Throddy · December 15, 2020, 3:06pm

I am super excited by this update, can’t wait to start using it.

Elttob · December 15, 2020, 3:56pm

Alright - finally been able to play around with this thing. I’m hyped!

I managed to get a simple Lua-based API working for a job-based paradigm:

The idea is pretty simple: you can pass in pure functions with a set of arguments, and they’ll be run in parallel as fast as possible. The system should be easily expandable to let you return values from those functions too, if you’re looking to use them to run computations.
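For readers who want the shape of such an API spelled out, here is a minimal sketch in Python. This is not the Lua API from this post; `JobRunner`, `add`, `run`, and `height_at` are hypothetical names, and Python threads are only a stand-in for parallel execution. It queues pure functions with their arguments, runs them on a pool, and returns their results in queue order.

```python
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Hypothetical job runner: queue (function, args) pairs, then run them all."""

    def __init__(self, workers=4):
        self._workers = workers
        self._jobs = []

    def add(self, fn, *args):
        # Jobs should be pure: no shared state, output depends only on args.
        self._jobs.append((fn, args))

    def run(self):
        # Take the queued jobs, run them on the pool, return results in queue order.
        jobs, self._jobs = self._jobs, []
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            futures = [pool.submit(fn, *args) for fn, args in jobs]
            return [f.result() for f in futures]


def height_at(x, z):
    # Toy stand-in for a terrain calculation.
    return (x * 31 + z * 17) % 10


runner = JobRunner()
for x in range(4):
    runner.add(height_at, x, 0)
heights = runner.run()
```

Because the jobs share nothing, the order in which the pool happens to execute them does not matter; only the order of the returned list is fixed.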

I’ll most likely end up using a refined version of this in Blox to run terrain generation and greedy meshing calculations.

zeuxcg · December 15, 2020, 4:46pm

Thanks for the report! As mentioned previously, any engine crash is a bug even if you’re using multithreading. You were writing to a property that isn’t supposed to be writeable, but our mechanism for preventing this had a typo in the check that guards against incorrect usage of the API. This will be fixed in the next build later this week.

zeuxcg · December 15, 2020, 4:51pm

1000 actors would make sense here. In general we see two possible patterns for use of this system:

- Using Actors for logical entities in the game, and scripts inside these actors to control the behavior (of course they can share the logic either through the package mechanism or by calling out to a shared ModuleScript).
- Using Actors to create a job distribution system, and simply calling out to some scripts that process large sets of objects.

Today both could work. In the future we’re going to start using Actor hierarchy to selectively enable writes - for example, you’d be able to move the parts of the NPC from the parallel section but only if the script and parts are both part of the same Actor. So when you already have a notion of an entity we’d recommend creating an Actor per entity.
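As a rough illustration of how the two patterns differ in granularity, here is a sketch in Python rather than Luau. `update_npc`, `update_batch`, and `chunk` are hypothetical helpers, and a thread pool stands in for Actors; it does not model the Actor hierarchy or the selective writes mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def update_npc(npc_id):
    # Toy per-NPC update; returns the NPC's new state.
    return npc_id + 1


def update_batch(npc_ids):
    # One job that walks a whole batch of NPCs sequentially.
    return [update_npc(i) for i in npc_ids]


def chunk(items, size):
    # Split a list into consecutive slices of at most `size` items.
    return [items[i:i + size] for i in range(0, len(items), size)]


npcs = list(range(1000))

with ThreadPoolExecutor() as pool:
    # Pattern 1: one task per entity (1,000 tasks).
    per_entity = list(pool.map(update_npc, npcs))
    # Pattern 2: job distribution (10 jobs of 100 NPCs each).
    batches = chunk(npcs, 100)
    per_batch = [state for result in pool.map(update_batch, batches) for state in result]
```

Both approaches compute the same states here. The difference is that pattern 1 keeps each entity as its own unit, which is what the planned per-Actor write permissions would build on.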

Kironte · December 15, 2020, 5:40pm

If multithreading allows the client to use more of their CPU and the thread count depends on the CPU, what about the server? How many threads will the server be able to effectively use?
I know we shouldn’t aim to use a certain number of cores but it would be good to know the performance difference server-side.

zeuxcg · December 15, 2020, 5:43pm

This is somewhat separate from this release, but we plan to have official documentation on the core count for the server, which will grow based on the number of players the server supports and the historical CPU usage. This will affect both internal engine systems (a lot of them became multithreaded over the last year or so) and the parallel script execution. Expect more announcements on this subject next year; for now this is all the information we have.

Kironte · December 15, 2020, 5:44pm

Awesome! Looking forward to this; parallel processing would be exceptional for neural network training.

zeuxcg · December 15, 2020, 6:17pm

Thanks for the feedback everyone! We’ve uploaded a new build (links in the post updated) with fixes based on the initial reports and internal testing.

Changes:

- Property writes are now correctly disallowed in parallel code
- Improve performance of function calls and property accesses in parallel code
- Whenever a property or function can’t be used, the error message now includes the name of the property/function
- debug.profilebegin/profileend are currently not safe to use from parallel code; this will change in the future, but they error in parallel code for now
- Terrain.ReadVoxels is now white-listed for use from parallel code
- Fix rare race conditions in parallel execution when the threads didn’t run for very long