Parallel Luau Developer Preview

zeuxcg · December 15, 2020, 1:06am

We’re excited to share a developer preview release of the Parallel Luau project! Thanks to @EthicalRobot and @machinamentum for working on this (@zeuxcg also helped but people seem to frown on third person references so he decided he shouldn’t mention it).

Just like Future Is Bright and Avatar Evolution from prior years, this is a developer preview release. It needs more polish before being ready to become a Studio Beta, and definitely more work before becoming a production feature, but we wanted to share the build with y’all before the end of the year. We’re excited to see feedback on what works, what doesn’t, and what could be improved!

For the impatient, you can download the builds here (updated 12/16/2020):

Windows build (179 MB)
macOS build (166 MB)

And you may want to check out this neat parallel ray tracer example: raytracer.rbxl (the script is in StarterGui/ScreenGui/Tracer)

Please note that by downloading these builds you agree to the limited-use license terms that are standard for our developer preview builds.

It’s important to realize that this build doesn’t magically make your code run in parallel. We have a plan that involves introducing a new programming model that is friendly to parallelism; it works really well once you get used to it but may require a bit of a transition. We will have tutorials and documentation for this when the feature leaves the preview stage, but for now you’ll have to make do with the rest of the post.

Actors

This release adds a new instance type, Actor; scripts that are located under Actor instances in the hierarchy gain the capability for parallel execution, although by default the code still runs on a single thread.

Actor objects are necessary as they will become units of execution isolation in the future; e.g. in the initial releases all functionality to modify the instances is going to be locked from the parallel execution, but we’re going to unlock mutation for the Actor subtrees in the future for the scripts that reside under them.

Actor inherits from Model, so you should be able to replace the top-level instance type for your cars / NPCs / other 3D entities with Actor with no changes in the scripts (but see caveat about stateful ModuleScripts later).

It’s important to mention that scripts that are part of the same Actor always execute sequentially with respect to each other, so parallelism only comes from having multiple Actors. For example, an NPC is probably a good candidate to become an Actor.

As a side note, Actors are the units of parallel execution but we recommend creating them based on logical units of work. For example, if you want to generate voxel terrain in parallel, it’s totally reasonable to use 64 Actors or more instead of just 4 even if you’re targeting 4-core systems. This is valuable for scalability of the system and allows us to distribute the work based on the capability of the underlying hardware.
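To make the "logical units of work" idea concrete, here is a minimal sketch that splits a square grid into 64 chunk descriptors. The helper name `splitIntoChunks` and the grid and chunk sizes are made up for illustration; nothing here is part of the preview API.

```lua
-- Hypothetical helper (not an engine API): split a square grid into chunk
-- descriptors, one per logical unit of work.
local function splitIntoChunks(gridSize, chunkSize)
    local chunks = {}
    for x = 0, gridSize - 1, chunkSize do
        for z = 0, gridSize - 1, chunkSize do
            table.insert(chunks, {
                x = x,
                z = z,
                -- Edge chunks are clipped when gridSize is not a multiple of chunkSize.
                width = math.min(chunkSize, gridSize - x),
                depth = math.min(chunkSize, gridSize - z),
            })
        end
    end
    return chunks
end

local chunks = splitIntoChunks(256, 32)
print(#chunks) -- 64
```

Each descriptor could then be handed to its own Actor, for example by cloning an Actor template per chunk in serial code. That wiring depends on your project layout and is left out of the sketch.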

Parallel (“desynchronized”) execution

Each script still runs serially by default, but scripts running inside Actors can switch to run in parallel by calling the task.desynchronize function. This function is yieldable - it suspends execution of the current coroutine and resumes it at the next parallel execution opportunity.

It’s important to understand that regions of parallel execution run scripts that belong to different actors in parallel, but wait for the parallel sections to finish executing before proceeding with serial execution. In other words, to take advantage of this feature you can’t run a very long computation that takes seconds in parallel to the rest of the simulation - you have to break it into small pieces, but you can run these pieces on multiple cores. Your mental model should be “let me run updates for 1000 NPCs in parallel with each update potentially running on a separate core” instead of “let me run this really slow function that sequentially updates all NPC state in parallel to the rest of the world processing”.

During parallel execution, access to the Instance hierarchy is restricted. You should be able to read most objects of the hierarchy as usual, with the exception of some properties that aren’t safe to read:

GuiBase2d.AbsolutePosition
GuiBase2d.AbsoluteSize
ScrollingFrame.AbsoluteWindowSize
UIGridLayout.AbsoluteCellCount
UIGridLayout.AbsoluteCellSize
UIGridStyleLayout.AbsoluteContentSize

(note: there may be other properties that we haven’t identified yet as “unsafe to read in parallel” and reading some of them may crash Studio; we’re going to refine the list of properties that aren’t safe and expose this as part of the API dump in the future)

You can’t modify any properties at this time. In future releases we’re going to unlock the ability to change properties selectively as long as the instance is part of the same Actor’s hierarchy. To be able to perform mutation on the hierarchy, you must switch back to the serial (“synchronized”) execution; you can do this by calling task.synchronize, which will suspend execution of the current coroutine and resume it at the next serial execution opportunity.
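As a rough sketch of the read-in-parallel, write-in-serial split described above: the function below reads attributes and computes in the parallel phase, then switches back to serial before mutating. The attribute names "Health" and "MaxHealth", the regeneration rate, and both function names are assumptions made up for this example, and it assumes the calling Script sits under an Actor.

```lua
-- Pure computation: touches no Instances, so it can run in the parallel phase.
local function regenerate(health, maxHealth, ratePerSecond, dt)
    return math.min(maxHealth, health + ratePerSecond * dt)
end

-- Hypothetical update for one Actor. "Health" and "MaxHealth" are made-up
-- attribute names used only for this sketch.
local function updateHealth(actor, dt)
    task.desynchronize() -- yields; resumes at the next parallel opportunity
    local health = actor:GetAttribute("Health") -- read only
    local maxHealth = actor:GetAttribute("MaxHealth") -- read only
    local newHealth = regenerate(health, maxHealth, 5, dt)

    task.synchronize() -- yields; resumes at the next serial opportunity
    actor:SetAttribute("Health", newHealth) -- mutation, so serial only
    return newHealth
end
```

From a Script parented under an Actor you might call `updateHealth(script.Parent, dt)` once per update. Keep in mind that both task calls yield, so the function does not return within the same slice of the frame it was called in.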

Methods exposed on Instances are safe to call only if they have been explicitly whitelisted (because many of them perform mutation of the hierarchy). In this release we’ve whitelisted the following methods; the method status wrt thread safety will be exposed in the API dump in the future, and this list will expand over time:

Instance.IsA
Instance.FindFirstChild
Instance.FindFirstChildOfClass
Instance.FindFirstChildWhichIsA
Instance.FindFirstAncestor
Instance.FindFirstAncestorOfClass
Instance.FindFirstAncestorWhichIsA
Instance.GetAttribute
Instance.GetAttributes
Instance.GetChildren
Instance.GetDescendants
Instance.GetFullName
Instance.IsDescendantOf
Instance.IsAncestorOf
Part.GetConnectedParts
Part.GetJoints
Part.GetRootPart
Part.GetMass
Part.IsGrounded
CollectionService.GetTagged
CollectionService.GetTag
CollectionService.HasTag
Workspace.FindPartsInRegion3
Workspace.Raycast
Terrain.ReadVoxels
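For illustration, here is a small helper that restricts itself to methods from the list above (GetDescendants, IsA, GetAttribute), so it should be callable from a parallel section in this preview. The function name `sumAttribute` and the idea of summing a numeric attribute are assumptions for the sake of the example.

```lua
-- Sums a numeric attribute over all descendants of a given class.
-- Only read-only, whitelisted methods are called; nothing is mutated.
local function sumAttribute(root, className, attributeName)
    local total = 0
    for _, instance in ipairs(root:GetDescendants()) do
        if instance:IsA(className) then
            local value = instance:GetAttribute(attributeName)
            -- Skip instances where the attribute is missing or not a number.
            if type(value) == "number" then
                total = total + value
            end
        end
    end
    return total
end
```

A call such as `sumAttribute(script.Parent, "BasePart", "Weight")` (with "Weight" being a hypothetical attribute) only reads the hierarchy; anything that writes the result back to an Instance would still have to wait for the serial phase.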

RBXScriptSignal:ConnectParallel

Instead of using task.desynchronize in signals, you can use a new ConnectParallel method. This method will run your code in parallel when that signal is triggered, which is more efficient than using Connect + task.desynchronize.

A common pattern for parallel execution that we expect to see is:

local RunService = game:GetService("RunService")

RunService.Heartbeat:ConnectParallel(function()
    -- ... some parallel code that computes a state update
    task.synchronize()
    -- ... some serial code that changes the state of instances
end)

Initially you should expect to have to put all of the code that changes Instance properties in the serial portion of the update, with future releases allowing you to move more code from the serial to parallel portion.

ModuleScripts

Scripts that run in the same Actor are running in the same Luau VM, but scripts that run in different actors may run in different VMs. You can’t control the allocation of Actors to VMs or the total number of VMs - it depends on the number of cores the processor has and some other internal parameters.

When you require a ModuleScript from a Script inside the Actor, the script is going to get loaded (& cached) in every VM it’s needed in. This means that if your ModuleScript has mutable state, this state will not be global to your game anymore - it will be global to a VM, and there may be multiple VMs at play.

We encourage use of ModuleScripts that don’t contain global state. In the future we’re going to provide a shared storage that will be thread-safe so that games that use parallel execution can use it to store truly global state, as well as ways to communicate between scripts safely using messages, but for now you should be aware of this gotcha.
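The gotcha can be mimicked in plain Lua. This is only an analogy, not engine behavior: each call to the made-up `loadCounterModule` function stands in for one VM loading and caching its own copy of a ModuleScript.

```lua
-- Plain-Lua analogy: one call = one VM loading its own copy of the module.
local function loadCounterModule()
    local count = 0 -- mutable module-level state: one copy per "VM"
    local Counter = {}
    function Counter.increment()
        count = count + 1
        return count
    end
    return Counter
end

local vmA = loadCounterModule()
local vmB = loadCounterModule()
vmA.increment()
vmA.increment()
print(vmB.increment()) -- 1, not 3: vmB never saw vmA's increments

-- A stateless alternative: the caller owns the state and passes it in.
local StatelessCounter = {}
function StatelessCounter.increment(count)
    return count + 1
end
print(StatelessCounter.increment(2)) -- 3
```

The stateless version behaves the same no matter which VM loaded it, which is why modules of that shape are the safer choice for code required from inside Actors.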

Debugger

… won’t work on scripts inside Actors in this release. This is why this is a developer preview. Some other parts of functionality may be disabled or unstable - please report issues with this release via DevForum.

We’re eager to hear your feedback after you’ve tried to use this a bit! Please note that this may not match your existing model of how threading could work - this is not WebWorkers, and this is not “oh I know let me just create a thread”. Trust us, there are deep and profound reasons why these models didn’t work for us; they are documented in an internal 26-page technical specification for this feature that we’re slowly building the implementation of.

This being a developer preview you should expect some features to be broken, some amount of stability issues, and some features to just be lacking. For the beta release we’re planning to address the stability issues, add a way to communicate data between scripts that survives the VM separation and is thread-safe, expose more thread-safe methods and generally improve on the feedback that we get from this.

zeuxcg · December 15, 2020, 1:30am

… for those of you that have too much CPU power, you can switch N in the raytracer.rbxl script from 50 to 150 and enjoy a crisper image. Yes, the fact that we aren’t saturating all 12 [logical] cores here is unacceptable and we’ll work on ourselves, but part of the frame here is physics simulation and rendering, and some of these systems aren’t internally threaded to the max yet.

Rocky28447 · December 15, 2020, 2:01am

So does this mean that state managers like Rodux are a no-go in parallel luau for now?

Quenty · December 15, 2020, 2:02am

This looks super cool. I really like the way the actors and task systems function. The VM solution is especially nice for avoiding blowing up the memory consumption of this system, while still allowing for modules.

Is there documentation on the execution timing that parallel threads can execute on? I’m wondering what places it’s safe to push to parallel, and avoid a frame of lag. Specifically, are these acceptable entry points for input:

Stepped
Heartbeat
RemoteEvent/RemoteFunction invocation stacks

Or does a parallel execution occur after every possible lua execution block? This would be especially useful, but could create lots of reentrance. For specifics, does this deadlock?

Additionally, I’m about to test this, but one question I had is if your code is hosted underneath the main thread, will task calls still error? In this case, the issue is basically writing thread-agnostic code that I just want to push into the parallel execution code. I’m assuming this is not the case.

Looking forward to more of this–looks like maybe this can scale Roblox across multiple servers, to create some truly massive experiences. I’m excited to see what this will bring!

Elttob · December 15, 2020, 2:02am

This is so cool! (also, was that voxel terrain gen example aimed at me?)

Question tho: can you create instances in parallel if they’re outside the data model? It’d be cool if we could parallelise creating large numbers of instances (for science, of course!)

Dekkonot · December 15, 2020, 2:11am

As predicted, I was 300% not using this right. Suppose that’s what I get for messing with features that don’t have any documentation!

I’m sort of struggling to come up with any projects that might actually benefit from being parallelized, but I think my serializer might be a good contender? A bunch of string manipulation in a loop seems like a good candidate anyway.

Also, finally, Roblox can run Doom at 60fps… What we’ve always wanted.

Quenty · December 15, 2020, 2:14am

Ok, so it looks like the desynchronization time is something like 1/60 of a second:

local startTime = os.clock()
while true do
	task.desynchronize()
	
	print("time to desync", os.clock() - startTime)
	startTime = os.clock()
	task.synchronize()
	
	print("time to resync", os.clock() - startTime)
end

  18:11:06.350  time to desync 0.016767400025856  -  Client  -  LocalScript:5
  18:11:06.350  time to resync 0.00017469999147579  -  Client  -  LocalScript:9
  18:11:06.369  time to desync 0.018455900019035  -  Client  -  LocalScript:5
  18:11:06.369  time to resync 9.8699994850904e-05  -  Client  -  LocalScript:9
  18:11:06.386  time to desync 0.016901900002267  -  Client  -  LocalScript:5
  18:11:06.386  time to resync 3.399996785447e-05  -  Client  -  LocalScript:9
  18:11:06.406  time to desync 0.020426300005056  -  Client  -  LocalScript:5
...

This is especially interesting, because it means that :ConnectParallel() is actually really important to avoid frame-delays. I’m assuming if we connect to inputs and other properties in parallel, maybe we avoid the frame-delay.

I think the way this works is that, if you don’t have an actor, it still executes the code in parallel, but sort of parallel-under-a-global actor.

I’m going to see if BindableEvents/Functions operate as a safe interop boundary for this sort of thing. My guess is “yes”! I think if this is the case, we’ll start having a lot more interesting interop/source-of-truth designs coming up.

Super exciting stuff.

Edit: Looks like maybe ContextActionService is about to have a wrapper, there’s no way to connect something in parallel on it.

That being said, ContextActionService is pretty global, so I don’t think you can avoid synchronizing your thread from a global perspective.

drager980 · December 15, 2020, 2:30am

This crosses VMs, right?
Does the operation critically improve over a task where you might just mindlessly use a BindableEvent/Function and make a thread, or should we be more cautious?

Autterfly · December 15, 2020, 2:32am

Adding onto this, I would like to know what the expected design patterns will be for exchanging data from normal code to parallel code and between different parallel actors.

Since parallel code will require synchronizing before writing to the data model, how would this affect behavior such as BindableEvents being able to pass and receive functions created by different actors?

Quenty · December 15, 2020, 2:33am

It looks like right now connecting to user input in parallel results in 2 frames of input lag, unless I’m misinterpreting these results. So responding to user input will probably always be delayed by a frame if you need to do heavy computation with synchronous output.

:Connect() event occurs normally
RenderStepped occurs
:Connect(), task.desynchronize() executes at the same time as :ConnectParallel(), but ConnectParallel() prevents a second closure from running synchronously, so that’s slightly better.
RenderStepped occurs
Synchronization of previously desynchronized code runs now.
RenderStepped occurs.

Is there a chance we can get this reduced down to 1 frame of input lag? I’m not really sure if I’m doing something wrong, or if this is intended.

While a frame of lateness is acceptable for certain things (like gun raycasting and whatnot), it seems that any expensive interactions I’d want should not be delayed by effectively 33ms, which is way too much input lag in my opinion.

  18:26:13.517  Running synchronious (Parallel) 1607970372.6385  -  Client  -  LocalScript:12
  18:26:13.518  RenderStepped occured 1607970372.6397  -  Client  -  LocalScript:33
  18:26:13.522  Running parallel (task.desynchronize()) 1607970372.6432  -  Client  -  LocalScript:27
  18:26:13.522  Running parallel (ConnectParallel)  1607970372.6434  -  Client  -  LocalScript:7
  18:26:13.537  RenderStepped occured 1607970372.6587  -  Client  -  LocalScript:33
  18:26:13.539  Running synchronious (ConnectParallel, task.synchronize()) 1607970372.6602  -  Client  -  LocalScript:20
  18:26:13.551  RenderStepped occured 1607970372.6727  -  Client  -  LocalScript:33

https://gist.github.com/Quenty/0d777194939e2380f89ed109cbb7752e

TestParallelLua.lua

local UserInputService = game:GetService("UserInputService")
local RunService = game:GetService("RunService")

local printRender = false

UserInputService.InputBegan:ConnectParallel(function(inputObject)
	print("Running parallel (ConnectParallel) ", tick())
	printRender = true
end)

Ukendio · December 15, 2020, 2:33am

This seems so amazing! Are there any use-cases where you would discourage the implementation of using Actors? Like in what instances could concurrency issues occur with this feature?

zeuxcg · December 15, 2020, 2:40am

I believe right now we only have a single point at which parallel execution runs, but we’re likely going to have multiple points during the frame to make sure that you can complete all of the important fork/join work during one frame.

zeuxcg · December 15, 2020, 2:47am

Yeah I think you’ll get separate data stores if you use Rodux? Which is probably not what you want

This feature works best for cases where you either have a dedicated data parallel system you’d like to run, or when you model independent entities that interact with the world and would like to simulate them in parallel; it’s probably not best for things like UI code.

zeuxcg · December 15, 2020, 2:49am

We’ll have a way to exchange data between actors in later releases; data can be exchanged between parallel and serial sections of the code by just sharing it, the raytracer example shows this well.

BindableEvent.Fire and BindableFunction.Invoke right now aren’t whitelisted for parallel execution; Fire should work in the future though.

zeuxcg · December 15, 2020, 2:50am

Right now the answer is “no”… I think? You definitely can’t change the fields even if you can create the instance, yet.

We do plan to fix this; we thought originally that the first beta release would have this, but unfortunately there are some complications with allowing this in a way where you can’t crash the engine with this mechanism… so in this preview you can’t change instances at all even if you just created them.

Dekkonot · December 15, 2020, 3:01am

I think I might be confused – if I have a loop that iterates through a bunch of Instances, and gets a property from them, then does a bunch of processing based on that property, when would be the appropriate time to desynchronize it? My inclination would be to do it directly after I access the property, but that seems to just outright slow down execution, which is unexpected – I know it’s easy to do that with multi-threading but I don’t think desynchronizing code that creates a bunch of strings should do that.

plasma_node · December 15, 2020, 3:03am

This is awesome!

So I read that usage may not fully saturate all cores; with that in mind I added basic shadowmap lighting to Zeux’s demo.

Obviously, the raytracer takes a decent performance hit to compute lighting, but I am sitting around 50-65% usage on all cores. Not sure if this is my programming or the parallel system.

I haven’t finished reading this thread so I am not sure which questions have already been answered regarding proper usage, but I assume that we’re going to get some more documentation in the future?

EDIT: Just discovered a crash lol. Apparently putting wait(); inside :ConnectParallel makes Studio angry.

OutlookG · December 15, 2020, 3:07am

So if I am reading this correctly, we simply put all the scripts into a new “Actor” instance and Roblox will automatically put it onto a thread/core? I’m somewhat confused lol