Parallel Lua Beta

EthicalRobot · March 11, 2021, 9:50pm

After a flurry of bug fixes and stability improvements, we are happy to announce that the Parallel Lua project has graduated from a developer preview release to a full-fledged Studio Beta! Special thanks to all the developers who kicked the tires on the developer preview and provided feedback and demos.

Important Note: For now, Parallel Lua features will only work within Roblox Studio! If you publish a place that has Parallel Lua API calls, things will be very broken.

The Parallel Lua APIs can be enabled via the Roblox Studio Beta Features menu.

The Basics of Parallelism

In order for your script to run in parallel with another script, two things must be true:

The scripts must be parented under different Actor instances
The thread within a script must be desynchronized, either via task.desynchronize() or RBXScriptSignal:ConnectParallel

What are Actors?

Actors are a new Instance type that lets you chop up your place into logical chunks. These chunks mark parts of the data model tree as independent. By promising that you will not access anything outside that actor in a desynchronized thread, threads from other actors can run at the same time.

Actor inherits from Model, so you should be able to replace the top-level instance type for your cars / NPCs / other 3D entities with Actor with no changes in the scripts (but see the caveat about stateful ModuleScripts later).

It’s important to mention that scripts that are part of the same Actor always execute sequentially with respect to each other. For example, an NPC is probably a good candidate to become an Actor. Any parallel-enabled behavior scripts underneath a single NPC Actor will run serially on a single thread, but if you have multiple NPC Actors, each of them will run in parallel on its own thread.

As a side note, Actors are the units of parallel execution but we recommend creating them based on logical units of work. The Roblox engine will distribute them appropriately among threads and cores. For example, if you want to generate voxel terrain in parallel, it’s totally reasonable to use 64 Actors or more instead of just 4 even if you’re targeting 4-core systems. This is valuable for scalability of the system and allows us to distribute the work based on the capability of the underlying hardware.
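
As a rough sketch of the "many small Actors" idea, the snippet below creates 64 Actors from an ordinary serial script and gives each one a copy of a worker Script. The names ChunkWorker, ChunkActor and ChunkIndex, and the choice to parent everything under ServerScriptService, are assumptions made for illustration; they are not part of the announced API.

```lua
-- Hypothetical setup Script (runs serially, outside any Actor).
local ServerScriptService = game:GetService("ServerScriptService")

-- Assumption: a disabled Script named "ChunkWorker" sits next to this one as a template.
local workerTemplate = script.Parent:FindFirstChild("ChunkWorker")
assert(workerTemplate, "ChunkWorker template not found")

local ACTOR_COUNT = 64 -- one Actor per logical unit of work, not one per CPU core

for index = 1, ACTOR_COUNT do
    local actor = Instance.new("Actor")
    actor.Name = "ChunkActor" .. index
    actor:SetAttribute("ChunkIndex", index) -- tells the worker which chunk it owns

    local worker = workerTemplate:Clone()
    worker.Disabled = false
    worker.Parent = actor

    -- The cloned worker starts running once its Actor is placed in a running container.
    actor.Parent = ServerScriptService
end
```

Each worker can read its ChunkIndex attribute with GetAttribute, which is on the list of methods that are safe to call in parallel.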

Parallel (“desynchronized”) execution

Each script still runs serially by default, but scripts running inside Actors can switch to run in parallel by calling the task.desynchronize function. This function is yieldable - it suspends execution of the current coroutine and resumes it at the next parallel execution opportunity.

It’s important to understand that regions of parallel execution (the scripts on various Actors) run in parallel, but the engine waits for all parallel sections to finish before proceeding with serial execution. In other words, to take advantage of this feature you can’t run a very long computation that takes seconds in parallel to the rest of the simulation - you have to break it into small pieces, but you can run these pieces on multiple cores. Your mental model should be “let me run updates for 1000 NPCs in parallel with each update potentially running on a separate core” instead of “let me run this really slow function that sequentially updates all NPC states in parallel to the rest of the world processing”.

During parallel execution, access to the Instance hierarchy is restricted. You should be able to read most objects of the hierarchy as usual, with the exception of some properties that aren’t safe to read:

GuiBase2d.AbsolutePosition
GuiBase2d.AbsoluteSize
ScrollingFrame.AbsoluteWindowSize
UIGridLayout.AbsoluteCellCount
UIGridLayout.AbsoluteCellSize
UIGridStyleLayout.AbsoluteContentSize

Currently you can’t modify any properties during the parallel phase. In future releases we’re going to unlock the ability to modify the properties of the Actor’s instance hierarchy in a desynchronized (parallel) script. For now, to be able to modify any properties or state, you must switch back to serial (“synchronized”) execution; you can do this by calling task.synchronize, which will suspend execution of the current coroutine and resume it at the next serial execution opportunity.
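
A minimal sketch of switching between the two phases, assuming a Script parented under an Actor that represents one NPC. The child name HumanoidRootPart, the attribute name SomethingBelow and the one-second interval are illustrative assumptions.

```lua
-- Hypothetical Script parented under an Actor that represents one NPC.
local npc = script.Parent -- assumed to be the Actor
local root = npc:FindFirstChild("HumanoidRootPart") -- assumed child part; may be nil

while root do
    wait(1) -- serial phase

    task.desynchronize() -- yields, then resumes in the next parallel phase
    -- Parallel phase: only property reads and whitelisted methods such as Workspace.Raycast.
    local hit = workspace:Raycast(root.Position, Vector3.new(0, -10, 0))
    local somethingBelow = hit ~= nil

    task.synchronize() -- yields, then resumes in the next serial phase
    -- Serial phase: writes are allowed again.
    npc:SetAttribute("SomethingBelow", somethingBelow)
end
```

The result of the parallel computation is kept in a local variable and only written to the Instance after task.synchronize returns.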

Methods exposed on Instances are safe to call only if they have been explicitly whitelisted (because many of them perform mutation of the hierarchy). So far we’ve whitelisted the following methods; the method status with regard to thread safety is exposed in the API dump, however that info has not yet been added to the online API reference:

Instance.IsA
Instance.FindFirstChild
Instance.FindFirstChildOfClass
Instance.FindFirstChildWhichIsA
Instance.FindFirstAncestor
Instance.FindFirstAncestorOfClass
Instance.FindFirstAncestorWhichIsA
Instance.GetAttribute
Instance.GetAttributes
Instance.GetChildren
Instance.GetDescendants
Instance.GetFullName
Instance.IsDescendantOf
Instance.IsAncestorOf
Part.GetConnectedParts
Part.GetJoints
Part.GetRootPart
Part.GetMass
Part.IsGrounded
CollectionService.GetTagged
CollectionService.GetTag
CollectionService.HasTag
Workspace.FindPartsInRegion3
Workspace.Raycast
Terrain.ReadVoxels

RBXScriptSignal:ConnectParallel

Instead of calling task.desynchronize inside a signal handler, you can use the new ConnectParallel method. This method runs your code in parallel when the signal is triggered, which is more efficient than using Connect + task.desynchronize.

A common pattern for parallel execution that we expect to see is:

RunService.Heartbeat:ConnectParallel(function ()
    ... -- some parallel code that computes a state update
    task.synchronize()
    ... -- some serial code that changes the state of instances
end)

For the time being you will need to put all of the code that changes Instance properties in the serial portion of the update, while future releases will allow you to move more code from the serial to the parallel portion.
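
To make the pattern above concrete, here is a hedged sketch that moves a part toward a fixed point. The child name Body, the target position and the speed of 8 studs per second are assumptions chosen for illustration.

```lua
-- Hypothetical Script parented under an Actor whose child "Body" is a Part.
local RunService = game:GetService("RunService") -- fetched once, in the serial phase
local actor = script.Parent
local body = actor:FindFirstChild("Body") -- assumed child part
local target = Vector3.new(0, 10, 0) -- illustrative destination

RunService.Heartbeat:ConnectParallel(function(deltaTime)
    if not body then
        return
    end

    -- Parallel: read state and compute the next position without writing anything.
    local offset = target - body.Position
    local nextPosition = body.Position
    if offset.Magnitude > 0 then
        local step = math.min(offset.Magnitude, 8 * deltaTime) -- at most 8 studs per second
        nextPosition = body.Position + offset.Unit * step
    end

    task.synchronize()
    -- Serial: apply the result.
    body.Position = nextPosition
end)
```

RunService is looked up at the top of the script, while it is still running serially, so the handler never needs to call GetService from the parallel phase.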

ModuleScripts

Scripts that run in the same Actor are running in the same Luau VM, but scripts that run in different actors may run in different VMs. You can’t control the allocation of Actors to VMs or the total number of VMs - it depends on the number of cores the processor has and some other internal parameters.

When you require a ModuleScript from a Script inside the Actor, the script is going to get loaded (& cached) in every VM it’s needed in. This means that if your ModuleScript has mutable state, this state will not be global to your game anymore - it will be global to a VM, and there may be multiple VMs at play.

We encourage use of ModuleScripts that don’t contain global state. In the future we’re going to provide a shared storage that will be thread-safe so that games that use parallel execution can use it to store truly global state, as well as ways to communicate between scripts safely using messages, but for now you should be aware of this gotcha.
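
The per-VM caching can be imitated in plain Lua. In the sketch below, loadCounterModule is a hypothetical stand-in for the body of a stateful ModuleScript, and each call to it plays the role of one VM loading its own copy; it is a simulation of the effect, not how the engine actually loads modules.

```lua
-- Stand-in for a stateful ModuleScript body; each call simulates one VM loading the module.
local function loadCounterModule()
    local count = 0 -- mutable module-level state
    local Counter = {}

    function Counter.increment()
        count = count + 1
        return count
    end

    return Counter
end

local counterInVmA = loadCounterModule() -- copy seen by scripts running in one VM
local counterInVmB = loadCounterModule() -- copy seen by scripts running in another VM

counterInVmA.increment()
counterInVmA.increment()
local a3 = counterInVmA.increment()
local b1 = counterInVmB.increment()

print(a3, b1) -- 3 1: the two copies do not share their count
```

A script that expected a single game-wide counter would see different totals depending on which VM its Actor happened to land in.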

Debugger

… doesn’t work on scripts inside Actors in this release. This is why this is a beta.

nooneisback · March 11, 2021, 10:30pm

Been waiting for this for a long time. I managed to greatly increase execution speed of my custom pathfinding with the previous preview, but the stability was questionable. Can’t wait to see how it will perform now.

C_Corpze · March 11, 2021, 10:31pm

YES, I waited SO LONG for this, I have so many funny ideas for optimization and speed now.

Last time I did an experiment with 3000 zombie AI in a server, it worked mostly fine for me, but anyone who joined the game with an older PC would just straight up freeze or their FPS would get smacked down to 2 FPS.

This update is a gift, multi-threading, finally games can be much faster and more efficient with their code.

RuizuKun_Dev · March 11, 2021, 10:38pm

Would definitely appreciate an official tutorial, use cases and best practices.

This new technology is great but can be very dangerous too if not utilized correctly.

Some examples:

How many Actors should an average game have, based on the average user device’s computing power
How many Actors are too many

Also typo @EthicalRobot

Crow_Wave · March 11, 2021, 10:40pm

I hope people make examples on how to use this properly and what you can do with it, because I’m not really clear on its benefits.

LucasMZ_RBX · March 11, 2021, 10:43pm

While I find this extremely hard to understand, I understand what it is, and plan on learning it.

This is gonna be great for performance on multiple, multiple games if they utilize it. We’ll just have to wait, I guess.

nooneisback · March 11, 2021, 10:54pm

It allows execution of multiple tasks in parallel, thus performing them at the same time. There are multiple issues with this.

First of all, different threads will finish at different times depending on the workload. This can cause inconsistencies and you’ll have to yield the rest of the code until the working threads finish to avoid missing information.

Next up, multiple threads can access and write the same information. Most other languages introduce a mutex, while Luau has task.synchronize. When called, it switches the thread back to serial execution, where it runs like an ordinary coroutine.

Finally, you can’t just slap :ConnectParallel everywhere and expect it to work. Working with threads is a messy process which will result in either an inefficient and buggy mess or a considerable speedup.

In short, imagine distributing workload in a team. Right now it’s as if one person does everything. They can only perform one task at a time. It’s slow, but with little room for error. Parallelizing is the same as distributing work among a large team. It might not look like much at first, but the members are stupid and need every little detail explained to them.

Crow_Wave · March 11, 2021, 11:11pm

Isn’t that like creating a new coroutine, or using spawn()?

EthicalRobot · March 11, 2021, 11:12pm

WRT means “with regard to”, but you’re right I should have spelled this out.

Luaction · March 11, 2021, 11:21pm

Threads created through coroutine or spawn don’t run in parallel, they run in sequence.
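
A small plain-Lua sketch of that point: two coroutines take turns on a single thread, and each one only advances when it is explicitly resumed. The worker function and the log table are illustrative names.

```lua
local log = {}

-- Records two steps, yielding control back to the caller after each one.
local function worker(name)
    for step = 1, 2 do
        table.insert(log, name .. step)
        coroutine.yield()
    end
end

local a = coroutine.create(worker)
local b = coroutine.create(worker)

coroutine.resume(a, "A") -- runs A until its first yield
coroutine.resume(b, "B") -- only now does B get to run
coroutine.resume(a)
coroutine.resume(b)

print(table.concat(log, " ")) -- A1 B1 A2 B2
```

The order in the log is fully determined by the order of the resume calls, which is the sense in which these threads run in sequence rather than in parallel.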

davness · March 11, 2021, 11:25pm

Has an optimal number of threads/VMs that can be deployed in a server to maximize their utilization/performance potential been calculated yet? Or is this dynamic according to where the game is hosted (silicon lottery et al)?

EthicalRobot · March 11, 2021, 11:38pm

We have some tunable values we hope to adjust to find the sweet spot for VMs/thread. We have started with some values that make sense, but this will continue to improve once we have more content to test on.

Halalaluyafail3 · March 12, 2021, 12:07am

GetService and FindService currently aren’t considered thread safe, is that intentional or a bug?

Also, is findFirstChild different from FindFirstChild in its thread safety?

(i renamed Workspace to a)

EthicalRobot · March 12, 2021, 12:19am

This is intentional, as the Service you get would be by definition outside of the Actor. This is something we will be looking to open up more in future releases.

EthicalRobot · March 12, 2021, 12:20am

findFirstChild lowercase is deprecated, so we didn’t add the ThreadSafe tag to it. Please use the uppercase version.

Luaction · March 12, 2021, 12:55am

Your puny engine restrictions have no power over me!

Script called Heartbeat:

game["Run Service"].Heartbeat:Connect(function()
	local ti = tick()
	print("Start Heartbeat")
	repeat until tick() - ti > 0.01 -- busy-wait for roughly 10 ms
	print("End Heartbeat")
end)

LocalScript called RenderStepped:

game["Run Service"].RenderStepped:Connect(function()
	print("------------------")
	task.desynchronize() -- yield until the next parallel phase
	task.synchronize() -- yield again until the next serial phase
	local ti = tick()
	print("Start RenderStepped")
	repeat until tick() - ti > 0.01 -- busy-wait for roughly 10 ms
	print("End RenderStepped")
end)

That being said, are there plans to add another event to RunService to allow the execution of serial Lua before parallel Lua within the same frame?

EthicalRobot · March 12, 2021, 1:06am

This is something we have been talking about as it would be nice to set things up, let the parallel phase run, and then use the results all in one frame. How we will enable this hasn’t been decided yet however.

nooneisback · March 12, 2021, 1:31am

Coroutines emulate threads by placing tasks in a queue. It will resume a task if all other tasks are yielding, so only one task can run at a time.