Parallel Lua Beta

Sir_Falstaff · March 18, 2021, 3:30pm

I found a really weird bug that causes some sort of memory leak that persists between playtesting sessions:

LocalScript inside of an actor in StarterPlayerScripts:

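local RunService = game:GetService("RunService")

-- Connect 250 parallel handlers that each immediately switch back to serial execution
for i = 1, 250 do
	RunService.RenderStepped:ConnectParallel(function()
		task.synchronize()
	end)
end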

LuaHeap will continue to rise until it reaches 2000+ MB, but the weirdest part is that if you stop the game and play again, LuaHeap will keep rising from wherever it left off. I tested this in the developer preview build and it doesn’t happen there, only on the new beta in the live version of Studio.

ParallelLuaBugRepro.rbxl (22.3 KB)

jman116 · March 18, 2021, 7:28pm

So far I have managed to completely crash Windows with it several times.
When run on a more… competent machine, it completely accelerated my terrain generation system.

I have tripled the speed of my terrain generation system while making it more stable, and it no longer makes the server laggy while generating planets. 5.313 seconds to fully initialize and spawn the terrain of a 2048 x 512 x 2048 section of land.

Raildex · March 19, 2021, 5:48am

What exactly can we expect from this proposed shared storage? Will it be instance based or will we be able to write tables to it for read only purposes?

Currently I have a custom grid and pathfinding algorithm and being able to multi thread this would be a great help. However, the pathfinding algorithm wouldn’t be able to access the grid data if run in parallel currently from my understanding. The only way I could transfer this information at the moment without the massive overhead of sending it through bindables would be by making the grid out of parts and trying to convey information through that and value objects. This would be incredibly unwieldy for obvious reasons.

Some kind of service that we can write to from the main thread and read from in parallel threads would be greatly beneficial. Can we expect anything similar to that, or can you give us an idea of what to expect?

firestarroblox123 · March 19, 2021, 4:55pm

Do servers have access to multiple threads? I ask this because I want to know if it’s useful to implement Parallel Lua on the server.

Garnold · March 19, 2021, 7:41pm

I noticed that parallel tasks are always processed before gameStepped in the frame pipeline. This means when we call ConnectParallel on heartbeat, the actual processing is delayed until the next frame. Are there any plans to have an option to start a WaitingParallelScriptsJob at any point in the pipeline? It would be nice to have some background tasks running in heartbeat that won’t delay physics, or code that we have bound to gameStepped. Alternatively we might have stuff we want to process in parallel during PreRender, like calculations for visual effects to be applied before rendering.

Ultimately giving us more control over where to run tasks in parallel will allow us to make the most of this powerful tool.

iiNemo · March 20, 2021, 6:41pm

Will we ever have access to the number of threads available? And will we ever have access to manually assigning jobs to specific threads?

In my situation, I want to evenly distribute a set amount of jobs that take a varied amount of time (which I have a rough estimate for). Back on the main thread, it will yield until all jobs are completed to then handle the processed data. I’m not exactly sure how Roblox determines what runs on what thread, but I don’t believe they are going to have better distribution than me (considering I know how long each job will take). The main problem with this is that I could have 20/28 jobs already completed on 11/12 threads, while that 1 final thread still needs to process 8 more jobs.

Are there any plans to address a situation like this?

EDITS: Just to clarify, each job isn’t taking 4-5 seconds and yielding inside, it’s performing calculations and I have a good estimate of how long it will take to process.
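
The kind of distribution described above can be sketched in plain Lua as a greedy "longest job first" assignment. This is only an illustration of the idea, not how Roblox assigns work: `distributeJobs`, `estimates`, and `workerCount` are hypothetical names, and it assumes you already know how many workers you have, which is exactly the information being asked for.

```lua
-- Greedy "longest job first" distribution: a sketch, not Roblox's scheduler.
-- `estimates` holds the rough cost of each job; `workerCount` is the number
-- of actors/threads you believe are available (both are assumed inputs).
local function distributeJobs(estimates, workerCount)
	-- Sort job indices by estimated cost, most expensive first.
	local order = {}
	for jobIndex = 1, #estimates do
		order[jobIndex] = jobIndex
	end
	table.sort(order, function(a, b)
		return estimates[a] > estimates[b]
	end)

	-- One bucket of job indices and one running total per worker.
	local buckets, loads = {}, {}
	for worker = 1, workerCount do
		buckets[worker] = {}
		loads[worker] = 0
	end

	for _, jobIndex in ipairs(order) do
		-- Give the next job to the worker with the smallest total so far.
		local lightest = 1
		for worker = 2, workerCount do
			if loads[worker] < loads[lightest] then
				lightest = worker
			end
		end
		table.insert(buckets[lightest], jobIndex)
		loads[lightest] = loads[lightest] + estimates[jobIndex]
	end

	return buckets, loads
end

-- Eight jobs with estimated costs 8..1 spread over three workers.
local buckets, loads = distributeJobs({ 8, 7, 6, 5, 4, 3, 2, 1 }, 3)
print(table.concat(loads, ", ")) -- 13, 12, 11
```

Handing out the most expensive jobs first keeps the totals close together, so no single worker is left with a long tail of jobs while the others sit idle.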

Technically each Actor is a thread, but if you have more actors than threads, Roblox will start doing some dynamic distribution? This means that if we have access to the number of threads on the user’s system, we also have the ability to control job distribution?

I just want to add that it would be super helpful to have a RunService:IsThread() function, or at least something similar to see what environment the code is currently running in.

EthicalRobot · March 22, 2021, 7:42pm

Behind the scenes, Actors are bucketed into Luau VMs. Because scripts can’t move between VMs, there’s no way to granularly assign tasks to threads, as the runtime has to ensure that each VM is only ever accessed on a single thread. Because of this, it is possible to get really unlucky and have all the actors doing expensive work put on the same VM, but because the VMs are currently allocated ahead of time, there’s no easy way around this.

iiNemo · March 22, 2021, 8:44pm

In my situation, I was thinking more of sending information to actors via BindableEvents (since there is no real way to communicate yet), then those actors do the computations and spit out a result. If we have access to know what thread an actor is running on (via read-only property) and how many threads are available to be allocated to actors, my goal could be accomplishable?

EthicalRobot · March 23, 2021, 2:24am

If you generate a random number in a module script, it will serve as a sort of ID for the VM that actor is on, as each VM has a separate view of module scripts.
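
A rough sketch of that trick, written so it also runs in plain Lua. In Studio the body of `vmIdModule` would be the contents of a ModuleScript and each VM's own `require` would do the caching; here `newVm` is a hypothetical stand-in that simulates one module cache per VM, and all names are made up for illustration.

```lua
-- Body of a hypothetical ModuleScript: runs once per VM and picks a random ID.
local function vmIdModule()
	return { id = math.random(1, 1000000000) }
end

-- Stand-in for `require`: each simulated VM keeps its own cached copy of the
-- module, which is the behaviour the trick relies on.
local function newVm()
	local cached
	return function()
		if cached == nil then
			cached = vmIdModule()
		end
		return cached
	end
end

local requireOnVmA = newVm()
local requireOnVmB = newVm()

-- Two actors on the same VM read the same cached ID.
local firstActorId = requireOnVmA().id
local secondActorId = requireOnVmA().id
print(firstActorId == secondActorId) -- true

-- An actor on another VM gets its own ID, which will almost always differ.
local otherVmId = requireOnVmB().id
```

Because the ID is random, two VMs could in principle draw the same number, so treat a match as "very likely the same VM" rather than a guarantee.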

iiNemo · March 23, 2021, 2:38am

That’s smart, I can create a bunch of actors until they end up viewing the same cached ModuleScript. This definitely would work, but you can see how it seems “hacky.” As Parallel Lua is still in beta, there is definitely room for new features and design choices that can make this more straightforward.

RaterixRGL · March 24, 2021, 7:46am

It’s finally out, a feature I have been desperate for, for years on end. This needs significant improvement however.

Firstly, I would like to be able to access all methods, including setting the position of parts. I do not care if it’s unsafe, I accept the potential dangers of this and would still like this to occur. Maybe include an additional flag to say you would like to execute unsafe methods. I have created a demonstration place to display the need for this.
Demonstration.rbxl (22.3 KB)

Secondly, I would like mutators and delegations for me to control for system i/o and core counts.

Thirdly, I would like a way to synchronize code using parallelism, maybe using idempotency keys, in an efficient manner. Currently _G is a good way to do this, but I would like a more efficient system for it.

Thank you for finally implementing this after years and years of desperate need for it.

Autterfly · March 24, 2021, 9:56am

The limitation is probably in place not because of developers writing potentially broken code but because race conditions can introduce vulnerabilities into software. Whether it be the client or server, writing to a property from 2 different threads will cause undefined behavior.

RaterixRGL · March 24, 2021, 10:22am

Race conditions can easily be dealt with and are not my concern. I want this feature implemented, and it’s relatively easy to do. I develop multithreaded code. If two threads set the same data at the same time, depending on how the virtual machine is implemented, typically a kernel overflow occurs dropping either the most recent or oldest dataset for the object. Other times it’s just a memory recourse, i.e. the same property is accessed in memory but the instruction is overridden by the most recent thread.

Physicionics · March 25, 2021, 12:16pm

Not sure if I understand correctly, but let’s say that 1000 parts are made each Heartbeat. Would separating it into 4 parallel tasks that each create 250 parts be 4x faster, if not more, on a 4-core system, for example?

DrWhoInTARDIS · March 30, 2021, 10:16pm

I agree, I want to be able to access/set properties regardless of the bugs that might come from it.
I understand that there would be some frustration from users not knowing what’s going on.
Properties and methods should be marked with a safety level and by default should not be able to change them unless you toggle a setting.

ee0w · April 3, 2021, 3:31pm

After playing around with this feature for a bit, I’ve come across a fatal memory leak: Parallelized VMs will never have their memory freed, and will persist across playtests. This will quickly cripple your Studio experience if you create large amounts of data per playtest on parallelized VMs. The only way to reclaim this memory is to restart Studio, which is a hassle of course.

Steps to reproduce, 100% success rate for me (Windows 10):

1. Create a bunch of VMs (client or server)
2. Fill each VM with a large amount of memory
3. Destroy each VM in-game, observe
4. Stop the playtest, observe

This does not happen with synchronous code (e.g. replacing Actors with Models)

Here’s a quick repro place for testing. Entry point located in ServerScriptService > Main, use the variable debug_USE_THREADS to control the behaviour.

threads.rbxl (22.1 KB)

Overall, I’m completely in love with this feature, this is truly the start of a new era for Roblox. The possibilities with a system like this are endless!

So naturally, I wrote up a quick Perlin terrain generator that generates 24x24 chunks of 64x64x64 (~151M voxels), which is an unnecessary amount of terrain for any purpose. It’s really big. Bigger than a really big thing.

Sequentially it took 2 minutes 41 seconds to process, while in parallel it only took 44 seconds! Most of that time was spent calling WriteVoxels() (which isn’t yet safe for desynchronized running). Unfortunately I couldn’t go any bigger due to this pesky memory leak; the terrain itself takes around 1.5 GB, while all the dead VMs take 18 GB!

CrispyBrix · April 9, 2021, 9:17pm

This is really great! Glad to see this coming. I wouldn’t mind a few more examples of use cases though. Personally, without the ability to edit things from the parallel tasks (if I am understanding correctly, which I probably am not), I can’t really see many cases to use this for.

sleitnick · April 10, 2021, 7:37am

This is awesome and lets me generate cool images like this really quickly.

I’m curious if there’s a good way to figure out actor utilization. I’m trying to balance max actors and chunk size (essentially size of work each actor does). I’ve found it hard to balance these numbers and have just been fine-tuning via trial & error. Is there a better way I can figure out the best way to balance a large parallel task like this?

Luaction · April 10, 2021, 10:44pm

This is based on my interpretation of the microprofiler:
Each task you run in parallel gets assigned to a task queue, and all tasks within a task queue are essentially treated as one task as far as the task scheduler is concerned.
Since the task scheduler doesn’t (can’t) know how long each task will run, tasks are assigned to the task queues randomly, with priority given to the queues that have the fewest tasks already assigned to them.
The maximum number of task queues appears to be hard-limited to exactly 24, so ideally you want at most 24 actors running code in parallel to minimize multithreading overhead. Any additional tasks you try to run in parallel just get added to one of the existing task queues, so you get no further benefit from the task scheduler spreading the workload more evenly across your CPU threads.
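
If that reading of the microprofiler is right, one way to size the work is to split it into at most 24 contiguous ranges, one per actor. A minimal plain-Lua sketch follows; `splitWork` is a hypothetical helper, and the default cap of 24 is taken from the observation above, not from any documented limit.

```lua
-- Split `itemCount` units of work into contiguous ranges, one per actor,
-- using no more than `maxActors` actors (24 by default, an assumed cap).
local function splitWork(itemCount, maxActors)
	maxActors = maxActors or 24
	local actorCount = math.min(itemCount, maxActors)
	local ranges = {}
	local first = 1
	for actor = 1, actorCount do
		-- Spread the remainder over the first few actors so that range
		-- sizes differ by at most one item.
		local size = math.floor(itemCount / actorCount)
		if actor <= itemCount % actorCount then
			size = size + 1
		end
		ranges[actor] = { first = first, last = first + size - 1 }
		first = first + size
	end
	return ranges
end

-- 100 rows of an image split across up to 24 actors.
local ranges = splitWork(100)
print(#ranges, ranges[1].first, ranges[1].last, ranges[#ranges].last)
```

Each actor would then process rows `first` through `last` of its range. Equal-sized ranges only balance well when every item costs about the same, so uneven workloads would still need tuning.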

RuizuKun_Dev · April 15, 2021, 2:53pm

@EthicalRobot I think it’s weird that HttpService:GenerateGUID cannot be called in parallel.

Is there a reason for this, or is it an oversight (did someone forget to whitelist this)?