Yep, that’s what I have been saying so far. Thank you for seeing it from my point of view and logically thinking through what was presented as a problem. Anyway, yeah, I completely agree.
The behavior is far from arbitrary. Consider the following graph:
This graph was produced with data from the following script:
for i = 0, 100 do
	spawn(function(t)
		print(t)
	end)
end
The graph displays the amount of time it took for each spawned thread to resume. It showcases the behavior of WaitingScriptJob, a task scheduler job that governs the behavior of the wait, spawn, and delay functions. The activity of WaitingScriptJob can be viewed in Studio’s Task Scheduler panel.
Firstly, the graph shows that threads have a minimum delay. This is determined directly by settings().Lua.DefaultWaitTime, which is set to 0.03 seconds, matching roughly what is seen on the graph.
Also shown is an obvious stair-step effect. This is caused by budgeting. For each 60Hz frame, there’s a delay of around 2ms, presumably from the scheduler handling other jobs. After that, WaitingScriptJob starts, and the spawned threads begin resuming. After about 1ms, the job exceeds its budget, and no more threads are resumed. Anything left over is handled in the next frame. This budget is determined by settings().Lua['Waiting Threads Budget']. Its value of 0.1 appears to correspond to 1ms, though with some rudimentary testing, the correlation doesn’t seem to be linear.
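Under the assumption that the budget works as described (the real WaitingScriptJob is engine code and not public, so every name and detail in this sketch is a guess), the stair-step pattern is exactly what a budgeted resume loop would produce:

```lua
-- Hypothetical sketch of a budgeted resume loop. Illustrative only:
-- the real WaitingScriptJob is implemented inside the engine.
local BUDGET = 0.001 -- roughly 1ms of resume work allowed per frame

local function runWaitingScriptJob(queue)
	local start = tick()
	while #queue > 0 do
		if tick() - start > BUDGET then
			break -- Budget exceeded; leftovers resume next frame.
		end
		coroutine.resume(table.remove(queue, 1))
	end
end
```

Threads that miss the budget in one frame slide into the next, which is the stair-step the graph shows.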
Overall, the massive minimum delay is a deal breaker for most people; threads take at least two frames to resume. This graph was produced in an environment in which nothing else was happening, so it’s apparent that the budgeting mechanism doesn’t consider load or frame rates. Instead, it appears to have a constant time limit that’s too short for anything beyond basic usage.
That’s without doing anything except printing a number. If threads are, say, updating a GUI, calculating positions, or casting some rays, the time will add up very quickly.
You’ve done precisely what I instructed against doing. Sure, you could claim that these are spawn’s implementation problems and that it should therefore be avoided, but do you not see the conditions you have set for this experimentation? Again, your use case of calling spawn consistently for 100 iterations goes against the very argument I made above. This is tangible proof of spawn arbitrarily reaching the threshold for each frame. Now, again, I’ve never encountered problems with spawn yielding for longer than the expected delays, so I’ll need to check this once I get on my computer.
I’m still a little confused on this part; mind elaborating until I’m on PC and able to run my own tests? I was under the impression that connections were instant in Roblox. Are you claiming that’s not the case when you say they’re under the same conditions as spawn?
Alright, thanks for the in-depth explanation; it helps me follow along with your thought process. However, if I wanted to reach the same conclusion as you, are there tangible steps you can give me to get there? Basically, I’m asking what you did to get the results you got, and how you know the technical side of the Roblox thread scheduler. Mind elaborating on how these tests can be carried out and observed by an individual?
What are you using for signals? How listener threads are resumed depends on how the signal is fired. For example, BindableEvent:Fire() is instant because Fire creates and resumes each listener thread itself. It’s the same for most signals that fire from property changes.
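A quick way to see this in Studio (standard Roblox API; the print ordering follows from Fire resuming each listener itself, as described above):

```lua
-- Because Fire resumes each listener itself, the listener runs
-- before Fire returns, so this prints 1, 2, 3 in order.
local event = Instance.new("BindableEvent")
event.Event:Connect(function()
	print(2)
end)
print(1)
event:Fire()
print(3)
```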
Signals like Stepped, RenderStepped, and Heartbeat are fired at particular moments in a frame. The microprofiler can be used to tell exactly when. Here’s another graph:
One: connections added later are resumed first, which is by design. Two: none of them have budgeting, also by design. Three: RenderStepped runs later in the frame than the other two, by design.
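The first point is easy to check with a couple of connections in a LocalScript (this merely restates the observation above in runnable form):

```lua
-- Per the graph above, for these signals the connection added
-- later is resumed first each frame.
local RunService = game:GetService("RunService")
RunService.Heartbeat:Connect(function()
	print("connected first")
end)
RunService.Heartbeat:Connect(function()
	print("connected second") -- Expected to print before "connected first".
end)
```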
Interestingly, WaitForChild is budgeted. Here it is compared to wait() & friends:
WFC appearing lower on the graph indicates that it does not suffer from a minimum wait time as the others do. Taken as an average, the slopes are generally the same, with the minor deviations caused by sub-optimal or slightly different implementations:
Benchmark (LocalScript)
game.ReplicatedFirst:RemoveDefaultLoadingScreen()

local RunService = game:GetService("RunService")
local Step = RunService.RenderStepped

-- Used to avoid interference with budget.
local function Sleep(d)
	local t = tick()
	while tick() - t <= d do
		Step:Wait()
	end
end

Sleep(3)

local N = 10   -- Number of runs to average over.
local I = 1000 -- Number of threads per run.
local a = table.create(I, 0)

local function mark(i, t)
	a[i] = a[i] + (tick() - t)
	for n = 1, 100 do -- Do some work.
		local v = math.sqrt(i)
	end
end

local RenderStepped = RunService.RenderStepped
local function doRenderStepped(i, t)
	coroutine.resume(coroutine.create(function()
		RenderStepped:Wait()
		mark(i, t)
	end))
end

local Stepped = RunService.Stepped
local function doStepped(i, t)
	coroutine.resume(coroutine.create(function()
		Stepped:Wait()
		mark(i, t)
	end))
end

local Heartbeat = RunService.Heartbeat
local function doHeartbeat(i, t)
	coroutine.resume(coroutine.create(function()
		Heartbeat:Wait()
		mark(i, t)
	end))
end

local function doSpawn(i, t)
	spawn(function()
		mark(i, t)
	end)
end

local function doDelay(i, t)
	delay(0, function()
		mark(i, t)
	end)
end

local function doWait(i, t)
	coroutine.resume(coroutine.create(function()
		wait()
		mark(i, t)
	end))
end

local wfca
local wfcb
local function doWFC(i, t)
	coroutine.resume(coroutine.create(function()
		wfca[i]:WaitForChild("Value")
		mark(i, t)
	end))
	wfcb[i].Parent = wfca[i]
end

-- Swap doWFC below for any of the other do* functions to benchmark them.
for n = 1, N do
	wfca = table.create(I)
	wfcb = table.create(I)
	for i = 1, I do
		wfca[i] = Instance.new("BoolValue")
		wfcb[i] = Instance.new("BoolValue")
	end
	local t = tick()
	for i = 1, I do
		doWFC(i, t)
	end
	Sleep(1)
end

for i = 1, I do
	print(i, a[i]/N)
end
I implemented WaitForChild 4 years ago as an intern, and inserted it into the same task scheduler as wait() and spawn(). Obviously, changes could have been made to this system in the meantime, but this is how I originally did it.
Note that no budget is consumed if the instance exists immediately as requested, which is the vast majority of the time in my experience.
This is an interesting observation; I’d like to see more posts like these in the future, and your experiment seems like something I would be able to replicate. Now, my only questions are: Are RenderStepped, Stepped, and Heartbeat each fired at the same set times within every frame, or can there be variations between frames (I can understand if you are speaking in terms of averages)? Also, can you elaborate on what you mean when you say that WaitForChild is budgeted? Does this mean it’s not subject to a yield amount before resuming again? So that means that WaitForChild doesn’t inherently use the wait() function, correct?
EDIT: Sorry, I’m still on mobile, so it’s hard for me to carry out any of the experiments myself; bear with me. Also, can you elaborate on how exactly you were able to put these in a graph and look at them from a statistical POV?
What folks are missing is the rather obvious elephant in the room (though I have mentioned it several times in this thread).
If you want to use a thread-scheduling mechanism because you are dynamically creating several lightweight threads that need to be executed at a later time (not necessarily the next frame or even the next second), and you want to prioritize safety against FPS drops, you either:
a) have to write/use a custom thread scheduler,
or b) use spawn.
Yes, writing a custom thread scheduler to replace the out-of-the-box one is desirable (and I have done this). But it is most definitely not easy to do.
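For a sense of scale, here is a minimal sketch of such a custom scheduler (my own toy design, not anything from the engine; the budget value and structure are arbitrary): queued coroutines are resumed from a Heartbeat connection until a per-frame time budget is spent, and the rest carry over.

```lua
-- Toy budgeted coroutine scheduler (illustrative design only).
local RunService = game:GetService("RunService")

local Scheduler = { queue = {} }

function Scheduler:spawn(fn)
	table.insert(self.queue, coroutine.create(fn))
end

local BUDGET = 0.001 -- seconds of resume work allowed per frame

RunService.Heartbeat:Connect(function()
	local start = tick()
	-- Resume queued threads until the budget runs out; leftovers
	-- wait for the next frame, protecting the frame rate.
	while #Scheduler.queue > 0 and tick() - start < BUDGET do
		local ok, err = coroutine.resume(table.remove(Scheduler.queue, 1))
		if not ok then
			warn(err)
		end
	end
end)
```

A real version would also need to handle threads that yield and reschedule themselves; this only covers the one-shot case, which is part of why it’s not easy to do well.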
Edit, to give a quick example (and not contrived, if you think about it): how would you do this with coroutines, with both fairness and safety?
function add_lifeform()
	spawn(function()
		local cp = workspace.Part:Clone()
		local r = Random.new()
		cp.Position = Vector3.new(r:NextInteger(-50, 50), 5, r:NextInteger(-50, 50))
		local die = false
		cp.Touched:Connect(function() die = true end)
		cp.Parent = workspace
		while not die do
			cp.Position = cp.Position +
				Vector3.new(r:NextInteger(-2, 2), 0, r:NextInteger(-2, 2))
			wait(math.random(1, 2))
		end
		print("Died!")
		cp:Destroy()
		add_lifeform()
	end)
end

for i = 1, 500 do
	add_lifeform()
end
As indicated by @Quenty above, a thread yielded by WaitForChild is put into the same budgeted queue as threads yielded by wait, spawn, and delay. The only difference is that the WaitForChild thread is not scheduled with a delay. After being added to the queue, it will run as soon as WaitingScriptsJob gets around to resuming it.
Imagine an AddThreadToScheduler function that receives a thread along with a number indicating the duration to wait before the thread should be resumed. Within wait, spawn, and delay, the call might look like this:
DefaultWaitTime = 0.03
if duration < DefaultWaitTime then
	duration = DefaultWaitTime
end
AddThreadToScheduler(thread, duration)
Whereas WaitForChild would look like this:
AddThreadToScheduler(thread, 0)
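To make the pseudocode concrete, here is a hedged toy model of that queue in Lua (AddThreadToScheduler is the hypothetical name from above; the real implementation is engine C++ and also enforces the budget discussed earlier):

```lua
-- Toy model of the scheduler queue: each entry stores a thread and
-- the time at which it becomes eligible to resume. WaitForChild-style
-- entries pass 0, so they are eligible immediately.
local scheduled = {}

local function AddThreadToScheduler(thread, duration)
	table.insert(scheduled, { thread = thread, resumeAt = tick() + duration })
end

-- Called once per frame by the scheduler.
local function stepScheduler()
	local now = tick()
	for i = #scheduled, 1, -1 do
		if scheduled[i].resumeAt <= now then
			local entry = table.remove(scheduled, i)
			coroutine.resume(entry.thread)
		end
	end
end
```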
The MicroProfiler will let us see how this works. Let’s use the following LocalScript, put under ReplicatedFirst, to be run with Play Solo:
game.ReplicatedFirst:RemoveDefaultLoadingScreen()

wait(5) -- Give the game some time to load and settle down.

-- May also disable Players.CharacterAutoLoads and Chat.LoadDefaultChat to
-- reduce clutter.

local function DoSomeWork(ms)
	local t = tick()
	repeat until tick() - t >= ms/1000
end

local RunService = game:GetService("RunService")

RunService.Stepped:Connect(function()
	debug.profilebegin("STEPPED")
	DoSomeWork(2)
	debug.profileend()
end)

RunService:BindToRenderStep("BIND", 0, function()
	debug.profilebegin("BIND")
	DoSomeWork(2)
	debug.profileend()
end)

RunService.RenderStepped:Connect(function()
	debug.profilebegin("RENDER")
	DoSomeWork(2)
	debug.profileend()
end)

RunService.Heartbeat:Connect(function()
	debug.profilebegin("HEARTBEAT")
	DoSomeWork(2)
	debug.profileend()
end)

while true do
	debug.profilebegin("WAIT")
	DoSomeWork(2)
	debug.profileend()
	wait()
end
Ctrl+F6 will open the profiler. The script will produce a profile that looks similar to this:
Look for RENDER, BIND, STEPPED, HEARTBEAT, and WAIT, as defined in the script. These are the LocalScript doing work in various locations.
From what can be seen, BIND and RENDER always run in the render step on the Main thread. BIND, which allows a priority to be set, runs first. The bit of activity after BIND is the default camera script doing some work, which runs after BIND because it has a later priority. RenderStepped is designated as having the latest priority, so RENDER runs after all bound render functions.
WAIT, STEPPED, and HEARTBEAT run in one of the several worker threads each frame. Once rendering has finished, WaitingScriptsJob starts. It is not visible in the first frame because it is doing almost no work. Remember that wait() has a minimum delay of 0.03 seconds, so it resumes no more often than every other frame. It can be seen in the second frame because WAIT is running.
Following that is simulation. The Stepped event is dependent on the simulation being active, so it runs here. The bit of activity following STEPPED is physics simulation. Finally, HEARTBEAT starts running. The remainder of the time is spent idling to sync to the next frame. Another physics step may also occur here.
The DevHub has more information about the MicroProfiler:
My previous post has the benchmark script I used to produce the data. This data was pasted into LibreOffice Calc and rendered as a chart. Your preferred spreadsheet program should be able to do something similar.
Ah, you may want to profile that baby. It’s called a spin wait, and it’s really brutal on the CPU. Even adding a Heartbeat:Wait is awful because of the minimum yield times.
For any reasonable period of wait time, you need to yield via some mechanism (spawn, or something you build yourself) and be rescheduled at the appropriate time for efficient use of CPU.
Thus the value of spawn/wait.
So if I can’t use wait and delay, what do I replace them with?
A custom Heartbeat wait? If you need really reliable timing, yes, use a Heartbeat-based wait.
For spawning threads reliably see Crazyman32’s comments above.
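As a sketch, a Heartbeat-based wait can be as simple as accumulating the delta time that Heartbeat:Wait() returns (standard RunService API; treat this as a starting point, not a drop-in replacement):

```lua
local RunService = game:GetService("RunService")

-- Yields the calling thread until roughly `duration` seconds have
-- passed, summing the frame delta returned by Heartbeat:Wait().
local function heartbeatWait(duration)
	local elapsed = 0
	repeat
		elapsed = elapsed + RunService.Heartbeat:Wait()
	until elapsed >= duration
	return elapsed
end
```

heartbeatWait(0.5) pauses the current thread for about half a second, resuming on a frame boundary without the scheduler’s minimum delay or budget.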
After reviewing the forums, I realize now where some of the general misunderstanding and confusion is coming from.
People are actually using the pattern
while tick() - startTick < waitTime do
	heartbeat:Wait()
end
This doesn’t work efficiently at all (and I couldn’t find it in the documentation, either). It eats up the CPU terribly for any waitTime values greater than a few frames.
You can do a quick experiment with this. E.g., use my example above, but instead use a Heartbeat connection / Heartbeat:Wait.
The underlying problem is that thread scheduling with FPS protection is a tricky problem and requires some assumptions about budgeting and fairness which are not easy to explain to new engineers not familiar with threading.
There are some thoughts, I suspect, that multi-threading will help with this. It absolutely will for a small number of threads, but not for a large number, as that generally scales very poorly due to the need for locking and issues with fairness.
That said, what might be interesting is the ability to have a non-preemptive thread scheduler inside of a new preemptive thread, e.g. spawn/wait, but all grouped in a new thread or threads: say, 10 preemptive threads × 1000 non-preemptive coroutine threads. That way you leverage multiple cores, but can still take advantage of lightweight non-preemptive Lua coroutines.
There is a really good book on threads if you’re interested, though it’s Java-based: https://www.amazon.ca/dp/0123973376?slotNum=0&linkCode=g12&imprToken=rWCg5y4.f5QjNtp8cr-6WQ&creativeASIN=0123973376&tag=javarevisit0c-20 Java is pretty mature when it comes to real threading, though, so it’s not a bad language to use for this domain.
The beauty of the Lua threading model is the ability to yield a large number of ‘threads’ with only a very small performance penalty, though of course there is a problem in leveraging multiple cores, as everything executes on a single core.
And maybe we can start using scare quotes when we say coroutine-based ‘threads’; that’s less confusing, as they aren’t really threads. And to add to the confusion, it looks like Roblox is actually using real threads. E.g., tweening/GPU work is on a separate real worker thread, something important to leverage.
That is literally the point of DoSomeWork: so that there’s something to see on the profiler.
None of the scripts I’ve posted so far are meant to be practical. Their only purpose is to benchmark the scheduler and see how it works.
Fair point; I just saw the same pattern as what was suggested above and jumped on it a little too fast. We might want to start a new thread for benchmarking, as this is very, very useful stuff, and I’d hate to see it get lost in the debate over whether there are any scenarios in which to use spawn.
What do you mean, never use wait()? What if you want to pause your script for a certain amount of time?
You measure time via Stepped, RenderStepped, or Heartbeat.
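For example, a minimal sketch of timing off Heartbeat’s delta-time argument (standard Roblox API; the three-second threshold is an arbitrary choice for illustration):

```lua
local RunService = game:GetService("RunService")

-- Accumulates real elapsed time from Heartbeat's delta-time argument,
-- e.g. to trigger something after three seconds without using wait().
local elapsed = 0
local connection
connection = RunService.Heartbeat:Connect(function(dt)
	elapsed = elapsed + dt
	if elapsed >= 3 then
		connection:Disconnect()
		print("three seconds elapsed")
	end
end)
```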
Uh, no, coroutines can be used more than once.
Typically, whatever function you input as an argument to your thread is not supposed to yield. I do agree with you on spawn not being bad, though.