Canceling Coroutines Still Needs Work

focasds · September 27, 2022, 5:04pm

This Post Was Updated: (5/18/2023)

Currently as a Roblox developer is hard to work with the functions coroutine.close and task.cancel because the engine still assumes that threads are never cancellable, and that they will always remain alive. Which is not true, noting that these two functions set the coroutine to a dead state.

What’s the issue?
Since the release of these two functions, I have been adjusting my code base to cancel loops that are doing a certain task. In other words, for simple tasks where coroutine.yield or task.wait is called, everything it’s fine. However; if you call any other function that yields like Event:Wait, Path:ComputeAsync, RemoteFunction:Invoke (just to mention a few), everything goes down hill.

Currently you would get the error “cannot resume dead coroutine” on the output with no clear explanation from where it originated, tasks would still run on the back, and memory leaks would happen.

Examples Of Error Message: “cannot resume dead coroutine”

local Thread = task.spawn(function()
	workspace.ChildAdded:Wait()
end)

task.cancel(Thread)
Instance.new("Part", workspace)

local Thread = task.spawn(function()
end)

--> Thread is already dead
task.spawn(Thread, true) --> If you pass any value after the thread, it will trigger the error, else it doesn't.

Examples Of Memory Leaks:
This example is not as complex, but you can imagine the magnitude of it in other kinds of task where connections and many other things get leaked.

local function WaitForObject(Name)
	local Thread = coroutine.running()
	local Connection; Connection = workspace.ChildAdded:Connect(function(Object)
		if Object.Name == Name then
			Connection:Disconnect()
			task.spawn(Thread, Object)
		end
	end)
	return coroutine.yield()
end

local Thread = task.spawn(function()
    print(WaitForObject("Part"))
end)

task.cancel(Thread) --> Thread is now dead, yet the connection is still in place.
--> Cannot resume dead coroutine shows up again after the event triggers.

Proposed Solution:
This is getting out of hands at this point, it looks like coroutine.close and task.cancel was never meant to be added in Luau or Lua in general. It looks like one of the dumbest functions I could have ever used since the only thing it does its change the Status of a coroutine to a dead state. No more than that, which prevent calls that are trying to resume the dead coroutine.

There is no way to know when a coroutine has been closed unless you fill up your code with repeat until loops so we can cancel other tasks or disconnect connections, but no, I don’t want to be doing that. I would love something that triggers once the thread sets to a dead state whether it was canceled or by itself.

The “perfect” solution so it matches Lua behaviors would probably be something like xpcall but on the coroutine library:

coroutine.join(Thread, Callback).

Once the Thread has ended or set to a dead state by coroutine.close or task.cancel, the Callback gets called.

This would be useful for us the developers to avoid memory leaks; however, Roblox should still go and disable the error “cannot resume dead coroutine” and stop tasks on the background. I don’t like things like Path:ComputeAsync or external calls continue working while I cancelled the thread.

Example of a memory leak being fixed with the solution provided:

local function WaitForObject(Name)
	local Thread = coroutine.running()
	local Connection; Connection = workspace.ChildAdded:Connect(function(Object)
		if Object.Name == Name then
			Connection:Disconnect()
			task.spawn(Thread, Object)
		end
	end)

    coroutine.join(Thread, function()
        Connection:Disconnect()
    end)
	return coroutine.yield()
end

local Thread = task.spawn(function()
    print(WaitForObject("Part"))
end)

task.cancel(Thread) --> Thread is now dead and no connection in place.

Thank you, focasds.

metatablecatmaid · September 28, 2022, 2:42am

Have you considered checking the state of a thread before trying to kill it?

if coroutine.status(co) ~= dead then

Removing an error like this would create an engine inconsistency as Roblox is very defensive about edgecases like this.

I do agree, however, the engine should enforce thread resumption errors equally on task.spawn regardless of argument count, and that stack tracing around coroutines still need some work done to it

focasds · September 28, 2022, 4:12am

The issue happens when resuming threads, not when cancelling them.

In my opinion they should remove it cuz I am not planning to make a resume function that checks if the coroutine is dead when there is already two of them task.spawn and coroutine.resume (resume shouldn’t be used cuz it hangs callbacks from the engine) and also wrapping Event:Wait() which yield until the event triggers in another function that is in charge of resuming the thread if is not dead when the event triggers…

Honestly, the error is something that should not show up at all in my opinion, imagine you check if a thread is dead before resuming every time or just making a function to “silent” the error (which is what I am asking for).

As I mentioned before, I have coroutines that are yielding because of Event:Wait(). When the event triggers the engine resumes the thread and boom throws the error to the output cluttering the output at some point.

WheretIB · October 5, 2022, 1:10pm

It would be really nice to have the origin of the error available in the message.

focasds · October 5, 2022, 1:15pm

True, but what do you think of the situation of developers canceling coroutines that have events yielding in them?

Like:

local Thread = task.spawn(function()
	workspace.ChildAdded:Wait()
end)

task.cancel(Thread)
Instance.new("Part", workspace) -- For the picky ones, I used the second argument just for this example...

That code I posted throws the same error because the event ChildAdded triggered and tried to resume the ‘dead’ coroutine.

Are we expected to just wrap such situation in this? (The code below won’t trigger the error)

local function waitForEvent(Event)
      local Thread = coroutine.running()
      Event:Once(function(...)
          if coroutine.status(Thread) ~= "dead" then
              task.spawn(Thread, ...)
          end
      end)
      coroutine.yield()
end

local Thread = task.spawn(function()
	waitForEvent(workspace.ChildAdded)
end)

task.cancel(Thread)
Instance.new("Part", workspace) -- For the picky ones, I used the second argument just for this example...

0Shank · October 11, 2022, 2:18am

Support! I think this gets away of a useful pattern I was thinking of with coroutine.close()

Whenever you make a system that yields inside a event handler, on the player’s team changing many people don’t account for “async race conditions”

For example. player joins admin team, and they are given the description of a random player using a yielding function like game.GetHumanoidDescriptionFromUserId(). People on the normal team are just given their own outfit ID.

Now what happens if the player changes teams to admin and then quickly to normal before the network request, game.GetHumanoidDescriptionFromUserId() is completed? Then once that request is completed they will be given the admin random morph even though their team isn’t admin currently and is normal!

Example of what I mean:

local NormalDescription = Instance.new("HumanoidDescription")

local NormalTeam = Instance.new("Team", game:GetService('Teams'))
NormalTeam.TeamColor = BrickColor.new(255,0,0)
NormalTeam.Name = "Normal"
NormalTeam.AutoAssignable = false

local AdminTeam = Instance.new("Team", game:GetService('Teams'))
AdminTeam.Name = "Admin"
AdminTeam.TeamColor = BrickColor.new(255,255,0)
AdminTeam.AutoAssignable = false

game.Players.PlayerAdded:Connect(function(Player: Player)
	Player:GetPropertyChangedSignal("Team"):Connect(function()
		local Character = Player.Character or Player.CharacterAdded:Wait()

		if Player.Team ==AdminTeam then
			local AdminMorph = game.Players:GetHumanoidDescriptionFromUserId(3783643606)

			Character.Humanoid:ApplyDescription(AdminMorph)
		else
			Character.Humanoid:ApplyDescription(NormalDescription) -- internally figures out and uses latest description passed so no need to worry about race conditions
		end
	end)

	task.delay(5,function()
		print "START"
		Player.Team = AdminTeam
		Player.Team = NormalTeam -- after the network request from the GetHumanoidDescription from the admin team is completed. it will make the player have the admin morph even though their team is normal
	end)
end)

This is tryable in studio it will team you a random player’s humanoid description which only should happen when the team is admin, even though your team is NormalTeam. This is due the race conditions I was talking about
Ofcourse, you can pre-emptively cache any requests for these use cases, but it’s not always possible. An other way is having a guard clause after a yielding function to see if the current player.Team matches with the team the player had when you called this function.

I don’t feel like it’s the most cleanest way. This is the pattern I was thinking of.

local IdentityTable = {}

local function UseLatestCall(IdentityArg, Functor) --  The pattern which makes this possible easily
	local MyFunction = nil

	MyFunction = function(...)

		local SelectedArg = select(1, ...)
		local Args = {...}

		if IdentityTable[MyFunction][SelectedArg] then
			coroutine.close(IdentityTable[MyFunction][SelectedArg])
			IdentityTable[MyFunction][SelectedArg] = nil
			print "Cancelled thread"
		end

		IdentityTable[MyFunction][SelectedArg] = coroutine.create(function()
			print(IdentityTable[MyFunction][SelectedArg])
			Functor(unpack(Args))
			IdentityTable[MyFunction][SelectedArg] = nil
		end)
		task.spawn(IdentityTable[MyFunction][SelectedArg])
	end

	IdentityTable[MyFunction] = {}
	return MyFunction
end

local NormalDescription = Instance.new("HumanoidDescription")

local Update = UseLatestCall(1, function(Player)
	local function Guard()
		if Player.Character == nil or Player.Chracter.Parent == nil then return true end
	end if Guard() then return end
	
	if Player.Team == game.Teams.Admin then
		local DressPlayerAsRoblox = game.Players:GetHumanoidDescriptionFromUserId(1)
		if Guard() then return end
		Player.Character.Humanoid:ApplyDescription(DressPlayerAsRoblox)
		print "Your team is admin"
	else
		Player.Character.Humanoid:ApplyDescription(NormalDescription)
	end
end)

local NormalDescription = Instance.new("HumanoidDescription")


game.Players.PlayerAdded:Connect(function(Player: Player)
	Player:GetPropertyChangedSignal("Team"):Connect(function()
		Update(Player)
	end)
	
	Player.CharacterAdded:Connect(function()
		Update(Player)
	end)

	task.delay(5,function()
		print "START"
		Player.Team = game.Teams.Admin
		Player.Team = game.Teams.Normal
	end)
end)

Ofcourse there are still guard clauses in here, but they are only to check if the character still exists since humanoid:applyDescription() will not work if the character is parented to nil.

Not to check if a new request was called and cancel the old one, that gets dirty quickly for reusability. This code does work, but it errors cannot resume dead coroutine like in this post’s case.

Elttob · November 1, 2022, 6:09pm

This seems to partially be an error on Roblox’s end.

When you call into a yielding method in Roblox’s API (examples: HttpService:GetAsync(), Event:Wait()), Roblox’s task scheduler will yield your thread. At a later time, it resumes your thread, usually once some work has been done and a result can be returned.

This would be fine, except the task scheduler does not check the thread is still alive. This is because Roblox’s internal task scheduler is built with the assumption that a suspended thread cannot become dead. This assumption is no longer true now that threads are cancellable, and so Roblox will now generate these errors which are useless to the end developer.

Please fix that.

TM18 · January 25, 2023, 4:13pm

Having this issue right now. My script is running as intended as far as I can see. Why do I still get this error when using task.spawn() after having ran and cancelled it once.

GrayzcaIe · March 24, 2023, 8:41am

Bump! Like everyone else here, my script seems to be running as intended. After running the thread once and closing it using coroutine.close sends an error to the output saying “cannot resume dead coroutine.” As mentioned by @Elttob, it is definitely a fault on Roblox’s end for assuming suspended threads cannot be in a dead state. Please fix it.

PeZsmistic · March 27, 2023, 11:50pm

A bug report is required or this will not be tracked appropriately. Use @Bug-Support if you are unable to.

lolmanurfunny · March 29, 2023, 3:54am

It frustrates me how WaitForChild doesn’t respect the fact that the thread it was called in has died, assuming it has yet to resolve at the time the thread is killed.

Additionally, this may be an engine-level memory leak. Logically speaking, it would only make sense that the thread, which is dead, is still in memory due to the wfc call. ^{^{Could possibly be why character spawning/LoadCharacter leaks client memory.}}

DukeAunarky · May 13, 2023, 7:30pm

Support
I don’t wanna rely on adding even more hacks to ‘fix’ this.

focasds · May 18, 2023, 9:21pm

Just updated this post to request another feature in it and be a bit clearer, since now that I adjusted a lot of my code base to use coroutine.close, my console/output gets spammed by the message “cannot resume dead coroutine”.

OptimizedFunction · July 4, 2023, 6:32pm

Bump! I was working on some code which used coroutines and it has been a nightmare to debug this. It wasn’t breaking anything, but there is absolutely no trace to see why it is happening.

Of course this error shouldn’t be thrown in the first place because the thread is dead. A fix would be appreciated!

Maelstorm_1973 · July 5, 2023, 3:07am

To be honest, I would file this as a bug report because the error messages does not state where the error is. I’ve had this problem myself with using this. What I was forced to do to resolve the issue was something like this module script (I call it pthread):

local mutexTable = {}

-- Waits for the mutex to become available then
-- locks it and returns.
local function mutexLock(mutex)
	if mutexTable[mutex] == nil then
		mutexTable[mutex] = true
	else
		while mutexTable[mutex] = true do
			task.wait(0)
		end
		mutexTable[mutex] = true
	end
end

-- Waits for the specified time in seconds for the mutex
-- to become available.  Then locks it and returns.
-- Returns true if the mutex was acquired, false if not.
local function mutexLockTimed(mutex, waitTime)
	local flag = false
	if mutexTable[mutex] == nil then
		mutexTable[mutex] = true
		flag = true
	else
		local timer = waitTime
		while mutexTable[mutex] = true and timer > 0 do
			timer -= task.wait(0)
		end
		if mutexTable[mutex] == false then
			mutexTable[mutex] = true
			flag = true
		end
	end
	return flag
end

-- Unlocks the mutex.
local function mutexUnlock(mutex)
	mutexTable[mutex] = false
end

-- ******** Module

local module = {}

module.mutexLock = mutexLock
module.mutexLockTimed = mutexLockTimed
module.mutexUnlock = mutexUnlock

return module

Normally, when dealing with spinlocks and threads, at the assembly level various forms of CMPXCHG are used to swap and compare the value of a memory location that’s a mutex lock. The instruction is guaranteed to be atomic. Because LUA is an interpreted language, there is absolutely no way to guarantee that my code above is atomic, but it’s the best that I can do in this environment.

So to use the module, do this:

local pthread = require(pthread)
local MUTEX = 25
local cancleThread = false

-- A loop in a different thread that does something
-- and checks if it should exit every iteration.
task.spawn(function()
	-- Setup
	local count = 0
	local status

	-- Loop
	while true do
		count += 1
		pthread.mutexLock(MUTEX)
		status = cancelThread
		pthread.mutexUnlock(MUTEX)
		if status == true then
			break
		end
		task.wait(0.1)
	end
end)

-- Cancels the above loop after 15 seconds.
task.wait(15)
pthread.mutexLock(MUTEX)
cancelThread = true
pthread.mutexUnlock(MUTEX)

After implementing this and using it, I haven’t had any more problems with it. However, there’s always a risk about things not being atomic.

vvv331 · July 5, 2023, 10:23am

despite task.cancel and coroutine.close being hacky ways to stop threads, this still needs to be handled better regardless. RBXScriptSignals seem to ignore the fact the thread is dead and attempt to resume anyway

Weldify · August 7, 2023, 8:51pm

Bump! I’m also experiencing very annoying coroutine cancellation issues!

sevenpolygons · August 8, 2023, 8:05am

Although this would be nice, the main problem this thread focused on still hasn’t been addressed. This error occurs on the movement handler in my game, and trying to debug while seeing this error message every time the player moves is frustrating. Note: this only appears to happen in Studio. I would make a bug report, but I don’t have high enough permissions. I’ve attached an example of the issue.
coroutineCloseError.rbxl (44.7 KB)

WheretIB · August 8, 2023, 11:28am

We fixed the ‘cannot resume dead coroutine’ error in your example on August 2nd.

I cannot reproduce the issue using it any more. Not sure why you don’t have the update 588.

And a side note, this issue was fixed long ago in task.wait when used together with task.cancel.

(I only talk about the example by @sevenpolygons, not the original post)

WheretIB · August 8, 2023, 11:31am

As of August 3rd, this example now gives an error immediately on task.spawn call (“cannot spawn non-suspended coroutine with arguments”) instead of reporting “cannot resume dead coroutine” later with no call stack.