Canceling Coroutines Still Needs Work

Bump! Like everyone else here, my script seems to be running as intended, but after running the thread once and closing it with coroutine.close, an error saying “cannot resume dead coroutine” is sent to the output. As mentioned by @Elttob, this is definitely a fault on Roblox’s end for assuming suspended threads cannot be in a dead state. Please fix it.
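
A minimal sketch of the kind of code that triggers it (the BindableEvent is just an illustration - any signal the closed thread was still waiting on behaves the same way):

local signal = Instance.new("BindableEvent")

local thread = task.spawn(function()
	signal.Event:Wait() -- the thread yields here
end)

coroutine.close(thread) -- the thread is now dead

-- When the signal fires, the engine still tries to resume the dead thread
-- and prints "cannot resume dead coroutine" with no useful call stack.
signal:Fire()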

3 Likes

A bug report is required or this will not be tracked appropriately. Use @Bug-Support if you are unable to file one yourself.

1 Like

It frustrates me that WaitForChild doesn’t respect the fact that the thread it was called in has died, assuming it has yet to resolve by the time the thread is killed.

Additionally, this may be an engine-level memory leak. Logically, the dead thread would still be held in memory because of the pending WaitForChild call. This could possibly be why character spawning/LoadCharacter leaks client memory.
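
A sketch of the pattern I mean (the child name here is hypothetical):

local thread = task.spawn(function()
	-- Yields until a child named "LateChild" exists (it never does here).
	workspace:WaitForChild("LateChild")
end)

task.cancel(thread)
-- The thread is dead, but the WaitForChild completion handler presumably
-- stays registered, so the dead thread can't be collected until it fires.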

3 Likes

Support
I don’t wanna rely on adding even more hacks to ‘fix’ this.

1 Like

Just updated this post to request another feature and to be a bit clearer, since now that I’ve adjusted a lot of my code base to use coroutine.close, my console/output gets spammed with “cannot resume dead coroutine”.

1 Like

Bump! I was working on some code that uses coroutines, and this has been a nightmare to debug. It wasn’t breaking anything, but there is absolutely no trace to show why it happens.

Of course this error shouldn’t be thrown in the first place because the thread is dead. A fix would be appreciated!

2 Likes

To be honest, I would file this as a bug report because the error message does not state where the error occurred. I’ve had this problem myself. What I was forced to do to work around it was something like this module script (I call it pthread):

local mutexTable = {}

-- Waits for the mutex to become available then
-- locks it and returns.
local function mutexLock(mutex)
	if mutexTable[mutex] == nil then
		mutexTable[mutex] = true
	else
		while mutexTable[mutex] == true do
			task.wait(0)
		end
		mutexTable[mutex] = true
	end
end

-- Waits for the specified time in seconds for the mutex
-- to become available.  Then locks it and returns.
-- Returns true if the mutex was acquired, false if not.
local function mutexLockTimed(mutex, waitTime)
	local flag = false
	if mutexTable[mutex] == nil then
		mutexTable[mutex] = true
		flag = true
	else
		local timer = waitTime
		while mutexTable[mutex] == true and timer > 0 do
			timer -= task.wait(0)
		end
		if mutexTable[mutex] == false then
			mutexTable[mutex] = true
			flag = true
		end
	end
	return flag
end

-- Unlocks the mutex.
local function mutexUnlock(mutex)
	mutexTable[mutex] = false
end

-- ******** Module

local module = {}

module.mutexLock = mutexLock
module.mutexLockTimed = mutexLockTimed
module.mutexUnlock = mutexUnlock

return module

Normally, when dealing with spinlocks and threads, various forms of CMPXCHG are used at the assembly level to compare and swap the value of the memory location that serves as the mutex lock. The instruction is guaranteed to be atomic. Because Lua is an interpreted language, there is absolutely no way to guarantee that my code above is atomic, but it’s the best I can do in this environment.
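
That said, Roblox’s Luau threads are cooperatively scheduled, so a check-and-set that never yields shouldn’t be interruptible by another Luau thread. A sketch of a try-lock along those lines, if it were added inside the module above (hypothetical helper, not part of the module as posted):

-- Attempts to take the lock without yielding; returns true on success.
-- Because nothing between the read and the write yields, no other Luau
-- thread can run in between, which is as close to atomic as we get here.
local function mutexTryLock(mutex)
	if mutexTable[mutex] == true then
		return false
	end
	mutexTable[mutex] = true
	return true
end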

So to use the module, do this:

local pthread = require(pthread)
local MUTEX = 25
local cancelThread = false

-- A loop in a different thread that does something
-- and checks if it should exit every iteration.
task.spawn(function()
	-- Setup
	local count = 0
	local status

	-- Loop
	while true do
		count += 1
		pthread.mutexLock(MUTEX)
		status = cancelThread
		pthread.mutexUnlock(MUTEX)
		if status == true then
			break
		end
		task.wait(0.1)
	end
end)

-- Cancels the above loop after 15 seconds.
task.wait(15)
pthread.mutexLock(MUTEX)
cancelThread = true
pthread.mutexUnlock(MUTEX)

After implementing this and using it, I haven’t had any more problems with it. However, there’s always a risk about things not being atomic.

1 Like

Despite task.cancel and coroutine.close being hacky ways to stop threads, this still needs to be handled better. RBXScriptSignals seem to ignore the fact that the thread is dead and attempt to resume it anyway.

6 Likes

Bump! I’m also experiencing very annoying coroutine cancellation issues!

2 Likes

Although this would be nice, the main problem this thread focuses on still hasn’t been addressed. This error occurs in the movement handler in my game, and trying to debug while seeing this error message every time the player moves is frustrating. Note: this only appears to happen in Studio. I would file a bug report, but I don’t have high enough permissions. I’ve attached an example of the issue.
coroutineCloseError.rbxl (44.7 KB)

We fixed the ‘cannot resume dead coroutine’ error in your example on August 2nd.

I cannot reproduce the issue using it any more. Not sure why you don’t have update 588 yet.

As a side note, this issue was fixed long ago for task.wait when used together with task.cancel.

(I’m only talking about the example by @sevenpolygons, not the original post.)

3 Likes

As of August 3rd, this example gives an error immediately on the task.spawn call (“cannot spawn non-suspended coroutine with arguments”) instead of reporting “cannot resume dead coroutine” later with no call stack.

5 Likes

Any updates on the error coming from this code? (It’s mentioned in the post too.)

local Thread = task.spawn(function()
	workspace.ChildAdded:Wait()
end)

task.cancel(Thread)
Instance.new("Part", workspace)
1 Like

The fix for Wait/WaitForChild is implemented internally and should ship in a future update (it will be mentioned in Release Notes 592+).

4 Likes

Threads cancelled while waiting on Wait/WaitForChild will no longer report an error.

2 Likes

Will they still stay in memory after being cancelled? For example, is the coroutine kept in memory until an object gets added or the event gets triggered?

Or is just the error being removed? (Anyway, I like that this was changed.)

When task.cancel is called, only the reference to the thread object remains; the parameters and the target function reference/upvalues are cleared immediately.
The Wait completion handler is still registered, but when it’s triggered it will no longer attempt to resume the target that was cancelled.

Your suggestion for a task cancellation callback is still a good idea for handling cases where connections have to be disconnected. We can’t do that automatically - functions passed to Connect are not owned by the thread and in some cases might be intended to remain connected.
But I have no timeline for when such a feature will be added.
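
Until such a callback exists, a manual pattern along these lines can cover the connection case (the names are just an illustration, not an engine API):

-- Hypothetical wrapper: pairs a thread with the connections that should be
-- dropped when the thread is cancelled.
local function spawnCancellable(fn)
	local connections = {}
	local thread = task.spawn(fn, connections)

	return function()
		task.cancel(thread)
		for _, connection in ipairs(connections) do
			connection:Disconnect()
		end
		table.clear(connections)
	end
end

-- Usage: the spawned function registers the connections it makes in the
-- table it receives, so the returned cancel function can clean them up.
local cancel = spawnCancellable(function(connections)
	table.insert(connections, workspace.ChildAdded:Connect(print))
	while true do
		task.wait(1)
	end
end)

task.wait(5)
cancel()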

4 Likes

I’m still encountering this error with more niche yielding methods like InvokeServer().
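
A sketch of what I mean, as a LocalScript (it assumes a RemoteFunction in ReplicatedStorage whose server handler takes a while to respond - the names are just an example):

local remote = game:GetService("ReplicatedStorage"):WaitForChild("RemoteFunction")

local thread = task.spawn(function()
	remote:InvokeServer() -- yields until the server responds
end)

task.cancel(thread)
-- When the server eventually replies, "cannot resume dead coroutine" is
-- printed, because the reply tries to resume the cancelled thread.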

1 Like

Currently having a similar issue with hanging HTTP requests (run the code a few times)

local HttpService = game:GetService("HttpService")

HttpService.HttpEnabled = true

local function Request(timeout)
	HttpService:RequestAsync({ Url = "https://httpstat.us/504?sleep=" .. timeout * 1000 })
end

local function Timeout(timeout)
	local reqThread = task.spawn(Request, timeout)

	task.delay(timeout - 1, function()
		task.cancel(reqThread)
		print("Cancelled")
	end)
end

Timeout(2)

It seems to work fine with GetAsync, but from what I’ve observed, this dead coroutine issue affects almost any method that yields and then tries to resume a thread that has been cancelled after the method yielded.

Is there any chance this could be patched within the task and coroutine libraries (or their internals), or maybe in the way yielding is done? It seems to be the root cause of the majority of these errors.
In other words, RequestAsync isn’t at fault here, imo.
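
In the meantime, a workaround sketch that avoids cancelling the yielding thread at all - race the request against a timer and simply discard a late result (hypothetical helper, not an engine fix):

local function requestWithTimeout(url, timeout)
	local done = false
	local result

	task.spawn(function()
		local ok, response = pcall(function()
			return HttpService:RequestAsync({ Url = url })
		end)
		result = ok and response or nil
		done = true
	end)

	local elapsed = 0
	while not done and elapsed < timeout do
		elapsed += task.wait()
	end

	return result -- nil if the request failed or hasn't finished in time
end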

Getting this error with remote functions. I was attempting to cancel a thread if the client doesn’t respond and times out.

local ResponseReceived = false
local TimedOut = false
local time0 = time()
local thread = task.spawn(function()
  RemoteFunction:InvokeClient(player)
  ResponseReceived = true
end)
while not ResponseReceived do
  task.wait()
  if time() - time0 > 10 then
    TimedOut = true
    break
  end
end
if TimedOut then
  task.cancel(thread) --successful
end

If the server doesn’t receive a response within 10 seconds, it cancels the thread containing the InvokeClient call, which works properly. HOWEVER, task.cancel doesn’t seem to actually kill the thread, since the InvokeClient() call continues to run. I know this because:

--client
RemoteFunction.OnClientInvoke = function()
  task.wait(20)
  return
end

The client waits 20 seconds to respond to the invocation. After 10 seconds, the server times out and supposedly “cancels” the thread. After another 10 seconds, the client returns and the server logs an error message:

cannot resume dead coroutine

This means that the invocation thread is marked as “dead”, but the invocation itself clearly continues to run, since it still manages to receive the client response even after the thread was “cancelled”.
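
For what it’s worth, a workaround sketch that sidesteps the error by not cancelling the invocation thread - flag the timeout and ignore the late response instead (illustration only):

local TimedOut = false

task.spawn(function()
	local ok, response = pcall(function()
		return RemoteFunction:InvokeClient(player)
	end)
	if TimedOut then
		return -- the response arrived after the timeout; discard it
	end
	-- handle the response here (check `ok` for invocation errors)
end)

task.delay(10, function()
	TimedOut = true
end)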