PSA: ModuleScripts and RBXScriptSignals don't work together exactly how you'd expect

qwertyexpert · February 11, 2022, 9:44am

TL;DR

Don’t connect to RBXScriptSignals directly inside of ModuleScript methods, or you could have a very bad time debugging. There is a way to do this without incurring headaches; see the very bottom of this post for solutions.

`ModuleScript`s

ModuleScripts are very useful for game development. They allow you to centralize your state management such that the module becomes the source of truth and other scripts in your game can freely require the module in order to access the state that it manages.

A module can represent ownership of whatever it does, and provide a convenient API that other scripts can access, without any of the expensive value cloning or bookkeeping that using BindableFunctions typically entails. This is because calling a ModuleScript method is just calling a regular function.

This really allows you to mesh your game’s systems together in a way that was just too difficult or impossible to do before. You can freely share objects and functions and mutate state, all instantly and with no overhead, because none of it has to cross the reflection boundary. Once you’ve required a module, it’s yours.

Trying to make a modern Roblox game without ModuleScripts is a pile of duplicated code, difficult interop, and technical debt just waiting to happen. Almost every type of game lends itself very nicely to being separated out into ModuleScripts, and it’s almost always better to use them than not.

I use ModuleScripts for persistent state that hangs around, and Scripts whenever execution needs to start on something. For example, tools contain a Script that interacts with the game’s ModuleScripts to activate certain behaviors. The very, very eagle-eyed of you may already see where this is going, but do keep reading.

`RBXScriptSignal`s

RBXScriptSignals are similarly useful for game development, and you will find them absolutely everywhere. They’ve been around since the very beginning of the engine. Nearly anything useful you do is going to involve these.

Most game development is going to involve a lot of event connections to incrementally update your state and the data model as needed. There are some really cool and efficient ways to use events and I am a huge advocate of yieldless code without any task.wait.

When `ModuleScript`s and `RBXScriptSignal`s mix

Chances are that your ModuleScript does not just contain an object for other scripts to read from. Sometimes they do, but let’s say you’re making one that manages some real gameplay state in the data model. I’m going to use a real example for this one – a cloaking module that manages making player characters invisible to others.

Thanks to it being a ModuleScript, it can provide functions on the module boundary for cloaking players, and can ensure that overlaps do not result in glitchy behavior, because it can keep track of everything in one centralized location.

That script is going to need to use events for any semblance of efficiency. After all, traversing descendant trees every single frame is very expensive compared to cloaking/uncloaking individual parts as they enter and leave the cloaked player’s character.

Why am I going on about this? You’ll see.

How Roblox handles connecting to events

When you connect to an event, Roblox links the connection to your script so that it can be cleaned up when your script is destroyed. Normally, this isn’t a problem. But when you directly call functions defined in other scripts, Roblox can attribute a connection to the wrong script, so the script whose function you called sees its connection disconnect when it isn’t supposed to.

You see, the lifetime of a script connection is not at all linked to the lifetime of the RBXScriptConnection and whether or not it gets garbage collected. In fact, the majority of them are garbage collected almost immediately after creation due to them not being stored in a variable.

It’s linked only to the shorter of two lifetimes - the lifetime of the Instance itself (lasting from when it’s created to when it’s Destroyed), and the lifetime of the script that created the connection. When you call RBXScriptSignal:Connect, Roblox silently links the connection to the calling script.

But there’s a mistake. A tiny flaw in the engine that I’m told works exactly as intended, but might not actually exhibit behavior that makes sense unless you know what’s going on.

How Roblox determines which script connected to a signal

Here’s the real meat of the problem. Roblox script execution is based on coroutines. Roblox can only store information about where code comes from per-coroutine. The exact object that’s stored behind that userdata pointer is known internally and to most reverse engineers (i.e. exploits) as “RBXExtraSpace” (or RBX::ExtraSpace).

It stores information such as the thread’s security context level, the script the thread came from, and probably a whole lot more, but what’s really important here is that the “calling script” is stored in a per-coroutine object.

For some of you this probably just clicked. When you call a method from a ModuleScript, that function is executing in your coroutine, not the ModuleScript’s. That means that any event connections the module makes inside of that method are attributed to the calling script, not the module that defined the method.
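The coroutine half of this can be checked in plain Lua, without any engine involved. The sketch below uses a table named `Module` as a stand-in for a ModuleScript and a hand-made coroutine as a stand-in for a calling Script; both names are made up for illustration, and the script attribution itself is engine-side and not shown here.

```lua
-- Stand-in for a ModuleScript: functions defined in one place...
local Module = {}

function Module.whoRunsMe()
	-- coroutine.running() reports the coroutine executing this call
	return (coroutine.running())
end

-- ...and a stand-in for a calling Script, which runs in its own coroutine.
local seenInsideModule
local callerThread = coroutine.create(function()
	seenInsideModule = Module.whoRunsMe()
end)
coroutine.resume(callerThread)

-- The module function ran on the caller's coroutine, not one of its own.
print(seenInsideModule == callerThread) --> true
```

Since the function body runs on whatever coroutine called it, any per-coroutine bookkeeping sees the caller, not the place where the function was defined.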

Sometimes this is ok. Maybe the event connections are only used to pipe data to the caller. Maybe the module doesn’t actually hook them up to any internal bookkeeping. But maybe it’s not ok. Maybe the module needs those connections longer than its caller does.

How this starts to break down and cause subtle bugs in practice

The engine behavior is technically sound. From an abstract standpoint this is an acceptable way to do things. This is nothing that will cause unsafety or privilege escalation, even though it would if security context levels weren’t separated into multiple global states (which they are, BTW). However, I consider it a bug because it introduces pitfalls in userland code that result in subtle misbehaviors that are nearly impossible for junior developers to comprehend.

Let’s say you have a cloaking tool that is single-use. When you activate it, it adds 10 seconds of cloaking and then consumes the tool. The cloaking is supposed to cover any new tool you equip as well as any tools you pick up, so the cloaking module uses a DescendantAdded connection to cloak any new descendants of your character.

The first time you activate the cloaking tool, the cloaking module will add its DescendantAdded listeners inside the method call, because you aren’t already cloaked. Roblox will attribute these connections to the tool, not the module, and then when the tool consumes itself, these connections will disconnect.

But they shouldn’t. The cloaking module still needs those connections. It’s still using those connections. The cloaking module didn’t go anywhere, but Roblox “cleaned up” things that it was actually depending on and using. That means this behavior is not only observable, but it actually causes issues.

What happens? Well, the cloak stops working properly. It’s actually possible to read the RBXScriptConnection.Connected property and see it switch to false the first time the event would have fired after the tool is deleted.
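To make the failure concrete, here is a minimal sketch of the scenario described in this section. The module name `Cloak`, its `add` function, its location in `ServerScriptService`, and the transparency-based hiding are all assumptions made for illustration, and the 10-second timer is left out to keep it short. Only the attribution behavior comes from the description above.

```lua
-- ModuleScript named "Cloak" (hypothetical), placed in ServerScriptService
local Cloak = {}

-- [character] = RBXScriptConnection that the module still relies on
local connections = {}

local function hide(descendant)
	if descendant:IsA("BasePart") then
		descendant.Transparency = 1
	end
end

function Cloak.add(character)
	if connections[character] then
		return -- already cloaked, nothing new to connect
	end
	for _, descendant in ipairs(character:GetDescendants()) do
		hide(descendant)
	end
	-- This line runs in the *caller's* coroutine, so per the behavior
	-- described above the connection is attributed to the caller's script.
	connections[character] = character.DescendantAdded:Connect(hide)
end

return Cloak
```

```lua
-- Script inside the single-use Tool (hypothetical layout)
local Cloak = require(game:GetService("ServerScriptService").Cloak)
local tool = script.Parent

tool.Activated:Connect(function()
	local character = tool.Parent -- an equipped Tool is parented to the character
	Cloak.add(character)
	tool:Destroy() -- destroys this Script too, taking the module's connection with it
end)
```

With this layout, the module’s `connections` table still holds the connection after the tool is gone, which is where you would read `Connected` to watch it flip to false.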

Is this a bug?

I have contacted @Bug-Support about this issue. They have assured me that this is perfectly intended behavior despite not being documented anywhere. ~~I am working with them and DevRel to get this filed as a bug report, but since I am a New Member it might either take a while or just never happen.~~

I have also talked with @zeuxcg about the issue and he seems to agree that this is relatively unexpected behavior. I haven’t gotten too many statements from him on the matter other than “I doubt it’s intentional”.

I believe a case can be made that even if this is perfectly intended, functional behavior, it causes really subtle bugs, and something should probably be done about it.

EDIT: I have indeed been told that this is completely intended behavior, not a bug, and that changing it would require a feature request. As New Members are just flat-out not allowed to make any sort of feature requests, it is completely impossible for me to attempt to change this now. I encourage any full Members to make a feature request to fix this if they can.

What can be done about it?

The root cause of the issue is that you’re allowing your signal connections to be made inside a coroutine owned by someone else. Even though you defined the function in your ModuleScript, it’s not being used there.

There are two ways to solve this:

Fix the Y problem. Run the RBXScriptSignal:Connect function in a coroutine owned by the ModuleScript so that it’ll be linked to the right script.
Fix the root cause, the X problem. Create a general-purpose solution for transferring function execution onto a coroutine that is properly linked to your ModuleScript so that signal connections, but also everything else, find the right script.

Fixing the Y problem

First we have the Y problem - the fix for connections being linked to the wrong script:

--!strict

return function()
	return coroutine.wrap(function(signal: RBXScriptSignal, handler: (...any) -> ...any)
		while true do signal, handler = coroutine.yield(signal:Connect(handler)) end
	end) :: (signal: RBXScriptSignal, handler: (...any) -> ...any) -> RBXScriptConnection
end

Here’s how to use it:

local connect = require(Path.To.Connect)()

function Module:SomeMethod()
	-- ...
	local connection = connect(instance.DescendantAdded, function(desc)
		print(desc:GetFullName())
	end)
	-- ...
end

The source looks a little scary, so let me explain it step-by-step:

Calling the function returned by the module creates a wrapped coroutine that, crucially, inherits the ModuleScript’s userdata. This is completely invisible to your code but what happens behind the scenes is that when a coroutine is created, userthread is called, which, among other things specific to the Roblox engine, links the new coroutine to the same script as the one that created it.
Calling the wrapped coroutine executes signal:Connect inside of it, ensuring the connection is linked to the right script.

Using this method, connections will no longer disconnect on you, since ModuleScripts cannot be cleaned up (their objects may still be in use by other scripts in the game!).
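The same trick can be observed in plain Lua. This is only a sketch of the coroutine mechanics: `makeProbe`, `callerA`, and `callerB` are made-up names, and instead of connecting to a signal the worker simply reports which coroutine it is running on.

```lua
-- Built the same way as the connect helper: one long-lived wrapped
-- coroutine that is resumed by every later call.
local function makeProbe()
	return coroutine.wrap(function()
		while true do
			-- Report the coroutine doing the work, then wait for the next call.
			coroutine.yield((coroutine.running()))
		end
	end)
end

local probe = makeProbe() -- created once, by the "module"

-- Two unrelated callers, each on its own coroutine.
local callerA = coroutine.create(function() return probe() end)
local callerB = coroutine.create(function() return probe() end)

local _, workerSeenByA = coroutine.resume(callerA)
local _, workerSeenByB = coroutine.resume(callerB)

print(workerSeenByA == workerSeenByB) --> true
print(workerSeenByA ~= callerA)       --> true
```

Both callers end up doing their work on the worker coroutine created by `makeProbe`, never on their own. In the connect helper, that worker is the one linked to the ModuleScript.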

Fixing the X problem

To fix the X problem we want a general-purpose solution for shoving functions onto a different coroutine. We want yielding to work correctly and we want proper stack traces if an error occurs.

To do this you can use a variation of the fastSpawn pattern with a BindableFunction instead of a BindableEvent, so that your code execution yields until the function’s completion, and can get its result:

--!nocheck

return function()
	local boundFunc: (...never) -> ...any
	local boundArgs: {[number]: any, n: number}
	local boundReturns: {[number]: any, n: number}

	local bindable = Instance.new('BindableFunction')
	bindable.OnInvoke = function()
		boundReturns = table.pack(boundFunc(table.unpack(boundArgs, 1, boundArgs.n)))
	end

	return function<T..., R...>(func: (T...) -> R..., ...: T...): R...
		boundFunc = func
		boundArgs = table.pack(...)
		bindable:Invoke()
		return table.unpack(boundReturns, 1, boundReturns.n)
	end
end

(Note: The above code exclusively uses upvalues to avoid cloning overhead)

You use this very similarly to the connect example above:

local run = require(Path.To.Run)()

function Module:SomeMethod(arg1: number, arg2: boolean)
	return run(function(arg1: number, arg2: boolean)
		-- ...
		local connection = instance.DescendantAdded:Connect(function(desc)
			print(desc:GetFullName())
		end)
		-- ...
	end, arg1, arg2)
end

(Note: The above example takes advantage of identical function objects being rawequal, which is why I use arguments instead of upvalues)

Conclusion

Due to the way ModuleScripts are typically used, there are some subtle engine behaviors that get slightly messed up if you’re not careful. This is yet another one of those cases where two features are completely fine on their own, but have gotchas when used together.

It is my hope that this post gives you insight on this problem that you might have experienced or might experience in the future. You should now have all the knowledge required to understand what this issue is, why it happens, and why my solutions both fix the problem.

Happy scripting!

Let me know how I did with this write-up. This is my first long-form DevForum post in a while. :)

AsynchronousMatrix · February 12, 2022, 4:06pm

Ah, I recently found a problem with ModuleScripts too. It has to do with the coroutine library.
If you yield a module with coroutine.yield and then resume it afterwards, it ends up never returning an object and yielding infinitely.

task.delay(2, coroutine.resume, coroutine.running(), { })

return coroutine.yield()

Or maybe this is intentional as well?

qwertyexpert · February 12, 2022, 5:03pm

Doubt it, it’s probably a limitation, due to the ModuleScript returning while not under the Roblox task scheduler. The issue should go away if you use task.spawn instead of coroutine.resume.

In this specific case you’d just get rid of the indirection, i.e. task.delay(2, task.spawn, coroutine.running(), {}) → task.delay(2, coroutine.running(), {}) → wait(2) return {}

AsynchronousMatrix · February 12, 2022, 8:51pm

Yeah, I ended up wrapping it with a weird repeat statement…
Doesn’t task.wait / wait work the same way as coroutine.yield? It produces the same error if it tries to resume a dead thread, though that is all I can judge it on.

qwertyexpert · February 13, 2022, 6:55am

The difference between the task family of functions and the coroutine family of functions is that coroutine resumes a coroutine “raw”, which means it doesn’t have any of the Roblox scheduler niceties like printing a message to the output when it errors. task resumes it “managed”, which means Roblox will print an error if the coroutine errors, and apparently will be able to process ModuleScript returns.

I’ve never run into any issues with WaitForChild/etc in a ModuleScript, so it is probably just the fact that you’re not resuming it under the Roblox scheduler.
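The “raw” half of that distinction is easy to see in plain Lua. This is a small sketch with a made-up error message: coroutine.resume hands the failure back as return values instead of reporting it anywhere.

```lua
local thread = coroutine.create(function()
	error("boom")
end)

-- A raw resume does not raise or print anything on failure;
-- it returns false plus the error message.
local ok, message = coroutine.resume(thread)

print(ok, message)
print(coroutine.status(thread)) -- the failed coroutine is now dead
```

Unless the caller inspects `ok`, the error goes unnoticed, which is the scheduler nicety that raw resumes skip.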