[Beta] Deferred Lua Event Handling

What’s this delay exactly by the way? Is it like work cycles? Post says invocation point which isn’t really descriptive for me. If it’s like that isn’t there a possibility of two events running at the same time?

6 Likes

Edit: Oh neat, this may not be a problem.

This seems to solve this problem of signals firing re-entrance. As long as events fired in the data model are ok, we should be fine.


I’m sorry, but please do not release this change as-is. This will destroy responsiveness in all of my experienes, and (I assume) many others experiences. The code will still function, but this change may introduce significantly latency (read: multiple frames) of latency.

The root of the issue is that I, and many others chain signals together that flow across our data models. If anyone has code like this, this change introduces latency into core data models.

This change breaks how I, and many other developers, program, forcing me to add latency into my game. I use Roblox’s datamodel as the source of truth. We gave a [programming talk]( 5 Powerful Code Patterns Behind Top Roblox Games) at RDC about this. We use signals such as:

  1. Attributes
  2. CollectionService
  3. ChildAdded/ChildRemoved
  4. PropertyChanged events

I also use signals for internal models that don’t use Roblox as a source of truth.

By doing this, we introduce unavoidable latency into my datamodel updates. For example, consider this scenario.

  1. I instantiate a new object into the game, tag with CollectionService
  2. Event fires (later now), we set some properties and then we maybe instantiate 2-3 other objects with tags. Finally, we parent these.
  3. These child added events fire (later now), and so we do this again
  4. We repeat a few more times, and now stuff loads in over 3-4 frames, instead of one.

Another example if where I recursively replicate properties in my virtual data-model using signals downwards. This will now occur over multiple frames, missing first-rate responsiveness. This is a super common pattern.

  1. Listen to input
  2. Abstract input into a data model and fire off the event (like setting a bool value, or having a custom signal)
  3. Listen to this abstraction instead of the true input event.

By making this change, we introduce a round of latency between this input and the response to the user. In more complicated data models, we may introduce even FURTHER rounds of latency, leading to multiple frame delays. In something like a VR, this is deadly.

It will be very hard to audit all of the places that I am using signals. :Connect occurs 1923 times across 910 scripts in my experiences. These thousands of concurrent users, and my code is used in many more places.

Please consider an alternative behavior. I’m willing to sit down and chat about this, because this is a truly disruptive and breaking change for us.

54 Likes

They will still be handled in the same invocation point before general engine execution continues. You can think of an invocation point like this:

while #deferredInvocations > 0 do
     local nextInvocation = table.remove(deferredInvocations, 1)
     nextInvocation:Run() --> this may end up adding more invocations to the list
end

In that way, this change should never add frames of latency, only change at what point within a given frame stuff happens. In fact, the reason it fully exhausts the invocation queue before continuing is to avoid the exact kind of frame latency that you’re worried about.

34 Likes

For people who are confused. Before, the event firing order was messed up. Now the engine makes sure the order is right to make for a more expected results and avoid nasty order bugs that are hard to track down and chaotic.

That comes with a really tiny delay that you won’t notice at all.

15 Likes

I didn’t realized that many of the issues with coroutine.wrap have been fixed, so that’s nice. It’s going to be annoying to have to write a custom Signal wrapper now though, because this affects a lot of libraries I wrote in the past that used to be standalone and use Bindables, but now need to have an extra implementation in them (not to mention users of said library are going to have it break for them).

One thing I can think of is Rocrastinate, although there’s quite a number of existing lua libraries that have similar behavior, which will now have unpredictable behavior due to this change.

I would strongly encourage considering making this off by default, or having some way for existing code to be backwards-compatible by default.

I use Bindables all the time for state management and in-game events, among other things, and I need listeners to run immediately. Now I have to switch to a custom implementation.

10 Likes

Replica and Rocrastinate—both public libraries that have been used by many other people—are both affected like this, and will now have unpredictable behavior. Now I need to fix both libraries, and then people using my code need to update their forked versions of these libraries or else their games will break. Not how I wanted to spend my Thursday.

8 Likes

If I understand the update well. then it’s like that the lua state stops the events from firing until the previous events get fired. like some kind of mutex in C++.

Right?

5 Likes

I think there’s some confusion about “delay”: It’s not like this change is introducing some arbitrary amount of delay into things. If you have Lua code that invokes an event, and code that handles that event, the total time to run both the handler and the main code will be exactly the same after this change as before it:

image

46 Likes

I have a concern about security with this update.

Right now, certain internal engine events happen between the server receiving client packets, and lua code being able to react to it.

wait() resume is not reliable at all, and it can lead to potentially accumulating performance/lag issues.

Wouldn’t this gap being closed add a new vector for exploiters that can’t be fixed by any lua code measures?

11 Likes

For me the problem with coroutine.wrap is that the wrapped function call can cause the calling thread to be terminated if it errors before it yields.

Here is a code example that illustrates the problem:

coroutine.wrap(error)("test")
print("This does NOT print")

This will never happen when you call coroutine.resume, it is safer and makes protected function calls.

coroutine.resume(coroutine.create(error), "test")
print("This prints")

We do however have to make sure that we manually catch errors when we call coroutine.resume.

Here is my FastSpawn implementation that catches errors without terminating the calling thread:

function Thread.LogAsError(message: string, stacktrace: string): ()
    --@NOTE: Errors printed by TestService will not be caught by the ScriptContext.Error event.
    --So in order for any error logging services to work properly, they need to be manually invoked here.
    TestService:Error(message)
    TestService:Message("\n" .. stacktrace)
end

function Thread.FastSpawn(func: any, ...: any): ()
    local thread = coroutine.create(func)
    local ok, msg = coroutine.resume(thread, ...)
    if(not ok) then Thread.LogAsError(msg, debug.traceback(thread) .. debug.traceback(nil, 2)) end
end
10 Likes

This is honestly an extremely concerning change, mainly because of how broad it is and how many unforeseen bugs will be introduced by changing the execution order of our code. If this goes through, it is going to take a lot of time and effort to go through all of the code in all of my games to fix all of the breaking changes.

I have always coded things under the assumption that events would be fired immediately. I have years and years of code which follows this assumption, and now I’m going to have to audit all of it again.

Really wish there was a backwards-compatibility solution provided for this change. While it may seem like just a minor change in execution order, this will have a lot of unintended consequences for game logic which makes heavy use of events, reading game state as soon as events are fired, and expecting things to be set in a certain order.

34 Likes

You mean we are never going to be able to use “Immediate” form again? I have almost 5k lines of codes just running every frame with RenderStepped and it would break, literally, the entire game and my years of work.

11 Likes

Slight Annoyance: TestService throws this annoying “TestService:” prefix. Also, it doesn’t actually link you to the source script when you click on it on the console, since that would require information that just isn’t present in the error.
image

Kinda sucks that this is now the only alternative to FastSpawn.

4 Likes

Oh dear god…
I’ve found a hacky alternative to FastSpawn that actually works, and I’m afraid is actually practical enough that I’m sure people will use it

Instance tree:
image

FastSpawn (ModuleScript):

--!strict

local function FastSpawn(func: any)
	local thread = coroutine.create(function()
		_G.FunctionBeingFastSpawned = func
		local module = script.FastSpawnRunner:Clone()
		require(module)
		module:Destroy()
	end)
	coroutine.resume(thread)
	_G.FunctionBeingFastSpawned = nil
end

return FastSpawn

FastSpawnRunner (ModuleScript):

--!nocheck
_G.FunctionBeingFastSpawned()

return nil

Example code:

local FastSpawn = require(script.FastSpawn)

local f2 = function()
	error('Hello, World!')
end
FastSpawn(
	function(...)
		f2(...)
	end
)

Output (Works like old FastSpawn):
image

Edit: okay, maybe not
image

Edit 2: See my below reply for a better implementation.

6 Likes

Just to clarify as I’m not sure. I was wondering what would now happen in this case:

something.Event:Connect(someFunction)
dispatch(something.Event) 

someFunction is now added to the queue to be invoked
and will probably be invoked in a few milliseconds now
(as opposed to immediately)

except some other script does the following
just before someFunction is invoked from the handler queue:

something:Destroy() → something is destroyed → something.Event automatically disconnects the
connection as a result → Does someFunction still end up being invoked?

If it still invokes, it might run code that assumed the object “something” still exists and hence will error.

If it doesn’t, I might have mistakenly assumed that someFunction would have eventually ran because something.Event was dispatched which would be my fault granted for assuming but could a warning be shown in this case saying something like

This particular dispatched event was destroyed before it 
could complete the invocation of its connecting functions
6 Likes

I think there is also a problem with Players.PlayerAdded:Connect(player) event which it simply doesn’t triggers as the game launches in the studio test.

9 Likes

Okay, I just found another, much more performant, and much more elegant way to do FastSpawn… use BindableFunctions instead of BindableEvents!

EDIT: Updated to work with nested calls, since BindableFunctions can’t invoke themselves during an OnInvoke callback:

--!strict

local FAST_SPAWN_BINDABLE = Instance.new('BindableFunction')
local FAST_SPAWN_CALLER = function(cb: () -> ())
	cb()
end :: any
local LOCK_IS_INVOKING = false
FAST_SPAWN_BINDABLE.OnInvoke = FAST_SPAWN_CALLER

local function FastSpawn(func: () -> ())
	if LOCK_IS_INVOKING then
		local nestedCallFastSpawnBindable = Instance.new('BindableFunction')
		nestedCallFastSpawnBindable.OnInvoke = FAST_SPAWN_CALLER
		coroutine.resume(
			coroutine.create(function()
				nestedCallFastSpawnBindable:Invoke(func)
			end)
		)
	else
		LOCK_IS_INVOKING = true
		coroutine.resume(
			coroutine.create(function()
				FAST_SPAWN_BINDABLE:Invoke(func)
			end)
		)
		LOCK_IS_INVOKING = false
	end
end

return FastSpawn

Stack trace (Works just like old FastSpawn):
image

Overhead (The “SlowCodeAfterFastSpawn” label is due to external code, so the actual overhead is only a small percentage of what’s showing up here):
image
image

Overhead: 0.036 - 0.029 = 0.007 ms for a 0.029 ms loop:

game:GetService('RunService').Heartbeat:Connect(function()
	for i = 1, 10 do
		debug.profilebegin('FastSpawnOverhead')
		FastSpawn(function()
			debug.profilebegin('SlowCodeAfterFastSpawn')
			for i = 1, 1000 do
				math.random()
			end
			debug.profileend()
		end)
		debug.profileend()
	end
end)
7 Likes

This is a nice update I think. The only thing it might break for me is that in some places I use ValueInstance.Changed+Tweenservice to tween models because there’s currently no way direct way:

local val=Instance.new'CFrameValue'
val.Value=x0
val:GetPropertyChangedSignal'Value':Connect(function()
		model:PivotTo(val.Value)
end)
5 Likes

One immediate issue I’m finding with this behavior is anything using Event:Wait(). If that event is triggered more than once during the deference period, only the first result is making it to the Wait() call. I first discovered this with some Touched events, but then found it would work in basically any circumstance. Consider the contrived example below:

local bindable_event = Instance.new("BindableEvent")

local function spam_event()
	bindable_event:Fire("a")
	bindable_event:Fire("b")
	bindable_event:Fire("c")
	bindable_event:Fire("d")
	bindable_event:Fire("e")
end

spawn(spam_event)
while true do
	print(bindable_event.Event:Wait())
end

In this block, “a” is the only result that ever prints; the rest of the events are entirely ignored. This sinking behavior is not present when using Event:Connect(), but it also does not occur with Immediate Signal Behavior. Is this an intended change?

7 Likes

What I believe is happening here is you’re firing six events at once. And due to how Roblox’s task scheduler handles yields, the deferred event could be preventing any future :Wait() calls being handled.

Without signals deferred, they’d bunny hop between each other, since one call would release a yield, and so it handles that, then reactivates the thread that called it

Obviously thats just speculation.

5 Likes