[Beta] Deferred Lua Event Handling

Good luck, FPS games. You have my best wishes <3

local step = 0

game:GetService("RunService"):BindToRenderStep("hello", 2001, function()
	step = "while bindtorender"
end)

game:GetService("RunService").RenderStepped:Connect(function()
	step = "while render"
end)

game:GetService("RunService").Heartbeat:Connect(function()
	step = "before everything"
end)

game:GetService("ContextActionService"):BindAction("cbt", function()
	print(step)
end, false, Enum.UserInputType.MouseMovement)
--[[ game:GetService("UserInputService").InputBegan:Connect(function()
    print(step)
end) --]]

One thing to note: with deferred behavior enabled in both cases, ContextActionService keeps the existing immediate behavior, whereas InputBegan and, by extension, other input events do not.

Also, a solution to the PlayerAdded event problem is to loop through existing players and call your added function for them. You probably should have been doing this anyway, but I completely agree with the counterarguments to this.
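
A minimal sketch of that PlayerAdded pattern, for reference. The helper name `trackPlayers` is hypothetical, and the service is passed in as a parameter (an assumption made so the snippet also runs outside Roblox); it only relies on the `PlayerAdded` signal and the `GetPlayers` method.

```lua
-- Hypothetical helper: `players` is expected to behave like the Players
-- service (a PlayerAdded signal with :Connect, plus a :GetPlayers method).
local function trackPlayers(players, onPlayerAdded)
	-- Connect first so that nobody who joins from here on is missed.
	local connection = players.PlayerAdded:Connect(onPlayerAdded)

	-- Then handle everyone who was already in the game before this ran.
	for _, player in ipairs(players:GetPlayers()) do
		onPlayerAdded(player)
	end

	return connection
end
```

In a Script this would be called as `trackPlayers(game:GetService("Players"), function(player) ... end)`.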

We should not have to update all our code that relies on something as fundamental as events to reflect a change that nobody really needs (and even if it were needed, this would still be inappropriate as far as ergonomics go). It will certainly improve performance in places where signals are abused, but the compatibility disadvantages far outweigh that. Please make this forever optional, or just never force this change in the first place.

4 Likes

Thanks for all the feedback, everyone. This feature is clearly proving to be much more complicated than we anticipated, and we will adjust the development and release plans accordingly. I want to go over some misunderstandings in this thread, and our plan for next steps - you’re welcome to ask for clarification on any given point but please bear with us.

Please read this in full before posting further comments on the thread.

Is this change necessary?

There’s a misconception that this change breaks existing games for no benefit. Our view is that if we never do this, in 5 years the platform will be much worse off. I won’t belabor this too much, but here are three quick examples:

  • Client and server-side processing of incoming packets is largely single-threaded. We parallelized physics processing to an extent and are working on some event improvements on the server side, but while we have a contract with developers that events from properties coming in the network stream immediately invoke Lua callbacks, this will not change.
  • The events around instance reparenting are very complicated; modifying the hierarchy from a Lua callback can often result in bugs that are extremely difficult for us to fix. We have found cases where these mutations break subtle invariants in engine state and cases where they result in complex memory leaks, and all of these are very difficult for us to fix systemically without degrading performance in some other way.
  • Due to the sporadic nature of Lua events, it’s difficult to ensure the safety of the engine sandbox. We want to make sure it’s impossible to compromise the security of any system that runs any Roblox application; many vulnerabilities we’ve fixed in the past would not have been possible if this change had been in effect.

In short - it’s not just that there are some minor performance gains to be had from this change - the immediate nature of Lua event invocation, coupled with the very high density of event invocation points, severely hinders our ability to innovate with respect to performance and robustness.

When will this change happen?

Something that we should have been much more clear about in the initial announcement - the reason why we don’t have a timeline for this yet is precisely because we don’t know to what extent:

a) The new set of rules works for most existing games
b) We can expect developers to adapt to the new set of rules in a timely manner

We tested this change internally, but unfortunately we can’t really test it on the games you have created: we don’t have source access, and we can’t just enable something like this for testing in a given game. As such, the set of games we’ve tested was limited. Our intention was to start talking to the community to help highlight areas of concern and problems due to this change - hence this thread - but:

It is not our intention to break your games.

Our intention is to polish this set of behaviors until it works out of the box for most games with a minimal set of issues, work with developers to test their games using the new mode, change the defaults after we’ve given plenty of notice, then work with developers who had to disable this change for their games to understand how best to proceed, and only if the set of games that had to opt out is very small would we completely switch to the new behavior.

We would expect this entire process to take years. Our initial hope was that we would be able to change the defaults at the end of this year, but given the volume of feedback we will need to reassess to what extent this is practical.

Again, we will not change the behavior of your games until we’re convinced that doing so results in minimal disruption through a combination of us improving the system and developers adapting to change over time.

Should I switch to the new mode right now?

To give us a bit of time to carefully comb through the feedback, categorize it and make sure we have a plan for everything, we are changing the new mode to only be active in Studio. This more accurately reflects the status of this change, which is an exploration / beta.

Please still let us know when you find issues with this change. There is some leeway for us to change the behavior here; for example, people brought up FastSpawn. I’m hoping that there’s no disagreement that FastSpawn is a hack and really shouldn’t exist. In fact, we have a set of new APIs like task.defer that should make it unnecessary, but unfortunately we didn’t order this beta correctly, so you don’t have access to them yet. If we need to keep FastSpawn working, we can change the rules around some events to be immediate, because bindables specifically are less problematic than other events: the engine never triggers them directly.

Our intention is for the new mode to be reasonably easy to migrate to (this should be much less difficult than FilteringEnabled) - we are clearly not there yet but we will try to get there.

What are your next steps?

Again, thanks for all the feedback. Here’s what we’re doing in response.

We will carefully review each specific problem that people brought up and see if we can come up with fixes to the existing rule set to make code like this just work. Please continue to submit examples of code that gets broken as a result of this change, but please be as specific as possible to help us debug this problem.

We have changed the new behavior to only be active in Studio for the time being to not give folks a wrong impression - we have bugs to fix and more testing to do. We will not give a timeline for any changes in existing behavior until we’re confident that the new behavior doesn’t have bugs and we’ve done all we can to mitigate the breakage from our end.

We will perform more internal testing, reaching out to some of you that expressed concerns about the game malfunctioning as a result of the change and working with you to understand how to best proceed.

We will release a separate document on DevHub explaining the difference between old and new behavior in detail as well as explaining when this change can lead to issues and how to best fix them.

We will release the task.* APIs to make sure developers have good clean alternatives to migrate to when using this new mode, when this is necessary.

We will investigate and fix bugs in our own scripts, thanks @DataBrain for bringing these up.

After this is done, we will enable this change on client/servers again (only for games that chose to opt into the new behavior) which would mean that developers can start adjusting their games in hopefully minimal ways to make them compatible with this change.

Only after we see some initial feedback from that will we be ready to announce the first timeline - one where the change would become the default (with an understanding that for any developer that has a game severely impacted by this it takes a single click in Studio to go back to the old behavior).

And it’s only after that that we will be able to see, based on working with developers on this migration, whether a full switch to the new behavior is practical and when it can realistically happen. Expect this to take years.

123 Likes

How come you don’t check it on your side?

2 Likes

@zeuxcg Thanks for this very clarifying and thoughtful response.

Small point on FastSpawn, since I do agree it’s a hack and should not exist: I don’t think task.defer (if it works in any way similar to golang’s defer statement) has the same effect that FastSpawn is trying to achieve. FastSpawn is meant to run a new coroutine immediately (which is already possible with coroutine.wrap and coroutine.create), but in a separate thread that does not stop the current thread if there is an error, while maintaining full debug functionality (i.e. being able to click on the error or parts of the traceback from a thread that was fastspawned and have it link back to the source script). Most people prefer FastSpawn over coroutines for debug purposes only. Typically this overhead isn’t significant enough to cause lag compared to the code being run in general or the Roblox engine itself.

That’s why my FastSpawn implementation goes through the overhead of using BindableFunctions; even though it spawns a new coroutine to call that bindable function, it abuses the fact that OnInvoke will create a new traceback for which, if an error occurs, it will be like it came from a unique entry point in a regular Script, LocalScript, or ModuleScript. The only difference is that this creation of a new entry point is not deferred, and runs immediately during the FastSpawn call, unless the thread being FastSpawned yields or errors.
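
To illustrate only the "run immediately, and don't take the caller down on error" part, here is a rough plain-Lua sketch. It is not the BindableFunction-based implementation described above, the name `fastSpawn` is just a placeholder, and it does not reproduce the clickable error links that are the main reason people reach for the bindable trick in Roblox.

```lua
-- Hypothetical helper, not the BindableFunction-based version described above.
local function fastSpawn(fn, ...)
	local thread = coroutine.create(fn)
	-- resume runs fn right away, up to its first yield, error, or return.
	local ok, err = coroutine.resume(thread, ...)
	if not ok then
		-- Report the failure without re-raising it into the caller.
		print(debug.traceback(thread, tostring(err)))
	end
	return ok
end

local order = {}
fastSpawn(function()
	order[#order + 1] = "spawned runs first"
	error("boom")
end)
-- The error above was reported, but this line still executes.
order[#order + 1] = "caller keeps going"
```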

10 Likes

Yeah, thanks for this clarification. The initial version of my reply had task.spawn in it but we changed it to task.defer; I thought that we already had an immediate spawn in our planned task.* APIs, but it appears that we don’t yet; we will address this as well.

23 Likes

Thank you for this detailed response, this definitely helps ease a lot of the initial worries I had about the rollout of this change and is incredibly appreciated. The lack of information was the main concern for me, as it was extremely hard to gauge just how much would be affected by this change and how we would be able to fix affected code. With more resources being made available to assist us with this change, this will definitely help make the transition go a lot smoother.

17 Likes

Exactly how much of a performance gain could this give?

1 Like

This update seems to be breaking some Roblox core scripts, too. Sometimes it’s as simple as a single, inconsequential error being thrown every time the developer console is opened, and sometimes it’s as broken as, well…

This error wasn’t even triggered with events. It’s odd to develop with deferred signals in mind when Roblox itself seems unprepared.

3 Likes

With the deferred behaviour will it be possible to do something like this?

local myEvent = Instance.new("BindableEvent")

myEvent.Event:Connect(function()
    wait(1.5) -- some work
end)

myEvent.Event:Connect(function()
    wait(0.5) -- some work
end)

myEvent:Fire()
myEvent.Event:Wait() -- thread resumes after all event handlers have finished (~1.5 seconds)

If you try the above code today, you’ll just yield forever because by the time you call Wait() on the event, the signal has already fired. If anyone knows of a clean way to do this, let me know :)

3 Likes

For people who are complaining about this change:
If your game breaks because of it, that’s a sign of a bad game structure,
because your game was relying on chaotic behavior that will eventually break down when the right conditions are met.
This new update is really good. Roblox is making sure event firing order is consistent across all sessions. From what I’ve seen, they really needed to change this because they’re working on synchronizing Roblox instances with Parallel Lua, I suppose.

1 Like

This sort of messed up the respawn logic inside one of my games. I use the CharacterRemoving event to check the player’s position right before they respawn so I can set the respawn location to the nearest spawn rather than a random spawn location. Since the event is deferred, it runs after the player respawns, which completely ruins the logic. I would expect this kind of behavior on the CharacterRemoved event rather than the CharacterRemoving event.

3 Likes

In Deferred SignalBehavior mode, it seems that changes to GUI items are also deferred. For example, making a GUI element like a Frame not visible, changing its Size and/or Position, and then making it Visible again will show ghost artifacts of the Frame at its previous position/size before it is shown at the new position/size.

The simple example below makes a frame not visible, sets its size to 0, 0, and moves it to the position of the left mouse click. It then makes the frame visible and tweens it to full size. 1 second later it makes the frame not visible. When you left click again you will see a flash of the frame in the old position. This does not happen in Immediate mode.

At this point the only way to avoid the artifacts in Deferred mode is to add a wait() after changing the size and position, before making the frame visible again. This is not desirable and definitely not a performance boost.

local Players = game:GetService("Players")
local player = Players.LocalPlayer
local pGui = player:WaitForChild("PlayerGui")
local mouse = player:GetMouse()

local DEFAULT_FRAME_SIZE = UDim2.fromOffset(100, 100)
local MINIMIZED_FRAME_SIZE = UDim2.fromOffset(0, 0)

local sGui = Instance.new("ScreenGui", pGui)
sGui.Enabled = true

local frame = Instance.new("Frame", sGui)
frame.Visible = false
frame.Size = DEFAULT_FRAME_SIZE
frame.AnchorPoint = Vector2.new(0.5, 0.5)
frame.BorderSizePixel = 0
frame.Position = UDim2.fromOffset(50, 50)
frame.BackgroundColor3 = Color3.new(1, 0, 0)


local debounce = false

local function positionFrame(pos: Vector2)
	frame.Visible = false
	frame.Size = MINIMIZED_FRAME_SIZE
	frame.Position = UDim2.fromOffset(pos.X, pos.Y)
	frame.Visible = true
	frame:TweenSize(DEFAULT_FRAME_SIZE, Enum.EasingDirection.In, Enum.EasingStyle.Quad, 0.5, true, function()
		wait(1)
		frame.Visible = false
		debounce = false
	end)
end

mouse.Button1Down:Connect(function()
	if not debounce then
		debounce = true
		positionFrame(Vector2.new(mouse.X, mouse.Y))
	end
end)
7 Likes

Hopefully there is a noticeable performance improvement in big servers with this feature enabled. I tested my game and it seemed like nothing was broken.

Both Roblox and developer code appear not to be working as a result of this update, so I’m not sure how this can be chalked up to bad code structure. Could you, by any chance, elaborate on what developers should have been doing to avoid these issues? How could we have future-proofed our code so that it would work effectively whether or not events eventually had a Deferred mode on?

17 Likes

Just because I got the monkey brain: if I don’t understand any of this, should I worry about updating my games or about stuff breaking? Can someone give me a layman’s terms version of what’s actually changing here?

That’s not good. I work with UI and the ability to set elements to invisible or move them in the span of one frame is absolutely critical. @zeuxcg should take a look at this.

I’m glad this change has been clarified, though I wish it had been so sooner. I just hope we work out all the kinks like above first.

2 Likes

Basically, a lot of programmers have written code based on a lot of assumptions on how their code will function/what order things will run in.

Now, all of a sudden, Roblox wants to change the order in which certain pieces of code run in order to optimize the engine in general, and make way for multithreaded code that runs on all processors in the future. However, as has been shown, a lot of existing (especially well-structured) code will suddenly break, and there are a lot of unpredictable bugs depending on how much you relied on the assumptions that are no longer true under this change.

Luckily, by default, this change is not enabled, and according to Arseny it won’t be enabled by default until it can be ensured that most games won’t completely break as a result. Right now there are some bugs even with Roblox’s own core scripts that control the camera and UI, so you really have to test it out yourself.

In workspace, the property called “SignalBehavior” controls whether these changes are enabled or not. If you set the SignalBehavior to Deferred, it will enable the new changes, so you can test out for yourself whether or not you notice any differences or bugs in your existing games. If there are issues, then you should set the SignalBehavior to Immediate, which will permanently disable these changes.

Right now, SignalBehavior is set to “Default” on all places, which right now will disable these new changes, but in the future they might be enabled if it’s left on “Default”.
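
To make the ordering difference concrete, here is a toy model in plain Lua. It is only a sketch under assumptions: `newSignal`, `flushDeferred`, and the single shared queue are made-up names for illustration, not engine APIs, and the real engine decides for itself when queued handlers get resumed.

```lua
-- Toy model only: newSignal and flushDeferred are hypothetical names.
local deferredQueue = {}

local function newSignal(mode)
	local handlers = {}
	local signal = {}

	function signal:Connect(fn)
		handlers[#handlers + 1] = fn
	end

	function signal:Fire(value)
		for _, fn in ipairs(handlers) do
			if mode == "Immediate" then
				fn(value) -- runs inside Fire, before the caller continues
			else
				-- Deferred: remember the call and run it later.
				deferredQueue[#deferredQueue + 1] = function()
					fn(value)
				end
			end
		end
	end

	return signal
end

-- Stands in for the point where the engine resumes queued handlers.
local function flushDeferred()
	local i = 1
	while i <= #deferredQueue do
		deferredQueue[i]()
		i = i + 1
	end
	deferredQueue = {}
end

local function run(mode)
	local log = {}
	local changed = newSignal(mode)
	changed:Connect(function(value)
		log[#log + 1] = "handler saw " .. value
	end)
	changed:Fire("A")
	log[#log + 1] = "code after Fire"
	flushDeferred()
	return table.concat(log, ", ")
end

print(run("Immediate")) --> handler saw A, code after Fire
print(run("Deferred"))  --> code after Fire, handler saw A
```

Code that assumes the handler has already run by the time `Fire` returns is the kind of code that changes behavior between the two modes.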

5 Likes

I adjusted my game and plugins to support the deferred signal behavior this afternoon. I mainly needed to adjust my “safe call” implementation to use xpcall / coroutine.wrap instead of bindable events. I hadn’t noticed coroutine.wrap’s error tracing was fixed before now. Here’s what I ended up replacing it with:

Code
-- Silencing errors is bad. This error handler prints out clean errors using a bindable event.
-- It's unfortunate that we can't log the errors properly with red and blue text.
local bindable = Instance.new("BindableEvent")
bindable.Event:Connect(error)
local function errorHandler(msg)
	bindable:Fire(tostring(msg) .. "\nStack Begin\n" .. debug.traceback() .. "Stack End")
end


return {
	-- Immediately call f, but prevent it from interrupting the thread.
	SafeCall = function(f, ...)
		coroutine.wrap(xpcall)(f, errorHandler, ...)
	end,

	-- Immediately call f, but prevent it from interrupting the thread.
	-- It is assumed that 'f' doesn't yield. My debug implementation
	-- calls f during a __index metamethod.
	SafeCallNoYield = function(f, ...)
		xpcall(f, errorHandler, ...)
	end,

	-- This is what I use mostly. I only use it for "Async" Roblox APIs.
	FastSpawn = function(f, ...)--$inline_AssumeEllipsesSafe
		coroutine.wrap(f)(...)
	end,
}

This doesn’t seem ideal. Everything else is doable, but this is extremely tedious to fix for every case. If I disconnect a function from an event, I expect that it will no longer be called. This is likely to introduce rare edge cases in event-based code that otherwise works 99% of the time. My project uses 100% Lua-based signals for game logic, and I defer events until the end of the frame for a lot of things. When I disconnect an object (like a UI element or animation), I always make sure to remove anything it has added from the invocation queue.
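
One possible way to get that "disconnected means never called" guarantee in a custom deferred queue is to check the connection again at flush time instead of editing the queue. This is a sketch with hypothetical names (`connect`, `fireDeferred`, `flush`), not the implementation described above.

```lua
-- Hypothetical names throughout; a sketch, not the poster's implementation.
local queue = {}

local function connect(handlers, fn)
	local connection = { fn = fn, connected = true }
	function connection:Disconnect()
		self.connected = false
	end
	handlers[#handlers + 1] = connection
	return connection
end

-- Queue one pending call per connected handler instead of calling it now.
local function fireDeferred(handlers, value)
	for _, connection in ipairs(handlers) do
		if connection.connected then
			queue[#queue + 1] = { connection = connection, value = value }
		end
	end
end

local function flush()
	local pending = queue
	queue = {}
	for _, entry in ipairs(pending) do
		-- Skip calls whose connection was disconnected after queuing.
		if entry.connection.connected then
			entry.connection.fn(entry.value)
		end
	end
end

local calls = {}
local handlers = {}
local a = connect(handlers, function(v) calls[#calls + 1] = "a:" .. v end)
local b = connect(handlers, function(v) calls[#calls + 1] = "b:" .. v end)

fireDeferred(handlers, "x")
b:Disconnect() -- disconnected while its call is still queued
flush()
print(table.concat(calls, ",")) --> a:x
```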


I do it exactly like this at the end of my main RunService connections, except without table.remove for performance reasons.


Most experiences on Roblox are just hacked together and fall apart easily as the project grows. These are my recommendations for semi-advanced scripters planning to develop a huge project:

  1. Design everything to be disconnectable. A maid class makes this easy. Every event:Connect(...) should have a corresponding connection:Disconnect() (unless you destroy the signal’s instance and know what you’re doing.)
  2. Avoid yielding / spawning (wait, spawn, delay, WaitForChild, etc.) Roblox APIs are the exception, but I don’t like it.
  3. Create a custom Lua-based scheduler that is disconnectable. This is a solid alternative to wait. When a player leaves, you just disconnect their maid and you don’t need to worry about any threads or memory references hanging around due to wait or other yielding nonsense.
  4. Use table-based signals if you can. They are lightweight and more efficient than BindableEvents, but it’s important to watch out for edge cases where a method disconnects a second method that is yet to be called while the event is firing. Custom signals don’t even need to use coroutine.wrap when nothing yields! (see 2.)
  5. Use ModuleScripts with 1 Script and 1 LocalScript as entry points. This gives you complete control over when your code runs. Get WaitForChild and those pesky “this script ran before that one” edge cases out of your life.
  6. Don’t keep your guis in StarterGui. Create them programmatically or clone them from ReplicatedStorage. Otherwise you will need to WaitForChild for every single button and instance.
  7. The first thing you need to set up is server and client error reporting. After 5 years of full time development, my project is 150k lines of code and it’s easy to keep completely error-free because of this.
  8. Keep ReplicatedStorage as slim as possible. Store as much as you can in ServerStorage. I replicate instances (even ModuleScripts) privately to players by parenting it to their PlayerGui, firing a reference to them using a RemoteEvent, then immediately parenting it to nil. This can vastly improve join times, make your project many times more scalable, and use less memory.

I have developed dozens of buggy frameworks before this. These are the main reasons I’ve been able to work on this project for so long, and why it was easy for me to transition to deferred event handling.
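
For readers who haven't seen a maid before (recommendation 1), a minimal sketch could look like the following. The method names `Give` and `Clean` are assumptions made for this example and are not taken from any particular published Maid module.

```lua
-- Minimal hypothetical Maid; Give and Clean are made-up method names.
local Maid = {}
Maid.__index = Maid

function Maid.new()
	return setmetatable({ _items = {} }, Maid)
end

-- Accepts a cleanup function or anything with a :Disconnect() method.
function Maid:Give(item)
	self._items[#self._items + 1] = item
	return item
end

-- Runs every cleanup once, newest first, and forgets them.
function Maid:Clean()
	local items = self._items
	self._items = {}
	for i = #items, 1, -1 do
		local item = items[i]
		if type(item) == "function" then
			item()
		else
			item:Disconnect()
		end
	end
end

local log = {}
local maid = Maid.new()
maid:Give({ Disconnect = function() log[#log + 1] = "connection disconnected" end })
maid:Give(function() log[#log + 1] = "cleanup ran" end)
maid:Clean()
print(table.concat(log, ", ")) --> cleanup ran, connection disconnected
```

In a game, the table with `Disconnect` would typically be the connection object returned by `event:Connect(...)`, with one maid per player or per UI element.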

52 Likes

By the way, if you want table-based signal behavior that acts identically to Roblox signals with respect to what happens when you connect / disconnect handlers in the middle of the event firing, this is the code you want. Pretty efficient too because it doesn’t use any tables other than the signal object itself and the connection objects (no list of handlers / invalidation state is needed):

Table based event code
	function Connection:Disconnect()
		assert(self._connected, "Can't disconnect a connection twice.", 2)
		self._connected = false

		if self._signal._handlerListHead == self then
			self._signal._handlerListHead = self._next
		else
			local prev = self._signal._handlerListHead
			while prev and prev._next ~= self do
				prev = prev._next
			end
			if prev then
				prev._next = self._next
			end
		end
	end

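	function Signal:Connect(fn)
		local connection = setmetatable({
			-- Fire and Disconnect both check _connected, and Disconnect
			-- needs _signal to unlink itself, so both must be set here.
			_connected = true,
			_signal = self,
			_fn = fn,
			_next = self._handlerListHead,
		}, Connection)
		self._handlerListHead = connection
		return connection
	end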

	function Signal:Fire(...)
		local item = self._handlerListHead
		while item do
			if item._connected then
				-- Or spawn a coroutine depending on the behavior you want
				item._fn(...)
			end
			item = item._next
		end
	end
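
The excerpt above leaves out the `Signal` and `Connection` tables themselves. Assuming the usual metatable setup, a `Signal.new` constructor (hypothetical), and connections that record `_connected` and `_signal`, a condensed self-contained version with a small demo of disconnecting during a fire might look like this:

```lua
local Connection = {}
Connection.__index = Connection

local Signal = {}
Signal.__index = Signal

function Signal.new() -- hypothetical constructor, not shown in the excerpt
	return setmetatable({ _handlerListHead = false }, Signal)
end

function Connection:Disconnect()
	if not self._connected then return end
	self._connected = false
	local signal = self._signal
	if signal._handlerListHead == self then
		signal._handlerListHead = self._next
	else
		local prev = signal._handlerListHead
		while prev and prev._next ~= self do
			prev = prev._next
		end
		if prev then prev._next = self._next end
	end
end

function Signal:Connect(fn)
	local connection = setmetatable({
		_connected = true, _signal = self, _fn = fn,
		_next = self._handlerListHead, -- newest handler goes to the front
	}, Connection)
	self._handlerListHead = connection
	return connection
end

function Signal:Fire(...)
	local item = self._handlerListHead
	while item do
		if item._connected then item._fn(...) end
		item = item._next
	end
end

local log = {}
local signal = Signal.new()
local first = signal:Connect(function() log[#log + 1] = "first" end)
local second
second = signal:Connect(function()
	log[#log + 1] = "second"
	first:Disconnect()  -- remove a handler that has not run yet
	second:Disconnect() -- and remove ourselves while we are running
end)
signal:Connect(function() log[#log + 1] = "third" end)

signal:Fire()
signal:Fire()
print(table.concat(log, ", ")) --> third, second, third
```

Handlers run newest-first because each new connection is pushed onto the head of the list. A disconnected node keeps its own `_next` pointer, which is what lets the loop in `Fire` continue safely after a handler disconnects itself.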
34 Likes