ComputeModule - a module for optimizing long computation

Hello world!

When doing work that takes a significant amount of time to compute, the usual solution is to yield every so often to avoid causing lag. However, doing so leaves performance on the table.
Frametime
When yielding, the script is resumed on the next frame. Since most users run at a capped frame rate, usually 60 fps, the engine simply sleeps between the point where all the work for the frame is done and the point where 16.6 ms is reached (1/0.0166 ≈ 60 fps).
Thus, adding a yield to your code makes the computation take longer than it would without one. Leaving the yield out is bad practice, though, because of the lag spikes it can create.

Moreover, fine-tuning the point where yielding happens, to minimize the wasted time, doesn’t work for two reasons. Some devices will do the computation faster than others, and other scripts in your game already take up some of the 16.6 ms, an amount that is likely to vary.
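
To put rough numbers on the cost described above, here is a minimal sketch in plain Lua (also valid Luau). The figures are made up for illustration: 200 ms of total work, done in 2 ms chunks with one yield per chunk, at a 60 fps cap. `wallTimeWithYields` is a hypothetical helper, not part of ComputeModule.

```lua
-- Hypothetical helper: estimates how long a job takes when it yields once per chunk.
-- Each yield resumes on the next frame, so every chunk costs one full frame.
local function wallTimeWithYields(totalWorkMs, chunkMs, fps)
	local frameMs = 1000 / fps
	local frames = math.ceil(totalWorkMs / chunkMs)
	return frames * frameMs
end

local totalWorkMs = 200 -- assumed amount of pure computation
local chunkMs = 2       -- assumed work done between two yields
local estimate = wallTimeWithYields(totalWorkMs, chunkMs, 60)

print(string.format("%.0f ms of work takes about %.0f ms of wall time", totalWorkMs, estimate))
```

With these numbers the job needs 100 frames, so roughly 1.67 s passes for 200 ms of actual computation. The rest of each frame is spent waiting.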


How ComputeModule fixes that issue

ComputeModule keeps track of when the frame started and when it is supposed to end. With that information, it is able to fit the computation within that window, yielding right before the frame is supposed to end
By default, it detects the fps automatically on the client
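
The idea can be sketched in a few lines of plain Lua. This is only an illustration of the deadline check, not the module’s actual code: `FrameBudget`, `beginFrame` and `shouldYield` are hypothetical names, and the clock values are passed in by hand so the example runs anywhere.

```lua
-- Sketch only: the names below are illustrative, not ComputeModule's real internals.
local FrameBudget = {}
FrameBudget.__index = FrameBudget

function FrameBudget.new(targetFps)
	return setmetatable({ frameTime = 1 / targetFps, frameStart = 0 }, FrameBudget)
end

-- Call once at the start of every frame, passing the current clock value (for example os.clock()).
function FrameBudget:beginFrame(now)
	self.frameStart = now
end

-- True once the clock has reached the point where the frame is supposed to end.
function FrameBudget:shouldYield(now)
	return now - self.frameStart >= self.frameTime
end

local budget = FrameBudget.new(60)
budget:beginFrame(0)
print(budget:shouldYield(0.010)) -- 10 ms into the frame: still inside the budget
print(budget:shouldYield(0.017)) -- 17 ms into the frame: past the 16.6 ms mark
```

A work loop would call something like `shouldYield` between chunks and only give control back to the engine once it returns true.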

ComputeModuleExample

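-- Assumption: the module sits in ReplicatedStorage; adjust the path to wherever you placed it
local ComputeModule = require(game:GetService("ReplicatedStorage").ComputeModule)

local Signal = ComputeModule:BindFunction(function()
	local n = 0

	for i = 1, 10000 do
		-- A small chunk of work between two yield checks
		for j = 1, 1000 do
			n += 1
		end

		-- Keeps going if there is frame time left, otherwise yields to the next frame
		ComputeModule:EvaluateYield()
	end
end)

Signal:Wait() -- Computation was completed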

– Methods –

To use ComputeModule, require it, then call :BindFunction(Function), passing it the function that does the work
After doing some work, call :EvaluateYield(). This method evaluates whether there is still time available to keep computing, or whether it should yield until the next frame. It also allows other bound functions to run
:EvaluateYield() should be called at appropriate intervals, roughly every 0.01 ms to 0.2 ms of work. It doesn’t have to be precise, though

When calling :BindFunction(Function), a custom signal is returned (the signal only has :Wait() and :Once()). That signal is fired when the function has finished all its work, or has thrown an error. No arguments are passed by this signal
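
For readers wondering what a signal with only :Wait() and :Once() might look like, here is a minimal sketch built on plain coroutines. It is not the module’s implementation: the `Signal` table, its fields and the `Fire` method are assumptions made for illustration, and a Roblox version would more likely resume threads through the task library.

```lua
-- Sketch only: a tiny signal with :Once() and :Wait(), built on plain coroutines.
local Signal = {}
Signal.__index = Signal

function Signal.new()
	return setmetatable({ callbacks = {}, waiting = {} }, Signal)
end

-- Runs the callback the next time the signal fires, then forgets it.
function Signal:Once(callback)
	table.insert(self.callbacks, callback)
end

-- Suspends the calling coroutine until the signal fires.
function Signal:Wait()
	table.insert(self.waiting, coroutine.running())
	coroutine.yield()
end

-- Fires the signal with no arguments, matching the behaviour described above.
function Signal:Fire()
	local callbacks, waiting = self.callbacks, self.waiting
	self.callbacks, self.waiting = {}, {}
	for _, callback in ipairs(callbacks) do
		callback()
	end
	for _, thread in ipairs(waiting) do
		coroutine.resume(thread)
	end
end

local done = Signal.new()
local log = {}

done:Once(function()
	table.insert(log, "once")
end)

local waiter = coroutine.wrap(function()
	done:Wait()
	table.insert(log, "wait resumed")
end)
waiter() -- runs until :Wait() suspends it

done:Fire()
print(table.concat(log, ", "))
```

Because listeners are cleared on every fire, each :Once() callback and each :Wait() caller is notified exactly once, which is all a "computation finished" signal needs.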

You can also use :UpdateTargetFPS(TargetFPS) to set the target fps the module will follow. Default is 60 on the server, and Automatic on the client

Since the module calculates the fps for the automatic setting, there is a :GetFPS() method, free of charge :D

– Settings –

DEBUG_ENABLED - Prints information into the output
1 - The amount of time spent not doing actual work (the overhead), in milliseconds, and the % of the total time it represents
2 - The number of threads that were resumed. More threads resumed = more overhead. The same threads are resumed many times per frame
3 - The number of active threads at the end of the frame

TIME_BUFFER - How much time to reserve as a buffer. Since there are no events right at the beginning or right at the end of a frame, some engine work still runs after the compute module (a task named Thread, whose purpose I don’t know, and Data Sender, on some frames)
This setting might change in the future so that it is not a fixed time

MINIMUM_WORK_TIME - The minimum amount of time the module will reserve to let bound functions do work. This is to ensure the work gets done, even if that means causing lag

MINIMUM_CYCLES - Serves a similar purpose to MINIMUM_WORK_TIME, but instead of a minimum time, it is the minimum number of cycles the module will do. A cycle is every bound function running once (1 call to :EvaluateYield())

TARGET_FPS - Fps the module will follow. Default is 60 on the server, and Automatic on the client. Can be modified at runtime with :UpdateTargetFPS(TargetFps)
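
As a rough illustration of how these settings could interact, the sketch below combines TARGET_FPS, TIME_BUFFER and MINIMUM_WORK_TIME into a single per-frame deadline. The formula and the `workDeadline` helper are assumptions, not taken from the module’s source, and the MINIMUM_WORK_TIME value is made up. MINIMUM_CYCLES is left out to keep the example short.

```lua
-- Sketch only: one possible way the settings could combine into a per-frame deadline.
local TARGET_FPS = 60
local TIME_BUFFER = 0.0035       -- 3.5 ms, the value quoted from the module later in this thread
local MINIMUM_WORK_TIME = 0.002  -- hypothetical value

-- frameStart: when the frame began; now: when the module gets to run (both in seconds).
local function workDeadline(frameStart, now)
	-- Where work should stop so the engine still has room after the module
	local idealEnd = frameStart + 1 / TARGET_FPS - TIME_BUFFER
	-- Where work is allowed to continue to regardless, so progress is always made
	local guaranteedEnd = now + MINIMUM_WORK_TIME
	return math.max(idealEnd, guaranteedEnd)
end

print(workDeadline(0, 0.004)) -- light frame: the buffer decides the deadline
print(workDeadline(0, 0.015)) -- heavy frame: the minimum work time decides it
```

On a light frame the deadline is the frame end minus the buffer. On a heavy frame, where other scripts already used most of the 16.6 ms, the minimum work time wins and the frame runs long, which is the "even if that means causing lag" trade-off described for MINIMUM_WORK_TIME.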

– Files –

Creator Hub
Compute Module.rbxl (122.4 KB) includes the code I have used to test it

Updates

([DD/MM/YYYY], screw you America)

[05/07/2024] - Added a custom signal implementation, fixed grammatical mistakes and renamed stuff from ComputationModule to ComputeModule


The module goes anywhere you want

If you have some code bound to RunService.PostSimulation, you might want to require the module from a client script and/or a server script (with RunContext set to Server) located in ReplicatedFirst. This will ensure that the module’s connection to RunService.PostSimulation is fired after the connections in your own code


If anyone has questions about how it works or whatever, I’ll gladly answer them
If the module doesn’t work properly on your device (such as not being able to get high fps when TARGET_FPS is automatic, or something else), please tell me

Stupid things

There is no event at the beginning of a frame and at the end of a frame, which causes some issues with the module. It’s stupid

There is no way to read the fps setting from the menu. Because of that, I had to calculate the fps manually and set the TargetFps a bit higher than the measured fps, to prevent the module from restricting the fps. If the frametime is higher than expected, it will instead tank the fps down to 60
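
A minimal sketch of that workaround, under stated assumptions: the 5% headroom is a made-up figure (the post only says "a bit higher"), and `automaticTargetFps` is a hypothetical helper rather than a function of the module.

```lua
-- Sketch only: pick a target slightly above the measured fps, never below 60.
local MINIMUM_FPS = 60
local HEADROOM = 1.05 -- hypothetical: aim 5% above the measured fps

local function automaticTargetFps(measuredFps)
	return math.max(MINIMUM_FPS, measuredFps * HEADROOM)
end

print(automaticTargetFps(144)) -- high refresh rate: target sits a little above 144
print(automaticTargetFps(40))  -- struggling device: target falls back to 60
```

Aiming slightly above the measurement means the module’s own estimate never becomes the thing that caps the frame rate, while the floor keeps a slow frame from dragging the target below 60.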

28 Likes

This is exactly the module I need. I’ve been working with computationally expensive tasks in my experience and have just had to suffer with massive lag spikes, since adding waits causes them to fire at a bad time and makes the operation take ages. Thanks for making this!

It appears you may have accidentally forgotten to make the module public on Creator Hub. :sweat_smile:

5 Likes

I’ve seen that happen recently on other threads, and now it’s my turn lol. Though, I believe assets were public by default before, is that right?

5 Likes

now I can make rtx ray tracing using frames!

4 Likes

This is actually one of the best resources I’ve ever seen, if it works exactly like you claim it does.
But, I have a few suggestions:

Suggestions

Now I know that requesting API changes especially if it involves namespace migration is ridiculous, but I think that :BindFunction() could be renamed to :Compute() which more accurately represents what the function is doing.

I also personally would not rely on BindableEvents for high precision code. A callback implementation would be a lot more reasonable in my opinion. (Another reason for this is that, realistically, you’re only adding 1 event listener to that signal, so it’s definitely a waste of resources.)

Another thing is, I don’t know how you calculate the device FPS, but if you’re doing that every time :EvaluateYield() is called, that’s a huge problem. Calculating FPS also yields by 1/$FPS seconds, so if that’s how you implemented it you should definitely change that.

Btw, you spelled “yield” wrong in your entire post, so you might want to fix that lol.

5 Likes

Are you spelling yield wrong intentionally? But anyways, cool resource! I will check this out.

4 Likes

I personally don’t like callbacks. It is also easier to turn a yield into a callback than a callback into a yield, and Roblox practically never uses callbacks. I do agree that a BindableEvent is not ideal, though I can’t imagine the performance impact is significant. I am thinking about either making :BindFunction() a yielding function, or making it return a custom signal that has only :Wait() and :Once()

It is calculated every frame, using the last 24 frames. It uses a mix of a median and an average. While it runs every frame, I measured the performance impact and it’s fairly insignificant

For god’s sake
I never realized I wrote it wrong. Now it’s ingrained in my muscle memory, and I’ll probably keep writing it wrong if I don’t catch it lol. Same thing with length, I write it as lenght…

4 Likes

At least use a custom ScriptSignal module. BindableEvents are incompetent and also, their behavior may differ from game to game since you can change the SignalBehavior property inside the workspace object.

Another way you can do this is just returning a custom object with functions :Await() and :Once() as you said, should be relatively simple to implement.

4 Likes

if an expensive loop is using actors, does it defeat the purpose of this module?

2 Likes

It doesn’t work out of the box, but it does seem to be possible

And it seems to be working as expected

I will probably do an update to officially implement this in the module, but if you want to use it right now, the changes are pretty simple

You’ll have to change RunService:BindToRenderStep() to this. For some reason BindToRenderStep breaks when used from an actor
image

You’ll also have to use ConnectParallel on the PostSimulation event. This is what makes it, and the bound functions, run in parallel

As for the bound functions, since what calls them runs in parallel, they run in parallel by default. So, there aren’t any changes to be made there

A big issue is that if the module is running both in serial and in parallel, the serial instance will take up most of the frametime, leaving little to no time for the parallel instance to run. This could maybe be fixed by detecting when it is running in both serial and parallel, and making each instance take half the frametime instead of all of it

I have a module for running code in parallel, and it would be nice if I could combine the two. Though I think, having the parallel modules call ComputeModule would be the best solution, although each actor requiring the module would lead to the fps calculations and other stuff running for every actor, which isn’t ideal

5 Likes

thank you for the quick and helpful response

5 Likes

Sorry to bump, but I made a little discovery that could improve the code with this module, specifically regarding the CurrentFps, and FrameEndTime calculations.

Currently, a list of 1 / deltaTime (from RenderStepped) is stored, and the average is calculated for the CurrentFps. They are averaged because deltaTime jumps around a lot and isn’t accurate using a single frame.

I found that 1 / game.Stats.FrameTime perfectly returns the FPS since it is used in the built-in FPS viewer you can enable by pressing Shift + F5 or by pressing View > Stats > Summary in studio.

image

Here’s the same thing but using 1 / RenderStepped:Wait() to get the deltaTime instead. It’s much more sporadic and not as usable.

And here’s ComputeModule:GetFPS(). I noticed it was a bit off, and even jumped to 300 FPS for some reason…? That happened when I unfocused the window.


Using 1 / game.Stats.FrameTime seems to be the best option, as it’s the most accurate and exactly matches the built-in FPS viewer. If the CurrentFps is just replaced with this, ComputeModule should be a bit more reliable! ComputeModule:GetFPS() will also be more accurate!

Fun fact: There’s also a RenderCPUFrameTime and RenderGPUFrameTime value in the Stats service!

I also noticed this comment in the module:

I wonder what it means? Is this the optimal time leftover to actually render in the frame time? I was able to calculate a close percentage with code like this:

local LogService = game:GetService("LogService")
local Stats = game:GetService("Stats")

for i = 1, 400 do
	LogService:ClearOutput()

	-- Both properties are frame times, not frame rates
	local cpuFrameTime = Stats.RenderCPUFrameTime
	local frameTime = Stats.FrameTime
	local diff = cpuFrameTime - frameTime

	print("CPU FPS", 1 / cpuFrameTime)
	print("FPS:", 1 / frameTime)
	-- Difference as a percentage of the frame time, rounded down to one decimal
	warn("Ratio:", `{math.floor(diff / frameTime * 1000) / 10}%`)

	task.wait(1/60)
end

image

This ratio fluctuates a lot though, and it went negative at one time. Since diff is the CPU time minus the frame time, a negative value means the CPU time was lower than the frame time.


Have fun with this information! Let me know if you have any questions.

I’m @bluebxrrybot! :3

1 Like

Interesting

game.Stats.FrameTime would definitely be very useful if it is stable, rather than doing my own fps calculation. However, looking at the documentation, it is only available on the client, but there is HeartbeatTime as well in the Stats service

My fps calculation system works by doing an average of delta times (calculated as a difference of os.clock() from the last frame to the current frame, and not from dt returned by RunService, don’t ask me why…), but it’s not a typical average. The table of previous frametimes is sorted, and only the middle of the array is averaged and used for the fps value. The idea is that the extreme values (low or high frametimes) will go to either the beginning of the array or the end, and I wanted to factor them out
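
The sorted-middle average described above can be sketched as follows in plain Lua. This is an illustration, not the module’s code: `estimateFps` is a hypothetical name, and keeping the middle half of the samples is an assumption, since the post does not say how much of the array is kept. The 24-sample window matches the figure mentioned earlier in the thread.

```lua
-- Sketch only: averages the middle of the sorted frame times and ignores the extremes.
-- Keeping the middle half is an assumption; the post does not say how much is kept.
local function estimateFps(frameTimes)
	-- Copy before sorting so the caller's history keeps its chronological order
	local sorted = {}
	for i, dt in ipairs(frameTimes) do
		sorted[i] = dt
	end
	table.sort(sorted)

	local count = #sorted
	local first = math.floor(count / 4) + 1
	local last = count - math.floor(count / 4)

	local sum = 0
	for i = first, last do
		sum = sum + sorted[i]
	end
	return 1 / (sum / (last - first + 1))
end

-- 24 samples: mostly 1/60 s frames, plus two spikes and one unusually short frame.
local samples = {}
for i = 1, 24 do
	samples[i] = 1 / 60
end
samples[3] = 0.050
samples[10] = 0.120
samples[17] = 0.002

print(estimateFps(samples))
```

The three outliers land at the ends of the sorted array and are dropped, so the estimate stays at about 60 fps. A plain mean of the same samples would report about 46 fps.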

This “buffer” (actually a 10% buffer I think, because I did not update my comment lol) is because the last event of the frame, RunService.PostSimulation, is not the last point of the frame. The roblox engine still uses up a decent portion of cpu time after that, and I have no way of knowing how much that is. ComputeModule is made to keep computing until the target end time of the frame is reached, but I have to leave some room for those roblox tasks to happen after
I do that already here, but um, decided to add more lol

-- This is for (what I think is) Data Replication that happens on some frames, and a "Thread" thread that happens on every frame. Idk what that thread is
-- These happen after the heartbeat event, the last event fired. There is no actual way to run the computation after these threads that I know of
local TIME_BUFFER : number = 0.0035 -- 3.5 ms

I also wanted to avoid at all costs a case where ComputeModule would overestimate the end time of the frame, leading to a lower fps, then another overestimation, then an even lower fps, again and again, which would cause the fps to tank to the 60 fps minimum I’ve set

However, thinking about it, I could probably calculate the target fps while removing the time ComputeModule itself is taking, factoring it out of the fps calculations


This module is one I want to revisit for sure. There is a lot of potential to optimize code using Parallel Luau along with this module, but doing so will be challenging, as it would have to manage multiple instances of the module running under different actors, and communication between actors is costly for performance

Perhaps task.defer() could allow the engine to resume other ComputeModules within one frame without having a complicated cross actor communication system between each ComputeModule instance. That’s an idea likely to fail, but worth checking

I don’t have the time for it though, god I wish I was young again lol (not that I’m old…)

2 Likes