Why is nearly every function in Roblox at least TWICE as slow as a manually written alternative?

Hello, guys.
I wanted to test the speed of Roblox functions like math.deg, math.rad, math.sqrt, lerping 2 values…
I compared them to my own alternatives, for example n * 0.017453292519943295 in place of math.rad(n).
Every time I did so, the results showed the Roblox built-in functions to be at least TWICE as slow as my alternative.
So I have two questions:

  1. WHY does this occur?
  2. Should I replace every Roblox function with my own where possible? (I think it is not possible for something like math.random.)

2 Likes

It shouldn’t really matter here. If you’re only using it a couple of times, or once every time an event fires, we are talking about a difference of only about a microsecond, which makes it an unnecessary change.

4 Likes

While this is acceptable in most situations, there are a lot of situations where you need to call such functions millions of times per second or more, like world generation.

3 Likes

It’s a bit slower because it’s a function call. Have you tried localizing individual functions from the math library? You may be able to speed them up by around 30%. Otherwise, Parallel Luau offers some performance benefits for raycasting and world generation by splitting the work between CPU cores more efficiently.

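local mathRad = math.rad -- cache the function in a local once
local radians = mathRad(30) -- call it through the local, with an argument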

The slowest one is probably math.pow(), but you can use the exponent (^) operator.

EDIT.

In most programming languages, operation speed goes approximately in this order, from fastest to slowest:

addition > subtraction > multiplication > division

sqrt > exp > sin and cos > tan > any arcus function

4 Likes

Hm. Localizing functions really does speed them up by about 20%. But they are still slower than writing the alternative directly in code.
Here is how I tested:

local function test(b)
	return b * 0.017453292519943295
end
local start = os.clock()
for i = 1, 1000000, 1 do
	local xd = test(30)
end
print(os.clock() - start) -- slowest

start = os.clock()
for i = 1, 1000000, 1 do
	local xd = 30 * 0.017453292519943295
end
print(os.clock() - start) -- fastest

start = os.clock()
for i = 1, 1000000, 1 do
	local xd = math.rad(30)
end
print(os.clock() - start) -- 3rd

local rad = math.rad
start = os.clock()
for i = 1, 1000000, 1 do
	local xd = rad(30)
end
print(os.clock() - start) -- 2nd

Output:

0.0326233191335632
0.008704013995156856
0.02040619624312967
0.017391963981935987

Also, regarding Parallel Luau:

This applies only if your servers hold a large number of players. With a server size of 1-19, your game gets 1 core. 20-29 = 2 cores, 30-49 = 3 cores. 100 = 5, 700 = 9…

3 Likes

It’s a negligible difference.

As people have said, it’s just the same thing plus indexing and calling a function from a global.

There is nothing wrong with this; think of it as taking more steps to get what you want.

1 Like

You’re competing against heavily optimized C functions and functions that are interpreted by Luau’s VM. You aren’t going to win. Roblox also applies specific optimizations to math calls under the hood.

Please also note that most of your test cases do nothing. Luau optimizes the math away before the test runs (see “Constant folding” on Wikipedia), so you’re basically benchmarking nothing at all.
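To make the constant folding point concrete, here is a minimal sketch of a benchmark that is harder for the compiler to optimize away. The names bench, N and DEG_TO_RAD are made up for this example and are not from the thread. The idea is to feed the loop variable into the computation and to actually use the result afterwards, so the work cannot be folded into a constant or discarded as dead code. Treat it as a sketch under those assumptions, not as a definitive benchmarking method.

```lua
local N = 1000000
local DEG_TO_RAD = math.pi / 180 -- hypothetical constant name for this sketch

-- Times fn once, prints the elapsed time and the checksum fn returned,
-- and hands the checksum back to the caller.
local function bench(label, fn)
	local start = os.clock()
	local sum = fn()
	local elapsed = os.clock() - start
	print(string.format("%-14s %.6f s (checksum %.3f)", label, elapsed, sum))
	return sum
end

-- The input changes every iteration and the result is accumulated,
-- so the multiplication has to be performed at run time.
local sumInline = bench("i * constant", function()
	local sum = 0
	for i = 1, N do
		sum = sum + i * DEG_TO_RAD
	end
	return sum
end)

local sumBuiltin = bench("math.rad(i)", function()
	local sum = 0
	for i = 1, N do
		sum = sum + math.rad(i)
	end
	return sum
end)
```

Printing the checksum also doubles as a sanity check: both variants should report practically the same number.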

3 Likes

So, if I want to do real testing, I need to use different values every time?

2 Likes

That’s a good point. The Lua VM also constantly calls into C. The operations themselves are undoubtedly faster in C, but the loss comes from the communication between the two and the resumption of the Lua code.

The same goes for multiplying. Multiplication in Lua is slower than pure multiplication in C, but still faster than calling a math/string/table lib. Localizing, at its core, saves some of the time it takes to index a global.

2 Likes

I’m pretty sure --!optimize 0 at the top of your code will disable all of Luau’s optimizations if you really want to test raw performance. However, if you’re trying to actually make math functions faster than Roblox’s built-in functions, you’re wasting your time. Luau’s compiler and VM are very smart when it comes to optimization (as are most languages).

3 Likes

but still faster than calling a math/string/table lib

This argument is valid in vanilla Lua, but Luau optimizations have turned doing this into a micro-optimization. Luau now has specific operations dedicated to handling ‘imports’ (chained global indexes like math.sin, etc.) and ‘fast calls’ (certain functions that are specifically identified by the compiler to be optimized and called in a faster way). Most (if not all) of the math library’s functions and some of the string library’s functions are optimized with this fast call behavior.
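As a rough illustration of the call forms being discussed, the sketch below reaches the same built-in in three ways. The comments describe my understanding of how Luau treats each form, which is an assumption rather than something confirmed in this thread; the helper callByName is hypothetical and exists only for the example. All three calls return the same value, so only the way the call is resolved differs.

```lua
-- 1. Global access chain. As I understand it, Luau resolves `math.sqrt`
--    as an import and can dispatch it as a fast call.
local a = math.sqrt(16)

-- 2. Local alias. The classic vanilla Lua trick of caching the function.
local sqrt = math.sqrt
local b = sqrt(16)

-- 3. Lookup through a key that is only known at run time. Here the compiler
--    cannot tell which function is meant (assumption: this is an ordinary
--    table index followed by an ordinary call).
local function callByName(name, x)
	return math[name](x)
end
local c = callByName("sqrt", 16)

print(a, b, c)
```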

2 Likes

Either I’m stupid or I don’t know what else could be going on, but even if I change 30 to i and put --!optimize 0 or --!optimize 2 at the top, I get the same result in terms of which one is faster.

--!optimize 2
local function test(b)
	return b * 0.017453292519943295
end
local start = os.clock()
for i = 1, 1000000, 1 do
	local xd = test(i)
end
print(os.clock() - start)
start = os.clock()
for i = 1, 1000000, 1 do
	local xd = i * 0.017453292519943295
end
print(os.clock() - start)
start = os.clock()
for i = 1, 1000000, 1 do
	local xd = math.rad(i)
end
print(os.clock() - start)
local rad = math.rad
start = os.clock()
for i = 1, 1000000, 1 do
	local xd = rad(i)
end
print(os.clock() - start)

1 Like

You have to think about the consistency of the results. If you execute this on Luau’s demo site multiple times, you will notice how the results vary every time you execute the benchmark:

0.0049999999999954525
0.004000000000019099
0.007999999999981355
0.007000000000005002
0.003999999999990678
0.003999999999990678
0.007000000000005002
0.007000000000005002
0.003999999999990678
0.004000000000019099
0.006000000000000227
0.008000000000009777

Those are three separate executions, and each yields different numbers. But they all have one thing in common: the results are basically the same. The first two tests are essentially identical (if test is inlined, which it most likely is), and the last two are identical since both are interpreted as a fast call to math.rad. The differences you are seeing are simply caused by other factors (CPU speed, other programs, etc.).
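One common way to reduce that run-to-run noise is to repeat each variant several times and keep the fastest time. Below is a minimal sketch of that idea; bestOf, RUNS, viaMultiply and viaMathRad are names invented for this example, and 7 runs is an arbitrary choice.

```lua
local N = 1000000
local RUNS = 7 -- arbitrary; more runs give a steadier minimum

-- Runs fn RUNS times and returns the fastest time plus the last result.
local function bestOf(fn)
	local best = math.huge
	local result
	for _ = 1, RUNS do
		local start = os.clock()
		result = fn()
		local elapsed = os.clock() - start
		if elapsed < best then
			best = elapsed
		end
	end
	return best, result
end

local function viaMultiply()
	local sum = 0
	for i = 1, N do
		sum = sum + i * 0.017453292519943295
	end
	return sum
end

local function viaMathRad()
	local sum = 0
	for i = 1, N do
		sum = sum + math.rad(i)
	end
	return sum
end

local tMul, sumMul = bestOf(viaMultiply)
local tRad, sumRad = bestOf(viaMathRad)
print(string.format("multiply: %.6f s", tMul))
print(string.format("math.rad: %.6f s", tRad))
```

The minimum is used here because background load can only make a run slower, never faster. A median over the runs would be another reasonable choice.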

1 Like

So, the first 2 are faster than the last 2, and this means that in some cases writing an alternative is better than using the built-in functions?

1 Like

In this case, yes. Since your function gets inlined to a single multiplication operation, it’s going to be faster than calling a math function. However, I’m not sure how Luau’s inlining behavior works and it could be unpredictable or limited in certain ways. You should be careful relying on this.

Note that function calls are one of the slowest operations in Lua. They have a lot of overhead which is partially eliminated by Luau’s fast call system. Personally, I would just stick to using the built-in math functions. There’s no reason to micro-optimize at this level unless you are doing some serious number crunching (and even then it might be negligible).

3 Likes

Sorry for the delayed reply, I was interrupted and couldn’t continue the live conversation.

A small paragraph about function inlining

Function inlining means that if a function is simple enough, the compiler can decide to insert its body directly into the code where the function is called. That comes at the cost of a tad more memory, but it eliminates the function call overhead (pushing to the stack and returning). Inlining is only considered if the function is called frequently and the estimated cost of inlining makes it beneficial.

In C and C++, there’s an inline keyword that can suggest inlining to the compiler.

In Luau, functions are considered for inlining as long as they’re not recursive and the environment has not been deoptimized by loadstring, getfenv, setfenv, etc.

Tl;dr: Luau does use function inlining (as of almost 2 years ago, I believe), while vanilla Lua doesn’t support it, at least up to version 5.4 (I haven’t checked later versions).
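To picture what inlining does, the sketch below writes the same loop twice: once calling a small local function, and once with the function body pasted in by hand. The second form is roughly what the compiler may produce from the first when it decides to inline. This is a conceptual illustration only; toRadians, sumWithCall and sumInlined are hypothetical names, and whether Luau actually inlines a given function depends on its own heuristics.

```lua
local DEG_TO_RAD = math.pi / 180

-- Small, non-recursive local function: a typical inlining candidate.
local function toRadians(degrees)
	return degrees * DEG_TO_RAD
end

-- As written: one call to toRadians per iteration.
local function sumWithCall(n)
	local sum = 0
	for i = 1, n do
		sum = sum + toRadians(i)
	end
	return sum
end

-- Hand-inlined: the body of toRadians replaces the call,
-- so no call overhead remains inside the loop.
local function sumInlined(n)
	local sum = 0
	for i = 1, n do
		sum = sum + i * DEG_TO_RAD
	end
	return sum
end

print(sumWithCall(1000), sumInlined(1000))
```

Both functions perform the same arithmetic in the same order, so the only thing that can differ is the time they take.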


Those results in the demo are pretty surprising and actually quite encouraging, @bmcqqq. Sadly, I haven’t been able to reproduce this in Studio, where a normal function call was always at least an order of magnitude slower.

@GamEditoPro when you’re working with heavy code it’s best to avoid too much abstraction (splitting code into functions) when possible anyway.


I finally found the section in the Luau documentation that I was looking for. Visiting the Performance page of the Luau documentation again also reinforced the fact that Luau is an amazing language!

  • The imports optimization for global access chains is meant to replace manual localizing, because localizing is easy to forget. bmcqqq already said this, and that’s probably why localizing doesn’t give you the significant improvement you’d see in vanilla Lua.

  • Fast call:

image

Which further confirms bmcqqq’s advice:

3 Likes