FAST vector magnitude algorithm?

ndporter2 · February 12, 2025, 7:27pm

Ok, you have me there. I have only one question: why run your own equation to find the magnitude if the square root function runs slower than the magnitude function?

Would it not make sense to use the Magnitude variable if Roblox can calculate that variable faster than it could run the square root function?

Speaking from a game engine design standpoint, Roblox is designed to allow young new developers to learn the basics of game development. So it would make sense for them to optimize the heck out of the .magnitude feature, as it would be more likely for a young developer to grasp the understanding of what that value is, before they learn the math that goes on behind the scenes to find it. Thus I would assume that .magnitude would be lighter on a low end machine than running your own equation.

That is just my two cents though.

EmilyBendsSpace · February 14, 2025, 1:46am

Your test code has a flaw though, because you have Vector3.new() calls inside your loops, and those allocations are a significant part of what you’re measuring, and the same in each test case. Move the Vector3.new() call outside the loops, and only do the magnitude calls inside. Without the --!native directive, vector.magnitude should be more like 5 or 6 times faster. In server code though, using --!native makes them basically equal (and another 2-3 times faster than without).

This is not exactly the case. A lot of the Roblox API that presents as properties on the Lua side actually maps to getter and setter functions on the C++ side of the bridge. Magnitude is one of these; it calls the function that does the calculation every time you read the property, and it’s not cached by the Vector3 data type. Whether or not it’s cached at all is hardware dependent and not something you have no control over or can make any assumptions about.

SeargentAUS · February 14, 2025, 3:15am

For what I was testing (namely which one is faster), it doesn’t matter that the Vector3.new() calls are inside the loops, especially given that it is done in both test cases.

If I adjust it to determine the actual relative performance (which is not what I was initially testing), then yes, the speed is about 3.8x faster using vector.

Test code

local t0 = tick()
local v1 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- A
	local m = vector.magnitude(v1)
end
local t1 = tick()
local v2 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- B
	local m = v2.Magnitude
end

local t2 = tick()
local a = t1-t0
local b = t2-t1
print("Time for A:", a)
print("Time for B:", b)
warn(`A is {b/a}x times faster that B`)

NyrionDev · February 14, 2025, 3:11pm

This is my attempt of implementing a fast square root algorithm in luau:

--!native
local rs, ba = bit32.rshift, bit32.band
local bo, ls = bit32.bor, bit32.lshift
@native local function fastSqrt(n: number): number 
	--[the following mess calculates the exponent]
	local e, n1 = 0, n
	if n1 >= 65536 then e += 16 n1 = rs(n1, 16) end
	if n1 >= 256 then e += 8 n1 = rs(n1, 8) end
	if n1 >= 8 then e += 4 n1 = rs(n1, 4) end
	if n1 >= 4 then e += 2 n1 = rs(n1, 2) end
	if n1 >= 2 then e += 1 end
	--[end of mess]
	local u = 1597463007 - bo(e*4194304 + 532676608, 4194304 * (n / ls(1, e) - 1))
	local y = rs((1 + ba(u, 8388607) / 8388608)*65536, (-ba(u/8388608, 255) + 127))/65536
	return 0.5 * n * y * (3 - n * y * y)
end

local n = 6483295 --random example
print("Fast sqrt("..n..") = "..fastSqrt(n))
print("math.sqrt("..n..") = "..math.sqrt(n))

All that mess and it’s still 20 times slower than the default math.sqrt implementation…

My implementation also returns 0 for some inputs with unidentified properties, so you can consider it backdoored as well(it’s a feature).

C_Corpze · February 14, 2025, 6:24pm

I think there’s a flaw in this benchmark.

You’re supposed to measure the time after the first loop because everything is measured at the end now, when all the time is added up together if I’m not mistaken.

If vector.magnitude() is faster however then I’m pretty surprised as to why that is.

SeargentAUS · February 15, 2025, 7:08am

There are 3 points where the time is measured. One is before the first loop, one is after the first loop (and subsequently just before the second loop), and the third is after the second loop:

t0 and t1 are used to calculate the first time, and t1 and t2 are used to calculate the second time

EmilyBendsSpace · February 18, 2025, 6:46pm

But you were printing out A is X times faster than B ?!

It does matter though, because memory allocations are not just slow, they are also not guaranteed to be constant-time operations. Also, if the time to allocate a Vector3 turned out to be a lot longer on average than a call to either magnitude option, and the magnitude options were close to the same speed, you would not be able to discern which is faster with any confidence, the small difference would be lost in the variability of the slower memory operations. Even if you got the same result every time on your PC, you couldn’t say for sure that the result would be the same on a machine with less free memory, faster/slower memory, a different architecture, etc.

I guess what I’m trying to drive home here is that, even when you don’t need to quantify the difference, it’s still desirable when benchmarking to isolate what you’re testing as much as possible for the highest-confidence result.

C_Corpze · February 18, 2025, 8:13pm

Unfortunate that it’s slower because I see there’s so much going in in here.
Kind of a shame that we can’t do C+±level optimizations on Luau.

It’s a high level programming language after all but I’d love having the ability to extend Roblox engine code with Luau with memory management that compiles to machine code on every platform regardless of client or server.

BendsSpace · February 18, 2025, 8:49pm

When using distance cutoffs or comparing distances relatively, you can skip the square root.

For example, if you have a distance cutoff D and a <X, Y> vector you can do X^2 + Y^2 < D^2.

There is really no reason to optimize this though, as many people have said. This will basically never be the main bottleneck of any sufficiently complicated Luau code.

index_self · February 18, 2025, 10:50pm

To answer your question, yes, using vector.magnitude is the fastest method, especially when using raw vectors.

This is because the VM and Codegen can utilize SIMD platform instructions and they also don’t carry any userdata garbage from Roblox, meaning there’s nothing to garbage collect. The function is also fastcalled if I recall correctly.