FAST vector magnitude algorithm?

Ok, you have me there. I have only one question: why run your own equation to find the magnitude if the square root function runs slower than the magnitude function?

Would it not make sense to use the Magnitude variable if Roblox can calculate that variable faster than it could run the square root function?

Speaking from a game engine design standpoint, Roblox is designed to allow young new developers to learn the basics of game development. So it would make sense for them to optimize the heck out of the .magnitude feature, as it would be more likely for a young developer to grasp the understanding of what that value is, before they learn the math that goes on behind the scenes to find it. Thus I would assume that .magnitude would be lighter on a low end machine than running your own equation.

That is just my two cents though.

Your test code has a flaw though, because you have Vector3.new() calls inside your loops, and those allocations are a significant part of what youā€™re measuring, and the same in each test case. Move the Vector3.new() call outside the loops, and only do the magnitude calls inside. Without the --!native directive, vector.magnitude should be more like 5 or 6 times faster. In server code though, using --!native makes them basically equal (and another 2-3 times faster than without).

This is not exactly the case. A lot of the Roblox API that presents as properties on the Lua side actually maps to getter and setter functions on the C++ side of the bridge. Magnitude is one of these; it calls the function that does the calculation every time you read the property, and itā€™s not cached by the Vector3 data type. Whether or not itā€™s cached at all is hardware dependent and not something you have no control over or can make any assumptions about.

2 Likes

For what I was testing (namely which one is faster), it doesnā€™t matter that the Vector3.new() calls are inside the loops, especially given that it is done in both test cases.

If I adjust it to determine the actual relative performance (which is not what I was initially testing), then yes, the speed is about 3.8x faster using vector.

image

Test code
local t0 = tick()
local v1 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- A
	local m = vector.magnitude(v1)
end
local t1 = tick()
local v2 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- B
	local m = v2.Magnitude
end

local t2 = tick()
local a = t1-t0
local b = t2-t1
print("Time for A:", a)
print("Time for B:", b)
warn(`A is {b/a}x times faster that B`)

This is my attempt of implementing a fast square root algorithm in luau:

--!native
local rs, ba = bit32.rshift, bit32.band
local bo, ls = bit32.bor, bit32.lshift
@native local function fastSqrt(n: number): number 
	--[the following mess calculates the exponent]
	local e, n1 = 0, n
	if n1 >= 65536 then e += 16 n1 = rs(n1, 16) end
	if n1 >= 256 then e += 8 n1 = rs(n1, 8) end
	if n1 >= 8 then e += 4 n1 = rs(n1, 4) end
	if n1 >= 4 then e += 2 n1 = rs(n1, 2) end
	if n1 >= 2 then e += 1 end
	--[end of mess]
	local u = 1597463007 - bo(e*4194304 + 532676608, 4194304 * (n / ls(1, e) - 1))
	local y = rs((1 + ba(u, 8388607) / 8388608)*65536, (-ba(u/8388608, 255) + 127))/65536
	return 0.5 * n * y * (3 - n * y * y)
end

local n = 6483295 --random example
print("Fast sqrt("..n..") = "..fastSqrt(n))
print("math.sqrt("..n..") = "..math.sqrt(n))

All that mess and itā€™s still 20 times slower than the default math.sqrt implementationā€¦

My implementation also returns 0 for some inputs with unidentified properties, so you can consider it backdoored as well(itā€™s a feature).

1 Like

I think thereā€™s a flaw in this benchmark.

Youā€™re supposed to measure the time after the first loop because everything is measured at the end now, when all the time is added up together if Iā€™m not mistaken.

If vector.magnitude() is faster however then Iā€™m pretty surprised as to why that is.

There are 3 points where the time is measured. One is before the first loop, one is after the first loop (and subsequently just before the second loop), and the third is after the second loop:
image

t0 and t1 are used to calculate the first time, and t1 and t2 are used to calculate the second time

1 Like

But you were printing out A is X times faster than B ?!

It does matter though, because memory allocations are not just slow, they are also not guaranteed to be constant-time operations. Also, if the time to allocate a Vector3 turned out to be a lot longer on average than a call to either magnitude option, and the magnitude options were close to the same speed, you would not be able to discern which is faster with any confidence, the small difference would be lost in the variability of the slower memory operations. Even if you got the same result every time on your PC, you couldnā€™t say for sure that the result would be the same on a machine with less free memory, faster/slower memory, a different architecture, etc.

I guess what Iā€™m trying to drive home here is that, even when you donā€™t need to quantify the difference, itā€™s still desirable when benchmarking to isolate what youā€™re testing as much as possible for the highest-confidence result.

2 Likes

Unfortunate that itā€™s slower because I see thereā€™s so much going in in here.
Kind of a shame that we canā€™t do C+Ā±level optimizations on Luau.

Itā€™s a high level programming language after all but Iā€™d love having the ability to extend Roblox engine code with Luau with memory management that compiles to machine code on every platform regardless of client or server.

When using distance cutoffs or comparing distances relatively, you can skip the square root.

For example, if you have a distance cutoff D and a <X, Y> vector you can do X^2 + Y^2 < D^2.

There is really no reason to optimize this though, as many people have said. This will basically never be the main bottleneck of any sufficiently complicated Luau code.

1 Like

To answer your question, yes, using vector.magnitude is the fastest method, especially when using raw vectors.

This is because the VM and Codegen can utilize SIMD platform instructions and they also donā€™t carry any userdata garbage from Roblox, meaning thereā€™s nothing to garbage collect. The function is also fastcalled if I recall correctly.