Ok, you have me there. I have only one question: why run your own equation to find the magnitude if the square root function runs slower than the magnitude function?
Would it not make sense to use the Magnitude variable if Roblox can calculate that variable faster than it could run the square root function?
Speaking from a game engine design standpoint, Roblox is designed to allow young new developers to learn the basics of game development. So it would make sense for them to optimize the heck out of the .magnitude feature, as it would be more likely for a young developer to grasp the understanding of what that value is, before they learn the math that goes on behind the scenes to find it. Thus I would assume that .magnitude would be lighter on a low end machine than running your own equation.
Your test code has a flaw though, because you have Vector3.new() calls inside your loops, and those allocations are a significant part of what youāre measuring, and the same in each test case. Move the Vector3.new() call outside the loops, and only do the magnitude calls inside. Without the --!native directive, vector.magnitude should be more like 5 or 6 times faster. In server code though, using --!native makes them basically equal (and another 2-3 times faster than without).
This is not exactly the case. A lot of the Roblox API that presents as properties on the Lua side actually maps to getter and setter functions on the C++ side of the bridge. Magnitude is one of these; it calls the function that does the calculation every time you read the property, and itās not cached by the Vector3 data type. Whether or not itās cached at all is hardware dependent and not something you have no control over or can make any assumptions about.
For what I was testing (namely which one is faster), it doesnāt matter that the Vector3.new() calls are inside the loops, especially given that it is done in both test cases.
If I adjust it to determine the actual relative performance (which is not what I was initially testing), then yes, the speed is about 3.8x faster using vector.
Test code
local t0 = tick()
local v1 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- A
local m = vector.magnitude(v1)
end
local t1 = tick()
local v2 = Vector3.new(5,5,5)
for i = 1, 100000000 do -- B
local m = v2.Magnitude
end
local t2 = tick()
local a = t1-t0
local b = t2-t1
print("Time for A:", a)
print("Time for B:", b)
warn(`A is {b/a}x times faster that B`)
This is my attempt of implementing a fast square root algorithm in luau:
--!native
local rs, ba = bit32.rshift, bit32.band
local bo, ls = bit32.bor, bit32.lshift
@native local function fastSqrt(n: number): number
--[the following mess calculates the exponent]
local e, n1 = 0, n
if n1 >= 65536 then e += 16 n1 = rs(n1, 16) end
if n1 >= 256 then e += 8 n1 = rs(n1, 8) end
if n1 >= 8 then e += 4 n1 = rs(n1, 4) end
if n1 >= 4 then e += 2 n1 = rs(n1, 2) end
if n1 >= 2 then e += 1 end
--[end of mess]
local u = 1597463007 - bo(e*4194304 + 532676608, 4194304 * (n / ls(1, e) - 1))
local y = rs((1 + ba(u, 8388607) / 8388608)*65536, (-ba(u/8388608, 255) + 127))/65536
return 0.5 * n * y * (3 - n * y * y)
end
local n = 6483295 --random example
print("Fast sqrt("..n..") = "..fastSqrt(n))
print("math.sqrt("..n..") = "..math.sqrt(n))
All that mess and itās still 20 times slower than the default math.sqrt implementationā¦
My implementation also returns 0 for some inputs with unidentified properties, so you can consider it backdoored as well(itās a feature).
Youāre supposed to measure the time after the first loop because everything is measured at the end now, when all the time is added up together if Iām not mistaken.
If vector.magnitude() is faster however then Iām pretty surprised as to why that is.
There are 3 points where the time is measured. One is before the first loop, one is after the first loop (and subsequently just before the second loop), and the third is after the second loop:
t0 and t1 are used to calculate the first time, and t1 and t2 are used to calculate the second time
But you were printing out A is X times faster than B ?!
It does matter though, because memory allocations are not just slow, they are also not guaranteed to be constant-time operations. Also, if the time to allocate a Vector3 turned out to be a lot longer on average than a call to either magnitude option, and the magnitude options were close to the same speed, you would not be able to discern which is faster with any confidence, the small difference would be lost in the variability of the slower memory operations. Even if you got the same result every time on your PC, you couldnāt say for sure that the result would be the same on a machine with less free memory, faster/slower memory, a different architecture, etc.
I guess what Iām trying to drive home here is that, even when you donāt need to quantify the difference, itās still desirable when benchmarking to isolate what youāre testing as much as possible for the highest-confidence result.
Unfortunate that itās slower because I see thereās so much going in in here.
Kind of a shame that we canāt do C+Ā±level optimizations on Luau.
Itās a high level programming language after all but Iād love having the ability to extend Roblox engine code with Luau with memory management that compiles to machine code on every platform regardless of client or server.
When using distance cutoffs or comparing distances relatively, you can skip the square root.
For example, if you have a distance cutoff D and a <X, Y> vector you can do X^2 + Y^2 < D^2.
There is really no reason to optimize this though, as many people have said. This will basically never be the main bottleneck of any sufficiently complicated Luau code.
To answer your question, yes, using vector.magnitude is the fastest method, especially when using raw vectors.
This is because the VM and Codegen can utilize SIMD platform instructions and they also donāt carry any userdata garbage from Roblox, meaning thereās nothing to garbage collect. The function is also fastcalled if I recall correctly.