Vectorized GetBoundingBox Function

Daw588 · May 23, 2025, 11:21pm

I’ve only seen zeux’s implementation which uses scalar math, so I decided to optimize it further by vectorizing it. If there are any optimizations I missed, let me know.

--!strict

local GetComponents = CFrame.identity.GetComponents
local RbxGet: (target: Instance, property: string) -> unknown

xpcall(function()
	return game[""]
end, function()
	RbxGet = debug.info(2, "f")
end)

local function GetBoundingBox(parts: {BasePart}): (vector, vector)
	local min = vector.create(math.huge, math.huge, math.huge)
	local max = vector.create(-math.huge, -math.huge, -math.huge)
	for _, part in parts do
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = GetComponents(RbxGet(part, "CFrame") :: CFrame)
		local size = RbxGet(part, "Size") :: vector
		local pos = vector.create(x, y, z)
		local v1 = vector.abs(vector.create(R00, R10, R20)) * size.x
		local v2 = vector.abs(vector.create(R01, R11, R21)) * size.y
		local v3 = vector.abs(vector.create(R02, R12, R22)) * size.z
		local ws = (v1 + v2 + v3) * 0.5
		min = vector.min(min, pos - ws)
		max = vector.max(max, pos + ws)
	end
	return (max + min) * 0.5, max - min
end

Benchmarks

There are 2 additional variants used to compare performance:

XAXA
Pyseph

Both were slightly modified to take parts as input and return two vectors.

Modified XAXA:

--!strict

local function GetBoundingBox(parts: {BasePart}): (Vector3, Vector3)
	local abs = math.abs
	local inf = math.huge
	local minx, miny, minz = inf, inf, inf
	local maxx, maxy, maxz = -inf, -inf, -inf
	for _, obj in parts do
		local cf = obj.CFrame
		local size = obj.Size
		local sx, sy, sz = size.X, size.Y, size.Z
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = cf:GetComponents()
		local wsx = 0.5 * (abs(R00) * sx + abs(R01) * sy + abs(R02) * sz)
		local wsy = 0.5 * (abs(R10) * sx + abs(R11) * sy + abs(R12) * sz)
		local wsz = 0.5 * (abs(R20) * sx + abs(R21) * sy + abs(R22) * sz)
		if minx > x - wsx then minx = x - wsx end
		if miny > y - wsy then miny = y - wsy end
		if minz > z - wsz then minz = z - wsz end
		if maxx < x + wsx then maxx = x + wsx end
		if maxy < y + wsy then maxy = y + wsy end
		if maxz < z + wsz then maxz = z + wsz end
	end
	local omin, omax = Vector3.new(minx, miny, minz), Vector3.new(maxx, maxy, maxz)
	return (omax+omin)/2, (omax-omin)
end

Modified Pyseph:

--!strict

local GetComponents = CFrame.identity.GetComponents
local inf = math.huge
local negInf = -inf

local function GetBoundingBox(parts: {BasePart}): (Vector3, Vector3)
	local minx, miny, minz = inf, inf, inf
	local maxx, maxy, maxz = negInf, negInf, negInf
	for _, part in next, parts do
		local cf = part.CFrame
		local size = part.Size
		local sx, sy, sz = size.X, size.Y, size.Z
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = GetComponents(cf)
		local wsx = 0.5 * (math.abs(R00) * sx + math.abs(R01) * sy + math.abs(R02) * sz)
		local wsy = 0.5 * (math.abs(R10) * sx + math.abs(R11) * sy + math.abs(R12) * sz)
		local wsz = 0.5 * (math.abs(R20) * sx + math.abs(R21) * sy + math.abs(R22) * sz)
		minx = minx > x - wsx and x - wsx or minx
		miny = miny > y - wsy and y - wsy or miny
		minz = minz > z - wsz and z - wsz or minz
		maxx = maxx < x + wsx and x + wsx or maxx
		maxy = maxy < y + wsy and y + wsy or maxy
		maxz = maxz < z + wsz and z + wsz or maxz
	end
	local omin, omax = Vector3.new(minx, miny, minz), Vector3.new(maxx, maxy, maxz)
	return (omax + omin) * 0.5, (omax - omin)
end

16 parts (R15)

o2

Variant	Throughput	Latency	vs. Baseline
XAXA	292,748 ops/sec	3.28 µs	1.00x
Pyseph	298,619 ops/sec	3.21 µs	1.02x
Daw	364,471 ops/sec	2.69 µs	1.25x

o2 + native

Variant	Throughput	Latency	vs. Baseline
XAXA	360,628 ops/sec	2.68 µs	1.00x
Pyseph	376,760 ops/sec	2.49 µs	1.04x
Daw	432,832 ops/sec	2.19 µs	1.20x

568 parts (Classic House)

o2

Variant	Throughput	Latency	vs. Baseline
XAXA	8,582 ops/sec	114.37 µs	1.00x
Pyseph	8,817 ops/sec	111.96 µs	1.03x
Daw	10,299 ops/sec	96.18 µs	1.20x

o2 + native

Variant	Throughput	Latency	vs. Baseline
XAXA	10,586 ops/sec	92.11 µs	1.00x
Pyseph	11,285 ops/sec	86.78 µs	1.07x
Daw	12,563 ops/sec	79.11 µs	1.19x

6780 parts (Roblox HQ Towers)

o2

Variant	Throughput	Latency	vs. Baseline
XAXA	709 ops/sec	1.40 ms	1.00x
Pyseph	729 ops/sec	1.36 ms	1.03x
Daw	842 ops/sec	1.18 ms	1.19x

o2 + native

Variant	Throughput	Latency	vs. Baseline
XAXA	865 ops/sec	1.15 ms	1.00x
Pyseph	920 ops/sec	1.07 ms	1.06x
Daw	1,007 ops/sec	993.70 µs	1.16x

13,521 parts (16 Doomspire towers)

o2

Variant	Throughput	Latency	vs. Baseline
XAXA	327 ops/sec	3.01 ms	1.00x
Pyseph	334 ops/sec	2.94 ms	1.02x
Daw	390 ops/sec	2.53 ms	1.19x

o2 + native

Variant	Throughput	Latency	vs. Baseline
XAXA	402 ops/sec	2.48 ms	1.00x
Pyseph	427 ops/sec	2.32 ms	1.06x
Daw	470 ops/sec	2.12 ms	1.17x

Unlike other implementations, my variant doesn’t provide orientation parameter and instead of a model, it takes list of parts to provide more control over what parts get included which is essential for me as I wanted to figure out a bounding box of a player character without their accessories. If you don’t need this level of control, you should use Model:GetBoundingBox as it is the fastest and most convenient.

athar_adv · May 24, 2025, 2:30am

I modified this to accept an array of bounding boxes instead, very cool and concise

matrixgeg · May 25, 2025, 8:42pm

Not a big optimization but i would replace for _, part in parts to for _, part in next, parts

ChatGGPT · May 27, 2025, 2:49pm

Bro nobody uses next and what would that change anyways?

matrixgeg · May 28, 2025, 7:01pm

The performance would change (30charlimit)

ChatGGPT · May 29, 2025, 1:42am

Code: (I used ChatGPT)

local parts = table.create(100000)
for i = 1, 100000 do
	parts[i] = Instance.new("Part")
end

local function benchmark(name, loopFunc)
	local startTime = os.clock()
	loopFunc()
	local duration = os.clock() - startTime
	print(string.format("%s took %.6f seconds", name, duration))
end

local function testNone()
	for _, part in parts do
		local a = part
	end
end

local function testNext()
	for _, part in next, parts do
		local a = part
	end
end

benchmark("none", testNone)
benchmark("next", testNext)

Result:

21:41:28.227  none took 0.000463 seconds  -  Edit
21:41:28.228  next took 0.000497 seconds  -  Edit

xChris_vC · May 29, 2025, 7:04am

Using next as micro-optimization is counter effective, Luau is already highly optimized for pairs and for loops.

The next() function proves itself useful if you need control over the iteration. It’s generally not a performance hack however.

Yarik_superpro · May 29, 2025, 8:02am

Yay people starting to use fast metamethod calls
Absolutelly

certified.

Yarik_superpro · May 29, 2025, 8:04am

Please read Performance - Luau

matrixgeg · May 29, 2025, 9:00am

Myyyyyy badddddddddd (30charlimit)

1Xayd1 · May 29, 2025, 5:04pm

what is this for?
charlimit123

Yarik_superpro · May 29, 2025, 5:17pm

Fast metamethod calls