Vectorized GetBoundingBox Function

I’ve only seen zeux’s implementation which uses scalar math, so I decided to optimize it further by vectorizing it. If there are any optimizations I missed, let me know.

--!strict

local GetComponents = CFrame.identity.GetComponents
local RbxGet: (target: Instance, property: string) -> unknown

xpcall(function()
	return game[""]
end, function()
	RbxGet = debug.info(2, "f")
end)

local function GetBoundingBox(parts: {BasePart}): (vector, vector)
	local min = vector.create(math.huge, math.huge, math.huge)
	local max = vector.create(-math.huge, -math.huge, -math.huge)
	for _, part in parts do
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = GetComponents(RbxGet(part, "CFrame") :: CFrame)
		local size = RbxGet(part, "Size") :: vector
		local pos = vector.create(x, y, z)
		local v1 = vector.abs(vector.create(R00, R10, R20)) * size.x
		local v2 = vector.abs(vector.create(R01, R11, R21)) * size.y
		local v3 = vector.abs(vector.create(R02, R12, R22)) * size.z
		local ws = (v1 + v2 + v3) * 0.5
		min = vector.min(min, pos - ws)
		max = vector.max(max, pos + ws)
	end
	return (max + min) * 0.5, max - min
end
Benchmarks

There are 2 additional variants used to compare performance:

Both were slightly modified to take parts as input and return two vectors.

Modified XAXA:

--!strict

local function GetBoundingBox(parts: {BasePart}): (Vector3, Vector3)
	local abs = math.abs
	local inf = math.huge
	local minx, miny, minz = inf, inf, inf
	local maxx, maxy, maxz = -inf, -inf, -inf
	for _, obj in parts do
		local cf = obj.CFrame
		local size = obj.Size
		local sx, sy, sz = size.X, size.Y, size.Z
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = cf:GetComponents()
		local wsx = 0.5 * (abs(R00) * sx + abs(R01) * sy + abs(R02) * sz)
		local wsy = 0.5 * (abs(R10) * sx + abs(R11) * sy + abs(R12) * sz)
		local wsz = 0.5 * (abs(R20) * sx + abs(R21) * sy + abs(R22) * sz)
		if minx > x - wsx then minx = x - wsx end
		if miny > y - wsy then miny = y - wsy end
		if minz > z - wsz then minz = z - wsz end
		if maxx < x + wsx then maxx = x + wsx end
		if maxy < y + wsy then maxy = y + wsy end
		if maxz < z + wsz then maxz = z + wsz end
	end
	local omin, omax = Vector3.new(minx, miny, minz), Vector3.new(maxx, maxy, maxz)
	return (omax+omin)/2, (omax-omin)
end

Modified Pyseph:

--!strict

local GetComponents = CFrame.identity.GetComponents
local inf = math.huge
local negInf = -inf

local function GetBoundingBox(parts: {BasePart}): (Vector3, Vector3)
	local minx, miny, minz = inf, inf, inf
	local maxx, maxy, maxz = negInf, negInf, negInf
	for _, part in next, parts do
		local cf = part.CFrame
		local size = part.Size
		local sx, sy, sz = size.X, size.Y, size.Z
		local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = GetComponents(cf)
		local wsx = 0.5 * (math.abs(R00) * sx + math.abs(R01) * sy + math.abs(R02) * sz)
		local wsy = 0.5 * (math.abs(R10) * sx + math.abs(R11) * sy + math.abs(R12) * sz)
		local wsz = 0.5 * (math.abs(R20) * sx + math.abs(R21) * sy + math.abs(R22) * sz)
		minx = minx > x - wsx and x - wsx or minx
		miny = miny > y - wsy and y - wsy or miny
		minz = minz > z - wsz and z - wsz or minz
		maxx = maxx < x + wsx and x + wsx or maxx
		maxy = maxy < y + wsy and y + wsy or maxy
		maxz = maxz < z + wsz and z + wsz or maxz
	end
	local omin, omax = Vector3.new(minx, miny, minz), Vector3.new(maxx, maxy, maxz)
	return (omax + omin) * 0.5, (omax - omin)
end

16 parts (R15)

o2

Variant Throughput Latency vs. Baseline
XAXA 292,748 ops/sec 3.28 µs 1.00x
Pyseph 298,619 ops/sec 3.21 µs 1.02x
Daw 364,471 ops/sec 2.69 µs 1.25x

o2 + native

Variant Throughput Latency vs. Baseline
XAXA 360,628 ops/sec 2.68 µs 1.00x
Pyseph 376,760 ops/sec 2.49 µs 1.04x
Daw 432,832 ops/sec 2.19 µs 1.20x

568 parts (Classic House)

o2

Variant Throughput Latency vs. Baseline
XAXA 8,582 ops/sec 114.37 µs 1.00x
Pyseph 8,817 ops/sec 111.96 µs 1.03x
Daw 10,299 ops/sec 96.18 µs 1.20x

o2 + native

Variant Throughput Latency vs. Baseline
XAXA 10,586 ops/sec 92.11 µs 1.00x
Pyseph 11,285 ops/sec 86.78 µs 1.07x
Daw 12,563 ops/sec 79.11 µs 1.19x

6780 parts (Roblox HQ Towers)

o2

Variant Throughput Latency vs. Baseline
XAXA 709 ops/sec 1.40 ms 1.00x
Pyseph 729 ops/sec 1.36 ms 1.03x
Daw 842 ops/sec 1.18 ms 1.19x

o2 + native

Variant Throughput Latency vs. Baseline
XAXA 865 ops/sec 1.15 ms 1.00x
Pyseph 920 ops/sec 1.07 ms 1.06x
Daw 1,007 ops/sec 993.70 µs 1.16x

13,521 parts (16 Doomspire towers)

o2

Variant Throughput Latency vs. Baseline
XAXA 327 ops/sec 3.01 ms 1.00x
Pyseph 334 ops/sec 2.94 ms 1.02x
Daw 390 ops/sec 2.53 ms 1.19x

o2 + native

Variant Throughput Latency vs. Baseline
XAXA 402 ops/sec 2.48 ms 1.00x
Pyseph 427 ops/sec 2.32 ms 1.06x
Daw 470 ops/sec 2.12 ms 1.17x

Unlike other implementations, my variant doesn’t provide orientation parameter and instead of a model, it takes list of parts to provide more control over what parts get included which is essential for me as I wanted to figure out a bounding box of a player character without their accessories. If you don’t need this level of control, you should use Model:GetBoundingBox as it is the fastest and most convenient.

12 Likes

I modified this to accept an array of bounding boxes instead, very cool and concise :+1:

2 Likes

Not a big optimization but i would replace for _, part in parts to for _, part in next, parts

Bro nobody uses next and what would that change anyways?

The performance would change (30charlimit)

Code: (I used ChatGPT)

local parts = table.create(100000)
for i = 1, 100000 do
	parts[i] = Instance.new("Part")
end

local function benchmark(name, loopFunc)
	local startTime = os.clock()
	loopFunc()
	local duration = os.clock() - startTime
	print(string.format("%s took %.6f seconds", name, duration))
end

local function testNone()
	for _, part in parts do
		local a = part
	end
end

local function testNext()
	for _, part in next, parts do
		local a = part
	end
end

benchmark("none", testNone)
benchmark("next", testNext)

Result:

21:41:28.227  none took 0.000463 seconds  -  Edit
21:41:28.228  next took 0.000497 seconds  -  Edit

Using next as micro-optimization is counter effective, Luau is already highly optimized for pairs and for loops.

The next() function proves itself useful if you need control over the iteration. It’s generally not a performance hack however.

Yay people starting to use fast metamethod calls :muscle:
Absolutelly

certified.

Please read Performance - Luau

Myyyyyy badddddddddd :sob: :sob: :sob: (30charlimit)

what is this for?
charlimit123

Fast metamethod calls