Optimizing matrix library

So I did some benchmarking and found that my matrix library, which I'd estimate gets called a couple hundred thousand times per frame, is killing my fps. I was wondering if anybody could see any room for improvement, even if it is tiny.

local matrix = {}
matrix.__index = matrix

--[[compData is a 4x4 two-dimensional array]]--
function matrix.new(inheritedFromVector, compData)
	if not compData and not inheritedFromVector then
		compData = {
			{1,0,0,0};
			{0,1,0,0};
			{0,0,1,0};
			{0,0,0,1}
		}
	end
	if inheritedFromVector then
		if typeof(compData) == "Vector3" then
			compData = {
				{compData.X,0,0,0};
				{compData.Y,0,0,0};
				{compData.Z,0,0,0};
				{1,0,0,0}
			}
		else
			compData = {
				{0,0,0,0};
				{0,0,0,0};
				{0,0,0,0};
				{1,0,0,0}
			}
		end
	end
	return setmetatable(compData, matrix)
end

--[[convert the homogeneous coordinate vector to a 3-dimensional Roblox vector]]--
function matrix:m2v3()
	return Vector3.new(self[1][1]/self[4][1], self[2][1]/self[4][1], self[3][1]/self[4][1])
end

--[[create a rotation matrix; axis is the string "X", "Y", or "Z"]]--
function matrix:rotate(axis, step)
	if axis == "X" then
		local mtx = {
			{1, 0, 0, 0};
			{0, math.cos(step), -math.sin(step), 0};
			{0, math.sin(step), math.cos(step), 0};
			{0, 0, 0, 1}
		}
		return matrix.new(false, mtx)
	end
	if axis == "Y" then
		local mtx = {
			{math.cos(step), 0, math.sin(step), 0};
			{0, 1, 0, 0};
			{-math.sin(step), 0, math.cos(step), 0};
			{0, 0, 0, 1}
		}
		return matrix.new(false, mtx)
	end
	if axis == "Z" then
		local mtx = {
			{math.cos(step), -math.sin(step), 0, 0};
			{math.sin(step), math.cos(step), 0, 0};
			{0, 0, 1, 0};
			{0, 0, 0, 1}
		}
		return matrix.new(false, mtx)
	end
end

--[[create a perspective matrix, n is near clipping plane and f is far clipping plane]]--
function matrix:perspective(fov,n,f)
	--https://www.scratchapixel.com/lessons/3d-basic-rendering/perspective-and-orthographic-projection-matrix/building-basic-perspective-projection-matrix
	local s = 1/math.tan(fov*0.00872664625998) -- 0.00872664625998 is pi/360, so this is 1/tan(fov/2) with fov in degrees
	local mtx = {
		{s,0,0,0};
		{0,s,0,0};
		{0,0,-f/(f-n),-1};
		{0,0,-f*n/(f-n), 0}
	}
	return matrix.new(false, mtx)
end

--[[x,y,z are the scale factors for each axis]]--
function matrix:scale(x,y,z)
	local mtx = {
		{x,0,0,0};
		{0,y,0,0};
		{0,0,z,0};
		{0,0,0,1}
	}
	return matrix.new(false, mtx)
end

--[[lookAt matrix; parameters are Roblox Vector3s]]--
function matrix:lookAt(eye, target)
	local forward = (eye-target).Unit
	local right = Vector3.new(0,1,0):Cross(forward)
	local up = forward:Cross(right)
	local coordSpace = matrix.new(false, {
		{right.X, right.Y, right.Z, 0};
		{up.X, up.Y, up.Z, 0};
		{forward.X, forward.Y, forward.Z, 0};
		{0,0,0,1}
	})
	local invertedCameraPosition = matrix.new(false, {
		{1,0,0,-eye.X};
		{0,1,0,-eye.Y};
		{0,0,1,-eye.Z};
		{0,0,0,1}
	})
	return coordSpace*invertedCameraPosition
end

--[[perform matrix multiplication]]--
matrix.__mul = function(m1, m2)
	local mtx = {}
	for i = 1,#m1 do
		mtx[i] = {}
		for j = 1,#m2[1] do
			local num = m1[i][1] * m2[1][j]
			for n = 2,#m1[1] do
				num = num + m1[i][n] * m2[n][j]
			end
			mtx[i][j] = num
		end
	end
	return matrix.new(false, mtx)
end

return matrix


The code contains some abstractions which are good for ease of use, but they unfortunately come with a performance penalty. The nested inner tables add extra allocation and lookup costs, and you can get some of that back by switching to a flat 1D array with custom indexing math like [4 * (a - 1) + b] instead of [a][b].
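A rough sketch of what that flat layout could look like (the function names here are illustrative, not from the original library):

```lua
-- A 4x4 matrix becomes one 16-element table, indexed with
-- [4 * (row - 1) + col] instead of [row][col].
local function newFlatIdentity()
	return {
		1, 0, 0, 0,
		0, 1, 0, 0,
		0, 0, 1, 0,
		0, 0, 0, 1,
	}
end

local function flatGet(m, row, col)
	return m[4 * (row - 1) + col]
end

-- 4x4 multiply over flat arrays: one table allocation, no inner tables
local function flatMul(m1, m2)
	local out = {}
	for i = 1, 4 do
		local base = 4 * (i - 1)
		for j = 1, 4 do
			local sum = 0
			for n = 1, 4 do
				sum = sum + m1[base + n] * m2[4 * (n - 1) + j]
			end
			out[base + j] = sum
		end
	end
	return out
end
```

Because the multiply is hardcoded to 4x4 it also drops the `#m1`/`#m2[1]` length reads per call, which is a small bonus on top of the flat layout.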

The code is also using string comparisons to merge multiple functions into one. I think it'd be worthwhile to make specialized implementations for each axis, like matrix:rotateX, matrix:rotateY, and matrix:rotateZ, and only keep a string-based matrix:rotate if you really need to pick the axis dynamically.

If you do keep the string-based version, it might be worthwhile to switch the comparisons to number enums instead, like matrix.axisX = 1 with matrix:rotate(matrix.axisX, angle).
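A sketch of what a specialized axis function could look like, with minimal stand-ins for the original module so it runs on its own (in the real library, matrix and matrix.new come from the original post):

```lua
-- Minimal stand-ins so the sketch is self-contained
local matrix = {}
matrix.__index = matrix
function matrix.new(_, compData)
	return setmetatable(compData, matrix)
end

-- Specialized X-axis rotation: no string comparison, and the trig
-- results are computed once and reused.
function matrix:rotateX(step)
	local c, s = math.cos(step), math.sin(step)
	return matrix.new(false, {
		{1, 0, 0, 0};
		{0, c, -s, 0};
		{0, s, c, 0};
		{0, 0, 0, 1}
	})
end
```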

For matrix:lookAt you could do the internal matrix multiplication by hand since one of the matrices is just a translation. That would save on creating and destroying a matrix per call.
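Since the second matrix is a pure translation, the hand-done product just puts -dot(axis, eye) in each row's fourth column. A sketch, using plain {x, y, z} tables in place of Roblox Vector3s so it's self-contained (the name lookAtFused is illustrative):

```lua
local function dot(a, b)
	return a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
end

-- right, up, forward: unit axis vectors; eye: camera position.
-- Same row layout as the original lookAt, but the basis-times-translation
-- multiply is folded into the fourth column, so no intermediate matrices
-- and no generic multiply are needed.
local function lookAtFused(right, up, forward, eye)
	return {
		{right[1], right[2], right[3], -dot(right, eye)};
		{up[1], up[2], up[3], -dot(up, eye)};
		{forward[1], forward[2], forward[3], -dot(forward, eye)};
		{0, 0, 0, 1}
	}
end
```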

If you can limit the usage of the perspective matrix to a very specific use case I think it’d be possible to use CFrames to store the data instead of tables, and only expand it to a full table once the perspective matrix needs to be applied. CFrames seem like they preserve exact values that are fed into the component constructor, but that’ll take some testing to see if it’s worth doing.

There’s a chance that Roblox’s built-in structure is optimized enough that you’d see performance benefits from a memory-management perspective, but you also run the risk of building up costs from crossing the Lua/C boundary instead.

I’d do the 1D array optimization first because some brief testing showed up to a 25% performance increase on a large number of multiplications, and it doesn’t make the code look too much worse.

Edit: Another optimization that might be worthwhile to borrow from the C world would be making versions of the functions where you can specify which table you want to output to. A lot of the time you don’t actually need a whole new matrix to dump results into, so you could save some time by avoiding the constructor and the GC.
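A sketch of that output-parameter style (the name multiplyInto is illustrative; the loop body matches the original __mul):

```lua
-- The caller owns "out" (rows must already exist), so one scratch matrix
-- can be reused across many calls instead of allocating a fresh result
-- every time.
local function multiplyInto(out, m1, m2)
	for i = 1, #m1 do
		local row = out[i]
		for j = 1, #m2[1] do
			local num = m1[i][1] * m2[1][j]
			for n = 2, #m1[1] do
				num = num + m1[i][n] * m2[n][j]
			end
			row[j] = num
		end
	end
	return out
end
```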

Edit 2: fixed typo with the suggested array offset math

matrix.__mul = function(Matrix1, Matrix2)
	local Matrix = {}
	
	for Index1 = 1, #Matrix1 do
		table.insert(Matrix, {})
		
		for Index2 = 1, #Matrix2[1] do
			table.insert(Matrix[Index1], 0)
			
			for Index3 = 1, #Matrix2 do
				Matrix[Index1][Index2] += Matrix1[Index1][Index3] * Matrix2[Index3][Index2]
			end
		end
	end
	
	return setmetatable(Matrix, matrix) -- keep the metatable so chained * still works
end

messed with the matrix multiplication function a bit

This paper says that you can save some performance by storing math.tan in a local variable first. The same applies to all other functions from the math global.
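For example, caching the globals in locals once at the top of the module (the rotationZ helper here is illustrative, mirroring the Z branch of matrix:rotate):

```lua
-- One global-table lookup each at load time, instead of one per call
local cos, sin = math.cos, math.sin

local function rotationZ(step)
	local c, s = cos(step), sin(step)
	return {
		{c, -s, 0, 0};
		{s, c, 0, 0};
		{0, 0, 1, 0};
		{0, 0, 0, 1}
	}
end
```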

The paper also mentions that you can optimize a table by predefining its length. This is useful for optimizing matrix.__mul.

When you write {true, true, true}, Lua knows beforehand that the table will need three slots in its array part, so Lua creates the table with that size. Similarly, if you write {x = 1, y = 2, z = 3}, Lua will create a table with four slots in its hash part.

By the way, I found a Git repository from 2012 named ‘lua-matrix’. Maybe you’ll find it interesting.

this rule doesn’t apply to luau

That’s something new, I thought it wouldn’t make much difference. Shouldn’t the compiler optimize that, like creating a local variable for you for repetitive things like math functions? Because if it already does that, then it doesn’t make any noticeable difference.

And compilers often optimize your code better than you can yourself, so you shouldn’t worry that much.

again doesn’t apply in luau

they are looking at lua

OH. It’s a tricky name, sorry. But I believe this also applies to languages like C++ and JavaScript, right?

not entirely sure about those yet


JavaScript is much closer to Lua than C++ in overall design and performance characteristics, and I believe the most modern implementations of it are also JIT compiled, so in theory they could optimize it. I can’t say for sure one way or the other though, because it isn’t a language I regularly use or study for performance characteristics.

C++ on the other hand does (almost) always refer directly to the memory addresses of the functions and members being referenced. For any call that isn’t virtual, C++ doesn’t need to look up addresses in structures at run time, because the compiler already knows them at compile time. Even with virtual functions there are cases where the compiler is free to skip the virtual table lookup for which child class definition to use, if it can guarantee the code will only ever need that specific version of the function.

In C++, take scopes like these:

namespace MyNamespace
{
    void foo(); // inside the namespace's scope
}

// or

class Bar
{
public:
    static void foo(); // inside the class's scope
};

These scopes aren’t actual data structures. They’re organizational tools that uniquely identify one copy of a function from another, so you know which scope each copy belongs to.

A call to MyNamespace::foo() doesn’t look inside a structure called MyNamespace; it just sees that it needs to call a function whose definition was in a scope called MyNamespace. Same with Bar::foo(): it just sees a function defined in a scope owned by a class called Bar.

Non-static member functions don’t need lookups either:

class Player
{
public:
    void Kick();
};

because what they see is a function void Player::Kick() that also requires knowledge of which instance of Player to pass into the function. Logically it would look like void Player::Kick(Player* player), but like a lot of things in C++ the syntax is a little uglier:

Player* myPlayer; // some player
auto myPlayerFunction = &Player::Kick; // get a pointer to the member function Player::Kick

(myPlayer->*myPlayerFunction)(); // call the member function "myPlayerFunction" on the object "myPlayer" points to

which would be similar to

local myPlayer = -- some player
local myPlayerFunction = myPlayer.Kick

myPlayerFunction(myPlayer, "no rule breaking")

To my knowledge, Luau handles regular method calls like myPlayer:Kick("no rule breaking") about as well as the approach above, because it has a special metamethod called __namecall that streamlines the lookup-and-call process and allows optimizations in that area. I don’t think it’s quite the same as the way C++ does it, though.
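The two call forms being compared can be shown in plain Lua (the __namecall fast path itself is a Luau engine detail and can’t be reproduced here; the Player class below is just an illustration):

```lua
local Player = {}
Player.__index = Player

function Player.new(name)
	return setmetatable({ name = name, kicked = nil }, Player)
end

function Player:Kick(reason)
	self.kicked = reason
end

local myPlayer = Player.new("builderman")

-- sugar form: looks up Kick and passes myPlayer as self in one step
myPlayer:Kick("no rule breaking")

-- explicit form: the same lookup and call spelled out
local myPlayerFunction = myPlayer.Kick
myPlayerFunction(myPlayer, "no rule breaking")
```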
