Review of my 2D-grid module

PysephDEV · March 14, 2021, 8:38pm

Hello!
I’ve been working on a minimalistic and efficient grid module to be able to easily detect what region a point is in, and I’d like to receive some criticism, especially in some specific areas.
I’m mostly worried about how I handle stored grids - Should I put StoredGrids in Grid (i.e Grid.StoredGrids and Grid.New), keep as is, or do something else (such as having Grid.new return the already-created grid instead of erroring)?
I’m also interested in the performance impact on using a 2D-nested array instead of what I’m currently doing, which is to generate a unique identifier from 2 numbers (i.e a pairing function). I’m assuming my current solution would be more performant since I’m only allocating space into a flat array, and never create any new tables for the Y axis (t[x][y]).

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local Grid = {}
Grid.__index = Grid

function Grid:SnapToGrid(N)
	return N - N % self.CellSize
end
function Grid:SnapVector2ToGrid(Position)
	return Vector2.new(self:SnapToGrid(Position.X), self:SnapToGrid(Position.Y))
end
function Grid:FillCell(X, Y, Data)
	local Pair = PairFunction(X, Y)
	self.Cells[Pair] = Data
end
function Grid:FillRect(P0, P1, Data)
	local SnappedP0 = self:SnapVector2ToGrid(P0)
	local SnappedP1 = self:SnapVector2ToGrid(P1)

	local X0, Y0 = SnappedP0.X, SnappedP0.Y
	local X1, Y1 = SnappedP1.X, SnappedP1.Y

    -- Yes, I'm aware I could use math.min & math.max, but it's inefficient.
	local MinX = X0 < X1 and X0 or X1
	local MinY = Y0 < Y1 and Y0 or Y1

	local MaxX = X0 > X1 and X0 or X1
	local MaxY = Y0 > Y1 and Y0 or Y1

	for CellX = MinX, MaxX, self.CellSize do
		for CellY = MinY, MaxY, self.CellSize do
			self:FillCell(CellX, CellY, Data)
		end
	end
end
function Grid:GetPointData(X, Y)
	local Pair = PairFunction(self:SnapToGrid(X), self:SnapToGrid(Y))
	return self.Cells[Pair]
end

local StoredGrids = {}
return {
	Grid = {
		new = function(CellSize, GridName)
			assert(type(CellSize) == 'number', 'You must provide a cell size')
			assert(type(GridName) == 'string', 'You must provide a grid name')
			assert(StoredGrids[GridName] == nil, 'Grid already exists')

			local CreatedGrid = setmetatable({
				CellSize = CellSize,
				Cells = {}
			}, Grid)
			StoredGrids[GridName] = CreatedGrid
			return CreatedGrid
		end
	},
	StoredGrids = StoredGrids
}

PysephDEV · March 14, 2021, 8:58pm

I just remembered the Rect datatype - I’ll rewrite my code to use that instead.

DarthFS · March 15, 2021, 3:31am

When your code is so good I can’t critique anything

Only thing I don’t see the point of is having a nested table to store only 1 function. What I mean is you don’t really need the Grid table:

return {
	new = function(CellSize, GridName)
		assert(type(CellSize) == 'number', 'You must provide a cell size')
		assert(type(GridName) == 'string', 'You must provide a grid name')
		assert(StoredGrids[GridName] == nil, 'Grid already exists')

		local CreatedGrid = setmetatable({
			CellSize = CellSize,
			Cells = {}
		}, Grid)
		StoredGrids[GridName] = CreatedGrid
		return CreatedGrid
	end,
	StoredGrids = StoredGrids,
}

Also, you have some unneeded abstraction here (Your FillCell method). You seem to call the method in only 1 area of your code. In that area, you are calling that function multiple times in a nested loop that’s O(n^2) time complexity meaning a lot of function calls will be made. Since your FillCell method is only 1 - 2 lines, you could just paste the function code there. At the same time though, it would ruin consistency since you utilize the PairFunction in other sections of your code so my suggestion is debatable

for CellX = MinX, MaxX, self.CellSize do
	for CellY = MinY, MaxY, self.CellSize do
		local Pair = 0.5 * (x + y) * (x + y + 1) + y
		self.Cells[Pair] = Data 
	end
end

In terms of architecture, your code looks perfect. I’m not an expert on 2D grids so nothing much I could suggest on that aspect. The concept of what you are trying to do definently does remind me of a QuadTree datastructure. So maybe you might wanna look into that if you haven’t already

PysephDEV · March 15, 2021, 8:08am

I made that because I wasn’t sure how to cache grids - when writing that, I was thinking of something like so:

local Module = require(Path.Grid)
local Ocean = Module.StoredGrids.Ocean or Module.Grid.new(20, 'Ocean')

But I later realized this was just really unecessary, and just went with this:

local Grid = require(Path.Grid)
local Ocean = Grid.new(20, 'Ocean') -- if the grid 'Ocean' already exists, don't create a new one

As for the quadtree data structure, that is a data structure usually used when implementing LoD and nested cells, which I don’t have. You could call this a more naïve implementation

As for regarding the FillCell method, I added it to have the code look cleaner overall, but I suppose it would be faster to embed the method directly into the loop. I’ll look into optimizing this further, thanks!

Nonaz_jr · March 21, 2021, 3:32am

Well, that’s interesting. But it’s a heavy computation compared to making a table, in itself there is no direct advantage it seems to me:

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
end

pairsTest()

print(tick() - start)
wait(5)
start = tick()

local function nestedTest()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
end

nestedTest()

print(tick() - start)

It’s just a very rough, possibly unreliable test script, but it tells me:
0.0028567314147949 - Server - Script:24
0.0004427433013916 - Server - Script:43

However, the real difference may be in accessing the grid:

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
	
	
	local function loop()
		for i, v in pairs(grid) do
			local myOperation = v == true
		end
	end
	
	for i = 1, 10 do
		loop()
	end
	
	
	
end

pairsTest()

print(tick() - start)
wait(5)
start = tick()

local function nestedTest()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
	
	
	local function loop()
		for i, v in ipairs(grid) do
			for j, w in ipairs(v) do
				local myOperation = w == true
			end
		end
	end

	for i = 1, 10 do
		loop()
	end
	
end

nestedTest()

print(tick() - start)

0.00826096534729 - Server - Script:38
0.0059802532196045 - Server - Script:71

Maybe not entirely fair, using ipairs… How about this:

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
	
	local function accessGridPoint(x, y)
		return grid[PairFunction(x, y)]
	end

	local function loop()
		for i = 1, testSize do
			for j= 1, testSize do
				local myOperation = accessGridPoint(i, j)
			end
		end
	end
	
	for i = 1, 10 do
		loop()
	end

end

pairsTest()

print(tick() - start)
wait(5)
start = tick()

local function nestedTest()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
	
	local function accessGridPoint(x, y)
		return grid[x][y]
	end
	
	local function loop()
		for i = 1, testSize do
			for j= 1, testSize do
				local myOperation = accessGridPoint(i, j)
			end
		end
	end

	for i = 1, 10 do
		loop()
	end
	
end

nestedTest()

print(tick() - start)

0.016347646713257 - Server - Script:44
0.0047152042388916 - Server - Script:81

Admittedly, it’s a pretty bad test, with a messy quick script, just pressing play in studio… My results are not very consistent, but I tried changing the order and for me, the nested array seems faster in every case. The waits are to prevent interference from loading and garbage collector, but yeah…

Interesting though, and I still think its a nice trick… Technically the convention is not to store non-consecutive integers in a lua table. Someone might think to use ipairs on it To be honest I didn’t grasp the rest of your grid module yet, but it in general looks very well written. Interesting post, thanks!

Extra:

Kind of unrelated: I wondered if the order of accessing the nested tables would affect performance… Often it does due to the way the CPU reads cache lines from memory. But the results of tests like this are not accurate enough… I guess it only demonstrates that my quick tests are pretty bad

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
	
	local function loop()
		for i, v in pairs(grid) do
			local myOperation = v == true
		end
	end
	
	for i = 1, 10 do
		loop()
	end
	
end

pairsTest()

print(tick() - start)
wait(5)
start = tick()

local function nestedTest1()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
	
	local function loop()

		for i = 1, testSize do
			for j= 1, testSize do
				local myOperation = grid[i][j] == true
			end
		end
	end

	for i = 1, 10 do
		loop()
	end
	
end

nestedTest1()

print(tick() - start)

wait(5)
start = tick()

local function nestedTest2()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end

	local function loop()

		for i = 1, testSize do
			for j= 1, testSize do
				local myOperation = grid[j][i] == true
			end
		end
	end

	for i = 1, 10 do
		loop()
	end

end

nestedTest2()

print(tick() - start)

0.0032415390014648 - Server - Script:38
0.0030889511108398 - Server - Script:72
0.0030274391174316 - Server - Script:108

p.s. I removed some empty lines so the print line numbers might not work out.

PysephDEV · March 21, 2021, 6:33am

Nonaz_jr:

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

local function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
end

pairsTest()

print(tick() - start)
wait(5)
start = tick()

local function nestedTest()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
end

nestedTest()

print(tick() - start)

Well, first off, the difference would be pretty substantial because youre calling PairFunction, which is quite expensive, rather than calculating the pair directly. This already makes it unfair since the nested test doesn’t call any external function. If you switch out PairFunction and apply the calculation directly, this is the result:

(I let this run for a bit to see the difference more clearly)

PysephDEV · March 21, 2021, 6:46am

Forgot to mention: This is an old module anyway that has issues, since the pairing function only works on natural numbers, not negative ones.

Nonaz_jr · March 21, 2021, 3:12pm

Wow, that’s interesting Your method of measurement is much more fair, because my method has all kinds of background noise… But I wonder how you put this in your script to measure it like this. In any case, in my naive approach I consistently get results similar to this in the below script:

0.0015835762023926 - Server - Script:24
0.0015559196472168 - Server - Script:43
0.00063824653625488 - Server - Script:61
0.0008091926574707 - Server - Script:86

(Order: pairs function, pairs calculated directly, nested, nested with dummy function call)

It seems to me a simple function call should not be as expensive as the multiplication. I suppose at this point all kinds of architectural differences come into play, as well. But I wonder if, as the table gets larger, indexing might not become actually slower than indexing two smaller nested tables (O(1) lookup might not hold or require bigger hash codes). And as the numbers get larger, the multiplication gets heavier. I wonder if it could really be computationally cheaper.

Nonetheless, it is a beautiful function, a nice mapping of a 2D grid space to the natural numbers. And indeed, continuation to the negative and complex numbers gives some real food for thought.

Also note how an infinite grid (starting at index 0) would actually fill all the natural numbers, it’s not just mapping to unique integers. Though any finite rectangular grid size>0 will have gaps (for example [1,1] gives 4 but [0,2] gives 3). Quite elegant, I’m sure there are many more interesting use cases for this.

wait(8)

local testSize = 100
local start = tick()

local function PairFunction(x, y)
	return 0.5 * (x + y) * (x + y + 1) + y
end

function pairsTest()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do
			
			grid[PairFunction(i, j)] = true
			
		end
	end
end

pairsTest()

print(tick() - start)
wait(5)
start = tick()


function pairsTest2()
	local grid = {}

	for i = 1, testSize do
		for j= 1, testSize do

			grid[0.5 * (i + j) * (i + j + 1) + j] = true

		end
	end
end

pairsTest2()

print(tick() - start)
wait(5)
start = tick()
function nestedTest()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = true

		end
	end
end

nestedTest()

print(tick() - start)
wait(5)
start = tick()

local function dummyFunction(i, j)
	if i and j then
		return true
	end
end

function nestedTest2()
	local grid = {}

	for i = 1, testSize do
		grid[i] = {}
		for j= 1, testSize do

			grid[i][j] = dummyFunction(i, j)

		end
	end
end

nestedTest2()

print(tick() - start)

In general, I would advise against factoring out functions, it rarely gives significant performance improvements. Same goes for temporary variables. Only perhaps in core areas, such as nested loops, is it ever even worth considering. In all other cases, readability and maintainability of the code is of far greater importance: contrary to in the past, nowadays such amounts of time wasted by a computer is of much less importance than time wasted by us. Almost always such optimizations have neglible effect, and make the code less accesssible and can give the next programmer (often, you in a few months) a big headache. In fact, one thing about the code posted by @PysephDEV is that it is very well structured, in single purpose functions, with good naming: beautiful code! (though I am just not the biggest fan of the way the last return statement is written, but that aside, very nice)

Edit: It’s often hard to tell what actually happens under the hood. I just came upon this spec for lua 5 mentioning the following:

“New algorithm for optimizing tables used as arrays: Unlike other scripting languages, Lua does not offer an array type. Instead, Lua programmers use regular tables with integer indices to implement arrays. Lua 5.0 uses a new algorithm that detects whether tables are being used as arrays and automatically stores the values associated to numeric indices in an actual array, instead of adding them to the hash table. This algorithm is discussed in Section 4.”

Section 4 then explains how non-sparse (over half the keys used) integer-keyed tables get an optimization… This might interact questionably with your script, hence the convention. An alternative might be to use strings as keys but I suspect performance will drop.

I also wondered if this approach using table.insert could be faster, but it was in fact slower (probably because now the index needs to be calculated, while it was already available):

start = tick()


local function nestedTest3()
	local grid = {}

	for i = 1, testSize do
		table.insert(grid, {})
		for j= 1, testSize do

			table.insert(grid[i], true)

		end
	end
end

nestedTest3()

print(tick() - start)