Optimizing 2D array generation for flow fields

Psudar · October 22, 2021, 7:51am

Hey guys, been a while since I’ve used the dev forum.

I’m working on creating flow fields that will be used to pathfind for large groups of units to essentially work towards the same goal.

For an explanation of flow fields in context see:
Flow-Field Explanation

This is the code I use to represent my world as a 2d grid. Each X position is an array that is indexed by corresponding Y positions. (eg. array[x][y]). At a[x][y] exists a “cell,” which is essentially a table of information for that location in the grid.

-- the meat of the code
function Grid.new(gridSize)
	local grid = {}	
	-- create new Cell {} at every grid [x][y] index
	for x = 1, gridSize.X do
		grid[x] = {}
		for y = 1, gridSize.Y do
			grid[x][y] = {}
		end
	end
	return grid
end

My issue is that this method of generation is very slow when creating large grids.
For example, through some (probably slightly inaccurate) tick() measurements, a 1000x1000 grid takes about 3 to 4.5 seconds to generate. In fact, running this through some custom debug code I’ve made to visualize it usually ends in some sort of “execution run time exceeded” error (or something of that nature) when trying to visualize large grids.

Here’s an example of what this may look like when implemented and later visualized:

The green cell is the goal cell and the surrounding cells all point to it. Essentially an AI would know which cell it’s standing in and follow the direction stored at that location. Thus finding its way to the goal.
(I’ve excluded how this is all drawn and calculated as it’s not important.)

This is a solution I tried with representing the grid but the results were more or less the same:

   -- from lua.org/pil/
    mt = {}
    for i=1,N do
      for j=1,M do
        mt[i*M + j] = 0
      end
    end

Thus the time taken seems to be mostly due to the sheer amount of code being run than anything with indexing/writing to a table or memory itself (or so I believe).

One thing I’ve yet to do but am considering is splitting the world up into larger tiles, then figuring out which path the units must take through the large tiles to reach a certain goal. Once that path of large tiles is found, I would then proceed to generate smaller tiles inside of those for avoiding collisions and whatnot. This method can be seen here: Pathfinding w/ 30k Units

Overall, I hope one of you can help me figure out a method that speeds up generation time. Ideally, I’d like to be able to still get the information about my tiles in terms of X,Y notation, whether that be literal array[x][y] as I’m doing now or something less intuitive.

Thanks for reading this far, and let me know about any questions/corrections I should address.

Hexcede · October 22, 2021, 10:31pm

My first recommendation would be to use table.create to optimize allocations:

function Grid.new(gridSize)
	local grid = table.create(gridSize.X)
	for x = 1, gridSize.X do
		grid[x] = table.create(gridSize.Y)
		for y = 1, gridSize.Y do
			grid[x][y] = {}
		end
	end
	return grid
end

This results in new values not having to reallocate the size of the table, and can significantly speed up large table creation when you know the size (even if you’re filling unique values!)

My second recommendation would be to use one-depth tables and calculate indices, which further improves allocation speed since you’re only allocating one table instead of gridSize.X tables:

function Grid.new(gridSize)
	local grid = table.create(gridSize.X * gridSize.Y)
	for cellNumber=1, gridSize.X * gridSize.Y do -- Area = width * height
		grid[cellNumber] = {}
	end
	return grid
end

-- Get cell at x, y
grid[x + gridSize.X * (y - 1))
-- x selects column, then we add the horizontal size to "wrap" to the next row
-- so if the grid's width is 10, the 10th cell is the last in the first row
-- that makes the 20th cell the last in the second row
-- etc

This is similar to how you would implement data oriented design, you’re using a “handle,” the cell’s number, and you’re only using one giant table of all the possible cells, which is significantly faster (and more memory efficient!) compared to a multi-depth table. This is how I would personally do something like flow fields, where you potentially have hundreds of thousands of cells, since even shaving a few ms off per cell adds up to some very large numbers.

If you want to get the X and Y position inside your generator loop, using two for loops for x and y doesn’t hurt (you can use the same index calculation), but if you aren’t satisfied by that and want to squeeze every last millisecond out you can always take the opposite approach, calculate the X and Y from the cell number:

-- Since the cell number starts at 1, we subtract 1 to have it start at 0 (So the modulo and division math works correctly)
-- Since the x and y also start at 1, we then add that 1 back
local x = (cellNumber-1) % cellSize.X + 1 -- Every width cells is a new row, so we want the remainder
local y = math.floor((cellNumber-1) / cellSize.X) + 1 -- Every width cells is a new row, so, the number of times the width goes into the cell number must be the y value

A last and final recommendation is that, if you calculate your flow field’s direction based on X and Y you do so during the creation of each cell, directly inside of the cell’s table:

function Grid.new(gridSize)
	local width, height = gridSize.X, gridSize.Y -- Index optimization (I think this only really applies to Vector2s now with native Vector3s but who knows)
	local grid = table.create(width * height)
	-- Fill the flow field grid
	for cellNumber=1, width * height do
		local x, y = -- Calculate X and Y coordinates from the cell number
			(cellNumber-1) % width + 1,
			math.floor((cellNumber-1) / width) + 1
		grid[cellNumber] = {
			Direction = Vector3.new(x, 0, y).Unit -- Example
		}
	end
	return grid
end

Hopefully this is helpful, and, I’d love to hear how this turns out for you

P.s. os.clock is designed for benchmarking, and probably serves better than tick now that its in the engine.

Psudar · October 23, 2021, 5:13am

Thanks dude, let me get home and see how this fares. Also thanks for the extra info about os.clock, never used that before. I tried using the debugger w/ debug.profilebegin() and such but couldn’t figure out how to read the graph. Anyways, I completely forgot about lua allocation idiosyncrasies, especially with tables. I wasn’t even thinking about that.

Edit: Wow, this is nearly 6x faster.

CoderHusk · October 27, 2021, 4:45am

I should also add, for my render engine ive been making in roblox, the first common problem i noticed was that it would take like milliseconds to rasterize but it would take seconds to update that data to the screen. now why was this? turns out strings, and for that matter anything that inherits the roblox data model (instances, vector3s, cframes) are sorta slow on a large scale. What I ended up doing is I put the guis reference into my grid array then referenced it from that array, instead of checking its string name. This allows me to get rainbow bunny at real time!