In my java class, we’re learning about single/double/circular linked lists and hash maps. AFAIK tables in lua have all of these functionalities. From reading here on the forum, trees seem to be an important data structure to define ourselves, but would these be of any benefit? What are some beneficial custom data structures?
Intrusive doubly linked lists can be useful at times (basically, using the Lua objects you already have as the nodes in a linked list by adding previous and next fields to them). The reason is that an object can remove itself from an intrusive linked list without any searching, which is a nice property to have.
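Here is a minimal sketch of what that can look like (the sentinel table and the `_prev` / `_next` field names are just one possible convention, not something prescribed above):
local list = {}
list._prev, list._next = list, list -- circular list with a sentinel node

local function insert(object)
    -- Link the object in just before the sentinel (i.e. at the back of the list).
    object._prev, object._next = list._prev, list
    list._prev._next = object
    list._prev = object
end

local function remove(object)
    -- The object unlinks itself; no search through the list is needed.
    object._prev._next = object._next
    object._next._prev = object._prev
    object._prev, object._next = nil, nil
end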
Hash maps: Yes, every Lua table is just a hash map. There’s nothing to implement here, but there’s a lot of common coding patterns on Roblox using hash map behavior since Lua gives it to you for free.
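A small example of that pattern, using a table keyed by object as a set (the helper names here are purely illustrative):
local tagged = {}

local function tag(part)
    tagged[part] = true -- insert: average O(1)
end

local function untag(part)
    tagged[part] = nil -- remove: average O(1)
end

local function isTagged(part)
    return tagged[part] == true -- lookup: average O(1)
end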
Heaps: Be very wary of implementing these in Lua. I have yet to see a Lua heap implementation that was actually faster than just using a flat array; without JIT compilation, the math / function calls to do heap management / traversal are just too slow compared to linearly iterating an array.
In every computer science program that I know of, there is at least one class dedicated to data structures. The study of data structures is based around “asymptotic runtimes,” which is a fancy way of asking how an algorithm scales as we add more data. The main notation for runtime is big-O, “O(n)”, which describes how the runtime grows in the worst case. It doesn’t consider the constant cost of individual operations, only how the cost scales. Thus, as @stravant said, heaps, which have a better asymptotic runtime than arrays for insertion into ordered data, may still be less performant in some languages because of those constant costs. Big-O is a theoretical measure meant to be independent of the machine the code runs on, which allows algorithms to be compared fairly.
Since tables are the only data structure in Lua, all other structures in Lua must be built with tables (although technically you could implement custom ones using upvalues and closures, but we’ll ignore that).
There are many, many data structures, and each has its own properties that make specific applications more efficient. For example, given an array A of Vector3 points, how would you find all the points within distance d of a given point b? You would have to perform a scan over the entire array. If the array had n elements, this would result in a runtime of O(n). If you use an octree, the search takes O(log(n)), MUCH faster for larger numbers of points.
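For reference, the brute-force O(n) version of that query is just a loop like the following (`pointsWithin` is a made-up name, and `A` is assumed to be a plain array of Vector3 values):
local function pointsWithin(A, b, d)
    -- Check every point against the query point b.
    local found = {}
    for _, point in ipairs(A) do
        if (point - b).Magnitude <= d then
            found[#found + 1] = point
        end
    end
    return found
end
An octree avoids most of this work by only descending into the cells that could possibly contain points within d of b.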
Spatial structures like the octree allow fast spatial operations. Priority queues keep an ordered collection of elements for quick access and insertion. Graphs have a HUGE number of applications, although they are sometimes not considered a “data structure”. Graph theory finds its fingers in neural networks / algorithmic differentiation and the related optimization of loss functions, pathfinding, computer vision, electric potentials, symbolic AI, compilers, and MUCH more. There are many ways to represent graphs. Problems can be solved on them directly, as in pathfinding or image segmentation for computer vision, or graphs can be manipulated to optimally solve other problems, as in neural networks and compilers. Graphs can efficiently store data, operations, 3D objects, regions, or pretty much anything else. Trees are a subset of graphs with no cycles and a specified root node.
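As a concrete example of one representation, an adjacency list in Lua is just a table mapping each node to a list of its neighbors (a sketch, not a full graph library):
local graph = {
    A = {"B", "C"},
    B = {"C"},
    C = {"A"},
}

-- Visit the neighbors of node A.
for _, neighbor in ipairs(graph.A) do
    print("A ->", neighbor)
end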
But to get back to the benefit of data structures in Lua… it depends on what you want to do. Many on this forum are of the opinion that more resource-intensive work should be done in the engine in C++ rather than in Lua, and the cases where an intensive algorithm needs to run in a script to make a game are few, so these data structures are often not needed. Most of the cases I’ve seen are preprocessing maps / parts to automatically generate relationships, building a custom pathfinder to get features not included in Roblox’s, or plugins. By far, most applications are offline, when a server is not running.
Challenge accepted! What problem / conditions did you have in mind?
Results
Okay, so I’ve written a heap implementation, verified it, and tested it. Here is the code:
Code
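-- getMin: linear scan over a flat array, returning the smallest element strictly greater than `last` (the array itself is never modified).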
local function getMin(array, last)
local min_element = math.huge
for _, element in ipairs(array) do
if element < min_element and element > last then
min_element = element
end
end
return min_element
end
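-- heapInsert: sift-up insertion into an array-backed binary min-heap; the parent of index i is floor(i / 2), computed here as (i - i % 2) / 2.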
local function heapInsert(heap, element)
local i = #heap + 1
local p = (i - i % 2) / 2
while p > 0 do
if element <= heap[p] then
heap[i] = heap[p]
i = p
p = (i - i % 2) / 2
else
break
end
end
heap[i] = element -- if i == 1, this will initialize the heap.
end
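-- popMin: removes and returns the root (the minimum), then sifts the former last element down to restore the heap property.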
local function popMin(heap)
local i = 1
local c = 2 * i
local min = heap[1]
local element = heap[#heap]
heap[#heap] = nil
if #heap == 0 then
return min
end
while c < #heap do
if heap[c + 1] < heap[c] then
c = c + 1
end
if element <= heap[c] then
heap[i] = element
return min
else
heap[i] = heap[c]
i = c
c = 2 * i
end
end
if c == #heap and element > heap[c] then
heap[i] = heap[c]
i = c
end
heap[i] = element
return min
end
local scale = {[0] = '', 'milli', 'micro', 'nano', 'pico', 'femto', 'atto', 'zepto', 'yocto'}
local function toMetric(v, t)
local i = 0
while v < 1 do
v = v * 1000
i = i + 1
end
return ('%.3f %s%s'):format(v, scale[i], t)
end
local function avgTime(f, n, ...)
local runs = {}
wait()
for i = 1, n do
local start = tick()
f(...)
local finish = tick()
runs[i] = finish - start
end
wait()
local avg = 0
for i, v in ipairs(runs) do
avg = avg + v
end
avg = avg / n
local std = 0
for i, v in ipairs(runs) do
std = std + (v - avg) ^ 2
end
std = math.sqrt(std / (n - 1))
return toMetric(avg, 'seconds'), toMetric(std, 'seconds')
end
local table_insert = table.insert
local function verify(num_el, get_first_k, insert, pop)
local a = {}
for i = 1, num_el do
insert(a, math.random())
end
local last = 0
for i = 1, get_first_k do
local new = pop(a, last)
print(new)
assert(new >= last)
last = new
end
end
local function test(num_el, get_first_k, insert, pop)
local a = {}
for i = 1, num_el do
insert(a, math.random())
end
local last = 0
for i = 1, get_first_k do
last = pop(a, last)
end
end
The code to verify the implementations is at the bottom. Here is how to run it in studio:
wait(3)
local num_runs = 100
local num_elements = 1
local num_to_get = 1
assert(num_to_get <= num_elements)
print(('Num Runs: %d\nNum Elements: %d\nNum to Get: %d'):format(
num_runs, num_elements, num_to_get))
print(('Table: %s, std = %s'):format(
avgTime(test, num_runs, num_elements, num_to_get, table.insert, getMin)))
print(('Heap: %s, std = %s'):format(
avgTime(test, num_runs, num_elements, num_to_get, heapInsert, popMin)))
Inserting a single element, the array obviously beats the heap by a little:
Num Runs: 1000
Num Elements: 1
Num to Get: 0
Table: 751.257 nanoseconds, std = 51.777 nanoseconds
Heap: 778.913 nanoseconds, std = 29.530 nanoseconds
Interestingly, even inserting and then getting a single element, my heap outperforms the array. This pattern of the heap outperforming the array is consistent when the number of inserted elements is equal to the number of elements retrieved.
Num Runs: 1000
Num Elements: 1
Num to Get: 1
Table: 890.255 nanoseconds, std = 41.224 nanoseconds
Heap: 858.784 nanoseconds, std = 40.408 nanoseconds
Performing 7 insertions then 6 pops allows the heap to outperform the array.
Num Runs: 1000
Num Elements: 7
Num to Get: 6
Table: 4.586 microseconds, std = 86.836 nanoseconds
Heap: 4.372 microseconds, std = 98.535 nanoseconds
This shows that when the number of retrievals is close to the number of insertions, heaps outperform arrays even at small sizes.
As the number of elements increases, though, the array’s performance degrades quickly. Take for example 1000 insertions and just 6 get-minimum operations:
Num Runs: 1000
Num Elements: 1000
Num to Get: 6
Table: 297.271 microseconds, std = 842.197 nanoseconds
Heap: 280.652 microseconds, std = 862.436 nanoseconds
However, if we perform 12 get-minimum operations:
Num Runs: 1000
Num Elements: 1000
Num to Get: 12
Table: 462.817 microseconds, std = 1.129 microseconds
Heap: 286.938 microseconds, std = 831.686 nanoseconds
The general pattern is that the heap outperforms the array when the number of elements removed nears the number of elements inserted. Insertion into an array is O(1), while insertion into a heap is O(log(n)). However, removing the minimum from an array is O(n), where a heap is O(log(n)). As the number of elements increases, the heap becomes more attractive, and fewer removals per insertion are needed before the heap wins out.
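To make that concrete (ignoring constant factors, which the timings above show matter a great deal in Lua): with i insertions and r removals of the minimum, the flat array costs on the order of i + r·n operations, while the heap costs on the order of (i + r)·log(n), so the heap pulls ahead once r·n outgrows (i + r)·log(n).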
Since decrease-key operations are very commonly used with binary heaps, they are worth considering. However, since decrease-key has the same asymptotic runtime as insertion for both structures (given that the position of the value is already known), we can replace ‘insertion’ with it in the previous paragraph and all the statements still hold. In other words, the ratio of removals to inserts and decrease-key operations determines whether a heap is more efficient, with the required ratio becoming lower as the number of elements increases (and the number of elements increases with each insert).
As shown in the trivial cases, heaps are not vastly outperformed by arrays and are still viable in Roblox Lua.
Edit: I’ve corrected the results with proper standard deviation, more tests, and a corrected version of the heap.
The particular case is for pathfinding. I was unable to get more perf out of my A* implementation using a heap for the open list than just using a flat array the last time I attempted it. This was for a case where my open list peaked at about 100 elements.
Also it’s possible that Luau has changed things. Did you have the new VM enabled in these tests?
No, this is the old VM; I unfortunately haven’t tried out the new one yet. The last time I ran A* was on up to a 60 by 60 hexagonal maze. At the time I did a lot of digging into the performance, since I made it to demonstrate how using information available to scripts lets developers write pathfinders that vastly outperform the built-in pathfinder. I’ve posted the video on YouTube now for ease of viewing:
The original post is here and includes the place file with the pathfinder:
It averaged around 1000 open nodes, and in my experience a heap yielded better performance. I’d have to double check whether this was the case for the smaller grid sizes as well; I was more worried about large grids. I’ll pop it open next time I’m in Windows; I’m currently running a process I can’t stop for a while.
I assume the reason the heap performs better is that the number of nodes removed is nearly equal to the number of nodes inserted. For every grid size tested in the video, the number of nodes removed from the queue is 20 to 50 less than the number of nodes inserted. The number of nodes visited ranged from 206 to 2347, with the average number of open nodes during insertion ranging from 139 to 1300.
BTW,
I noticed in the test code above that the standard deviation is calculated incorrectly. What you see is roughly the standard deviation squared, so the actual deviation is greater. I’ll also rerun those tests when I get into Windows, unless someone else beats me to it and wants to post their results.
Nice results. That’s pretty conclusive, it looks like I might just be wrong on this.
So I did some digging into the A* implementation I posted above. I realized that I had actually rolled an ordered linked list for the priority queue, so I did some testing with an unordered array, an ordered linked list, and a min heap.
I realized a couple of things while making this. First, my previous implementation of the ordered linked list never removed a node when it was visited; it simply moved on to the next node. This is normally fine, but it artificially inflated the average number of nodes in the fringe during insertion. I resolved this issue for these tests.
Second, this test on a maze doesn’t reflect many A* use cases. Since the maze is “perfect” (it contains no loops), a node is never reached from different directions, and thus its cost is never decreased once it is inserted into the fringe. So these results, although impressive, are still not enough for me to feel completely confident saying that heaps always outperform arrays in A*. It may be that in your case there were many decrease-key operations, which are free in an unordered array but cost logarithmic time in a min heap.
Third, this is a hexagonal map instead of a rectangular one. I’m not entirely sure whether this increases the average number of inserts per node or not (since it still can’t create loops), but it may be a factor.
Also, I discovered a bug in the heap code that may have affected the above tests. I’ll rerun them with the fixed heap and fixed standard deviation, and then update that post.
The following results are on the 20x20 and 60x60 maps seen in the video above and 100x100 / 200x200 maps to see how the algorithms scale. Here is a picture of a 200x200 maze:
Interestingly, the heap was able to perform decently on this large maze with only ~0.07 seconds to find a path from one corner to another on my machine.
My Machine
Roblox Studio – old VM, not in beta
The tests:
Ordered Linked List
Here is the Linked List code:
Code
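-- Note: this is an excerpt; `fringe` (the head of the list), `len`, `c`, `i`, and `t` (time accumulated in queue operations) are defined in the surrounding benchmark script (not shown). The ordered list presumably ends in a sentinel node whose value is math.huge so that the insert loop always terminates.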
local function remove(key)
local s = tick()
len = len - 1
key.next.prev = key.prev
key.prev.next = key.next
t = t + tick() - s
end
local function getMin()
local s = tick()
local key
key = fringe.key
remove(fringe)
fringe = fringe.next
t = t + tick() - s
return key
end
local function insert(key)
c = c + len
i = i + 1
len = len + 1
local s = tick()
local cur = fringe
while key.value > cur.value do
cur = cur.next
end
key.prev = cur.prev
key.next = cur
cur.prev = key
key.prev.next = key
if cur == fringe then
fringe = key
end
t = t + tick() - s
end
local function decrease(key)
remove(key)
insert(key)
end
And here are the results:
20x20 results
Num Nodes: 400
Run time (seconds): 0.00085234642028809
Fraction of time spent in queue ops: 0.10769230769231
Num inserts: 205
Avg nodes in queue during insert: 6.2585365853659
Num nodes visited: 197
60x60 results
Num Nodes: 3600
Run time (seconds): 0.0082039833068848
Fraction of time spent in queue ops: 0.24739901191514
Num inserts: 2634
Avg nodes in queue during insert: 22.60782080486
Num nodes visited: 2618
100x100 results
Num Nodes: 10000
Run time (seconds): 0.024052381515503
Fraction of time spent in queue ops: 0.25522635131786
Num inserts: 7797
Avg nodes in queue during insert: 25.584840323201
Num nodes visited: 7771
200x200 results
Num Nodes: 40000
Run time (seconds): 0.14127492904663
Fraction of time spent in queue ops: 0.26282001518859
Num inserts: 31729
Avg nodes in queue during insert: 47.935894607457
Num nodes visited: 31657
Unordered Array
Here is the Unordered Array code:
Code
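-- Note: as above, `fringe`, `len`, `c`, `i`, and `t` are defined in the surrounding benchmark script (not shown). remove() uses the swap-with-last trick: the removed slot is overwritten with the final element so no shifting is needed, and each node tracks its own position in the array through its `index` field.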
local function remove(key)
local s = tick()
len = len - 1
local last = fringe[#fringe]
fringe[key.index] = last
fringe[last.index] = nil
last.index = key.index
key.index = nil
t = t + tick() - s
end
local function getMin()
local s = tick()
if #fringe == 0 then
return
end
local min = fringe[1]
local ind = 2
while ind <= #fringe do
if fringe[ind].value < min.value then
min = fringe[ind]
end
ind = ind + 1
end
remove(min)
t = t + tick() - s
return min
end
local function insert(key)
c = c + len
i = i + 1
len = len + 1
local s = tick()
local ind = #fringe + 1
fringe[ind] = key
key.index = ind
t = t + tick() - s
end
local function decrease(key)
-- This operation is free.
end
And the Unordered Array results:
20x20 results
Num Nodes: 400
Run time (seconds): 0.0010368824005127
Fraction of time spent in queue ops: 0.39572315474822
Num inserts: 283
Avg nodes in queue during insert: 6.3462897526502
Num nodes visited: 280
60x60 results
Num Nodes: 3600
Run time (seconds): 0.010080814361572
Fraction of time spent in queue ops: 0.46859183576936
Num inserts: 2383
Avg nodes in queue during insert: 18.085606378514
Num nodes visited: 2356
100x100 results
Num Nodes: 10000
Run time (seconds): 0.047679424285889
Fraction of time spent in queue ops: 0.57080137212349
Num inserts: 9716
Avg nodes in queue during insert: 29.694112803623
Num nodes visited: 9703
200x200 results
Num Nodes: 40000
Run time (seconds): 0.27895498275757
Fraction of time spent in queue ops: 0.53525318327348
Num inserts: 36671
Avg nodes in queue during insert: 44.058956668757
Num nodes visited: 36628
Minimum Heap
Here is the Minimum Heap code:
Code
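-- Note: as above, `fringe`, `len`, `c`, `i`, and `t` are defined in the surrounding benchmark script (not shown). Each node stores its current heap position in its `index` field, which is what lets decrease() sift a node up from wherever it currently sits.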
local function getMin()
local s = tick()
len = len - 1
local cur = 1
local child = 2 * cur
local min = fringe[1]
local element = fringe[#fringe]
fringe[#fringe] = nil
if #fringe == 0 then
t = t + tick() - s
return min
end
while child < #fringe do
if fringe[child + 1].value < fringe[child].value then
child = child + 1
end
if element.value <= fringe[child].value then
fringe[cur] = element
element.index = cur
t = t + tick() - s
return min
else
fringe[cur] = fringe[child]
fringe[cur].index = cur
cur = child
child = 2 * cur
end
end
if child == #fringe and element.value > fringe[child].value then
fringe[cur] = fringe[child]
fringe[cur].index = cur
cur = child
end
fringe[cur] = element
element.index = cur
t = t + tick() - s
return min
end
local function insert(element)
c = c + len
i = i + 1
len = len + 1
local s = tick()
local cur = #fringe + 1
local p = (cur - cur % 2) / 2
while p > 0 do
if element.value <= fringe[p].value then
fringe[cur] = fringe[p]
fringe[cur].index = cur
cur = p
p = (cur - cur % 2) / 2
else
fringe[cur] = element
element.index = cur
t = t + tick() - s
return
end
end
fringe[cur] = element
element.index = cur
t = t + tick() - s
end
local function decrease(element)
local s = tick()
local cur = element.index
local p = (cur - cur % 2) / 2
while p > 0 do
if element.value <= fringe[p].value then
fringe[cur] = fringe[p]
fringe[cur].index = cur
cur = p
p = (cur - cur % 2) / 2
else
fringe[cur] = element
element.index = cur
t = t + tick() - s
return
end
end
t = t + tick() - s
end
And here are the Minimum Heap results:
20x20 results
Num Nodes: 400
Run time (seconds): 0.00061559677124023
Fraction of time spent in queue ops: 0.27498063516654
Num inserts: 161
Avg nodes in queue during insert: 7.2360248447205
Num nodes visited: 144
60x60 results
Num Nodes: 3600
Run time (seconds): 0.01081109046936
Fraction of time spent in queue ops: 0.36372257139707
Num inserts: 3012
Avg nodes in queue during insert: 27.928618857902
Num nodes visited: 2982
100x100 results
Num Nodes: 10000
Run time (seconds): 0.01996636390686
Fraction of time spent in queue ops: 0.33850379127112
Num inserts: 5525
Avg nodes in queue during insert: 26.226787330317
Num nodes visited: 5483
200x200 results
Num Nodes: 40000
Run time (seconds): 0.077650547027588
Fraction of time spent in queue ops: 0.32077435598268
Num inserts: 18392
Avg nodes in queue during insert: 35.562200956938
Num nodes visited: 18361
To answer the original question, different data structures are faster in different cases. For smaller fringes, an ordered linked list is faster as the A* priority queue than either an array or a heap. For large fringes it seems that heaps are the best option. If there are many ways to reach a node (such as in a very open map with a grid representation), then an array may be more performant (yet to be tested).
Anyways, I’m losing interest in this problem, so I probably won’t get around to testing on an open map (the maze one was quickly available to me). If anyone else would like to test it, I’d be appreciative.
(ffs, it is difficult to earn a “solution” from the topic author :P)