Lua 5.1 Internals: Tables I

Lua 5.1 Internals: Tables I

I will try to explain how tables work internally within Lua 5.1.
We will, in this tutorial, write our own basic implementation of Lua’s tables.
We will however go into more things than only tables. This tutorial will cover (in order):

  • The static array
  • Implementing a dynamic array
  • Implementing hash tables with only numbers
  • Implementing hash tables with all types
  • The next function

Take note that this code is intended for vanilla Lua, and will only support part of Roblox Lua.
For writing the hashing functions I am going to assume there is a bit32 library. This library is not included by default in Lua 5.1 and is only included in this tutorial because explaining and writing bitwise operations in pure Lua is outside the scope of this tutorial. I will explain the optimisations used by Lua in another post, as this tutorial is only ment to explain how hash tables work.

A quick note.

This explanation is ment for people who (only) know Lua. I will also assume that you know (the Roblox-relevant parts of) Part I and II of Programming in Lua 5.0.
If you’re experienced with C, I suggest you search for Mike Pall’s recommended reading order on the Lua 5.1 source.
Some ideas might seem obvious to you if you have a background in a lower-level language, but for the people who are only familiar with Lua, I will try and explain this in Lua as much as possible.
For the example code I am going with readability and easy to explain code over efficient and optimized code. The examples don’t handle edgecases at all. (We’re working in Lua, it’s not like you’re going to write good code.) Also, don’t take my word on anything I say here :-).

The static array

For people only familiar with Lua and other high-level languages, arrays are (in general) dynamic.
However, this is all handled behind the scenes. If you add values to arrays, Lua will automatically resize the arrays. In the language Lua was written in C, arrays will not automatically resize.
For the sake of this tutorial I will use a fake class called Array. You can find the source here. The documentation for this class is roughly:

Array Array.new(int size):
Return an array of size “size”

Array[int n]:
Return a value within this array at index “n”.
If n is not within the range of this array, the program will crash.
DO NOT INDEX OUTSIDE THE RANGE OF THIS ARRAY.

Array[int n] = value:
Set a value within this array at index “n”.
If n is not within the range of this array, the program will crash.
DO NOT INDEX OUTSIDE THE RANGE OF THIS ARRAY.

void Array:destroy():
Remove the array from memory.

This array does NOT have any other features like the length operator (#) or insertion/get methods.

For people familiar with anything else than Lua, it might seem weird that the index does not start at 0.
But to keep arrays similar to Lua’s “arrays”, the index will start at 1.

We can roughly use the array as follows:

local	my_favorite_numbers = Array.new(5);

my_favorite_numbers[1] = 0;
my_favorite_numbers[2] = 1;
my_favorite_numbers[3] = 3;
my_favorite_numbers[4] = 3;
my_favorite_numbers[5] = 7;
for i = 1, 5 do
	print(my_favorite_number[i])
end
-- output;
-- 0 
-- 1 
-- 3
-- 3
-- 7

The moment we try to index anything outside of the range of this array (1 to 5 incl.) our program will error. If we have not defined anything at the index, it will return nil.

local	numbers = Array.new(5);

print(numbers[5]); -- nil
print(numbers[6]); -- this would error because our Array only has 5 values. 

Implementing a dynamic array

If we have a program which requires an unknown size of values within our array (let’s say, reading lines from a script), we would run into an issue with this Array class. We can however use this array class to create an automatically resizing array. All we need to implement this array is two new variables: a size and a capacity. The size keeps track of how many values we’re currently storing within the array and the capacity will keep track of how many values we can still store in the array.

First, we set up our prototype:

local		DynArray = {};

function	DynArray:new()
	local	new =
	{
		size = 0;				-- How many things are in our array? (0 at the start)
		capacity = 4;			-- How many things can be in our array? (4 at the start)
		array = Array.new(4);	-- Our static array, hidden from the user.
	};

	setmetatable(new, self);
	self.__index = self;
	return	(new);
end

Then we add basic get-set functionality:

function	DynArray:set(index, value)
	self.array[index] = value;
end

function	DynArray:get(index)
	return	(self.array[index]);
end

Now we need a function that can add something to our array:

function	DynArray:insert(value)
	self.size = self.size + 1;
	self.array[self.size] = value;
end

Take note that we’re still responsible for counting how many values are in our array. Array does not give us anything to help us do this, we have to implement all logic ourselves.

When we’re out of space in the array, we have to create more space for the array.
You can calculate the amount of space left in the array by subtracting the size from the capacity: self.capacity - self.size.

We want to check when there’s exactly 0 spaces left in our array before we resize.
We can do that with (self.capacity - self.size) == 0. We can rewrite it as self.size == self.capacity, because when the difference between two things is zero, they are the same.

Now we can write our array resize method and add it to our insertion method:

function	DynArray:extend()
	local	new_array;

	self.capacity = self.capacity * 2;		-- Double the capacity
	new_array = Array.new(self.capacity);	-- Create a new array with that capacity
	for i = 1, self.size do
		new_array[i] = self.array[i];		-- Copy our old array into the new array
	end
	self.array:destroy();					-- Get rid of our old array
	self.array = new_array;					-- Replace the old array with the new one
end

function	DynArray:insert(value)
	if self.size == self.capacity then		-- If there is no more space
		self:extend();						-- Double our capacity
	end
	self.size = self.size + 1;				-- Keep track of our value count
	self.array[self.size] = value;			-- Add our value to the array
end

What we’re essentially doing is keeping track of how many things are in our array, and when there are no more spaces left within our array, we create a bigger array. This is how dynamically sized arrays work.

Implementing a hash table

A hash function is any function that can be used to map data of arbritary size onto data of a fixed size.
This means that a hash function will take data of any size (a string, a number) and output a fixed-size thing (let’s say a 32-bit number). For now we’ll use a hypothetical hash function that takes any Lua value, and outputs a number. In Roblox documentation terms, it would look like:

int	hash(Variant key);

We will get back to this later. For now let’s get into the basic idea behind a hash table.
A hash table is simply an array in which the key is turned into a number, which is the index on this table. Because the result of a hash number is generally bigger than the capacity of the array, you have to take the hashed key and modulo it to the capacity of the array, to keep the number under (or equal to) the capacity of the array.

Knowing hash tables are just dynamic arrays where the hash of the key is the index, we can easily write a hash table. First we set up the prototype:

local		HashTable = {};

function	HashTable:new()
	local	new =
	{
		size = 0;
		capacity = 4;
		array = Array.new(4);
	};

	setmetatable(new, self);
	self.__index = self;
	return	(new);
end

You can see this is exactly the same as DynArray:new().
We can change our hash table to use DynArray internally instead, as they essentially have the same basic structure.

local		HashTable = {};

function	HashTable:new()
	local	new =
	{
		array = DynArray:new();
	}

	setmetatable(new, self);
	self.__index = self;
	return	(new);
end

Now we have to wrap our dynamic array methods with keys.

function		HashTable:set(key, value)
	local	hashed_key = hash(key) % self.array.capacity + 1;	-- Fit the key within our capacity

	if self.array.size == self.array.capacity then
		self.array:extend();
	end
	if not self.array:get(hashed_key) then						-- Is it a new key?
		self.array.size = self.array.size + 1;					-- Add one to our key counter.
	end
	self.array:set(hashed_key, value);
end

function		HashTable:get(key)
	return	(self.array:get(hash(key) % self.array.capacity) + 1);
end

Because our hash function just has to turn a key into a number, we can just return the actual (floored) number for now.

function	hash(key)
	if type(key) == "number" then
		return	(math.floor(key));
	else
		-- magic
	end
end

Now let’s try out our hash table:

local	new_table = HashTable:new();

new_table:insert(3, "Hi!");
print(new_table:get(3)); -- outputs Hi!
new_table:insert(4, "Hello, world!");
print(new_table:get(4)); -- outputs Hello, world!
new_table:insert(8, "Goodbye, world!");
print(new_table:get(8)); -- outputs Goodbye, world!
print(new_table:get(4)); -- outputs Goodbye, world! - this is bad!

This is not what we want. 4 and 8 both refer to "Goodbye, world!", even though we’ve set 4 to "Hello, world!". This is called a collision. There are many different ways of dealing with collision, but for now we’re just going to use a simple method that roughly does the same as what Lua does. You simply check if there’s already a value on the key. If there is, use the next empty slot.

We will however run into another issue; we do not know which (moved) key refers to what value. We can solve this by storing our key and value as a pair.

We can now add pairs to our get and set functions:

function	HashTable:set(key, value)
	local	hashed_key = hash(key) % self.array.capacity + 1;
	local	pair = Array.new(2);
	local	i = 1;

	pair[1] = key;
	pair[2] = value;
	if self.array.size == self.array.capacity then
		self.array:extend()
	end
	if not self.array:get(hashed_key) then	-- Is this a new key?
		self.array:set(hashed_key, pair);
	else
		while								-- Find a new slot:
			self.array:get(					-- Check the next slot:
				(hashed_key + i) 			-- Calculate an offset slot
				% self.array.capacity + 1	-- Modulo it to the capacity to keep it within range.
			)
			do
			i = i + 1;						-- Add one to our offset.
		end
		self.array:set((hashed_key + i) % self.array.capacity + 1, pair);
	end
	self.array.size = self.array.size + 1;
end

function	HashTable:get(key)
	local	hashed_key = hash(key) % self.array.capacity + 1;
	local	pair = self.array:get(hashed_key);	-- Check the pair on our main position
	local	offset = 1;
	local	current;

	if pair and pair[1] == key then				-- Do the keys match up?
		return	(pair[2]);							-- Give back the value.
	else										-- Otherwise:
		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while (not current or current[1] ~= key) and offset <= self.array.size do
			-- Skip until we find our key or have looked over every slot.
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		if offset > self.array.size then
			return	(nil);
		else
			return	(self.array:get((hashed_key + offset) % self.array.capacity + 1)[2]);
		end
	end
end

We can now retry our code:

local	new_table = HashTable:new();

new_table:insert(3, "Hi!");
print(new_table:get(3)); -- outputs Hi!
new_table:insert(4, "Hello, world!");
print(new_table:get(4)); -- outputs Hello, world!
new_table:insert(8, "Goodbye, world!");
print(new_table:get(8)); -- outputs Goodbye, world!
print(new_table:get(4)); -- outputs Hello, world!

Great! This time our code is working as intended. We now have a working hash table!
However, we can only use numbers as keys. Remember our hashing function int hash(Variant key)? We should extend it to use all types. Take note we’re not going to hash nil, because Lua tables don’t take nil keys. I am not going to use the actual Lua implementation of hashnum as it’s pretty awful to do in pure Lua.

Implementing hash tables with all types

For booleans, we just return a 0 or a 1.

function	hashboolean(key)
	-- We turn bools into 0 or 1s.
	return	(key and 1 or 0);
end

For tables, threads, userdata or functions, we use their pointer. We can get the pointer by capturing it from a tostring of the key:

function	hashpointer(key)
	-- To get the pointer of a table, thread, userdata or function,
	-- you can call tostring on it:
	-- tostring({}) -> table: 0x1234567890ABCDEF
	-- we can capture the pointer by string matching, and turn it into decimal
	--  by using tonumber(n, 16)
	return	(tonumber(tostring(key):match(": 0x(%x+)"), 16));
end

For numbers we take their individual bytes and add them together:

function	hashnum(key)
	--	Lua's hashnum will take the byte representation of the number and
	--	 add the bytes.
	local	a = 0;

	key = math.floor(key); -- bit32 doesn't take floats. Lua does not do this!
	for i = 9, 32, 8 do
		a = a + bit32.band(key, 2 ^ i - 1);
	end
	return	(a);
end

For strings we set their length as the initial hash and do some bit-shift magic for each character:

function	hashstr(key)
	local	length = #key;
	local	hash = length;					-- seed the hash with the length
	local	step = bit32.rshift(hash, 5) + 1;	-- if the string is too long, don't hash everything
	local	length1 = length;

	while length1 >= step do				-- compute the hash
		hash = bit32.bxor(
			hash,
			bit32.lshift(hash, 5)
			+ bit32.rshift(hash, 2)
			+ key:sub(length1, length1):byte()
		); -- h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
		length1 = length1 - step;
	end
	return	(hash);
end

Now we can combine them back into our hash function:

function	hash(key)
	if type(key) == "boolean" then
		return	(hashboolean(key));
	elseif type(key) == "number" then
		return	(hashnum(key));
	elseif type(key) == "string" then
		return	(hashstr(key));
	else
		return	(hashpointer(key));
	end
end

We can now test our hash table with other types of keys:

local	new_table = HashTable:new();

new_table:set("abc", "this is the string abc");
new_table:set(true, "this is true");
new_table:set(false, "this is false");
new_table:set(HashTable, "hash table");
new_table:set(print, "this is print");

print(new_table:get("abc"));
print(new_table:get(true));
print(new_table:get(false));
print(new_table:get(HashTable));
print(new_table:get(print));
print(new_table:get("oof"));
-- Output:
--	this is the string abc
--	this is true
--	this is false
--	hash table
--	this is print
--	nil

The next function

If we want to be able to iterate over our hash table we need to make an iterator. The standard Lua table iterator is pairs. We can implement pairs like this:

--	Lua 5.0 PiL 7.3
function	pairs(t)
	return	next, t, nil;
end

You can see that it’s essentially a wrapper for next.
next takes a key, and will return the next key-value pair after that key. If there’s no key given to next, it will return the first key-value pair. If there are no more pairs, next returns nil.

Knowing this, we can try and implement our own next.
First we handle the next(t, nil) call. next(t, nil) should return the first pair of the table.

function	HashTable:next(key)
	if key == nil then							-- If our second argument is nil,
		for i = 1, self.array.capacity do		-- Go through the entire array
			local	pair = self.array:get(i);	-- Test each index

			if pair then						-- If we find a pair
				return	pair[1], pair[2]		-- Return the unpacked pair.
			end
		end
	else
		-- magic
	end
end

Now we need to allow passing keys. We can do this by skipping to the key given and then searching for the next key.

First, we skip to the key:

function	HashTable:next(key)
	if key == nil then
		-- magic
	else
		local	hashed_key = hash(key) % self.array.capacity + 1;
		local	offset = -1;
		local	prev_index;

		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while (not current or current[1] ~= key) and offset <= self.array.size do
			-- Skip until we find our original key
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		prev_index = (hashed_key + offset) % self.array.capacity + 1; -- Store the last index we found.
		if offset > self.array.size then		-- The key does not exist.
			return	(nil);
		end
		-- TO-DO: search for the next key
	end
end

You can see that the key searching code is the same as HashTable:get(). Now we search for the next key:

function	HashTable:next(key)
	if key == nil then
		-- magic
	else
		local	hashed_key = hash(key) % self.array.capacity + 1;
		local	offset = -1;
		local	prev_index;

		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while (not current or current[1] ~= key) and offset <= self.array.size do
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		prev_index = (hashed_key + offset) % self.array.capacity + 1;
		if offset > self.array.size then
			return	(nil);
		end
		offset = offset + 1; -- Skip our old key.
		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while
			not current
			and ((hashed_key + offset) % self.array.capacity + 1) > prev_index
		do
			-- Skip all empty keys, as long as they are after our original key.
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		if ((hashed_key + offset) % self.array.capacity + 1) < prev_index then
			-- There are no more pairs.
			return	(nil);
		else
			return	current[1], current[2];	-- We found another pair!
		end
	end
end

Our final next code will be:

function	HashTable:next(key)
	if key == nil then
		for i = 1, self.array.capacity do
			local	pair = self.array:get(i);

			if pair then
				return pair[1], pair[2]
			end
		end
	else
		local	hashed_key = hash(key) % self.array.capacity + 1;
		local	offset = -1;
		local	prev_index;

		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while (not current or current[1] ~= key) and offset <= self.array.size do
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		prev_index = (hashed_key + offset) % self.array.capacity + 1;
		if offset > self.array.size then
			return	(nil);
		end
		offset = offset + 1;
		current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		while
			not current
			and ((hashed_key + offset) % self.array.capacity + 1) > prev_index
		do
			offset = offset + 1;
			current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
		end
		if ((hashed_key + offset) % self.array.capacity + 1) < prev_index then
			return	(nil);
		else
			return	current[1], current[2];
		end
	end
end

We can now write our own pairs iterator:

function	HashTable:pairs()
	return	self.next, self, nil;
end

Because HashTable:next() is syntactic sugar for HashTable.next(HashTable), we can use self.next instead of calling it as a method.

Let’s test our pairs:

local	new_table = HashTable:new();

new_table:set("abc", "this is the string abc");
new_table:set(true, "this is true");
new_table:set(false, "this is false");
new_table:set(HashTable, "hash table");
new_table:set(print, "this is print");
for k, v in new_table:pairs() do
	print(k, v)
end
-- Output:
--	false	this is false
--	true	this is true
--	table: 0x6fb520	hash table
--	abc	this is the string abc
--	function: 0x41b010	this is print

We can now iterate over our table! However, the methods we are using to do this are inefficient. Lua has a bunch of optimisations that make tables a lot faster. Array-like tables are optimised to work as actual arrays, they rehash and use chained scatter instead of (regular) open addressing.
We will go over that in the second part of this table tutorial, which I haven’t finished yet. (lol)

19 Likes

Mike Pall is the creator of LuaJIT, an extremely fast Lua compiler, and his recommendation is here:

could I translate this post and post it on another site?
if you ok,
I’ll emphasize that I translated it and the original link…!