Lua 5.1 Internals: Tables I
I will try to explain how tables work internally within Lua 5.1.
We will, in this tutorial, write our own basic implementation of Lua’s tables.
We will however go into more things than only tables. This tutorial will cover (in order):
- The static array
- Implementing a dynamic array
- Implementing hash tables with only numbers
- Implementing hash tables with all types
- The
next
function
Take note that this code is intended for vanilla Lua, and will only support part of Roblox Lua.
For writing the hashing functions I am going to assume there is a bit32 library. This library is not included by default in Lua 5.1 and is only included in this tutorial because explaining and writing bitwise operations in pure Lua is outside the scope of this tutorial. I will explain the optimisations used by Lua in another post, as this tutorial is only ment to explain how hash tables work.
A quick note.
This explanation is ment for people who (only) know Lua. I will also assume that you know (the Roblox-relevant parts of) Part I and II of Programming in Lua 5.0.
If you’re experienced with C, I suggest you search for Mike Pall’s recommended reading order on the Lua 5.1 source.
Some ideas might seem obvious to you if you have a background in a lower-level language, but for the people who are only familiar with Lua, I will try and explain this in Lua as much as possible.
For the example code I am going with readability and easy to explain code over efficient and optimized code. The examples don’t handle edgecases at all. (We’re working in Lua, it’s not like you’re going to write good code.) Also, don’t take my word on anything I say here :-).
The static array
For people only familiar with Lua and other high-level languages, arrays are (in general) dynamic.
However, this is all handled behind the scenes. If you add values to arrays, Lua will automatically resize the arrays. In the language Lua was written in C, arrays will not automatically resize.
For the sake of this tutorial I will use a fake class called Array. You can find the source here. The documentation for this class is roughly:
Array Array.new(int size)
:
Return an array of size “size”
Array[int n]
:
Return a value within this array at index “n”.
If n
is not within the range of this array, the program will crash.
DO NOT INDEX OUTSIDE THE RANGE OF THIS ARRAY.
Array[int n] = value
:
Set a value within this array at index “n”.
If n
is not within the range of this array, the program will crash.
DO NOT INDEX OUTSIDE THE RANGE OF THIS ARRAY.
void Array:destroy()
:
Remove the array from memory.
This array does NOT have any other features like the length operator (#) or insertion/get methods.
For people familiar with anything else than Lua, it might seem weird that the index does not start at 0.
But to keep arrays similar to Lua’s “arrays”, the index will start at 1.
We can roughly use the array as follows:
local my_favorite_numbers = Array.new(5);
my_favorite_numbers[1] = 0;
my_favorite_numbers[2] = 1;
my_favorite_numbers[3] = 3;
my_favorite_numbers[4] = 3;
my_favorite_numbers[5] = 7;
for i = 1, 5 do
print(my_favorite_number[i])
end
-- output;
-- 0
-- 1
-- 3
-- 3
-- 7
The moment we try to index anything outside of the range of this array (1 to 5 incl.) our program will error. If we have not defined anything at the index, it will return nil.
local numbers = Array.new(5);
print(numbers[5]); -- nil
print(numbers[6]); -- this would error because our Array only has 5 values.
Implementing a dynamic array
If we have a program which requires an unknown size of values within our array (let’s say, reading lines from a script), we would run into an issue with this Array class. We can however use this array class to create an automatically resizing array. All we need to implement this array is two new variables: a size and a capacity. The size keeps track of how many values we’re currently storing within the array and the capacity will keep track of how many values we can still store in the array.
First, we set up our prototype:
local DynArray = {};
function DynArray:new()
local new =
{
size = 0; -- How many things are in our array? (0 at the start)
capacity = 4; -- How many things can be in our array? (4 at the start)
array = Array.new(4); -- Our static array, hidden from the user.
};
setmetatable(new, self);
self.__index = self;
return (new);
end
Then we add basic get-set functionality:
function DynArray:set(index, value)
self.array[index] = value;
end
function DynArray:get(index)
return (self.array[index]);
end
Now we need a function that can add something to our array:
function DynArray:insert(value)
self.size = self.size + 1;
self.array[self.size] = value;
end
Take note that we’re still responsible for counting how many values are in our array. Array
does not give us anything to help us do this, we have to implement all logic ourselves.
When we’re out of space in the array, we have to create more space for the array.
You can calculate the amount of space left in the array by subtracting the size from the capacity: self.capacity - self.size
.
We want to check when there’s exactly 0 spaces left in our array before we resize.
We can do that with (self.capacity - self.size) == 0
. We can rewrite it as self.size == self.capacity
, because when the difference between two things is zero, they are the same.
Now we can write our array resize method and add it to our insertion method:
function DynArray:extend()
local new_array;
self.capacity = self.capacity * 2; -- Double the capacity
new_array = Array.new(self.capacity); -- Create a new array with that capacity
for i = 1, self.size do
new_array[i] = self.array[i]; -- Copy our old array into the new array
end
self.array:destroy(); -- Get rid of our old array
self.array = new_array; -- Replace the old array with the new one
end
function DynArray:insert(value)
if self.size == self.capacity then -- If there is no more space
self:extend(); -- Double our capacity
end
self.size = self.size + 1; -- Keep track of our value count
self.array[self.size] = value; -- Add our value to the array
end
What we’re essentially doing is keeping track of how many things are in our array, and when there are no more spaces left within our array, we create a bigger array. This is how dynamically sized arrays work.
Implementing a hash table
A hash function is any function that can be used to map data of arbritary size onto data of a fixed size.
This means that a hash function will take data of any size (a string, a number) and output a fixed-size thing (let’s say a 32-bit number). For now we’ll use a hypothetical hash function that takes any Lua value, and outputs a number. In Roblox documentation terms, it would look like:
int hash(Variant key);
We will get back to this later. For now let’s get into the basic idea behind a hash table.
A hash table is simply an array in which the key is turned into a number, which is the index on this table. Because the result of a hash number is generally bigger than the capacity of the array, you have to take the hashed key and modulo it to the capacity of the array, to keep the number under (or equal to) the capacity of the array.
Knowing hash tables are just dynamic arrays where the hash of the key is the index, we can easily write a hash table. First we set up the prototype:
local HashTable = {};
function HashTable:new()
local new =
{
size = 0;
capacity = 4;
array = Array.new(4);
};
setmetatable(new, self);
self.__index = self;
return (new);
end
You can see this is exactly the same as DynArray:new()
.
We can change our hash table to use DynArray internally instead, as they essentially have the same basic structure.
local HashTable = {};
function HashTable:new()
local new =
{
array = DynArray:new();
}
setmetatable(new, self);
self.__index = self;
return (new);
end
Now we have to wrap our dynamic array methods with keys.
function HashTable:set(key, value)
local hashed_key = hash(key) % self.array.capacity + 1; -- Fit the key within our capacity
if self.array.size == self.array.capacity then
self.array:extend();
end
if not self.array:get(hashed_key) then -- Is it a new key?
self.array.size = self.array.size + 1; -- Add one to our key counter.
end
self.array:set(hashed_key, value);
end
function HashTable:get(key)
return (self.array:get(hash(key) % self.array.capacity) + 1);
end
Because our hash function just has to turn a key into a number, we can just return the actual (floored) number for now.
function hash(key)
if type(key) == "number" then
return (math.floor(key));
else
-- magic
end
end
Now let’s try out our hash table:
local new_table = HashTable:new();
new_table:insert(3, "Hi!");
print(new_table:get(3)); -- outputs Hi!
new_table:insert(4, "Hello, world!");
print(new_table:get(4)); -- outputs Hello, world!
new_table:insert(8, "Goodbye, world!");
print(new_table:get(8)); -- outputs Goodbye, world!
print(new_table:get(4)); -- outputs Goodbye, world! - this is bad!
This is not what we want. 4 and 8 both refer to "Goodbye, world!"
, even though we’ve set 4 to "Hello, world!"
. This is called a collision. There are many different ways of dealing with collision, but for now we’re just going to use a simple method that roughly does the same as what Lua does. You simply check if there’s already a value on the key. If there is, use the next empty slot.
We will however run into another issue; we do not know which (moved) key refers to what value. We can solve this by storing our key and value as a pair.
We can now add pairs to our get and set functions:
function HashTable:set(key, value)
local hashed_key = hash(key) % self.array.capacity + 1;
local pair = Array.new(2);
local i = 1;
pair[1] = key;
pair[2] = value;
if self.array.size == self.array.capacity then
self.array:extend()
end
if not self.array:get(hashed_key) then -- Is this a new key?
self.array:set(hashed_key, pair);
else
while -- Find a new slot:
self.array:get( -- Check the next slot:
(hashed_key + i) -- Calculate an offset slot
% self.array.capacity + 1 -- Modulo it to the capacity to keep it within range.
)
do
i = i + 1; -- Add one to our offset.
end
self.array:set((hashed_key + i) % self.array.capacity + 1, pair);
end
self.array.size = self.array.size + 1;
end
function HashTable:get(key)
local hashed_key = hash(key) % self.array.capacity + 1;
local pair = self.array:get(hashed_key); -- Check the pair on our main position
local offset = 1;
local current;
if pair and pair[1] == key then -- Do the keys match up?
return (pair[2]); -- Give back the value.
else -- Otherwise:
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while (not current or current[1] ~= key) and offset <= self.array.size do
-- Skip until we find our key or have looked over every slot.
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
if offset > self.array.size then
return (nil);
else
return (self.array:get((hashed_key + offset) % self.array.capacity + 1)[2]);
end
end
end
We can now retry our code:
local new_table = HashTable:new();
new_table:insert(3, "Hi!");
print(new_table:get(3)); -- outputs Hi!
new_table:insert(4, "Hello, world!");
print(new_table:get(4)); -- outputs Hello, world!
new_table:insert(8, "Goodbye, world!");
print(new_table:get(8)); -- outputs Goodbye, world!
print(new_table:get(4)); -- outputs Hello, world!
Great! This time our code is working as intended. We now have a working hash table!
However, we can only use numbers as keys. Remember our hashing function int hash(Variant key)
? We should extend it to use all types. Take note we’re not going to hash nil, because Lua tables don’t take nil keys. I am not going to use the actual Lua implementation of hashnum as it’s pretty awful to do in pure Lua.
Implementing hash tables with all types
For booleans, we just return a 0 or a 1.
function hashboolean(key)
-- We turn bools into 0 or 1s.
return (key and 1 or 0);
end
For tables, threads, userdata or functions, we use their pointer. We can get the pointer by capturing it from a tostring of the key:
function hashpointer(key)
-- To get the pointer of a table, thread, userdata or function,
-- you can call tostring on it:
-- tostring({}) -> table: 0x1234567890ABCDEF
-- we can capture the pointer by string matching, and turn it into decimal
-- by using tonumber(n, 16)
return (tonumber(tostring(key):match(": 0x(%x+)"), 16));
end
For numbers we take their individual bytes and add them together:
function hashnum(key)
-- Lua's hashnum will take the byte representation of the number and
-- add the bytes.
local a = 0;
key = math.floor(key); -- bit32 doesn't take floats. Lua does not do this!
for i = 9, 32, 8 do
a = a + bit32.band(key, 2 ^ i - 1);
end
return (a);
end
For strings we set their length as the initial hash and do some bit-shift magic for each character:
function hashstr(key)
local length = #key;
local hash = length; -- seed the hash with the length
local step = bit32.rshift(hash, 5) + 1; -- if the string is too long, don't hash everything
local length1 = length;
while length1 >= step do -- compute the hash
hash = bit32.bxor(
hash,
bit32.lshift(hash, 5)
+ bit32.rshift(hash, 2)
+ key:sub(length1, length1):byte()
); -- h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
length1 = length1 - step;
end
return (hash);
end
Now we can combine them back into our hash function:
function hash(key)
if type(key) == "boolean" then
return (hashboolean(key));
elseif type(key) == "number" then
return (hashnum(key));
elseif type(key) == "string" then
return (hashstr(key));
else
return (hashpointer(key));
end
end
We can now test our hash table with other types of keys:
local new_table = HashTable:new();
new_table:set("abc", "this is the string abc");
new_table:set(true, "this is true");
new_table:set(false, "this is false");
new_table:set(HashTable, "hash table");
new_table:set(print, "this is print");
print(new_table:get("abc"));
print(new_table:get(true));
print(new_table:get(false));
print(new_table:get(HashTable));
print(new_table:get(print));
print(new_table:get("oof"));
-- Output:
-- this is the string abc
-- this is true
-- this is false
-- hash table
-- this is print
-- nil
The next
function
If we want to be able to iterate over our hash table we need to make an iterator. The standard Lua table iterator is pairs
. We can implement pairs like this:
-- Lua 5.0 PiL 7.3
function pairs(t)
return next, t, nil;
end
You can see that it’s essentially a wrapper for next
.
next
takes a key, and will return the next key-value pair after that key. If there’s no key given to next
, it will return the first key-value pair. If there are no more pairs, next
returns nil.
Knowing this, we can try and implement our own next
.
First we handle the next(t, nil)
call. next(t, nil)
should return the first pair of the table.
function HashTable:next(key)
if key == nil then -- If our second argument is nil,
for i = 1, self.array.capacity do -- Go through the entire array
local pair = self.array:get(i); -- Test each index
if pair then -- If we find a pair
return pair[1], pair[2] -- Return the unpacked pair.
end
end
else
-- magic
end
end
Now we need to allow passing keys. We can do this by skipping to the key given and then searching for the next key.
First, we skip to the key:
function HashTable:next(key)
if key == nil then
-- magic
else
local hashed_key = hash(key) % self.array.capacity + 1;
local offset = -1;
local prev_index;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while (not current or current[1] ~= key) and offset <= self.array.size do
-- Skip until we find our original key
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
prev_index = (hashed_key + offset) % self.array.capacity + 1; -- Store the last index we found.
if offset > self.array.size then -- The key does not exist.
return (nil);
end
-- TO-DO: search for the next key
end
end
You can see that the key searching code is the same as HashTable:get()
. Now we search for the next key:
function HashTable:next(key)
if key == nil then
-- magic
else
local hashed_key = hash(key) % self.array.capacity + 1;
local offset = -1;
local prev_index;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while (not current or current[1] ~= key) and offset <= self.array.size do
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
prev_index = (hashed_key + offset) % self.array.capacity + 1;
if offset > self.array.size then
return (nil);
end
offset = offset + 1; -- Skip our old key.
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while
not current
and ((hashed_key + offset) % self.array.capacity + 1) > prev_index
do
-- Skip all empty keys, as long as they are after our original key.
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
if ((hashed_key + offset) % self.array.capacity + 1) < prev_index then
-- There are no more pairs.
return (nil);
else
return current[1], current[2]; -- We found another pair!
end
end
end
Our final next
code will be:
function HashTable:next(key)
if key == nil then
for i = 1, self.array.capacity do
local pair = self.array:get(i);
if pair then
return pair[1], pair[2]
end
end
else
local hashed_key = hash(key) % self.array.capacity + 1;
local offset = -1;
local prev_index;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while (not current or current[1] ~= key) and offset <= self.array.size do
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
prev_index = (hashed_key + offset) % self.array.capacity + 1;
if offset > self.array.size then
return (nil);
end
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
while
not current
and ((hashed_key + offset) % self.array.capacity + 1) > prev_index
do
offset = offset + 1;
current = self.array:get((hashed_key + offset) % self.array.capacity + 1);
end
if ((hashed_key + offset) % self.array.capacity + 1) < prev_index then
return (nil);
else
return current[1], current[2];
end
end
end
We can now write our own pairs iterator:
function HashTable:pairs()
return self.next, self, nil;
end
Because HashTable:next()
is syntactic sugar for HashTable.next(HashTable)
, we can use self.next
instead of calling it as a method.
Let’s test our pairs
:
local new_table = HashTable:new();
new_table:set("abc", "this is the string abc");
new_table:set(true, "this is true");
new_table:set(false, "this is false");
new_table:set(HashTable, "hash table");
new_table:set(print, "this is print");
for k, v in new_table:pairs() do
print(k, v)
end
-- Output:
-- false this is false
-- true this is true
-- table: 0x6fb520 hash table
-- abc this is the string abc
-- function: 0x41b010 this is print
We can now iterate over our table! However, the methods we are using to do this are inefficient. Lua has a bunch of optimisations that make tables a lot faster. Array-like tables are optimised to work as actual arrays, they rehash and use chained scatter instead of (regular) open addressing.
We will go over that in the second part of this table tutorial, which I haven’t finished yet. (lol)