Unpredictable behavior of the '#' operator on arrays with holes

Hello Roblox developers,

I recently stumbled upon an interesting aspect of Luau pertaining to how the # operator behaves when used on tables with non-sequential integer keys, and I’m hoping for some clarification from anyone who might have more insights into this.

Specifically, I’m trying to understand the logic behind the lengths returned by the # operator for tables where there’s a “hole” (a nil value) in the integer keys.

As an example, let’s consider two tables:

local tbl1 = {
    [1] = true,
    [2] = true,
    [4] = true,
}
print(#tbl1)  -- Prints 4

and

local tbl2 = {
    [1] = true,
    [3] = true,
}
print(#tbl2)  -- Prints 1

In both tables, the index before the last is nil. However, #tbl1 returns 4 while #tbl2 returns 1. I understand from Lua’s reference manual that “the length of a table is any integer index n such that t[n] is not nil and t[n+1] is nil”, but that definition permits several valid answers for the same table, and which one Luau picks seems inconsistent and unpredictable.
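To make that concrete: for tbl1, both 2 and 4 qualify as a valid “length” under that definition, and Luau happens to pick 4:

local tbl1 = {
    [1] = true,
    [2] = true,
    [4] = true,
}
-- both of these print true, so n = 2 and n = 4 each satisfy
-- "t[n] is not nil and t[n+1] is nil":
print(tbl1[2] ~= nil and tbl1[3] == nil) -- true
print(tbl1[4] ~= nil and tbl1[5] == nil) -- true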

An additional case:

local tbl3 = {
    [1] = true,
    [2] = true,
    [3] = true,
    [5] = true,
    [6] = true,
}
print(#tbl3)  -- Prints 6

Additionally, I’ve noticed that the behavior of the # operator seems to change depending on whether a nil value is explicitly assigned to an index or if the index is simply omitted. Consider these examples:

Explicitly assigning nil:

local tbl = {
    [1] = true,
    [2] = true,
    [3] = true,
    [4] = nil,
    [5] = true,
}
print(#tbl) --prints 5

Omitting the index:

local tbl = {
    [1] = true,
    [2] = true,
    [3] = true,
    [5] = true,
}
print(#tbl) --prints 3

The lengths returned in these two cases are different, despite the fact that tbl[4] is functionally nil in both cases.
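One detail suggesting the difference is decided at construction time: writing nil into tbl[4] after the fact is effectively a no-op (the slot is already nil), so it doesn’t change the reported length:

local tbl = {
    [1] = true,
    [2] = true,
    [3] = true,
    [5] = true,
}
tbl[4] = nil -- assigning nil to an already-nil slot changes nothing
print(#tbl)  -- still 3, matching the omitted-index case above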

Further testing revealed the set of table lengths for which Luau tolerates the second-to-last index being absent and still returns the table’s “full” length, counting the missing index:

1, 4, 7, 8, 11, 13, 15, 16, 19, 21, 23, 25, 27, 29, 31, 32, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 64, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99

Most numbers in the sequence increase by 2 or 3, except for jumps from 1 to 4, 7 to 8, 31 to 32, and 63 to 64.

If anyone has insight into how Luau determines the array “boundary” of a table and can shed some light on it that would be greatly appreciated. I understand that this behavior is implementation-dependent, but I’ve been going crazy trying to find some sort of explanation or reason behind this.

Thanks!


This is intentional behavior and is not exclusive to Luau; it’s present in Lua as well. The situation is known as “mixed/sparse tables”, “sparse” meaning the array has a gap in its indices, e.g. 1, 2, 3, nil, 5.

Ideally, I recommend avoiding sparse arrays entirely, using table.remove or “fast remove”, for example.

Table fast remove (faster but messes with index order):
https://twitter.com/sleitnick/status/991065593746018308?lang=en-GB
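To sketch the two options (removeFast here is my own name for the swap-with-last trick from that tweet; it keeps the array dense but reorders it):

local function removeFast(t, i)
	local n = #t
	t[i] = t[n] -- overwrite the removed slot with the last element
	t[n] = nil  -- then drop the last slot, so no hole is left behind
end

local t = { "a", "b", "c", "d" }
removeFast(t, 2)   -- t is now { "a", "d", "c" }
print(#t)          -- 3, and # stays trustworthy because there's no gap

table.remove(t, 1) -- shifts the rest down, preserving relative order
print(#t)          -- 2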


I get that these are mixed tables, but that doesn’t really answer my question. I’m specifically interested in why the # operator behaves so unpredictably when used on this specific type of mixed table (single-index gap arrays). There must be some underlying algorithm that sets the array “boundaries” of these mixed tables based on various factors I’d assume (judging by its seemingly patternless behavior).


I see; unfortunately I’m not knowledgeable enough about this. Perhaps someone with more knowledge of the Lua VM, C++, or the length operator can help. It would be interesting to know.

I found this len function in the Lua source code; perhaps it’s related, since it calls an index-related function, index2value:

LUA_API void lua_len (lua_State *L, int idx) {
  TValue *t;
  lua_lock(L);
  t = index2value(L, idx);
  luaV_objlen(L, L->top.p, t);
  api_incr_top(L);
  lua_unlock(L);
}

Otherwise, from this testing my guess is that it counts the defined indices numerically starting from 1 (1, 2, 3, …, n), regardless of the values assigned to the keys.

When learning Lua I was always told not to trust the # operator, since it rarely works the way people expect. I’m no expert on how Lua interacts with the engine, but I believe it’s just returning an estimate of how much array data the table holds, presumably to save performance.
It clearly does not work as commonly documented; I’ve seen it described as a way to count the values in a table, which is not correct.

It’s annoying because you can’t even change this with metatables, since the __len metamethod only works on userdata for some odd reason, so you couldn’t correct it even if you wanted to.
(Or they may have changed this; I haven’t used Studio in a while.)

local t = { [1] = true, [2] = true, [4] = true }
print(#t) -- 4
setmetatable(t, {__len = function(self)
	local c = 0
	for _ in pairs(self) do -- count every key, ignoring holes
		c += 1
	end
	return c
end})
print(#t) -- still 4 (should be 3, but __len wasn't invoked)

Never mind, they did add __len support, so ignore half of the above message.
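For completeness, with __len supported for tables the override works along these lines (whether # honors __len for tables depends on the Luau version you’re on, so treat this as a sketch):

local t = { [1] = true, [2] = true, [4] = true }
setmetatable(t, {
	__len = function(self)
		local c = 0
		for _ in pairs(self) do
			c += 1
		end
		return c
	end,
})
print(#t) -- 3 on builds where # invokes __len for tables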

Re-quoting some links:

the length of a table t is only defined if the table is a sequence, that is, the set of its positive numeric keys is equal to {1…n} for some integer n

—From the Lua reference manual.

  1. When the last item in the array part is nil, the result of # is the length of the shortest valid sequence found by binsearching the array part for the first nil-followed key.
  2. When the last item in the array part is not nil AND the hash part is empty, the result of # is the physical length of the array part.
  3. When the last item in the array part is not nil AND the hash part is NOT empty, the result of # is the length of the shortest valid sequence found by binsearching the hash part for the first nil-followed key (that is, a positive integer i such that t[i] ~= nil and t[i+1] == nil), assuming that the array part is full of non-nils(!).

So the result of # is almost always the (desired) length of the shortest valid sequence, unless the last element in the array part representing a non-sequence is non-nil. Then, the result is bigger than desired.

—From this stackoverflow answer.

The pattern you’re seeing is related to how the array part of a table allocates its underlying memory (doubling it every time it needs more).

Going by the posted C implementation, it first checks the last memory slot of the array part for non-nil, which will always be at a power-of-two index (4, 8, 16, 32, 64, …). Otherwise it binary-searches for the last element that’s non-nil; the “odd lengths only” pattern is a detail of the binary search and how it rounds the midpoint for even-length ranges.
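To make the search concrete, here’s a rough Lua translation of the boundary search Lua 5.1 performs over the array part (luaH_getn in ltable.c); Luau’s actual implementation differs in detail, so this is a sketch, not the real thing:

local function boundary(t, sizearray)
	-- sizearray stands in for the physical size of the VM's array part
	if sizearray > 0 and t[sizearray] == nil then
		-- binary search for i with t[i] ~= nil and t[i + 1] == nil
		local lo, hi = 0, sizearray
		while hi - lo > 1 do
			local mid = math.floor((lo + hi) / 2)
			if t[mid] == nil then
				hi = mid
			else
				lo = mid
			end
		end
		return lo
	end
	return sizearray -- last slot non-nil: the physical size is a border
end

Since the physical size doubles as the array grows, the last-slot check is what produces the jumps at 4, 8, 32, and 64 in the sequence above.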


Yes it is weird:

local y = {
	[0] = 1,
	[1] = 2,
	[2] = 3
}

print("There are", y[0], y[1], y[2], "entries in the table y, but the length from the # operator is", #y)
--There are 1 2 3 entries in the table y, but the length from the # operator is 2

local d = {
	[1] = 1,
	[2] = 2,
	[7] = 7,
	[8] = 8,
	[67] = 67,
	[108] = 108
}
print(#d) --2
for i, v in ipairs(d) do --Note ipairs
  print(i, v)  --prints d[1] and d[2]
end
for i, v in pairs(d) do --Note pairs
  print(i, v) --prints all entries (does not print nil values, even when manually assigned)
end

ipairs also breaks down on sparse arrays, since it stops at the first nil value.
Probably best to avoid using array methods on sparse arrays and treat them as dictionaries instead (in which case the # operator doesn’t work anyway).
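If you do need the number of entries in a sparse table, counting with pairs is the reliable route (a small helper, nothing Luau-specific):

local function count(t)
	local n = 0
	for _ in pairs(t) do -- pairs visits every non-nil entry, gaps or not
		n += 1
	end
	return n
end

print(count({ [1] = 1, [2] = 2, [7] = 7, [108] = 108 })) -- 4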

Definitely something to bear in mind: # doesn’t actually count the entries in a table; it looks for an index meeting a set of conditions, which may or may not reflect the table’s contents. :stuck_out_tongue: