Explain This Table Behavior

nin900500 · August 25, 2018, 4:53am

How come:

local t = {nil,nil,true}
print(#t) --> 3

but

local t = {nil,nil}
t[3] = true
print(#t) --> 0

?

PeZsmistic · August 25, 2018, 5:05am

~~I believe the length operator # counts only up to the first nil in the table, and assumes the table is a sequence.~~

Edit:
It seems to be undefined behavior, dependent on how the table was initially created.

https://www.lua.org/manual/5.3/manual.html#3.4.7

The length operator applied on a table returns a border in that table. A border in a table t is any natural number that satisfies the following condition:
(border == 0 or t[border] ~= nil) and t[border + 1] == nil

When t is not a sequence, #t can return any of its borders. (The exact one depends on details of the internal representation of the table, which in turn can depend on how the table was populated and the memory addresses of its non-numeric keys.)

In the first case the length operator is returning the last border, which is 3. In the second case, the length operator is returning the first border, which is 0, despite you adding onto the table.
Neither tables are sequences according to the spec. The reason why it returns seemingly random borders like this is dependent on implementation magic - undefined behavior.

It seems like in the first case, that final non-nil element in the table takes precedence as the border to be used for the length operator. However, after testing this myself, when you add another element after that, the length operator will instead return zero - the first border. So Lua is doing something behind the scenes when you instantiate a table inline this way - caching the border?

Friend dug these out if you’re interested.
Implementation of how boundaries are calculated.
Where the above function gets called when you use the length operator.

nin900500 · August 25, 2018, 5:22am

Maybe it has to do with the initial setting of the table?

Maybe an empty table {nil,nil,nil} is treated like {}?

Kampfkarren · August 25, 2018, 5:27am

{nil, nil, nil} is evaluated to {}, as far as I know.

Setting list[thing] = nil, with list[thing] already being nil, does nothing. It’s just a side affect of this behavior.

nin900500 · August 25, 2018, 5:44am

But

t = {nil,nil,nil}

IS taking up more data than

t = {}

correct?

everythingvirtuous · August 25, 2018, 6:16am

Note @ninja900500 I recommend a reread since I changed a ton and I think its much more accurate now - I was kind of in a rush last night too since I had to go to bed

From the section about tables from this article, some speculation through testing lua and the c++ vector class, and some brief look through the source code:

We know tables are made of two parts: array (vector behavior) and hash map
Array part is of size N an goes from the keys 1 to N where N is chosen on a “rehash” to be ((largest power of two)*R (where R is initial size)) such that MORE than half the table entries between 1 and N are filled with (non nil values and exist)
Hash map covers everything else - the pdf mentions hash size is the smallest power of two that accommodates the remaining entries
Rehashes occur when the array part is filled all the way (nil values count as filling since they have values as can be demonstrated by the “dirty trick” force rehash in the pdf linked above to shrink the table (since as mentioned in the rehash definition, they table entries need to be non nil to count)
The # operator returns a ‘border’ that satisfies this (same thing as what @qqtt991 posted but linking 5.1 version (although more ugly) just to ensure its the same idea) and uses binary search like @Den_S mentioned for the array part (and I’m pretty sure the hash map part too according to this and ONLY falls back to linear search when the keys of the table of the following geometric series ALL have non nil values (so probably never?) N*2^0,N*2^1,N*2^2,...with the last element being N*2^i s.t. it is > 2^30+2
When you initialize a table, as @Den_S says it does not evaluate to {} (my previous assumption was definitely wrong). If there are array elements then R = size of array and N = R, else R = 1 and N = 0.

So looking at your examples:

local t = {nil,nil,true}
print(#t) --> 3

R = 3 and N = 3 on initialization
Binary search for boundary on array part finds that key 3 satisfies

local t = {nil,nil}
t[3] = true
print(#t) --> 0

R = 2 and N = 2 on initialization
On insert of entry 3 you are calling a rehash because number of keys is 3
But if you notice if N were to = anything greater than 0 less than half the entries between 1 and N would be non nil and exist so N must = 0
Therefore inserting entry 3 is moved to the hash part
This part requires looking at the source code since I didn’t fully cover it above but:

static int unbound_search (Table *t, unsigned int j) {
  unsigned int i = j;  /* i is zero or a present index */
  j++;
  /* find `i' and `j' such that i is present and j is not */
  while (!ttisnil(luaH_getnum(t, j))) {
    i = j;
    j *= 2;
    if (j > cast(unsigned int, MAX_INT)) {  /* overflow? */
      /* table was built with bad purposes: resort to linear search */
      i = 1;
      while (!ttisnil(luaH_getnum(t, i))) i++;
      return i - 1;
    }
  }
  /* now do a binary search between them */
  while (j - i > 1) {
    unsigned int m = (i+j)/2;
    if (ttisnil(luaH_getnum(t, m))) j = m;
    else i = m;
  }
  return i;
}


/*
** Try to find a boundary in table `t'. A `boundary' is an integer index
** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil).
*/
int luaH_getn (Table *t) {
  unsigned int j = t->sizearray;
  if (j > 0 && ttisnil(&t->array[j - 1])) {
    /* there is a boundary in the array part: (binary) search for it */
    unsigned int i = 0;
    while (j - i > 1) {
      unsigned int m = (i+j)/2;
      if (ttisnil(&t->array[m - 1])) j = m;
      else i = m;
    }
    return i;
  }
  /* else must find a boundary in hash part */
  else if (t->node == dummynode)  /* hash part is empty? */
    return j;  /* that is easy... */
  else return unbound_search(t, j);
}

luaH_getn is called to get the table size
j = N = 0
since j is not > 0 and hash part exists, the code branches to the 3rd section and returns what unbound_search returns
in unbound search:
i = 0
j = 1
since t[1] = nil you skip the first while loop
In the next while loop you won’t find a non nil value between 0 and 1 (i and j)
so you return 0 (what i is)

nin900500 · August 25, 2018, 6:33am

so in a nutshell dont have holes in your table lol

Den_S · August 25, 2018, 10:27am

If you initialize a table like {nil, nil, nil, ...} it’ll always create a table with an array part of equal size of the amount of values you put there (even if they’re nil, which can be useful for optimizing reallocations away). So {nil, nil, nil} has an array size of 3, hash size 0. Because # first checks the array part of a table for a boundary value (using binary search), before checking the hash part (using a walk), this causes different behavior here:

local t1 = {nil, nil, nil}
t1[3] = 1 -- key 3 is in the array part of the table
print(#t1) -- 3

local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0

everythingvirtuous · August 25, 2018, 5:22pm

Yea if you want accurate behavior of #table and as much of the table stored in the array part (WAY FASTER) then I would avoid having holes (setting nil values in between - use false instead) but nils on the end of tables should be fine

also I would recommend re-reading my previous post since I changed a lot and I think it might actually be 100% correct now

nin900500 · August 25, 2018, 8:45pm

Whats the difference between a hash table and an array? Hashtables be indexed directly with a key correct?

I thought the key cannot be an integer.

local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0

3 is an integer. I thought in this case, t2 would be an array.

Darkmist101 · August 25, 2018, 9:05pm

The difference is the key you’re using is either a non-integer, or hasn’t had its position allocated in the array portion of the table. You’ll access the hash table if either of the above is true.

nin900500 · August 25, 2018, 9:40pm

So why is this

local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0

a hash table? 3 is an integer, right?

Darkmist101 · August 25, 2018, 9:41pm

A new table has 1 (2^0) slot allocated for the array, therefore the second condition is met.

nin900500 · August 25, 2018, 10:46pm

Would this:

local Table = {}
Table[1] = "a"
Table[2] = "b"
Table[3] = "c"

be the same as:

local Table = {"a","b","c"}

nin900500 · August 25, 2018, 11:50pm

local Table = {nil,nil,nil}
Table[2] = "3"
print(#Table) --> 0

And what condition is being met here? This is a hashtable too, right?

Darkmist101 · August 26, 2018, 12:18am

The first piece of code is the same. I don’t know the answer to the 2nd part because I never really depend on #tab in a weird situation like this.

Wunder_Wulfe · September 4, 2018, 5:01pm

table length is based on construction but when you assign values to a table the array part will be resized to use the least amount of data possible (throwing values that are part of the array into the hash part) when there are as many or more than half nil items to values between 1 and index X