How come:
local t = {nil,nil,true}
print(#t) --> 3
but
local t = {nil,nil}
t[3] = true
print(#t) --> 0
?
How come:
local t = {nil,nil,true}
print(#t) --> 3
but
local t = {nil,nil}
t[3] = true
print(#t) --> 0
?
I believe the length operator # counts only up to the first nil in the table, and assumes the table is a sequence.
Edit:
It seems to be undefined behavior, dependent on how the table was initially created.
https://www.lua.org/manual/5.3/manual.html#3.4.7
The length operator applied on a table returns a border in that table. A border in a table
t
is any natural number that satisfies the following condition:
(border == 0 or t[border] ~= nil) and t[border + 1] == nil
When
t
is not a sequence,#t
can return any of its borders. (The exact one depends on details of the internal representation of the table, which in turn can depend on how the table was populated and the memory addresses of its non-numeric keys.)
In the first case the length operator is returning the last border, which is 3. In the second case, the length operator is returning the first border, which is 0, despite you adding onto the table.
Neither tables are sequences according to the spec. The reason why it returns seemingly random borders like this is dependent on implementation magic - undefined behavior.
It seems like in the first case, that final non-nil element in the table takes precedence as the border to be used for the length operator. However, after testing this myself, when you add another element after that, the length operator will instead return zero - the first border. So Lua is doing something behind the scenes when you instantiate a table inline this way - caching the border?
Friend dug these out if youâre interested.
Implementation of how boundaries are calculated.
Where the above function gets called when you use the length operator.
Maybe it has to do with the initial setting of the table?
Maybe an empty table {nil,nil,nil} is treated like {}?
{nil, nil, nil}
is evaluated to {}
, as far as I know.
Setting list[thing] = nil
, with list[thing]
already being nil, does nothing. Itâs just a side affect of this behavior.
But
t = {nil,nil,nil}
IS taking up more data than
t = {}
correct?
Note @ninja900500 I recommend a reread since I changed a ton and I think its much more accurate now - I was kind of in a rush last night too since I had to go to bed
From the section about tables from this article, some speculation through testing lua and the c++ vector class, and some brief look through the source code:
N*2^0,N*2^1,N*2^2,...with the last element being N*2^i s.t. it is > 2^30+2
{}
(my previous assumption was definitely wrong). If there are array elements then R = size of array and N = R, else R = 1 and N = 0.So looking at your examples:
local t = {nil,nil,true}
print(#t) --> 3
R = 3 and N = 3 on initialization
Binary search for boundary on array part finds that key 3 satisfies
local t = {nil,nil}
t[3] = true
print(#t) --> 0
R = 2 and N = 2 on initialization
On insert of entry 3 you are calling a rehash because number of keys is 3
But if you notice if N were to = anything greater than 0 less than half the entries between 1 and N would be non nil and exist so N must = 0
Therefore inserting entry 3 is moved to the hash part
This part requires looking at the source code since I didnât fully cover it above but:
static int unbound_search (Table *t, unsigned int j) {
unsigned int i = j; /* i is zero or a present index */
j++;
/* find `i' and `j' such that i is present and j is not */
while (!ttisnil(luaH_getnum(t, j))) {
i = j;
j *= 2;
if (j > cast(unsigned int, MAX_INT)) { /* overflow? */
/* table was built with bad purposes: resort to linear search */
i = 1;
while (!ttisnil(luaH_getnum(t, i))) i++;
return i - 1;
}
}
/* now do a binary search between them */
while (j - i > 1) {
unsigned int m = (i+j)/2;
if (ttisnil(luaH_getnum(t, m))) j = m;
else i = m;
}
return i;
}
/*
** Try to find a boundary in table `t'. A `boundary' is an integer index
** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil).
*/
int luaH_getn (Table *t) {
unsigned int j = t->sizearray;
if (j > 0 && ttisnil(&t->array[j - 1])) {
/* there is a boundary in the array part: (binary) search for it */
unsigned int i = 0;
while (j - i > 1) {
unsigned int m = (i+j)/2;
if (ttisnil(&t->array[m - 1])) j = m;
else i = m;
}
return i;
}
/* else must find a boundary in hash part */
else if (t->node == dummynode) /* hash part is empty? */
return j; /* that is easy... */
else return unbound_search(t, j);
}
luaH_getn
is called to get the table size
j = N = 0
since j is not > 0 and hash part exists, the code branches to the 3rd section and returns what unbound_search returns
in unbound search:
i = 0
j = 1
since t[1] = nil you skip the first while loop
In the next while loop you wonât find a non nil value between 0 and 1 (i and j)
so you return 0 (what i is)
so in a nutshell dont have holes in your table lol
If you initialize a table like {nil, nil, nil, ...}
itâll always create a table with an array part of equal size of the amount of values you put there (even if theyâre nil, which can be useful for optimizing reallocations away). So {nil, nil, nil}
has an array size of 3, hash size 0. Because #
first checks the array part of a table for a boundary value (using binary search), before checking the hash part (using a walk), this causes different behavior here:
local t1 = {nil, nil, nil}
t1[3] = 1 -- key 3 is in the array part of the table
print(#t1) -- 3
local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0
Yea if you want accurate behavior of #table
and as much of the table stored in the array part (WAY FASTER) then I would avoid having holes (setting nil values in between - use false instead) but nils on the end of tables should be fine
also I would recommend re-reading my previous post since I changed a lot and I think it might actually be 100% correct now
Whats the difference between a hash table and an array? Hashtables be indexed directly with a key correct?
I thought the key cannot be an integer.
local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0
3 is an integer. I thought in this case, t2 would be an array.
The difference is the key youâre using is either a non-integer, or hasnât had its position allocated in the array portion of the table. Youâll access the hash table if either of the above is true.
So why is this
local t2 = {}
t2[3] = 1 -- key 3 is in the hash part of the table
print(#t2) -- 0
a hash table? 3 is an integer, right?
A new table has 1 (2^0) slot allocated for the array, therefore the second condition is met.
Would this:
local Table = {}
Table[1] = "a"
Table[2] = "b"
Table[3] = "c"
be the same as:
local Table = {"a","b","c"}
local Table = {nil,nil,nil}
Table[2] = "3"
print(#Table) --> 0
And what condition is being met here? This is a hashtable too, right?
The first piece of code is the same. I donât know the answer to the 2nd part because I never really depend on #tab in a weird situation like this.
table length is based on construction but when you assign values to a table the array part will be resized to use the least amount of data possible (throwing values that are part of the array into the hash part) when there are as many or more than half nil items to values between 1 and index X