Just something I’ve wondered about for a while now.
Option A:
Table = {23, 18, 34}
function findThirtyFour()
    for _, number in pairs(Table) do
        if number == 34 then return true end
    end
end
Option B:
Table = {[23] = true, [18] = true, [34] = true}
function findThirtyFour()
    return Table[34]
end
From the way I understand it, Option B is cleaner to fetch, and likely more efficient CPU-wise.
But, it also stores an extra value. Would this be relevant, data wise?
Any mathy people out there that can give me the nitty gritty?
Personally, I prefer the first method, but the second method is better (both in saving server resources and, well… looking cleaner), especially as more values are added to the table. The extra data from storing a value alongside each key is so negligible that I wouldn’t worry about storing a few extra.
The first way is fine for smaller sets of data. table.find does the same thing as your for loop.
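For reference, here is a rough sketch of the linear scan being described. In Roblox's Luau, table.find(haystack, needle) returns the index of the first match or nil; the helper below is written in plain Lua so it also runs where table.find isn't available. The name findValue is made up for this example.

```lua
-- Hypothetical helper: returns the index of the first match, or nil.
local function findValue(list, needle)
    for index, value in ipairs(list) do
        if value == needle then
            return index
        end
    end
    return nil
end

local numbers = {23, 18, 34}
print(findValue(numbers, 34)) -- 3
print(findValue(numbers, 99)) -- nil
```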
Internally you can think of that first one as {[1]=23, [2]=18, [3]=34}
It’s very fast to access indices 1, 2, and 3, and arrays don’t need any extra memory overhead to support referencing by index. To find that “34”, though, it has to check all three indices in order. As your data grows, it may not be ideal to look through many thousands of indices to find the value you’re after. A faster option with arrays is to use the index itself as the key, similar to how the dict does it: with array = {true, false, false, false, true, false}, array[5] == true tells you that 5 is present. That can be fast and memory efficient if the data is very dense, but wasteful if there are big gaps between numbers.
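A small sketch of that index-as-key array idea, assuming the values are small positive integers. The names present and hasNumber are just for illustration.

```lua
-- Index i holds true when the number i is present, false otherwise.
local present = {true, false, false, false, true, false}

local function hasNumber(flags, n)
    -- Comparing against true also covers indices past the end (nil).
    return flags[n] == true
end

print(hasNumber(present, 5))  -- true
print(hasNumber(present, 2))  -- false
print(hasNumber(present, 40)) -- false
```

To store 23, 18, and 34 this way you would need 34 slots for three values, which is the waste mentioned above.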
The dict representation {[23]=true, [18]=true, [34]=true} will be hashed, so it has a little bit of overhead in terms of memory and processing, but it will be able to jump straight to the 34 key. This is also memory efficient for sparse data, since it doesn’t store anything for the ‘holes’. It isn’t as fast as the array at accessing a known index, but it’s far faster than scanning the array through ALL the indices up to the one it needs (when you don’t know the index).
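If the data already exists as a list, converting it to the dict form once is cheap. A minimal sketch, where toSet is a made-up helper name:

```lua
-- Build a lookup table whose keys are the list's values.
local function toSet(list)
    local set = {}
    for _, value in ipairs(list) do
        set[value] = true
    end
    return set
end

local lookup = toSet({23, 18, 34})
print(lookup[34] == true) -- true
print(lookup[35] == true) -- false
```

Nothing is stored for 35 or any other missing number, so only the three keys take up space.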
Dicts are better when you have larger sets of sparse data (big dataset with lots of ‘holes’). Arrays can be better for dense, sequential data (like representing images, where every table entry is filled with something). If there aren’t a lot of values in your table, it arguably doesn’t make much difference; go with what you like to use.
If you care about ordering, i.e. the order in which the values were added to the table, then you may want Option A. Iterating a dict with pairs doesn’t guarantee any particular order.
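If you want both ordering and fast lookup, one possible approach (an assumption on my part, not something suggested earlier in the thread) is to keep an array and a dict side by side. The names order, members, and add are hypothetical.

```lua
local order = {}   -- array part: remembers insertion order
local members = {} -- dict part: answers "is it in there?"

local function add(value)
    -- Skip duplicates so the array and the dict stay in sync.
    if not members[value] then
        members[value] = true
        order[#order + 1] = value
    end
end

add(23)
add(18)
add(34)
add(18) -- ignored, already present

print(table.concat(order, ", ")) -- 23, 18, 34
print(members[34] == true)       -- true
```

This costs roughly double the memory, so it only makes sense when you really need both properties.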
Option A is O(N) while B is O(1) on average, so very quickly you’ll see more efficiency on the lookup with Option B. There is for sure a space tradeoff, but I think in most cases where that matters, the lookup time really matters too, so you’ll still want Option B.
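To make the O(N) part concrete, here is a rough sketch that counts how many elements the loop inspects in the worst case. The size N and the helper countChecks are arbitrary choices for illustration, and this is not a timing benchmark.

```lua
local N = 1000
local list, set = {}, {}
for i = 1, N do
    list[i] = i * 10     -- Option A: values stored in an array
    set[i * 10] = true   -- Option B: the same values stored as keys
end

-- Count how many elements the loop inspects before it finds the target.
local function countChecks(values, target)
    local checks = 0
    for _, value in ipairs(values) do
        checks = checks + 1
        if value == target then
            break
        end
    end
    return checks
end

local target = N * 10 -- the last value added, the worst case for the loop
print(countChecks(list, target)) -- 1000
print(set[target])               -- true
```

The keyed lookup is a single table access however large N gets, while the number of checks in the loop grows with N.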