This is incorrect, and actually the opposite on Roblox! Using an array as the backing for your object instead of a map of fields is not only far less readable but also slower!
This is because the Roblox Luau VM pre-caches the hash-map slot that a field access will need, so most field accesses effectively cost only a single array lookup. A numeric array lookup is actually slower, because the VM must first check whether the index is a number or some other type to decide whether to index the array part (whereas a field lookup knows immediately to look in the hash part).
I was wondering if it's possible to add :ConnectParallel properly? I tried, and I think it worked, but I'm not sure… Trying to use unsafe functions gave me an error, but I honestly can't tell if it's even running in parallel.
This is because the Roblox Luau VM pre-caches the hash-map slot that a field access will need,
This is nothing special, actually. In both Luau and Lua, string hashes are computed and cached when strings are interned, so a hash-map lookup doesn't need to calculate the hash for a string key.
because the VM must first check whether the index is a number or some other type to decide whether to index the array part
Strictly speaking, in Lua's table implementation, a table will be treated as a dictionary if it is constructed with explicit keys like the ones below:
local t = {[1] = 1, [2] = 2} -- Lua treats this as a dictionary, whereas
                             -- in reality, it's an array.
The type of the index has to be checked in both array and hash-map lookups, not only in array lookups.
What I said is mostly relevant to Lua; we don't know much about Luau's internals. It's safe to say that the first point applies to both. The second and third, however, may not.
Hiya, could you elaborate on what you mean by this? Running this benchmark:
local array = {}
local dict = {}
for i = 1, 1e5 do
    table.insert(array, i)
    dict['a' .. i] = i
end

return {
    ParameterGenerator = function()
        return
    end,
    Functions = {
        array = function()
            for _ = 1, 1e5 do
                local _ = array[1e5]
            end
        end,
        dict = function()
            for _ = 1, 1e5 do
                local _ = dict.a1
            end
        end,
    },
}
Your benchmark needs some work. There are a few factors confounding the result:
Your example is contrived because of the large number of fields in the object. No object in people's code actually has 100,000 unique fields, and this matters because field access is optimized for a small number of fields, so you're hitting the bad case of field-access time when you have that many fields.
You're not caching the upvalue reference, and accessing that upvalue is going to be more than 50% of the total work the loop iterations are doing, so it will pollute the results with a lot of noise.
You need to do way more work in each iteration of the loop. Simply running the loop itself is taking a lot of the time right now, again causing issues for the measurement.
Here's my version of your benchmark, which shows a dict-based 4-element vector (4 because my connection class has 4 fields) being significantly faster than an array-based one (with a transposed version of the access to show that sequential vs non-sequential access is not a relevant factor at this small struct size):
Though I should also note that at a very small object size and an operation as fast as field access, the details of how field access works are extremely sensitive, so you have to be pretty careful with how you benchmark.
The dict-based signal wins over the array-based signal in both cases, just like it did in my benchmarking when I was initially writing it, so the result from the above benchmark carries over to practical scenarios as well.
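Since the exact benchmark isn't reproduced in this thread, here is an illustrative sketch of what the three fixes above look like in the same module shape as the earlier benchmark (the 4-field struct, field names, and iteration counts are assumptions, not the author's actual code):

```lua
-- Illustrative benchmark sketch applying the three fixes.
local arrayItem = { 1, 2, 3, 4 }
local dictItem = { x = 1, y = 2, z = 3, w = 4 }

return {
    ParameterGenerator = function()
        return
    end,
    Functions = {
        array = function()
            local item = arrayItem -- fix 2: cache the upvalue in a local
            local sum = 0
            for _ = 1, 1e5 do
                -- fix 1: a realistic 4-field struct, not 100,000 fields
                -- fix 3: several accesses per iteration so loop overhead
                -- doesn't dominate the measurement
                sum = sum + item[1] + item[2] + item[3] + item[4]
            end
        end,
        dict = function()
            local item = dictItem
            local sum = 0
            for _ = 1, 1e5 do
                sum = sum + item.x + item.y + item.z + item.w
            end
        end,
    },
}
```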
Okay, in detail, what Roblox does is slightly more than this. It actually figures out ahead of time which hash bucket is most likely to contain the key and stores that information directly in the VM instruction; it doesn't simply have the string hash pre-computed like normal Lua. Sorry for the inaccurate explanation.
The fast-path stuff like expected-hash-bucket indexing all happens right in the core VM loop, before things even get down to the core slow-path functions like luaH_get.
Okay, in detail, what Roblox does is slightly more than this. It actually figures out ahead of time which hash bucket is most likely to contain the key and stores that information directly in the VM instruction; it doesn't simply have the string hash pre-computed like normal Lua. Sorry for the inaccurate explanation.
This is incorrect. In the case of Lua, it doesn't store any of that information in a VM instruction, and nowhere in Luau's documentation is it stated that the information is stored in a VM instruction. The only difference is that Luau inline-caches tables for field lookups. Take a look at the header for string objects:
The hash bucket is literally determined by the hash of the key; each hash bucket is a linked list. When luaH_get calls luaH_getStr (for a key that is a string), it already has the precomputed hash of the key and iterates through the linked list, checking whether the key stored in each node is a string equal to the key whose value you need to find.
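Conceptually, that lookup can be sketched in Lua-flavored pseudocode (the real implementation is C; the `buckets`/`next` field names here are illustrative, not Lua's actual internals):

```lua
-- Pseudocode sketch of a chained hash lookup like luaH_getStr:
-- the key's hash (precomputed when the string was interned) selects a
-- bucket, then we walk that bucket's linked list of nodes.
local function getStr(t, key, keyHash)
    local node = t.buckets[keyHash % #t.buckets + 1]
    while node do
        if node.key == key then -- interned strings compare by identity
            return node.value
        end
        node = node.next
    end
    return nil
end
```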
The fast-path stuff like expected-hash-bucket indexing all happens right in the core VM loop, before things even get down to the core slow-path functions like luaH_get.
This is incorrect; all of the actual work happens in these two functions, one of which luaH_get calls depending on the type of the key. You are most likely making assumptions of your own at this point.
That's for vanilla Lua; everything we're talking about here is specific to the Roblox Luau VM, which has significantly diverged from the vanilla Lua one. It goes as far as having modified and additional opcodes in the bytecode format, as well as a completely rewritten core interpreter loop.
Yes, in vanilla Lua using an array is basically always faster than a struct with actual fields, but as you can see from the benchmark above that's not true for the Roblox Luau VM because of the things I mentioned.
That is correct, but you never cited any of your sources for saying that Luau stores that information inside a VM instruction (which seems redundant).
Yes, in vanilla Lua using an array is basically always faster than a struct with actual fields, but as you can see from the benchmark above that's not true for the Roblox Luau VM because of the things I mentioned.
Yup, what I cited was in terms of Lua. Luau does some voodoo magic for table field lookups, as well as inline caching.
Sorry for the lack of citations; that's me being a bit lazy. The info is buried deep inside many Luau Recap threads / release notes, so it's pretty hard to dig up.
I thought it would be useful info regardless, even without a citation, because many people are still mucking up their code under the misconception that it will be faster if they use arrays where they should be using proper structs.
I see! So, to clarify: if I, for instance, have something like:
for _, Data in next, SomeTable do
    local Math = Data[1] * Time + Data[2] / Data[3] -- etc.
end
would it be better to convert Data into a dictionary? I am iterating through ~20,000 elements and indexing at least 2-3 fields on each one, so would that be worthwhile here?
If what you care about is perf, then for a 3-5 element item fields should beat an array (IIRC, for a 2-element item an array still wins for some reason; the bucket logic must not work out in that case).
If what you care about above all else is memory and you have specifically 3 or fewer numeric fields, then you should actually put the three values into a Vector3: since Vector3 is now a native type, you avoid allocating a table to group them, and the numeric data is stored directly in the main table. This also works for compressing an array of plain numbers by a factor of 3, at the expense of slower access.
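A minimal sketch of that Vector3-packing idea (the field meanings and counts here are made up for illustration; Vector3.new and table.create are Roblox Luau APIs):

```lua
-- Instead of allocating a small table per item:
--   items[i] = { dx = ..., dy = ..., dz = ... }
-- pack the three numbers into one native Vector3, which is stored inline
-- in the containing table with no extra allocation per item.
local items = table.create(20000)
for i = 1, 20000 do
    items[i] = Vector3.new(i, i * 2, i * 3)
end

-- Read the packed values back via the .X/.Y/.Z components
-- (slower than plain field access, but 3x more compact):
local total = 0
for _, v in ipairs(items) do
    total += v.X * v.Y + v.Z
end
```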
Yes… but there are other reasons not to use coroutine.resume (if you want to re-use a coroutine you have to use coroutine.resume, not coroutine.wrap): you can't fully correctly implement event:Wait() using coroutine.resume, and it also doesn't integrate as well with the output window and debugger when the code in the coroutine encounters an error.
I am a bit confused on why you said we "can't fully correctly implement event:Wait()" with coroutines. Could you elaborate on why this is the case?
Using coroutine.resume makes errors disappear.
It also has some other hard-to-explain issues which I haven't gotten around to understanding yet. And I have no reason to, since task.spawn is now so much better for everything that coroutine.resume is only needed in really specific scenarios.
He didn't say you can't implement it with coroutines; the task library is one of the main ways to work with coroutines. It's just different and has fewer problems.
Just because it isn't inside the coroutine library doesn't mean it's not coroutine-related.
task.spawn(coroutine, ...) resumes a coroutine (better called a thread), but has none of the issues that coroutine.resume has.
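A minimal illustration of the error-swallowing difference (run in Roblox Studio; task.spawn is Roblox-specific):

```lua
-- With coroutine.resume, an error inside the coroutine does not reach the
-- output window; it is only returned to the caller, so it silently
-- disappears unless you inspect the return values:
local co = coroutine.create(function()
    error("oops")
end)
local ok, err = coroutine.resume(co)
-- ok is false and err contains "oops", but nothing was printed

-- With task.spawn, the same error is reported to the output with a full
-- traceback, and the thread runs on the engine's scheduler:
task.spawn(function()
    error("oops") -- shows up in the output window
end)
```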
I have been using your GoodSignal class for a while now and it's been a breeze to use. However, it seems to keep a strong reference to values passed as parameters after firing, and the reference is only kept if the signal is connected. Here's some code that demonstrates this behavior:
local ref = setmetatable({},{__mode="v"})
local Signal = require(script.Parent.GoodSignal)
local Object = Instance.new("Part")
table.insert(ref,Object)
local Event = Signal.new()
Event:Connect(function()end)
Event:Fire(Object)
Object = nil
print(ref[1])
wait(2)
print(ref[1])
Output:
To compare, I modified the code slightly to make use of SimpleSignal instead, and the Part was able to be GC'ed as expected.
local ref = setmetatable({},{__mode="v"})
local Signal = require(script.Parent.SimpleSignal)
local Object = Instance.new("Part")
table.insert(ref,Object)
local Event = Signal.new()
Event:Connect(function()end)
Event:Fire(Object)
Object = nil
print(ref[1])
wait(2)
print(ref[1])
Output:
Is this behavior related to how the coroutines that run the handlers are recycled? Is this intended?
Oh boy, that is one subtle bug, nice catch! It is indeed unintentional, and here's why it happens:
-- Coroutine runner that we create coroutines of. The coroutine can be
-- repeatedly resumed with functions to run followed by the argument to run
-- them with.
local function runEventHandlerInFreeThread(...)
    acquireRunnerThreadAndCallEventHandler(...)
    while true do
        acquireRunnerThreadAndCallEventHandler(coroutine.yield())
    end
end
Can you spot the bug? Hint: It's in the ...
If none of your handlers ever actually yields, the signal keeps reusing the same runEventHandlerInFreeThread thread… and the ... arguments of that thread remain on its stack through all subsequent handler calls. So anything in the ... can't be garbage collected until one of the handlers yields, which forces the creation of a new runEventHandlerInFreeThread thread and eventually lets the old thread be collected.
So basically, the first set of arguments you pass to a :Fire call in your game after a handler yields will be held onto until the next time any handler yields.
I'm not sure what to do about this. It's a pretty obscure edge case that I don't think will ever actually come up in practice, and even when it does, it would only leak a small fixed amount of memory. Since solving it will incur a slight performance penalty, I'm hesitant to do it when there's so little chance of it mattering. If you want to solve it, you can change runEventHandlerInFreeThread to this:
local function runEventHandlerInFreeThread()
    while true do
        acquireRunnerThreadAndCallEventHandler(coroutine.yield())
    end
end
And the creation of the freeRunnerThread to this:
freeRunnerThread = coroutine.create(runEventHandlerInFreeThread)
coroutine.resume(freeRunnerThread) --> Add this
To basically "clear" the initial step of the coroutine and only invoke handlers via the yield.
Kinda unfortunate that we can't load LVM bytecode… if we could, I would be able to solve this via LVM assembly at no perf cost.
I've actually fixed this in my Lua implementation, and I think it's worth changing this behaviour in GoodSignal.
It only comes at a really, really small cost, which is even more insignificant given that it's only paid when creating a new thread, which in turn only happens when a handler errors or yields; and if you're yielding, you probably don't care about the performance cost anyway.
This little aspect might cause confusion for users; even if it's a rare edge case, they could lose a lot of time trying to debug such a hidden issue.
Edit: Made a pull request to make your life easier :)
Also, the GoodSignal links here are not pointing to the repo and are still pointing to the gist. Would you mind fixing that?