It might be your benchmark then.
I’ll benchmark once BoatBomber fixes his Benchmarker plugin — yeah, after 7 major releases it’s still buggy.
Hi, changing task.spawn to coroutine.resume in :Fire() fixed the issue.
I don’t know why, but it seems like the coroutine library is much faster at resuming/closing threads, at least in this case.
(Note that my way of benchmarking fluctuates wildly; Benchmarker would be the best way to measure this)
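Something like this quick os.clock sketch is what I mean (noop and N are just placeholders):

local function noop() end
local N = 100_000

local t0 = os.clock()
for _ = 1, N do
	task.spawn(noop)
end
print("task.spawn:", os.clock() - t0)

local t1 = os.clock()
for _ = 1, N do
	coroutine.resume(coroutine.create(noop))
end
print("coroutine.resume:", os.clock() - t1)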
(Also if you don’t mind can you make my credit in the post bigger :3 )
I do know that it’s faster, but I didn’t know the difference was this big.
The thing is, with coroutine.resume you don’t get proper error outputs.
Since coroutine.resume() returns a boolean indicating success and a message (similar to pcall), you can just detect if that’s false and error manually :V
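A minimal sketch of that pattern (handler is just an illustrative callback):

local function handler(value)
	print("got", value)
end

local thread = coroutine.create(handler)
local ok, err = coroutine.resume(thread, 123)
if not ok then
	-- coroutine.resume returns false plus the error message on failure
	-- (like pcall), so surface it manually instead of losing it
	error(err, 0)
end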
Let me know if you succeed adding that with better performance than with task.spawn.
Version 2.9.0
Changes & fixes:
- Switched back to the original thread storage logic because Once and Wait connections resulted in a memory leak.
- Removed a few unnecessary table.freeze calls.
- Removed an unnecessary comma.
- Minor function description change.
- Minor comment changes and additions.
This is far from being “extremely fast” considering it uses a doubly linked list. Your iteration speed when calling the signal will be very slow. If you want to solve the problem, use an array. If you want to maintain O(1) insertions and deletions, use a swapback array. And if you want an optimal solution, don’t use any metatables.
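For reference, a swapback (swap-remove) array is just this pattern; a minimal sketch with illustrative names:

local callbacks = {}

local function insert(fn)
	callbacks[#callbacks + 1] = fn -- O(1) append
end

local function removeAt(i)
	local n = #callbacks
	callbacks[i] = callbacks[n] -- move the last element into the hole
	callbacks[n] = nil -- shrink by one; O(1), but order is not preserved
end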
For ordered connections, swapback wouldn’t work, since the order changes whenever a connection is removed from the array.
That’s true. It depends on what you prioritize. He’s not wrong that swapback arrays will be a lot faster for firing signals.
Yep. But if you have few connections then it will still be faster to shift each element down.
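That is, ordered removal is what table.remove already does; it keeps order but pays for O(n) shifting:

local callbacks = { "a", "b", "c", "d" }
table.remove(callbacks, 2) -- "c" and "d" each shift down one slot
print(table.concat(callbacks, ", ")) --> a, c, d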
Version 3.0.0
Changes & fixes:
- Switched from a linked list to a swapback array.
- Switched back to previous thread storage logic. The memory leak is now fixed.
- No longer uses task.spawn to resume callback threads; it has custom error handling for non-yielding threads instead.
- No longer uses metatables; the functions are stored directly in every object instead.
- Comment changes and additions.
Why did you decide to remove the metatables?
(Note that I used Optimize 2)
Yeah, it is kinda weird, and benchmarking signal modules is almost impossible; each of them gives different results every time. However, the thing to note is that BindableEvent is literally bad in every benchmark.
The update made signal creation costly, but in return, firing became a bit better.
BenchmarkModules.rbxm (9.3 KB)
I heard it was better. But I guess not.
Metatables ARE bad, bro.
They eat up a lot of performance; plus, look at the benchmarks.
Metatables also 100% refuse to properly work with --!native
Ah, but creation is costly without metatables.
Also the performance gains in the benchmark are from the other optimizations I did in v3.0.0, not metatable removal.
While yes, it takes more memory, I think firing is the better priority; it’s not like you create 10k signals every second, whereas firing actual signals is pretty common.
Pointer OOP beats metatables by a bit in performance, but loses a little memory-wise.
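A minimal sketch of both styles, just for illustration (MetaSignal and newPointerSignal are made-up names):

-- Metatable OOP: methods live on one shared table, reached via __index
local MetaSignal = {}
MetaSignal.__index = MetaSignal

function MetaSignal.new()
	return setmetatable({}, MetaSignal)
end

function MetaSignal:Fire()
	print("meta fire")
end

-- "Pointer OOP": every object stores its functions directly, so indexing
-- never touches a metatable (faster lookups), at the cost of one extra
-- field per function per object (more memory)
local function fire(self)
	print("pointer fire")
end

local function newPointerSignal()
	return { Fire = fire }
end

MetaSignal.new():Fire()
newPointerSignal():Fire()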
Metatables are virtually what triggers the most slow paths in the LVM, IIRC.
So I took the time to check a bit, using the Luau Bytecode Explorer to explore the bytecode. Please note I’m in no way super experienced with interpreters, or the Luau interpreter at all; I just know some C++ and Luau, and I can read comments, documentation, and the code.
Full explanation behind the 'In short', to avoid flooding the thread...
We can see what indexing through a metatable does under O2 (max opts) with the following example:
local sig = {}
sig.__index = sig

function sig.new(name)
	return setmetatable({ Name = name }, sig)
end

function sig.hello(self)
	print("hi!", self.Name)
end

sig.new("Nameless"):hello()
which compiles to the following bytecode:
The function body that matters here is that of anon_1, which will be sig.hello. In it we have GETIMPORT, which is basically getglobal but optimised; LOADK, which loads a constant from the list of constants (in this case "hi!"); and then GETTABLEKS, which gets an index from a table. GETTABLEKS has many paths that you can find yourself by reading the luau_execute function.
The specific case the LVM runs is LOP_GETTABLEKS: luau/VM/src/lvmexecute.cpp at d110c812bb754d47ecc37b87db0d4e20a12aacc9 · luau-lang/luau · GitHub
This is C++ code, so yeah. Anyway, the main point, and an easy one to read, is that the devs are really cool and added simple // fast-path: ... and // slow-path: ... comments. From the code itself we already get sufficient information to say metatables are slower; perhaps not super slow, but certainly slower than just keeping the object in an array.
// fast-path: built-in table
if (LUAU_LIKELY(ttistable(rb)))
{
    Table* h = hvalue(rb);
    int slot = LUAU_INSN_C(insn) & h->nodemask8;
    LuaNode* n = &h->node[slot];

    // fast-path: value is in expected slot
    if (LUAU_LIKELY(ttisstring(gkey(n)) && tsvalue(gkey(n)) == tsvalue(kv) && !ttisnil(gval(n))))
    {
        setobj2s(L, ra, gval(n));
        VM_NEXT();
    }
    else if (!h->metatable)
    {
        // fast-path: value is not in expected slot, but the table lookup doesn't involve metatable
        const TValue* res = luaH_getstr(h, tsvalue(kv));

        if (res != luaO_nilobject)
        {
            int cachedslot = gval2slot(h, res);
            // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
            VM_PATCH_C(pc - 2, cachedslot);
        }

        setobj2s(L, ra, res);
        VM_NEXT();
    }
    else
    {
        // slow-path, may invoke Lua calls via __index metamethod
        L->cachedslot = slot;
        VM_PROTECT(luaV_gettable(L, rb, kv, ra));
        // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
        VM_PATCH_C(pc - 2, L->cachedslot);
        VM_NEXT();
    }
}
As you can see, the branch that performs metatable lookups is considered the slow branch; after all, the metafield has to be processed. Quite honestly, I’m not so sure about saving closures raw on a table having a severe impact, especially when they’re very likely going to be optimized with the DUPCLOSURE instruction, which can be seen when the following example is compiled to bytecode (O2):
local sig = {}

function sig.new(name)
	local nNew = {}
	nNew.hello = function(self)
		print("hi!", self.Name)
	end
	nNew.new = nil
	nNew.Name = name
	return nNew
end

sig.new("Nameless"):hello()
The highlighted instructions are those responsible for the behaviour of nNew.hello = function(self) ..., and we can see that it doesn’t really create a new closure but duplicates an existing one.
It is also worth noting that G2 (Debug 2) doesn’t do anything other than just add debugging information. It doesn’t make the code emit different instructions!
In short: it is unlikely to change much, but it should be faster overall once you start indexing like crazy, repeatedly, because then the LVM can just hit its fast-paths every time and benefit slightly. Overall, you should consider whether you’d rather spend 8 bytes’ worth of an address in memory, or be damned with slower indexing than you would otherwise have. Alternatively, you can hold an array of callbacks and use a buffer to hold the indices into it (4 bytes will probably do fine), and then you are guaranteed to probably be fast, maybe.
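An illustrative sketch of that last array-plus-buffer idea (not the module’s actual code; names are made up):

local callbacks = {} -- array of callback functions
local indexBuf = buffer.create(4 * 100) -- room for 100 u32 indices
local count = 0

local function track(fn)
	table.insert(callbacks, fn)
	buffer.writeu32(indexBuf, count * 4, #callbacks) -- 4-byte index per entry
	count += 1
end

local function fireAll(...)
	for i = 0, count - 1 do
		local idx = buffer.readu32(indexBuf, i * 4)
		callbacks[idx](...)
	end
end

track(function(msg) print("got:", msg) end)
fireAll("hello")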
It’ll be fixed in the next update.