Signal+ | Insanely optimized script signal

Metatables are bad, bro.
They eat up performance a lot; plus, look at the benchmarks.
Metatables also 100% refuse to work properly with --!native.

Ah but creation is costly without metatables.

Also the performance gains in the benchmark are from the other optimizations I did in v3.0.0, not metatable removal.


While yes, it takes more memory, I think firing is the better priority; it's not like you create 10k signals every second, whereas firing signals is pretty common.
Pointer OOP beats metatables by a bit in performance but loses a little memory-wise.

Metatables are virtually what trigger the most slow paths in the LVM, iirc.

So I took the time to check a bit using the Luau Bytecode Explorer. Please note I'm in no way super experienced with interpreters, or the Luau interpreter at all; I just know some C++ and Luau, and can read comments, documentation and the code.

Full explanation behind the 'In short', kept separate to avoid flooding the thread...

We can see how indexing through a metatable compiles under O2 (max opts) with the following example:

local sig = {}
sig.__index = sig -- method lookups fall through to the metatable

function sig.new(name)
    return setmetatable({ Name = name }, sig)
end

function sig.hello(self)
    print("hi!", self.Name)
end

sig.new("Nameless"):hello()

compiles to the following bytecode:

The function body that matters here is that of anon_1, which is sig.hello. In it we have GETIMPORT, which is basically GETGLOBAL but optimised; LOADK, which loads a constant from the list of constants (in this case 'hi!'); and then GETTABLEKS, which gets an index from a table. GETTABLEKS has many paths that you can find yourself by reading the luau_execute function.
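For reference, this is roughly the shape of that part of anon_1. It is a hedged sketch: register numbers and constant indices are made up, and the exact listing depends on the compiler version.

GETIMPORT R1 K0 [print]        -- optimised getglobal for print
LOADK R2 K1 ['hi!']            -- load the string constant
GETTABLEKS R3 R0 K2 ['Name']   -- index self.Name by its constant key
CALL R1 3 0                    -- call print('hi!', self.Name)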

The specific case the LVM runs is LOP_GETTABLEKS: luau/VM/src/lvmexecute.cpp at d110c812bb754d47ecc37b87db0d4e20a12aacc9 · luau-lang/luau · GitHub
This is C++ code, but the main point is easy to read: the devs are really cool and added simple comments, // fast-path: ... and // slow-path: .... From the code itself we already get sufficient information to say metatables are slower; perhaps not super slow, but certainly slower than just keeping the object in an array.

// fast-path: built-in table
if (LUAU_LIKELY(ttistable(rb)))
{
    Table* h = hvalue(rb);

    int slot = LUAU_INSN_C(insn) & h->nodemask8;
    LuaNode* n = &h->node[slot];

    // fast-path: value is in expected slot
    if (LUAU_LIKELY(ttisstring(gkey(n)) && tsvalue(gkey(n)) == tsvalue(kv) && !ttisnil(gval(n))))
    {
        setobj2s(L, ra, gval(n));
        VM_NEXT();
    }
    else if (!h->metatable)
    {
        // fast-path: value is not in expected slot, but the table lookup doesn't involve metatable
        const TValue* res = luaH_getstr(h, tsvalue(kv));

        if (res != luaO_nilobject)
        {
            int cachedslot = gval2slot(h, res);
            // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
            VM_PATCH_C(pc - 2, cachedslot);
        }

        setobj2s(L, ra, res);
        VM_NEXT();
    }
    else
    {
        // slow-path, may invoke Lua calls via __index metamethod
        L->cachedslot = slot;
        VM_PROTECT(luaV_gettable(L, rb, kv, ra));
        // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
        VM_PATCH_C(pc - 2, L->cachedslot);
        VM_NEXT();
    }
}

As you can see, the branch that performs metatable lookups is considered the slow branch; after all, the metafield has to be processed. Quite honestly, I'm not so sure that storing closures raw on a table has a severe impact, especially since they're very likely to be optimized with the DUPCLOSURE instruction, which can be seen when the following example is compiled to bytecode (O2):

local sig = {}

function sig.new(name)
    local nNew = {}
    nNew.hello = function(self)
        print("hi!", self.Name)
    end
    nNew.new = nil
    nNew.Name = name 

    return nNew
end

sig.new("Nameless"):hello()


The highlighted instructions are those responsible for the behaviour of nNew.hello = function(self) ..., and we can see that it doesn't really create a new closure but duplicates a precompiled one.
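Roughly, that pair of instructions looks like this (again a hedged sketch; registers and constant indices are illustrative):

DUPCLOSURE R2 K3 ['anon_1']    -- reuse the precompiled closure instead of building a new one
SETTABLEKS R2 R1 K4 ['hello']  -- nNew.hello = <that closure>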

It is also worth noting that G2 (Debug 2) doesn’t do anything other than just add debugging information. It doesn’t make the code emit different instructions!

In short: it is unlikely to change much, but overall it should be faster once you begin indexing it repeatedly, like crazy, because then the LVM fast-paths get hit over and over and you benefit slightly. Overall, you should consider whether you'd rather spend 8 bytes on an address in memory or be damned with slower indexing than you'd otherwise have. Or you can hold an array of callbacks and use a buffer to hold the index into it (4 bytes will probably do fine), and then you are guaranteed to probably be fast, maybe.
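A minimal sketch of that last layout, assuming Luau's buffer library; all of the names here are made up:

local callbacks: { (...any) -> () } = {}

local function connect(callback: (...any) -> ()): buffer
    table.insert(callbacks, callback)
    local handle = buffer.create(4) -- 4 bytes comfortably holds the index
    buffer.writeu32(handle, 0, #callbacks)
    return handle
end

local function disconnect(handle: buffer)
    callbacks[buffer.readu32(handle, 0)] = nil
end

local function fire(...)
    for _, callback in callbacks do -- plain table, so only fast-path lookups
        callback(...)
    end
end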



It’ll be fixed in the next update.


:package: Version 3.1.0

:hammer_and_wrench: Changes & fixes:

  • Now uses metatables again.
  • Fixed disconnects.
  • It’s now safe to call Disconnect multiple times.
  • Once and Wait connections now use a variable that is already in scope instead of indexing the connection for the signal (see the sketch after this list).
  • Minor variable renamings.
  • Minor comment changes and additions.
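Presumably the Once/Wait change above looks something like this (a sketch; removeConnection and the surrounding shapes are hypothetical, not Signal+'s actual code):

-- before: the callback reads the signal back off the connection,
-- costing an extra GETTABLEKS on every fire
local function callbackThread(connection, ...)
    removeConnection(connection.Signal, connection)
end

-- after: the signal is captured as an upvalue that is already in scope
local function connectOnce(signal, fn)
    local connection
    connection = signal:Connect(function(...)
        removeConnection(signal, connection) -- upvalue access, no indexing
        fn(...)
    end)
    return connection
end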

Is Signal+ memory efficient? I assume that since it is fast, it probably trades memory for speed, which is not bad, but I would like to know if this trade-off is that big, since I am working on a project that requires creating loads of signals.


The logic for removing a connection from the signal table is duplicated in the callbackThread function used by Once and Wait, and also exists in the Connection:Disconnect() method.

You could extract a helper function to reduce the redundancy and improve code readability; if you ever want to change how it works, it would also be easier to do so.
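For example (a sketch; the field names are hypothetical):

-- shared helper, so Disconnect and the Once/Wait callbackThread
-- don't each carry their own copy of this loop
local function removeConnection(signal, connection)
    local connections = signal.Connections
    for index, candidate in connections do
        if candidate == connection then
            table.remove(connections, index)
            break
        end
    end
end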

Hello, the way SimpleSignal works is quite similar to how Signal+ works now, so I can answer this question.

The answer is no, the trade-off isn't too big. You can have 1,000 connections on 1 signal and barely experience any memory problems (at least in my own benchmarks).

The signals themselves are only as big as their connections, so you really shouldn’t worry unless you’re connecting 10k+ times at once, at which point the memory usage starts getting a bit high, but still acceptable.


It was done on purpose. It improves speed.


:package: Version 3.2.0

:hammer_and_wrench: Changes & fixes:

  • Now stores reusable threads in a global table again, enabling you to fire connections while their callbacks are still active.
  • Changed module description.
  • Minor comment changes and additions.

Dude, I’ve been testing my module against yours. Before the update, your module was running at 52 µs (3.0.0), and after the update to the new version (3.2.0) it’s at 240 µs, what the…

I believe you are talking about Fire. Threads are created on fire, not on connect. I believe you are only testing once, not over multiple iterations. The threads are saved, so it’s only the very first fire that is so slow, unless new threads are needed at some point.
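That reuse pattern generally looks like the following (a sketch of reusable-thread pooling in general, not Signal+'s exact code):

local freeThreads: { thread } = {}

local function run(callback, ...)
    callback(...)
    -- park this thread back in the pool once the callback returns
    table.insert(freeThreads, coroutine.running())
end

local function yielder()
    while true do
        run(coroutine.yield())
    end
end

local function fireOne(callback, ...)
    local thread = table.remove(freeThreads)
    if not thread then
        -- only taken on the very first fire, or when the pool runs dry
        thread = coroutine.create(yielder)
        coroutine.resume(thread) -- park it at its first yield
    end
    coroutine.resume(thread, callback, ...)
end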

It is especially good at connecting (Connect, Once and Wait alike).

When I write 10 connections and one Fire, the benchmark starts lagging, although your previous version copes with this in less than 1 ms.


Oh, what am I doing, why did I Disconnect the signal…
I’ve changed the photos.

The new version does too. 1 millisecond is 1,000 microseconds.

Also why do you have 2 connections for the new version? That’s not fair.

Also it’s not really a fair comparison since you’re preallocating for SignalX, right?
Please explain the 16ExtraThread, 8Pool Threads.

Here’s a proper benchmark:

I don’t know how to use your module lol.

I was talking about 10 connections, but there are only 2.

My bad

Yes, you are completely right. I allocate 8 threads for my signal when creating it, and the rest are created with the remaining 16 connections (or, if you are not yielding in the function, only 8 may be created, because the last 8 from the pool will have been released by then).
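In other words, something like this at creation time (a sketch; POOL_SIZE and yielder are assumptions, the latter as in the pooling sketch earlier in the thread):

local POOL_SIZE = 8

local function yielder()
    while true do
        local callback = coroutine.yield()
        callback()
    end
end

local function createThreadPool(): { thread }
    local pool = table.create(POOL_SIZE)
    for i = 1, POOL_SIZE do
        local thread = coroutine.create(yielder)
        coroutine.resume(thread) -- park each thread at its first yield
        pool[i] = thread
    end
    return pool
end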

Okay, actually, the only thing I wanted to do was compare the new signal and the old one. I’ll analyze your code, okay?


Yeah, I did that for you. As you can see, with connect (10x) and fire combined, the new version is over twice as fast as the old one.


I want to ask you: which config should I ideally test your code with? It would be more honest if I check your code according to your rules. I will set exactly the same settings for my script.

Just remember that code can be bad in some specific cases but excel in others.
Your benchmark always proves something; just make sure that what it proves really matters, and test different cases.
