Signal+ | Insanely optimized script signal

For ordered connections, swapback wouldn’t work, since the order changes whenever a connection is removed from the array.


That’s true. It depends on what you prioritize. He’s right that swapback arrays will be a lot faster for firing signals.

Yep. But if you have only a few connections, shifting each element down will still be faster.
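For reference, the two removal strategies being compared look roughly like this (illustrative sketches, not the module’s actual code):

local function removeSwapback(connections, index)
    -- O(1): move the last connection into the freed slot and pop it.
    -- Fast, but firing order no longer matches connection order.
    local last = #connections
    connections[index] = connections[last]
    connections[last] = nil
end

local function removeShifted(connections, index)
    -- O(n): shifts every later connection down one slot.
    -- Preserves order, and is cheap when there are only a few connections.
    table.remove(connections, index)
end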

:package: Version 3.0.0

:hammer_and_wrench: Changes & fixes:

  • Switched from a linked list to a swapback array.
  • Switched back to previous thread storage logic. The memory leak is now fixed.
  • No longer uses task.spawn to resume callback threads, and instead has custom error handling for non-yielding threads (see the sketch after this list).
  • No longer uses metatables, and instead stores functions that are inserted in every object.
  • Comment changes and additions.
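A minimal sketch of what the task.spawn change could look like (assumed internals, not Signal+’s actual code):

local function resumeCallback(thread: thread, ...: any)
    -- coroutine.resume reports errors as return values instead of propagating
    -- them, so one erroring callback can't break the loop firing the rest.
    local ok, err = coroutine.resume(thread, ...)
    if not ok then
        warn(debug.traceback(thread, tostring(err)))
    end
end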

why did you decide to remove the metatables?

(Note that I used Optimize 2)
Yeah, it is kinda weird, and benchmarking signal modules is almost impossible; each of them gives different results every time. However, the thing to note is that BindableEvent is literally bad in every benchmark :skull:
The update made signal creation COSTLY, but firing became a bit better in return.
BenchmarkModules.rbxm (9.3 KB)




I heard it was better. But I guess not.

Metatables ARE bad, bro.
They eat up performance a lot; plus, look at the benchmarks.
Metatables also 100% refuse to work properly with --!native

Ah but creation is costly without metatables.

Also the performance gains in the benchmark are from the other optimizations I did in v3.0.0, not metatable removal.


While yes, it takes more memory, I think firing is the better priority; it’s not like you create 10k signals every second, while firing actual signals is pretty common.
Pointer OOP beats metatables by a bit in performance but loses a little memory-wise.

Metatables are virtually what trigger the most slow paths in the LVM, iirc.

So I took the time to check a bit using the Luau Bytecode Explorer. Please note I’m in no way super experienced with interpreters or the Luau interpreter at all; I just know some C++ and Luau, and can read comments, documentation and the code.

Full explanation behind the 'In short', collapsed to avoid flooding the thread...

We can see how indexing through a metatable compiles under O2 (max opts) with the following example:

local sig = {}
sig.__index = sig

function sig.new(name)
    return setmetatable({ Name = name }, sig)
end

function sig.hello(self)
    print("hi!", self.Name)
end

sig.new("Nameless"):hello()

compiles to the following bytecode:
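Approximately, the body of anon_1 comes out as something like this (a hand-written approximation; exact registers and constant indices may differ):

GETIMPORT R1 [print]      -- optimised getglobal
LOADK R2 ['hi!']          -- load the string constant
GETTABLEKS R3 R0 ['Name'] -- index self.Name
CALL R1 3 1               -- print("hi!", self.Name)
RETURN R0 1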

The function body that matters here is that of anon_1, which will be sig.hello. In it we have GETIMPORT, which is basically getglobal but optimised; LOADK, which loads a constant from the list of constants (in this case 'hi!'); and then GETTABLEKS, which gets an index from a table. GETTABLEKS has many paths that you can find yourself by reading the luau_execute function.

The specific case the LVM runs is LOP_GETTABLEKS: luau/VM/src/lvmexecute.cpp at d110c812bb754d47ecc37b87db0d4e20a12aacc9 · luau-lang/luau · GitHub
This is C++ code, so yeah. Anyway, the main point, and an easy one to read, is that the devs are really cool and added simple // fast-path: ... and // slow-path: ... comments. From the code itself we already get sufficient information to say metatables are slower; perhaps not super slow, but certainly slower than just keeping the object in an array.

// fast-path: built-in table
if (LUAU_LIKELY(ttistable(rb)))
{
    Table* h = hvalue(rb);

    int slot = LUAU_INSN_C(insn) & h->nodemask8;
    LuaNode* n = &h->node[slot];

    // fast-path: value is in expected slot
    if (LUAU_LIKELY(ttisstring(gkey(n)) && tsvalue(gkey(n)) == tsvalue(kv) && !ttisnil(gval(n))))
    {
        setobj2s(L, ra, gval(n));
        VM_NEXT();
    }
    else if (!h->metatable)
    {
        // fast-path: value is not in expected slot, but the table lookup doesn't involve metatable
        const TValue* res = luaH_getstr(h, tsvalue(kv));

        if (res != luaO_nilobject)
        {
            int cachedslot = gval2slot(h, res);
            // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
            VM_PATCH_C(pc - 2, cachedslot);
        }

        setobj2s(L, ra, res);
        VM_NEXT();
    }
    else
    {
        // slow-path, may invoke Lua calls via __index metamethod
        L->cachedslot = slot;
        VM_PROTECT(luaV_gettable(L, rb, kv, ra));
        // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
        VM_PATCH_C(pc - 2, L->cachedslot);
        VM_NEXT();
    }
}

As you can see, the branch that performs metatable lookups is considered the slow branch; after all, the metafield has to be processed. Quite honestly, I’m not so sure about saving closures raw on a table having a severe impact, especially when they’re very likely going to be optimized with the DUPCLOSURE instruction, which is seen when the following example is compiled to bytecode (O2):

local sig = {}

function sig.new(name)
    local nNew = {}
    nNew.hello = function(self)
        print("hi!", self.Name)
    end
    nNew.new = nil
    nNew.Name = name 

    return nNew
end

sig.new("Nameless"):hello()


Those instructions are the ones responsible for the behaviour of nNew.hello = function(self) ..., and we can see that it doesn’t really create a new closure but duplicates it.

It is also worth noting that G2 (Debug 2) doesn’t do anything other than just add debugging information. It doesn’t make the code emit different instructions!

In short: it is unlikely to change much, but overall it should be faster once you begin indexing like crazy, repeated times, because then the LVM can just take its fast paths repeatedly and you benefit slightly. Overall, though, you should consider whether you’d rather hold 8 bytes worth of an address in memory and be damned with slower indexing than otherwise, or hold an array of callbacks and a buffer of indices into it (4 bytes per index will probably do fine), and then you are guaranteed to probably be fast, maybe.


(two screenshots omitted)

It’ll be fixed in the next update.


:package: Version 3.1.0

:hammer_and_wrench: Changes & fixes:

  • Now uses metatables again.
  • Fixed disconnects.
  • It’s now safe to call Disconnect multiple times.
  • Once and Wait connections now use a variable that is already in scope instead of indexing the connection for the signal (sketched after this list).
  • Minor variable renamings.
  • Minor comment changes and additions.
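To illustrate the Once/Wait change above (a hypothetical sketch; the _remove method and signal internals are made up for the example):

local function once(signal, callback)
    local connection
    connection = signal:Connect(function(...)
        -- Previously (illustrative), the wrapper reached the signal back
        -- through the connection, costing a table index on every fire:
        --   connection._signal:_remove(connection)
        -- Now 'signal' is already in scope, so it is captured as an upvalue:
        signal:_remove(connection)
        callback(...)
    end)
    return connection
end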

Is Signal+ memory efficient? I assume that since it is fast, it probably trades memory for speed, which is not bad, but I would like to know if this trade-off is that big, since I am working on a project that requires the creation of loads of signals.


The logic for removing a connection from the signal table is duplicated within the callbackThread function used by Once and Wait, and also exists in the Connection:Disconnect() method.

You could create a helper function to reduce redundancy and improve code readability; and if you want to change how it works, it would be easier to do so.
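For example, something like this (hypothetical names; Signal+’s actual internals may differ):

local function removeConnection(signal, connection)
    -- Shared by Disconnect and the Once/Wait cleanup paths.
    local connections = signal._connections -- illustrative field name
    local index = table.find(connections, connection)
    if index then
        -- Swapback removal, matching the array storage described earlier.
        local last = #connections
        connections[index] = connections[last]
        connections[last] = nil
    end
end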

Hello, the way SimpleSignal works is quite similar to how Signal+ works now, so I can answer this question.

The answer is no, the tradeoff isn’t too big. You can have 1000 connections on 1 signal and barely experience any memory problems (at least from my own benchmarks).

The signals themselves are only as big as their connections, so you really shouldn’t worry unless you’re connecting 10k+ times at once, at which point the memory usage starts getting a bit high, but still acceptable.


It was done on purpose. It improves speed.


:package: Version 3.2.0

:hammer_and_wrench: Changes & fixes:

  • Now stores reusable threads in a global table again, enabling you to fire connections while their callbacks are still active (see the sketch after this list).
  • Changed module description.
  • Minor comment changes and additions.
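The general pattern looks something like this (a sketch of the idea, not Signal+’s exact code; names like fireOne are illustrative). A thread only parks itself back in the shared pool after its callback finishes, so firing again while a callback is still running simply grabs, or creates, another thread:

local freeThreads: { thread } = {} -- shared pool of parked threads

local function run(callback, ...)
    callback(...)
    -- The callback is done; park this thread back in the pool for reuse.
    table.insert(freeThreads, coroutine.running())
end

local function yielder()
    while true do
        run(coroutine.yield())
    end
end

local function fireOne(callback, ...)
    local thread = table.remove(freeThreads)
    if not thread then
        thread = coroutine.create(yielder)
        coroutine.resume(thread) -- prime it so it's parked at the yield
    end
    coroutine.resume(thread, callback, ...)
end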

Dude, I’ve been testing my module against yours. Before the update, your module was running at 52 µs (3.0.0), and after the update to the new version (3.2.0), it’s at 240 µs, what the…