It might be your benchmark then.
I’ll benchmark once BoatBomber fixes his Benchmarker plugin — yeah, after 7 major releases it’s still buggy.
Hi, changing task.spawn to coroutine.resume in :Fire() fixed the issue.
I don’t know why, but it seems like the coroutine library is much faster at resuming/closing threads, at least in this case.
(Note that my way of benchmarking fluctuates wildly; Benchmarker would be the best way to measure this)
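Something like this quick os.clock sketch is what I mean (noop and N are just placeholders):

local function noop() end
local N = 100_000

local t0 = os.clock()
for _ = 1, N do
	task.spawn(noop)
end
print("task.spawn:", os.clock() - t0)

local t1 = os.clock()
for _ = 1, N do
	coroutine.resume(coroutine.create(noop))
end
print("coroutine.resume:", os.clock() - t1)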
(Also if you don’t mind can you make my credit in the post bigger :3 )
I do know that it’s faster, but I didn’t know the difference was this big.
The thing is, with coroutine.resume you don’t get proper error outputs.
Since coroutine.resume() returns a boolean indicating success and a message (similar to pcall), you can just detect if that’s false and error manually :V
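A minimal sketch of that pattern (handler is just an illustrative callback):

local function handler(value)
	print("got", value)
end

local thread = coroutine.create(handler)
local ok, err = coroutine.resume(thread, 123)
if not ok then
	-- coroutine.resume returns false plus the error message on failure
	-- (like pcall), so surface it manually instead of losing it
	error(err, 0)
end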
Let me know if you succeed adding that with better performance than with task.spawn.
Version 2.9.0
Changes & fixes:
- Switched back to the original thread storage logic because Once and Wait connections resulted in a memory leak.
- Removed a few unnecessary table.freeze calls.
- Removed an unnecessary comma.
- Minor function description change.
- Minor comment changes and additions.
This is far from being “extremely fast” considering it uses a doubly linked list. Your iteration speed when calling the signal will be very slow. If you want to solve the problem, use an array. If you want to maintain O(1) insertions and deletions, use a swapback array. And if you want an optimal solution, don’t use any metatables.
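For reference, a swapback (swap-remove) array is just this pattern; a minimal sketch with illustrative names:

local callbacks = {}

local function insert(fn)
	callbacks[#callbacks + 1] = fn -- O(1) append
end

local function removeAt(i)
	local n = #callbacks
	callbacks[i] = callbacks[n] -- move the last element into the hole
	callbacks[n] = nil -- shrink by one; O(1), but order is not preserved
end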
For ordered connections, swapback wouldn’t work, since the order changes whenever a connection is removed from the array.
That’s true. It depends on what you prioritize. He’s not wrong that swapback arrays will be a lot faster for firing signals.
Yep. But if you have few connections then it will still be faster to shift each element down.
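That is, ordered removal is what table.remove already does; it keeps order but pays for O(n) shifting:

local callbacks = { "a", "b", "c", "d" }
table.remove(callbacks, 2) -- "c" and "d" each shift down one slot
print(table.concat(callbacks, ", ")) --> a, c, d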
Version 3.0.0
Changes & fixes:
- Switched from a linked list to a swapback array.
- Switched back to previous thread storage logic. The memory leak is now fixed.
- No longer uses task.spawn to resume callback threads; it has custom error handling for non-yielding threads instead.
- No longer uses metatables; the functions are stored directly in every object instead.
- Comment changes and additions.
Why did you decide to remove the metatables?
(Note that I used Optimize 2)
Yeah, it is kinda weird, and benchmarking signal modules is almost impossible; each of them gives different results every time. However, the thing to note is that BindableEvent is literally bad in every benchmark.
The update made signal creation costly, but in return, firing became a bit better.
BenchmarkModules.rbxm (9.3 KB)
I heard it was better. But I guess not.
Metatables ARE bad, bro.
They eat up a lot of performance; plus, look at the benchmarks.
Metatables also 100% refuse to properly work with --!native
Ah, but creation is costly without metatables.
Also the performance gains in the benchmark are from the other optimizations I did in v3.0.0, not metatable removal.
While yes, it takes more memory, I think firing is the better priority; it’s not like you create 10k signals every second, whereas firing actual signals is pretty common.
Pointer OOP beats metatables by a bit in performance, but loses a little memory-wise.
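A minimal sketch of both styles, just for illustration (MetaSignal and newPointerSignal are made-up names):

-- Metatable OOP: methods live on one shared table, reached via __index
local MetaSignal = {}
MetaSignal.__index = MetaSignal

function MetaSignal.new()
	return setmetatable({}, MetaSignal)
end

function MetaSignal:Fire()
	print("meta fire")
end

-- "Pointer OOP": every object stores its functions directly, so indexing
-- never touches a metatable (faster lookups), at the cost of one extra
-- field per function per object (more memory)
local function fire(self)
	print("pointer fire")
end

local function newPointerSignal()
	return { Fire = fire }
end

MetaSignal.new():Fire()
newPointerSignal():Fire()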
Metatables are virtually what triggers the most slow paths in the LVM, IIRC.
So I took the time to check a bit, using the Luau Bytecode Explorer to explore the bytecode. Please note I’m in no way super experienced with interpreters, or the Luau interpreter at all; I just know some C++ and Luau, and I can read comments, documentation, and the code.
Full explanation behind the 'In short', to avoid flooding the thread...
We can see what indexing through a metatable does under O2 (max opts) with the following example:
local sig = {}
sig.__index = sig

function sig.new(name)
	return setmetatable({ Name = name }, sig)
end

function sig.hello(self)
	print("hi!", self.Name)
end

sig.new("Nameless"):hello()
which compiles to the following bytecode:
The function body that matters here is that of anon_1, which will be sig.hello. In it we have GETIMPORT, which is basically getglobal but optimised; LOADK, which loads a constant from the list of constants (in this case "hi!"); and then GETTABLEKS, which gets an index from a table. GETTABLEKS has many paths that you can find yourself by reading the luau_execute function.
The specific case the LVM runs is LOP_GETTABLEKS: luau/VM/src/lvmexecute.cpp at d110c812bb754d47ecc37b87db0d4e20a12aacc9 · luau-lang/luau · GitHub
This is C++ code, so yeah. Anyway, the main point, and an easy one to read, is that the devs are really cool and added simple // fast-path: ... and // slow-path: ... comments. From the code itself we already get sufficient information to say metatables are slower; perhaps not super slow, but certainly slower than just keeping the object in an array.
// fast-path: built-in table
if (LUAU_LIKELY(ttistable(rb)))
{
    Table* h = hvalue(rb);
    int slot = LUAU_INSN_C(insn) & h->nodemask8;
    LuaNode* n = &h->node[slot];

    // fast-path: value is in expected slot
    if (LUAU_LIKELY(ttisstring(gkey(n)) && tsvalue(gkey(n)) == tsvalue(kv) && !ttisnil(gval(n))))
    {
        setobj2s(L, ra, gval(n));
        VM_NEXT();
    }
    else if (!h->metatable)
    {
        // fast-path: value is not in expected slot, but the table lookup doesn't involve metatable
        const TValue* res = luaH_getstr(h, tsvalue(kv));

        if (res != luaO_nilobject)
        {
            int cachedslot = gval2slot(h, res);
            // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
            VM_PATCH_C(pc - 2, cachedslot);
        }

        setobj2s(L, ra, res);
        VM_NEXT();
    }
    else
    {
        // slow-path, may invoke Lua calls via __index metamethod
        L->cachedslot = slot;
        VM_PROTECT(luaV_gettable(L, rb, kv, ra));
        // save cachedslot to accelerate future lookups; patches currently executing instruction since pc-2 rolls back two pc++
        VM_PATCH_C(pc - 2, L->cachedslot);
        VM_NEXT();
    }
}
As you can see, the branch that performs metatable lookups is considered the slow branch; after all, the metafield has to be processed. Quite honestly, I’m not so sure about saving closures raw on a table having a severe impact, especially when they’re very likely going to be optimized with the DUPCLOSURE instruction, which can be seen when the following example is compiled to bytecode (O2):
local sig = {}

function sig.new(name)
	local nNew = {}
	nNew.hello = function(self)
		print("hi!", self.Name)
	end
	nNew.new = nil
	nNew.Name = name
	return nNew
end

sig.new("Nameless"):hello()
The highlighted instructions are those responsible for the behaviour of nNew.hello = function(self) ..., and we can see that it doesn’t really create a new closure but duplicates an existing one.
It is also worth noting that G2 (Debug 2) doesn’t do anything other than just add debugging information. It doesn’t make the code emit different instructions!
In short: it is unlikely to change much, but it should be faster overall once you start indexing like crazy, repeatedly, because then the LVM can just hit its fast-paths every time and benefit slightly. Overall, you should consider whether you’d rather spend 8 bytes’ worth of an address in memory, or be damned with slower indexing than you would otherwise have. Alternatively, you can hold an array of callbacks and use a buffer to hold the indices into it (4 bytes will probably do fine), and then you are guaranteed to probably be fast, maybe.
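An illustrative sketch of that last array-plus-buffer idea (not the module’s actual code; names are made up):

local callbacks = {} -- array of callback functions
local indexBuf = buffer.create(4 * 100) -- room for 100 u32 indices
local count = 0

local function track(fn)
	table.insert(callbacks, fn)
	buffer.writeu32(indexBuf, count * 4, #callbacks) -- 4-byte index per entry
	count += 1
end

local function fireAll(...)
	for i = 0, count - 1 do
		local idx = buffer.readu32(indexBuf, i * 4)
		callbacks[idx](...)
	end
end

track(function(msg) print("got:", msg) end)
fireAll("hello")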
It’ll be fixed in the next update.