Narrowing the LuaBridge overhead

Welcome to hell, folks. This is a tutorial about how you can make your Roblox functions faster, somehow.
That’s right. If you are as concerned with performance as I am, look no further than possibly the most mind-boggling, confusing, and unnecessary tutorial you will ever lay your eyes on.


This tutorial is intended for people who like running headfirst into premature optimization and people who are very interested in how Roblox works under the hood.


How?

Paste this snippet. Don’t worry, it won’t complain if you’re in strict typechecking.

local rawRBXmemberGet
xpcall(function() return game[""] end, function() rawRBXmemberGet = debug.info(2, "f") end)

Now you have a function. Call it with any instance as the first argument and the string Name as the second argument and print it out.

print(rawRBXmemberGet(workspace, "Name"))
-- Output: Workspace

You just directly called the __index metamethod of instances.
This also works with children, signals, and functions.

rawRBXmemberGet(workspace, "IsA")
rawRBXmemberGet(workspace, "Camera")
rawRBXmemberGet(workspace, "Changed")

Now for the next trick, copy this code and run it:

local rawRBXmemberSet
xpcall(function() game[""] = nil end, function() rawRBXmemberSet = debug.info(2, "f") end)

You now have another function. Call it with any instance as the first argument, a property name as the second, and a compatible value as the third, then check the instance’s properties after running it.

rawRBXmemberSet(game.SoundService, "Archivable", false)

You have now directly changed an instance property.

For the last goodie, copy this code and run it:

local raycast = workspace.Raycast -- don't worry, it's just to get the function
raycast(workspace, <your raycast arguments>) -- directly calls the function

You have successfully constructed a way to directly call workspace:Raycast().

Any other instance that either has or inherits the method can even be passed in, like so:

game.IsA(workspace, "DataModel") -- workspace:IsA
workspace.FindFirstChild(localPlayer, "Test", true) -- localPlayer:FindFirstChild

Why does it work?

Object-oriented programming in Luau depends on two key traits:

  1. Calling object functions with the colon operator is syntactic sugar for obj.func(obj, ...)
  2. Retrieving a key within objects first checks if the key is in the object itself, then checks within the __index table/function if that fails.

Example:

local test = me.new() -- has an __index table of {test2 = function() blah blah end}
test.a = 8 -- creates a key in the object itself
test:test2() -- the object does not have test2, so it checks the __index table instead

Method calls made with the colon operator always pass in the object itself, not its __index table, so calling :test2() is syntactic sugar for test.test2(test).
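Both rules can be checked in plain Luau without any Roblox APIs. This is only a minimal sketch: the Counter class and its fields are hypothetical names made up for illustration.

```lua
-- Hypothetical class used only to illustrate the two rules above.
local Counter = {}
Counter.__index = Counter -- failed lookups on instances fall back to this table

function Counter.new()
	return setmetatable({ count = 0 }, Counter)
end

function Counter.increment(self, amount)
	self.count = self.count + amount
	return self.count
end

local c = Counter.new()

-- Rule 1: the colon call is sugar for passing the object as the first argument.
local viaColon = c:increment(1) -- count becomes 1
local viaDot = c.increment(c, 1) -- count becomes 2

-- Rule 2: "increment" is not stored on the object itself; it is found via __index.
local ownField = rawget(c, "increment") -- nil
local inherited = c.increment -- Counter.increment
```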

Instances are of type userdata, so they can only rely on metatables for any work to be done. The __index and __newindex keys in their metatables are functions, and you can call them directly. We extract these functions with xpcall and debug.info, because xpcall preserves the stack trace when the error handler runs. Both functions are universal: they are the exact same functions for every Instance and, by design, work with all of them. Some anti-cheats use this knowledge to check whether the metatables of Instances have been modified by an exploit, and those same anti-cheats led me to this discovery.
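The idea of one shared __index function serving every object can be mocked in plain Luau. This is a sketch under loud assumptions: plain tables stand in for Instances, newMock and properties are hypothetical names, and the function is read straight from the mock's own metatable for simplicity, whereas the trick above extracts it through xpcall and debug.info.

```lua
-- Mock only: plain tables stand in for Instances, which are userdata in Roblox.
local properties = setmetatable({}, { __mode = "k" }) -- per-object storage, keyed by object

local sharedMeta = {}
-- One __index function serves every object that uses this metatable.
sharedMeta.__index = function(self, key)
	local value = properties[self][key]
	if value == nil then
		error(tostring(key) .. " is not a valid member", 2)
	end
	return value
end

local function newMock(name)
	local object = setmetatable({}, sharedMeta)
	properties[object] = { Name = name }
	return object
end

local a = newMock("A")
local b = newMock("B")

-- Calling the metamethod directly gives the same result as normal indexing,
-- and the same function works for both objects.
local index = sharedMeta.__index
local nameA = index(a, "Name")
local nameB = index(b, "Name")
local sameAsDot = (b.Name == nameB)

-- A missing member raises an error, which is what the xpcall trick relies on.
local ok = pcall(index, a, "Missing")
```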

Instance methods themselves, however, don’t work similarly. Getting the functions by dot operator will instead return a universal function that will work on all objects that inherit that function.

What are the use cases?

Practical use is extremely limited and niche, but a practical example is overhead reduction.

If you want to call multiple quick methods in one go, there is an unavoidable overhead as Luau goes through some code to make sure the method call works as intended. Direct calls can very slightly reduce that overhead, because Luau can skip part of that lookup work. It may not be much over 10 calls, but it can add up over 40,000 calls, and it helps most when the fundamental design of the code cannot yield, as in software rendering.

In most cases, you are completely fine with regular dot and colon operators, and I strongly discourage using direct calls unless you know what you are doing.


Doesn’t Luau have namecall optimizations that make Object:Method() faster than Object.Method(Object)? I think this is more applicable to Lua than Luau


As a result of both optimizations, common Lua tricks of caching the method in a local variable aren’t very productive in Luau and aren’t recommended either.


The optimizations are only applicable to userdatas.

When the object in question is a reflected userdata, a special mechanism called “namecall” is used to minimize the interop cost.

So for instances, yes, this is true. Roblox’s namecall is indeed faster than .f(self) but even though it’s really fast, direct calling only beats it slightly. I, however, failed to show another example where direct calls work best:

local isA = game.IsA -- index once, outside the loop

-- code code code
-- code code code

for i = 1, 10000 do
  if isA(test[i], "BasePart") then
    -- code code code
  end
end

By paying the index cost once, up front, the loop of calls itself gets tighter, so direct calls win in this particular situation.


The knowledge that A:B() is logically the same as A.B(A) may actually be really useful for creating neat one-liners.

Imagine you need to pivot a PVInstance to a specific CFrame after a delay of 5 seconds. You could do

task.delay(5, function()
    instance:PivotTo(cframe)
end)

However, I’m a sucker for one-liners, so you could instead do

task.delay(5, instance.PivotTo, instance, cframe)

And you would achieve the same result.
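The same pattern should carry over to any function that forwards extra arguments to a callback, such as pcall. Here is a minimal sketch in plain Luau, where Mover and moveTo are hypothetical stand-ins for an instance and its method rather than Roblox APIs:

```lua
-- Hypothetical stand-in for an object with a method; not a Roblox API.
local Mover = {}
Mover.__index = Mover

function Mover.new()
	return setmetatable({ position = 0 }, Mover)
end

function Mover:moveTo(target)
	self.position = target
	return self.position
end

local mover = Mover.new()

-- Wrapping the colon call in a closure...
local ok1, result1 = pcall(function()
	return mover:moveTo(5)
end)

-- ...is equivalent to handing over the function plus its arguments,
-- with the object itself as the first argument.
local ok2, result2 = pcall(mover.moveTo, mover, 7)
```

The second form avoids allocating a closure, at the cost of being a little less obvious to read.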

This feels like a bug and undefined behaviour.


I benchmarked this, and the difference between directly calling the function and having it run via the metatable is minuscule, basically deep within micro-optimization territory.
The overhead is from accessing the metatable - which is very fast already - rather than anything else.

Sure, you can reduce the overhead a little, but I don’t think saving 300-900 microseconds is really worth changing your code. That saving is over 100,000 repetitions, so imagine how small it is in normal use.

  • namecall vs calling IsA

  • getting properties normally vs using the metafunction directly


At this point, I doubt that the 300-900 microseconds you save will be of any help, since you have a bigger problem if the code takes THAT much time to run.

There are actually valid use cases for this that don’t revolve around micro-optimizing your code, like passing the function to pcall/task.spawn rather than having to make another function to do it

local get; xpcall(function() return game.something end, function() get = debug.info(2, "f") end)

-- ... assume you got code here inbetween

local success, value = pcall(get, someinstance, property)
task.delay(3, someinstance.Destroy, someinstance) -- though i prefer another way (not debris) to do this to avoid creating threads, it works fine

You can also do the same thing for almost every other datatype like cframes.


I might be wrong, so please benchmark my theory:

As far as I know, if Luau knows the variable type and the script uses --!optimize 2, it will inline the property read in loops/events. So such rawget-style methods aren’t really necessary.

Update:

I made a bootleg benchmark, and from the tests of this benchmark, I can say that --!optimize 2 makes the speed equivalent to direct read/write:

ScriptA:

--!strict
--!optimize 2

local t:{string} = {}
local start = os.clock()

for i = 1,100000 do
	script.Name = tostring(math.random(1,1000))
	t[i] = script.Name
end
print(os.clock()-start)

ScriptB:

local Set; xpcall(function() game[""] = nil end, function() Set = debug.info(2, "f") end)
local Get; xpcall(function() return game[""] end, function() Get = debug.info(2, "f") end)

local t = {}
local start = os.clock()

for i = 1,100000 do
	Set(script,"Name",tostring(math.random(1,1000)))
	t[i] = Get(script,"Name")
end

print(os.clock()-start)

EDIT:
I believe this is still a very convenient function to use at --!optimize 0 (when your code contains getfenv/setfenv) or --!optimize 1 (the standard Studio script level). This is definitely a really cool trick, but unless you are dynamically changing your fenvs, I would suggest you avoid it and just use --!optimize 2 instead.

Getting the metamethod using an erroneous xpcall and debug.info was an idea I never thought about before. Now I need to test what other metamethods you can get from Roblox userdata!

Based on my benchmarks, calling the raw metamethod is always faster no matter what optimization level or compiler target you’re on.

Your benchmark is probably being interfered with by the math.random calls, which can take varying time depending on the clock seed.
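One way to take math.random out of the measurement, sketched here in plain Luau, is to build the random inputs before the clock starts and keep the fastest of several runs. The names bestOf and workload are hypothetical, and workload is only a stand-in; in Studio it would be the property write being compared.

```lua
-- Hypothetical helper: times `fn` over `iterations` calls and keeps the
-- fastest of `trials` runs to reduce noise from other work on the machine.
local function bestOf(trials, iterations, fn, inputs)
	local best = math.huge
	for _ = 1, trials do
		local start = os.clock()
		for i = 1, iterations do
			fn(inputs[i])
		end
		local elapsed = os.clock() - start
		if elapsed < best then
			best = elapsed
		end
	end
	return best
end

-- Build the random inputs before timing so math.random and tostring
-- are not part of the measured region.
local ITERATIONS = 100000
local inputs = {}
for i = 1, ITERATIONS do
	inputs[i] = tostring(math.random(1, 1000))
end

-- Stand-in workload; replace with the operation under test.
local sink = {}
local function workload(value)
	sink[1] = value
end

local seconds = bestOf(5, ITERATIONS, workload, inputs)
print(seconds)
```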

Can you show me the way you benchmarked?

--!strict
--!optimize 2

local Clock, Info = os.clock, debug.info

--local Get xpcall(function() return game[nil] end, function() Get = Info(2, "f") end)
--local Set xpcall(function() game[nil] = nil end, function() Set = Info(2, "f") end)

local Start = Clock()

for Iteration = 1, 1e5 do
	--game.Name = Iteration
	--local Result = game.Name
	
	--Set(game, "Name", Iteration)
	--local Result = Get(game, "Name")
end

local End = Clock()
print(End - Start)

I usually get a ~2% speedup from direct calling.

Regardless, it would be quite silly for Roblox to go out of their way to fork the Luau compiler just so they can remove the metamethod invocation overhead for static instances. The Luau compiler doesn’t even optimize away unused variables currently.