This is a follow-up on my previous post. The previous post focused on DUPCLOSURE and GETIMPORT, two pieces of the optimising puzzle in Luau.
This post is a work in progress; please share any feedback or suggestions. Cheers!
Post Co-Author: @i0rtid
Today, however, we will be focusing on one of the core pillars of ROBLOX game scripting, OOP.
OOP, also known as object-orientated programming, is simply a way of designing our game where we use ‘classes’ that represent the state of our game. Say we have a tycoon; we would create a class called Tycoon that holds the Player, the Money and the ProgressIndex of the tycoon. There are many ways to do this pattern; however, we will be focusing on the two most commonly used and shared around.
The disassembly present in this post was obtained using RbxStu V4, my Roblox Studio executor. I used it because its compiler is mostly accurate to the one used in Roblox: optimisation level 2 (RCC now uses it by default on released games), mutable globals properly configured, the vector library set, and so on. Comparable results can be obtained with external Luau compilers given proper settings, which I have documented on RbxStu’s standard page, since getfunctionhash requires the proper compiler configuration.
However, if you are not adventurous enough to get V4 (currently not public per se) or to build your own Luau with a custom compiler configuration for the bytecode, you can continue to use Lonegwadiator’s Luau Bytecode Explorer.
The disassembler used in RbxStu V4 is Konstant V2.1, “a Luau disassembler written in Luau” by plusgiant6 (also known as plusgiant5). Those are their ROBLOX usernames, in case anyone is interested in their projects!
The two patterns in question are metatables and closures.
In metatable-based OOP we use metatables to our advantage; an example OOP module using this method would be the following:
local lib = {}
lib.__index = lib
function lib.hi()
	print("hi")
end
function lib.new(...)
	local newObject = { --[[ Initialize Fields or Properties... ]] }
	-- Additional Initialization (if any)
	return setmetatable(newObject, lib)
end
This method’s main advantage is memory, because we are not storing the functions in every table we create. (This is an oversimplification; read up on DUPCLOSURE to understand this point more thoroughly. The truth is that storing a function in each object costs a new table ‘node’ every time, a node being the key-value pair a table uses internally. Nodes are roughly 32 bytes in size; the closure itself is not reallocated if the DUPCLOSURE op. code is used properly!)
The catch is that this method of OOP costs us execution time. Every time we index our table, the VM can take one of two paths:
- Index the table itself (fastest path).
- Index the metatable’s __index metafield, where __index can be a function or a table, which presents the same conundrum again (slower path).
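The two lookup paths listed above can be observed from script code. The snippet below is only a sketch: the Account class is a hypothetical name introduced for illustration, and rawget is used purely to show what is (and is not) stored in the object itself, not to measure speed.

```luau
-- Hypothetical class used only for this illustration.
local Account = {}
Account.__index = Account

function Account.GetBalance(self)
	return self.balance
end

local account = setmetatable({ balance = 10 }, Account)

-- 'balance' lives in the object itself, so a raw lookup finds it (fast path).
local rawBalance = rawget(account, "balance")

-- 'GetBalance' is not stored in the object; rawget ignores metatables and returns nil.
local rawMethod = rawget(account, "GetBalance")

-- A normal index misses the object and falls back to the metatable's __index table (slower path).
local resolvedMethod = account.GetBalance

print(rawBalance, rawMethod, resolvedMethod == Account.GetBalance)
```

Every method call on a metatable-based object goes through that fallback, which is the cost the closure approach tries to avoid.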
We want to maximise our fast paths, and assuming that we will be repeatedly indexing a function, it may be better to simply ‘inject’ it into our table to reduce the index time and improve performance when calling these ‘hot functions’.
This takes us to our second method for OOP, closures. This time around, we don’t necessarily use metatables for everything. Instead, each class holds all its state, as well as its functions, without needing to use metatables at any point. An example of closure-based OOP would be the following:
local lib = {}
function lib.new()
	local current = { --[[ Initialize Fields or Properties... ]] }
	-- Additional Initialization (if any)
	function current.hi(self) -- Define methods
		print("hi")
	end
	return current
end
Here we benefit from runtime performance but lose slightly on memory usage, as every object we create gets its own table node for each function.
Now, I’ll jump straight to my test results and what I have learnt from them so far, and what the most performant methods are (which are hilarious).
For metatables, the following is the fastest method of instantiation (x2 faster than the normal, conventional method):
local lib = {}
lib.__index = lib
local cachedTableWithStubs = { --[[ Stub fields ]] }
function lib.new(...)
	local newObject = table.clone(cachedTableWithStubs)
	-- Update the properties and fields with your custom state
	return setmetatable(newObject, lib) -- METATABLE MUST BE SET AFTER CLONING! (see remarks)
end
As for closures, the following is the fastest method of instantiation (x2.3 faster than the normal, conventional method):
local lib = {}
local cachedTableWithStubs = { --[[ Stub fields and functions here ]] }
function lib.new(...)
	local newObject = table.clone(cachedTableWithStubs)
	-- Update the properties and fields with your custom state
	return newObject
end
Remarks
Spooky remarks! We aren’t cloning a stub table that already has its metatable set. Why? Internally, Luau has the __metatable metafield. table.clone checks this metafield to verify whether the table can be cloned.
Because of this, if we call table.clone on a table with a metatable set, we incur an index into that metatable to check for the metafield. This is SLOW! It is so slow, in fact, that cloning with the metatable already set ends up slower than the normal instantiation (where we just build the table with {} and set our indexes), while the method above is almost 2x faster.
Details: Optimisation 2 (inlining). 10000 objects were instantiated in each of the three tests. This screenshot is part of the benchmarks run using Benchmarker!
As you can see, this method of instantiation, where we keep a ‘cached’ table, clone it and then overwrite the clone with our dynamic state, is cheaper for both ‘closure’ and ‘metatable’ OOP class instantiation. (The method is named Metatable-based Creation [Optimized (Metatable set each call)]; I’m a programmer, don’t judge the naming.)
Why choose one or the other?
It may come down to your preferences, or simply to whether you side with the ‘metatable’ or the ‘no-metatable’ part of the community. However, there is a clear deciding question if we look purely at the facts.
Do you value memory or runtime performance?
The average developer will likely be more interested in memory performance; it is more significant in the grand scheme of things, as you may be running on devices with tight constraints, where you are more likely to come across an OOM (out-of-memory) crash because there is simply no memory left to make tables or any other resource.
This, however, we can ignore because we are simply looking at which is fastest, not at memory usage. Measuring memory is more complicated, because we cannot just ‘spam’ call a function to get a good reading; we would have to calculate the number of bytes each method allocates and then ‘diff’ them. However, the equation for the bytes is roughly NodeCount * 32, which gives the approximate number of bytes your table allocates for all of its nodes. You can calculate the number of nodes by counting the key-value pairs that are not stored in the array part. This means that, yes, you will lose memory regardless in closure-based OOP, since it allocates more nodes for every object. So if you value memory, you want to use metatable OOP all the way, because theoretically it should result in low memory allocations at the cost of slightly slower runtime performance.
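As a rough illustration of the NodeCount * 32 estimate, the sketch below counts the keys that cannot live in a table's array part and multiplies by the approximate node size quoted above. The helper names (countHashKeys, estimateNodeBytes) are hypothetical, and the result is only an approximation: it ignores the table header, the array part, and any rounding the VM applies when it sizes the hash part.

```luau
local NODE_SIZE_BYTES = 32 -- approximate node size quoted in this post

-- Counts keys that cannot live in the array part: anything that is not
-- an integer in the range 1..#t. This is an approximation of the node count.
local function countHashKeys(t)
	local arrayLength = #t
	local count = 0
	for key in pairs(t) do
		local isArrayIndex = type(key) == "number"
			and key >= 1
			and key <= arrayLength
			and key % 1 == 0
		if not isArrayIndex then
			count += 1
		end
	end
	return count
end

local function estimateNodeBytes(t)
	return countHashKeys(t) * NODE_SIZE_BYTES
end

-- Closure-style object: state plus three functions stored per object.
local closureObject = {
	balance = 0,
	Withdraw = function() end,
	Deposit = function() end,
	GetBalance = function() end,
}
-- Metatable-style object: only the state is stored per object.
local metatableObject = { balance = 0 }

print(estimateNodeBytes(closureObject)) -- 4 nodes * 32 = 128
print(estimateNodeBytes(metatableObject)) -- 1 node * 32 = 32
```

Under these assumptions the closure-style object costs roughly four times the node memory of the metatable-style one, and the gap grows with every method added to the class.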
Now, let’s go into the graphs, the reasons, and the disassembly of the test cases to notice WHAT happens.
This output graph has some parts removed to show what I want to focus on first, the creation of the objects; later on in this post, we will be delving into the performance when using the objects!
All graphs, including the timings on the left.
This is a comparison of the ‘classic’ methods with the optimised methods we found during testing. The ‘classic methods’ are monikered [Normal], while our optimised methods are monikered [Optimized ...]. As you can see, the classic closure-based OOP instantiation is almost TWO TIMES slower than a normal metatable-based OOP instantiation. However, this does not come close to the 2x and 4x instantiation performance improvements for each of the metatable and closure optimised implementations.
Part 1: Closures Normal vs Optimised.
The key difference between the two lies in the disassembly. When you create a table programmatically, each index you add is transformed into a SETTABLE op. code. Luau contains an ‘optimized’ variant for constant string indexes, SETTABLEKS. The op. code, as defined in lbytecode.h, is as follows:
// SETTABLEKS: store source register into table using constant string as a key
// A: source register
// B: table register
// C: predicted slot index (based on hash)
// AUX: constant table index
LOP_SETTABLEKS,
If the op. code were translated into English, it would be: Set the value in this table using this constant string as a key.
Disassembly of the normal closure method:
-- Disassembled with Konstant V2.1's disassembler, made by plusgiant5
-- Disassembled on 2025-05-03 21:23:28
-- Luau version 6, Types version 3
-- Time taken: 0.000244 seconds
[0] #1 [0x00000041] PREPVARARGS 0 ; -- Prepare for any number (top) of variables as ...
[1] #2 [0x00040036] DUPTABLE 0, 4 ; var0 = {}
[2] #3 [0x00000104] LOADN 1, 0 ; var1 = 0
[3] #4 [0xC9000110] SETTABLEKS 1, 0, 201 [0] ; var0.balance = var1
local function Withdraw() -- Line 3
[0] #1 [0xC900020F] GETTABLEKS 2, 0, 201 [0] ; var2 = var0.balance
[2] #2 [0x01020222] SUB 2, 2, 1 ; var2 -= var1
[3] #3 [0xC9000210] SETTABLEKS 2, 0, 201 [0] ; var0.balance = var2
[5] #4 [0x00010016] RETURN 0, 1 ; return
end
[5] #5 [0x00050140] DUPCLOSURE 1, 5 ; var1 = Withdraw
[6] #6 [0x6A000110] SETTABLEKS 1, 0, 106 [1] ; var0.Withdraw = var1
local function Deposit() -- Line 6
[0] #1 [0xC900020F] GETTABLEKS 2, 0, 201 [0] ; var2 = var0.balance
[2] #2 [0x01020221] ADD 2, 2, 1 ; var2 += var1
[3] #3 [0xC9000210] SETTABLEKS 2, 0, 201 [0] ; var0.balance = var2
[5] #4 [0x00010016] RETURN 0, 1 ; return
end
[8] #7 [0x00060140] DUPCLOSURE 1, 6 ; var1 = Deposit
[9] #8 [0x9F000110] SETTABLEKS 1, 0, 159 [2] ; var0.Deposit = var1
local function GetBalance() -- Line 9
[0] #1 [0xC900010F] GETTABLEKS 1, 0, 201 [0] ; var1 = var0.balance
[2] #2 [0x00020116] RETURN 1, 2 ; return var1->var1
end
[11] #9 [0x00070140] DUPCLOSURE 1, 7 ; var1 = GetBalance
[12] #10 [0xBC000110] SETTABLEKS 1, 0, 188 [3] ; var0.GetBalance = var1
[14] #11 [0x00020016] RETURN 0, 2 ; return var0->var0
Code disassembled:
local self = {
	balance = 0,
	Withdraw = function(self: { balance: number }, amount: number)
		self.balance -= amount
	end,
	Deposit = function(self: { balance: number }, amount: number)
		self.balance += amount
	end,
	GetBalance = function(self: { balance: number }): number
		return self.balance
	end,
}
return self
As you can see above, the disassembly of our source code has plenty of SETTABLEKS op. codes. This is normal for this kind of design; after all, we are adding 4 new fields into the table before returning it. However, this is slow to do repeatedly. Instead, what if we make a base object, one whose state is all just stub values, and then clone it and use the clone?
That is exactly what the optimized version does! It only pays the cost of SETTABLEKS four times (for the initial instantiation), after which the table is cloned. Cloning a table like this is really fast, except in one edge case, which I will delve into in a bit. After cloning, only the dynamic state is set, that being balance. This way, we no longer have to deal with that many SETTABLEKS op. codes, but rather just one for the balance field. This puts us on roughly equal footing with metatable-based OOP creation, and it is even faster at creation. This could be because the interpreter’s DUPTABLE op. code, which duplicates a table from its constant version in the bytecode, executes more slowly than table.clone does, since table.clone is much more straightforward and simply clones the table without much beating around the bush most of the time.
Part 2: Metatable Normal vs Optimised.
Disassembly of the normal metatable method:
-- Disassembled with Konstant V2.1's disassembler, made by plusgiant5
-- Disassembled on 2025-05-03 21:42:16
-- Luau version 6, Types version 3
-- Time taken: 0.000266 seconds
[0] #1 [0x00000041] PREPVARARGS 0 ; -- Prepare for any number (top) of variables as ...
[1] #2 [0x00010136] DUPTABLE 1, 1 ; var1 = {}
[2] #3 [0x00000204] LOADN 2, 0 ; var2 = 0
[3] #4 [0xC9010210] SETTABLEKS 2, 1, 201 [0] ; var1.balance = var2
[5] #5 [0x0003020C] GETIMPORT 2, 3 [0x40200000] ; var2 = lib
[7] #6 [0x03013D4A] FASTCALL2 61, 1, 3 ; ... = setmetatable(var1, var2) -- Uses results from call at [11]. If successful, goto [12]
[9] #7 [0x0005000C] GETIMPORT 0, 5 [0x40400000] ; var0 = setmetatable
[11] #8 [0x02030015] CALL 0, 3, 2 ; var0 = var0(var1, var2)
::8::
[12] #9 [0x00020016] RETURN 0, 2 ; return var0->var0
Code disassembled:
return setmetatable({ balance = 0 }, lib)
In this sample, we create our table, then we set its field, and then we set its metatable to our lib.
May I remark that lib is not a global; it’s an upvalue. However, as this sample was compiled in isolation, that is not properly reflected in the bytecode!
The normal version goes through the DUPTABLE op. code, as expected. However, for some reason we benefit from using luaH_clone directly, without the interference of the Luau VM’s interpreter, something which I have not quite wrapped my head around, honestly! In the ‘optimised’ version, we do something similar to the DUPTABLE op. code, except more ‘explicitly’. Somehow, this ends up benefitting us in performance by as much as 2x!
Part 3: table.clone edge-case
table.clone has a specific edge case where it performs really slowly, so slowly, in fact, that it almost doubles the time it takes to execute .new. This edge case appears only on tables whose metatable is set. table.clone checks for the __metatable metafield and, if it is set, prohibits the clone with an error. Because of this, every time you call table.clone on a table that has a metatable, it is slower, since it additionally indexes into the metatable of the object to check for that metafield.
I found this out during testing and wrote a small comment about it; perhaps it explains things slightly better!
--[[
	table.clone makes a check on the metatable of the given argument.
	If the table has a metatable it will index the metatable in search of the '__metatable' field.
	If the metatable has a '__metatable' field, it will error.
	This is great; however, it incurs a penalty when cloning ANY table that has a metatable set, because the lookup happens whether or not '__metatable' exists.
	This explains why newDupeWithMetatableSet is much slower than newDupeWithMetatableNotSet and new, because table.clone is silently checking the metatable for the '__metatable' metafield.
]]
This explains why, if we are creating the object that we will duplicate with table.clone, it is preferable for it not to have a metatable.
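The behaviour described in this part can be checked with a few lines. This is a minimal sketch with hypothetical table names; it only demonstrates what table.clone does in each case (plain table, table with a metatable, table with a protected metatable) and says nothing about timing, which you should measure with the benchmark code in this post.

```luau
local plain = { balance = 0 }
local withMetatable = setmetatable({ balance = 0 }, { __index = {} })
local protected = setmetatable({ balance = 0 }, { __metatable = "locked" })

-- Plain table: a shallow copy, no metatable involved.
local plainClone = table.clone(plain)

-- Table with a metatable: the clone succeeds and shares the same metatable,
-- but table.clone had to look for '__metatable' first.
local metatableClone = table.clone(withMetatable)

-- Protected metatable: table.clone refuses to copy and raises an error.
local ok, message = pcall(table.clone, protected)

print(plainClone ~= plain, plainClone.balance)
print(getmetatable(metatableClone) == getmetatable(withMetatable))
print(ok, message)
```

Cloning the metatable-free stub and calling setmetatable on the result afterwards, as the optimised constructor does, skips that metafield lookup entirely.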
Below is the test code for all this post, which you can test yourself with the Benchmarker plug-in:
--!optimize 2
local ClosureAccount = {}
local cachedClosureAccount = {
	balance = 0,
	Withdraw = function(self: { balance: number }, amount: number)
		self.balance -= amount
	end,
	Deposit = function(self: { balance: number }, amount: number)
		self.balance += amount
	end,
	GetBalance = function(self: { balance: number }): number
		return self.balance
	end,
}
function ClosureAccount.dupeNew()
	local self = table.clone(cachedClosureAccount)
	self.balance = 0
	return self
end
function ClosureAccount.new()
	local self = {
		balance = 0,
		Withdraw = function(self: { balance: number }, amount: number)
			self.balance -= amount
		end,
		Deposit = function(self: { balance: number }, amount: number)
			self.balance += amount
		end,
		GetBalance = function(self: { balance: number }): number
			return self.balance
		end,
	}
	return self
end
local Metatablebased = {}
Metatablebased.__index = Metatablebased
function Metatablebased.Withdraw(self: { balance: number }, amount: number)
	self.balance -= amount
end
function Metatablebased.Deposit(self: { balance: number }, amount: number)
	self.balance += amount
end
function Metatablebased.GetBalance(self: { balance: number }): number
	return self.balance
end
function Metatablebased.new()
	return setmetatable({ balance = 0 }, Metatablebased)
end
local duped2 = { balance = 0 }
function Metatablebased.newDupeWithMetatableNotSet()
	local duped = table.clone(duped2)
	duped.balance = 0
	return setmetatable(duped, Metatablebased)
end
local accountClosures = ClosureAccount.dupeNew()
local accountMetatable = Metatablebased.newDupeWithMetatableNotSet()
return {
	ParameterGenerator = function()
		return
	end,
	Functions = {
		["Metatable-based Creation [Normal]"] = function(Profiler)
			for i = 1, 10000 do
				Metatablebased.new()
			end
		end,
		["Metatable-based Creation [Optimized (Metatable set each call)]"] = function(Profiler)
			for i = 1, 10000 do
				Metatablebased.newDupeWithMetatableNotSet()
			end
		end,
		["Closure-based Creation [Normal]"] = function(Profiler)
			for i = 1, 10000 do
				ClosureAccount.new()
			end
		end,
		["Closure-based Creation [Optimized]"] = function(Profiler)
			for i = 1, 10000 do
				ClosureAccount.dupeNew()
			end
		end,
		["Closure-based Usage"] = function(Profiler)
			for i = 1, 10000 do
				accountClosures.Deposit(accountClosures, 0)
				accountClosures.Withdraw(accountClosures, 1)
			end
		end,
		["Metatable-based Usage"] = function(Profiler)
			for i = 1, 10000 do
				accountMetatable.Deposit(accountMetatable, 0)
				accountMetatable.Withdraw(accountMetatable, 1)
			end
		end,
		["Closure-based Usage (Namecall)"] = function(Profiler)
			for i = 1, 10000 do
				accountClosures:Deposit(0)
				accountClosures:Withdraw(1)
			end
		end,
		["Metatable-based Usage (Namecall)"] = function(Profiler)
			for i = 1, 10000 do
				accountMetatable:Deposit(0)
				accountMetatable:Withdraw(1)
			end
		end,
	},
}
Part 4: Call performance
As expected, the metatable-free approach has the best call performance, being roughly x1.25 faster to call when using dot-indexing with a known index. This detail is significant, since this test assumes that the op. code used is GETTABLEKS, not GETTABLE!
It is worth noting that there is a difference between namecall and index calls. NAMECALL is its OWN op. code, contrary to what some may believe, and it takes its own path in the Luau interpreter. NAMECALL has some specific paths for __index calls, because they are really frequent on both userdata and table objects.
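For reference, the two call styles compared in the benchmark look like this in source form. This is a small sketch with a hypothetical account table; both calls do the same thing at the language level, and the difference discussed above is only in which op. codes the compiler emits for them.

```luau
local account = { balance = 0 }

function account.Deposit(self, amount)
	self.balance += amount
end

-- Dot call: the method is fetched with a regular index (GETTABLEKS for a
-- constant key) and the object is passed explicitly as the first argument.
account.Deposit(account, 5)

-- Colon call: the compiler emits NAMECALL, which looks up the method and
-- passes the object as 'self' in a single step.
account:Deposit(5)

print(account.balance)
```

Since the results are identical, choosing between the two forms in a hot path is purely a performance question, which is why the benchmark measures each of them separately.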
Regardless, what we can take away from all this is simple:
- Closure-based OOP is less performant memory wise, but the calling performance is slightly faster.
- Metatable-based OOP is more performant on the memory aspect, but the calling performance is slightly slower.
However, remember! You must always test on your own and try to prove the post you are reading wrong! In the end, both methods have their uses; they just need to be properly managed.
The ‘optimized’ versions are likely faster for one or more of the following reasons:
- The slots for the ‘nodes’ of the tables will already be allocated, and indexing them will succeed.
- Modifying an index on the cloned tables likely does not incur a re-size.
- Because the table is built only once, we avoid paying for all the ‘SETTABLEKS’ op. codes on every instantiation. That cost is much more significant in closure-based OOP, where we set a new index for each function, but the approach can also pay off when you have a ‘default’ state for your class and would benefit from these stub values.
In the end, this is on the level of micro-benchmarking.
Cheerio to whoever reads this; this post was written hastily. I sincerely believe it is more of an ‘information dump’ than a proper explanation documenting everything that is truly going on. However, I’m sure that you, developer, can make proper use of this information in your next project, or to develop even more insane ways to save on performance. If only we all looked at the interpreter and its surrounding parts to try and exploit every single bit of performance!
And remember…
If you would be a real seeker after truth, it is necessary that at least once in your life you doubt, as far as possible, all things.
- René Descartes (1596-1650)