Disclaimer, because I am sure that the content of this post may be controversial for some:
-
I’m in no way a professional software developer; I’m just a guy who can read C++ and its comments, and who works on his own Studio executor in his free time. I am simply sharing what I believe to be good, useful knowledge.
-
This is a small tutorial focused on optimizing your code style, design, and overall structure so we can cheat our way into slightly better performance and memory usage, taking advantage of things consciously instead of unconsciously.
Like a great mind once said…
If you would be a real seeker after truth, it is necessary that at least once in your life you doubt, as far as possible, all things.
- René Descartes (1596-1650)
So doubt everything in this post and test it yourself; perhaps it will be different in your case!
Lexicon
Up-References/Up-Values:
- They are references made to variables outside of the scope of a function.
local dummyVariable = 0
local function test() -- nups (number of upvalues): 1.
    print(
        dummyVariable -- Captures dummyVariable as an Up-Reference
    )
end
test()
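To make the idea a bit more concrete, here is a minimal sketch of my own (`makeCounter` is a made-up helper, not something from Luau itself). It shows that every closure created this way owns its own up-value, which matters later when we talk about closure allocation:

```lua
-- Hypothetical helper: each call returns a closure that owns its own `count`.
local function makeCounter()
    local count = 0 -- captured by the function below, so it becomes an up-value
    return function()
        count = count + 1
        return count
    end
end

local a = makeCounter()
local b = makeCounter()

print(a()) -- 1
print(a()) -- 2
print(b()) -- 1 (b has its own `count`, separate from a's)
```

Since `a` and `b` each carry a different `count`, they cannot be the same function object; keep that in mind for the sections below.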
Virtual Machine (programming context):
- It is software that interprets a specific format of ‘bytecode.’
Bytecode:
- Optimized representation of source code that a virtual machine can interpret to do operations.
Operation Code (or Op. Code):
- It is an instruction that a virtual machine or CPU can interpret. We will be focusing on the first one.
Where to begin?
We should begin by understanding why Luau was even conceived. Luau started out as an alternative VM, whose full rollout was announced in 2019 in the post Faster Lua VM Released | Roblox Developer Forum. Up to that point, we had been using Lua 5.1, whose performance, while good, was not perfect, and Roblox had started noticing its shortcomings. The new VM also had the unintended side effect of breaking cheats, since its big changes included brand-new op codes and a new bytecode format.
The performance
Luau comes packed with performance. As the engineer put it 6 years ago, some scripts ran almost twice as fast! These optimizations are spread all over the place; however, we can still squeeze out more by consciously taking the most optimal paths.
The fast-path(s)
The Luau VM has many implementations for its many opcodes. These have multiple ‘paths.’ These paths are normally labeled ‘fast-path’ and ‘slow-path.’ We want to MAXIMIZE the number of ‘fast paths’ we take in order to improve our performance and efficiency.
A key to taking these fast paths is making sure our environment never loses its ‘safe’ flag. In short, the ‘safe’ flag marks the environment that is currently running as one whose globals have not ‘changed’ uncontrollably while executing. You can only lose the ‘safe’ flag in one of the following ways (which I know of at least!):
- Using getfenv
- Using setfenv
- Using loadstring
These functions are either deprecated or their usage is not recommended.
The flag, however, allows the interpreter to make some assumptions about the environment, such as the fact that once a non-mutable global is retrieved, it is guaranteed to not change throughout the script’s lifetime. This takes us to our first op code.
GETIMPORT
GETIMPORT is an operation code that is basically an optimized GETGLOBAL.
When the bytecode is loaded into the Luau VM using the luau_load function, it does a few operations:
- Check bytecode version
- Check types version
- Load string table
- Load userdata type mappings
- Load all the functions in the bytecode.
- Find the ‘main’ function
- Create a Closure object that represents the Luau function that was just loaded.
Simple and easy! However, we are interested in the 5th step, ‘Load all the functions in the bytecode.’ During this step, the Luau VM loads the type information, instructions, constants, and debugging information (if available) of each function. It is while loading the constants that the magic happens.
There are different kinds of constants, namely:
- nil
- boolean
- number
- vector
- string
- table
- closure
And there is another one, which is specific to GETIMPORT, the import constant.
This constant is what gives GETIMPORT its speed. The globals that are marked non-mutable are resolved and saved into the constants table.
Then, when GETIMPORT’s interpreter implementation is called, it will first check if the import resolution was successful, and if it was and the environment is ‘safe,’ it retrieves the imported global from the constants table, skipping a table lookup that could have potentially caused a __index call, saving precious execution time.
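In the spirit of doubting everything, here is a rough micro-benchmark sketch you could adapt to check this yourself. It compares reading `math.sqrt` through the global on every iteration with the classic "localize it first" version. The helper `timeIt` and the iteration count are my own inventions for illustration, and the timings will vary by machine and optimization level, so I am deliberately not listing any expected numbers.

```lua
-- Hypothetical micro-benchmark helper; `timeIt` is not part of any library.
local function timeIt(label, fn, iterations)
    local start = os.clock()
    local result = fn(iterations)
    local elapsed = os.clock() - start
    print(string.format("%s: %.4f s", label, elapsed))
    return result, elapsed
end

-- Reads `math.sqrt` through the global on every iteration.
local function viaGlobal(n)
    local sum = 0
    for i = 1, n do
        sum = sum + math.sqrt(i)
    end
    return sum
end

-- The "localize everything" version: `sqrt` is captured as an up-value.
local sqrt = math.sqrt
local function viaLocal(n)
    local sum = 0
    for i = 1, n do
        sum = sum + sqrt(i)
    end
    return sum
end

local ITERATIONS = 1000000 -- arbitrary; raise it if the timings are too noisy
local globalSum = timeIt("global math.sqrt", viaGlobal, ITERATIONS)
local localSum = timeIt("local sqrt", viaLocal, ITERATIONS)

-- Both versions must compute the same thing, otherwise the comparison is unfair.
assert(globalSum == localSum)
```

If the description above holds, the gap between the two should be small on Luau as long as the environment stays ‘safe,’ but treat that as a hypothesis to measure, not a promise.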
DUPCLOSURE
DUPCLOSURE simply reuses or duplicates a closure if certain, specific conditions are met.
To simplify, a Closure object represents an instance of a function that the Luau interpreter can call at any point. These have two underlying implementations: lua_CFunction and Proto. The first are functions exposed from C/C++, while the latter are Luau functions exposed from bytecode.
The Luau compiler will emit the DUPCLOSURE op. code only if it is sure that there is a chance of reusing the function. When every instance of the Closure object would have its own unique up-value, the NEWCLOSURE op. code is emitted instead, which guarantees a closure allocation.
So, if we use, say, this code:
local lib = {}
function lib.create()
    local createdObject = {}
    function createdObject.printObject()
        print(createdObject)
    end
    return createdObject
end
print(lib.create().printObject)
print(lib.create().printObject)
print(lib.create().printObject)
print(lib.create().printObject)
Then the disassembly of the instructions of lib.create when it is compiled with Optimization 0+ is the following:
NEWTABLE R0, 1, 0 ; Create a new table and store it into R0. The size of the hash-table will be '1' and the size of the array will be '0'.
NEWCLOSURE R1, P0 ; Create a new closure from P0 (proto 0), and save it into R1.
CAPTURE VAL R0 ; Capture the value at R0 'createdObject'
SETTABLEKS R1 R0 K0 ; Set the index K0 'printObject' in the table present at R0 'createdObject' to whatever is in R1 (the function we just created).
RETURN R0, 1 ; Return n (1) registers from R0
This is all to say that we create a new function that captures (holds an up-reference to) the table we created. Then we store that function under an index of the table and return the table.
Unfortunately, every single time we call this function, we create a new closure.
This can be proven just by loading the code, calling the function a few times, and looking at what we obtain on our print:
Different functions every. single. call.
However, if we slightly modify our implementation to the following:
local lib = {}
function lib.create()
    local createdObject = {}
    function createdObject.printObject(self)
        print(self)
    end
    return createdObject
end
print(lib.create().printObject)
print(lib.create().printObject)
print(lib.create().printObject)
print(lib.create().printObject)
Then the DUPCLOSURE op code is emitted by the compiler instead of NEWCLOSURE.
Running the code in Studio yields the following output:
Just like magic, we are using the same function over and over again; we are not sacrificing memory for performance like people continue to believe.
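If you would rather not compare printed addresses by eye, here is a small sketch of my own that checks function identity directly with rawequal. The names `createCapturing` and `createWithSelf` are made up for this example; they mirror the two versions of lib.create above, returning the object instead of printing it.

```lua
-- Version A: the method captures `object`, so every closure has a unique up-value.
local function createCapturing()
    local object = {}
    function object.describe()
        return object
    end
    return object
end

-- Version B: the method receives `self` and captures nothing.
local function createWithSelf()
    local object = {}
    function object.describe(self)
        return self
    end
    return object
end

-- Compare the functions stored on two separately created objects.
local sameA = rawequal(createCapturing().describe, createCapturing().describe)
local sameB = rawequal(createWithSelf().describe, createWithSelf().describe)

print("capturing version shares one function:", sameA)
print("self version shares one function:", sameB)
```

The capturing version cannot share a function, since each closure holds a different table. For the self version, the reasoning in this section predicts a shared function on Luau, but other Lua runtimes may allocate a new closure on every call, so run it in your own environment before trusting it.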
These are really the two op codes I wanted to go over in this community tutorial, mainly because I have encountered more than one person who has gone out of their way to say, ‘Localize everything!’ or ‘You will be wasting memory doing this!’ This is simply to prove to you, dear reader, that no, you are not wasting resources doing this; you’re simply deciding not to use metatables and making sure to take the fastest VM paths possible where applicable.
I hope to release a follow-up post that disproves the ‘advantage’ of metatables in some respects; first, though, I would like feedback on this whole post, suggestions on what I could improve, and perhaps even users running their own tests. In any case, it is fairly clear that if no functions are created, we don’t waste time creating and allocating the object, which is quicker regardless!
Let me know your thoughts down below!