Luau Bytecode EXPLAINED - How to Read, Debug, and Optimize Like a Hacker

Yarik_superpro · September 18, 2025, 5:20pm

Understanding Bytecode - The Hidden Language of Luau

Hello to all developers brave (or crazy) enough to look behind the curtain and learn how Luau actually works.

Is it worth learning Bytecode?

If you started with Luau less than a year ago, I would say no, but then again it depends on your personality.
If you have been familiar with Luau for over a year, it’s a 100% YES: it is worth the time to understand what your code becomes.

TL;DR: Bytecode = what your code actually becomes. Learn it to debug, optimize, and outsmart Roblox’s compiler.

Reasons to learn Bytecode

destroying skids “hahaha gg ez noob”
actual working optimizations
Noob vs Pro vs Hacker vs God [spoiler](I went insane writing this post; I’m sorry.)[/spoiler]
verifying and testing paradigms to see whether they fit Luau or not.
actual awareness of what you code.
not being scammed by “Ultra Optimized Framework Inc.”
Bytecode is the final boss of debugging → beat it once, and nothing in Roblox will ever feel mysterious again.
You stop being scared of “unknown” behavior because you can literally see the machine’s brain.

Why Most Optimizations Fail

The main reason optimizations flop is simple: most devs don’t actually know what Luau is doing under the hood.
Luau (Roblox’s language) is made up of four key parts:

1. Bytecode - The low-level instructions your code becomes.
2. Constant Pool - A separate table holding all your literals (numbers, strings, etc.).
3. Bytecode Compiler - Turns your Luau source into bytecode + constant pool.
4. VM/Interpreter - Reads and executes the bytecode instruction by instruction.

Understanding this pipeline is the first step to writing fast code - because you’re not just writing for yourself, you’re writing for the VM that runs it.

Stack-Based vs. Register-Based VMs

There are two major types of bytecode interpreters used in programming languages:

Stack-based: Every instruction works by pushing and popping values from a stack.
Register-based: Instructions directly reference numbered “registers,” like a little array of slots.

Luau is register-based. The registers themselves live in a per-function slice of the VM's value stack, which is why you will sometimes hear both terms used for it.

Think of registers as a fixed-size table. A single function can use at most 255 registers, and at most 200 of them can be named locals.

If you try to use more registers than exist (e.g. creating too many locals or temporary values in one function), compilation fails with an "out of registers" error. That’s why understanding how many registers your code consumes is important for writing optimal (and even valid) code.

TL;DR: Bytecode = 4 bytes per instruction word (a few instructions carry an extra 4-byte AUX word), constants live separately, registers = up to 255 slots with at most 200 locals, LOADK/LOADN/LOADB are your main tools. Understanding this lets you optimize code and read what Luau really does.

Optimization levels 0-2

Optimization levels are options you can force in your script by setting a specific flag on its first lines:
Paste: --!optimize (number from 0 to 2 here), for example:

--!optimize 2

In live games (outside of Studio) all scripts have optimize 2 set.
It keeps the behavior of the algorithm but changes the bytecode significantly to be more optimized.
Here is some additional info:

Seeing Bytecode in Action

Let’s look at a simple piece of code:

local a = 5
local b = 10
return a + b

In raw form it looks like something scary (not real opcode values):

05 00 00 05
05 01 01 0A
33 02 00 01
22 02 01 00

But we can make it human-readable using a disassembler. There are several tools out there, such as luau-compile.exe: https://github.com/luau-lang/luau/releases/download/0.690/luau-windows.zip or Luau Bytecode Explorer: https://lbe.lonegladiator.dev

How to run a disassembler

There are 2 options to run a disassembler:

Manual install (it's easy dw bro)

Go to: Releases · luau-lang/luau · GitHub
Scroll down until you find:

Select the one you need (luau-windows.zip in my case)
Extract it
Create a file, something like test.luau, and insert your code inside it.
Now hold Shift + right-click an empty space in the extracted folder (luau-windows in my case) and press open with PowerShell here
And now run:
.\luau-compile.exe binary -O0 .\test.luau
It will output the disassembled Luau bytecode.
You can also change optimization levels:
.\luau-compile.exe binary -O1 .\test.luau
.\luau-compile.exe binary -O2 .\test.luau
Etc…
And that is pretty much it! Happy usage!

Web disassembler

NOTE: This is not an official Luau web disassembler and there is no guarantee that this website is safe!

If you paste private code into web tools, don’t include API keys or secure code.

https://lbe.lonegladiator.dev/
Very simple:
You just paste code on the left side and see the disassembled version on the right side;
You can seamlessly change optimization levels and that's pretty much it LOL.

This website will also automatically switch to the optimization level you want if you type “--!optimize 2”, for example, at the beginning of your code.

Debug Levels

You can set a debug level to prevent the compiler from throwing away unused lines of code etc.
Either something like:
./luau-compile.exe binary -O2 -g2 ./test.luau
for the installed disassembler,
or

if you use the website version.

Why?

Without debug mode this LOADN instruction would have been thrown away, for example:

local a = 1

Please note that my visualization of bytecode in this tutorial differs from the one you would see in a disassembler. I did it this way to make it look more beginner friendly.

Constants:
    [0] = 5;
    [1] = 10;

Function 0 (??):

LOADK R0 K0-- [0] 5
LOADK R1 K1-- [1] 10
ADD R2 R0 R1
RETURN R2 1

Breaking It Down

Instruction	What It Means
`LOADK R0 K0`	Load constant `5` ([0] from constant pool) into register `R0`.
`LOADK R1 K1`	Load constant `10` ([1] from constant pool) into register `R1`.
`ADD R2 R0 R1`	Add `R0 + R1` and store in `R2`. Essentially: `R2 = R0+R1`.
`RETURN R2 1`	Return the value in `R2`. 1 = number of values returned. (more about that later)

What is a LOADK

LOADK [A] [B]
A = Register to load.
B = Constant.
Essentially: A = Constants[B]

What is an ADD

ADD [A] [B] [C]
A = Register to load.
B = Register.
C = Register.
Essentially: A = B + C

Notice how:

The constants (5 and 10) live in the constant pool.
The registers (R0, R1, R2) are just slots in a little “virtual array.”
The bytecode never directly stores 5 or 10 - it stores an index into the constant pool.
Each bytecode instruction is a 4-byte word (some instructions are followed by a second 4-byte AUX word with extra operands)!
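To make the listing above concrete, here is a toy model of those four instructions. It is a sketch in Python, not how the real Luau VM is written: the tuple layout and the `run` helper are made up for illustration.

```python
# Toy model of LOADK / ADD / RETURN; not the real Luau VM.
def run(code, constants):
    regs = [None] * 8  # a small register file, just for this demo
    for op, a, b, c in code:
        if op == "LOADK":       # R[a] = K[b]
            regs[a] = constants[b]
        elif op == "ADD":       # R[a] = R[b] + R[c]
            regs[a] = regs[b] + regs[c]
        elif op == "RETURN":    # return b values starting at R[a]
            return regs[a:a + b]
    return []

program = [
    ("LOADK", 0, 0, None),   # LOADK R0 K0
    ("LOADK", 1, 1, None),   # LOADK R1 K1
    ("ADD", 2, 0, 1),        # ADD R2 R0 R1
    ("RETURN", 2, 1, None),  # RETURN R2 1
]
result = run(program, [5, 10])
print(result)
```

The instructions only ever carry register numbers and constant indices; the values 5 and 10 live in the separate `constants` list.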

Why This Matters

If we rewrote the code like this:

return 5 + 10

The compiler might optimize it entirely into:

Constants:
    (literally empty)

LOADN R0 15
RETURN R0 1

…What just happened?!
The compiler added 5 + 10 at compile time (constant folding), and the bytecode ditched the constant pool entirely and stored the value directly!
Why?
You see, reserving a constant slot for a value that is only used once is pretty dumb, and the compiler thinks so too!
This way we learned about a new operation: LOADN. Do keep in mind that LOADN only gets used at --!optimize 1 and above, so --!optimize 1 or --!optimize 2.
So LOADK is a thing of the past and is now 100% useless? WRONG!

local t = {"hello", "world"}

Constants:
    [0] = "hello";
    [1] = "world";

NEWTABLE R0 0 2
LOADK R1 K0-- [0] "hello"
LOADK R2 K1-- [1] "world"
SETLIST R0 R1 2
RETURN R0 0

Breakdown of what just happened

Instruction	What It Means
`NEWTABLE R0 0 2`	Creates table inside R0 with 0 hash size and 2 array size.
`LOADK R1 K0`	Load constant `hello` ([0] from constant pool) into register `R1`.
`LOADK R2 K1`	Load constant `world` ([1] from constant pool) into register `R2`.
`SETLIST R0 R1 2`	Loads ALL registers in array-order from R1 to R2 into R0 table.
`RETURN R0 0`	Returns nothing (zero values).

SETLIST Explained

Basically:
[SETLIST] [A] [B] [C]

Param	Meaning
A	Target table register
B	Starting register
C	Number of registers to copy

The last register copied is B’s register number + (C-1)
So if B = R7 and C = 5 then range = R7 to R11
5 registers total: R7, R8, R9, R10, R11

Note, though, that it can only set 16 registers per operation
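The register-range arithmetic above can be sketched in a few lines of Python. The helper names are invented for this example, and the batch size of 16 is simply the limit stated above, not something this sketch verifies.

```python
def register_range(start, count):
    """Registers covered by SETLIST with B = start and C = count."""
    return [f"R{start + i}" for i in range(count)]

def split_into_batches(item_count, batch=16):
    """How many items each SETLIST would carry under a 16-item limit."""
    sizes = []
    while item_count > 0:
        sizes.append(min(batch, item_count))
        item_count -= sizes[-1]
    return sizes

print(register_range(7, 5))
print(split_into_batches(40))
```

Under that assumption, a 40-item array literal would need three SETLIST instructions rather than one.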

NEWTABLE Explained

[NEWTABLE] [A] [B] [C]
A = Register
B = Hash Size
C = Array Size

Both Array Size and Hash Size can co-exist!

What is a Hash Size?

Imagine tables like this:
local t = {[2]=true;[4]=true}
or like this:
local a = {["Hello"]=2;[true]="Lol"}

Essentially any table piece that breaks the consecutive number index range or does not use a number as a key is considered a “hash part”

What is an Array Size?

Imagine tables like this:
local hi = {"Hello";"World";"!";}
or like this:
local t = {[1]=true;[2]="Happy";}

Essentially any table piece that follows consecutive order (from 1 to inf) is considered an “array part”
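The rule of thumb above can be written down as a small classifier. This is a Python sketch of the rule as stated in this post (keys taken in source order), not of how the real compiler or runtime sizes tables; `split_parts` is a made-up name.

```python
def split_parts(keys):
    """Split constructor keys into (array part, hash part) by the rule of thumb."""
    array_part, hash_part = [], []
    expected = 1  # next index that would continue the 1..n run
    for key in keys:
        is_int = isinstance(key, int) and not isinstance(key, bool)
        if is_int and key == expected:
            array_part.append(key)
            expected += 1
        else:
            hash_part.append(key)
    return array_part, hash_part

print(split_parts([1, 2]))            # {[1]=true;[2]="Happy"}
print(split_parts([2, 4]))            # {[2]=true;[4]=true}
print(split_parts(["Hello", True]))   # {["Hello"]=2;[true]="Lol"}
```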

Still used LOADK? Why?
That's because LOADN is used ONLY for INTEGER numbers from -32768 to 32767 (16 bit):

Why 16 bits?

Why not 8?
The opcode and the register slot already take 16 bits (8 * 2), which leaves two 8-bit slots.
LOADN simply merges them into a single signed 16-bit slot.
1 byte = 8 bits.
A bytecode instruction word is 4 bytes (so 32 bits).
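Here is a small Python sketch of that packing: 8 bits of opcode, 8 bits of register and one signed 16-bit value in a 32-bit word. The field positions follow the open-source Luau headers as far as I know; the opcode number used for LOADN is only a placeholder, and the helper names are made up.

```python
LOADN = 4  # placeholder opcode number, assumed for the demo

def encode_ad(op, a, d):
    """Pack opcode, register A and a signed 16-bit D into one 32-bit word."""
    return (op & 0xFF) | ((a & 0xFF) << 8) | ((d & 0xFFFF) << 16)

def decode_ad(word):
    """Unpack a word produced by encode_ad."""
    op = word & 0xFF
    a = (word >> 8) & 0xFF
    d = (word >> 16) & 0xFFFF
    if d >= 0x8000:  # reinterpret the 16 bits as a signed value
        d -= 0x10000
    return op, a, d

def fits_loadn(value):
    """True when a number is a whole value inside the signed 16-bit range."""
    return float(value).is_integer() and -32768 <= value <= 32767

word = encode_ad(LOADN, 2, -9)  # models LOADN R2 -9
print(decode_ad(word))
print(fits_loadn(32767), fits_loadn(32768), fits_loadn(1.5))
```

Anything that fails `fits_loadn` (big numbers, fractions) is what still needs LOADK and a constant pool slot.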

Values like Booleans use their own OPCODE: LOADB R 0-1 --(false/true)

One instruction, one register.
This is why understanding bytecode is powerful - it helps you see where the compiler is doing extra work and where you can make its job easier.

Ok what is this buzzword “opcode”?

Opcode stands for Operation Code blah blah blah.
Basically it is a sort of Enum/ID for an operation.

So for example LOADK is actually a value from 0 to 255 internally (let's assume it is 05).
A bytecode operation has 4 bytes to store information: [Opcode] [A] [B] [C]
To make that concrete, here is an actual code example:

local function CALL(Arg1,Arg2,Arg3)
end

So we have a function “CALL”, and OPCODES are essentially sort of like functions.
We can pass at most 3 arguments to one.
So CALL R0 0 0 is roughly like CALL(print, 0, 0) in Luau (assuming R0 holds print).
An operand of CALL can also be -1, in which case it acts like a tuple (…), aka varargs.

What does -1 mean?

It simply means… ALL!
Just treat -1 as a very huge number. [spoiler](although less than 255)[/spoiler]
Under the hood the operand byte stores the count plus one, and 0 is reserved for “all”, which is why disassemblers print it as -1.
Remember SETLIST?
Essentially SETLIST R0 R1 -1 is like SETLIST R0 R1 999999 (abstract example but you get it)
The same goes for CALL, RETURN and the other OPCODES that take a value count.
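A tiny Python sketch of where the -1 comes from, assuming the "count plus one, 0 means all" convention used by the count operands of CALL and RETURN. The function names are made up for the example.

```python
def encode_count(count):
    """Store a value count in one operand byte; None stands for "all"."""
    if count is None:
        return 0
    if not 0 <= count <= 254:
        raise ValueError("count does not fit in the operand byte")
    return count + 1

def displayed_count(field):
    """What a disassembler prints for that byte."""
    return field - 1

print(displayed_count(encode_count(3)))
print(displayed_count(encode_count(None)))
```

So -1 is not a real stored value: it is just the byte 0 after the disassembler subtracts one.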

FUN FACT

You may see something like LOADB R1 0 +1 instead of a plain LOADB R1 0
The +1 is a jump offset: after loading the value, the VM skips the next instruction. (more info about jumps later)

Next up we’ll talk about jumps, why the compiler keeps MOVEing stuff around, and the secret FASTCALL instructions that let you skip function overhead.

CALL explained

So here is our code example:

local hm = 2
print(hm)

GETIMPORT R0 1-- print
LOADN R1 2
CALL R0 1 0
RETURN R0 0

Explanation:

Instruction	What It Means
`GETIMPORT R0 1`	Load function `print` into register `R0`.
`LOADN R1 2`	Load number `2` into register `R1`.
`CALL R0 1 0`	Calls `R0` with `R1` as argument.
`RETURN R0 0`	Returns nothing (zero values).

What is GETIMPORT

GETIMPORT allows you to load Luau globals, with up to 3 levels of depth (e.g. game.Workspace.Part is the limit; you can’t index further with GETIMPORT alone)

local game = game
local print = print

[GETIMPORT] [A] [B]
A = Register to load
B = Constant-table index of the import entry. The numbering looks like it has gaps because the name strings ("print", "math", "abs", ...) take constant slots of their own.
Think of it like this:

The compiler stores each name as a string constant and then adds the import entry itself:
[0] = "print"; [1] = import print; [2] = "game"; [3] = import game; ...

Then your bytecode does:

GETIMPORT R0 1  -- fetches 'print'
GETIMPORT R1 3  -- fetches 'game'

That also means that game.Workspace.Part is actually going to be:

[0] = "game"; [1] = "Workspace"; [2] = "Part"; [3] = import game.Workspace.Part

GETIMPORT R1 3  -- game.Workspace.Part
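The indices that GETIMPORT shows can be reproduced with a small Python model of the constant table. This is a sketch under assumptions: each name in the path gets its own string constant, the import entry is appended after them, and the AUX word packs the path length in the top 2 bits followed by three 10-bit name indices. `add_import` and the `("import", ...)` tuple are made up for illustration.

```python
def add_import(constants, path):
    """Append an import for `path`; return (constant index, AUX word)."""
    ids = []
    for name in path:
        if name not in constants:   # name strings are shared constants
            constants.append(name)
        ids.append(constants.index(name))
    aux = len(ids) << 30            # path length in the top 2 bits
    for slot, cid in enumerate(ids):
        aux |= cid << (20 - 10 * slot)  # three 10-bit name indices
    constants.append(("import", ".".join(path)))
    return len(constants) - 1, aux

constants = []
idx_print, aux_print = add_import(constants, ["print"])
idx_abs, aux_abs = add_import(constants, ["math", "abs"])
print(idx_print, idx_abs)
print(constants)
```

With `print` imported first and `math.abs` second, the import entries land on indices 1 and 4, which matches the `GETIMPORT R0 1` and `GETIMPORT R1 4` pair in the FASTCALL listing later in this post.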

How CALL actually works

[CALL] [A] [B] [C]
A = Register with the function we call
B = Number of args to pass into the function (CALL R5 3 0 takes the 3 registers after R5: R6, R7, R8)
C = Number of values to be returned

C: This is the part where most of the problems happen: the call writes its results starting right at its own function register. So, for example, if the function returns 5,4,3 and we have CALL R0 0 3, it will do this:
R0 = 5
R1 = 4
R2 = 3
So our function is now… GONE! This is why the compiler is so pessimistic, but we will discuss that in the next section.
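The overwrite described above is easy to see in a toy model. The Python sketch below treats registers as a plain list and uses a made-up `call` helper; it is an illustration of the register convention, not the real VM.

```python
def call(regs, a, nargs, nresults):
    """Model of CALL R[a] nargs nresults: results overwrite R[a] onwards."""
    func = regs[a]
    args = regs[a + 1 : a + 1 + nargs]
    results = list(func(*args))[:nresults]
    results += [None] * (nresults - len(results))  # pad missing results with nil
    regs[a : a + nresults] = results

regs = [lambda: (5, 4, 3), None, None]
call(regs, 0, 0, 3)   # CALL R0 0 3
print(regs)
```

After the call, R0 no longer holds the function, so calling it again means loading it into a register again first.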

Why is the compiler so pessimistic, and why does it always MOVE values into consecutive order?

As you have seen in “How CALL actually works”, the function and its arguments have to sit in consecutive registers to interact properly, and the call then overwrites them.
This is why the compiler likes to MOVE/IMPORT the same function/value multiple times.

Speaking of MOVE,
here is how it works:
MOVE [A] [B]
A = Register to load.
B = Register we get the value from.
Essentially A = B

JUMP Operations and If statements

This is a pretty easy part.
Jumps essentially let your bytecode instructions “jump” to another place.

local print = print
if true then
    print("its true")
else
    print('its false')
end

print("hii")

Bytecode:

1.  GETGLOBAL R0 K0-- "print"
2.  LOADB R1 1-- true
3.  JUMPIFNOT R1 L8--if R1 is not true then jump to line 8
4.  MOVE R1 R0-- moves function "print" to R1
5.  LOADK R2 K1-- "its true"
6.  CALL R1 1 0
7.  JUMP L11--Jump to line 11 and skip other instructions behind
8.  MOVE R1 R0-- moves function "print" to R1
9.  LOADK R2 K2-- "its false"
10. CALL R1 1 0
11. MOVE R1 R0-- moves function "print" to R1
12. LOADK R2 K3-- "hii"
13. CALL R1 1 0
14. RETURN R0 0-- exit code

What is JUMP and JUMPBACK and JUMPX

JUMP (or as I like to call it, LEAP - sounds cooler)
Is an OPCODE that skips instructions and continues at a certain line (e.g. from line 11 to line 2).
This opcode is fairly rare, since the conditional jumps (JUMPIFNOT, JUMPIFEQ, JUMPIFNOTEQ, JUMPIF…) are much more common.

JUMP [A]
A = line to jump to.
JUMPBACK [A]
A = line to jump to. (same as JUMP, except it jumps backwards and gives the VM a safe point to interrupt while/repeat loops)
JUMPX [A]
A = line to jump to. (same as JUMP, but with a bigger offset field for very long jumps)
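The "line to jump to" in these listings is a convenience. As far as I know, the stored operand is a signed offset counted in instruction words and applied after the jump instruction itself has been fetched. A minimal Python sketch of that arithmetic (function names are made up):

```python
def jump_target(pc, offset):
    """Index of the instruction reached by a jump located at index `pc`."""
    return pc + 1 + offset

def jump_offset(pc, target):
    """Offset a compiler would store to get from `pc` to `target`."""
    return target - (pc + 1)

print(jump_target(6, 3))    # forward jump
print(jump_target(7, -8))   # backward jump, as used by loops
print(jump_offset(6, 10))
```

An offset of 0 therefore means "continue with the next instruction", and negative offsets go backwards.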

What is JUMPIFNOT

JUMPIFNOT [A] [B] JUMP IF NOT
A = Register for condition.
B = line to jump to.
If A is falsy (false/nil), jumps by B.

What is JUMPIF

JUMPIF [A] [B] JUMP IF
A = Register for condition.
B = line to jump to.
If A is truthy (anything except false and nil), jumps by B.

What is JUMPIFEQ

JUMPIFEQ [A] [B] [C] JUMP IF EQUAL
A = Register for condition.
B = Register for condition.
C = line to jump to.
If A == B then jumps by C.

What is JUMPIFNOTEQ

JUMPIFNOTEQ [A] [B] [C] JUMP IF NOT EQUAL
A = Register for condition.
B = Register for condition.
C = line to jump to.
If A ~= B then jumps by C.

What is JUMPIFLE

JUMPIFLE [A] [B] [C] JUMP IF LESS OR EQUAL
A = Register for condition.
B = Register for condition.
C = line to jump to.
If A <= B then jumps by C.

What is JUMPIFNOTLE

JUMPIFNOTLE [A] [B] [C] JUMP IF NOT LESS OR EQUAL
A = Register for condition.
B = Register for condition.
C = line to jump to.
If not A <= B then jumps by C.
Effectively A > B.

What is JUMPIFLT

JUMPIFLT [A] [B] [C] JUMP IF LESS
A = Register for condition.
B = Register for condition.
C = line to jump to.
If A < B then jumps by C.

What is JUMPIFNOTLT

JUMPIFNOTLT [A] [B] [C] JUMP IF NOT LESS
A = Register for condition.
B = Register for condition.
C = line to jump to.
If not A < B then jumps by C.
Effectively A >= B.

What is JUMPXEQKNIL

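JUMPXEQKNIL [A] [B] [C] JUMP IF REGISTER == nil
A = Register for condition.
B = line to jump to.
C = NOT switch (flips the test, so the jump happens when the register is not nil).

The compiler uses the jump to skip the body of the if, so the emitted test is the opposite of what you wrote:

if a==nil then:
JUMPXEQKNIL R0 L0 NOT

if a~=nil then:
JUMPXEQKNIL R0 L0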

What is JUMPXEQKB

JUMPXEQKB [A] [B] [C] [D] JUMP IF REGISTER == boolean constant
A = Register for condition.
B = State (0/1: false/true)
C = line to jump to.
D = not switch.

if a==true then:
JUMPXEQKB R0 1 L0 NOT

if a~=true then:
JUMPXEQKB R0 1 L0

if a==false then:
JUMPXEQKB R0 0 L0 NOT

if a~=false then:
JUMPXEQKB R0 0 L0

What is JUMPXEQKN

JUMPXEQKN [A] [B] [C] [D] JUMP IF REGISTER == number constant
A = Register for condition.
B = Constant (number)
C = line to jump to.
D = not switch.

if a==1 then:
JUMPXEQKN R0 K0 L0 NOT

if a~=1 then:
JUMPXEQKN R0 K0 L0

What is JUMPXEQKS

JUMPXEQKS [A] [B] [C] [D] JUMP IF REGISTER == string constant
A = Register for condition.
B = Constant (string)
C = line to jump to.
D = not switch.

if a=="" then:
JUMPXEQKS R0 K0 L0 NOT

if a~="" then:
JUMPXEQKS R0 K0 L0
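The NOT switch of the JUMPXEQK* family can be modelled in one line. The Python sketch below assumes the jump is taken when "register equals constant" differs from the NOT flag, and that the compiler uses the jump to skip the `then` body; both function names are made up.

```python
def takes_jump(value, constant, not_flag):
    """Model of JUMPXEQK*: jump when (value == constant) differs from NOT."""
    return (value == constant) != not_flag

def body_runs_for_if_equal(value, constant):
    """`if value == constant then body end` compiles to the NOT form,
    and the body runs only when that jump is NOT taken."""
    return not takes_jump(value, constant, not_flag=True)

print(takes_jump(1, 1, False), takes_jump(1, 1, True))
print(body_runs_for_if_equal(1, 1), body_runs_for_if_equal(2, 1))
```

That is why `if a==1 then` shows up with NOT in the listings above: the instruction describes when to skip the body, not when to run it.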

FASTCALLs

What is a FASTCALL?
FASTCALLs are used to perform a fast call of a built-in function!
The instruction carries an “ID” of the builtin instead of importing anything;
FASTCALL is essentially a very strong but “experimental” brother of CALL that can’t live without having CALL as a backup.
If a FASTCALL fails, execution falls through to the regular CALL instruction;
FASTCALL is always followed by one of the (GETIMPORT, MOVE, GETUPVAL) OPCODES and then a CALL as a backup.
Here is a bytecode example:

print(math.abs(-9))

1. GETIMPORT R0 1-- "print"
2. LOADN R2 -9--load number -9 to R2
3. FASTCALL1 2 R2 L6--Attempts FASTCALL with 1 argument "R2"
4. GETIMPORT R1 4-- math.abs
5. CALL R1 1 1--call math.abs (non FASTCALL)
6. CALL R0 1 0--print call
7. RETURN R0 0--exit code

Explanation:

Instruction	What It Means
`GETIMPORT R0 1`	Load function `print` into register `R0`.
`LOADN R2 -9`	Load number `-9` into register `R2`. (notice how the compiler placed it here so it fits the backup CALL)
`FASTCALL1 2 R2 L6`	Tries the fastcall with id 2 with `R2` as argument. Jumps to line 6 if successful.
`GETIMPORT R1 4`	Only if `FASTCALL1` fails: load function `math.abs` into register `R1`.
`CALL R1 1 1`	Only if `FASTCALL1` fails: calls `R1` with `R2` as argument. R1 becomes the return value of this call.
`CALL R0 1 0`	Calls `R0` with `R1` as argument.
`RETURN R0 0`	Returns nothing (zero values).

Why are there multiple of them?
Each fastcall has a specific purpose.

What is a FASTCALL

FASTCALL [A] [B]
A = Id of a function.
B = Jump to.
FASTCALL tells the VM: “Yo, the next CALL? I got this. Skip the usual setup if it’s safe.”
Think of it like a macro that bypasses some function-call overhead.
Code sample:

print(table.unpack())

1. GETIMPORT R0 1--print
2. FASTCALL 53 L5
3. GETIMPORT R1 4-- table.unpack
4. CALL R1 0 -1
5. CALL R0 -1 0
6. RETURN R0 0

TL;DR: It doesn’t store argument registers itself - it just looks ahead to the following CALL to know where the args are and how many there are. So the VM can execute the call faster if it’s a builtin, skipping the normal Luau call overhead.

Plain FASTCALL: can have 0, 1, 2, or more arguments - it just reads them from the next CALL.
FASTCALL1/2/2K/3: optimized for 1–3 args (or 1+constant) for speed.
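The "fast path with a CALL as backup" idea can be sketched like this in Python. It is a loose model, not the VM: `fast_abs` stands in for a builtin fast path that only handles plain numbers, and `slow_path` stands in for the regular call that can also handle other inputs (for example strings that convert to numbers).

```python
def fast_abs(arg):
    """Fast path: succeeds only for plain numbers."""
    if isinstance(arg, (int, float)) and not isinstance(arg, bool):
        return True, abs(arg)
    return False, None

def call_abs(arg, slow_path):
    """Try the fast path first; fall back to the regular call if it bails out."""
    ok, value = fast_abs(arg)
    if ok:
        return value, "fast"          # the backup CALL is skipped
    return slow_path(arg), "slow"     # GETIMPORT + CALL would run instead

def slow_path(arg):
    return abs(float(arg))            # stand-in for the full function call

print(call_abs(-9, slow_path))
print(call_abs("-9", slow_path))
```

Either way the caller gets the same answer; the fast path just avoids the cost of setting up a real call when the arguments are the simple case.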

What is a FASTCALL1

FASTCALL1 [A] [B] [C]
A = Id of a function.
B = Register.
C = Jump to.
Fastcall with exactly 1 register argument. VM knows there’s just one arg, so it can skip stack setup.

What is a FASTCALL2

FASTCALL2 [A] [B] [C] [D]
A = Id of a function.
B = Register.
C = Register.
D = Jump to.
Fastcall with exactly 2 register arguments. The VM knows there are just two args, so it can skip stack setup.

What is a FASTCALL2K

FASTCALL2K [A] [B] [C] [D]
A = Id of a function.
B = Register.
C = Constant.
D = Jump to.
Exactly the same as FASTCALL2 except C is now a constant.

What is a FASTCALL3

FASTCALL3?
I can’t find any way to generate this instruction normally
luau/Common/include/Luau/Bytecode.h at 5059095fec64b658ea9f7c5fae61cc770fe0d9af · luau-lang/luau · GitHub

List of supported FASTCALLs (FASTCALL, FASTCALL1, FASTCALL2K, FASTCALL3)

math

atan
ceil
cosh
cos
deg
exp
floor
fmod
frexp
ldexp
log10
log
max
min
modf
pow
rad
sinh
sin
sqrt
tanh
tan
clamp
sign
round
lerp

bit32

arshift
band
bnot
bor
bxor
btest
extract
lrotate
lshift
replace
rrotate
rshift
countrz
countlz
extract
byteswap

string

byte
char
len
sub

table

insert
unpack

buffer

readi8
readu8
writeu8
readi16
readu16
writeu16
readi32
readu32
writeu32
readf32
writef32
readf64
writef64

vector

magnitude
normalize
cross
dot
floor
ceil
abs
sign
clamp
min
max
lerp
create

globals

type
typeof
rawset
rawget
rawequal
rawlen
select
setmetatable
getmetatable
tonumber
tostring

Operators with bytecode instructions instead of function calls (good for optimization)

// instead of math.floor(a / b): IDIV
% instead of math.fmod: MOD (results differ for negative operands)
^ instead of math.pow: POW
# instead of string.len: LENGTH

Closures , Prototypes (blueprints) and functions

Functions… Probably the part that confuses the most people.
What the hell is P0!?
Well, just as R stands for Register and K for Constant, P stands for Prototype.
Ok what is a prototype? blud cant read the title
A prototype is a blueprint from which a function is built using DUPCLOSURE or NEWCLOSURE.
On their own they are sort of boring, since a function can be fully precompiled and stored as a constant UNLESS you create a closure.

Opcode	When Used	Purpose
NEWCLOSURE `[R]` `[P]`	Used any time the function needs to capture something (`VAL`/`REF`/`UPVAL`).	Slower (has to set up upvalues with `CAPTURE`).
DUPCLOSURE `[R]` `[P]`	Used when the function captures nothing, so it can just clone the function as-is.	Faster (no capture step needed).
DUPCLOSURE `[R]` `[K]`	Used when the compiler already emitted the closure as a constant and just needs another reference to it.	Fastest (no prototype lookup, no captures).

NEWCLOSURE

NEWCLOSURE [A] [B]
A = Register to load.
B = Prototype.

DUPCLOSURE

DUPCLOSURE [A] [B]
A = Register to load.
B = Prototype/Constant.

Is it a function or a closure?

A closure is A FUNCTION* that captures a value from outside its own body (function arguments don’t count).
If your function is NOT* a closure, the compiler will try applying optimizations (at Optimize 1 or 2) such as using DUPCLOSURE instead of NEWCLOSURE, and that is why avoiding closures is good most of the time!

Functions:

local function e()
    return function()

    end
end

local function printVal(hii)
    print(hii)
end

Closures:

local function e()
    local LOL = 1
    return function()
        print(LOL)
    end
end

local RealValue = 1
local function printVal(hii)
    print(hii,RealValue)
end

Note that the compiler will TRY to turn a closure into a regular function when possible (e.g. if the captured values are never modified, the function can simply get its own copy, which allows DUPCLOSURE and annihilates the upvalue usage).
My examples with closures above will become functions rather than closures, since the values they capture are immutable (in this case).

Stays a closure (won't be turned into a regular optimized function):

local function e()
    local LOL = 1
    return function()
        LOL += 1
        print(LOL)
    end
end

For loops ipairs, pairs, inext, next

Unlike while/repeat until/if statements, for loops do not simply jump up or down: Luau has specialized bytecode instructions for each kind of for loop.

What is a FORNPREP

FORNPREP [A] [B]
A:
R[A] = Register for limit;
R[A+1] = Register for step;
R[A+2] = Register for index.
B = where to jump (over the loop) if the first iteration doesn’t need to run. (e.g. for i=10,9,1 do)

for i=5,9,1 do--index,limit,step

1. LOADN R2 5--A+2
2. LOADN R0 9--A
3. LOADN R1 1--A+1
4. FORNPREP R0 L9

What is a FORNLOOP

FORNLOOP [A] [B]
A:
R[A] = Register for limit;
R[A+1] = Register for step;
R[A+2] = Register for index.
B = line to jump back to: the step is added to the index, and the jump is taken while the index is still within the limit.

for i=5,9,1 do
    warn(i)
end
print(1)

1.  LOADN R2 5--A+2
2.  LOADN R0 9--A
3.  LOADN R1 1--A+1
4.  FORNPREP R0 L9
5.  GETIMPORT R3 1-- warn
6.  MOVE R4 R2
7.  CALL R3 1 0
8.  FORNLOOP R0 L5
9.  GETIMPORT R0 3-- print
10. LOADN R1 1
11. CALL R0 1 0
12. RETURN R0 0
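Here is a Python sketch of how the FORNPREP/FORNLOOP pair drives the loop in the listing above. It uses the register layout from that listing (R0 = limit 9, R1 = step 1, R2 = index 5); the function itself is a made-up model, not VM code.

```python
def numeric_for(limit, step, index):
    """Model of FORNPREP + FORNLOOP over registers [limit, step, index]."""
    regs = [limit, step, index]  # R[A], R[A+1], R[A+2]
    visited = []

    def in_range():
        # counting up compares with <=, counting down with >=
        return regs[2] <= regs[0] if regs[1] > 0 else regs[2] >= regs[0]

    if not in_range():           # FORNPREP: jump over the whole loop
        return visited
    while True:
        visited.append(regs[2])  # loop body sees the current index
        regs[2] += regs[1]       # FORNLOOP: index += step
        if not in_range():       # ...and jump back only while in range
            break
    return visited

print(numeric_for(9, 1, 5))
print(numeric_for(9, 1, 10))
```

The check happens once before the loop (FORNPREP) and once per iteration at the bottom (FORNLOOP), so the body itself contains no comparison instructions.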

What is a FORGPREP

FORGPREP [A] [B]
A:
R[A] = Register for generator(table/function);
R[A+1] Register for state(usually nil/table);
R[A+2] Register for index(usually nil/number/string).
B = jump to. (regardless of anything)

for i,v in {} do

1. NEWTABLE R0 0 0--A
2. LOADNIL R1--A+1
3. LOADNIL R2--A+2
4. FORGPREP R0 L9

Can be used for custom iterators as well:

local function custom_iter(tbl,key)
	return next(tbl,key)
end
for i,v in custom_iter,{1,2,3,4},2 do
	print(i,v)
end

Prints: 3 3, then 4 4

1.  DUPCLOSURE R0 K0;--our custom iterator "custom_iter"
2.  MOVE R1 R0
3.  NEWTABLE R2 0 4--{
4.  LOADN R4 1
5.  LOADN R5 2
6.  LOADN R6 3
7.  LOADN R7 4
8.  SETLIST R2 R4 4--} building our table
9.  LOADN R3 2--start after 2
10. FORGPREP R1 L15--R1,R2,R3
11. GETIMPORT R6 2-- print
12. MOVE R7 R4
13. MOVE R8 R5
14. CALL R6 2 0
15. FORGLOOP R1 L11 2
16. RETURN R0 0

What is a FORGLOOP

FORGLOOP [A] [B] [C] [D]
A:
R[A] = Register for generator;
R[A+1] Register for state;
R[A+2] Register for index.
B = jump to, if generator(state, index) ~= nil (as first return value)
C = amount of variables to fill (e.g. for i,v in would be 2)
D = ipairs-style traversal marker.
Registers get loaded like this: if C==2 then the i,v variables land in R[A+3],R[A+4].

for i in {} do
    print(i)
end

1. NEWTABLE R0 0 0
2. LOADNIL R1
3. LOADNIL R2
4. FORGPREP R0 L8
5. GETIMPORT R5 1-- print
6. MOVE R6 R3
7. CALL R5 1 0
8. FORGLOOP R0 L5 1--i is a R3,1 means only "i" is returned (better optimization)
9. RETURN R0 0
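The FORGLOOP protocol for function iterators can be modelled in Python as below. This sketch only covers the "generator is a function" case (like `next` or a custom iterator), not the direct table iteration used in the listing above; `lua_next` is a simplified stand-in for Lua's `next` and all names are made up.

```python
def lua_next(tbl, key):
    """Simplified stand-in for next(tbl, key) over a Python dict."""
    keys = list(tbl)
    pos = 0 if key is None else keys.index(key) + 1
    if pos >= len(keys):
        return (None, None)
    return (keys[pos], tbl[keys[pos]])

def generic_for(generator, state, index, nvars):
    """Model of FORGLOOP: R[A..A+2] = generator, state, index; vars from R[A+3]."""
    regs = [generator, state, index] + [None] * nvars
    rows = []
    while True:
        results = list(generator(regs[1], regs[2]))
        results = (results + [None] * nvars)[:nvars]
        regs[3:3 + nvars] = results     # loop variables land in R[A+3]...
        if regs[3] is None:             # first result nil: leave the loop
            break
        regs[2] = regs[3]               # first result becomes the new index
        rows.append(tuple(results))
    return rows

print(generic_for(lua_next, {1: 1, 2: 2, 3: 3, 4: 4}, 2, 2))
```

Starting the index at 2 reproduces the custom iterator example from earlier in this post, which only visits keys 3 and 4.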

What is a FORGPREP_NEXT

FORGPREP_NEXT [A] [B]
A:
R[A] = Register for next (function);
R[A+1] Register for table;
R[A+2] Register for nil;
B = jump to. (regardless of anything)

for i,v in pairs({}) do

or

for i,v in next,{} do

1. GETIMPORT R0 1-- next
2. NEWTABLE R1 0 0
3. LOADNIL R2
4. FORGPREP_NEXT R0 L8
5. GETIMPORT R5 3-- print
6. MOVE R6 R3
7. CALL R5 1 0
8. FORGLOOP R0 L5 2
9. RETURN R0 0

What is a FORGPREP_INEXT

FORGPREP_INEXT [A] [B]
A:
R[A] = Register for inext (function);
R[A+1] Register for table;
R[A+2] Register for number;
B = jump to. (regardless of anything)

for i,v in ipairs({}) do

for i,v in ipairs({}) do
    print(i)
end

1. GETIMPORT R0 1-- ipairs
2. NEWTABLE R1 0 0
3. CALL R0 1 3
4. FORGPREP_INEXT R0 L8--inext,{},0
5. GETIMPORT R5 3-- print
6. MOVE R6 R3
7. CALL R5 1 0
8. FORGLOOP R0 L5 2-- inext marker enabled
9. RETURN R0 0

Fun fact

You can actually obtain “inext” function:

local inext = ipairs({})

Although the compiler won’t recognize it for FORGPREP_INEXT, sadly.

Summary

Bytecode is very hacky

R - Register
K - Constant pool
L - Label (a jump target, shown as a line number in this tutorial)
P - Prototype(blueprint for function)

`CAPTURE` opcode modifiers (U/REF/VAL/UPVAL):

Modifier	What it means	Example
`CAPTURE VAL R`	Take a snapshot of the current register value; closure stores its own copy	`CAPTURE VAL R0` → captures `x=5` into closure
`CAPTURE REF R`	Capture a pointer to the outer variable; closure shares it with the outer function	`CAPTURE REF R0` → closure reads/writes the same `x` as outer
`CAPTURE UPVAL U`	Capture an upvalue from an outer closure	`CAPTURE UPVAL U0` → reuse an upvalue captured by a previous closure

U = upvalue slots of a function/closure
VAL = copy, REF = shared pointer

VAL: safe snapshot → changes outside don’t affect closure
REF: live link → changes outside do affect closure

UPVAL can chain: closures can capture variables from outer closures
GETUPVAL(opcode) is how closures access their captured variables at runtime
Closures don’t use outer locals directly; they go through upvalues
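The VAL versus REF difference summarized above (VAL = copy, REF = shared pointer) can be demonstrated with a tiny Python model. The `Cell` class plays the role of the shared upvalue slot; none of these names exist in Luau.

```python
class Cell:
    """A shared box, standing in for a variable captured by reference."""
    def __init__(self, value):
        self.value = value

def capture_val(value):
    snapshot = value            # VAL: the closure keeps its own copy
    return lambda: snapshot

def capture_ref(cell):
    return lambda: cell.value   # REF: the closure reads through the shared box

x = Cell(5)
by_val = capture_val(x.value)
by_ref = capture_ref(x)
x.value = 6                     # the outer code changes x afterwards
print(by_val(), by_ref())
```

A variable that is never reassigned can safely be captured as VAL, which is why unmodified locals are the cheap case.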

Try it if you want to regret it

Here is a snippet of code you may use for testing:

local c = 1
(function()
    local b = 1
    local function outer()
        local x, y, z = 5, 1, 4
        return function()
            local h = 9
            print(b, x, y, z) -- b, x, y, z are upvalues here
            return function()
                print(b, x, y, z, h, c) -- yeah.... :horrow:
                b += 1
            end
        end
    end

    outer()()()
end)()

List of unmentioned OPCODES and their behavior

GETVARARGS

GETVARARGS [A] [B]
A - Register where writing starts (like in the CALL opcode)
B - amount to write
Example:
GETVARARGS R0 4 : R0,R1,R2,R3 get written

GETUPVAL

Essentially “grab value from pointer”
The order in which upvalues were captured determines B.
GETUPVAL [A] [B]
A = Register to load
B = Upvalue slot (numbered in the order the CAPTURE instructions filled the slots)
Basically:
A = Values[B]

SETUPVAL

Opposite of GETUPVAL
If you have ever used a language like C, you will recognize this behavior as “pointers”
Essentially:
SETUPVAL [A] [B]
A = Register to grab the value from
B = Upvalue slot to write.
Essentially Values[B] = A, if that makes sense:
it forces upvalue B to take the value of register A.

CAPTURE

Used to set up the upvalues that GETUPVAL/SETUPVAL work with inside a closure.
Each CAPTURE loads an upvalue reference into the most recent NEWCLOSURE/DUPCLOSURE.
The order in which they appear decides which slot each one gets
Basically:

NEWCLOSURE
CAPTURE--0 slot e.g. GETUPVAL R4 0
CAPTURE--1 slot e.g. GETUPVAL R5 1
CAPTURE--2
SomethingHere
CAPTURE--3
NEWCLOSURE
CAPTURE--0

Information about modifiers (VAL,REF,UPVAL) was already described above
CAPTURE [A] [B]
A = Modifier
B = Register/U (U means upvalue)

CLOSEUPVALS

CLOSEUPVALS [A]
A = first register to migrate.
Moves captured registers (A and above) from the stack to the heap so any closures capturing them can safely access them beyond the function’s stack lifetime.

LOADNIL

LOADNIL [A]
A = Register to load.
Sets register A to a nil value.

LOADB

LOADB [A] [B] [C]
A = Register to load.
B = value (0/1 false/true)
C = Jump point (optional).
Sets register A to a boolean value. If C is set, the VM then skips the next C instructions.
Disassemblers display C as a relative offset such as +1 rather than a label like L1.

GETGLOBAL

GETGLOBAL [A] [B]
A = Register to load.
B = Constant(string).
Essentially: R[A] = _ENV[B]

SETGLOBAL

SETGLOBAL [A] [B]
A = Register to get value from.
B = Constant(string).
Essentially: _ENV[B] = R[A]

DUPTABLE

DUPTABLE [A] [B]
A = Register to load.
B = Constant index of a template table.
Essentially NEWTABLE, but copied from a template whose hash keys and sizes are already prepared.

SETTABLE

SETTABLE [A] [B] [C]
A = Register with the value.
B = Register with the Table (target).
C = Register with a Key.
Essentially:
B[C] = A

SETTABLEKS

SETTABLEKS [A] [B] [C]
A = Register with the value.
B = Register with the Table (target).
C = Constant with a Key.
Exactly the same as SETTABLE except uses constants for a key.

GETTABLE

GETTABLE [A] [B] [C]
A = Register to load.
B = Register with the Table (target).
C = Register with a Key.
Essentially:
A = B[C]

GETTABLEKS

GETTABLEKS [A] [B] [C]
A = Register to load.
B = Register with the Table (target).
C = Constant with a Key.
Exactly the same as GETTABLE except uses constants for a key.

SETTABLEN

SETTABLEN [A] [B] [C]
A = Register with the value.
B = Register with the Table (target).
C = Number (1-256).
Essentially:
B[C] = A

Yeah sort of micro optimization instruction with limitations.

GETTABLEN

GETTABLEN [A] [B] [C]
A = Register to load.
B = Register with the Table (target).
C = Number (1-256).
Essentially:
A = B[C]

Yeah sort of micro optimization instruction with limitations.

NAMECALL

NAMECALL [A] [B] [C] - evil brother of CALL
A:
A = Register to load (function).
A+1 = Register to load table (self).

B = Register with the Table (target).
C = Constant.
Essentially:
A = B[C]
A+1 = B

Essentially used for calls with :, always followed by a CALL OPCODE.
Basically a micro optimization: a mix of GETTABLEKS (for the function) + MOVE (for the table).
Conclusion: don’t do methods.
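The register effect of NAMECALL described above (A = B[C], A+1 = B) looks like this as a Python sketch. Dictionaries stand in for Luau tables and `namecall` is a made-up helper; the follow-up CALL is modelled by calling `regs[1]` by hand.

```python
def namecall(regs, a, b, method_name):
    """Model of NAMECALL: fetch the method and set up `self` in one step."""
    obj = regs[b]
    regs[a] = obj[method_name]   # function to call
    regs[a + 1] = obj            # becomes `self`, the first argument

part = {"Name": "Part", "GetName": lambda self: self["Name"]}
regs = [part, None, None]
namecall(regs, 1, 0, "GetName")  # models part:GetName()
result = regs[1](regs[2])        # the CALL that always follows
print(result)
```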

ADD

ADD [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B+C

SUB

SUB [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B-C

MUL

MUL [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B*C

DIV

DIV [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B/C

IDIV

IDIV [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B//C

MOD

MOD [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B%C

POW

POW [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B^C

ADDK

ADDK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B+C

SUBK

SUBK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B-C

MULK

MULK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B*C

DIVK

DIVK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B/C

IDIVK

IDIVK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B//C

MODK

MODK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B%C

POWK

POWK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B^C
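A short sketch of how the register and K variants relate, assuming the compiler picks the K form when the right-hand operand is a literal. The function names are made up. The result is the same either way; only the place the second operand comes from differs. The last two lines show the floored rule that % follows, with math.floor(a / b) standing in for what // computes.

```lua
-- Both operands are parameters, so both live in registers: ADD.
local function addRegisters(x, y)
    return x + y
end

-- The right operand is a literal stored in the constant pool: a candidate for ADDK.
local function addConstant(x)
    return x + 5
end

-- a % b equals a - math.floor(a / b) * b, so the result takes the sign of b.
local wrapped = -7 % 3
local floored = math.floor(-7 / 3)

print(addRegisters(2, 3), addConstant(2), wrapped, floored)
```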

AND

AND [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B and C

If B is truthy then return C else return B

OR

OR [A] [B] [C]
A = Register to load.
B = Register with value.
C = Register with value.
Essentially: A = B or C

If B is not truthy then return C else return B

ANDK

ANDK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B and C

If B is truthy then return C else return B

ORK

ORK [A] [B] [C]
A = Register to load.
B = Register with value.
C = Constant.
Essentially: A = B or C

If B is not truthy then return C else return B
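A quick sketch of the truthiness rule these four instructions rely on. The helper names `both` and `either` are made up. The thing to remember is that only nil and false are falsy, so 0 and the empty string count as truthy. Depending on the surrounding code, the compiler may emit jumps instead of AND/OR, so treat the opcode mapping as an assumption.

```lua
local function both(b, c)
    return b and c -- if b is truthy the result is c, otherwise b
end

local function either(b, c)
    return b or c -- if b is truthy the result is b, otherwise c
end

print(both(0, "x"))       -- 0 is truthy, so this gives "x"
print(both(nil, "x"))     -- nil is falsy, so this gives nil
print(either(nil, "d"))   -- falls through to "d"
print(either(0, "d"))     -- 0 is truthy, so this stays 0
```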

CONCAT

CONCAT [A] [B] [C]
A = Register to load.
B = First register of the range to concatenate.
C = Last register of the range to concatenate.
Essentially: A = B .. (every register in between) .. C
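A small sketch of a chained concatenation. As far as I understand the instruction, a chain like this can be handled by a single CONCAT over consecutive registers rather than one instruction per .. operator; verify that in a disassembler. The function name `greet` is made up.

```lua
local function greet(title, name)
    -- Five pieces joined in one expression.
    return "Hello, " .. title .. " " .. name .. "!"
end

print(greet("Dr.", "Ada"))
```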

NOT

NOT [A] [B]
A = Register to load.
B = Register with value.
Essentially: A = not B
Flips the value:
true becomes false;
false (or nil) becomes true.

MINUS

MINUS [A] [B]
A = Register to load.
B = Register with value.
Essentially: A = -B
Flips the sign:
-1 becomes 1;
1 becomes -1.

LENGTH

LENGTH [A] [B]
A = Register to load.
B = Register with value.
Essentially: A = #B
Length operator:
#{} = 0;
#{"Hello"} = 1.
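The three unary instructions side by side, as a minimal sketch. The function names are made up. Parameters are used so the compiler cannot fold the result ahead of time, which is an assumption about how it treats literals.

```lua
local function invert(b)
    return not b -- NOT
end

local function negate(n)
    return -n -- MINUS
end

local function size(v)
    return #v -- LENGTH, works on tables and strings
end

print(invert(true), invert(nil), invert(0))
print(negate(-1), negate(1))
print(size({}), size({ "Hello" }), size("Hello"))
```

Note that invert(0) is false, because 0 is truthy in Luau.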

https://github.com/luau-lang/luau/blob/5059095fec64b658ea9f7c5fae61cc770fe0d9af/Common/include/Luau/Bytecode.h#59

Luau Bytecode Compiler is very dumb

Maybe a hot take at first glance, but hear me out.
The compiler focuses on producing a compact list of instructions rather than an optimized one.
Here is an example:

for i,v in {} do
print(v)
end

1. NEWTABLE R0 0 0
2. LOADNIL R1
3. LOADNIL R2
4. FORGPREP R0 L8
5. GETIMPORT R5 1-- "print"
6. MOVE R6 R4
7. CALL R5 1 0
8. FORGLOOP R0 L5 2
9. RETURN R0 0

The problem:
GETIMPORT sits inside the loop body (instruction 5), so it runs again on every iteration, and it is slower here than a plain MOVE from a register.
So how do we force the compiler to fix it?
Solution: localizing the function manually forces the compiler to fetch it once, at the beginning.

local print = print
for i,v in {} do
print(v)
end

1.  GETIMPORT R0 1-- print -- notice the compiler now emits GETIMPORT here, once, before the loop
2.  NEWTABLE R1 0 0
3.  LOADNIL R2
4.  LOADNIL R3
5.  FORGPREP R1 L9
6.  MOVE R6 R0-- copies our print into R6
7.  MOVE R7 R5-- copies the loop value v into the argument register for the call
8.  CALL R6 1 0
9.  FORGLOOP R1 L6 2
10. RETURN R0 0

Pros: the lookup happens once instead of on every iteration.
Cons: one more instruction overall (10 instead of 9).
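The same trick applied to a library function, as a hedged sketch. The function names are made up, and whether the localized version is measurably faster depends on the optimization level and on whether the call is already handled as a FASTCALL, so measure before relying on it. Both versions return the same result.

```lua
-- Looks string.sub up inside the loop body on every iteration.
local function firstLettersGlobal(words)
    local out = {}
    for i, word in ipairs(words) do
        out[i] = string.sub(word, 1, 1)
    end
    return out
end

-- Fetches string.sub once into a local before the loop starts.
local function firstLettersLocal(words)
    local sub = string.sub
    local out = {}
    for i, word in ipairs(words) do
        out[i] = sub(word, 1, 1)
    end
    return out
end

local a = firstLettersGlobal({ "alpha", "beta" })
local b = firstLettersLocal({ "alpha", "beta" })
print(a[1], a[2], b[1], b[2])
```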

I don’t understand this post. Am i dumb?

No, don’t worry, it’s a very complex topic. If you have been programming without knowing what happens under the hood, it is completely normal, and even SMART, to be confused at first.
It takes a while for your brain to rewire itself and integrate the new knowledge.
Just try testing things and looking at disassembled code, and all of a sudden you will realize that you fully understand this topic.

Try disassembling one of your game scripts and see if you can predict the registers before compiling. Post screenshots below if you find something cursed.

Yarik_superpro · September 18, 2025, 5:22pm

This post took me 4+ days to write btw.

GetStyled · September 18, 2025, 6:19pm

Can we get tutorial on how to optimize machine code

Yarik_superpro · September 18, 2025, 6:31pm

Sorry, but I do not believe it is on topic, and Roblox does not provide any information regarding it.
FASTCALL functions are the closest you can get.

saaawdust · September 18, 2025, 6:52pm

Actually a really nice and well written tutorial on bytecode, haven’t seen anything on the forum like this before

respect

Yarik_superpro · September 18, 2025, 7:49pm

Thanks comrade
Remember to always avoid METAMETHODS

Yarik_superpro · September 18, 2025, 10:29pm

If you have found some optimizations, I have made a thread dedicated to that.
So post something random but useful like that:

On another topic

Fun fact:
Defaulting a value with or is faster than an if check, because it compiles to a single OR/ORK instruction instead of a whole if-statement setup:
local function a(b)
b = b or ""

end
1. ORK R0 R0 K0-- ""
2. RETURN R0 0
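One caveat worth a sketch: the or form replaces any falsy value, so it is only interchangeable with a nil check when false is not a meaningful argument. The function names below are made up for illustration.

```lua
-- Default via or: replaces both nil and false.
local function withOr(b)
    b = b or ""
    return b
end

-- Default via an explicit nil check: leaves false alone.
local function withNilCheck(b)
    if b == nil then
        b = ""
    end
    return b
end

print(withOr(nil), withOr("x"), withOr(false))
print(withNilCheck(nil), withNilCheck("x"), withNilCheck(false))
```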

zZArimaaZz · September 18, 2025, 10:38pm

Nice post. For anyone interested, I recommend learning this through reverse engineering programs and writing C code which was how I learned it.

Yarik_superpro · September 18, 2025, 10:45pm

Nvm, CAPTURE can be used with DUPCLOSURE as well.
It’s just kind of odd and very rare.

Judgy_Oreo · September 19, 2025, 7:52am

16bit signed integers can store in the range -32768 .. 32767, not -29999 .. 29999, because the maximum number of possible combinations for 16 bits is 2^16 AKA 65536, which has to be halved into the positive and negative spaces for signed integers (and an additional combination is used up for 0.)

An 8-bit unsigned value can only store up to the integer 255. Personally, I think it’s better if you explained as it is: -1 is meant to be like an N/A sign, the value is Not Applicable and shouldn’t be taken as literally being -1 values returned.

GETIMPORT assumes the imports are static. It isn’t used for any Instance globals or anything that can change externally (GETIMPORT is not used on globals the Script attempts to modify), because it’ll just re-use the initial value. This is the point of the Compiler’s mutableGlobals list:

github.com/luau-lang/luau

Compiler/include/Luau/Compiler.h

ae59a0e33


      
          int coverageLevel = 0;
          
          // alternative global builtin to construct vectors, in addition to default builtin 'vector.create'
          const char* vectorLib = nullptr;
          const char* vectorCtor = nullptr;
          
          // alternative vector type name for type tables, in addition to default type 'vector'
          const char* vectorType = nullptr;
          
          // null-terminated array of globals that are mutable; disables the import optimization for fields accessed through these
          const char* const* mutableGlobals = nullptr;
          
          // null-terminated array of userdata types that will be included in the type information
          const char* const* userdataTypes = nullptr;
          
          // null-terminated array of globals which act as libraries and have members with known type and/or constant value
          // when an import of one of these libraries is accessed, callbacks below will be called to receive that information
          const char* const* librariesWithKnownMembers = nullptr;
          LibraryMemberTypeCallback libraryMemberTypeCb = nullptr;
          LibraryMemberConstantCallback libraryMemberConstantCb = nullptr;

github.com/luau-lang/luau

Remove Roblox-specific mutable globals

master ← LoganDark:kSpecialGlobals

opened 09:25PM - 08 Nov 21 UTC

LoganDark

+141 -8

`kSpecialGlobals` is used by the compiler to disable the import optimization for… values that are fetched from a mutable source (for example, `_G.xd`). Not only does this currently include Roblox-specific globals that might mean totally different things in other applications, but there's no way for embedding applications to specify if they have their own globals that act this way, which means that the import optimization may misbehave in those cases. In vanilla Luau, only the `_G` global is mutable. Roblox can use the newly-implemented `mutableGlobalNames` compile option to specify its own globals - and crucially, third-party applications can as well.

Fun fact about these:

roalex2008 · September 19, 2025, 10:57am

… so you just grabbed lbytecode.h and made it into a soup of non-serious, and somehow got wrong some points like what the min and max of a signed 16 bit integer is.

It’s not about complexity, it’s about presentation.

Knowing opcodes won’t solve world hunger nor make your game run faster, you only need to know the actual useful and significant opcodes to gain a somewhat noticeable difference. Knowing what LOADK does won’t give you more efficient code, it’s just going to clutter your brain thinking “okokok I gotta make it so this isn’t emitted! 1!1!1!2!” when it’s a perfectly reasonable opcode to emit in most cases, it’s like you’re nudging people into writing constant numerical values under 32767, then making them add a value to it that’s also 16 bit, so on and so forth to get an actual value. Now that’s taking it to the extreme, but you very well could lead people to that conclusion.

I’d recommend you revisit this post, specifically with the opcodes that actually make a difference, so that you don’t just uselessly transcribe lbytecode.h into a brainrotten post filled with emojis while continuously calling developers who read this and don’t understand it “dumb” or “not quite capable”. Quite honestly, trying to take anything away from a post filled to the brim with emojis, all of which looks like a joke, is quite complicated. Add to that the fact that there are opcodes, some with improper explanations, and it makes me want to pull my hair out.

I’m someone who wants performance, but if the delivery to spread the information is flawed, I won’t stand by it, really.

Yarik_superpro · September 19, 2025, 11:15am

I used that as an example.
It’s a more idiomatic way to understand how GETIMPORT works, in my opinion, because the same logic applies to string.sub and any function/value that has to be indexed through its library.

Yarik_superpro · September 19, 2025, 11:17am

Yeah, yeah, continue on, gatekeeper. Are you mad that now knowledge has become easy for the public to access and you are not special anymore?
Imagine writing 3 paragraphs just to say: “I don’t like emojis.”

Yarik_superpro · September 19, 2025, 11:32am

Edit: Huh, odd, the range is indeed -32768 to 32767.
I remember the compiler not producing this result with LOADN.
Probably because I was writing this post, as I said, for 4 days, day and night, so I didn’t pay much attention.
@Judgy_Oreo is right about LOADN.

Yarik_superpro · September 19, 2025, 12:06pm

Please provide an actual example…

As I said, this reply doesn’t help learners - it only discourages them and gatekeeps useful knowledge. If you want to contribute, point out the incorrect parts so everyone can benefit.

roalex2008 · September 19, 2025, 12:29pm

This isn’t about that at all; it’s about the fact you are overloading people with information about opcodes that won’t really provide them any benefit for performance as you claim. As I already said, you should focus on the key opcodes and fix the informational errors you have in your post. I have already written posts regarding opcodes months ago, I’m not gatekeeping

Unlimited_Objects · September 19, 2025, 12:47pm

If all you got from @roalex2008 ‘s reply is “I don’t like emojis” then maybe you need to stop being so close minded and actually read and comprehend what he is telling you

Yarik_superpro · September 19, 2025, 12:56pm

Because his logic makes no sense.
How can you optimize bytecode without knowing it in the first place?
Ok, you know optimized opcodes. What is next? NOTHING! Because you don’t know opcodes that follow it, so his logic makes no sense.

I already covered the most commonly used opcodes in the post so you can force the compiler to optimize them properly.
Other unrelated bytecode OPCODEs have been left in a collapsed category for people who want to learn them all.
Regardless, this is more of a tutorial as to how to read/debug bytecode rather than “optimized” opcodes. For that, I have made the https://devforum.roblox.com/t/code-optimizations-sharing/3882214 category.
I also think that Dottik’s opinion is not correct, since he is not really the target audience for this tutorial in the first place; this tutorial is for people who know nothing about bytecode and want to learn it.

Judgy_Oreo · September 19, 2025, 1:27pm

I thought they used Bytecode.h but this one also seems like a possibility now

roalex2008 · September 19, 2025, 1:33pm

I’m just used to including lbytecode.h when working on my projects, regardless, that doesn’t invalidate anything.
