INTRODUCTION
Hello guys, did you know that in Luau’s --!native mode, your type annotations aren’t just for linting? They also influence how the JIT compiler generates machine code.
If you “lie” to the compiler with incorrect type hints, your code can become significantly slower than if you had used no hints at all.
THE EXPERIMENT
In the script below, we compare two functions with identical logic but different type hints:
- bad_func: Lies about its parameter being a buffer (it isn’t).
- normal_func: Correctly annotates the type of its parameter to be number.
--!native
--!optimize 2
--!strict

local ITERATIONS = 100000000
local rand_num = Random.new():NextInteger(1, 10000)
local _1 = 1
local _2 = 1

-- Incorrect annotation: logic expects a number, hint says buffer
local function bad_func(num: buffer)
    for i = 1, ITERATIONS do
        _1 *= _1 + num
    end
end

-- Correct annotation
local function normal_func(num: number)
    for i = 1, ITERATIONS do
        _2 *= _2 + num
    end
end

-- Timing results
local start = os.clock()
bad_func(rand_num)
local elapsed = os.clock() - start
print("BAD:", math.round(elapsed * 1000), "ms")

-- Reuse the same locals instead of redeclaring them
start = os.clock()
normal_func(rand_num)
elapsed = os.clock() - start
print("NORMAL:", math.round(elapsed * 1000), "ms")
Typical Results
When running under --!native, the “bad” hint makes the function roughly 1.4x slower (603 ms vs. 423 ms in the run below):
BAD: 603 ms
NORMAL: 423 ms
WHY DOES THIS HAPPEN?
When you use --!native, the Luau JIT uses your type annotations to specialize the generated machine code.
Below is a quote from Luau.org documenting this behavior. The text was first added to the website on Feb 19, 2025.
“… Luau JIT takes into account the type annotations present in the source code to specialize code paths …”
Explanations
When you tell the JIT a variable is a number, it generates optimized CPU instructions for floating-point math. However, it also generates a “guard.”
If you provide a buffer hint but pass a number, the following happens:
- The JIT creates a “specialized code path” optimized for a buffer.
- During execution, the “guard” checks the input and sees it is actually a number.
- The machine code path is rejected, and the code falls back to being interpreted.
Pseudocode of the JIT logic
-- Conceptual model only: fast_native_path stands in for the specialized machine code
local function normal_func(num: number)
    if typeof(num) == "number" then
        -- NATIVE PATH: Executing raw machine code
        fast_native_path(num)
    else
        -- FALLBACK: Bytecode interpretation
        -- This happens when the hint was wrong or missing
        for i = 1, ITERATIONS do
            _2 *= _2 + num
        end
    end
end
The Proof
If you remove the --!native flag, the performance difference disappears. Both functions take roughly 600 ms because they are both running in the interpreter. The “bad” hints only hurt you when you are trying to go faster with --!native mode.
BAD: 605 ms
NORMAL: 603 ms
DEEP DIVE
In more complex benchmarks, things can get even crazier! (See below)
The “magic” of typeof and any
We discovered that using any or typeof() can sometimes be faster than manual type definitions for no apparent reason:
| Type Annotation Style | Average Execution Time | Performance |
|---|---|---|
| Inferred using any or typeof | 411.3 ns | Fastest |
| Standard Type Definitions | 466.8 ns | 1.13x slower |
| Incorrectly Typed | 1.723 µs | 4.18x slower |
Benchmark Code
-- Must enable native code generation to see the performance
-- impact of type hints!
--
--!native
--!optimize 2
--!strict
-- The use of types inferred using "typeof" can magically
-- boost performance by 1.1x in this benchmark!
--
type StandardTable = { _n: number, _p: number, _r0: number, _pr: number, _odds: number }
type MagicTable = typeof({ _n = 0, _p = 0, _r0 = 0, _pr = 0, _odds = 0 })
type StandardArray = { number }
type MagicArray = typeof({ 0, 0, 0, 0, 0 })
type HalfMagicArray = MagicArray | StandardArray
-- Each workload function has the same core logic, the only
-- main difference is the type hint used by each function.
--
-- Trust me, I didn't put any additional code into the functions
-- to affect their performance. (other than table unpacking code)
--
local function args_workload(rd: Random, n: number, p: number, r0: number, pr: number, odds: number): number
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
-- Every type hint of this function is "any"
local function args_workload_with_any_type_hints(rd: any, n: any, p: any, r0: any, pr: any, odds: any): any
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
-- Every type hint of this function is intentionally incorrect
local function args_workload_with_awful_type_hints(rd: number, n: buffer, p: Random, r0: any, pr: {}, odds: boolean): Vector3
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
local function standard_table_workload(rd: Random, tbl: StandardTable): number
local n, p, r0, pr, odds = tbl._n, tbl._p, tbl._r0, tbl._pr, tbl._odds
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
local function magic_table_workload(rd: Random, tbl: MagicTable): number
local n, p, r0, pr, odds = tbl._n, tbl._p, tbl._r0, tbl._pr, tbl._odds
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
local function standard_array_workload(rd: Random, tbl: StandardArray): number
local n, p, r0, pr, odds = tbl[1], tbl[2], tbl[3], tbl[4], tbl[5]
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
local function magic_array_workload(rd: Random, tbl: MagicArray): number
local n, p, r0, pr, odds = tbl[1], tbl[2], tbl[3], tbl[4], tbl[5]
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
local function half_magic_array_workload(rd: Random, tbl: HalfMagicArray): number
local n, p, r0, pr, odds = tbl[1], tbl[2], tbl[3], tbl[4], tbl[5]
if n == 0 or p == 0 then return 0 end
if p == 1 then return n end
local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
if u < 0 then return r0 end
while true do
local active = false
if rd_ >= 1 then
pd *= rd_ / (odds * (n - rd_ + 1))
u -= pd
active = true
if u < 0 then return rd_ - 1 end
end
if rd_ ~= 0 then rd_ -= 1 end
ru += 1
if ru <= n then
pu *= (n - ru + 1) * odds / ru
u -= pu
active = true
if u < 0 then return ru end
end
if not active then return 0 end
end
end
-- Testing code (nothing magical beyond this point!)
--
local function format_duration(seconds: number): string
return if seconds >= 1.0 then string.format("%.3f s" , seconds )
elseif seconds >= 0.001 then string.format("%.3f ms", seconds * 1000 )
elseif seconds >= 0.000001 then string.format("%.3f µs", seconds * 1000000 )
else string.format("%.3f ns", seconds * 1000000000)
end
local function format_throughput(ops: number): string
return if ops >= 1000000000 then string.format("%.3f G/s", ops / 1000000000)
elseif ops >= 1000000 then string.format("%.3f M/s", ops / 1000000 )
elseif ops >= 1000 then string.format("%.3f K/s", ops / 1000 )
else string.format("%.3f /s" , ops )
end
local function dataframe_to_string(df : {{[any]: any}},
cols : {any},
col_fmt : {[any]: (string | (any) -> string)?}): string
local formatted_data = table.create(#df)
local n_cols = #cols
local widths = table.create(n_cols, 0)
for _, row in ipairs(df) do
local formatted_row = table.create(n_cols)
for i, key in ipairs(cols) do
local value = row[key]
local fmt = col_fmt[key]
local text
if fmt then
if type(fmt) == "string" then
text = string.format(fmt, value)
else
text = fmt(value)
end
else
text = tostring(value)
end
formatted_row[i] = text
widths[i] = math.max(widths[i], utf8.len(text))
end
table.insert(formatted_data, formatted_row)
end
local col_names = table.create(n_cols)
for i, key in ipairs(cols) do
local name = tostring(key)
col_names[i] = name
widths[i] = math.max(widths[i], utf8.len(name))
end
local function pad_strings(string_arr: {string})
for i, str in ipairs(string_arr) do
string_arr[i] = " " .. str .. string.rep(" ", widths[i] - utf8.len(str)) .. " "
end
end
pad_strings(col_names)
local top = "┌"
local hdr = "│" .. table.concat(col_names, "│") .. "│"
local mid = "├"
local btm = "└"
for i = 1, n_cols do
local w = #col_names[i]
top ..= string.rep("─", w) .. (if i ~= n_cols then "┬" else "┐")
mid ..= string.rep("─", w) .. (if i ~= n_cols then "┼" else "┤")
btm ..= string.rep("─", w) .. (if i ~= n_cols then "┴" else "┘")
end
for i, row in ipairs(formatted_data) do
pad_strings(row)
formatted_data[i] = "│" .. table.concat(row, "│") .. "│"
end
formatted_data[#formatted_data + 1] = btm
formatted_data[ 0] = mid
formatted_data[-1] = hdr
formatted_data[-2] = top
return table.concat(formatted_data, "\n", -2)
end
local function run_benchmark(n_rounds: number,
n_calls: number,
benchmark: {[any]: any}, ...: any)
local results = {}
for key, functor in benchmark do
local name = (
if type(key ) == "string" then key
elseif type(functor) == "function" then "[" .. tostring(key) .. "] " .. debug.info(functor, "n")
else "[" .. tostring(key) .. "]"
)
table.insert(results, {
["Functor"] = functor,
["Name"] = name,
["Time Records"] = table.create(n_rounds),
["Total Time"] = 0.0,
})
end
for r = 1, n_rounds do
for i, result in results do
local functor = result["Functor"]
local st = os.clock()
for _ = 1, n_calls do
functor(...)
end
local ed = os.clock()
local elapsed_time = ed - st
result["Time Records"][r] = elapsed_time
result["Total Time"] += elapsed_time
end
end
for i, result in results do
local total_time = result["Total Time"]
local time_records = result["Time Records"]
table.sort(time_records)
result["Best Time"] = time_records[1] / n_calls
result["Worst Time"] = time_records[#time_records] / n_calls
result["Avg. Time"] = total_time / (n_rounds * n_calls)
result["Throughput"] = (n_rounds * n_calls) / total_time
result["Rank"] = #results
local sum = 0
for _, t in time_records do
sum += t
end
local mean = sum / #time_records
local var = 0
for _, t in time_records do
var += (t - mean) ^ 2
end
var /= #time_records
local stddev = math.sqrt(var)
local cv = stddev / mean
result["%RSD"] = cv * 100
end
for i, lhs in results do
for j, rhs in results do
if i ~= j then
if lhs["Best Time"] <= rhs["Worst Time"] then
-- Since we don't have a statistical library, we use this simple heuristic,
-- which shows the best case ranking of a function.
-- If multiple functions have the same rank, then there probably isn't enough
-- evidence that one of them is more performant than the other
lhs["Rank"] -= 1
end
end
end
end
table.sort(results, function(a, b)
return a["Total Time"] < b["Total Time"]
end)
local min_total_time = results[1]["Total Time"]
for i, result in results do
result["Factor"] = result["Total Time"] / min_total_time
end
local cols = {"Name", "Avg. Time", "%RSD", "Throughput", "Factor", "Rank", "Best Time", "Worst Time"}
print("\n" .. dataframe_to_string(results, cols, {
["Avg. Time" ] = format_duration,
["Best Time" ] = format_duration,
["Worst Time"] = format_duration,
["Total Time"] = format_duration,
["Throughput"] = format_throughput,
["Factor" ] = "%.3fx",
["%RSD" ] = "%.2f",
}))
end
-- Test cases:
--
local tbl_data = { _n = 10000, _p = 0.3, _r0 = 3000, _pr = 0.008705361364930851, _odds = 0.4285714285714286 }
local arr_data = { 10000, 0.3, 3000, 0.008705361364930851, 0.4285714285714286 }
local function sample_using_standard_args(rd: Random)
return args_workload(rd, 10000, 0.3, 3000, 0.008705, 0.42857)
end
local function sample_using_args_tagged_as_any(rd: Random)
return args_workload_with_any_type_hints(rd, 10000, 0.3, 3000, 0.008705, 0.42857)
end
local function sample_using_awful_args(rd: Random)
return args_workload_with_awful_type_hints(rd, 10000, 0.3, 3000, 0.008705, 0.42857)
end
local function sample_using_standard_table(rd: Random)
return standard_table_workload(rd, tbl_data)
end
local function sample_using_magic_table(rd: Random)
return magic_table_workload(rd, tbl_data)
end
local function sample_using_standard_array(rd: Random)
return standard_array_workload(rd, arr_data)
end
local function sample_using_magic_array(rd: Random)
return magic_array_workload(rd, arr_data)
end
local function sample_using_half_magic_array(rd: Random)
return half_magic_array_workload(rd, arr_data)
end
local rd = Random.new()
run_benchmark(20, 100000, {
using_standard_args = sample_using_standard_args,
using_args_tagged_as_any = sample_using_args_tagged_as_any,
using_awfully_typed_args = sample_using_awful_args,
using_standard_array = sample_using_standard_array,
using_standard_table = sample_using_standard_table,
using_magic_array = sample_using_magic_array,
using_magic_table = sample_using_magic_table,
using_half_magic_array = sample_using_half_magic_array,
}, rd)
Detailed Benchmark Results
┌──────────────────────────┬────────────┬──────┬─────────────┬────────┬──────┬────────────┬────────────┐
│ Name │ Avg. Time │ %RSD │ Throughput │ Factor │ Rank │ Best Time │ Worst Time │
├──────────────────────────┼────────────┼──────┼─────────────┼────────┼──────┼────────────┼────────────┤
│ using_args_tagged_as_any │ 411.397 ns │ 1.00 │ 2.431 M/s │ 1.000x │ 1 │ 404.314 ns │ 421.085 ns │
│ using_magic_array │ 411.963 ns │ 0.90 │ 2.427 M/s │ 1.001x │ 1 │ 406.846 ns │ 421.481 ns │
│ using_half_magic_array │ 413.424 ns │ 1.01 │ 2.419 M/s │ 1.005x │ 1 │ 406.927 ns │ 420.875 ns │
│ using_magic_table │ 415.501 ns │ 0.93 │ 2.407 M/s │ 1.010x │ 1 │ 409.878 ns │ 423.415 ns │
│ using_standard_args │ 466.897 ns │ 1.07 │ 2.142 M/s │ 1.135x │ 5 │ 461.447 ns │ 480.484 ns │
│ using_standard_array │ 468.936 ns │ 0.97 │ 2.132 M/s │ 1.140x │ 5 │ 464.366 ns │ 481.593 ns │
│ using_standard_table │ 470.348 ns │ 1.02 │ 2.126 M/s │ 1.143x │ 5 │ 466.596 ns │ 482.509 ns │
│ using_awfully_typed_args │ 1.723 µs │ 1.01 │ 580.439 K/s │ 4.188x │ 8 │ 1.693 µs │ 1.759 µs │
└──────────────────────────┴────────────┴──────┴─────────────┴────────┴──────┴────────────┴────────────┘
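The Rank column comes from the best-case heuristic in run_benchmark: every function starts in last place and moves up one place for each other function whose worst time is no better than its own best time. As a sanity check, the sketch below replays that rule on the Best Time and Worst Time values from the table above (in nanoseconds). The `timings` table and the `rank_of` helper are names made up for this example.

```luau
-- Best/worst times in nanoseconds, copied from the results table above.
local timings = {
    { name = "using_args_tagged_as_any", best = 404.314, worst = 421.085 },
    { name = "using_magic_array",        best = 406.846, worst = 421.481 },
    { name = "using_half_magic_array",   best = 406.927, worst = 420.875 },
    { name = "using_magic_table",        best = 409.878, worst = 423.415 },
    { name = "using_standard_args",      best = 461.447, worst = 480.484 },
    { name = "using_standard_array",     best = 464.366, worst = 481.593 },
    { name = "using_standard_table",     best = 466.596, worst = 482.509 },
    { name = "using_awfully_typed_args", best = 1693,    worst = 1759 },
}

-- Same rule as run_benchmark: start in last place, then move up once for
-- every other entry whose worst time is at least this entry's best time.
local function rank_of(index: number): number
    local lhs = timings[index]
    local rank = #timings
    for j, rhs in ipairs(timings) do
        if j ~= index and lhs.best <= rhs.worst then
            rank -= 1
        end
    end
    return rank
end

for i, entry in ipairs(timings) do
    print(entry.name, rank_of(i))
end
```

This reproduces the ranks in the table (1, 1, 1, 1, 5, 5, 5, 8). Functions that share a rank have overlapping best/worst ranges, so the benchmark cannot really tell them apart.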
The Cursed Realm
By changing just one parameter from number to any, we can magically shift the performance of the following function:
Function Code
local function func(rd: Random, n: number, p: number, r0: number, pr: number, odds: number): number
    if n == 0 or p == 0 then return 0 end
    if p == 1 then return n end
    local u, pu, pd, ru, rd_ = rd:NextNumber() - pr, pr, pr, r0, r0
    if u < 0 then return r0 end
    while true do
        local active = false
        if rd_ >= 1 then
            pd *= rd_ / (odds * (n - rd_ + 1))
            u -= pd
            active = true
            if u < 0 then return rd_ - 1 end
        end
        if rd_ ~= 0 then rd_ -= 1 end
        ru += 1
        if ru <= n then
            pu *= (n - ru + 1) * odds / ru
            u -= pu
            active = true
            if u < 0 then return ru end
        end
        if not active then return 0 end
    end
end
Function Signature Comparisons
Original Function Signature
-- Every numeric parameter of this function is correctly annotated as "number".
-- However, the compiler does not like it very much :(
local function no_magic(rd: Random, n: number, p: number, r0: number, pr: number, odds: number): number
Optimized Function Signature
-- The last parameter is now annotated as "any" instead of "number"
-- The compiler is LOVING this change! 😍🥰
local function white_magic(rd: Random, n: number, p: number, r0: number, pr: number, odds: any): number
De-optimized Function Signature
-- Every numeric parameter except the fourth one (pr) is annotated as "number"; pr is "any".
-- The compiler absolutely HATES this parameter list and wants to destroy it!!
local function black_magic(rd: Random, n: number, p: number, r0: number, pr: any, odds: number): number
Performance Comparisons
The following results were obtained on a different PC than the one we used above:
┌─────────────────────────┬────────────┬──────┬────────────┬────────┬──────┬────────────┬────────────┐
│ Name │ Avg. Time │ %RSD │ Throughput │ Factor │ Rank │ Best Time │ Worst Time │
├─────────────────────────┼────────────┼──────┼────────────┼────────┼──────┼────────────┼────────────┤
│ Magically Optimized! │ 661.954 ns │ 0.67 │ 1.511 M/s │ 1.000x │ 1 │ 656.346 ns │ 674.783 ns │
│ No Magic :( │ 721.910 ns │ 0.52 │ 1.385 M/s │ 1.091x │ 2 │ 714.817 ns │ 728.863 ns │
│ Magically De-Optimized! │ 793.683 ns │ 0.51 │ 1.260 M/s │ 1.199x │ 3 │ 787.313 ns │ 804.044 ns │
└─────────────────────────┴────────────┴──────┴────────────┴────────┴──────┴────────────┴────────────┘
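For reference, the %RSD column in these result tables is the relative standard deviation of the per-round times: the population standard deviation divided by the mean, times 100. Values around 1 or below mean the rounds were consistent, so differences of 9% or more between rows are unlikely to be noise. A minimal sketch of that calculation, using a hypothetical `relative_std_dev` helper and made-up sample values:

```luau
-- Relative standard deviation in percent, computed the same way as the
-- "%RSD" field in run_benchmark (population variance, not sample variance).
local function relative_std_dev(samples: { number }): number
    local sum = 0
    for _, t in ipairs(samples) do
        sum += t
    end
    local mean = sum / #samples

    local variance = 0
    for _, t in ipairs(samples) do
        variance += (t - mean) ^ 2
    end
    variance /= #samples

    return math.sqrt(variance) / mean * 100
end

-- Made-up samples: identical rounds have no spread at all.
print(relative_std_dev({ 1.0, 1.0, 1.0 }))
-- Made-up samples: mean 10 and standard deviation 1, so about 10 percent.
print(relative_std_dev({ 9, 11 }))
```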
Remarks
I have no explanation for this. It is likely a compiler bug that will eventually get patched.
CONCLUSIONS
Don’t Guess Types
In --!native mode, a wrong hint is worse than no hint. If you are unsure of a type, use any.
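As an illustration, here is a minimal sketch of a function whose argument can legitimately be either a number or a buffer. The `byte_size` helper is hypothetical. The point is only that `any` plus a runtime check keeps the annotation honest, whereas guessing `number` or `buffer` would be wrong for some callers.

```luau
--!native
--!strict

-- Hypothetical helper: callers pass either a byte count or a buffer.
-- The `any` annotation makes no promise about the argument, so nothing
-- is specialized around a guess that some callers would violate.
local function byte_size(value: any): number
    if type(value) == "number" then
        return value
    elseif type(value) == "buffer" then
        return buffer.len(value)
    end
    error("expected a number or a buffer, got " .. typeof(value))
end

print(byte_size(16))                -- 16
print(byte_size(buffer.create(32))) -- 32
```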
Don’t Lie to Silence the Linter
Don’t use a fake type just to get rid of the warnings. It can degrade the runtime speed!
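If the warning comes from a value that really can have more than one type, one alternative is to say so with a union and narrow it at runtime. Below is a minimal sketch with a hypothetical `to_number` helper. It is about keeping the annotation truthful; how unions perform under --!native is not measured here.

```luau
--!strict

-- Hypothetical helper: accepts a number or a numeric string.
-- The union tells the truth about the input; the type() check narrows it.
local function to_number(value: number | string): number
    if type(value) == "number" then
        return value
    end
    -- tonumber returns nil for non-numeric strings, so fall back to 0.
    return tonumber(value) or 0
end

print(to_number(5))     -- 5
print(to_number("7.5")) -- 7.5
print(to_number("abc")) -- 0
```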
Check Types During Refactors
If your code mysteriously slows down after a refactor, check your type definitions. They may have accidentally triggered some de-optimizations.
Performance Hacks
For extremely compute-heavy tasks, try experimenting with different type hints. In some scenarios, the compiler seems to prefer table shapes and types inferred using typeof() or any over manually written ones.
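A small A/B harness is enough for this kind of experiment. The sketch below is a simplified stand-in for the run_benchmark function used earlier, not a replacement for it: `sum_typed`, `sum_any`, `time_calls`, and `best_times` are hypothetical names, and the round and call counts are arbitrary. It interleaves the variants across rounds and keeps the fastest round for each.

```luau
--!native
--!optimize 2

-- Two variants with identical bodies; only the parameter hint differs.
local function sum_typed(n: number): number
    local total = 0
    for i = 1, n do
        total += i
    end
    return total
end

local function sum_any(n: any): number
    local total = 0
    for i = 1, n do
        total += i
    end
    return total
end

-- Runs fn(arg) `calls` times and returns the elapsed time in seconds.
local function time_calls(fn: (any) -> any, arg: any, calls: number): number
    local start = os.clock()
    for _ = 1, calls do
        fn(arg)
    end
    return os.clock() - start
end

-- Interleaves the variants so background noise affects all of them,
-- and keeps the fastest round for each variant.
local function best_times(variants, arg: any, rounds: number, calls: number)
    local best = {}
    for _ = 1, rounds do
        for name, fn in variants do
            local elapsed = time_calls(fn, arg, calls)
            best[name] = math.min(best[name] or math.huge, elapsed)
        end
    end
    return best
end

local results = best_times({ typed = sum_typed, any = sum_any }, 1000, 10, 1000)
for name, seconds in results do
    print(name, string.format("%.3f ms", seconds * 1000))
end
```

Only trust a difference if it survives several runs and is clearly larger than the spread between rounds.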
Please don’t get too paranoid about this micro-optimization though.