In this post I will share most of my knowledge about major scripting optimizations and the common misconceptions around them. I won't talk about algorithms, as that is a different kind of optimization. The topics covered:
- Cache locality
- Parallel computation
- Allocation overhead
- Common misconceptions
Cache locality
Cache locality is a big deal in programming, but on Roblox it is often ignored because of a common misconception that high-level languages make cache-friendly code impossible. I consider that to be wrong.
In fact, here is an example of a cache miss and the benefit of cache locality at the same time:

The first three entries are the first-ever calls to those functions in the stack, and as you can see they take a HUGE amount of time. Your goal is to reduce such cache misses as much as possible. In this code the first cache miss was unavoidable, but we compensated for it by reusing the already cached functions later in the code, which resulted in very high-performance code. It also helped that the instances being operated on were created at the same time, which produced something similar to an array layout and made it far easier for the CPU to predict where the code will access next.
You can decrease the number of cache misses by making the memory footprint of your game smaller. This can be achieved in various ways; I won’t describe all of them, but here are the two common ones:
- Batching work into a single function that does one task on an array of objects (see the sketch after this list)
- Avoiding unstructured object creation
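For example, batching might look like the minimal sketch below. The "Effects" folder and the recolor task are assumptions for illustration; the point is that one function walks one array and does the same task on every element, instead of scattering that work across many separate calls:

local parts = workspace.Effects:GetChildren() -- "Effects" is a hypothetical folder of parts

-- One function, one task, one array: the data is walked sequentially,
-- which keeps it warm in the CPU caches.
local function recolorAll(list, color)
	for _, part in list do
		if part:IsA("BasePart") then
			part.Color = color
		end
	end
end

recolorAll(parts, Color3.new(1, 0, 0))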
A common misconception is that you should cache EVERY operation in Lua code, like:
local FindFirstChild = script.FindFirstChild
While this is good practice if you process in bulk, it is actually a very bad practice when such a cached function is only used once per task, because the cached function stays on the stack, polluting the caches and flushing actually useful code out of them. I described it in more detail here.
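To make the "bulk" case concrete, here is a minimal sketch (the child names are hypothetical): the cached function is reused many times in one tight loop, so keeping it in a local actually pays off:

local FindFirstChild = script.FindFirstChild

-- Worth it: the cached function is called hundreds of times in one task
for i = 1, 500 do
	local child = FindFirstChild(script, "Part" .. i)
	if child then
		child.Parent = workspace
	end
end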
Parallel computation
This covers Vectors, CFrames, and Roblox's parallel model. You need to understand that the built-in math operations on Vectors/CFrames are far more optimized than anything you write by hand. If you don't grasp the scale of the difference, it's basically like comparing a Ford Model A to a Bugatti. Even native Luau code is poorly pipelined, and on top of that it does not use any SIMD. Roblox's math, on the other hand, really shines with CFrames: multiplying a CFrame by a CFrame is about as cheap as multiplying a number by a number, thanks to extreme pipelining and SIMD instruction use. These are considered parallel computations.
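As an illustration (a minimal sketch, not a benchmark), prefer the built-in operators over per-component Luau arithmetic:

local a = Vector3.new(1, 2, 3)
local b = Vector3.new(4, 5, 6)

-- Slow: per-component arithmetic done in Luau
local slow = Vector3.new(a.X + b.X, a.Y + b.Y, a.Z + b.Z)

-- Fast: one built-in operation that goes through Roblox's optimized math
local fast = a + b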
You should use parallel computations as much as possible, but you also need to understand that most of them are expensive to set up. With Roblox's parallel model it's barely worth it below roughly 100 raycasts or a similarly sized computation, and the cache miss you take after synchronizing with the main thread for a single computed value makes it even less worth it.
You might be able to reduce the synchronization cost by doing something like:
local RunService = game:GetService("RunService")

-- Shared between the parallel and serial phases of this script
local computer = {}

-- Note: ConnectParallel requires this script to live under an Actor
RunService.Heartbeat:ConnectParallel(function()
	computer[1] = 12
	-- do stuff (heavy computation in the parallel phase)
end)

RunService.Heartbeat:Connect(function()
	workspace.Name = tostring(computer[1])
	-- apply the computation to instances through the shared table
end)
This avoids direct synchronization calls and instead makes your code rely purely on Roblox's default synchronization, since the parallel computation runs before the main event.
Allocation overhead
This is a common disagreement between programmers: some suggest allocating on demand during the task to save on overall memory, while others suggest pre-allocating and then reusing that space during the task to avoid the allocation overhead.
Both camps are right and wrong; it's contextual. So when should we allocate versus pre-allocate? If the allocation is small, say 8 bytes, it's far better to just allocate at runtime, like caching a function right before the task. But if it's something large, on the order of 200 bytes, then pre-allocation is a lot more worth it: sure, it keeps that RAM occupied, but it avoids the massive overhead that a 200-byte allocation would bring.
In the case of Roblox the answer is very simple: any table is worth caching, and the same goes for what it holds, whether that's an instance, an object, strings, etc. Built-ins like vectors and numbers aren't worth caching because their memory footprint is small as it is.
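A minimal sketch of the pre-allocate-and-reuse approach (the capacity and folder name are assumptions for illustration): one table is created up front with table.create and cleared with table.clear on each use, so no new allocation happens per call:

-- Pre-allocate once; 200 is an assumed capacity for this example
local scratch = table.create(200)

local function collectParts(folder)
	table.clear(scratch) -- reuse the same storage instead of allocating a new table
	for _, child in folder:GetChildren() do
		if child:IsA("BasePart") then
			table.insert(scratch, child)
		end
	end
	return scratch
end

-- Usage: collectParts(workspace.Map) -- hypothetical folder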
Common misconceptions
Native code is an instant performance booster:
No, it's not. In its current state it is only good for speeding up basic number computation a tiny bit. The lack of SIMD and the still-poor pipelining mean it won't make your code faster than regular Luau by any big margin.
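For context, here is a hedged sketch of where native codegen can help at all: a tight numeric loop in a script compiled with the --!native directive. Anything Instance- or table-heavy will see little benefit:

--!native
-- Purely numeric work is the main thing native codegen speeds up
local function sumSquares(n)
	local total = 0
	for i = 1, n do
		total += i * i
	end
	return total
end

print(sumSquares(1000000))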
