A short summary of programming optimizations

In this post I will share most of my knowledge about the major scripting optimizations, along with common misconceptions about optimization. I won’t talk about algorithms here, as that is a different kind of optimization. The topics are:

  1. Cache locality
  2. Parallel computation
  3. Allocation overhead
  4. Common misconceptions

Cache locality

Cache locality is a big deal in programming, but on Roblox it is sadly often ignored due to a common misconception that it is impossible to write cache-friendly code in a high-level language. I consider that to be wrong.

In fact, here is an example that shows a cache miss and the benefit of cache locality at the same time:

[image: profiler screenshot of the benchmark described below]

The first three are the first-ever calls to those functions in the stack, and as you can see, they take a HUGE amount of time. Your goal is to reduce such cache misses as much as possible. In this code the first cache miss was unavoidable, but we compensated for it by reusing the already-cached functions later in the code, which resulted in very high-performance code. It was also helped by the fact that the instances being operated on were created at the same time, which produced something similar to an array structure and made it far easier for the CPU to predict where the code will access next.
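To make that concrete, here is a minimal sketch of the same pattern, assuming a batch of parts; the part count and the repositioning task are illustrative and not taken from the benchmark above:

local parts = table.create(100)
for i = 1, 100 do
    local part = Instance.new("Part")
    part.Parent = workspace
    parts[i] = part -- created together, so they end up laid out close to one another
end

-- One function doing one task over the whole batch: after the first
-- (unavoidable) cache miss, the same code and nearby data stay cached.
local function repositionAll(list)
    for i, part in list do
        part.CFrame = CFrame.new(i * 4, 5, 0)
    end
end

repositionAll(parts)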

You can decrease the number of cache misses by making the memory footprint of your game smaller. This can be achieved in various ways; I won’t describe all of them, but here are the two common ones:

  1. Batching work into a single function that does one task on an array of objects (see the sketch after this list)
  2. Avoiding unstructured object creation
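
Here is a hedged sketch of the first point; the queue, the Heartbeat loop, and the transparency task are my own framing for illustration, not something prescribed above:

local RunService = game:GetService("RunService")

local pending = {} -- objects queued up for the same task

local function queueFade(part)
    table.insert(pending, part)
end

-- One function, one task, one pass over an array of objects per frame,
-- instead of scattered per-object work spread across the codebase.
RunService.Heartbeat:Connect(function()
    for _, part in pending do
        part.Transparency = 0.5
    end
    table.clear(pending)
end)

Callers push objects through queueFade whenever they want the work done, and the batched loop then handles the whole array in one pass.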

A common misconception is that you should cache EVERY operation in Lua code, like:

local FindFirstChild = script.FindFirstChild

While this is a good practice if you process in bulk, it is a very bad practice when such a cached function is only used once per task, because the cached function is kept on the stack, polluting the caches and causing actually useful data to be flushed. I described it in more detail here.
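
As a hedged sketch of where the line sits (the child names are illustrative):

-- Worth it: the cached method is reused many times in one bulk task.
local FindFirstChild = script.FindFirstChild
for _, name in {"Config", "Assets", "Remotes"} do
    local child = FindFirstChild(script, name)
    if child then
        print(child:GetFullName())
    end
end

-- Not worth it: caching for a single call just adds a lingering local
-- with no reuse to pay for it; call script:FindFirstChild("Config") directly.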


Parallel computation

This includes Vectors, CFrames, and Roblox’s parallel model. You need to understand that the built-in math operations on Vectors/CFrames are far more optimized than what you can write by hand. To give a sense of scale, it’s like comparing a Ford Model A to a Bugatti. Even native Luau code is poorly pipelined, and on top of that it does not use any SIMD. Roblox’s math, on the other hand, really shines with CFrames: multiplying a CFrame by a CFrame is about as cheap as multiplying two numbers, thanks to extreme pipelining and the use of SIMD instructions. These are considered parallel computations.
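
As a hedged illustration, lean on the built-in Vector3/CFrame operators instead of decomposing the same math into plain number operations (the values are illustrative):

local a = Vector3.new(1, 2, 3)
local b = Vector3.new(4, 5, 6)

-- Goes through Roblox's SIMD-backed, heavily pipelined vector math:
local fast = a + b

-- Decomposing the same work into scalar Luau math loses that benefit:
local slow = Vector3.new(a.X + b.X, a.Y + b.Y, a.Z + b.Z)
print(fast == slow) -- same result, but the first form is the cheap one in bulk

-- CFrame composition is similarly cheap, so prefer one multiplication
-- over rebuilding the transform out of manual trigonometry:
local part = Instance.new("Part")
part.CFrame = part.CFrame * CFrame.Angles(0, math.rad(90), 0)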

You should use parallel computation as much as possible, but you also need to understand that most of it is expensive to set up. With Roblox’s parallel model it’s barely worth it below a certain scale; it only starts to pay off at around 100 raycasts or a similarly sized computation. And because of the cache miss you are going to take after synchronizing with the main thread, doing it for a single computed value will probably make it far less worth it.

You might be able to make the synchronization cheaper by doing something like this:

local RunService = game:GetService("RunService")

local computer = {} -- shared table: written in the parallel phase, read in the serial phase

-- Runs in the parallel phase (the script must live under an Actor to use ConnectParallel).
RunService.Heartbeat:ConnectParallel(function()
    computer[1] = 12
    -- do the heavy computation here
end)

-- Runs in the serial phase, after the parallel work for the frame.
RunService.Heartbeat:Connect(function()
    workspace.Name = tostring(computer[1])
    -- apply the computation to instances through the shared table
end)

This avoids direct synchronization calls and instead makes your code rely purely on Roblox’s default synchronization, since the parallel computation runs before the main (serial) event.


Allocation overhead

This is a common disagreement between programmers: some suggest allocating only when the task needs it, to save on overall memory; others suggest pre-allocating and then reusing that space during the task to avoid the allocation overhead.

Both parties are right and wrong; it’s contextual. So when should we allocate, and when should we pre-allocate? If the allocation is small, say 8 bytes, then it’s far better to just allocate at runtime, like caching a function right before the task. But if it’s something large, on the order of 200 bytes, then pre-allocation is a lot more worthwhile. Sure, it keeps that RAM occupied, but it avoids the massive overhead that a 200-byte allocation would bring every time.
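
A minimal Luau sketch of the pre-allocation side, assuming the rough size is known up front (the 1000-entry size is illustrative):

-- Pre-allocate once: table.create reserves the array space up front,
-- so the hot loop never has to grow (reallocate) the table.
local results = table.create(1000)

local function runTask()
    for i = 1, 1000 do
        results[i] = i * i -- reuses the pre-allocated space on every run
    end
end

runTask()

-- The small-allocation case from above: an 8-byte-ish local, such as a
-- cached function, is cheap enough to just create right before use.
local clock = os.clock
print(clock())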

In the case of Roblox, the answer is very simple: anything table-like is worth pre-allocating and caching, whether it holds instances, objects, strings, etc. Lightweight built-ins like vectors and numbers aren’t worth caching because they have a small memory footprint as is.


Common misconceptions

Native code is an instant performance booster:

No, it’s not. In its current state it is only really good at optimizing fairly basic number computations. The lack of SIMD and the still-poor pipelining mean it won’t make your code faster than raw Luau by any big margin.
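
For context, native code generation is opted into per script with the --!native directive; a purely numeric hot loop like the hedged sketch below is the kind of workload where it helps, which is exactly the narrow case described above:

--!native
-- Purely numeric crunching is where native codegen helps; code dominated
-- by instance access or table shuffling gains little from it.
local function sumOfSquares(n: number): number
    local total = 0
    for i = 1, n do
        total += i * i
    end
    return total
end

print(sumOfSquares(1000000))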


You clearly just copy-pasted someone’s opinion without any validation of its legitimacy.
Very misleading tutorial, and you don’t know what you are talking about.

My honest rating is 5/10

Caching methods saves on bytecode operation setup and avoids the use of namecall entirely; the only use case where namecall is useful is when you call a method only once EVER.


Also, by your logic, using tables or buffers is wrong because IT’S HEAP AAAA

Good luck making a game using the stack instead; you will absolutely be capable of writing EVERYTHING in
200 registers :scream: so much mango mango


Real talk:

You somehow got everything wrong, and I’m really impressed.

Preallocation is always essential; just be careful not to make huge cross-stack chains of upvalue captures.

You also seem to not understand the difference between a simulated thread and true parallelism; the one you’ve shown in this POST is a simulated thread.

Native code is always faster, but you need to understand what can and cannot be compiled into it, so you don’t bother the compiler with attempts to compile something that can’t be.


I wonder where, exactly.


You step on the same exact rakes again.


I never stated anything like that. Also, I need to correct you: the Luau stack is also, in fact, allocated on the heap.


From what I managed to understand through your grammar:

You absolutely did not read the post and instantly jumped to conclusions. Most of what you said here was already addressed in the post in a way that is correct relative to general computing theory. Also, most of what you said here was imagined, because I never stated anything like what you’ve written.


This is very helpful for people trying to understand the why in optimizations.
Nice post!
