Hi Developers,
We are always working to ensure that your scripts can harness all of the computational resources available on the hardware your experience runs on. In 2019, we introduced Faster Lua VM, which is now part of Luau, and we’ve continued to deliver incremental performance improvements to our interpreter as well as many library functions. We regularly summarize this performance work, among other improvements, in Luau Recaps, and our interpreter is much faster now than it was in 2019.
However, the script interpreter is fundamentally limited in terms of execution performance compared to the natively compiled code of the Roblox Engine. This is generally fine if the scripts are doing a limited amount of computation and instead utilize Luau and Roblox native libraries to do the heavy lifting. However, if a certain processor-intensive algorithm isn’t available as an engine API, reimplementing it in Luau is unlikely to run at peak performance.
To solve this problem, since the beginning of the year we’ve been working on native code generation support for Luau. While we have a lot of improvements yet to make, we’re excited to share this early technical preview, available in Studio on Windows and macOS via the beta feature “Luau Native Code”.
Instead of the usual process where scripts are compiled from source to bytecode and the bytecode is then interpreted, we augment the compilation step to take some of the functions in the scripts and compile them further into native code (x86-64 or AArch64 based on whether you’re running Studio on an Intel/AMD or ARM CPU). This eliminates the interpreter overhead and allows us to do deeper optimizations, which makes the code run faster.
Marking scripts as native
By default we do not compile any scripts to native code even if you opt into the beta; in addition to the beta being enabled, you need to put a --!native comment at the top of your script, similarly to the type checking comments that you might be used to. That’s it - you don’t need to make any other changes!
The annotation is currently required because we don’t yet have a good mechanism for automatically determining if compiling a given function is profitable, and compiling every function to native code will make Studio start slower because of the sheer amount of Luau code that tends to run in plugins these days. In the future we plan to develop heuristics that determine automatically whether compilation is worthwhile, along with per-function annotations to help guide this decision, but for now you need to manually place --!native in performance-intensive scripts.
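As a minimal sketch of what an annotated script could look like (the function name and workload are hypothetical, chosen only to illustrate a numeric loop; the only part taken from this post is the --!native comment on the first line):

```luau
--!native
-- Hypothetical compute-heavy helper: adds up i*i for i = 1..n.
-- The --!native comment above opts the whole script into native code
-- generation when the beta feature is enabled; without the beta it is
-- an ordinary comment and the script runs in the interpreter.
local function sumOfSquares(n: number): number
    local total = 0
    for i = 1, n do
        total += i * i
    end
    return total
end

print(sumOfSquares(10))
```

Nothing else in the script changes; removing the first line returns the script to interpreted execution.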
Our approach is somewhere in between what is classically referred to as AOT (ahead-of-time) and JIT (just-in-time) compilation. We compile modules before they are executed, but we may choose to compile only some functions in the module, and a natively compiled function is optimized for certain assumptions - breaking these, such as by using getfenv, will result in going back to interpreted code.
Importantly, native code generation should not affect the behavior of your scripts - only the time they take to run. If using native code generation results in a crash or a different result compared to what you get without native code - that is a bug, and feel free to report it in this thread!
Also importantly, the comment is simply ignored by production clients and servers, and by Studio if you disable the beta feature. So there should not be any issue with experimenting with this feature in your production games or plugins.
So, should you just add --!native to every single script in your project? Well…
Performance expectations
To make good use of this feature, it is critical to understand what exactly native code generation makes faster (and what it doesn’t). Native code generation compiles the source of your functions to native code, so the script code you write runs faster. However, it does not change the implementation of code that is already provided to your script by Luau libraries (such as table.sort), Roblox Engine (such as assigning .CFrame on a part or manipulating UI instances), or other module scripts that you require if they don’t have a --!native annotation.
Additionally, if your script spends relatively little time doing interesting computations, and instead spends most of the time creating objects and calling methods on them, or if your script is just defining data in enormous Luau tables, you’re also not going to see a significant speedup in practice. Meanwhile, using --!native means that loading your script into memory is a tiny bit slower and that the script takes a little more memory. This is not a problem when you are applying --!native selectively, but it means you shouldn’t use it indiscriminately.
The additional memory overhead is displayed in the memory profiler under the “lua/codegen” and “lua/codegenpages” categories. Note that right now there’s a sizeable fixed overhead that is present when the beta feature is enabled even if no native scripts are running; we will be reducing the memory impact of this feature in the future.
Accordingly, we recommend:
- Use this feature sparingly! We intend for this to unlock new performance opportunities for complex features and algorithms, e.g. code that spends a lot of time working with numbers and arrays, but not to dramatically change performance on UI code.
- Profile your code before and after using this feature! Make sure that when you add --!native to a script you can measure a noticeable improvement in the time it takes to execute. If you don’t see the improvement, don’t add the comment.
- Use ScriptProfiler to identify opportunities for optimization and confirm that native code generation is helping! We will soon change the ScriptProfiler UI to show when a given function executes natively vs in an interpreter to help with this.
- When you have a compute-intensive script that isn’t getting much faster, please send it to us so that we can take a look! We recommend either posting the script in this thread or, if it is sensitive, describing the general use case where you aren’t seeing a good speedup, and a staff member will contact you to discuss it.
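One rough way to do the before/after comparison recommended above is a small timing harness built on os.clock. This is only a sketch: the workload and the helper names (work, bestTime) are hypothetical, and ScriptProfiler or the microprofiler will give a more complete picture than a hand-rolled timer.

```luau
--!native
-- Hypothetical numeric workload used only to have something to time.
local function work(n: number): number
    local acc = 0
    for i = 1, n do
        acc += math.sqrt(i) * 0.5
    end
    return acc
end

-- Runs fn(arg) `runs` times and returns the shortest elapsed time in seconds.
-- Taking the minimum reduces noise from other work happening on the machine.
local function bestTime(fn: (number) -> number, arg: number, runs: number): number
    local best = math.huge
    for _ = 1, runs do
        local start = os.clock()
        fn(arg)
        local elapsed = os.clock() - start
        if elapsed < best then
            best = elapsed
        end
    end
    return best
end

print(string.format("best of 5: %.3f ms", bestTime(work, 1000000, 5) * 1000))
```

Run it once with the --!native line and once with that line removed, and keep the annotation only if the difference is clearly measurable.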
We anticipate that functions that involve heavy computation will likely run around 1.5 to 2.5 times faster when executed using native code. We’re actively working on various ways to optimize the performance even further. This means that in certain situations, the speed improvement could be even greater. We’re particularly interested in cases where the speedup isn’t as impressive as we expected, as your examples will guide us in deciding where to focus our efforts for enhancing the native code generation.
We also are starting to utilize type annotations during execution and expect that code that uses correct type annotations will perform better than code that doesn’t when native code generation is used. Notably, incorrect type annotations should never create correctness issues when native code is used but may result in slower execution.
Platform support
To utilize native code generation, you need to be using a recent Studio version (please check that it’s at least 0.592, as we occasionally see updates get stuck). On Windows or Intel Macs we require CPUs that support the AVX1 instruction set (Intel Sandy Bridge or AMD Bulldozer and later; notably, we do not support AMD Phenom or older Intel Pentium chips. If your CPU was manufactured in 2011 or later you’re probably good). On Apple Silicon hardware we require a native ARM build (we do not support Rosetta), and - importantly! - macOS 13 Ventura or later.
If your CPU or OS is not supported by our native codegen, all the scripts will run in interpreted mode as they usually do. You will get a warning in the output window saying that your system is not supported.
We do not anticipate these software or hardware requirements to change as we work toward the final release. We do expect that initially we will not support native code generation on clients (desktop or mobile) - so the first full release of this feature will be restricted to Studio and server-side scripts. That may change in a future far away, as on the clients we need to balance many complex competing factors that make native codegen much more difficult.
Tooling support
An important caveat is that debugging native modules is not supported; breakpoints placed in native modules will not work, and you won’t be able to step into a native module from another module either. Non-native scripts should work with the debugger as usual.
We do expect all other Studio profiling and inspection tools to work (e.g. you should be able to still inspect values if an error happens in native code; microprofiler and Script Profiler should still work). If something that is not a debugger doesn’t work, don’t hesitate to mention it in the thread!
All Roblox and Luau features should work without changes with native code generation. This includes parallel Luau (Parallel Luau [Version 2 Release]), module scripts (you can require native modules from non-native modules and vice versa), all APIs, and language features.
Importantly, the use of some features will trigger deoptimizations in native code and make it fall back to interpreted execution (with no noticeable behavior change). These include:
- getfenv/setfenv (these are soft deprecated in general)
- Use of various builtins like math.abs with non-numeric arguments
- Passing improperly typed parameters to typed functions (for example, calling foo(true) when foo is declared as function foo(arg: string))
Generally speaking, you should not run into these if your code is type checked; if you are not using type checking or type annotations, native code generation will still work, although in certain cases type annotations may be required to extract maximum performance in native modules.
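To make the last two triggers concrete, here is a small sketch with hypothetical helper names (label and magnitude are not part of any API). The casts to any exist only so the mismatched calls pass type checking; per the description above, such calls are expected to fall back to interpreted execution while still producing the same results.

```luau
--!native
-- Hypothetical typed helpers; the names are for illustration only.
local function label(arg: string): string
    return "value: " .. tostring(arg)
end

local function magnitude(x: number): number
    return math.abs(x)
end

-- Calls whose arguments match the annotations.
local ok1 = label("ready")
local ok2 = magnitude(-4)

-- Calls whose arguments contradict the annotations. The behavior is the
-- same as in the interpreter; only the execution speed may differ.
local odd1 = label(true :: any)
local odd2 = magnitude("-4" :: any) -- math.abs coerces the numeric string

print(ok1, ok2, odd1, odd2)
```

Keeping call sites consistent with the annotations, as type checking encourages, avoids these fallbacks.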
Upcoming performance work
Currently, native code generation supports all language features and constructs, so apart from efficiency concerns it should be safe to enable it in any script. However, some areas are known to need performance improvement.
In no particular order:
- We’re starting to use type annotations to guide native code generation. This is currently limited to function arguments and doesn’t propagate very well into the function body. In the future, we will start using type annotations on local variables as well as types inferred from explicit annotations.
- While Vector3 math works as expected from native code, we do not yet support it natively in the native code generator, which means that it does not run as fast as it should. Currently, the strongest performance gains are obtained from scalar math.
- There is a set of complex optimizations around code that has a lot of conditionals or some complex redundant expressions like table access that we are currently missing; expect more improvements in idiomatic code in the future.
- Function calls are not as fast as we’d like (they are still a little faster compared to the interpreter, but we currently extract more performance wins out of code that calls functions less). For now we recommend relying on the automatic function inlining that we already do, along with performance-minded programming, but this is an area that we plan to improve.
- In addition to higher-level optimizations, we also have cases where complex microarchitectural tuning of generated code is required to reach peak performance, as we sometimes generate code that modern CPUs don’t execute very well, or generate too much code that doesn’t optimally utilize certain CPU units. This will be an area of ongoing improvement as well.
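As an illustration of the kind of scalar, call-light code the points above describe, here is a sketch that keeps point coordinates in plain number arrays and does all arithmetic inline in one loop. The function name and data layout are assumptions made for the example, not a recommended API, and whether this beats a Vector3-based version in a given script is something to measure rather than assume.

```luau
--!native
-- Hypothetical example: sums squared distances from a set of points to a
-- target, with coordinates stored as three parallel arrays of numbers so
-- the loop body is pure scalar math with no function calls.
local function sumSquaredDistances(
    xs: { number }, ys: { number }, zs: { number },
    tx: number, ty: number, tz: number
): number
    local total = 0
    for i = 1, #xs do
        local dx = xs[i] - tx
        local dy = ys[i] - ty
        local dz = zs[i] - tz
        total += dx * dx + dy * dy + dz * dz
    end
    return total
end

print(sumSquaredDistances({ 1, 4 }, { 2, 6 }, { 2, 3 }, 0, 0, 0))
```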
Additionally, as mentioned above, we currently do not compile any functions to native code by default unless you use the --!native annotation. In the future, we expect to develop profitability heuristics, both to compile some functions in regular modules to native code with no annotation, as well as to automatically disable native compilation for some functions in native modules when we are very certain that that’s a bad idea. For now, we recommend splitting very long scripts into more manageable modules and making the decision about whether to enable native code generation on individual smaller modules.
When will this ship?
You will notice that this is a “preview”, not a regular beta. This is specifically meant to indicate that while we’ve done a lot of work on this and we’re excited about how the feature is shaping up, we expect to do a lot more work before the feature is fully production-ready. As such, we do not have an ETA for when this will be available in production.
We do need your feedback, not just in terms of making sure that everything works as it is supposed to, but also in sharing cases when the performance gains do not align with your expectations based on all the caveats above. This will help us prioritize the upcoming work to make the feature production-ready, as well as gather a solid collection of guidelines in terms of how to best take advantage of native code that we can document for other creators.
We do plan to start using it in some plugins that Studio ships with by default in the coming months, so even if you don’t use the feature directly, keeping it enabled might improve how responsive Studio is in compute-heavy tools like terrain editor!
With that in mind, huge thanks to the team that worked on this (@WheretIB first and foremost, as well as @machinamentum, @rep_movsb, and @zeuxcg), and we’re all very excited to work together with the community to bring this preview closer to a complete feature!