Luau Native Code Generation Preview [Studio Beta]

zeuxcg · August 31, 2023, 4:32pm

Hi Developers,

We are always working to ensure that your scripts can harness all of the computational resources available on the hardware your experience runs on. In 2019, we introduced the Faster Lua VM, which is now part of Luau, and we’ve continued to deliver incremental performance improvements to our interpreter as well as to many library functions; we regularly summarize this performance work, among other improvements, in Luau Recaps, and our interpreter is much faster now than it was in 2019.

However, the script interpreter is fundamentally limited in execution performance compared to the natively compiled code of the Roblox Engine. This is generally fine if scripts do a limited amount of computation and instead rely on Luau and Roblox native libraries to do the heavy lifting. However, if a certain processor-intensive algorithm isn’t available as an engine API, reimplementing it in Luau is unlikely to run at peak performance.

To solve this problem, since the beginning of the year we’ve been working on native code generation support for Luau. While we have a lot of improvements yet to make, we’re excited to share this early technical preview, available in Studio on Windows and macOS via the beta feature “Luau Native Code”.

Instead of the usual process where scripts are compiled from source to bytecode for the bytecode to be interpreted, we augment the compilation step to take some of the functions in the scripts and compile them further into native code (x86-64 or AArch64 based on whether you’re running Studio on an Intel/AMD or ARM CPU). This eliminates the interpreter overhead and allows us to do deeper optimizations, which makes the code run faster.

Marking scripts as native

By default, we do not compile any scripts to native code, even if you opt into the beta; in addition to enabling the beta, you need to put a --!native comment at the top of your script, similar to the type checking comments (such as --!strict) that you might be used to. That’s it - you don’t need to make any other changes!
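As a minimal sketch, a native script starts with the annotation on its first line, and the rest is ordinary Luau. The `countPrimes` function below is a hypothetical example, not taken from the post. It represents the number- and array-heavy code this feature targets. Because the annotation is only a comment, the same script also runs unchanged in interpreted mode.

```lua
--!native
-- The annotation goes on the very first line, like other "--!" comments.
-- Hypothetical example: a prime sieve whose time is spent on number
-- arithmetic and array reads/writes, the kind of code native codegen targets.

local function countPrimes(limit)
	local isComposite = {}
	local count = 0
	for i = 2, limit do
		if not isComposite[i] then
			count = count + 1
			-- Mark every multiple of i, starting at i * i, as composite.
			for j = i * i, limit, i do
				isComposite[j] = true
			end
		end
	end
	return count
end

print(countPrimes(100000))
```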

The annotation is currently required because we don’t yet have a good mechanism for determining whether compiling a given function is profitable, and compiling every function to native code would make Studio start slower because of the sheer amount of Luau code that tends to run in plugins these days. In the future, we plan to develop heuristics that automatically determine whether this is worthwhile, along with per-function annotations to help guide this decision, but for now, you need to manually place --!native in performance-intensive scripts.

Our approach sits somewhere between what is classically referred to as AOT (ahead-of-time) and JIT (just-in-time) compilation. We compile modules before they are executed, but we may choose to compile only some functions in a module, and a natively compiled function is optimized under certain assumptions - breaking these, for example by using getfenv, results in falling back to interpreted code.

Importantly, native code generation should not affect the behavior of your scripts - only the time they take to run. If using native code generation results in a crash or a different result compared to what you get without native code - that is a bug, and feel free to report it in this thread!

Also importantly, the comment is simply ignored by production clients and servers, as well as in Studio when the beta feature is disabled. So there should be no issue with experimenting with this feature in your production games or plugins.

So, should you just add --!native to every single script in your project? Well…

Performance expectations

To make good use of this feature, it is critical to understand what exactly native code generation makes faster (and what it doesn’t). Native code generation compiles the source of your functions to native code, so the script code you write runs faster. However, it does not change the implementation of code that is already provided to your script by Luau libraries (such as table.sort), Roblox Engine (such as assigning .CFrame on the part or manipulating UI instances), or other module scripts that you require if they don’t have a --!native annotation.

Additionally, if your script spends relatively little time doing interesting computations, and instead spends most of its time creating objects and calling methods on them, or if your script just defines data in enormous Luau tables, you’re also not going to see a significant speedup in practice. At the same time, using --!native means that loading your script into memory is a tiny bit slower, and it takes a little more memory. This is not a problem when you apply --!native selectively, but it means you shouldn’t use it indiscriminately.

The additional memory overhead is displayed in the memory profiler under the “lua/codegen” and “lua/codegenpages” categories. Note that right now there’s a sizeable fixed overhead that is present when the beta feature is enabled, even if no native scripts are running; we will be reducing the memory impact of this feature in the future.

Accordingly, we recommend:

Using this feature sparingly! We intend for this to unlock new performance opportunities for complex features and algorithms, e.g. code that spends a lot of time working with numbers and arrays, but not to dramatically change performance on UI code.
Profile your code before and after using this feature! Make sure that when you add --!native to a script you can measure a noticeable improvement in the time it takes to execute. If you don’t see the improvement, don’t add the comment.
Use ScriptProfiler to identify opportunities for optimization and confirm that native code generation is helping! We will soon change the ScriptProfiler UI to show when a given function executes natively vs in an interpreter to help with this.
When you have a compute-intensive script that isn’t getting much faster, please send it to us so that we can take a look! We recommend posting the script in this thread or, if it is sensitive, describing in this thread the general use case where you aren’t seeing a good speedup; a staff member will then contact you to discuss it.
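One rough way to follow the "profile before and after" advice is to time the same workload with and without --!native at the top of the script. The `benchmark` helper and `workload` function below are hypothetical and are not Roblox APIs. For real investigations, ScriptProfiler gives a per-function view. Taking the best of several runs filters out one-off hiccups.

```lua
-- To compare, run this once as-is and once with --!native added as the
-- script's first line, then compare the printed times.

-- Hypothetical helper: runs fn several times and returns the fastest run
-- in seconds. Only differences between os.clock() calls matter.
local function benchmark(label, fn, iterations)
	iterations = iterations or 10
	local best = math.huge
	for _ = 1, iterations do
		local start = os.clock()
		fn()
		local elapsed = os.clock() - start
		if elapsed < best then
			best = elapsed
		end
	end
	print(string.format("%s: best of %d runs = %.3f ms", label, iterations, best * 1000))
	return best
end

-- Hypothetical compute-heavy workload: pure arithmetic in a tight loop.
local function workload()
	local acc = 0
	for i = 1, 200000 do
		acc = acc + (i % 7) * 0.5
	end
	return acc
end

benchmark("workload", workload)
```

If the two timings are about the same, the advice above applies: leave the comment out.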

We anticipate that functions that involve heavy computation will likely run around 1.5 to 2.5 times faster when executed as native code. We’re actively working on various ways to optimize performance even further, so in certain situations the speedup could become even greater. We’re particularly interested in cases where the speedup isn’t as impressive as we expected, as your examples will guide us in deciding where to focus our efforts for improving native code generation.

We are also starting to utilize type annotations during execution, and we expect that code with correct type annotations will perform better than code without them when native code generation is used. Notably, incorrect type annotations should never cause correctness issues in native code, but they may result in slower execution.

Platform support

To utilize native code generation, you need to be using a recent Studio version (please check that it’s at least 0.592, as we occasionally see updates get stuck). On Windows or Intel Macs, we require CPUs that support the AVX1 instruction set (Intel Sandy Bridge or AMD Bulldozer and later; notably, we do not support AMD Phenom or older Intel Pentium chips. If your CPU was manufactured in 2011 or later, you’re probably good). On Apple Silicon hardware, we require a native ARM build (we do not support Rosetta), and - importantly! - macOS 13 Ventura or later.

If your CPU or OS is not supported by our native codegen, all the scripts will run in interpreted mode as they usually do. You will get a warning in the output window saying that your system is not supported.

We do not anticipate these software or hardware requirements changing as we work toward the final release. We do expect that initially we will not support native code generation on clients (desktop or mobile), so the first full release of this feature will be restricted to Studio and server-side scripts. That may change in the distant future, as on clients we need to balance many complex competing factors that make native codegen much more difficult.

Tooling support

An important caveat is that debugging native modules is not supported; breakpoints placed in native modules will not work, and you won’t be able to step into a native module from another module either. Non-native scripts should work with the debugger as usual.

We do expect all other Studio profiling and inspection tools to work (e.g. you should be able to still inspect values if an error happens in native code; microprofiler and Script Profiler should still work). If something that is not a debugger doesn’t work, don’t hesitate to mention it in the thread!

All Roblox and Luau features should work without changes with native code generation. This includes Parallel Luau (see the Parallel Luau [Version 2 Release] announcement), module scripts (you can require native modules from non-native modules and vice versa), all APIs, and all language features.

Importantly, the use of some features will trigger deoptimizations in native code and make it fall back to interpreted execution (with no noticeable behavior change). These include:

getfenv/setfenv (these are soft deprecated in general)
Use of various builtins like math.abs with non-numeric arguments
Passing improperly typed parameters to typed functions (for example, calling foo(true) when foo is declared as function foo(arg: string))

Generally speaking, you should not run into these if your code is type checked; if you are not using type checking or type annotations, native code generation will still work, although in certain cases type annotations may be required to extract maximum performance in native modules.
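For the builtin case in the list above, one simple way to stay on the fast path is to convert values to numbers once, outside the hot loop. That way, builtins like math.abs only ever see numeric arguments. This is a sketch. `sumAbs` and the string input are hypothetical stand-ins for data that arrives as strings, for example after being parsed from text.

```lua
--!native
-- Hypothetical hot function: expects an array of numbers only.
local function sumAbs(values)
	local total = 0
	for i = 1, #values do
		-- math.abs always receives a number here, never a string.
		total = total + math.abs(values[i])
	end
	return total
end

-- Convert string input once, up front, instead of passing strings
-- into the hot loop and relying on implicit coercion.
local raw = { "-3", "4", "-5" }
local parsed = {}
for i = 1, #raw do
	parsed[i] = tonumber(raw[i])
end

print(sumAbs(parsed))
```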

Upcoming performance work

Currently, native code generation supports all language features and constructs, so, efficiency concerns aside, it should be safe to enable it in any script. However, some areas are known to need performance improvements.

In no particular order:

We’re starting to use type annotations to guide native code generation. This is currently limited to function arguments and doesn’t propagate very well into the function body. In the future, we will start using type annotations on local variables as well as types inferred from explicit annotations.
While Vector3 math works as expected from native code, the native code generator does not yet handle it natively, which means that it does not run as fast as it should. Currently, the strongest performance gains come from scalar math.
We are currently missing a set of complex optimizations for code with a lot of conditionals or redundant complex expressions such as table accesses; expect more improvements in idiomatic code in the future.
Function calls are not as fast as we’d like (they are still a little faster than in the interpreter, but we currently extract bigger performance wins from code that calls functions less). For now, we recommend relying on the function inlining the compiler already performs automatically, along with performance-minded programming; this is an area that we plan to improve.
In addition to higher-level optimizations, we also have cases where complex microarchitectural tuning of generated code is required to reach peak performance as we sometimes generate code that modern CPUs don’t execute very well, or generate too much code that doesn’t optimally utilize certain CPU units. This will be an area of ongoing improvement as well.
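The list above says scalar math currently gains the most, Vector3 math is not yet specialized, and calls are relatively expensive. Given that, a hot loop can be written over plain numbers with the arithmetic inlined. This is a hedged sketch. Storing positions as parallel x/y/z arrays is an assumption for illustration, not something the post prescribes.

```lua
--!native
-- Hypothetical sketch: positions stored as three parallel arrays of plain
-- numbers (an assumption for illustration) instead of Vector3 values, with
-- the distance math inlined rather than split into a helper function.
local function nearestIndex(xs, ys, zs, px, py, pz)
	local bestIndex, bestDistSq = nil, math.huge
	for i = 1, #xs do
		local dx = xs[i] - px
		local dy = ys[i] - py
		local dz = zs[i] - pz
		-- Squared distance is enough for comparisons and avoids math.sqrt.
		local distSq = dx * dx + dy * dy + dz * dz
		if distSq < bestDistSq then
			bestIndex, bestDistSq = i, distSq
		end
	end
	return bestIndex, bestDistSq
end

-- Points (0, 0, 0), (10, 0, 0) and (3, 4, 0); query point (3, 3, 0).
local xs = { 0, 10, 3 }
local ys = { 0, 0, 4 }
local zs = { 0, 0, 0 }
print(nearestIndex(xs, ys, zs, 3, 3, 0))
```

This trades readability for speed, so it is only worth doing where profiling shows a measurable gain. Per the list above, Vector3 math and function calls are areas slated for improvement, so idiomatic code may close the gap over time.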

Additionally, as mentioned above, we currently do not compile any functions to native code by default unless you use --!native annotation. In the future, we expect to develop profitability heuristics, both to compile some functions in regular modules to native code with no annotation, as well as to automatically disable native compilation for some functions in native modules when we are very certain that that’s a bad idea. For now, we recommend splitting very long scripts into more manageable modules and making the decision about whether to enable native code generation on individual smaller modules.

When will this ship?

You will notice that this is a “preview”, not a regular beta. This specifically indicates that while we’ve done a lot of work on this and we’re excited about how the feature is shaping up, we expect to do a lot more work before it is fully production-ready. As such, we do not have an ETA for when this will be available in production.

We do need your feedback, not just in terms of making sure that everything works as it is supposed to, but also in sharing cases when the performance gains do not align with your expectations based on all the caveats above. This will help us prioritize the upcoming work to make the feature production-ready, as well as gather a solid collection of guidelines in terms of how to best take advantage of native code that we can document for other creators.

We do plan to start using it in some plugins that Studio ships with by default in the coming months, so even if you don’t use the feature directly, keeping it enabled might improve how responsive Studio is in compute-heavy tools like terrain editor!

With that in mind, huge thanks to the team that worked on this (@WheretIB first and foremost, as well as @machinamentum, @rep_movsb, and @zeuxcg), and we’re all very excited to work together with the community to bring this preview closer to a complete feature!

system · August 31, 2023, 4:42pm

This topic was automatically opened after 10 minutes.

metatablecatmaid · August 31, 2023, 4:42pm

(Can you give a TL;DR? My brain is blanking on a lot of stuff here, though what I’ve been able to derive is that it compiles to native assembly?)

This looks interesting. Is this related to the codegen project that I’ve been seeing in the Luau repository over the past few months? I wonder how it would work with the App Store’s policies, though, because Apple tends to be not very nice to this kind of stuff.

The other question, even though it’s not exactly Roblox-related: do you still need to include the Luau VM when using natively compiled code in an application?

Elttob · August 31, 2023, 4:42pm

This is nuts! I honestly didn’t even think we’d get here this quickly, especially given all the limitations around machine code on platforms like iOS that I thought would sink the feature.

I’m looking forward to testing this out with Fusion’s internals - it seems like a great candidate for pure computation workloads, especially when traversing deep nested graphs for dispatching updates.

Massive congrats to everyone who worked on this

Fimutsuu · August 31, 2023, 4:43pm

This is great, having code run faster is always welcome and would allow for more complex operations.
Awesome!

alex_unboxed · August 31, 2023, 4:45pm

THIS looks truly astronomical! INSANELY wicked roblox

grilme99 · August 31, 2023, 4:46pm

Hi, I was testing this beta earlier today with a Luau Flexbox algorithm implementation and saw little to no performance benefit. I’ve included the file in this reply, which includes the full source of the implementation and its benchmark suite. The original OSS repo can be found here.

The module resides in ReplicatedStorage.Packages.YogaSys, and the benchmark code is under ServerScriptService.

Flow.rbxl (283.1 KB)

I’m currently working under the assumption that the Flexbox algorithm is too big to compile to native code (it is all implemented in one module for function inlining). Could you confirm this is what’s happening? It seems like the perfect candidate for native codegen, so I’m stumped.

PysephDEV · August 31, 2023, 4:54pm

This is a great update towards code speedups, though I’d like to ask: would this be supported on clients within the next few years, or is it on a longer timeframe? I have some great use cases for this (notably, running Gerstner wave simulations at runtime), but they’re all client-side, which makes this a lot harder to utilize.

HKcat5100 · August 31, 2023, 4:57pm

This is just great period thank you Roblox

ObviouslyGreen · August 31, 2023, 4:58pm

Really cool stuff! Testing this for a mandelbrot implementation I wrote got me more than a 3.2x performance boost!!

(Screenshots: before enabling codegen / after enabling codegen)

Also really cool that type annotations finally improve performance; it makes them more worth doing.
Little by little getting closer to LuaJIT/C speed

ReturnedTrue · August 31, 2023, 5:01pm

As of late I’ve been working with generating fractals on the engine, with the addition of this and potentially DynamicImages in the future I’m really excited to see the performance gains over the system I have currently.

Great work!

zeuxcg · August 31, 2023, 5:02pm

Thanks for sharing! That’s exactly the feedback we want from the community on this feature.

At first glance, this might fall into the category of “code with lots of calls and other complex work, which is less compute-dense than the native code generation engine currently likes”, but we’ll certainly do a deeper dive and see how future improvements to code generation can help here. This is a fair bit of code, so it will take us some time to analyze.

bluebxrrybot · August 31, 2023, 5:18pm

Seems great but I’m not really understanding what this is changing. It makes select things faster, great! But what do you mean by ‘native’, and what do you mean by ‘generation’?

C_Corpze · August 31, 2023, 5:26pm

Omg, I literally have no words for this.

Does this mean we can write Luau code that runs about as fast as native C/C++?
Because this would be a huuuuge game changer then.

You could re-implement certain Roblox features and write them however you want.

Don’t like Roblox’s current physics engine with all of its quirks?
Write your own!
If native Luau is going to be about as fast as C++ then performance no longer should be an issue.

Want your own pathfinding algorithm that’s more customized and fitted for your own needs?
Just write your own.

Want to do machine learning with neural networks on the CPU?
Normally this would be hella slow in an interpreted language.

BUUUUUT, if native Luau is as fast as C++ would be then this wouldn’t be an issue combined with parallel Luau to distribute the load.

This is amazing.

astraIboy · August 31, 2023, 5:34pm

In short:

Native code generation takes better advantage of the hardware resources available on your system for improved script execution performance: it compiles certain functions in your scripts into native code instead of interpreting them, making compute-heavy functions run around 1.5 to 2.5 times faster.

It won’t affect the behavior of your scripts, only the execution time.

C_Corpze · August 31, 2023, 5:43pm

To put it simply (for those who don’t quite get it):

Luau is an interpreted language.
This generally means that:

Runs on a virtual machine.
Kinda runs in some kind of “sandbox” or on top of an extra “layer”, requiring extra steps.
Can be really really slow at math but starts up fast after changing code.

Native code is closer to machine instructions.
This means that:

Much closer to your hardware.
Communicates more directly with your computer without extra steps, layers or being in some “sandbox”.
Extremely fast but needs to completely recompile / rebuild every time you change or edit code.
Really heckin’ good at MATH and ALGORITHMS and used almost everywhere where speed and performance really matters (like graphics, physics, algorithms, memory and other complex stuff).

This is a HUUUUGE simplification and I honestly can’t be bothered to go into full detail because of how big of a rabbit hole it is, but this should give a general understanding.

bluebxrrybot · August 31, 2023, 5:56pm

So it’s essentially compiling certain modules like C++ and C# do for all scripts before running. Cool!

XenoDenissboss1 · August 31, 2023, 6:24pm

Hey man, this is an overall great addition; however, I do have a question that isn’t exactly about this feature specifically. Is there any chance we could see some optimizations/features geared towards the rendering side of things? Right now, I believe that rendering improvements, new features (letting developers expand their range of possibilities and do more aggressive optimizations), and new optimizations would be a game changer.

NeoInversion · August 31, 2023, 6:27pm

I might be wrong about this, but assuming you’re referring to library access, your own modules get statically linked if they’re also compiled with native codegen.

However, it does not change the implementation of code that is already provided to your script by Luau libraries …, Roblox Engine …, or other module scripts that you require if they don’t have a --!native annotation.

So hypothetically you could run the resulting code barebones if you don’t need any VM/engine libraries.

arxweb · August 31, 2023, 6:36pm

I like this a lot! I’ll implement this in my server sided anticheat and hopefully it will make it much more efficient (: