Luau Native Code Generation Preview Update

Would we ever be able to do more than 3-way SIMD?

2 Likes

So if I have single script architecture and hundreds of modules, how am I supposed to know what modules to enable --!native on? Script Profiler doesn’t report for modules.

4 Likes

Oh that’s neat, thanks!

I do wonder though, when does this codegen take place?
Is this at the start of a game? Whenever a script is activated/enabled?

When does it happen for modules? Do I have to require() them first before the codegen happens?
And if so, could it cause a potential hitch or freeze of a few milliseconds if a HUGE module were to be suddenly codegen’d on the spot?

I have so many questions about how it happens.
Is it JIT’ed, or does it pre-compile on Roblox servers and just send the already-compiled code over to the Roblox client?

I also heard native somehow uses slightly more memory than bytecode? I wonder what the cause of that is.
I used to think that natively compiled programs were smaller because they don’t have to include a whole VM or runtime library.

3 Likes

Yes, that’s one of the reasons why you have to explicitly request codegen with the annotation right now: doing the codegen adds additional upfront cost at load time, so you should only enable it on modules where you’ve measured enough of a performance benefit.
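For context, the annotation is just a comment directive at the top of the script. Here’s a minimal sketch; the `HeavyMath` module and `sumOfSquares` function are made-up names for illustration:

```lua
--!native
-- The directive above requests native compilation for this whole module.
-- "HeavyMath" and "sumOfSquares" are hypothetical names for illustration.
local HeavyMath = {}

function HeavyMath.sumOfSquares(n: number): number
	local total = 0
	for i = 1, n do
		total += i * i
	end
	return total
end

return HeavyMath
```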

5 Likes

So both local scripts and server scripts will be compiled natively in actual production games now? I have a script that could greatly benefit from this; just making sure it’s now extended for game use.

2 Likes

Will this - combined with parallel Luau - improve the performance of the terrain tools in Studio, given that they’re written in Lua?

3 Likes

I believe it’s only for server scripts, as the Roblox client does not have native code gen yet.

1 Like

I see, thanks for the informative response!
I forgot to ask one more thing.

What’s the current state of codegen when metatables are heavily utilized?

If I (for some reason) wanted to achieve MAXIMUM speed, performance, and optimal memory management, should I still use metatables for an object-oriented style of programming, or should I completely avoid metatables and go with 100% functional programming instead?

Does native codegen have any specific practices involving tables, dictionaries, arrays, etc to get optimal performance?

Besides just making basic math operations faster, I’m really curious where codegen REALLY shines bright.
I’ve been teaching myself to use more typed variables and type checking in my code to sort of “future proof” it in case codegen (or the interpreter itself) gets more optimized for typed variables.

I’ve begun coding more or less similarly to how I’ve used languages like C++ and C#, where everything is usually statically typed and you can make classes and structs to contain data and functions.

2 Likes

The feature is not available on the clients yet, so we will not compile LocalScripts natively.
We will post updates if anything changes in this area in the future.

The feature can be used right now on the servers and in Studio plugins.

We have seen improvements in terrain tools in testing and are planning to use it there in the future.

We support metatables in obj:func calls.
I wouldn’t expect to see much improvement with __index/__newindex, but the implementation of those is already pretty good.

I would say especially good improvements are seen with math, bit32 and buffer libraries, and plain tables with no metatables.
We are experimenting with some exciting stuff around Vector2/Vector3/CFrame/Color3, but we need more time to finish that work.
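As a rough illustration of the kind of code that tends to benefit (a hypothetical sketch, not an official benchmark): a tight, typed numeric loop over the buffer library, such as

```lua
--!native
-- Hypothetical sketch: typed, allocation-free loops over buffers are the
-- kind of code where native codegen tends to help most.
local function sumFloats(b: buffer, count: number): number
	local total = 0
	for i = 0, count - 1 do
		total += buffer.readf32(b, i * 4) -- each float32 is 4 bytes
	end
	return total
end
```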

2 Likes

Do you have a module that performs a lot of computation? Start there, try benchmarking it before and after putting --!native and see if it improves performance.
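A minimal before/after timing sketch — the module path and `compute` function here are placeholders for your own hot code:

```lua
-- Time the hot path, then add --!native at the top of the module's source
-- and time it again. "HeavyModule" and "compute" are placeholder names.
local HeavyModule = require(script.Parent.HeavyModule)

local start = os.clock()
for _ = 1, 1000 do
	HeavyModule.compute()
end
print(("1000 iterations: %.3f ms"):format((os.clock() - start) * 1000))
```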


1 Like

This is great! I hope to see this used a lot, and it’s surprising that Luau can work at such a low level.

This is great! Is there any way we can use this for our own data, like buffers or arrays?

Great to see that parallel luau now has an implementation. Now I gotta learn parallel luau so my code can go light speed.

All of these are great. Will there be plans for ways that we can go to an even lower level, like

--@inline function
function add(a: number, b: number): number
    return a + b
end
1 Like

By chance, are there other plans to add more attributes to functions and variables?

1 Like

Will this affect CFrame lerping or using :PivotTo()?

2 Likes

When this comes to the clients, performance will significantly increase, and it will unlock the possibility for custom things like those hacky volumetric lighting solutions to perform a lot better, unless the biggest overhead there is the billboards and such…

Another thing is that some games move their leaves and trees with a very hacky, super-inefficient method that’s extremely heavy on performance, since Roblox doesn’t currently have a way to “move” them realistically and performantly. With native code generation, such a thing could perform a lot better than it does now. Of course, nothing will come close to a native implementation.

1 Like

With that in mind, is it possible for you guys to make :Raycast any faster at all? Are there still possible optimisations you can do?

We made it significantly faster in the last couple years (I believe almost 10x). It could always be faster still but a lot of the low hanging fruit has been picked at this point.

There are additional APIs we could consider in the future such as a piercing raycast returning all the hit parts along a path though, which would offer more performance for some tasks.

3 Likes

Just out of curiosity, is raycasting done on the CPU or the GPU? I’ve been doing a lot of ray tracing via EditableImage, and the bottleneck has always been the raycast call, even for 1 ray per pixel.


That would be really beneficial for many scenarios. Would love to see this become a reality!

It is all CPU. The GPU could do it faster, but with much less flexibility: To do it on the GPU it would have to be structured something like you submitting a batch of casts to do, and getting back the results next frame rather than immediately.

4 Likes

I can see a lot of developers using something like a BatchRaycast API. There are many scenarios and games of mine where I have to cast many, many rays per frame. GPU processing would be super beneficial in that sense. Having both regular CPU and batched GPU methods would be great, if possible.

3 Likes

On the topic of SIMD, could you guys add some way for us to fully utilize SIMD as well? I’m not fully versed in SIMD, but I believe modern CPUs should be able to do much more than 3 parallel operations.

I’ve got a pretty optimized custom raycast, using cache-aligned buffers and pretty much every trick in the book to speed up BVH traversal, but the only additional performance I couldn’t previously access was via SIMD. 3-way is great, but can we get N-way SIMD depending on the device’s capability? 8-way would be awesome.

3 Likes