Will this - combined with parallel Luau - improve the performance of the terrain tools in Studio, given that they’re written in Lua?
I believe it’s only for server scripts, as the Roblox client does not have native code gen yet.
I see, thanks for the informative response!
Now I forgot to ask one more thing.
What’s the current state of codegen when metatables are heavily utilized?
If I (for some reason) wanted to achieve MAXIMUM speed and performance and optimal memory management, should I still use metatables for an object-oriented style of programming, or should I completely avoid metatables and instead go with 100% functional programming?
Does native codegen have any specific practices involving tables, dictionaries, arrays, etc to get optimal performance?
Besides just making basic math operations faster, I’m really curious where codegen REALLY shines bright.
I’ve been teaching myself to use more typed variables and type checking in my code to sort of “future proof” it in case codegen (or the interpreter itself) gets more optimized for typed variables.
I’ve begun coding more or less the way I have in languages like C++ and C#, where everything is usually statically typed and where you can make classes and structs to contain data and functions.
The feature is not available on the clients yet, so we will not compile LocalScript natively.
We will post updates if anything changes in this area in the future.
The feature can be used right now on the servers and in Studio plugins.
We have seen improvements in terrain tools in testing and are planning to use it there in the future.
We support metatables in obj:func calls.
I wouldn’t expect to see much improvement with __index/__newindex, but implementation of that is already pretty good.
I would say especially good improvements are seen with the math, bit32 and buffer libraries, and plain tables with no metatables.
We are experimenting with some exciting stuff around Vector2/Vector3/CFrame/Color3, but we need more time to finish that work.
Do you have a module that performs a lot of computation? Start there, try benchmarking it before and after putting --!native and see if it improves performance.
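The before/after comparison suggested above can be sketched roughly as follows. This is a minimal sketch, not an official benchmarking method: `sumOfSquares` and `benchmark` are hypothetical names made up for illustration, and `os.clock` is used as a simple timer. The idea is to run the script once as written and once with the `--!native` line removed, then compare the reported times.

```lua
--!native
-- Hypothetical compute-heavy function: sums squares over a plain array
-- (a plain table with no metatable).
local function sumOfSquares(values)
    local total = 0
    for i = 1, #values do
        local v = values[i]
        total = total + v * v
    end
    return total
end

-- Hypothetical timing helper: calls fn(...) `iterations` times and returns
-- the elapsed seconds together with the last result.
local function benchmark(fn, iterations, ...)
    local start = os.clock()
    local result
    for _ = 1, iterations do
        result = fn(...)
    end
    return os.clock() - start, result
end

-- Build the input once, outside the timed region.
local data = {}
for i = 1, 1000 do
    data[i] = i
end

local elapsed, result = benchmark(sumOfSquares, 1000, data)
print(string.format("sumOfSquares: %.4f s (result %d)", elapsed, result))
```

Timings from a single run can be noisy, so it is probably worth repeating the measurement a few times in each configuration before drawing conclusions.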
This is great. Hope to see this used a lot, and it is surprising that Luau can work at such a low level.
This is great. Is there any way we can use this for our own data, like buffers or arrays?
Great to see that parallel luau now has an implementation. Now I gotta learn parallel luau so my code can go light speed.
All of these are great. Will there be plans for ways that we can go to an even lower level, like:
--@inline function
function add(a: number, b: number)
    return a + b
end
By chance, are there other plans to add more attributes to functions and variables?
Will this affect CFrame lerping or using :PivotTo()?
When this comes to clients, performance will increase significantly, which could actually unlock possibilities for custom things such as those hacky volumetric lighting solutions, making them perform a lot better, unless the biggest overhead for these is the billboards and such…
Another thing is that some games move leaves and trees with another very hacky, super inefficient method that’s extremely heavy on performance, as Roblox doesn’t currently have a realistic and performant way to “move” them. With native code generation, such things would perform a lot better than they do now. Of course, nothing will come close to a native implementation.
With that in mind, is it possible for you guys to make :Raycast any faster at all? Are there still possible optimisations you can do?
We made it significantly faster in the last couple years (I believe almost 10x). It could always be faster still but a lot of the low hanging fruit has been picked at this point.
There are additional APIs we could consider in the future such as a piercing raycast returning all the hit parts along a path though, which would offer more performance for some tasks.
Just out of curiosity, is raycasting done on the CPU or the GPU? I’ve been doing a lot of ray tracing via EditableImage, and the bottleneck has always been the raycast call, even for 1 ray per pixel.
That would be really beneficial for many scenarios. Would love to see this become a reality!
It is all CPU. The GPU could do it faster, but with much less flexibility: To do it on the GPU it would have to be structured something like you submitting a batch of casts to do, and getting back the results next frame rather than immediately.
I can see a lot of developers using something like a BatchRaycast API. There are many scenarios and games of mine where I have to cast many, many rays per frame. GPU processing would be super beneficial in that sense. Having both regular CPU and batch GPU methods would be great to see, if possible.
On the topic of SIMD, could you guys add some way for us to fully utilize SIMD as well? I’m not fully versed on SIMD, but I believe modern CPUs should be able to do much more than 3 parallel instructions.
I’ve got a pretty optimized custom raycast, using cache-aligned buffers and pretty much every trick in the book to speed up BVH traversal, but the only additional performance I couldn’t previously access was via SIMD. 3-way is great, but can we get N-way SIMD depending on the device’s capability? 8-way would be awesome.
While we do explore ideas around SIMD, I don’t think there are any APIs that are coming for that any time soon.
Are Dot and Cross optimizations coming soon to native Vector3?
About 50% of my Vector3 ops are dot product, and 30% are cross product, with the remaining 20% being split evenly between scalar multiplication and addition, with the rare .unit or .magnitude.
Dot/Cross/Floor/Ceil/Magnitude/Unit for Vector3 are native in Roblox Studio when properly type annotated.
Support on servers will come in a few weeks.
My ray-BVH8 test is written in a way that could really leverage 8-way SIMD. If I were to construct Vector3’s and then do the addition and multiplication on said Vector3’s, would that be faster than having the math all in components?
In this case it’s important to consider the extra time it takes to pack data into a Vector3 and extract data back.
While it is a very fast operation, in short code examples we’ve seen that it could be faster to stay with numbers; for example, just doing Vector3.new(x, y, z).Magnitude is slower than just computing the magnitude manually.
However, if the amount of Vector3 operations is higher, it can become a benefit.
Best case would be to have data stored in Vector3 form and operate on it; we’ve seen examples of such code that outperformed the individual 3 numbers.
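To make the "stay with numbers" case above concrete, here is a minimal sketch of computing a magnitude and a dot product directly on components, without constructing a Vector3. The helper names `magnitude` and `dot` are hypothetical, chosen only for illustration; whether this beats the Vector3 form in a given script is something to measure rather than assume.

```lua
-- Manual magnitude on three plain numbers; no Vector3 is constructed,
-- so there is no packing or extraction cost.
local function magnitude(x, y, z)
    return math.sqrt(x * x + y * y + z * z)
end

-- Dot product on components, in the same component-wise style.
local function dot(ax, ay, az, bx, by, bz)
    return ax * bx + ay * by + az * bz
end

print(magnitude(3, 4, 12))
print(dot(1, 2, 3, 4, 5, 6))

-- In Roblox, the Vector3 form of the first computation would be
-- Vector3.new(3, 4, 12).Magnitude, which pays for packing the numbers first.
```

If the data already lives in Vector3 values and many operations are chained on it, the trade-off described above may tip the other way.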