Luau Native Code Generation Preview Update

It is all CPU. The GPU could do it faster, but with much less flexibility: to do it on the GPU, it would have to be structured as something like submitting a batch of casts and getting the results back next frame rather than immediately.


I can see a lot of developers using something like a BatchRaycast API. There are many scenarios and games of mine where I have to cast many, many rays per frame, and GPU processing would be super beneficial there. Having both the regular CPU method and a batch GPU method would be great to see, if possible.


On the topic of SIMD, could you guys add some way for us to fully utilize SIMD as well? I'm not fully versed on SIMD, but I believe modern CPUs should be able to do much more than 3 operations in parallel.

I've got a pretty optimized custom raycast, using cache-aligned buffers and pretty much every trick in the book to speed up BVH traversal, but the only additional performance I couldn't previously access was via SIMD. 3-way is great, but can we get N-way SIMD depending on the device's capability? 8-way would be awesome.


While we do explore ideas around SIMD, I don't think there are any APIs coming for that any time soon.


Are Dot and Cross optimizations coming soon to native Vector3?
About 50% of my Vector3 ops are dot products and 30% are cross products, with the remaining 20% split evenly between scalar multiplication and addition, plus the occasional .unit or .magnitude.

Dot/Cross/Floor/Ceil/Magnitude/Unit for Vector3 are native in Roblox Studio when properly type annotated.

Support on servers will come in a few weeks.
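To illustrate what "properly type annotated" can mean here, a minimal sketch: the parameters carry explicit `Vector3` annotations so the native code generator can tell these are Vector3 values, and the script opts in with the `--!native` directive. The helper names are hypothetical, and this is only meaningful inside Roblox, where `Vector3` exists.

```luau
--!native
-- Hypothetical helpers. The explicit `Vector3` annotations are what let
-- native codegen treat these as Vector3 operations.

-- Projects `v` onto `axis` (assumes `axis` is non-zero).
local function projectOnto(v: Vector3, axis: Vector3): Vector3
	local unitAxis = axis.Unit
	return unitAxis * v:Dot(unitAxis)
end

-- Unit normal of triangle (a, b, c) (assumes the triangle is not degenerate).
local function triangleNormal(a: Vector3, b: Vector3, c: Vector3): Vector3
	return (b - a):Cross(c - a).Unit
end

-- Distance between two points via .Magnitude.
local function distance(a: Vector3, b: Vector3): number
	return (b - a).Magnitude
end
```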


My ray-BVH8 test is written in a way that could really leverage 8-way SIMD. If I were to construct Vector3's and then do the addition and multiplication on those Vector3's, would that be faster than having the math all in components?

In this case it's important to consider the extra time it takes to pack data into a Vector3 and extract data back out.
While it is a very fast operation, in short code examples we've seen that it can be faster to stay with numbers; for example, just doing Vector3.new(x, y, z).Magnitude is slower than computing the magnitude manually.

However, if the number of Vector3 operations is higher, it can become a benefit.
The best case is to have data stored in Vector3 form and operate on it; we've seen examples of such code outperform the same math written with 3 individual numbers.
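A rough sketch of the three styles discussed above, assuming the script is compiled natively: plain numbers, packing into a `Vector3` for a single operation, and data that already lives in `Vector3` form. Function names are hypothetical and no timings are implied; which style wins depends on how many operations each packed value feeds, so it is worth benchmarking your own case.

```luau
--!native
-- Hypothetical helpers illustrating the trade-off; no timings implied.

-- 1) Staying with plain numbers: no Vector3 is constructed.
local function magnitudeFromNumbers(x: number, y: number, z: number): number
	return math.sqrt(x * x + y * y + z * z)
end

-- 2) Packing into a Vector3 for a single operation: the construction cost
--    can outweigh the one vectorized .Magnitude.
local function magnitudeViaPacking(x: number, y: number, z: number): number
	return Vector3.new(x, y, z).Magnitude
end

-- 3) Data already stored as Vector3: no per-operation packing/unpacking,
--    so the vector math can pay off across many operations.
local function sumOfProjections(points: {Vector3}, direction: Vector3): number
	local total = 0
	for _, p in points do
		total += p:Dot(direction)
	end
	return total
end
```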


Yeah, I figured that creating the Vector3 would kill any gains from Vector3 SIMD, although I haven't benchmarked it.

Unfortunately, I can't have the data pre-stored in Vector3 format because the data is encoded into a buffer. More specifically, my 8-wide BVH structure is one giant flattened buffer, with each node in the tree occupying 64 bytes: 8 children at 8 bytes each, where each child entry has 6 bytes for bounds (2 bytes per axis: 1 for the min bound, 1 for the max bound) and 2 bytes for the jump offset to the child node.

In short, I need to encode the BVH in a buffer this way to get optimal cache performance, because cache misses and memory reads were the main cost of BVH queries. I've really written this with future SIMD support in mind, so fingers crossed there…!
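For readers picturing the layout, a minimal sketch of reading and writing one child entry with Luau's `buffer` library. The byte order inside each 8-byte entry (axes in X, Y, Z order, min before max, little-endian `u16` jump offset) and all names are assumptions: the post only gives the field sizes, and it doesn't say whether the jump offset counts bytes or nodes, so the value is returned raw.

```luau
-- Hypothetical reader/writer for the 64-byte BVH8 node layout described above.
-- ASSUMED per-child layout (the post gives sizes, not byte order):
--   bytes 0-1: min X, max X   (u8 each)
--   bytes 2-3: min Y, max Y   (u8 each)
--   bytes 4-5: min Z, max Z   (u8 each)
--   bytes 6-7: jump offset to the child node (u16, little-endian)
local NODE_SIZE = 64 -- 8 children * 8 bytes
local CHILD_SIZE = 8

-- Byte offset of node `nodeIndex` (0-based) in the flattened buffer.
local function nodeOffset(nodeIndex: number): number
	return nodeIndex * NODE_SIZE
end

-- Returns the raw fields of child `childIndex` (0-7) as multiple values,
-- so a hot traversal loop allocates no tables.
local function readChild(bvh: buffer, node: number, childIndex: number): (number, number, number, number, number, number, number)
	local base = node + childIndex * CHILD_SIZE
	return buffer.readu8(bvh, base), buffer.readu8(bvh, base + 1),
		buffer.readu8(bvh, base + 2), buffer.readu8(bvh, base + 3),
		buffer.readu8(bvh, base + 4), buffer.readu8(bvh, base + 5),
		buffer.readu16(bvh, base + 6)
end

-- Writes one child entry using the same assumed layout.
local function writeChild(bvh: buffer, node: number, childIndex: number,
	minX: number, maxX: number, minY: number, maxY: number,
	minZ: number, maxZ: number, jump: number)
	local base = node + childIndex * CHILD_SIZE
	buffer.writeu8(bvh, base, minX)
	buffer.writeu8(bvh, base + 1, maxX)
	buffer.writeu8(bvh, base + 2, minY)
	buffer.writeu8(bvh, base + 3, maxY)
	buffer.writeu8(bvh, base + 4, minZ)
	buffer.writeu8(bvh, base + 5, maxZ)
	buffer.writeu16(bvh, base + 6, jump)
end
```

Returning multiple values instead of a table keeps the read path allocation-free, which matters when the whole point of the layout is cache-friendly traversal.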


This warning started appearing.
(There is no anonymous function at line 1 in any of the scripts I've found.)
I'm completely stumped; I have no idea why this is happening.
It only started recently, out of nowhere; it was working fine before,
and I made no changes to the script before or after this weird warning showed up.

: /

After a bit of tinkering with the script, adding --!optimize 0 fixed it.
I guess that means the aggressive optimizations are doing something to mess it up.

OK, apparently it really doesn't like explicit string-literal types like these:

local str: {["a" | "b" | "c"]: any} = {
	a = "foo",
	b = "bar",
	c = "baz"
}

-- insert other code here
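If the problem is tied to the union-of-string-literals indexer, one alternative worth trying is to annotate the same table with named fields instead. This is an untested assumption, not a confirmed workaround for the warning above.

```luau
-- Same table, annotated with named fields instead of a
-- {["a" | "b" | "c"]: any} indexer. Whether this sidesteps the
-- warning reported above is an untested assumption.
local str: {a: any, b: any, c: any} = {
	a = "foo",
	b = "bar",
	c = "baz",
}
```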

Piercing raycasts would be amazing; I'm currently using 5 non-piercing raycasts to emulate the behaviour of a single piercing raycast!
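For reference, a common way to emulate a piercing raycast is to repeat the cast along the same ray, excluding every instance already hit, until nothing is hit or a hit limit is reached. A minimal sketch, assuming a hypothetical `piercingRaycast` helper with a `maxHits` cap; it uses `workspace:Raycast` with `RaycastParams` in exclude mode.

```luau
-- Hypothetical helper: emulates a piercing raycast by re-casting along the
-- same ray and excluding each instance already hit, up to `maxHits` times.
local function piercingRaycast(origin: Vector3, direction: Vector3, maxHits: number, baseFilter: {Instance}?): {RaycastResult}
	-- Copy so the caller's filter list is not mutated.
	local excluded: {Instance} = table.clone(baseFilter or {})
	local params = RaycastParams.new()
	params.FilterType = Enum.RaycastFilterType.Exclude

	local hits: {RaycastResult} = {}
	for _ = 1, maxHits do
		-- Reassign each iteration so the params see the updated list.
		params.FilterDescendantsInstances = excluded
		local result = workspace:Raycast(origin, direction, params)
		if not result then
			break -- nothing else along the ray
		end
		table.insert(hits, result)
		table.insert(excluded, result.Instance)
	end
	return hits
end
```

Because each cast returns the nearest hit that isn't excluded, the results come back ordered from nearest to farthest. One caveat: if the ray hits Terrain, `result.Instance` is the Terrain object, so excluding it skips all terrain on later casts.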

