It is all CPU. The GPU could do it faster, but with much less flexibility: to do it on the GPU, it would have to be structured something like you submitting a batch of casts and getting the results back next frame rather than immediately.
I can see a lot of developers using something like a BatchRaycast API. There are many scenarios and games of mine where I have to cast many, many rays per frame. GPU processing would be super beneficial in that sense. Having both the regular CPU methods and batch GPU methods would be great to see, if possible.
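For illustration, here is a minimal sketch of the submit-now, read-next-frame pattern described above, using today's synchronous workspace:Raycast as a stand-in for a hypothetical GPU batch; RayBatch and its submit function are made-up names, not a real API:

local RunService = game:GetService("RunService")

-- Hypothetical deferred-batch wrapper: casts submitted during one frame are
-- resolved on the next Heartbeat, mimicking the "results next frame" model.
local RayBatch = {
    pending = {}, -- {origin: Vector3, direction: Vector3} entries
    results = {}, -- RaycastResult? per submitted cast
}

function RayBatch.submit(origin: Vector3, direction: Vector3): number
    table.insert(RayBatch.pending, {origin = origin, direction = direction})
    return #RayBatch.pending -- index used to look up the result next frame
end

RunService.Heartbeat:Connect(function()
    local batch = RayBatch.pending
    RayBatch.pending = {}
    RayBatch.results = table.create(#batch)
    for i, cast in batch do
        -- a real GPU path would resolve the whole batch at once; here each
        -- cast is just a normal CPU raycast performed one frame later
        RayBatch.results[i] = workspace:Raycast(cast.origin, cast.direction)
    end
end)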
On the topic of SIMD, could you guys add some way for us to fully utilize SIMD as well? I'm not fully versed in SIMD, but I believe modern CPUs should be able to do much more than 3 parallel operations.
I've got a pretty optimized custom raycast, using cache-aligned buffers and pretty much every trick in the book to speed up BVH traversal, but the only additional performance I couldn't previously access was via SIMD. 3-way is great, but can we get N-way SIMD depending on the device's capability? 8-way would be awesome.
While we do explore ideas around SIMD, I don't think there are any APIs coming for that any time soon.
Are Dot and Cross optimizations coming soon to native Vector3?
About 50% of my Vector3 ops are dot products and 30% are cross products, with the remaining 20% split evenly between scalar multiplication and addition, plus the rare .Unit or .Magnitude.
Dot/Cross/Floor/Ceil/Magnitude/Unit for Vector3 are native in Roblox Studio when properly type annotated.
Support on servers will come in a few weeks.
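As an illustration of "properly type annotated", something like the sketch below is the shape that benefits; the helper functions are made up, and the --!native directive is my assumption about opting into native codegen:

--!native
--!strict

-- Made-up helpers purely for illustration: with explicit Vector3 annotations
-- on the parameters, the Dot/Cross/Magnitude/Unit calls can take the native
-- fast paths mentioned above instead of generic method dispatch.
local function closestPointOnRay(origin: Vector3, dir: Vector3, point: Vector3): Vector3
    local toPoint: Vector3 = point - origin
    local t: number = toPoint:Dot(dir) / dir:Dot(dir)
    return origin + dir * math.max(t, 0)
end

local function triangleNormal(a: Vector3, b: Vector3, c: Vector3): Vector3
    return (b - a):Cross(c - a).Unit
end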
My ray-BVH8 test is written in a way that could really leverage 8-way SIMD. If I were to construct Vector3s and then do the addition and multiplication on those Vector3s, would that be faster than having all the math in components?
In this case it's important to consider the extra time it takes to pack data into a Vector3 and extract the data back out.
While it is a very fast operation, in short code examples we've seen that it can be faster to stay with plain numbers: for example, just doing Vector3.new(x, y, z).Magnitude is slower than computing the magnitude manually.
However, as the number of Vector3 operations grows, it can become a benefit.
The best case is to have the data stored in Vector3 form and operate on it directly; we've seen examples of such code outperforming the same math done on three individual numbers.
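A small sketch of the trade-off being described (function names are made up): one-off packing tends to lose, while chains of operations on data that already lives in Vector3s tend to win:

-- Packing three plain numbers into a Vector3 for a single operation pays the
-- construction cost and can end up slower:
local function magnitudePacked(x: number, y: number, z: number): number
    return Vector3.new(x, y, z).Magnitude
end

-- Doing the same one-off computation on raw components avoids that cost:
local function magnitudeManual(x: number, y: number, z: number): number
    return math.sqrt(x * x + y * y + z * z)
end

-- When the data already lives in Vector3 form, several chained operations can
-- stay vectorized with no repacking, which is where the wins show up:
local function reflect(v: Vector3, n: Vector3): Vector3
    return v - 2 * v:Dot(n) * n
end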
Yeah, I figured that creating the Vector3 would kill any gains from Vector3 SIMD, although I haven't benchmarked it.
Unfortunately I can't have the data pre-stored in Vector3 format because the data is encoded into a buffer. More specifically, my 8-wide BVH structure is one giant flattened buffer, with each node in the tree occupying 64 bytes: 8 children at 8 bytes each, where each child has 6 bytes for boundaries (2 bytes per axis: 1 for min bounds, 1 for max bounds) and 2 bytes for the jump offset to the child node.
In short, I need to encode the BVH into a buffer this way to get optimal cache performance, because cache misses and memory reads were the main cost of BVH queries. I've really written this with future SIMD support in mind, so fingers crossed there…!
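For concreteness, here is a rough sketch of decoding one child slot from the 64-byte node layout described above, using Luau's buffer library; the exact field order is my assumption and may not match the real encoding:

local NODE_SIZE = 64  -- one BVH8 node: 8 children at 8 bytes each
local CHILD_SIZE = 8  -- 6 bytes of quantized bounds + 2-byte jump offset

-- Assumed field order per child: minX, maxX, minY, maxY, minZ, maxZ, then a
-- u16 offset to the child node within the same flat buffer.
local function readChild(bvh: buffer, nodeOffset: number, childIndex: number)
    local base = nodeOffset + childIndex * CHILD_SIZE
    local minX = buffer.readu8(bvh, base + 0)
    local maxX = buffer.readu8(bvh, base + 1)
    local minY = buffer.readu8(bvh, base + 2)
    local maxY = buffer.readu8(bvh, base + 3)
    local minZ = buffer.readu8(bvh, base + 4)
    local maxZ = buffer.readu8(bvh, base + 5)
    local childOffset = buffer.readu16(bvh, base + 6)
    return minX, maxX, minY, maxY, minZ, maxZ, childOffset
end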
this thing happened
(there is no anonymous function at line 1 in any of the scripts I've found)
I'm completely stumped; I have no idea why this is happening.
this only happened recently, out of nowhere; it was working fine before,
and I made no changes to the script before or after this weird warning appeared
: /
after a bit of tinkering with the script, adding --!optimize 0 fixed it.
I guess that means the aggressive optimizations are doing something to mess it up.
ok apparently it really doesn't like explicit string types like these:
local str: {["a" | "b" | "c"]: any} = {
a = "foo",
b = "bar",
c = "baz"
}
-- insert other code here
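For reference, the workaround mentioned above is a script-level directive placed on the very first line of the affected script:

--!optimize 0
-- dropping the optimization level for just this script made the warning
-- described above go away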
Piercing raycasts would be amazing; I'm currently using 5 non-piercing raycasts to emulate the behaviour of a single piercing raycast!
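For anyone curious, a rough sketch of that workaround (the piercingRaycast name and the maxHits cap are my own): re-cast and add each hit instance to an exclusion filter until nothing is hit or the cap is reached:

local function piercingRaycast(origin: Vector3, direction: Vector3, maxHits: number): {RaycastResult}
    local params = RaycastParams.new()
    params.FilterType = Enum.RaycastFilterType.Exclude
    local exclude: {Instance} = {}
    local hits: {RaycastResult} = {}

    for _ = 1, maxHits do
        local result = workspace:Raycast(origin, direction, params)
        if not result then
            break
        end
        table.insert(hits, result)
        table.insert(exclude, result.Instance)
        -- reassign so the updated exclusion list takes effect for the next cast
        params.FilterDescendantsInstances = exclude
    end

    return hits
end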