I LOVE this update! It’s given me an average 2-3x performance improvement in my intersection tests. One thing that seemed odd to me is that I’m not seeing a big speedup in my Ray->OBB intersection tests (~1.3x, compared to the ~2.6x improvement AABB received). I assume this is due to heavy vector usage.
The code in question:
-- Length is the length of the direction vector
-- Normalized is the direction vector after using .Unit
-- Bounds is the half size of the OBB
local EPSILON = 1e-6 -- tolerance for the near-parallel case (was referenced but never defined)

local function IntersectOBB(Length: number, Origin: Vector3, Normalized: Vector3, Bounds: Vector3, Rotation: CFrame): number?
	local Minimum = 0
	local Maximum = 100000
	local Delta = (Rotation.Position - Origin)
	--> Test plane intersections, one slab per local axis
	--> (an array of {HalfSize, Axis} pairs avoids key collisions when two half-sizes are equal)
	for _, Slab in {{Bounds.X, Rotation.RightVector}, {Bounds.Y, Rotation.UpVector}, {Bounds.Z, Rotation.LookVector}} do
		local Size, Axis = Slab[1], Slab[2]
		--> Ray direction & axis length
		local NomLength = Axis:Dot(Delta)
		local DenomLength = Normalized:Dot(Axis)
		if math.abs(DenomLength) > EPSILON then
			local PlaneMinimum = (NomLength - Size) / DenomLength
			local PlaneMaximum = (NomLength + Size) / DenomLength
			--> PlaneMinimum needs to represent the closest intersection
			if PlaneMinimum > PlaneMaximum then
				PlaneMinimum, PlaneMaximum = PlaneMaximum, PlaneMinimum
			end
			--> Replace with the nearest "far" intersection among the planes
			if PlaneMaximum < Maximum then
				Maximum = PlaneMaximum
			end
			--> Replace with the farthest "near" intersection among the planes
			if PlaneMinimum > Minimum then
				Minimum = PlaneMinimum
			end
			-- If "near" is farther than the ray length then there is no intersection
			if Minimum > Length then
				return
			end
			--> If "far" is closer than "near" then there is no intersection
			if Maximum < Minimum then
				return
			end
		-- The ray is almost parallel to this slab; if the origin lies outside it there is no intersection
		elseif (-NomLength - Size > 0) or (-NomLength + Size < 0) then
			return
		end
	end
	return Minimum
end
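For anyone who wants to try it, a minimal usage sketch (the part name and ray values here are made up for illustration; any BasePart works):

```lua
-- Hypothetical example: test a downward ray against a rotated part.
local Part = workspace.SpinningPlatform -- assumed to exist for this sketch
local Origin = Vector3.new(0, 50, 0)
local Direction = Vector3.new(0, -100, 0)

local Distance = IntersectOBB(
	Direction.Magnitude, -- Length of the direction vector
	Origin,
	Direction.Unit,      -- Normalized direction
	Part.Size / 2,       -- Bounds (half size of the OBB)
	Part.CFrame          -- Rotation (position + orientation)
)

if Distance then
	print("Hit OBB at", Origin + Direction.Unit * Distance)
end
```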
This sounds promising for the optimization of my track generation algorithm for my experience. The only problem is that the algorithm uses Vector3s and CFrames from the start, so I guess I’ll have to wait until those data types are supported.
If I recall, native actually has a bunch of pros and cons.
Pros of native code:
Super mega ultra fast compared to interpreted languages.
Almost zero overhead? No more bridging between virtual machine and engine functions?
Cons of native code:
Possibly uses more memory due to being less compact than bytecode.
Possibly slightly longer loading times in games that heavily use it due to compilation step.
Could be wrong but this seems logical to me.
Interpreted languages tend to be smaller and more convenient but a little slower in execution, while native code might execute much faster but use more memory and hitch slightly the first time you run it, before compilation finishes.
I know this is sometimes the case with C#.
C# uses JIT but also reuses its own JIT-compiled instructions.
So a C# program starts up slower the first time you run it but gets faster on later runs, because it reuses those instructions and applies iterative optimization (helped by it being statically typed, which makes it a more predictable language).
Can we expect this to also work with things like metatables, raycasting, mixed arrays, events/connections, etc soon?
I’m so excited to start using native Luau for things like logic that needs to execute every single frame or object-oriented systems that use metatables and methods.
I’ve been peeking in every now and then to look at the luau/CodeGen folder on GitHub and now that there’s a preview I have to say I’m impressed. From my limited testing, I’m seeing anywhere from a 20% to 35% uplift in performance in some of my code. MessagePack.utf8Encode sees a ~34% (!!!) increase in performance in certain tests as well.
I implemented a function to get the graphemes of a UTF-8 string as an array, to compare codegen to the interpreter to library functions. I should make it very clear that you should always be using these engine features if they’re available; free performance is free performance, however.
(Note that the version of this function using utf8.graphemes still runs ~1.5x faster)
GetGraphemes (~19-20% uplift)
--!native
--!optimize 2

local function isContinuation(n: number): boolean
	return bit32.band(n, 0xC0) == 0x80
end

local function emptyBufferInto(buffer: {string}, target: {string})
	table.insert(target, table.concat(buffer))
	table.clear(buffer)
end

local function getGraphemes(s: string): {string}
	local graphemes = {}
	local charBuffer = {string.sub(s, 1, 1)}
	for _, byte in {string.byte(s, 2, #s)} do
		if not isContinuation(byte) then
			emptyBufferInto(charBuffer, graphemes)
		end
		table.insert(charBuffer, string.char(byte))
	end
	emptyBufferInto(charBuffer, graphemes)
	return graphemes
end
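For comparison, the library-based version I benchmarked against looks roughly like this (a sketch using Roblox’s utf8.graphemes, which iterates over the byte ranges of each grapheme cluster):

```lua
local function getGraphemesLibrary(s: string): {string}
	local graphemes = {}
	-- utf8.graphemes returns an iterator of (startIndex, endIndex) byte offsets
	for first, last in utf8.graphemes(s) do
		table.insert(graphemes, string.sub(s, first, last))
	end
	return graphemes
end
```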
Very cool stuff! I’m looking forward to those profitability heuristics so I have less work to do!
TEST: Generate 40 drills, 3x10000 @ 500. 22,223,063 total world blocks.
Not Native:
Time Taken: 130.608s
Total Memory: 3068MB
Blocks Generated p/s: 170,150
Native:
Time Taken: 115.871s
Total Memory: 3090MB
Blocks Generated p/s: 191,791
TEST: Generate 36 drills, 4x18000 @ 10000, then compress all chunks into memory. 55,327,053 total world blocks (time is how long compression took).
Not Native: 32.681s
Native: 28.950s
Getting a 4x improvement in our Sweep code (raycast but with arbitrary shapes in place of a point)
Getting a 6x improvement in our locomotion stepLinking code (which is just a big matrix inversion)
In our IK code, we get almost no improvement in real tests, though I don’t want to post it because it is sensitive. Should I just DM?
So, to my understanding, this will increase performance on the CPU, right? The GPU will stay unaffected, since script execution happens on the CPU.
Also, if I’m not wrong, the Roblox app and its core scripts already run natively, right? This is to get the Luau in experiences made in Roblox to run natively and therefore faster.
I wonder how much faster games are going to run on the client side (when that gets released), especially for rendering.
If your IK code is using Vector3 heavily then this is known and we will be working on Vector3 optimizations soon! If your IK code is scalar only then we’ll reach out via DM.
Sorry for the error, that one’s my fault, not related to this beta.
Specifically, resizing single TrussParts is broken in Studio right now. It’ll be fixed in the next release, until then you can still resize them through the Size property in the properties pane if necessary.
Ah yes, it’s entirely quaternions represented as (scalar, Vector3). I saw in the post that Vector3s did not see a similar improvement, but I expected at least something because Sweep has a lot of Vector3 operations in it… I just didn’t expect the scalar improvements to be so great as to basically become free in comparison to Vector3, really impressive.
Will native code gen be taking advantage of SIMD if it isn’t already?
I think I stated this in earlier posts, but the memory increase is likely due to the fact that native code is less compact than Luau bytecode.
You get more speed at the cost of maybe using a little bit more memory than normal.
And (maybe) possibly slightly slower loading times in games that heavily use this feature (due to compilation time).
Is it planned to allow pre-compiling code eventually? i.e. publish game → code gets compiled by roblox → players join game and receive compiled code
I feel like this would be really good, since the initial slowdown would be gone AND, on top of that, exploiters won’t be able to steal code as easily because it’s machine code and not bytecode.