Luau Native Code Generation Preview [Studio Beta]

I LOVE this update! It’s given me an average 2-3x performance improvement in my intersection tests. One thing that seemed odd to me is that I’m not seeing a big speedup in my Ray->OBB intersection tests (~1.3x, compared to AABB, which received ~2.6x). I assume this is due to heavy vector usage.

The code in question
-- Length is the length of the direction vector
-- Normalized is the direction vector after using .Unit
-- Bounds is the half size of the OBB
-- EPSILON is assumed to be a small positive tolerance defined elsewhere in the script
local function IntersectOBB(Length: number, Origin: Vector3, Normalized: Vector3, Bounds: Vector3, Rotation: CFrame): number?
	local Minimum = 0
	local Maximum = 100000
	local Delta = (Rotation.Position - Origin)

	--> Test plane intersections
	for Size, Axis in {[Bounds.X] = Rotation.RightVector, [Bounds.Y] = Rotation.UpVector, [Bounds.Z] = Rotation.LookVector} do
		--> Ray direction & axis length
		local NomLength = Axis:Dot(Delta)
		local DenomLength = Normalized:Dot(Axis)

		if math.abs(DenomLength) > EPSILON then
			local PlaneMinimum = (NomLength + -Size) / DenomLength
			local PlaneMaximum = (NomLength + Size) / DenomLength

			--> PlaneMinimum needs to represent the closest intersection
			if PlaneMinimum > PlaneMaximum then
				local Temporary = PlaneMinimum
				PlaneMinimum = PlaneMaximum
				PlaneMaximum = Temporary
			end

			--> Replace with the nearest "far" intersection among the planes
			if PlaneMaximum < Maximum then
				Maximum = PlaneMaximum
			end

			--> Replace with the farthest "near" intersection among the planes
			if PlaneMinimum > Minimum then
				Minimum = PlaneMinimum
			end

			-- If "near" is farther than ray length then there is no intersection
			if Minimum > Length then
				return
			end

			--> If "far" is closer than "near" then there is no intersection
			if Maximum < Minimum then
				return
			end
			-- The ray is almost parallel to the planes, so they don't have any "intersection"
		elseif (-NomLength - Size > 0) or (-NomLength + Size < 0) then
			return
		end
	end

	return Minimum
end
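
One thing that may be worth double-checking in the loop above: it keys the table by half-size, and table keys are unique, so when two components of Bounds are equal (a cube, say) only one of the colliding entries survives and the other axes are never tested. A minimal sketch of the effect, using plain strings in place of the axis vectors. countAxes and countAxesArray are hypothetical helpers for illustration, not part of the original code:

```lua
-- Hypothetical helper: counts how many axes survive when the table is keyed by size.
local function countAxes(x, y, z)
	local count = 0
	for _ in pairs({[x] = "Right", [y] = "Up", [z] = "Look"}) do
		count = count + 1
	end
	return count
end

-- Hypothetical alternative: an array of {size, axis} pairs keeps every axis.
local function countAxesArray(x, y, z)
	local count = 0
	for _ in ipairs({{x, "Right"}, {y, "Up"}, {z, "Look"}}) do
		count = count + 1
	end
	return count
end

print(countAxes(1, 2, 3))      -- all sizes distinct
print(countAxes(2, 2, 2))      -- cube: the keys collide
print(countAxesArray(2, 2, 2)) -- array form is unaffected by equal sizes
```

If that turns out to matter for your boxes, iterating an array of pairs (or unrolling the three axes by hand) would avoid the collision. I have not measured how either option affects the native speedup.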
5 Likes

This sounds promising for optimizing the track generation algorithm in my experience. The only problem is that the algorithm uses Vector3s and CFrames from the start, so I guess I’ll have to wait until those data types are supported.

1 Like

If I recall, native actually has a bunch of pros and cons.

Pros of native code:

  • Super mega ultra fast compared to interpreted languages.
  • Almost zero overhead? No more bridging between virtual machine and engine functions?

Cons of native code:

  • Possibly uses more memory due to being less compact than bytecode.
  • Possibly slightly longer loading times in games that heavily use it, due to the compilation step.

Could be wrong but this seems logical to me.
Interpreted code tends to be smaller and more convenient but a little slower to execute, while native code might execute much faster but use more memory and cause a small hitch the first time it runs, before compilation has finished.

I know this is sometimes the case with C#.
C# uses JIT but also reuses its own JIT-compiled instructions.

So a C# program starts up slower the first time you run it, but gets faster on later runs because it reuses instructions and does something like iterative optimization (it is also statically typed, which makes it a more predictable language).

4 Likes

Can we expect this to also work with things like metatables, raycasting, mixed arrays, events/connections, etc. soon?

I’m so excited to start using native Luau for things like logic that needs to execute every single frame or object-oriented systems that use metatables and methods.

1 Like

This should help immensely with parallel actors for global illumination

Incredible. Can’t wait to try this out when Mac support lands.

3 Likes

Will this be a Roblox-specific feature or something that’s going to be added to Luau itself?

2 Likes

I’ve been peeking in every now and then to look at the luau/CodeGen folder on GitHub and now that there’s a preview I have to say I’m impressed. From my limited testing, I’m seeing anywhere from a 20% to 35% uplift in performance in some of my code. MessagePack.utf8Encode sees a ~34% (!!!) increase in performance in certain tests as well.

I implemented a function that returns the graphemes of a UTF-8 string as an array, to compare codegen against the interpreter and against the library functions. I should make it very clear that you should always use the engine features if they’re available. Free performance is free performance, however.
(Note that the version of this function using utf8.graphemes still runs ~1.5x faster)

GetGraphemes (~19-20% uplift)
--!native
--!optimize 2

local function isContinuation(n: number): boolean
	return bit32.band(n, 0xC0) == 0x80
end

local function emptyBufferInto(buffer: {string}, target: {string})
	table.insert(target, table.concat(buffer))
	table.clear(buffer)
end

local function getGraphemes(s: string): {string}
	local graphemes = {}
	local charBuffer = {string.sub(s, 1, 1)}
	
	for index, byte in {string.byte(s, 2, #s)} do
		if not isContinuation(byte) then
			emptyBufferInto(charBuffer, graphemes)
		end
		
		table.insert(charBuffer, string.char(byte))
	end
	
	emptyBufferInto(charBuffer, graphemes)
	
	return graphemes
end

Very cool stuff! I’m looking forward to those profitability heuristics so I have less work to do!

3 Likes

Tested it in my mining game. Pretty cool

TEST: Generate 40 drills, 3x10000 @ 500. 22,223,063 total world blocks
Not Native:
Time Taken: 130.608s
Total Memory: 3068MB
Blocks Generated p/s: 170,150

Native:
Time Taken: 115.871s
Total Memory: 3090MB
Blocks Generated p/s: 191,791

TEST: Generate 36 drills, 4x18000 @ 10000 then compress all chunks into memory. 55,327,053 total world blocks (time is how long it took to compress)
Not Native: 32.681s
Native: 28.950s
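
For what it’s worth, those figures work out to nearly the same ratio in both tests: roughly 1.13x faster, for under 1% more memory. A quick sketch of the arithmetic using only the numbers above (speedup is just a hypothetical helper name):

```lua
-- Hypothetical helper: ratio of the non-native time to the native time.
local function speedup(before, after)
	return before / after
end

-- Timings and memory figures are copied from the two tests above.
print(string.format("generation:  %.3fx", speedup(130.608, 115.871)))
print(string.format("compression: %.3fx", speedup(32.681, 28.950)))
print(string.format("memory:      %.2f%% more", (3090 / 3068 - 1) * 100))
```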

9 Likes

Cool feature. I’d be interested to know more about the deeper reasons for the memory increase, if you’re willing to share.

Edit: Are you embracing rock music yet?

1 Like

This is great, but part of a plugin’s code broke after I enabled this beta.

Mind taking a look?


(Can’t resize trusses)

3 Likes

Getting a 4x improvement in our Sweep code (raycast but with arbitrary shapes in place of a point)
Getting a 6x improvement in our locomotion stepLinking code (which is just a big matrix inversion)
:)

In our IK code, we get almost no improvement in real tests, though I don’t want to post it because it is sensitive. Should I just DM?

8 Likes

So to my understanding this will increase performance on the CPU, right? The GPU will be unaffected, since script execution happens on the CPU.

Also, if I am not wrong, the Roblox app and its core scripts already run natively, right? This is about getting the Luau code in experiences made on Roblox to run natively and therefore faster.
I wonder how much faster games will be on the client side, rendering included, once this is released there.

This seems unrelated to this beta (doesn’t go away when the beta is disabled), but we’ll take a look!

2 Likes

If your IK code is using Vector3 heavily then this is known and we will be working on Vector3 optimizations soon! If your IK code is scalar only then we’ll reach out via DM.

4 Likes

Sorry for the error, that one’s my fault, not related to this beta.

Specifically, resizing single TrussParts is broken in Studio right now. It’ll be fixed in the next release, until then you can still resize them through the Size property in the properties pane if necessary.

7 Likes

Ah yes, it’s entirely quaternions represented as (scalar, Vector3). I saw in the post that Vector3s did not see a similar improvement, but I expected at least something because Sweep has a lot of Vector3 operations in it… I just didn’t expect the scalar improvements to be so great as to basically become free in comparison to Vector3, really impressive.

Will native code gen be taking advantage of SIMD if it isn’t already?

1 Like

I think I stated this in an earlier post, but the memory increase is likely due to the fact that native code is less compact than Luau bytecode.

You get more speed at the cost of maybe using a little more memory than normal,
and possibly slightly slower loading times in games that heavily use this feature (due to compilation time).

2 Likes

Is it planned to allow pre-compiling code eventually? i.e. publish game → code gets compiled by Roblox → players join game and receive compiled code

I feel like this would be really good, since the initial slowdown would be gone AND, on top of that, exploiters wouldn’t be able to steal code as easily because it’s machine code and not bytecode.

9 Likes

isn’t that just how it works?? oh wait nvm