Faster Lua VM: Studio beta

How do we go about asking for the new VM to be enabled? Email or the forums?

2 Likes

The new VM has optimizations for typical OOP structures. I think we include optimizations for the type of code you’ve listed, although we’re more focused on metatable-based OOP (MyClass.__index = MyClass etc.).
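For reference, the metatable-based OOP pattern mentioned above looks roughly like this (a generic sketch of the idiom, not code from this thread):

```lua
-- Classic metatable-based OOP idiom that the new VM is tuned for.
local MyClass = {}
MyClass.__index = MyClass

function MyClass.new(value)
	-- setmetatable makes method lookups on the object fall through to MyClass
	return setmetatable({ value = value }, MyClass)
end

function MyClass:getValue()
	return self.value
end

local obj = MyClass.new(42)
print(obj:getValue()) -- 42
```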

10 Likes

While in general I agree that NaN creates more problems than it solves, the issue is that we can’t change how NaN behaves everywhere in the code. What you’re observing only happens in constant expressions; something like workspace.Gravity/0 - workspace.Gravity/0 will still generate a NaN right now.
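To illustrate the distinction (a hedged sketch; workspace.Gravity is a Roblox API, and the constant-folding behavior is only as described in the post above):

```lua
-- Constant expression: the operands are known at compile time,
-- so the compiler may fold this and the observed behavior differs.
local a = 0/0

-- Runtime expression: the operand is only known at runtime,
-- so this still evaluates with IEEE 754 semantics and yields NaN.
local g = workspace.Gravity
local b = g/0 - g/0

print(b ~= b) -- true: NaN is the only value not equal to itself
```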

6 Likes

Are recursive functions optimised in any way under the new VM, and if so, what kinds of optimisations can we expect for them? I know that stuff like tail call optimisation isn’t present in the current VM so I’m wondering if those issues are still there in the new VM :slightly_smiling_face:

edit: also, would you advise we favour recursive functions over loops if it aids code readability, or would that cause optimisation issues?

We don’t implement tail calls in the new VM, and we don’t have special optimizations for recursive functions. Function call cost is lower in the new VM, but that’s just an across-the-board improvement. A recursive factorial runs ~1.5-2x faster in the new VM compared to the old one, I believe.

A non-recursive algorithm is generally going to be faster in our VM unless you need to go out of your way to maintain some sort of custom stack. A non-recursive factorial runs ~2.5x faster than a recursive one.
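The two factorial variants being compared can be sketched like this (generic illustrations, not the benchmark code the staff used):

```lua
-- Recursive factorial: each step pays function-call overhead, and the
-- recursive call is not in tail position, so no tail-call optimization
-- would apply even if the VM implemented it.
local function factRecursive(n)
	if n <= 1 then
		return 1
	end
	return n * factRecursive(n - 1)
end

-- Iterative factorial: a plain loop avoids call overhead entirely.
local function factIterative(n)
	local result = 1
	for i = 2, n do
		result = result * i
	end
	return result
end

print(factRecursive(10), factIterative(10)) -- 3628800  3628800
```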

5 Likes

We currently limit the number of upvalues to 60 as well, primarily to make sure people don’t accidentally write code that breaks in the old VM. Once we remove the old VM we plan to increase this limit to 200 or so.
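For context, an upvalue is any local from an enclosing scope that a nested function captures; the limit applies per function (a minimal illustration):

```lua
local a, b = 1, 2 -- locals in the enclosing scope

local function sum()
	-- a and b are upvalues here: they count two of the
	-- (currently 60) upvalue slots this function may use
	return a + b
end

print(sum()) -- 3
```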

10 Likes

No plans to do this, sorry. Opting in to something like this on a per-game basis is very complex because the VM starts up in a non-replicated context, so we can’t read any game settings easily. It would also complicate the setup for a very unique use case. You’ll have to settle for setfenv/getfenv…

2 Likes

I hope nothing changes, but even if it does, a twofold performance boost should be worth it. Keep it up!

So does this mean raycasts are now less expensive or less ‘laggy’ on low-end devices?

If so then: HURRAY :partying_face:!

My BindableEvent library got 6% faster. Holy balls.

So I feel a bit confused, because doing the sqrt = math.sqrt optimization trick seems to still yield better performance for me. It seems to be about 50% faster. I thought the new VM was supposed to prevent this technique from being effective? Granted, the new VM is still faster. Am I missing something? Here’s the code I was throwing into the command bar:

function Benchmark(label, itr, f)
	print("Running " .. label .. "...")
	local start = tick()
	for i = 1,itr do
		f()
	end
	local dur = (tick() - start)
	print(("%s: %ims"):format(label, dur * 1000))
end


local x = 0
local sqrt = math.sqrt
Benchmark("Test1", 10000000, function()
	x = sqrt(10)
end)

Benchmark("Test2", 10000000, function()
	x = math.sqrt(10)
end)
1 Like

That’s what I got. Plus, ipairs is now faster than a numeric for loop.
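A comparison like that could be sketched as follows (a hypothetical micro-benchmark using the same tick()-based timing as the code above; results will vary by machine):

```lua
-- Build a large array to iterate over.
local t = {}
for i = 1, 1000000 do
	t[i] = i
end

-- Numeric for loop over explicit indices.
local t0 = tick()
local sum = 0
for i = 1, #t do
	sum = sum + t[i]
end
print("numeric for:", tick() - t0)

-- ipairs-based iteration over the same array.
t0 = tick()
sum = 0
for _, v in ipairs(t) do
	sum = sum + v
end
print("ipairs:", tick() - t0)
```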

4 Likes

epic dev moment

Yeah, this is amazing :cool:

The math.sqrt optimization is not released yet.

2 Likes

I’m getting that sqrt is about 2.5x faster than math.sqrt in the new VM:

local t0 = tick()
for i = 1, 100000 do
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
end
print(tick() - t0)

local sqrt = math.sqrt
wait(2)

local t0 = tick()
for i = 1, 100000 do
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
end
print(tick() - t0)
--[[
>0.41047310829163
>0.14743971824646
]]

This doesn’t encompass the full range of sqrt inputs, but most of the time I expect positive values. Apparently sqrt(-1) is about 3.5x slower than sqrt(1).

Still, apparently this is not yet implemented in the VM. I will still personally localize all the math functions anyway because math guys like their equations short and to the point.

5 Likes

In that case, there should be a native way to split a large script into multiple scripts that all share the same environment, without having a ModuleScript return a function that acts like a block of code.

That’s what I’m currently using getfenv and setfenv for, as my framework’s core alone has over 2000 lines and I’ve only finished the basics. In other words, I expect it to be around 7000 lines by the time I’m done, and I obviously won’t keep that in a single script object.
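The ModuleScript workaround being described can be sketched like this (hypothetical names; a simplified illustration of injecting one environment into code from another script):

```lua
-- In a ModuleScript (e.g. script.Core): return a function instead of a
-- table, so the caller can hand it an environment to run in.
return function(env)
	setfenv(1, env) -- run the rest of this function in the caller's environment
	-- Anything defined "globally" here now lands in env,
	-- so every script sharing env can see it.
	function Greet(name)
		print("Hello, " .. name)
	end
end

-- In the main script:
--   local env = getfenv(1)
--   require(script.Core)(env)
--   Greet("world")
```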

Are there plans to enable the new Lua VM on the mobile app in the coming weeks?

1 Like

I’m getting these results fairly consistently with your test on the new VM:
0.19356298446655
0.20484709739685

2 Likes

Maybe they just changed it over, or I haven’t received the latest beta updates! This is good: people who don’t like to localize math funcs don’t gotta do it no more. What CPU are you running?

1 Like