Faster Lua VM: Studio beta

How do we go about asking for the new VM to be enabled? Email or the forums?

2 Likes

https://devforum.roblox.com/t/enroll-in-the-roblox-beta-program/45778

The new VM has optimizations for typical OOP structures. I think we include optimizations for the type of code you’ve listed, although we’re more focused on metatable-based OOP (MyClass.__index = MyClass etc.).
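
For reference, a minimal sketch of the metatable-based OOP pattern referred to above (class and method names are just illustrative):

local MyClass = {}
MyClass.__index = MyClass

function MyClass.new(value)
	-- setmetatable routes method lookups through MyClass via __index
	return setmetatable({ value = value }, MyClass)
end

function MyClass:getValue()
	return self.value
end

local obj = MyClass.new(42)
print(obj:getValue()) --> 42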

10 Likes

While in general I agree that NaN creates more problems than it solves, the issue is that we can’t change how NaN behaves everywhere in the code - what you’re observing only happens in constant expressions, and something like workspace.Gravity/0 - workspace.Gravity/0 will still generate a NaN right now.
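
To see the runtime case concretely, here’s a small check, assuming the usual x ~= x test (NaN is the only value that doesn’t equal itself):

-- a runtime (non-constant) expression, so it still produces NaN
local n = workspace.Gravity/0 - workspace.Gravity/0
print(n ~= n) --> true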

6 Likes

Are recursive functions optimised in any way under the new VM, and if so, what kinds of optimisations can we expect for them? I know that stuff like tail call optimisation isn’t present in the current VM so I’m wondering if those issues are still there in the new VM :slightly_smiling_face:

edit: also, would you advise we favour recursive functions over loops if it aids code readability, or would that cause optimisation issues?

We don’t implement tail calls in the new VM and don’t have special optimizations for recursive functions - function call cost is lower in the new VM, but that’s just an across-the-board improvement. A recursive factorial runs ~1.5-2x faster in the new VM compared to the old VM, I believe.

A non-recursive algorithm is generally going to be faster in our VM unless you need to go out of your way to maintain some sort of custom stack. A non-recursive factorial runs ~2.5x faster than a recursive factorial.
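
For concreteness, a minimal sketch of the two shapes being compared (the input, iteration count, and tick()-based timing are just for illustration):

local function factRecursive(n)
	if n <= 1 then return 1 end
	return n * factRecursive(n - 1)
end

local function factIterative(n)
	local result = 1
	for i = 2, n do
		result = result * i
	end
	return result
end

local t0 = tick()
for i = 1, 100000 do factRecursive(20) end
print("recursive:", tick() - t0)

t0 = tick()
for i = 1, 100000 do factIterative(20) end
print("iterative:", tick() - t0)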

5 Likes

We currently limit the number of upvalues to 60 as well, primarily to make sure people don’t accidentally write code that breaks in the old VM. Once we remove the old VM we plan to increase this limit to 200 or so.
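
For anyone unsure what counts toward that limit: an upvalue is a local from an enclosing scope that a nested function captures, e.g.:

local counter = 0 -- becomes an upvalue of the function below

local function increment()
	counter = counter + 1 -- each distinct captured local counts toward the limit
	return counter
end

print(increment(), increment()) --> 1	2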

10 Likes

No plans to do this, sorry. Opting in to something like this on a per-game basis is very complex because the VM starts up in a non-replicated context, so we can’t read any game settings easily. This would also complicate the setup for a very unique use case. You’ll have to settle for setfenv/getfenv…

2 Likes

I hope nothing changes, but even if it does, a twofold performance boost should be worth it. Keep it up!

So does this mean raycasts are now less expensive or less ‘laggy’ to do on low-end devices?

If so then: HURRAY :partying_face:!

My BindableEvent library got 6% faster. Holy balls.

So I feel a bit confused, because doing the sqrt = math.sqrt optimization trick seems to still yield better performance for me. It seems to be about 50% faster. I thought the new VM was supposed to prevent this technique from being effective? Granted, the new VM is still faster. Am I missing something? Here’s the code I was throwing into the command bar:

function Benchmark(label, itr, f)
	print("Running " .. label .. "...")
	local start = tick()
	for i = 1,itr do
		f()
	end
	local dur = (tick() - start)
	print(("%s: %ims"):format(label, dur * 1000))
end


local x = 0
local sqrt = math.sqrt
Benchmark("Test1", 10000000, function()
	x = sqrt(10)
end)

Benchmark("Test2", 10000000, function()
	x = math.sqrt(10)
end)

1 Like

That’s what I got. Plus, ipairs is now faster than a numeric for loop.
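
A rough sketch of that comparison, in case anyone wants to reproduce it (array size and iteration counts are arbitrary):

local t = {}
for i = 1, 1000 do t[i] = i end

local sum = 0
local t0 = tick()
for iter = 1, 10000 do
	for i = 1, #t do
		sum = sum + t[i]
	end
end
print("numeric for:", tick() - t0)

sum = 0
t0 = tick()
for iter = 1, 10000 do
	for _, v in ipairs(t) do
		sum = sum + v
	end
end
print("ipairs:", tick() - t0)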

4 Likes

epic dev moment

Yeah, this is amazing :cool:

The math.sqrt optimization is not released yet.

2 Likes

I’m getting that sqrt is about 2.5x faster than math.sqrt in the new VM:

local t0 = tick()
for i = 1, 100000 do
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
end
print(tick() - t0)

local sqrt = math.sqrt
wait(2)

local t0 = tick()
for i = 1, 100000 do
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
end
print(tick() - t0)
--[[
>0.41047310829163
>0.14743971824646
]]

This doesn’t cover the full range of inputs to sqrt, but most of the time I expect positive values.
Apparently sqrt(-1) is about 3.5x slower than sqrt(1).

Still, apparently this is not yet implemented in the VM. I will still personally localize all the math functions anyway because math guys like their equations short and to the point.

5 Likes

In that case, there should be a native way to separate a large script into multiple scripts that can all access the same environment without having a ModuleScript return a function which acts like a block of code.

That’s what I’m currently using getfenv and setfenv for, as my framework’s core alone is over 2000 lines and I’ve only finished the basics. In other words, I expect it to be around 7000 lines by the time I’m done, and I obviously won’t keep it in a single script object.
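
Roughly, the environment-sharing pattern I mean - a minimal sketch with plain local functions standing in for the functions returned from ModuleScripts (the shared variable name is made up):

-- one shared environment table, falling back to the normal environment for globals like print
local sharedEnv = setmetatable({}, { __index = getfenv() })

local chunkA = function()
	loadedSystems = (loadedSystems or 0) + 1 -- writes into sharedEnv
end

local chunkB = function()
	print("systems loaded:", loadedSystems) -- reads from sharedEnv
end

setfenv(chunkA, sharedEnv)
setfenv(chunkB, sharedEnv)

chunkA()
chunkB() --> systems loaded: 1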

Are there plans to enable the new Lua VM on the mobile app in the coming weeks?

1 Like

I’m getting these results fairly consistently with your test on the new VM:
0.19356298446655
0.20484709739685

2 Likes

Maybe they just changed it over, or I haven’t received the latest beta updates! This is good - people who don’t like to localize math funcs won’t have to anymore. What CPU are you running?

1 Like