Faster Lua VM: Studio beta

madattak · June 27, 2019, 10:42am

This is perfect timing. Some of my new code was starting to get a bit heavy on the ol’ processor, so this should help a lot!

EDIT: The MicroProfiler showed a speed increase of about 15% in my code, and that’s with the Studio debugger disabled for both VMs. Not bad! I might try to get some more accurate results later, as this was just a quick test.

madattak · June 27, 2019, 10:45am

Personally, I vote for this NOT to be changed, or at least change how NaN is handled. If you accidentally send NaN, inf, etc. to a BodyMover or constraint, it causes unpredictable bugs to start spilling out into various aspects of the game. From personal experience, this can send you down hours of false leads before you find out why parts of the level are randomly prevented from rendering, why parts are stuck in some weird superposition of existing/not existing, or why vehicles are randomly warping backwards in time to where they were a few seconds ago.

buildthomas · June 27, 2019, 10:51am

Changing the way the math operations work without extensive notice can be very dangerous. As you can see from @AxisAngle’s code, some people attach a specific semantic to NaN being the result of an operation.

It won’t be possible to tell if you actually got a 0 as a result or NaN, and you wouldn’t be able to tell the difference between NaN and -NaN anymore. If you don’t want NaN spilling into properties of Instances then you should safeguard your code from producing NaN (e.g. by checking that you aren’t calling .unit on a vector with length 0, or computing 0/0, etc.).
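A minimal sketch of that kind of safeguard, in plain Lua. The names `isNaN` and `safeUnit` are hypothetical helpers made up for illustration, and the vector is a plain table rather than a Roblox Vector3 so the snippet runs anywhere:

```lua
-- NaN is the only value that is not equal to itself.
local function isNaN(x)
	return x ~= x
end

-- Hypothetical helper: normalise a 3D vector stored as {x, y, z}.
-- Returns nil instead of a NaN-filled vector when the length is 0
-- (or when the input already contains NaN).
local function safeUnit(v)
	local len = math.sqrt(v[1] * v[1] + v[2] * v[2] + v[3] * v[3])
	if len == 0 or isNaN(len) then
		return nil
	end
	return { v[1] / len, v[2] / len, v[3] / len }
end

local zero = 0
print(isNaN(zero / zero)) -- 0/0 computed at runtime is NaN
print(safeUnit({ 0, 0, 0 })) -- nil rather than a vector of NaNs
```

Returning nil forces the caller to decide what a zero-length direction should mean, instead of letting NaN reach a property silently.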

What might be more effective is some way to set a warning message to appear when you set a certain property of an Instance to NaN.

madattak · June 27, 2019, 10:55am

That’s completely fair, and you’re right, the outcome of the equation should stay NaN. I would very much appreciate a warning message, and have created a request for this in the past.

What happened with my code was that a vector I thought could never be 0,0,0 actually could be in very rare circumstances, which then caused all of the wacky bugs I mentioned earlier. Because they were so weird, and I’d never experienced any NaN errors before, I didn’t think to check for NaN vectors for a long time, resulting in a lot of wasted time. But anyway, this is the wrong thread for this I guess, so I might make a new feature request.

dylanjkl · June 27, 2019, 1:30pm

How do we go about asking for the new VM to be enabled? Email or the forums?

AxisAngle · June 27, 2019, 2:06pm

zeuxcg · June 27, 2019, 2:43pm

The new VM has optimizations for typical OOP structures. I think we include optimizations for the type of code you’ve listed, although we’re more focused on metatable-based OOP (MyClass.__index = MyClass etc.).
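For anyone unfamiliar with the pattern, metatable-based OOP looks roughly like the sketch below. The field `value` and the method `increment` are made-up names for illustration only:

```lua
-- Metatable-based class: methods live on MyClass, and instances
-- find them through the __index metamethod.
local MyClass = {}
MyClass.__index = MyClass

-- Constructor: creates a table whose metatable is MyClass.
function MyClass.new(value)
	local self = setmetatable({}, MyClass)
	self.value = value
	return self
end

-- Method call obj:increment() is sugar for MyClass.increment(obj).
function MyClass:increment()
	self.value = self.value + 1
	return self.value
end

local obj = MyClass.new(41)
print(obj:increment())
```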

zeuxcg · June 27, 2019, 2:46pm

While in general I agree that NaN creates more problems than it solves, the issue is that we can’t change how NaN behaves everywhere in the code - what you’re observing only happens in constant expressions, and something like workspace.Gravity/0-workspace.Gravity/0 will still generate a NaN right now.

Elttob · June 27, 2019, 2:49pm

Are recursive functions optimised in any way under the new VM, and if so, what kinds of optimisations can we expect for them? I know that stuff like tail call optimisation isn’t present in the current VM, so I’m wondering if those issues are still there in the new VM.

edit: also, would you advise we favour recursive functions over loops if it aids code readability, or would that cause optimisation issues?

zeuxcg · June 27, 2019, 3:12pm

We don’t implement tail calls in the new VM and don’t have special optimizations for recursive functions - function call cost is lower in the new VM but that’s just an across the board improvement. A recursive factorial runs ~1.5-2x faster in the new VM compared to the old VM I believe.

A non-recursive algorithm is generally going to be faster in our VM unless you need to go out of your way to maintain some sort of custom stack. A non-recursive factorial runs ~2.5x faster than recursive factorial.
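To make the comparison concrete, here is a sketch of the two factorial forms being discussed. The function names are my own, and no timings are included since they depend on the VM and machine:

```lua
-- Recursive form: one function call per step.
local function factorialRecursive(n)
	if n <= 1 then
		return 1
	end
	return n * factorialRecursive(n - 1)
end

-- Iterative form: a single call with a numeric for loop,
-- no extra stack needed.
local function factorialIterative(n)
	local result = 1
	for i = 2, n do
		result = result * i
	end
	return result
end

print(factorialRecursive(10), factorialIterative(10))
```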

zeuxcg · June 27, 2019, 4:11pm

We currently limit the number of upvalues to 60 as well, primarily to make sure people don’t accidentally write code that breaks in the old VM. Once we remove the old VM we plan to increase this limit to 200 or so.
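For context, an upvalue is a local variable from an enclosing function that an inner function captures, and the limit counts how many of those a single function references. A small sketch, with `makeCounter` as a made-up example name:

```lua
-- `step` and `count` are locals of makeCounter. The inner function
-- captures both, so that closure has two upvalues.
local function makeCounter(step)
	local count = 0
	return function()
		count = count + step
		return count
	end
end

local counter = makeCounter(5)
print(counter())
print(counter())
```

Hitting the limit usually takes a very large script where one function touches dozens of outer locals.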

zeuxcg · June 27, 2019, 4:16pm

No plans to do this, sorry. Opting in to something like this on a per-game basis is very complex because the VM starts up in a non-replicated context so we can’t read any game settings easily. This would also complicate the setup for a very unique use case. You’ll have to settle for setfenv/getfenv…

BraxbroRoblox · June 27, 2019, 5:23pm

I hope nothing changes, but even if it does, a twofold performance boost should be worth it. Keep it up!

x86_architecture · June 27, 2019, 6:08pm

So does this mean raycasts are now less expensive or less ‘laggy’ to do on low end devices?

If so then: HURRAY !

pobammer · June 27, 2019, 6:18pm

My BindableEvent library got 6% faster. Holy balls.

sleitnick · June 27, 2019, 6:25pm

So I feel a bit confused, because doing the sqrt = math.sqrt optimization trick seems to still yield better performance for me. It seems to be about 50% faster. I thought the new VM was supposed to prevent this technique from being effective? Granted, the new VM is still faster. Am I missing something? Here’s the code I was throwing into the command bar:

function Benchmark(label, itr, f)
	print("Running " .. label .. "...")
	local start = tick()
	for i = 1,itr do
		f()
	end
	local dur = (tick() - start)
	print(("%s: %ims"):format(label, dur * 1000))
end


local x = 0
local sqrt = math.sqrt
Benchmark("Test1", 10000000, function()
	x = sqrt(10)
end)

Benchmark("Test2", 10000000, function()
	x = math.sqrt(10)
end)

pobammer · June 27, 2019, 6:28pm

That’s what I got. Plus, ipairs is now faster than a numeric for loop.
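For reference, these are the two loop forms being compared. This is only a sketch showing that they visit the same elements; it makes no timing claim, and `list` is an arbitrary example table:

```lua
local list = { 10, 20, 30, 40 }

-- Numeric for loop: index the table on every iteration.
local sumNumeric = 0
for i = 1, #list do
	sumNumeric = sumNumeric + list[i]
end

-- ipairs: the iterator hands back the index and the value.
local sumIpairs = 0
for _, v in ipairs(list) do
	sumIpairs = sumIpairs + v
end

print(sumNumeric, sumIpairs)
```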

4D_X · June 27, 2019, 6:39pm

epic dev moment

Yeah, this is amazing

3dsboy08 · June 27, 2019, 7:00pm

The math.sqrt optimization is not released yet.

AxisAngle · June 27, 2019, 7:17pm

I’m getting that a localized sqrt is about 2.5x faster than math.sqrt in the new VM:

local t0 = tick()
for i = 1, 100000 do
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
end
print(tick() - t0)

local sqrt = math.sqrt
wait(2)

local t0 = tick()
for i = 1, 100000 do
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
end
print(tick() - t0)
--[[
>0.41047310829163
>0.14743971824646
]]

This does not cover the full range of inputs to sqrt, but most of the time I expect positive values.
Apparently sqrt(-1) is about 3.5x slower than sqrt(1).

Still, apparently the math.sqrt optimization is not yet implemented in the VM. I will still personally localize all the math functions anyway because math guys like their equations short and to the point.