Faster Lua VM: Studio beta

So as long as you’re not using the “hack” type syntax the code should be fine?

1 Like

Calling objects like they’re functions is the only thing that is not going to work anymore. It wasn’t supposed to work before. Everything that worked before that isn’t calling objects like they’re functions is going to work in the new VM. If you have some example of something that worked before that isn’t calling objects like they’re functions, it’s going to work in the new VM. Nothing that worked before is going to stop working, except for calling objects like they’re functions, which never should have worked in the first place.

10 Likes

Yes, we get that xd

Despite that, we don’t yet know whether calling objects like functions is the only thing that will break. There’s also the depth limit, plus known bugs and as-yet-undiscovered ones.
Redoing the entire VM is a big change that is bound to have some bugs, and there are details that haven’t been mentioned which might break a specific piece of code. So in my opinion it’s too early to make a claim like this.

2 Likes

I think the key here is “going to work”. If something is different and it’s not mentioned in the behavior changes section, it’s a bug.

3 Likes

This is speaking completely from the implementation of vanilla Lua.

8 bytes (which isn’t the actual amount) is not a lot to worry about, and it’s necessary for the behavior of Lua upvalues. At almost any scale it should be just about negligible, unless many functions are being nested or rapidly instantiated for some reason, which may indicate another problem.

Vanilla Lua defines upvalues as: https://www.lua.org/source/5.1/lobject.h.html#UpVal, and LClosures (Lua functions) keep an array of them. The UpVal struct is a bit bigger than 8 bytes, so I’m not sure that the idea of this being a performance problem is founded in proper research.

Globals are also stored in a table, so they will probably end up using more memory than just the upvalue approach; Lua defines an entry in a hashtable here: https://www.lua.org/source/5.1/lobject.h.html#Node

Upvalue references are also collected as soon as they go out of scope, while a global in the environment lives for as long as that environment does. Even so, none of this gives a good reason to try to crunch these structures down even smaller. Roblox’s new codebase might already handle it differently, but the way vanilla Lua does it is already lightweight and out of the way.

I also personally don’t know if it’s possible to easily replicate the behavior of upvalues we have now with less memory.
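
To make the sharing behavior concrete, here’s a minimal sketch in plain Lua 5.1 (nothing Roblox-specific assumed): closures created in the same call share one upvalue cell, while separate calls allocate fresh ones.

```lua
-- Two closures from one call share the same upvalue cell for `n`.
local function makeCounter()
	local n = 0                          -- becomes an upvalue of both closures
	local function inc() n = n + 1 end
	local function get() return n end
	return inc, get
end

local inc, get = makeCounter()
inc(); inc()
print(get()) --> 2                     -- `inc` and `get` see the same cell

-- A second call allocates a fresh upvalue, so the counters are independent,
-- and each cell becomes collectable once no closure references it.
local inc2, get2 = makeCounter()
print(get2()) --> 0
```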

5 Likes

Here’s the memory usage difference for various structures with and without the new VM. The first one is from last month, before the Studio beta was introduced. These were done in Studio, so the live game may differ.


There are quite a few improvements, but function memory usage is the same across the board.
Looks like I was wrong about how much memory is used on top of the +8 per function.

Here’s the same tests run on the current live game using my “distributed profiler” after about 250m iterations across a few dozen clients (random people who followed me there).


Here are the performance tests with their periods relative to creating a blank function:

“func (1gu)” is a function with 1 “global upvalue” or shared upvalue, and you can see that it is about 1.1x slower than creating a function that has no upvalues.

I didn’t put very much thought into the naming of the tests, so here’s the code that the memory and performance tests both use:

Add("control", function() return false end)

Add("table (0)", function() return {} end)

Add("func (0)", function(v1) return function() end end)

Add("table (1h)", function() return {[1] = nil} end)

Add("table (1a)", function() return {nil} end)

Add("func (1u)", function(v1) return function() return v1 end end)

Add("table (2h)", function() return {[1]=nil,[2]=nil} end)

Add("table (2a)", function() return {nil,nil} end)

Add("func (2u)", function(v1,v2) return function() return v1,v2 end end)

Add("table (3h)", function() return {[1]=nil,[2]=nil,[3]=nil} end)

Add("table (3a)", function() return {nil,nil,nil} end)

Add("func (3u)", function(v1,v2,v3) return function() return v1,v2,v3 end end)

Add("table (4h)", function() return {[1]=nil,[2]=nil,[3]=nil,[4]=nil} end)

Add("table (4a)", function() return {nil,nil,nil,nil} end)

Add("func (4u)", function(v1,v2,v3,v4) return function() return v1,v2,v3,v4 end end)

Add("table (8h)", function() return {[1]=nil,[2]=nil,[3]=nil,[4]=nil,[5]=nil,[6]=nil,[7]=nil,[8]=nil} end)

Add("table (8a)", function() return {nil,nil,nil,nil,nil,nil,nil,nil} end)

Add("func (8u)", function(v1,v2,v3,v4,v5,v6,v7,v8) return function() return v1,v2,v3,v4,v5,v6,v7,v8 end end)


local v1,v2,v3,v4,v5,v6,v7,v8;

Add("func (1gu)", function() return function() return v1 end end)

Add("func (2gu)", function() return function() return v1,v2 end end)

Add("func (3gu)", function() return function() return v1,v2,v3 end end)

Add("func (4gu)", function() return function() return v1,v2,v3,v4 end end)

Add("func (8gu)", function() return function() return v1,v2,v3,v4,v5,v6,v7,v8 end end)

Add("func2 (0gu)", function()
return (function()
return function() end
end)()
end)

Add("func2 (1gu)", function()
return (function()
return function() return v1 end
end)()
end)

Add("func2 (2gu)", function()
return (function()
return function() return v1,v2 end
end)()
end)

Add("func2 (3gu)", function()
return (function()
return function() return v1,v2,v3 end
end)()
end)

Add("func2 (4gu)", function()
return (function()
return function() return v1,v2,v3,v4 end
end)()
end)

Add("func2 (8gu)", function()
return (function()
return function() return v1,v2,v3,v4,v5,v6,v7,v8 end
end)()
end)

The memory tests use collectgarbage("count"), and the performance code is preceded by this:

local function newTest(method)
	return function(count, tick0, tick1, spoof)
		local f = method
		
		tick0 = tick0()
		for i = 1, count do
			f()
		end
		tick1 = tick1()
		
		return tick1 - tick0
	end
end

local profiles = {}
local function Add(name, method)
	profiles[#profiles+1] = {
		Name = name;
		Test = newTest(method);
		TestControl = newTest(function() return false end);
	}
end
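
For completeness, here’s a hypothetical driver for that harness (the driver isn’t from the original tests, and `os.clock` stands in for whatever timers were actually passed as tick0/tick1). A compact copy of the harness is included so the sketch is self-contained.

```lua
local profiles = {}

local function newTest(method)
	return function(count, tick0, tick1)
		local f = method
		tick0 = tick0()
		for i = 1, count do f() end
		tick1 = tick1()
		return tick1 - tick0
	end
end

local function Add(name, method)
	profiles[#profiles+1] = {
		Name = name;
		Test = newTest(method);
		TestControl = newTest(function() return false end);
	}
end

Add("control",  function() return false end)
Add("func (0)", function() return function() end end)

-- Report each test's period relative to its control run.
for _, p in ipairs(profiles) do
	local t = p.Test(100000, os.clock, os.clock)
	local c = p.TestControl(100000, os.clock, os.clock)
	print(("%s: %.2fx control"):format(p.Name, t / math.max(c, 1e-9)))
end
```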

This is not always the case.
If a single function has sole access to 1 upvalue, the function will use (88 - 40 = 48) total bytes. According to my tests, 8 bytes will be allocated for each additional instantiated function that references that upvalue.
On the other hand, globals use 40 bytes in the hash table, plus ~length_of_string + 33 bytes for storing the global’s unique string in Lua’s string hash. I didn’t include this in the tests, but accessing a global in a function does not affect the function’s memory usage or creation speed.

If a variable is used once or twice, upvalues will use less memory; if a variable is referenced in hundreds of instantiated functions, globals will use less memory.
This doesn’t account for how much memory the global’s string constant uses internally relative to the script’s data, as I’m not sure how Roblox implements that.
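
As a rough illustration of the measurement technique (this is my sketch, not the exact test code): per-object memory can be estimated by diffing collectgarbage("count"), which reports kilobytes. Exact numbers vary between VMs.

```lua
-- Estimate approximate bytes per object by allocating many and diffing
-- the GC-reported heap size before and after.
local function measure(make, n)
	local objs = {}
	collectgarbage("collect")
	local before = collectgarbage("count")   -- kilobytes
	for i = 1, n do objs[i] = make() end
	local after = collectgarbage("count")
	return (after - before) * 1024 / n       -- approximate bytes per object
end

-- Each call instantiates a closure that captures one upvalue.
local perClosure = measure(function()
	local x = 0
	return function() return x end
end, 10000)
print(perClosure)
```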

For clarification, I’m trying to suggest features that will make my game run faster without relying on setfenv. As far as I know, no other Roblox game uses a generalized data simplification and compile system like mine does, so my use-case is very unique. This post details my setfenv use-case:
https://devforum.roblox.com/t/do-you-use-setfenv-and-if-so-why/236325/28?u=tomarty

Here’s an interesting paper on the subject of closures:

6 Likes

This only really clarifies the actual sizes of things, such as an upvalue in Roblox being stored in approximately 44 bytes. But that raises the question: what kind of device are you targeting where this is a huge problem, and where it isn’t being caused by something else, such as a decision in the programming paradigm?

So to be clear, the only things “breaking” are the incorrect variants of the syntax, a.k.a. the “hack” type syntax?

My goal is to do as much as I possibly can with the Roblox engine. When something in Lua is slow, it means I can do less of that thing. I want my game to have thousands of trees and hundreds of characters. If an API like raycasting is made faster, it means I’ll be able to run a few dozen more characters, and if traversing/creating Lua structures is made faster, my LOD system will be able to run a few hundred more trees before the game lags. Of course bottlenecks are often not Lua-side, but Lua is the only variable I have direct control over, so I try to improve performance as much as I can.

At this point it seems like you are just trolling and purposely asking the same question, which has been answered clearly multiple times. If you are still confused, I suggest you reread the thread. Yes, the only difference with namecall will be not being able to “call” instance methods; normal scripts will work fine.

3 Likes

As pretty much everyone before has explained about twice or thrice, you should not worry about anything that is documented on the Developer Hub or in the official Lua 5.1 documentation. If there is any additional stuff to take care of, it will be mentioned here.

These won’t break:

game:GetService("Players") --This is the official way to get a service and it will work
workspace:FindPartsInRegion3(...) --This is how you would normally get the parts in a region
game:GetService("Players"):GetPlayerFromCharacter(char) --Here just as an example to clear things out even further

These are the alternative incorrect variants of the above that will no longer work and were only possible because of a bug:

game("GetService","Players") --not gonna work
workspace("FindPartsInRegion3",...) --still not working
game("GetService","Players")("GetPlayerFromCharacter",char) --definitely not correct

4 Likes

Alrighty, sorry if it seemed as if I was trolling, It wasn’t my intent.

3 Likes

I recall once having performance issues reading the mobile gyro/accelerometer. Would this version of Lua help with that sort of thing?

I can’t test this right now as I’m at work. :disappointed:

Does this also mean exploiters would have to completely rework their script injection tools to work with the new VM?

If so double :+1:

I’ll add the test place to the list! Note that we aren’t fully ready to start doing place specific testing - I’ll need to check what the status is, I’ll ping you privately when we’re ready.

1 Like

It’s not very difficult but I don’t think we should. Idiomatic iteration is using pairs/ipairs, and we haven’t seen cases where a call to pairs affects performance enough to care; we are planning to optimize calls to certain builtin functions in other ways.

We wouldn’t expose hashLength like that. I’m not sure what table.find is supposed to do here?

I’m not sure what the instructions would do in this case. Keep in mind that all “special” paths for builtins have to painstakingly handle the setfenv/getfenv case - what if you replace setmetatable with setfenv?

Constant upvalues of primitive types are folded into the functions that need them (and stop being upvalues). We don’t currently optimize locals of complex types such as setmetatable in your example. In general we expect local caching to not be as necessary, and probably won’t go out of our way to make local caching faster.
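
To illustrate the distinction with a made-up example (names like `TAU` and `makeCircle` are mine): a primitive constant captured as an upvalue can be folded into the closure, while a cached builtin like `setmetatable` is a complex value and stays a real upvalue.

```lua
local TAU = 6.283185307179586        -- primitive constant upvalue: foldable
local setmetatable = setmetatable    -- cached builtin: not folded, per the reply

local function makeCircle(r)
	-- Both `TAU` and `setmetatable` are upvalues of this function.
	return setmetatable({ r = r }, {
		__index = {
			circumference = function(self) return TAU * self.r end,
		},
	})
end

local c = makeCircle(1)
print(c:circumference()) -- 2*pi, approximately 6.2832
```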

This requires a different mechanism from setfenv. Would injectfenv I noted earlier in this thread work for your usecase?

We plan to optimize the use of upvalues in certain cases but I’m not sure it would significantly impact the memory use.

We have this optimization on our TODO list, but it’s very complex to maintain semantics perfectly, especially in the presence of setfenv - you can mutate the environment of the created function object after the fact, which is how you can observe the difference…
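
Here’s a sketch of that observable difference, assuming Lua 5.1-style setfenv (the snippet guards the call so it also runs where setfenv doesn’t exist). If `greeting` were folded into the closure as a constant, replacing the function’s environment afterwards could no longer change what it returns.

```lua
local function makeGreeter()
	return function() return greeting end  -- `greeting` resolves via the env
end

local f = makeGreeter()
greeting = "hello"
print(f()) --> hello

if setfenv then  -- Lua 5.1 / Roblox only
	-- Mutating the created function's environment after the fact changes the
	-- result, so the global lookup can't simply be folded away.
	setfenv(f, { greeting = "spoofed" })
	print(f()) --> spoofed
end
```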

3 Likes

Both reported bugs - 0/0 misbehaving and very large scripts taking a lot of time to compile - have been fixed in Studio 392 that just went live. Please let us know if you see any other problems with behavior or performance.

We’re getting ready to try this on live games, I expect that we can enable this for some games on server / on desktop next week. Mobile might take a bit longer since we need to make sure all fixes have fully propagated.

1 Like

Some optimization for using next like that would be appreciated, just for the sake of legacy code. It was a stylistic choice before, and with the new VM it actively punishes people who made that decision, despite it not mattering when they made it.
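
For anyone unsure which style is meant, these two loops behave identically; the new VM just fast-paths the pairs form:

```lua
local t = { a = 1, b = 2, c = 3 }

-- Legacy style: pass `next` directly as the generic-for iterator.
local sum1 = 0
for _, v in next, t do sum1 = sum1 + v end

-- Idiomatic style: the form the new VM optimizes.
local sum2 = 0
for _, v in pairs(t) do sum2 = sum2 + v end

print(sum1, sum2) --> 6  6
```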

1 Like

Not really. They just have to change the bytecode conversion format to match the new VM, which some of them have already done.

Code with next doesn’t run any slower than it used to though, it just doesn’t run as fast as pairs/ipairs.

In general our optimization process relies on identifying things we can improve in code that’s part of our benchmark suite (which is a collection of standalone Lua benchmarks, Lua code that we wrote internally as well as Lua code that some members of the community wrote). This is where the focus is, so we’re more likely to improve something that we see affecting the performance in one or multiple representative tests and less likely to improve something that’s a more niche usecase.

We might implement this specific optimization at some point, but it just isn’t a priority.

2 Likes