Faster Lua VM: Studio beta

I tested the new Lua VM with an implementation of SHA-256 that uses the upcoming bit32 library.
Using a 260 KB input, this is how long each VM took to compute its hash:

  • Old Lua VM: 1.2972564697266 seconds
  • New Lua VM: 0.17563724517822 seconds

That's roughly a 7.4× speedup. These results are quite astonishing, and I can't wait for this to go live!


SHA-256 Implementation

This is just a cleaned-up version of an implementation I found in a GitHub Gist.

local band = bit32.band
local bnot = bit32.bnot
local bxor = bit32.bxor

local rrotate = bit32.rrotate
local rshift = bit32.rshift

local primes = 
{
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
}

-- Converts a binary string to its lowercase hexadecimal representation.
local function toHex(str)
	local result = str:gsub('.', function (char)
		return string.format("%02x", char:byte())
	end)
	
	return result
end

-- Serializes a number into a big-endian string of `length` bytes.
local function toBytes(value, length)
	local str = ""
	
	for i = 1, length do
		local rem = value % 256
		str = string.char(rem) .. str
		value = (value - rem) / 256
	end
	
	return str
end

-- Reads a big-endian 32-bit integer from `buffer` starting at `index`.
local function readInt32(buffer, index)
	local value = 0
	
	for i = index, index + 3 do
		value = (value * 256) + string.byte(buffer, i)
	end
	
	return value
end

local function digestBlock(msg, i, hash)
	-- Message schedule: the first 16 words come straight from the 64-byte block.
	local digest = {}
	
	for j = 1, 16 do
		digest[j] = readInt32(msg, i + (j - 1) * 4)
	end
	
	-- Expand the remaining 48 words of the schedule from the first 16.
	for j = 17, 64 do
		local v = digest[j - 15]
		local s0 = bxor(rrotate(v, 7), rrotate(v, 18), rshift(v, 3))
		
		v = digest[j - 2]
		digest[j] = digest[j - 16] + s0 + digest[j - 7] + bxor(rrotate(v, 17), rrotate(v, 19), rshift(v, 10))
	end
	
	local a, b, c, d, e, f, g, h = unpack(hash)
	
	-- Main compression loop: 64 rounds over the message schedule.
	for i = 1, 64 do
		local s0 = bxor(rrotate(a, 2), rrotate(a, 13), rrotate(a, 22))
		local maj = bxor(band(a, b), band(a, c), band(b, c))
		
		local t2 = s0 + maj
		local s1 = bxor(rrotate(e, 6), rrotate(e, 11), rrotate(e, 25))
		
		local ch = bxor(band(e, f), band(bnot(e), g))
		local t1 = h + s1 + ch + primes[i] + digest[i]
		
		h, g, f, e, d, c, b, a = g, f, e, d + t1, c, b, a, t1 + t2
	end
	
	-- band() with a single argument truncates each sum back to 32 bits.
	hash[1] = band(hash[1] + a)
	hash[2] = band(hash[2] + b)
	hash[3] = band(hash[3] + c)
	hash[4] = band(hash[4] + d)
	hash[5] = band(hash[5] + e)
	hash[6] = band(hash[6] + f)
	hash[7] = band(hash[7] + g)
	hash[8] = band(hash[8] + h)
end

local function sha256(msg)
	do
		-- Append 0x80, the zero padding, and the 64-bit message length so the
		-- padded message is a whole number of 64-byte blocks.
		local extra = -(#msg + 9) % 64
		local len = toBytes(8 * #msg, 8)
		
		msg = msg .. '\128' .. string.rep('\0', extra) .. len
		assert(#msg % 64 == 0)
	end
	
	local hash = 
	{
		0x6a09e667,
		0xbb67ae85,
		0x3c6ef372,
		0xa54ff53a,
		0x510e527f,
		0x9b05688c,
		0x1f83d9ab,
		0x5be0cd19,	
	}
	
	for i = 1, #msg, 64 do 
		digestBlock(msg, i, hash)
	end
	
	local result = ""
	
	for i = 1, 8 do
		local value = hash[i]
		result = result .. toBytes(value, 4)
	end
	
	return toHex(result)
end

------------------------------------------------------------------

local input = string.rep(".", 26e4) -- 260,000-byte (260 KB) test input

local now = tick()
local result = sha256(input)

print(tick() - now)
print(result)
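
If anyone wants to sanity-check their copy of this implementation, the standard SHA-256 test vector for "abc" should do it:

assert(sha256("abc") == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")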

------------------------------------------------------------------
16 Likes

Where did you enable the bit32 library? I want to use it for something I made a while back that was too clunky and slow to be of any use. Asking here in case others want to know how to enable it too.

It’s currently disabled via an FFlag (specifically FFlagLuaBit32), likely because it’s not available everywhere yet. If you enable that flag (you can DM me if you want help with that as this is a public thread), you should be able to access it in Studio.

Not really. The new Lua VM is supposed to be faster without you having to edit any code.

That depends. A lot of people use next in loops instead of pairs, but the optimization only covers pairs. Localizing variables speeds up the old VM greatly when global functions have to be called a lot, but now doing that disables the optimization as well.
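
For reference, here's a minimal sketch of the localization pattern being discussed (a toy example, the function names are made up), based on the behaviour described above:

local mathSqrt = math.sqrt -- caching the global lookup in a local, the old-VM habit

local function sumRootsCached(n)
	local total = 0
	for i = 1, n do
		total = total + mathSqrt(i) -- calls through the local
	end
	return total
end

local function sumRootsDirect(n)
	local total = 0
	for i = 1, n do
		total = total + math.sqrt(i) -- direct global call, which the new VM optimizes
	end
	return total
end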

I feel like many people coming from other Lua scenes will find it confusing that their carefully optimized code runs worse than some random stuff a child could write.

Also, while everyone says that next and similar patterns will run just as fast, nobody considers why they were chosen over the alternatives in the first place.
I had an enormous module for custom pathfinding for a high number of units; it would apply the same result to all of them using a slightly modified flow-field algorithm. It worked pretty well, but it was still slower than a frame, so I had to distribute the iterations over multiple frames to maintain an acceptable experience. Now that we have this, I have to rewrite the entire module, as there were A LOT of localized variables, custom hashtables (to avoid using next), and so on.

This has been stated many times already, but OK, here we go again. You don't need to rewrite code.

If you have code that runs slower than it does in the old VM, you can report this as a bug. If you have code that doesn't run as fast as you'd like, we can look at what the performance problems are. If your code has a performance problem with for loops due to the use of next or a localized pairs, we can implement an optimization for this. As stated before, none of the code we've looked at so far does either of these.
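
For clarity, these are the two loop shapes being referred to (t here is just a placeholder table):

local t = { a = 1, b = 2, c = 3 } -- placeholder table

-- Iterating with next directly instead of pairs():
for k, v in next, t do
	print(k, v)
end

-- Iterating through a localized pairs:
local pairs = pairs
for k, v in pairs(t) do
	print(k, v)
end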

8 Likes

Do the places have to be “popular”, or not?

If not, I'd love for my development universe to have this so I can make sure it works for my upcoming game as early as possible!
Here is the Universe ID: 1012447630

I’m waiting to be added to the beta program so I can test this on a soon-to-be-released game.
I’m extremely excited to try this out. Is it cool if I post a load of results from some of my heavy game scripts here when I can, measured via tick()?
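
If it helps, something like this tiny wrapper (benchmark is just a placeholder name) is all I'd be using, in the same style as the OP's harness:

local function benchmark(label, fn, ...)
	local start = tick()
	local result = fn(...)
	print(label, tick() - start)
	return result
end

-- e.g. benchmark("sha256 of 260 KB", sha256, string.rep(".", 26e4))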

I’m not sure if this is off-topic, but will bit32 be enabled at or around the same time as the new VM?

He means it’s up to you to compare your game’s performance between the old and new Lua VMs, see if there’s any sign of an obvious performance regression, and then report it here. They won’t look at your game themselves.

1 Like

You can index unassigned globals without an error:

print(a.a) -- nil, even though the global a was never assigned
a = nil
print(a.a) -- error

6 Likes

bit32 is independent and will be enabled once all mobile clients have updated to a version that supports it, so that it’s safe to use.

2 Likes

Thanks! That’s a bug, we’ll take a look.

1 Like

With the new VM coming, is there any chance of getting the bit operators/metamethods added along with the library?

What metamethods are you talking about? As for bit operators, the bit32 library should be sufficient, I think.

These metamethods from 5.3. It would only make sense to add them if the operators were added, though.

That would require modifying the bit32 library: Lua 5.3 has bitwise operators, so the metamethods aren’t invoked through functions. bit32 was imported from Lua 5.2, which has no bitwise metamethods, so its functions would have to be modified to call them.

Though it shouldn’t be hard, as we already have tostring() calling the __tostring method.
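
Purely as an illustration (not actual engine code), a metamethod-aware wrapper could look something like this:

local rawband = bit32.band

local function band(a, b)
	-- Fast path: plain numbers go straight to bit32.
	if type(a) == "number" and type(b) == "number" then
		return rawband(a, b)
	end
	
	-- Otherwise fall back to a __band metamethod, in the same spirit as
	-- tostring() consulting __tostring.
	local mt = getmetatable(a) or getmetatable(b)
	local handler = mt and mt.__band
	if handler then
		return handler(a, b)
	end
	
	error("attempt to perform a bitwise operation on a value without a __band metamethod")
end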

Ok - gotcha - that’s Lua 5.3. Our current plans don’t include adding bitwise operators or first-class integers.

Do you know if there will ever be a change from 5.1 to 5.2/5.3?

Adding onto this: I’m sure that with the new parser implemented by Roblox, it would be easier to implement Lua 5.3 syntactic features, and that could be a nice goal even if not in the near future.

Implementing bitwise operators (with or without the integer subtype), goto, and maybe even string hex escapes could introduce some nice shortcuts for operations that are otherwise harder to write as function calls or boilerplate. I can also see bit operations getting dedicated opcodes in the new VM, making them blazing fast.
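
As a rough illustration of that boilerplate (the variable names are arbitrary, and the 5.3 forms are shown in comments since they don't parse under the current VM):

local flags  = bit32.bor(0x01, 0x04)   -- 5.3: local flags  = 0x01 | 0x04
local masked = bit32.band(flags, 0x06) -- 5.3: local masked = flags & 0x06
local lf     = string.char(0x0A)       -- 5.3: local lf     = "\x0A"

print(flags, masked, lf == "\n") --> 5  4  true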

Empty statements would also be one of those improvements without much practical use, but they’d still be nice to have.

1 Like