Luau Progress Recap

A few months ago, we released our new Lua implementation, Luau (Faster Lua VM Released), and made it the default for most platforms and configurations. Since then we’ve shipped many smaller changes that improved performance and expanded the usability of the VM. Many of them have been noted in release notes but some haven’t, so here’s a recap of everything that has happened in Lua land since September!

Debugger beta

When we launched the new VM, we did it without full debugger support. The reason for this is that the new VM is substantially different and the old implementation of the debugger (which relied on line hooks) just doesn’t work.

We had to rebuild the low level implementation of the debugger from scratch - this is a tricky problem and it took time! We are excited to share a beta preview of this with you today.

To use this, simply make sure that you’re enrolled in the new Lua VM beta.

After this you can use the debugger as usual. If you see any bugs, please feel free to report them!

Performance improvements

  • The for loop optimization that specializes pairs/ipairs now also works for localized versions of these globals, and for loops written with next (for k, v in next, t)
  • a^k expressions are now faster for some trivial values of k such as 2 and 0.5
  • Calling methods and accessing properties on deeply nested Roblox objects is now significantly faster than it used to be (~2x faster for objects that have an 8-deep nesting) - the cost is now independent of the hierarchy depth.
  • Accessing .X/.Y/.Z properties on Vector types is now ~30% faster
  • On Windows and Xbox, we’ve tuned our interpreter to be ~5-6% faster on Lua-intensive code
  • For a set of builtin functions, the VM can now call them very quickly via a new fastcall mechanism.

Fastcall requires the function call to be present in source as a global or localized global access (e.g. either math.max(x, 1) or max(x, 1) where local max = math.max). This can be substantially faster than normal calls; for example, it makes our SHA256 benchmark ~1.7x faster. We are currently optimizing calls to the bit32 and math libraries, and additionally assert and type. Also, just like other global-based optimizations, this one is disabled if you use getfenv/setfenv.
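A minimal sketch of the two call shapes described above. The helper names clampDirect and clampLocalized are made up for illustration, and whether a particular call actually takes the fastcall path is decided inside the VM; nothing in the script can observe it.

```lua
-- Shape 1: the builtin is reached through its global library table.
local function clampDirect(x, lo, hi)
	return math.max(lo, math.min(x, hi))
end

-- Shape 2: the builtin is localized first, then called through the local.
local max, min = math.max, math.min
local function clampLocalized(x, lo, hi)
	return max(lo, min(x, hi))
end

print(clampDirect(15, 0, 10), clampLocalized(-3, 0, 10)) -- 10 and 0
```

Both forms return the same results; the difference, if any, is only in how quickly the call is dispatched.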

Lua library extensions

We’ve implemented most library features available in later versions of upstream Lua, including:

  • table.pack and table.unpack from Lua 5.2 (the latter is the same as the global unpack, the former helps by storing the true argument count in the .n field)
  • table.move from Lua 5.3 (useful for copying data between arrays)
  • coroutine.isyieldable from Lua 5.3
  • math.log now accepts a second optional argument (as seen in Lua 5.2) for the logarithm base
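A short sketch exercising the functions listed above. The variable names are arbitrary, and the snippet sticks to behavior that upstream Lua 5.3 documents for these functions.

```lua
-- table.pack keeps the true argument count in .n, even with nil holes.
local packed = table.pack(1, nil, 3, nil)
print(packed.n) -- 4

-- table.unpack expands an array back into multiple values.
local a, b, c = table.unpack({ 10, 20, 30 })

-- table.move(src, first, last, destIndex, dest) copies a range between arrays
-- and returns the destination table.
local src = { "a", "b", "c", "d" }
local dst = table.move(src, 2, 4, 1, {}) -- { "b", "c", "d" }

-- math.log with an explicit base (approximately 10 here).
local bits = math.log(1024, 2)

-- coroutine.isyieldable reports whether the running code is allowed to yield.
local inside = coroutine.wrap(function()
	return coroutine.isyieldable()
end)()
print(inside) -- true inside a coroutine
```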

We’ve also introduced two new functions in the table library:

  • table.create(count, value) can create an array-like table quickly
  • table.find(table, value [, init]) can quickly find the numeric index of the element in the table

Autocomplete support for table.create/table.find will ship next week
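A rough sketch of how the two new functions behave. The fallbacks at the top are an addition for illustration only, so the snippet also runs on stock Lua where table.create and table.find do not exist; in Luau the builtins are used and the fallbacks are never reached.

```lua
-- Fallbacks for stock Lua only; Luau picks up the real builtins.
local create = table.create or function(count, value)
	local t = {}
	for i = 1, count do
		t[i] = value
	end
	return t
end
local find = table.find or function(t, value, init)
	for i = init or 1, #t do
		if t[i] == value then
			return i
		end
	end
	return nil
end

local zeros = create(5, 0)            -- { 0, 0, 0, 0, 0 }
local names = { "ann", "bob", "cat", "bob" }
local first = find(names, "bob")      -- 2
local second = find(names, "bob", 3)  -- 4: the search starts at index 3
local missing = find(names, "dan")    -- nil: not present
```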

Lua syntax extensions

We’ve started taking a look at improving the Lua syntax. To that end, we’ve incorporated a few changes from later versions of Lua into the literal syntax:

  • String literals now support \z (skip whitespace), \x (hexadecimal byte) and \u (Unicode codepoint) escape sequences

and implemented a few extra changes:

  • Number literals now support binary literals, e.g. 0b010101
  • Number literals now support underscores anywhere in the literal for easier digit grouping, e.g. 1_000_000

Note that the literal extensions aren’t currently supported in syntax highlighter in Studio but this will be amended soon.
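A small sketch of the string escapes mentioned above. The binary and underscore number literals (0b010101, 1_000_000) are left out on purpose, since they are Luau-only syntax and this snippet is meant to parse in upstream Lua 5.3+ as well.

```lua
-- \x takes exactly two hexadecimal digits and yields one byte.
local hex = "\x48\x69"    -- "Hi"

-- \u{...} takes a Unicode code point and yields its UTF-8 encoding.
local euro = "\u{20AC}"   -- three bytes: E2 82 AC

-- \z skips the whitespace that follows it, including line breaks,
-- so a long literal can be wrapped without embedding the indentation.
local wrapped = "one \z
                 two"     -- "one two"

print(hex, #euro, wrapped)
```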

Error messages

Error messages are slowly getting a bit of love. We’ve improved some runtime errors to be nicer, in particular:

  • When an indexing operation fails, we now specify the key name or type, e.g. “attempt to index foo with ‘Health’”
  • When an arithmetic operation fails, we now specify the type of arithmetic operation, e.g. “attempt to perform arithmetic (add) on table and number”
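One way to see the messages your own runtime produces is to trigger both kinds of failure under pcall. This is only a sketch: the describe helper is hypothetical, and the exact wording printed depends on the VM and its version.

```lua
-- Hypothetical helper: runs fn and returns the success flag plus the message.
local function describe(fn)
	local ok, message = pcall(fn)
	return ok, tostring(message)
end

local foo = nil
local okIndex, indexMessage = describe(function()
	return foo.Health -- indexing nil fails
end)

local okArith, arithMessage = describe(function()
	return {} + 1 -- adding a table and a number fails
end)

print(indexMessage)
print(arithMessage)
```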

We’ve also improved some parse errors to look nicer by providing extra context - for example, if you forget parentheses after the function name in a function declaration, we will now say Expected '(' when parsing function, got 'local'.

We are looking into some reports of misplaced line numbers on errors in multi-line expressions but this will only ship later.

Correctness fixes

There are always a few corner cases that we miss - a new Lua implementation is by necessity subtly different in a few places. Our goal is to find and correct as many of these issues as possible. In particular, we’ve:

  • Fixed some cases where we wouldn’t preserve negative zero (-0)
  • Fixed cases where getfenv(0) wouldn’t properly deoptimize access to builtin globals
  • Fixed cases where calling a function with >255 parameters would overflow the stack
  • Fixed errors with very very very long scripts and control flow around large blocks (thousands of lines of code in a single if/for statement)
  • Fixed cases where in Studio on Windows, constant-time comparisons with NaNs didn’t behave properly (0/0==1)
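For reference, a sketch of the IEEE 754 behavior that the negative zero and NaN fixes above are about. It is a generic check, not the actual test case behind those fixes, and the variable names are arbitrary.

```lua
local negativeZero = -0.0
-- -0 and 0 compare equal, so divide by the value to tell them apart.
local isNegative = (1 / negativeZero) == -math.huge -- true

local nan = 0 / 0
local nanEqualsOne = (nan == 1)    -- false
local nanEqualsSelf = (nan == nan) -- false: NaN is not equal to anything

print(isNegative, nanEqualsOne, nanEqualsSelf)
```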

Also, the upvalue limit in the new VM has been raised to 200 from 60; the limit in Lua 5.2 is 255 but we decided for now to match the local limit.

Script analysis

Along with the compiler and virtual machine, we’ve implemented a new linting framework on top of Luau which is similar to our old script analysis code but is richer. In particular, we support a few more checks that are enabled by default:

  • Unreachable code warning, for cases where a function provably doesn’t reach a specific point, such as a redundant return after a set of if/else statements where every branch returns or errors.
  • Unknown type warning, which was emitted before for Instance.new/GetService/IsA calls, is now also emitted when the result of type/typeof is compared to a string literal
  • We now recognize and flag mistaken attempts to iterate downwards with a for loop (such as for i=9,1 or for i=#t,1), as well as cases where a numeric for loop doesn’t reach the stated target (for i=1,4.5)
  • We now detect and flag cases where in assignment expressions variables are implicitly initialized with nil or values are dropped during assignment
  • “Statement spans multiple lines” warning now does not trigger on idiomatic constructs involving setting up locals in a block (local name do ... name = value ... end)
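To illustrate why the numeric for loop checks above are useful, this sketch counts how many times each loop shape actually runs. The countIterations helper is hypothetical; the lint itself looks at the literal forms such as for i=9,1, not at calls like these.

```lua
-- Hypothetical helper: counts the iterations of a numeric for loop.
local function countIterations(first, last, step)
	local count = 0
	for _ = first, last, step or 1 do
		count = count + 1
	end
	return count
end

local never = countIterations(9, 1)     -- 0: the default step is +1, so the body never runs
local down = countIterations(9, 1, -1)  -- 9: an explicit negative step counts down
local short = countIterations(1, 4.5)   -- 4: the loop stops at 4 and never reaches 4.5

print(never, down, short)
```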

We also have implemented a few more warnings for common style/correctness issues but they aren’t enabled yet - we’re looking into ways for us to enable them without too much user impact:

  • Local variables that shadow other local variables / global variables
  • Local variables that are assigned but never used
  • Implicit returns, where functions that explicitly return values in some codepaths can reach the end of the function and implicitly return no values (which is error-prone)
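A small example of the implicit return pattern from the last bullet above. The sign function is made up for illustration; it returns a value on two paths and silently returns nothing on the third.

```lua
local function sign(x)
	if x > 0 then
		return 1
	elseif x < 0 then
		return -1
	end
	-- x == 0 falls off the end here and implicitly returns no values
end

local positive = sign(5)                -- 1
local zeroCount = select("#", sign(0))  -- 0: no values were returned
local zero = sign(0)                    -- nil once adjusted to a single value

print(positive, zeroCount, zero)
```

A caller that expects a number from sign(0) gets nil instead, which is the kind of mistake this warning is meant to surface.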

Future plans

There are a fair number of performance improvements on the roadmap that we haven’t gotten to yet - this includes general VM optimizations (faster function calls, faster conditional checks, faster error handling including pcall) and some library optimizations (in particular, Vector3 math performance improvements). And we’re starting to look into some exciting ways for us to make performance even better in the future.

Also we’re still working on the type system! It’s starting to take shape and we should have something ready for you by the end of the year, but you’ll learn about it in a separate post :smiley:

As always don’t hesitate to reach out if you have any issues or have any suggestions for improvements.

154 Likes

Does this include typeof or is it exclusively type? I’m not sure if it makes sense to do it for typeof (it may not be much faster) but that’s bound to see more use in Roblox code.

If we were to want to pitch additional fastcall optimizations, would we make a feature request for that, or would we post here?

3 Likes

Love the updates with Luau, glad that new methods are coming out to make things easier (table.find has me really hyped!!)

I do have one question: What is the advantage to using table.create over directly creating an array (local a = {1,2,3,...})? I don’t see any differences between the two. Amazing updates overall!

3 Likes

omg YES! Finally, I don’t need to loop through tables to find if a value exists, I’ve always found those types of loops quite annoying and unnecessary, and this takes it one step further, returning the index. Big thumbs up! :+1:

10 Likes

It’s most likely more performant to use table.create, and the code is also much shorter. Example:

-- Previously you'd do this
local tbl = {} -- Empty table
for _ = 1, 100 do -- _ marks the loop variable as unused; loops 100 times
	table.insert(tbl, 0) -- Fill table with zeros
end

-- Now you can do this
local tbl = table.create(100, 0) -- This is done in C/C++ so it's almost guaranteed to be a bit faster

Speaking of table.create, if you call table.create(n, varThatIsNil) this will just be an empty table. Internally would anything happen if a second argument is not supplied or is nil? Or would an error be thrown?

Edit: A quick test shows that something does happen. If you call table.create(100000) or table.create(100000, nil), for example, Studio will freeze up, even though creating a table with nil values is unnecessary since the nil values will be ignored in all cases. It might make sense to return an empty table if the value argument is nil and save some resources, even if it’s rare.

4 Likes

One of the use cases is having the table created at a specific size, to avoid reallocations. This is often useful when the elements won’t all be the same.

local t = table.create(1000)
for i=1,1000 do
    t[i] = i
end

Filling it in with nil would be fine in this case.

6 Likes

When I saw the new script analysis warnings, I thought I would have lots of warnings in my game’s large code base… Turns out I only had two!

In the future, I would like to know where I can disable some of these warnings, and I kind of would prefer some of them to be off by default, or have varied settings per place (though I’ll probably keep them on for most projects).

One of the most annoying opinionated warnings for me is the “This statement spans multiple lines; use indentation to silence”, because it can be overbearing in many scenarios. Overall, and with the addition of static type safety, these tools can be powerful, but I hope to see more granular user control coming along with it.

4 Likes

I should mention that we fixed a case where this warning was triggering on idiomatic constructs that involved local declaration e.g.

local foo do
	...
	foo = value
	...
end

In general this warning exists because in some types of code, an expression that overflows to the next line without indentation is naturally read as two separate expressions (because Lua doesn’t enforce statement terminators). This can lead (and has led) to confusing bugs.

3 Likes

Just type - not typeof for now. This thread probably is a reasonable place for requests like this - normally a separate thread would be better but it’s a bit easier to track replies to the thread in this case.

There’s a very specific reason for us to not do this. Consider code like this:

local t = {}
for i = 1, 10000 do
	t[i] = i
end

The reason why table.create supports nil is because this code can be rewritten like this for a solid performance win:

local t = table.create(10000)
for i = 1, 10000 do
	t[i] = i
end

In this case table.create preallocates storage for the table without changing its length to make subsequent table assignments faster. Obviously if you don’t know what the size of the table is going to be, preallocation isn’t effective.

8 Likes

What are the specifics on this? Is it based on common use, or on a certain range of exponents?

This is based on commonly used exponents, not a range.

1 Like

Wow, amazing work!

Just one question, will we get a table.map() function to apply a function to every value in the table? Or should we just use loops for now?

2 Likes

Is there any news you can share on multi-threaded Lua execution? Would it allow us to use 100% of the CPU power through Lua?

13 Likes

Alrighty. Well in that case, is there a chance we could get these fastcall optimizations for typeof too? :stuck_out_tongue: I prefer to use it in Roblox for basically the same reason you’d use type (type checking arguments, primarily) so having it be faster would be a plus.

I would also like that sort of optimization on string.byte because it ends up in stuff like hashing or base conversion a bunch. There may not be common enough use to justify it though.

Thank you for clearing that up! That does make a lot of sense. The above reply also made me think… Are string functions already using this optimization? Some of my code does tons of string processing (gsub, sub, gmatch, etc). Depending on the size of the content I’m processing I could be making hundreds to hundreds of thousands of string calls.

I wrote a basic compression algorithm which does this and I have had 10-15kb strings that get fed through it which is 10-15k iterations and many more calls.

We had a version of string.byte using this that accelerates our Base64 benchmark, but string.byte is slightly tricky so we haven’t completed the implementation to be production-ready yet.

Please share the code for this with us if you can (PM works) - we can then add this to our benchmark suite, so that we can focus on updates that clearly improve it. It’s hard to tell without testing whether a specific optimization is impactful or not, and we want to carefully expand the builtins because for them to be fast, they by necessity have to replicate the original function’s logic so we need to be careful to keep the behavior exactly identical.

Out of curiosity, do you use typeof as a blanket replacement for type and use it for primitive types a lot, or do you mostly use type for primitive types and typeof for Roblox types? Code examples where you often use typeof would be appreciated - I think our typeof coverage in our benchmark suite is lacking. type specifically improved our Roact benchmark by like 5% because it had a number of type assertions.

Not ready to talk about this yet, sorry - we have not started the implementation. The goal for this project is indeed to allow Lua to use ~100% of CPU power, but there are going to be caveats and this is going to ship late 2020.

8 Likes

We don’t have plans to implement table.map. Our general policy for new Lua functions is:

  • A function gets implemented if it’s impossible to implement yourself in Lua or the implementation is really involved
  • A function gets implemented if it’s very often used and everybody ends up reimplementing it
  • A function gets implemented if it gives non-trivial performance benefits over a reimplementation in Lua

Generally two of these should be true for a function to be added.

The reason why we implemented table.create is a combination of 1 and 3. You cannot implement an efficient replacement for this function in some cases, and in some cases you can but the implementation is crazy (I think @Tomarty has one?)

The reason why we implemented table.find is a combination of 2 and 3. It’s often used so it makes sense to have it in the library, and it’s 2-3 times faster than a Lua implementation.

table.map doesn’t really fit these right now:

  • It’s easy to implement in Lua
  • It’s not a commonly used mechanism in typical Roblox code (I recognize that some people are familiar with functional programming constructs, but the scope is just different vs table.find et al)
  • It’s not going to be faster if we implement it in C
  • Moreover, it is likely to be slower because every function call will go through C->Lua boundary
  • Every time you call this function, you’d have to allocate a closure for the transform function. So we would be inviting inefficient code.

Coincidentally, we plan to implement some closure allocation optimizations that may make the last point a non-issue in the future, but the design hasn’t been finalized - as usual, there are some odd interactions with getfenv/setfenv (aka my worst enemy).

So yeah, please use for loops for now.
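For anyone who still wants the helper, a minimal sketch of what “easy to implement in Lua” can look like. The map function below is hypothetical and not part of the table library; the inline loop after it does the same job without allocating a closure for the transform.

```lua
-- Hypothetical helper: applies transform to every array element.
local function map(list, transform)
	local result = {}
	for i, value in ipairs(list) do
		result[i] = transform(value)
	end
	return result
end

local doubled = map({ 1, 2, 3 }, function(n)
	return n * 2
end) -- { 2, 4, 6 }

-- The same kind of transformation written inline, with no closure involved.
local squares = {}
for i, n in ipairs({ 1, 2, 3 }) do
	squares[i] = n * n
end -- { 1, 4, 9 }
```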

3 Likes