Improving Binary and Hexadecimal Integer Literal Parsing Rules in Luau

WheretIB · August 19, 2022, 5:31pm

Hello Developers!

In the coming days we will begin rolling out improvements to the parsing rules for binary and hexadecimal integer literals. In some cases this could affect your experience. To help with this, the rollout involves multiple stages of in-Studio warnings as well as outreach to affected developers.

Please take a moment to read over these changes and feel free to ask questions or provide feedback.

Overview:

The logic used to parse binary and hexadecimal integer literals meant that some literals did not produce the value the developer intended. Literals could be written in a way that doesn’t produce the number you might expect and, in some cases, in a syntax that is invalid in the original Lua language.

With that in mind, here are three issues that are being addressed.

Silent Hexadecimal Integer Literal Overflow

Luau parses hexadecimal and binary literals as 64-bit integers before converting them to Luau numbers.

As a result, literals that are too long to fit in a 64-bit integer are silently truncated to 2^64, which can result in unexpected program behavior.

Going forward, there will be a linter warning in Script Analysis to notify you about places where such truncation is happening. For example:


-- Hexadecimal number literal exceeded available precision and has been truncated to 2^64
local x = 0x11111111111111111111111111111111111AA
-- The line above results in the same number as this:
local x = 0xffffffffffffffff

For some context, Lua 5.1 would have parsed this as 2.3787461545099e+43 or 4294967295 (0xffffffff), depending on the C compiler version. In Lua 5.3, only the last 16 hexadecimal digits are taken into account, which results in 1229782938247303594 (0x11111111111111AA).
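
To make the difference concrete, here is a small Python model of the two behaviours described above. Python is used only because its integers are arbitrary-precision; this is not Luau's actual parser. The helper names `luau_style_parse` and `lua53_style_parse` are hypothetical, and the clamping step is an assumption based on this post: the literal saturates at the largest 64-bit unsigned value and is then converted to a double.

```python
U64_MAX = 2**64 - 1  # largest value a 64-bit unsigned integer can hold


def luau_style_parse(literal: str) -> float:
    """Model (assumed): clamp to 64 bits, then convert to a double."""
    value = int(literal, 16)        # exact, arbitrary-precision value
    clamped = min(value, U64_MAX)   # saturate instead of keeping extra digits
    return float(clamped)           # Luau numbers are doubles


def lua53_style_parse(literal: str) -> int:
    """Model: keep only the low 64 bits, i.e. the last 16 hex digits."""
    return int(literal, 16) & U64_MAX


long_literal = "0x11111111111111111111111111111111111AA"

# The overly long literal and 0xffffffffffffffff end up as the same number.
print(luau_style_parse(long_literal) == luau_style_parse("0xffffffffffffffff"))  # True
# That number is 2^64, because 2^64-1 is not representable as a double.
print(luau_style_parse(long_literal) == 2.0**64)  # True
# Keeping only the last 16 digits gives the Lua 5.3 result quoted above.
print(lua53_style_parse(long_literal) == 0x11111111111111AA)  # True
```

Note that the Lua 5.3 model ignores signedness, which does not matter for this particular literal because its low 64 bits form a positive value.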

Looking ahead, we can revisit how parsing of large integers is performed and whether there is a need to implement hex floats. For now, it’s important for us to report code that does not behave as might be expected.

Silent binary integer literal overflow

The same issue happens with binary integer literals and a new linter warning will be generated as well. For example:


-- Binary number literal exceeded available precision and has been truncated to 2^64
local x = 0b11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
-- The line above results in the same number as this:
local x = 0b1111111111111111111111111111111111111111111111111111111111111111

Double ‘0x’ prefix

It was possible to make a mistake and write ‘0x’ two times in front of a hexadecimal integer literal:


-- Hexadecimal number literal has a double prefix, which will fail to parse in the future; remove the extra 0x to fix
local a = 0x0x20
-- Same as
local a = 0x20
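
If you want to scan a codebase for this mistake ahead of the rollout, a rough text search is one option. The Python sketch below is only an illustration: `find_double_prefixes` and its pattern are hypothetical, it also assumes an uppercase `0X` prefix should be caught, and because it works on raw text it will also match occurrences inside strings and comments.

```python
import re
from typing import List

# Hypothetical pattern: a hex prefix immediately followed by a second one.
DOUBLE_HEX_PREFIX = re.compile(r"\b0[xX]0[xX][0-9a-fA-F_]+")


def find_double_prefixes(source: str) -> List[str]:
    """Return every token in `source` that starts with two hex prefixes."""
    return DOUBLE_HEX_PREFIX.findall(source)


print(find_double_prefixes("local a = 0x0x20\nlocal b = 0x20"))  # ['0x0x20']
```

Script Analysis remains the authoritative check; a search like this is just a quick way to find candidates across many files.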

Rollout

For binary and hexadecimal integer literal overflow, we plan to enable the linter warnings this week and keep them enabled for the foreseeable future.

We will be reaching out to the developers of specific affected experiences so they can make the fix. Please keep an eye out for communications from us if you are seeing linter warnings.

Once we are sure that experiences on the platform are fixed, we will revisit this and consider changing it to a parsing error.

For double hexadecimal integer literal prefix, we have a stricter plan:

Starting this week, Studio will generate a lint warning for double hexadecimal integer literal prefix. We will reach out to developers that have been affected by this.

On September 19, 2022, this will become a hard parsing error in Studio, and scripts containing the error will not run. Then, on October 19, 2022, this will become a hard parsing error on Client and Server as well.

Please let us know if you have any questions or concerns.

Thanks!

EDIT: we have relaxed the timeline to give developers more time to apply required changes.

Xan_TheDragon · August 19, 2022, 5:44pm

Is it intentional that the warning displays for values greater than 2^64? I ask because Roblox cannot accurately represent integers above 2^53, and as such I would think that this is where warnings should start to arise.

zeuxcg · August 19, 2022, 5:59pm

We have considered also warning for integers that aren’t exactly representable even if they are below 2^64 - we may do this in the future, but for now we just flag the specific issue with integers that are too long for the parser to handle.
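
For illustration, a check for "exactly representable" could look like the Python sketch below. The helper name `is_exactly_representable` is hypothetical and this is not how the Luau linter is implemented; it simply tests whether an integer survives a round trip through a double.

```python
def is_exactly_representable(n: int) -> bool:
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


print(is_exactly_representable(2**53))      # True
print(is_exactly_representable(2**53 + 1))  # False: rounds to 2^53
print(is_exactly_representable(2**63))      # True: powers of two are exact
print(is_exactly_representable(2**64 - 1))  # False: rounds to 2^64
```

So a literal can fit in 64 bits, and therefore avoid the current warning, while still not being stored exactly.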

metatablecatmaid · August 19, 2022, 7:37pm

I’m curious why Luau isn’t actually capable of 64-bit integers (signed or unsigned), considering a lot of APIs rely on the int64 type.

Kampfkarren · August 19, 2022, 9:07pm

It’s important to highlight integer support and bitwise operators. For Luau, it’s rare that a full 64-bit integer type is necessary - double-precision types support integers up to 2^53 (in Lua, which is used in embedded space, integers may be more appealing in environments without a native 64-bit FPU). However, there’s a lot of value in having a single number type, both from a performance perspective and for consistency. Notably, Lua doesn’t handle integer overflow properly, so using integers also carries compatibility implications.

Ksuose · August 20, 2022, 2:00pm

Glad that ROBLOX is finally taking steps on improving Luau. Unfortunate that I’m fully letting go of developing soon, it’s been great, keep 'em coming!

ffrostfall · August 20, 2022, 2:17pm

Roblox has been taking steps to improve Luau for a long time, I’m not sure what you mean?

blobbyblob · September 12, 2022, 1:03pm

Would you mind tweaking this language in the lint warning to use “2^64-1” instead of “2^64”? I ran into this today where I was trying to represent 2^64, and got this warning. It seemed silly, since that was the number I was trying to represent and therefore, it wouldn’t be an issue if it got truncated to that value.

zeuxcg · September 12, 2022, 4:06pm

If you need 2^64, does anything prevent you from typing 2^64? 2^64-1 isn’t representable by double-precision floating point numbers, so the overly long number literals are indeed truncated to 2^64 - the lint warning text is correct.

blobbyblob · September 12, 2022, 5:17pm

Yeah, I’m not concerned about changing the parser to allow representing 2^64 as 0x100…00, I just felt the error message should be more precise. The value wasn’t truncated to 2^64, it was truncated to 2^64-1.

Edit: and I realize the above code has some issues because doubles only have ~53 bits of precision, I just wrote it as an example & don’t want that to detract from my point.

zeuxcg · September 16, 2022, 4:09am

No, both 0x1_0000_0000_0000_0000 and 0xFFFF_FFFF_FFFF_FFFF are interpreted as 2^64:

print(0x1_0000_0000_0000_0000 == 0xFFFF_FFFF_FFFF_FFFF)
-- output: true
print(0x1_0000_0000_0000_0000 == 2^64)
-- output: true
print(math.frexp(0x1_0000_0000_0000_0000))
-- output: 0.5 65

blobbyblob · September 16, 2022, 11:57am

Oh right, of course - forgot that after being interpreted as an unsigned 64-bit number, it gets converted to a double and would be rounded to fit in 53 bits of precision.

It still feels wrong to have the linter say to me, “sorry, you can’t express this number – we’ll interpret it as that same number”. My first response is “OK, so there’s no problem, why bother me with this?” I really just want that lint warning to be more clear as to what it’s trying to defend against so I don’t have to go on a hunt to figure out what it really means.

Maybe be clear that there should be only 16 digits in a hex representation?

zeuxcg · September 16, 2022, 5:29pm

Ah, I see your point - the lint is really targeting the more generic problem though, e.g. 0xabcd_1234_5678_abcd_1234. Specifically for 0x1_0000_0000_0000_0000 you’re correct in that this number is a) exactly representable as written, and b) unlikely to ever change the parsing behavior.

system · January 14, 2023, 5:30pm

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.