Hexadecimal and binary literals in Luau cannot exceed 2⁶⁴ (parsed as a 64-bit unsigned integer rather than an IEEE 754 double)

Blockzez · February 1, 2021, 9:08am

In some builds of Lua 5.1 (Lua 5.0 and earlier have no hexadecimal literals), a large hexadecimal literal is capped at 2⁶⁴ − 1 (or 2³² − 1, depending on the platform). If you type in a number like 0x10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, it returns 1.844674407371e+19 or 4294967295 (assuming numbers are IEEE 754 doubles), even though the literal equals 2¹⁰²⁴, which overflows to infinity in IEEE 754 double precision, the only numeric data type in Luau.
In Roblox's case, it returns 1.844674407371e+19. Technically the parsed integer is 18446744073709551615 (2⁶⁴ − 1), but an IEEE 754 double only has a 53-bit significand, so integers above 2⁵³ are rounded to the nearest representable value (ties to even), and the value actually stored is 18446744073709551616 (2⁶⁴).
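A minimal C check of that rounding, assuming IEEE 754 doubles and assuming Roblox formats numbers like Lua 5.1's default LUA_NUMBER_FMT ("%.14g"). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 2^64 - 1 needs 64 significant bits, but an IEEE 754 double has only 53,
   so the conversion rounds to the nearest representable double: 2^64. */
static double uint64_max_as_double(void) {
    return (double)UINT64_MAX;
}

/* Render d into buf with a printf-style format.
   "%.14g" is Lua 5.1's default LUA_NUMBER_FMT; "%.0f" shows the exact
   integer value stored in the double. */
static void format_number(char *buf, size_t size, const char *fmt, double d) {
    assert(buf != NULL && fmt != NULL && size > 0);
    snprintf(buf, size, fmt, d);
}
```

With these helpers, uint64_max_as_double() compares equal to 18446744073709551616.0, and formatting it with "%.0f" and "%.14g" gives 18446744073709551616 and 1.844674407371e+19 respectively on a typical glibc system.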

Though I don't know what causes this bug, I suspect it comes from the C compiler: C89 doesn't support hexadecimal floats while C99 does, and Lua's source code is C89-compliant. I suspect Roblox compiled Luau with some sort of C89-compliant C compiler, or that the literal is parsed as a 64-bit unsigned integer first, since hexadecimal floats were only introduced in C99. Either way, I'm certain that hexadecimal literals are parsed as 32/64-bit unsigned integers in some implementations of Lua 5.1.

This bug is certainly fixed in Lua 5.2, along with the introduction of hexadecimal float literals and hexadecimal exponents (e.g. 0x1p4).

Right now, the bug can be found both in Studio and in-game. To reproduce it, run the following code anywhere in Roblox. It doesn't matter whether it's a server script or a client script; both return exactly the same result.

```lua
local v = 0x10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

if v == 4294967295 then
	print("This bug is not patched - 32-bit unsigned integer")
elseif v == 2 ^ 64 - 1 then
	print("This bug is not patched - 64-bit unsigned integer")
elseif v == math.huge then
	print("This bug has been patched on the version of Lua used")
else
	print(string.format("This bug is not patched or it's interpreted as a true integer - This literal is parsed either as a true integer, as an integer that either clamps at %.0f or as floating point format more precise than IEEE 754 double precision", v))
end
```

This doesn't apply to hexadecimal strings passed to tonumber in Roblox's Luau, so in Luau the two values below are not equal, even though they're logically the same value:

print(0x10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 ~= tonumber("0x10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"))

In PUC-Rio Lua 5.1, however, passing this hexadecimal string to tonumber is still capped at 2³² − 1 or 2⁶⁴ − 1.

To conclude, this is a bug that only affects some platforms of Lua 5.1 (I'd guess those compiled with C89). It only persists in Lua implementations where hexadecimal floats and exponents aren't supported, which (coincidentally) is why tonumber in Luau doesn't have this bug. Roblox's Luau handling of hexadecimal literals is one of them: if you type `print(0x10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)` into a script or the command bar, it incorrectly parses as 1.844674407371e+19 rather than inf. This also applies to binary literals.

Autterfly · February 1, 2021, 1:05pm

It doesn’t matter which C compiler or libraries are being used because Lua implements its number decoding in its own lexer. The reason why it’s different between tonumber and literals is probably because they use 2 different pieces of code.

Here’s the lexer’s read_numeral implementation in both 5.1 and 5.2:
https://www.lua.org/source/5.1/llex.c.html#read_numeral
https://www.lua.org/source/5.2/llex.c.html#read_numeral

In the earlier version, luaO_str2d is called, which just checks whether the prefix is x or X. It then explicitly uses C's unsigned long converter (strtoul) and casts the result to a Lua number. This is probably on purpose, as it wasn't a "feature" they would have had much use for.
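The strtoul path can be sketched in isolation. This is not Lua's code, only a hypothetical helper (hex_via_strtoul) assuming lua_Number is double; it shows why the cap depends on the width of unsigned long (32 bits on Windows, 64 bits on most 64-bit Unix systems):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse s as a base-16 unsigned long (strtoul accepts an optional 0x/0X
   prefix in base 16) and cast the result to a double, like Lua 5.1's
   fallback. On overflow strtoul returns ULONG_MAX and sets errno to
   ERANGE, so every too-large literal collapses to the same capped value. */
static double hex_via_strtoul(const char *s, int *overflowed) {
    assert(s != NULL);
    errno = 0;
    unsigned long v = strtoul(s, NULL, 16);
    if (overflowed != NULL)
        *overflowed = (errno == ERANGE);
    return (double)v;
}
```

Feeding it the 2¹⁰²⁴ literal from the first post reports an overflow and returns (double)ULONG_MAX, while small values such as 0xff come back exact.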

In the later versions, an alternative in-house converter with hexadecimal float support is used instead of the standard one.

Blockzez · February 1, 2021, 1:58pm

I don't know how I missed luaO_str2d.
Checking the source, I believe it runs lua_str2number first, which, by the looks of it, is defined as strtod here: Lua 5.1.5 source code - luaconf.h.
In C99, strtod accepts hexadecimal floats and exponents; in C89 it doesn't. So I suspect lua_str2number can accept 0x12p34 as a number when built against a C99-compliant C library: with a C89 strtod, only the 0 is parsed and *endptr points at the x, while with a C99 strtod the whole of 0x12p34 is parsed and *endptr points at the terminating \0. luaO_str2d then checks whether *endptr is x or X, and only in that case falls back to C's unsigned long conversion.
According to this reference, endptr is:

Reference to an already allocated object of type char* , whose value is set by the function to the next character in str after the numerical value.

So if *endptr (the character right after the parsed number) is x or X, luaO_str2d re-parses the string with C's strtoul (unsigned long).

```c
int luaO_str2d (const char *s, lua_Number *result) {
  char *endptr;
  *result = lua_str2number(s, &endptr);
  if (endptr == s) return 0;  /* conversion failed */
  if (*endptr == 'x' || *endptr == 'X')  /* maybe an hexadecimal constant? */
    *result = cast_num(strtoul(s, &endptr, 16));
  if (*endptr == '\0') return 1;  /* most common case */
  while (isspace(cast(unsigned char, *endptr))) endptr++;
  if (*endptr != '\0') return 0;  /* invalid trailing characters? */
  return 1;
}
```
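To see which branch a given C library takes, here is the same logic lifted out of Lua with its macros expanded, assuming lua_str2number is strtod and lua_Number is double (the Lua 5.1 defaults). str2d_like_lua51 is a hypothetical name:

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>    /* isinf(), for checking the overflow case */
#include <stdlib.h>

/* luaO_str2d from Lua 5.1 with cast_num/cast expanded and
   lua_str2number replaced by strtod. Returns 1 and stores the number in
   *result on success, 0 on failure. */
static int str2d_like_lua51(const char *s, double *result) {
    char *endptr;
    assert(s != NULL && result != NULL);
    *result = strtod(s, &endptr);
    if (endptr == s) return 0;                  /* conversion failed */
    if (*endptr == 'x' || *endptr == 'X')       /* strtod stopped at 'x' (C89-style) */
        *result = (double)strtoul(s, &endptr, 16);
    if (*endptr == '\0') return 1;              /* most common case */
    while (isspace((unsigned char)*endptr)) endptr++;
    if (*endptr != '\0') return 0;              /* invalid trailing characters */
    return 1;
}
```

On a C99-conforming C library such as glibc, strtod consumes "0x12p34" entirely (18 × 2³⁴ = 309237645312), and the 2¹⁰²⁴ literal from the first post becomes inf, so the strtoul fallback is never reached. Only a C library whose strtod stops at the x would hit the cap.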

Autterfly · February 1, 2021, 2:13pm

My bad, I also completely missed lua_str2number being strtod. That’s an interesting side effect for Lua to have; I would guess then that the reason 5.2 includes its own hex-float converter is to avoid this situation where behavior differs by compiler.

zeuxcg · February 1, 2021, 5:45pm

In Luau, both 0b and 0x formats are intentionally parsed as integers, not floats, so this is not a bug. In both cases the amount of precision Luau guarantees in the input is 64 bits, which is beyond what an IEEE 754 double can hold exactly, so there should never be a case where you need to use a larger number.