JSONEncode inflates any float number to 33 bytes

rogeriodec_games · October 26, 2021, 12:56am

Reproduction Steps

Run this script:

local HttpService = game:GetService("HttpService")
local tab = {
	["$5"] = {
		["1"] = 15,
		["2"] = -0.098,
		["3"] = -330
	}
}

print('Table: ', tab)
print('JSON: ', HttpService:JSONEncode(tab))

It will print:

  Table:   ▼  {
    ["$5"] =  ▼  {
       ["1"] = 15,
       ["2"] = -0.098,
       ["3"] = -330
    }
  }
  JSON:  {"$5":{"2":-0.0980000000000000037747582837255,"3":-330,"1":15}}

Expected Behavior
JSONEncode should encode the float compactly (for example as -0.098), not expand it to about 33 bytes.

Actual Behavior
This is a serious problem: the DataStore stores data in JSON format, so in this case float numbers grow absurdly in size.
For example:
A table of 1 million positive floats using only ONE DECIMAL PLACE (1.3, 0.5, 3.7, etc.) would normally take about 3 bytes per number, or roughly 3 million bytes in total.
Because of this bug, the same table takes about 33 bytes per number (11 times more!), for a total of about 33 million bytes (roughly 31 MiB). That makes it impossible to store in a DataStore, which is limited to 4 MB.
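Worked through, the estimate above looks like this (a rough sketch assuming about 3 characters per one-decimal float and 33 per inflated float, ignoring commas and brackets, with 1 MiB = 2^20 bytes):

```latex
\begin{aligned}
10^{6} \times 3\ \text{bytes}  &= 3 \times 10^{6}\ \text{bytes}   \approx 2.9\ \text{MiB} \\
10^{6} \times 33\ \text{bytes} &= 3.3 \times 10^{7}\ \text{bytes} \approx 31.5\ \text{MiB} \\
\tfrac{33}{3} &= 11
\end{aligned}
```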

Workaround
In this case, I am forced to convert each float number into a string, which takes up two additional bytes for the quotes.

While it’s possible to work around this bug, it is important to investigate it, as many developers are likely to hit a DataStore size limit for no good reason.

Issue Area: Engine
Issue Type: Other
Impact: Moderate
Frequency: Constantly

anon81993163 · October 29, 2021, 7:24am

It is impossible to fix floating-point errors; they keep coming back anyway, because binary floating point has a limit on how precisely it can represent a real number. You can only manage them.

rogeriodec_games · October 29, 2021, 8:11pm

I have to disagree.
Since HttpService:JSONEncode is a function developed in-house in C, it could just do something like:

“If the converted final float value is different from the original float value (the one passed to JSONEncode), use the original value instead of the converted one”
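A quick check in plain Lua (a sketch that only uses `tonumber`, not HttpService) suggests this comparison would never trigger: the long string from the repro already parses back to exactly the same double as -0.098. The encoded value is correct, just needlessly long, so a fix would be about choosing a shorter string rather than detecting a changed value.

```lua
-- The long form that JSONEncode produced in the repro
local long = "-0.0980000000000000037747582837255"
local short = "-0.098"

-- Both strings parse to the same IEEE 754 double
print(tonumber(long) == tonumber(short)) --> true
print(tonumber(long) == -0.098)          --> true
```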

mutex_lock · October 29, 2021, 10:04pm

local n = 0.1
print(tostring(n), game:GetService("HttpService"):JSONEncode(n))

Output: 0.1 0.100000000000000005551115123126
If it’s possible to print the number as a string without floating-point errors, it should also be possible to include it in a JSON string without floating-point errors.

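There is no point in saving the rounding error in JSON, because that level of precision is not even usable: a literal that differs from 0.1 only past the 17th significant digit still parses to exactly the same double, as this shows: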

print(string.format("%.55f",0.1000000000000000011111111))
> 0.1000000000000000055511151231257827021181583404541015625
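Both long outputs in this thread are consistent with the encoder printing about 30 significant digits, the way C's `%.30g` format does. That is an inference from the outputs, not something confirmed about Roblox's implementation. The sketch below reproduces the strings with `string.format`, which hands the format to the C library:

```lua
-- Hypothesis: JSONEncode formats numbers roughly like "%.30g"
print(string.format("%.30g", 0.1))    --> 0.100000000000000005551115123126
print(string.format("%.30g", -0.098)) --> -0.0980000000000000037747582837255

-- The 14-digit format that tostring used in Lua 5.1 hides the error
print(string.format("%.14g", 0.1))    --> 0.1
```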

Blockzez · October 29, 2021, 10:31pm

This has nothing to do with floating point errors.
This is a formatting issue.
V8 prints -0.098 as -0.098, and its numbers use the same internal representation as Luau numbers (IEEE 754 doubles), except in rare edge cases.

mutex_lock:

local n = 0.1
print(tostring(n), game:GetService("HttpService"):JSONEncode(n))
Output: 0.1 0.100000000000000005551115123126
If it’s possible to print the number as a string without floating-point errors, it should also be possible to include it in a JSON string without floating-point errors.

It doesn’t help that tostring rounds numbers to 14 significant digits, so values that need more than 14 significant digits, like 2⁵², don’t survive the round trip back through tonumber.
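To illustrate the 2⁵² case in plain Lua (a sketch; `"%.14g"` stands in for the 14-digit tostring behavior described above), 14 digits lose information while 17 significant digits are always enough to round-trip a finite IEEE 754 double:

```lua
local x = 2^52 -- 4503599627370496, which needs 16 significant digits

-- 14 significant digits: the value changes on the way back
local s14 = string.format("%.14g", x)
print(s14)                 --> 4.5035996273705e+15
print(tonumber(s14) == x)  --> false

-- 17 significant digits: exact round trip
local s17 = string.format("%.17g", x)
print(s17)                 --> 4503599627370496
print(tonumber(s17) == x)  --> true
```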

To OP:
My guess is that JSONEncode prints a long prefix of the exact decimal expansion of the binary float, rather than the shortest decimal representation that still converts back correctly, like the one V8 uses.
I don’t know how it converts numbers to strings internally, though.
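The "shortest representation that still converts back" idea can be approximated with a simple trick: try 15, 16 and 17 significant digits and keep the first string that parses back to exactly the same double. `formatRoundTrip` is a hypothetical helper, not part of HttpService, and not how V8 does it; dedicated algorithms such as Ryū or Grisu produce shortest output more efficiently.

```lua
-- Hypothetical helper: returns the first of "%.15g", "%.16g", "%.17g"
-- that parses back to the same double. 17 digits always suffice for
-- finite numbers; NaN and infinities fall through and return nil
-- (JSON cannot represent them anyway).
local function formatRoundTrip(x)
	for precision = 15, 17 do
		local s = string.format("%." .. precision .. "g", x)
		if tonumber(s) == x then
			return s
		end
	end
	return nil
end

print(formatRoundTrip(-0.098))    --> -0.098
print(formatRoundTrip(0.1 + 0.2)) --> 0.30000000000000004
print(formatRoundTrip(-330))      --> -330
```

Because `%g` drops trailing zeros, any value that was typed with 15 or fewer significant digits comes back exactly as typed, which covers the one-decimal floats in the OP.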

VisualPlugin · October 30, 2021, 2:01am

I would just convert the floats to strings with the %.5f format specifier. The drawback is twofold: unneeded double quotes and, for decimal fractions with fewer than five digits, excess zeroes.
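Both drawbacks of `%.5f` are easy to see, and the excess zeroes can be trimmed. `fixed5` is a hypothetical helper for illustration; note that values smaller than 0.000005 still collapse to zero:

```lua
-- Fixed five fractional digits
print(string.format("%.5f", 0.5))      --> 0.50000
print(string.format("%.5f", 0.000001)) --> 0.00000

-- Hypothetical helper: strip trailing zeroes and a bare trailing "."
local function fixed5(x)
	local s = string.format("%.5f", x)
	s = s:gsub("0+$", ""):gsub("%.$", "")
	return s
end

print(fixed5(0.5))    --> 0.5
print(fixed5(-0.098)) --> -0.098
```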

Blockzez · October 30, 2021, 10:29am

This is not great advice.

First, why 5 fractional digits? Why not convert float to string in a similar manner to V8’s Number.toString?

Second, it doesn’t help that the Luau VM number type is an IEEE 754 double-precision float, so a fixed five-digit format can’t reproduce every value exactly. __tostring has the same problem.

rogeriodec_games · October 30, 2021, 1:07pm

My main concern, as I said in the OP, is the huge amount of wasted space in the DataStore, since everything stored there is converted to JSON…

rubitonlive · October 30, 2021, 3:03pm

I would consider using a binary format, or at least storing the float value in binary (a constant 4 bytes as single precision, or 8 bytes as a full double). string.pack would be useful in this case; you could also consider using something like BSON.
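A sketch of the `string.pack` approach (requires Lua 5.3+ or Luau). A 4-byte single-precision float cannot hold every Luau number exactly, while 8 bytes (a full double) round-trips. The packed strings contain arbitrary bytes, so before putting them into JSON or a DataStore you would likely need a text-safe encoding such as base64; that requirement is an assumption worth checking.

```lua
-- Requires string.pack / string.unpack (Lua 5.3+ or Luau)
local x = -0.098

local single = string.pack("<f", x) -- IEEE 754 single precision, little-endian
local double = string.pack("<d", x) -- IEEE 754 double precision, little-endian

print(#single) --> 4
print(#double) --> 8

print(string.unpack("<f", single) == x) --> false (precision lost)
print(string.unpack("<d", double) == x) --> true
```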

rogeriodec_games · October 30, 2021, 4:08pm

I don’t understand what this has to do with the space wasted in the DataStore…

rubitonlive · October 30, 2021, 4:20pm

I am suggesting a way for you to shrink the 33-byte textual representation of your number into a fixed 4-byte binary format.

buildthomas · October 30, 2021, 4:42pm

There seem to be two separate issues discussed in this thread:

JSONEncode inflates numbers unnecessarily (real issue), which increases the length of the resulting string and may be inconvenient for some use cases (such as logging it, or sending it over HttpService)
DataStore usage is inefficient as a result because data is stored as JSON, but it’s not clear that JSONEncode uses the same implementation as the DataStore back-end does when storing the data. So this seems to be only an assumed issue, not an empirical one.

Is the second one an actual issue you have encountered, or are you just assuming?

In what use case are you storing millions of raw numbers in JSON? At that scale you probably want to binary-pack the data anyway, since storing numbers in text form is still a waste of space.

VisualPlugin · October 30, 2021, 7:20pm

Five is an arbitrary value I chose; it can represent 32nds (increments of 0.03125) with full accuracy. In most cases I can think of, (decimal) floats are more often divided into powers of ten than into powers of two. You do bring up a valid point: a number that compares equal to a constant before being converted into JSON might no longer compare equal after being loaded back from the JSON.
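Which binary fractions survive a `%.5f` round trip can be checked directly. In this sketch (`survives` is a hypothetical helper), every multiple of 1/32 = 0.03125 comes back exactly, but 1/64 = 0.015625 needs six fractional digits and does not:

```lua
-- Does x come back unchanged after formatting with "%.5f"?
local function survives(x)
	return tonumber(string.format("%.5f", x)) == x
end

local all32 = true
for k = 0, 32 do
	all32 = all32 and survives(k / 32)
end

print(all32)          --> true
print(survives(1/64)) --> false
```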

wravager · October 30, 2021, 8:19pm

PHP also seems to have had this issue in earlier versions, because many decimal fractions have no exact binary floating-point representation (see here). Even though integers in Lua are still floats, I’d assume the issue is avoided for them because they do have an exact binary representation.

As for a solution, it seems you will either have to accept the extra data in the encoding (although it is minimal if done correctly) or convert the number to its short form as a string before passing it in.

To add to this, JSONEncode may be using an earlier implementation of Lua, since Lua 5.3 seems to have changed tostring(number)? (see here)

Blockzez · October 30, 2021, 9:39pm

Luau is Lua 5.1-based.
(Like I said before) All Luau VM numbers are represented in the IEEE 754 double precision floating point format.

Which is not trivial to implement!

I don’t think that this has something to do with “direct” binary representation.
For example, 0.5 is represented in the shortest and exact decimal representation as 0.5 in virtually all implementations of double to ascii converter.
Define “direct” binary representation.
Obviously, Infinity, NaN(x), and -0 are also floats, and they don’t have a “direct” binary representation either, yet they’re not an issue.
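This also explains why 15 and -330 in the original repro came out unchanged. Assuming the encoder behaves roughly like C's `%.30g` (an inference from the outputs in this thread, not a confirmed detail), values that are exact in binary have a short exact decimal expansion, and `%g` drops the trailing zeros:

```lua
-- Exactly representable values stay short even at 30 significant digits
print(string.format("%.30g", 15))   --> 15
print(string.format("%.30g", -330)) --> -330
print(string.format("%.30g", 0.5))  --> 0.5
print(string.format("%.30g", 0.75)) --> 0.75
```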

wravager · October 30, 2021, 9:49pm

There can be hard coded implementations for those numbers. See the links I posted for reference of what I was trying to convey about the tostring() conversions.

rogeriodec_games · October 30, 2021, 10:26pm

Currently, I have manually truncated the float numbers to a few digits, just to save space in the DataStore.
At the same time, although I know the data is stored as JSON in the DataStore, I’m not sure whether float numbers suffer from the same bug in the final implementation when they’re written to the DataStore.
So I left this as a warning to be investigated because, as I said, other developers may unfairly hit the size limit just because they’re unnecessarily wasting space with 33-byte float numbers.

buffyreal123 · December 8, 2021, 2:17am

Thanks for reporting and this is fixed now. Please let us know if you still see this issue.

buildthomas · December 8, 2021, 5:23pm

This doesn’t seem like the same issue – I recommend going to #help-and-feedback:scripting-support , finding out a minimal repro if it is a bug, and then posting it as a separate bug report.

system · October 25, 2022, 10:20pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.