Datastores: Strings cannot contain ASCII characters with a value higher than 127

Datastore operations throw an error when the data being stored contains ASCII characters with a value higher than 127.

How to reproduce:

  1. Open a baseplate and upload it to a game with Studio API access.
  2. Run the following in command bar:
local data = game:GetService("DataStoreService"):GetDataStore("Test")

data:SetAsync("TestKey", "\219")
  3. Observe output.

Observed behavior:

An error is thrown:

14:21:02.178 - 104: Cannot store string in DataStore

For a start, the error message is not specific: it could say why the string can’t be stored. It also shouldn’t error at all, because the documentation mentions no such limit, only a size limit on the string as a whole.

Expected behavior:

The call should store the value correctly, and the value should be retrievable from Datastores afterwards. It would also be useful if the entire ASCII range could be used, as this allows for tighter packing of bits in encoding/decoding algorithms.

If that is not possible, the documentation needs to be adjusted instead.

Additional behavior:

Credit to @Anaminus for the find.

Also happens with tables that contain these characters anywhere:

> data:SetAsync("TestKey", {"\217"})
14:28:54.490 - 104: Cannot store Array in DataStore

> data:SetAsync("TestKey", {["\169"] = 2})
14:28:20.389 - 104: Cannot store Dictionary in DataStore

> data:SetAsync("TestKey", {Index = "\182"})
14:31:00.101 - 104: Cannot store Dictionary in DataStore

Again, these error messages are too vague. Refer to this thread for more cases.

If I remember correctly, the rule here is actually that everything must be valid UTF-8. In UTF-8, byte values above 127 are used exclusively for encoding multi-byte code points. So a single byte above 127 can’t be valid UTF-8, but “\239\191\189” (�, the replacement character U+FFFD, UTF-8 encoded) should be valid.

Yes we need to document this, and the error message should be improved…
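
The UTF-8 rule described above can be checked before calling SetAsync. The following is a minimal sketch, assuming the standard `utf8` library is available (as it is in Lua 5.3+ and Luau); `isValidUtf8` is a hypothetical helper name, not part of the DataStore API, and the real backend validation may be stricter or differ in details.

```lua
-- Hypothetical helper (not part of the DataStore API): returns true when
-- `value` is a string made only of well-formed UTF-8 sequences.
local function isValidUtf8(value)
    if type(value) ~= "string" then
        return false
    end
    -- utf8.len returns nil (plus the offending byte position) on invalid input.
    return utf8.len(value) ~= nil
end

print(isValidUtf8("\219"))          -- false: lone byte above 127
print(isValidUtf8("\239\191\189"))  -- true: U+FFFD encoded as three bytes
print(isValidUtf8("hello"))         -- true: plain ASCII
```

A check like this only covers a single string; for tables, every string key and value would need the same check before the data is handed to SetAsync.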

Lua strings are wrapped ASCII char arrays, though, so wouldn’t it be possible to just store the string as a byte/char array with the size marker within the DataStore? Sort of how Lua handles TString. As thomas mentioned this would allow for much greater flexibility and encoding/compression magic.

The datastore uses JSON encoding. If you JSONEncode the data, that’s what is being stored internally. Any required escape sequences in strings contribute to your size limit. The size limit is based on the number of bytes in the UTF-8 JSON string.

Because of this and several other backend details everything has to be valid UTF-8. It’s a bit unfortunate because this is the reason we can’t handle tables with mixed keys in addition to arbitrary binary strings. JSON can’t encode it, and the backend can’t round trip it.

Changing this would be very high risk and take a lot of work unfortunately. Doing this right could easily take a good engineer half a year or more. It’s unlikely we’ll mess with it…
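
Given the valid-UTF-8 requirement, one possible workaround for binary data is to convert it to plain ASCII text before storing it. The sketch below uses hex encoding because it is short and easy to verify; `toHex` and `fromHex` are hypothetical helper names, and nothing here is an official recommendation. Hex doubles the length of the data, which counts against the size limit mentioned above, so a denser scheme such as Base64 may be preferable in practice.

```lua
-- Hypothetical helpers: map arbitrary bytes to ASCII hex digits and back.
local function toHex(bytes)
    -- Replace every byte with its two-digit lowercase hex representation.
    return (bytes:gsub(".", function(c)
        return string.format("%02x", string.byte(c))
    end))
end

local function fromHex(hex)
    -- Replace every pair of hex digits with the byte it encodes.
    return (hex:gsub("%x%x", function(pair)
        return string.char(tonumber(pair, 16))
    end))
end

local raw = "\219\000\255A"       -- bytes 219, 0, 255, 65
local encoded = toHex(raw)
print(encoded)                    -- db00ff41
print(fromHex(encoded) == raw)    -- true
```

The encoded string contains only the characters 0-9 and a-f, so it is valid UTF-8 and should pass through SetAsync and GetAsync unchanged.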

As a developer, I just want to save a blob of bytes.

I don’t want to think about how ROBLOX serializes it. That is an implementation detail that I should be able to be ignorant of.

Maybe create a new API that is better and deprecate the old one?

There should probably be two API calls. One to do a raw save of a block of bytes and one that serdes an object to/from a Lua table.

Seems like this will be a serious issue for internationalization.

Anyone sane would agree with you, but raw binary might be a hard sell. There’s just no one focused on overhauling DataStores at the moment.

It’s not an issue for internationalization at least. DataStore used to be ASCII only, but I fixed it to support UTF-8 a while ago. With this you can store any Unicode text that could possibly be displayed on Roblox.

Write raw binary to a key, like a playerId

People want to serialize stuff without bit-twiddling everything.
