Datastores: Strings cannot contain byte values higher than 127

Datastore operations throw an error when the data being stored contains characters with byte values higher than 127.

How to reproduce:

  1. Open a baseplate and publish it to a game with Studio API access enabled.
  2. Run the following in the command bar:
local data = game:GetService("DataStoreService"):GetDataStore("Test")

data:SetAsync("TestKey", "\219")
  3. Observe the output.

Observed behavior:

An error is thrown:

14:21:02.178 - 104: Cannot store string in DataStore

For starters, the error message is not specific (it could mention why the string can't be stored). It also shouldn't error at all: the documentation mentions nothing about these limits, only a size limit on the string as a whole.

Expected behavior:

The call should store the value properly, and the value should be retrievable from Datastores again. It would also be useful if the entire byte range could be used, as this allows tighter packing of bits for encoding/decoding algorithms.

If that is not possible, the documentation needs to be adjusted instead.
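
In the meantime, one common workaround (not mentioned in this thread; sketched in Python since the transformation is language-agnostic, though in Roblox you would implement it in Lua) is to encode binary blobs into a storage-safe alphabet such as base64 before calling SetAsync, at the cost of roughly 33% size overhead:

```python
import base64

# Arbitrary binary data, including every byte value above 127.
blob = bytes(range(256))

# base64 maps any byte string onto a 64-character ASCII alphabet,
# which is always valid UTF-8 and therefore safe to store.
encoded = base64.b64encode(blob).decode("ascii")

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == blob
```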

Additional behavior:

Credit to @Anaminus for the find.

Also happens with tables that contain these characters anywhere:

> data:SetAsync("TestKey", {"\217"})
14:28:54.490 - 104: Cannot store Array in DataStore

> data:SetAsync("TestKey", {["\169"] = 2})
14:28:20.389 - 104: Cannot store Dictionary in DataStore

> data:SetAsync("TestKey", {Index = "\182"})
14:31:00.101 - 104: Cannot store Dictionary in DataStore

The error messages are, again, too vague. Refer to this thread for more cases.


If I remember correctly, the rule here is actually that everything must be valid UTF-8. In UTF-8, byte values > 127 are used exclusively for encoding multi-byte code points, so a single byte > 127 can't be valid UTF-8, but "\239\191\189" (the replacement character U+FFFD, UTF-8 encoded) should be valid.
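
This is easy to check outside Roblox; a quick Python illustration (Python's `bytes.decode` applies the standard UTF-8 validity rules):

```python
# Byte 219 ("\219" in the Lua repro above) on its own is never valid
# UTF-8: any byte above 127 must be part of a multi-byte sequence.
try:
    b"\xdb".decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False
assert not lone_byte_is_valid

# The three-byte sequence EF BF BD is the UTF-8 encoding of U+FFFD,
# the replacement character, so it decodes cleanly.
assert b"\xef\xbf\xbd".decode("utf-8") == "\ufffd"
```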

Yes we need to document this, and the error message should be improved…


Lua strings are length-counted byte arrays, though, so wouldn't it be possible to just store the string as a raw byte array with a size marker within the DataStore? Sort of like how Lua handles TString internally. As thomas mentioned, this would allow for much greater flexibility and encoding/compression magic.


The datastore uses JSON encoding. If you JSONEncode the data, that’s what is being stored internally. Any required escape sequences in strings contribute to your size limit. The size limit is based on the number of bytes in the UTF-8 JSON string.
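
As an illustration of how escapes count against the limit (using Python's `json` module; the exact byte counts from Roblox's encoder may differ slightly):

```python
import json

raw = 'he said "hi"\n'      # 13 characters of user data
encoded = json.dumps(raw)    # quotes and the newline become escape sequences

# What is stored is the JSON text, so its UTF-8 byte length is what
# counts against the size limit, not the length of the original string.
assert encoded == '"he said \\"hi\\"\\n"'
assert len(encoded.encode("utf-8")) > len(raw)
```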

Because of this and several other backend details, everything has to be valid UTF-8. It's a bit unfortunate, because this is also the reason we can't handle tables with mixed keys, in addition to arbitrary binary strings: JSON can't encode them, and the backend can't round-trip them.
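
The mixed-keys limitation falls out of JSON's data model itself, since JSON objects only allow string keys. A Python sketch of the round-trip failure:

```python
import json

mixed = {1: "a", "greeting": "b"}   # numeric and string keys mixed

# JSON objects only have string keys, so the numeric key 1 is
# silently converted to the string "1" during encoding...
round_tripped = json.loads(json.dumps(mixed))

# ...and the original table cannot be recovered.
assert round_tripped == {"1": "a", "greeting": "b"}
assert round_tripped != mixed
```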

Changing this would be very high risk and take a lot of work unfortunately. Doing this right could easily take a good engineer half a year or more. It’s unlikely we’ll mess with it…

As a developer, I just want to save a blob of bytes.

I don’t want to think about how ROBLOX serializes it. That is an implementation detail I should be able to ignore.

Maybe create a new API that is better and deprecate the old one?

There should probably be two API calls: one to do a raw save of a block of bytes, and one that serializes/deserializes an object to/from a Lua table.

Seems like this will be a serious issue for internationalization.


Anyone sane would agree with you, but raw binary might be a hard sell. There’s just no one focused on really overhauling DataStores at the moment.

It’s not an issue for internationalization at least. DataStore used to be ASCII only, but I fixed it to support UTF-8 a while ago. With this you can store any Unicode text that could possibly be displayed on Roblox.


Write raw binary to a key, like a playerId

People want to serialize stuff without bit-twiddling everything.
