“Data stores can only accept valid UTF-8 characters” error appearing

My game started having issues saving data to DataStores, causing player data to be lost. This error appears on the server:

DataStoreService: CantStoreValue: Cannot store Dictionary in data store. Data stores can only accept valid UTF-8 characters. API: UpdateAsync, Data Store: ReservedServers

“ReservedServers” is my custom datastore for holding reserved server data when players create a custom roleplay. The same error is happening in another datastore called “ColorSaveFile”, which saves character customization data. None of my other datastores are affected. What these two datastores have in common is that they both save user-generated strings.

This suddenly started happening on February 24th, 2025. I have not made any changes to how data is saved to the datastores, or what type of data is saved. There were also no changes or updates made to the game that day.


I am still having this problem and it is severely affecting the game because it is causing save data loss. The game has been able to save user-generated strings to DataStores for several years with no issue.

Sorry, looks like we haven’t fully solved this. I’ll get my team to take a look.

For these two Data Stores – what is the full list of Luau types you are trying to store? Here is the list of supported types: Versioning, listing, and caching | Documentation - Roblox Creator Hub.

Is it possible you’re attempting to store some object, e.g. Vector3 inside of these Data Stores?

For values that are failing – can you please try running HttpService:JSONEncode on the table? This should reproduce the same failure that Data Stores is giving you.

Yes, I was able to capture an example of a save file that failed to save in a live game.
The printed output contains the type of the value, and the index names of the table that it’s nested in. All of the values are tables of strings, booleans, or numbers. I save Vector3 values as a table of 3 numbers.

I looked through the entire output, and it doesn’t seem like there is any difference between the values that this is trying to save versus the values that save without issue. This only happens to a small percentage of players.

My guess is that players are able to enter some kind of unrecognized symbol in strings. Forcing those values to be strings with tostring() still causes the error. This is only a guess; I’m not sure.

Thanks! Could you try running HttpService:JSONEncode on the user-supplied input after your tostring()? If that’s the issue, you should also see a failure there.

I ran HttpService:JSONEncode on the tables that the DataStore was failing to save, and they errored.

After adding print statements to check the Lua type of every value, I confirmed that all of the values being saved are types that DataStores accept (strings, booleans, and numbers), even in the tables that fail to save.

However, I finally found what I think is the cause of the issue. Players are able to type unrecognized symbols, and these characters cannot be saved in a DataStore, even though they are strings. I made print statements to show the Lua Type and content of the strings, and all of the ones that are failing have the same thing in common. They are strings that hold a strange symbol:

(These are examples of roleplay names and bios that players tried to save but couldn’t, captured with print statements.)

[Five screenshots of the captured strings]

It seems like on February 24th, DataStore stopped being able to encode data for this emoji/character.

If you need the user IDs of these players, I can send them in a DM.
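
To locate the offending strings without reading print output by eye, a small recursive check can help. This is only a sketch: `findInvalidStrings` and the sample `save` table are hypothetical, and it assumes the `utf8.len` behavior described later in this thread (nil plus the position of the first invalid byte).

```lua
-- Hypothetical helper: walks a save table and collects the key path of
-- every string value that is not valid UTF-8.
local function findInvalidStrings(value, path, found)
	path = path or "root"
	found = found or {}
	if type(value) == "string" then
		-- utf8.len returns nil plus the position of the first invalid byte
		local length, badPos = utf8.len(value)
		if not length then
			table.insert(found, { path = path, position = badPos })
		end
	elseif type(value) == "table" then
		for key, child in pairs(value) do
			findInvalidStrings(child, path .. "." .. tostring(key), found)
		end
	end
	return found
end

-- Illustrative data only: Bio ends in a partial emoji.
local save = {
	Name = "Cute",
	Bio = string.sub("🌻Cute🌻", 1, 11),
	Colors = { 1, 0.5, 0 },
}

for _, entry in ipairs(findInvalidStrings(save)) do
	print(entry.path, entry.position)
end
```

Running this on a table just before it is saved shows which field holds the bad bytes, instead of only the DataStore-level error.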

These are likely non-UTF-8 characters:

Wow, thanks so much for this. This looks exactly like my problem. Weird that it only started happening to me on that specific day.

How should I remove these unrecognized characters from player’s strings?

I am now sanitizing every string that gets saved using utf8.len(), but I am still getting a lot of errors from the DataStore. Every string passes through this function, which replaces any string containing a non-UTF-8 character with a “?”.

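-- Returns the text unchanged if it is valid UTF-8; otherwise logs it and returns "?".
function GlobalsReplicated.SanitizeString(text)
	-- utf8.len returns nil plus the position of the first invalid byte for invalid input
	local num, invalPos = utf8.len(text)
	if num ~= nil then
		return text
	else
		print(text, num, invalPos)
		return "?"
	end
end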
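
A less destructive variant of the function above could keep the valid part of the string instead of replacing the whole thing. The sketch below is an assumption, not what the game uses: `keepValidPrefix` is a hypothetical name, and it relies on `utf8.len` returning the position of the first invalid byte.

```lua
-- Hypothetical variant: keeps everything before the first invalid byte.
local function keepValidPrefix(text)
	local length, badPos = utf8.len(text)
	if length then
		-- Already valid UTF-8, nothing to strip.
		return text
	end
	-- badPos is the 1-based position of the first invalid byte.
	return string.sub(text, 1, badPos - 1)
end

print(keepValidPrefix(string.sub("🌻Cute🌻", 1, 11)))
```

This only helps if it runs after any trimming step. If a string is shortened by bytes after it has been sanitized, it can become invalid again.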

I am still getting the same error, even though I am sanitizing the string.

Edit:

A lot of the strings that are being flagged as non utf-8 characters seem like they are using the same emoji that was already used in the same string.

In this example, a player was trying to set their roleplay name to “🌻-Fallen Entity-🌻”, but the second “🌻” is being flagged as non-UTF-8.

Here is an example of me giving myself the same name. The second sunflower emoji shows up as non-UTF-8, and saving this name produces the DataStore error. However, when I tested it on myself, the function did not flag the string as containing a non-UTF-8 character (otherwise the entire string would have turned into a “?”), and the name still gave me an error in the DataStore.

I will try the custom function. I used utf8.len() because this post said that it was effective after a recent update.

I finally found the root cause of the issue.

I use string.sub() on player-created strings so roleplay names and bios are kept under a certain length. string.sub() counts bytes, not characters. When it is run on a string containing bytes over 127, such as emojis or text in other scripts (Russian, Japanese, Chinese, etc.), the cut can land in the middle of a multi-byte character and leave a malformed sequence at the end of the string.

Example:

Input
print(string.sub("🌻Cute🌻",1,12))

Output
🌻Cute🌻

Input
print(string.sub("🌻Cute🌻",1,11))

Output
🌻Cute��� ← This causes the UTF-8 error.

It looks like some new/unintended behavior from string.sub() started on Feb 24th, because this was not happening before.
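
The byte arithmetic behind the example above can be checked directly. This is a minimal sketch using only `#`, `string.sub` and `utf8.len`; the variable names are illustrative.

```lua
local name = "🌻Cute🌻"

-- Each sunflower is 4 bytes in UTF-8, so 4 + 4 + 4 bytes in total.
print(#name)          -- 12 (bytes)
print(utf8.len(name)) -- 6 (characters)

-- Cutting at byte 11 keeps only 3 of the 4 bytes of the last emoji.
local cut = string.sub(name, 1, 11)
print(utf8.len(cut))  -- nil 9 (first invalid byte is at position 9)
```

Any byte limit from 9 through 11 lands inside the second emoji, which is why only some names fail.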

@ShinyGriffin @Judgy_Oreo

utf8.len can be used to check for invalid utf8 now. Invalid surrogates used to be skipped by the function but are properly checked now.

I’m glad you figured this out! As suggested, I’d recommend checking length and validity of the string when you receive the user input instead of trying to trim and save as-is behind the scenes.
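
One way to act on that suggestion is to validate the text as soon as it arrives and reject it rather than trim it. The sketch below is an assumption about how such a check might look: `validateUserText`, the reason strings and the limit of 20 characters are all hypothetical.

```lua
-- Assumed limit for illustration; use whatever the game allows.
local MAX_NAME_CHARS = 20

-- Hypothetical validator: returns true, or false plus a reason.
local function validateUserText(text, maxChars)
	if type(text) ~= "string" then
		return false, "not a string"
	end
	-- utf8.len returns nil when the string is not valid UTF-8.
	local length = utf8.len(text)
	if not length then
		return false, "invalid UTF-8"
	end
	-- Compare the character count, not the byte count.
	if length > maxChars then
		return false, "too long"
	end
	return true
end

print(validateUserText("🌻Cute🌻", MAX_NAME_CHARS))
print(validateUserText(string.sub("🌻Cute🌻", 1, 11), MAX_NAME_CHARS))
```

Rejecting at input time lets the player fix the name themselves, and nothing invalid reaches the save table.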

As for Data Stores, we’ll continue looking into how we can give you better error messages that help you track down these types of problems earlier. I’m marking this resolved, as Data Stores is correctly rejecting your non-UTF-8 characters.

Would the string.sub() behavior count as a bug, and should I make a new bug report? This is new behavior that did not happen before, and I made no changes to the way strings are shortened and saved.

I don’t believe this is a bug with string.sub() either. The emojis in your example are represented by multiple bytes (hence string.len() appearing larger), and it is not “safe” to slice the string in between those bytes.

There are a few other topics on this, including this one.
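
If the limit has to stay a byte limit, the cut can be moved back to the nearest character boundary. This is a sketch under two assumptions: the input is already valid UTF-8, and `truncateBytes` is a hypothetical helper name. It relies on the fact that UTF-8 continuation bytes are always in the range 0x80 to 0xBF.

```lua
-- Hypothetical helper: trims text to at most maxBytes bytes without
-- splitting a multi-byte character. Assumes text is valid UTF-8.
local function truncateBytes(text, maxBytes)
	if #text <= maxBytes then
		return text
	end
	local cut = maxBytes
	while cut > 0 do
		-- The byte after the cut is the first one dropped. If it is a
		-- continuation byte, the cut is inside a character, so back up.
		local nextByte = string.byte(text, cut + 1)
		if nextByte < 0x80 or nextByte >= 0xC0 then
			break
		end
		cut = cut - 1
	end
	return string.sub(text, 1, cut)
end

print(truncateBytes("🌻Cute🌻", 11))
```

The result may be shorter than the limit by up to three bytes, which is usually preferable to storing a partial character.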

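-- Character-aware version of string.sub: indices count UTF-8 characters, not bytes.
-- Returns "" if the input is not valid UTF-8.
-- Unicode is assumed here to be a plain module table.
local Unicode = {}

function Unicode.sub(str, startIndex, endIndex)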
	endIndex = endIndex or -1

	local length = utf8.len(str)
	if not length then 
		return "" 
	end

	if startIndex < 0 then 
		startIndex = length + startIndex + 1 
	end

	if endIndex < 0 then 
		endIndex = length + endIndex + 1 
	end

	if startIndex < 1 then 
		startIndex = 1 
	end

	if endIndex > length then 
		endIndex = length 
	end

	if startIndex > endIndex then 
		return "" 
	end

	local startByte = utf8.offset(str, startIndex)
	local endByte = utf8.offset(str, endIndex + 1)

	if not startByte then 
		return "" 
	end

	if not endByte then 
		endByte = #str + 1 
	end

	return string.sub(str, startByte, endByte - 1)
end