I agree with Dapale here on this. It’s definitely fitting for critical.
Now’s the time to make your game work with DataStores being down, if it doesn’t already, so we know how to get around it.
I mean, in my case, if players’ data hasn’t loaded, they aren’t going to want to play anyway. I imagine that’s the same in most cases.
Have fun paying lots of money a month for strong, reliable databases, heh. I might start setting up a system for Phantom Forces to auto-upload data if saves fail, then overwrite on rejoin. Especially with DataStores failing hard right now…
I mean writing code around it so that your game doesn’t break if it’s DataStore-reliant to start the game. Also, I can’t seem to get my DataStores to fail, either in game or in a Studio test.
Removed ROBLOXCRITICAL tag.
@Dapale @Xan_TheDragon @RBX_Lua
Please keep in mind that ROBLOXCRITICAL is not subjective. A bug report needs to meet specific criteria to qualify as ROBLOXCRITICAL: https://devforum.roblox.com/t/please-read-before-posting-steps-to-report-a-bug/24388
(Note: “Data loss” does not include data losses from bad datastore code. Datastore failures are to be expected from time to time and code needs to handle these temporary failures gracefully. I just checked my own games and the error rates are far from 100%, so while very inconvenient, it’s not an insurmountable issue.)
The issue has been escalated.
Thank you for the report. Our engineering team is actively investigating this issue and we’ll follow up soon with more information.
Ohhhhh, this is why I’m getting tons of messages about lost data.
So, if something like this DOES happen and we need an easier way to notify staff, what do we do?
You file a bug report without a ROBLOXCRITICAL tag.
And we should assume that’s going to be seen instantly?
According to my data there is large-scale data store failure and players being forcibly removed from the game.
The best datastore code in the world won’t fix that.
Operations has internal status trackers that will catch these issues faster than users report on them, so the bug report is really just additional comments on the issue.
How many of those are after a final retry?
All of them? If I understand correctly, retries are handled “under the hood”, and if an error happens then that is the final status.
edit: to clarify, these are datastore load errors, not save errors.
Auto-retry isn’t enabled yet. Right now you should be retrying requests manually. If your game handles retries gracefully, partial outages like these shouldn’t have much impact aside from slightly longer request times while requests are retried. This applies to both loading and saving data.
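A minimal sketch of what manual retrying might look like, written in Python for illustration rather than Roblox Lua (in a game, the equivalent would wrap the GetAsync or SetAsync call in pcall). The helper name `retry_request`, the attempt limit, and the backoff delays are all assumptions, not values recommended anywhere in this thread.

```python
import time


def retry_request(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request` until it succeeds or `max_attempts` is reached.

    Hypothetical helper: `request` stands in for a DataStore load or save.
    Returns (True, result) on success, (False, last_error) on final failure.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return True, request()
        except Exception as err:  # a real game would inspect the DataStore error
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff: 1s, 2s, 4s, ... between attempts.
                sleep(base_delay * (2 ** attempt))
    return False, last_error
```

With something like this, only a `False` result counts as a failure "after a final retry"; individual failed attempts just make the request take longer.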
To add to my post above
The statistics I have are per save attempt in code.
I log whether each individual DataStore SetAsync call succeeded or failed, and I log the errors as well.
Within 30 minutes, that gave:
87,630 successful
4,086 failures
(4.4%)
They attempt to save again and again until successful.
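For reference, the per-attempt rate implied by the counts above can be checked like this (Python used only for the arithmetic). The second calculation additionally assumes that attempts fail independently of each other, which is an assumption and may not hold during a real outage.

```python
successes = 87_630
failures = 4_086

attempts = successes + failures
# Fraction of individual SetAsync attempts that failed.
failure_rate = failures / attempts


def all_fail_probability(rate, tries):
    """Chance that `tries` attempts in a row all fail, ASSUMING independent failures."""
    return rate ** tries


print(f"{attempts} attempts, {failure_rate:.2%} failed")
print(f"3 failures in a row (if independent): {all_fail_probability(failure_rate, 3):.4%}")
```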
Something interesting to note:
Here are all the biggest datastore errors and how many times they have occurred in the last four months in total, not including today:
And here is the error occurring today:
This is not the same error that usually accompanies high-traffic outages. This is an anomaly.
The problem is that, at least in my limited experience, datastore errors are usually not “gracefully” distributed among servers. I don’t have concrete data, but anecdotally some servers are usually reasonably fine while others barely get any requests accepted.
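One way to test that impression would be to break logged attempts down per server instead of looking at a single global rate. A rough Python sketch, assuming each log record is a hypothetical `(server_id, succeeded)` pair; the thread does not describe the actual log format, and the sample data is made up.

```python
from collections import defaultdict


def failure_rate_by_server(records):
    """Map each server id to its fraction of failed attempts.

    `records` is an iterable of (server_id, succeeded) pairs; this shape is
    an assumption made for illustration.
    """
    totals = defaultdict(int)
    failed = defaultdict(int)
    for server_id, succeeded in records:
        totals[server_id] += 1
        if not succeeded:
            failed[server_id] += 1
    return {server: failed[server] / totals[server] for server in totals}


# Made-up sample: a moderate overall rate can hide one badly affected server.
sample = [("A", True)] * 9 + [("A", False)] + [("B", False)] * 4 + [("B", True)]
rates = failure_rate_by_server(sample)
```

If the per-server rates turn out to be very uneven, retries on a badly affected server are much less likely to help than the global rate suggests.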