Strange DataStore CURL error causing DataStore service to fail most, if not all, of the time across many games

I agree with Dapale on this. It definitely qualifies as critical.

Now’s the time to make your game work while DataStores are down, if it doesn’t already, so we know how to get around it :eyes:

I mean, in my case, if players’ data hasn’t loaded, they aren’t going to want to play anyway. I imagine that’s the same in most cases.

Have fun paying lots of money a month for strong, reliable databases, heh. I might start setting up a system for Phantom Forces to automatically upload data if saves fail, then overwrite it on rejoin. Especially with datastores failing hard right now…

I mean writing code around it so that your game doesn’t break if it relies on DataStores to start. Also, I can’t seem to get my DataStores to fail, either in game or in a Studio test.
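
To illustrate "writing code around it", here is a minimal sketch in Python rather than Roblox Lua. The names `load_profile`, `save_profile`, `DEFAULT_PROFILE`, and the `fetch`/`store` callables are all hypothetical and stand in for whatever data store calls a game makes. The idea is to let the player in with default data when a load fails, and to flag that session so it never saves over their real data.

```python
import copy

# Hypothetical default data for a player whose profile could not be loaded.
DEFAULT_PROFILE = {"coins": 0, "level": 1}


def load_profile(fetch, player_id):
    """Return (profile, persisted).

    `fetch` is an assumed callable that returns stored data, returns None
    for a new player, or raises when the data store is unavailable.
    """
    try:
        stored = fetch(player_id)
    except Exception:
        # Store unreachable: let the player in with defaults, but mark the
        # session as not persisted so it cannot overwrite real data later.
        return copy.deepcopy(DEFAULT_PROFILE), False
    if stored is None:
        # Genuinely new player: defaults are safe to save.
        return copy.deepcopy(DEFAULT_PROFILE), True
    return stored, True


def save_profile(store, player_id, profile, persisted):
    """Save only when the profile was loaded successfully. Returns True if saved."""
    if not persisted:
        return False
    store(player_id, profile)
    return True
```

Whether playing on temporary data is acceptable depends on the game; as noted earlier in the thread, some players will not want to play without their data.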

Removed ROBLOXCRITICAL tag.

@Dapale @Xan_TheDragon @RBX_Lua
Please keep in mind that ROBLOXCRITICAL is not subjective; a bug report needs to meet specific criteria to qualify as ROBLOXCRITICAL: https://devforum.roblox.com/t/please-read-before-posting-steps-to-report-a-bug/24388

(Note: “Data loss” does not include data losses from bad datastore code. Datastore failures are to be expected from time to time and code needs to handle these temporary failures gracefully. I just checked my own games and the error rates are far from 100%, so while very inconvenient, it’s not an insurmountable issue.)

The issue has been escalated.

Thank you for the report. Our engineering team is actively investigating this issue and we’ll follow up soon with more information.

Ohhhhh, this is why I’m getting tons of messages about lost data.

So, if something like this DOES happen and we need to notify easier, what do we do?

You file a bug report without a ROBLOXCRITICAL tag.

And we should assume that’s going to be seen instantly?

According to my data, there is a large-scale data store failure, and players are being forcibly removed from the game.

The best datastore code in the world won’t fix that.

Operations has internal status trackers that will catch these issues faster than users report on them, so the bug report is really just additional comments on the issue.

How many of those are after a final retry?

All of them? If I understand correctly, retries are handled “under the hood”, and if an error happens then that is the final status.

Edit: to clarify, these are datastore load errors, not save errors.

Auto-retry isn’t enabled yet. Right now you should be retrying requests manually. If your game handles retries gracefully, partial outages like these shouldn’t have much impact aside from slightly longer request times as they’re retried. This includes both loading/saving data.
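
As a rough illustration of retrying manually, here is a minimal sketch in Python rather than Roblox Lua. The `retry_request` helper, its parameters, and the backoff values are assumptions made for the example, not part of any Roblox API. It calls a request until it succeeds or a retry limit is hit, waiting longer between attempts each time.

```python
import random
import time


def retry_request(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached.

    Returns (True, result) on success, or (False, last_error) once every
    attempt has failed. `request` is any zero-argument callable that raises
    on failure (a stand-in for a data store load or save).
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return True, request()
        except Exception as err:  # real code would catch only data store errors
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus a
                # random extra so many servers do not retry in lockstep.
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay / 2))
    return False, last_error
```

The caller still has to decide what to do when the final attempt fails, for example keeping the player out or marking the session as unsaved.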

To add to my post above:
The statistics I have are per save attempt in code.
I log whether each individual datastore SetAsync succeeded or failed, and I log the errors as well.

Within 30 minutes, that came to:
87,630 successes
4,086 failures
(about 4.5% of attempts)

Failed saves are retried again and again until they succeed.
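
The per-attempt logging described above could look something like this minimal sketch, written in Python rather than Roblox Lua. The `SaveStats` class and its method names are hypothetical; it only counts outcomes and error messages and derives the failure rate as failures divided by total attempts.

```python
from collections import Counter


class SaveStats:
    """Counts the outcome of every individual save attempt."""

    def __init__(self):
        self.successes = 0
        self.failures = 0
        self.errors = Counter()  # error message -> number of occurrences

    def record(self, ok, error_message=None):
        """Record one attempt; pass the error message when it failed."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1
            self.errors[error_message] += 1

    def failure_rate(self):
        """Fraction of all attempts that failed (0.0 when nothing was logged)."""
        total = self.successes + self.failures
        return self.failures / total if total else 0.0
```

With the counts quoted above, the rate is 4,086 failures out of 91,716 total attempts. Because retries are logged as separate attempts, this measures per-request failures, not the share of players whose data was ultimately lost.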

Something interesting to note:

Here are the most common datastore errors and how many times each has occurred over the last four months, not including today:

And here is the error occurring today:

This is not the same error that usually accompanies high-traffic outages. This is an anomaly.

@BOF had some issues with this on Twitter too, so this post is relevant to him as well.

The problem is that, at least in my limited experience, datastore errors are usually not “gracefully” distributed among servers. I don’t have concrete data, but anecdotally some servers are reasonably fine while others barely get any requests accepted.