About an hour ago, we began receiving reports of data loss in our game, even though we haven’t updated it since yesterday.
The issue started at 12:55 PM EST. We’ve temporarily made our game private while we investigate the cause.
A few weeks ago, we switched from DataStore2 to a new datastore solution. We now suspect Roblox may be returning incorrect dates for keys.
Here’s a code snippet we believe is causing the problem. This code determines whether to use our old datastore system or the new one we developed internally. Our best lead so far is that this code somehow changed its behavior about 1.5 hours ago, despite the game itself not having been updated in roughly 24 hours.
I am from the Data Store team. We did not make any changes to the service, and we do not see any errors on our dashboard or reported from other experiences.
Can you describe what data you are storing in your Data Stores?
Can you provide some Data Store keys, the value you expect to be stored, and the value you are actually retrieving that differs from your expectations?
Hi team, could you post a summary here of what issues you saw with timestamps?
E.g., at 9:22 AM, noticed DataStoreInfo not being populated correctly. Were all fields corrupted, or only certain ones? Which ones did you use in your switch logic?
Hey, thanks so much for the rapid response. We deployed a fix and rollback which were successful.
I’m still investigating what happened. I haven’t been able to find any evidence of corrupted or incorrect data from DataStoreService. What really confuses me is that, at the time of the incident, all of our data storage logic had been unchanged for over a week, and zhongbot mentioned that DataStores, as we have been using them, have been unchanged for a week.
That suggests something akin to a feature flag flipping, or a backend service changing or failing. However, we confirmed that our own services didn’t experience any issues at or around the time of the incident. So the last thing I can think of is that an FFlag in DataStoreService in the engine (if there are any at all) may have flipped.
Either that, or an extremely rare race condition with our code occurred in a few servers at the same time. Very strange.
Our code uses DataStoreKeyInfo.UpdatedTime to decide which is the most recent data. If a server ever decides to use DataStore2 data over our new system, we see logs which indicate that. During the incident, we saw logs saying it chose DataStore2 data because it was more recent, but never any logs indicating we wrote to DataStore2 at any point.
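For readers following along, here is a minimal sketch of the kind of "pick the most recent source" check described above. It is written in Python as a model only, not the actual game code (which would be Luau), and every name in it (`Record`, `pick_most_recent`, `updated_time_ms`) is hypothetical. It assumes each source exposes a last-updated timestamp in milliseconds, in the way `DataStoreKeyInfo.UpdatedTime` does.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a stored value plus its key metadata."""
    source: str            # "new_system" or "DataStore2"
    value: Any
    updated_time_ms: int   # assumed to mirror DataStoreKeyInfo.UpdatedTime


def pick_most_recent(new_system: Optional[Record],
                     legacy: Optional[Record]) -> Optional[Record]:
    """Return whichever record claims the later update time.

    Ties go to the new system, so legacy data wins only when it is
    strictly newer.
    """
    if legacy is None:
        return new_system
    if new_system is None:
        return legacy
    if legacy.updated_time_ms > new_system.updated_time_ms:
        # This is the branch that would produce the "chose DataStore2" log.
        print(f"chose {legacy.source}: {legacy.updated_time_ms} > "
              f"{new_system.updated_time_ms}")
        return legacy
    return new_system
```

Note that a check like this trusts the timestamp completely: it says nothing about which record holds the newer content, only which key was written last.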
I’ve narrowed it down to this sequence:
1. A few seconds before the log appears indicating we overwrote fresh data with stale DataStore2 data, something somewhere reads the stale DataStore2 data.
2. Whoever read it writes it back, which refreshes its timestamp.
3. The server the player joined compares the timestamps of our system’s last write and DataStore2’s, sees that DS2 is more recent, and overwrites the fresh data with stale DS2 data that only looks fresh.
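The sequence above can be reproduced in a toy model. This is a sketch under stated assumptions, not a reconstruction of the real incident: `FakeStore` is a hypothetical in-memory store that stamps every write with the supplied clock value, and the keys, values, and times are made up for illustration.

```python
class FakeStore:
    """Toy key-value store: every write stamps the key with the given time."""

    def __init__(self):
        self._data = {}

    def write(self, key, value, now_ms):
        self._data[key] = (value, now_ms)

    def read(self, key):
        # Returns (value, updated_time_ms).
        return self._data[key]


def simulate():
    legacy = FakeStore()      # stands in for DataStore2
    new_system = FakeStore()  # stands in for the new internal system

    # Starting state: legacy data is old, new-system data is fresh.
    legacy.write("player_1", {"coins": 10}, now_ms=1_000)
    new_system.write("player_1", {"coins": 500}, now_ms=50_000)

    # Step 1: something reads the stale legacy value.
    stale_value, _ = legacy.read("player_1")

    # Step 2: the same value is written back, refreshing its timestamp.
    legacy.write("player_1", stale_value, now_ms=60_000)

    # Step 3: a joining server compares timestamps and trusts the later one.
    legacy_value, legacy_time = legacy.read("player_1")
    new_value, new_time = new_system.read("player_1")
    return legacy_value if legacy_time > new_time else new_value


print(simulate())
```

In this model the stale `{"coins": 10}` wins, because writing unchanged content back still moves the key's update time forward. The open question in the thread is what performed steps 1 and 2, which this sketch does not answer.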
I’ve gone over all of the relevant code several times and haven’t been able to come up with a plausible way this sequence could happen. Regardless, we fixed it by just completely ripping that bit out since we’re confident now in our new system.