Fix DataStores' present state

I’m not going to mince words – in their current state, DataStores are garbage. They’re confusing, tedious to use, and super unreliable. Let’s analyze each of these issues:

#DataStores are confusing:

DataStores have a significantly higher learning curve than anything else on ROBLOX. It’s not as simple as “Save this for the player” and “Load this player’s data” – DataStores involve handling data server errors, managing request limits, maintaining a cache, and building up and executing a request queue. This level of complexity makes using DataStores confusing, and has resulted in a plethora of DataStore questions from even ROBLOX’s most notable developers on the developer forums.

#DataStores are tedious:

DataStores are entirely unusable in their vanilla state unless you want to deal with hundreds of PMs regarding data loss. Every single developer on ROBLOX looking to use DataStores has to implement some sort of DataStore interface just to be able to use them, which has resulted in developers re-implementing the same thing as other developers over and over. There are likely thousands of DataStore interfaces developed by different individuals for their own projects that do exactly the same thing, and there will no doubt be thousands more in the future.

#DataStores are unreliable:

To the small number of people who have made it this far, congrats – your reward is: data loss. Even after all of the hard work expended to get past the previous two obstacles, the developed DataStore interface will not be perfect. It will have some sort of flaw that you won’t find out about until, you guessed it, hundreds to thousands of players have lost data. No matter how much time and effort is spent developing amazing DataStore interfaces, there will always be bugs, and this leads to an untrustworthy and unreliable relationship between developers and DataStores.


#What is the solution?

In order for DataStores to be simple to use and reliable, they need to automagically work. This means making the cookie-cutter DataStore interfaces that have to be re-implemented over and over part of vanilla DataStores. DataStores should automatically compensate for data server errors, should not have request limits, and should automatically use a cache & throttle requests. When developing DataStores, ROBLOX likely came to the conclusion that they couldn’t have something that was both easy to use and powerful, but that is definitely not the case. Below is an implementation which both preserves the existing power of DataStores while at the same time making DataStores easy to use.

First, the existing methods of DataStore (GetAsync, etc) should not change. These are the backbone of DataStores that we leave exposed to preserve their power.

To make DataStores easy to use out of the box, we duplicate the existing methods of DataStore and drop the Async suffix. These methods (Get, Set, Update, etc) would access the Async methods through an internal DataStore interface which automatically compensates for data server errors and uses a cache + request throttling (instead of saving to DataStores immediately, update the cache and then save every x seconds) to ensure the request limit is never reached. The only way requests could error and fail is if there was a coding error (e.g. attempt to index nil value) or the data was too large (trying to save 100 pages of text, for instance). With this, DataStores would be usable out of the box, easy to use, and reliable. The issues concerning DataStores would be resolved.
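
To make this concrete, here is a minimal sketch of what the internal interface behind those non-Async methods could look like. All names, the save interval, and the error-handling policy here are hypothetical, not an official design:

```lua
-- Hypothetical internal interface for the proposed non-Async methods:
-- reads go through a cache, writes only mark the key as dirty, and a
-- background loop flushes dirty keys on an interval.
local DataStoreService = game:GetService("DataStoreService")
local store = DataStoreService:GetDataStore("PlayerData")

local cache = {}         -- key -> latest value
local dirty = {}         -- key -> true when the cache is ahead of the store
local SAVE_INTERVAL = 60 -- seconds; illustrative, not an official value

local function Get(key)
	if cache[key] == nil then
		-- pcall so a data server error doesn't propagate to the caller
		local ok, value = pcall(function()
			return store:GetAsync(key)
		end)
		if ok then
			cache[key] = value
		end
		-- on failure the call could be retried internally; omitted here
	end
	return cache[key]
end

local function Set(key, value)
	cache[key] = value
	dirty[key] = true -- no immediate request; flushed by the loop below
end

-- Background flush: at most one SetAsync per dirty key per interval,
-- so the request limit is never approached under normal use.
spawn(function()
	while true do
		wait(SAVE_INTERVAL)
		for key in pairs(dirty) do
			local ok = pcall(function()
				store:SetAsync(key, cache[key])
			end)
			if ok then
				dirty[key] = nil -- on failure, stays queued for next flush
			end
		end
	end
end)
```

Under a scheme like this, a burst of Set calls to the same key still produces at most one request per flush, which is why the request limit becomes effectively unreachable under normal use.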

11 Likes

So, basically, I made a complete system for DataStores that handles errors, caching, and everything it needs.
But sometimes a player still loses their data. I always thought it was something in my system, even though I rewrote it several times because of it.

But it’s nothing on my side?

Yes and no.

Yes it’s on your side because your DataStore interface has bugs.

No, it’s not on your side, because bugs are inevitable and it’s unacceptable that anyone who uses DataStores is forced to experience data loss as they fix bugs one by one.

It is true that a caching layer is required to use data stores properly, but anyone can develop a reusable data store wrapper module.

The real limitations are:

  • Results of GetAsync are cached for 10 seconds
  • Getting throttled because another game misuses data stores
  • No charts for data store success/throttled/failure rates for my game
  • No programmatic way to determine how close requests are to being throttled
3 Likes

You misunderstand – it’s not re-usability that’s the issue. Individuals have to implement the same thing other people have already made. Sally is making a DataStore interface that Bob, Phil, Roy, and thousands of other users have already made. DataStores shipped incomplete: anyone who wants to use them, as you mentioned, has to make a caching layer. You’ve also made two incorrect assumptions:

  1. You’re wrong that anyone can make a DataStore interface. Even people on the developer forums have issues creating one. Yes, they can use code written by someone else, but how many people check backwoods wiki articles for that, and why should users repeatedly need to complete a feature ROBLOX shipped incomplete?

  2. You’re wrong that anyone can make a (working) DataStore interface. Even if someone makes a DataStore interface, it will have bugs. As mentioned in the OP, the only way to find bugs is to wait for hundreds to thousands of people to lose data. Players lose data en masse across multiple games because developers are individually maintaining their own interfaces which each have their own bugs. DataStores are unreliable because they force you to suffer through data loss in order to get something workable.

I feel like the main point of failure is people always assuming data loads correctly. Errors are simply not fully avoidable.
Distinguishing between “no data exists” (nil returned) and “failed to get data” (error thrown) is super important, and its absence is very likely the main cause of data loss, as many people probably treat a failed load as a new player and simply override all saved data with default values.
I do not think you can generalize this in a higher-level API, simply because handling the “failed to get data” case differs per game. You can, however, document this!
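
In code, the distinction looks like this (a minimal sketch; playerStore, userId, and defaultData are placeholders):

```lua
-- nil from GetAsync means "no data exists"; a pcall failure means
-- "failed to get data". Only the first case should ever lead to
-- default data being written for the player.
local ok, data = pcall(function()
	return playerStore:GetAsync(userId)
end)

if ok then
	if data == nil then
		data = defaultData() -- genuinely a new player
	end
	-- safe to use (and eventually overwrite) data from here on
else
	-- the request failed: do NOT fall back to defaults here, or the
	-- next save will wipe whatever is actually stored for this player
	warn("Failed to load data for " .. userId .. ": " .. tostring(data))
end
```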

Data state corruption is another issue, although less common (update one key → updating another key fails). This could be improved by having an API to update multiple keys atomically.
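
Until such an API exists, one common workaround is to keep values that must stay consistent under a single key, since UpdateAsync replaces the stored value atomically. A hedged sketch (playerStore, the key, and the fields are placeholders):

```lua
-- Keeping coins and inventory under one key makes a partial update
-- impossible: the callback's return value commits as a whole, or the
-- request fails as a whole.
local ok, err = pcall(function()
	playerStore:UpdateAsync("Player_1234", function(old)
		old = old or { coins = 0, inventory = {} }
		old.coins = old.coins + 100
		table.insert(old.inventory, "Sword")
		return old
	end)
end)
if not ok then
	warn("Atomic update failed: " .. tostring(err))
end
```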

It might be useful to have an API to synchronize a value with some DataStore value, but then you still have to handle “failed to load so far” properly. Which again, differs from game to game.

From experience, error rates reduce a lot by retrying requests that throw ProvisionedThroughputExceeded up to 3 times. This could be integrated with the existing methods too (but this would also result in uncontrollably longer yield times, which some people may find undesirable).
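
A minimal sketch of that retry pattern (the backoff timing is illustrative, and it exhibits exactly the longer yield times mentioned above):

```lua
-- Retry a failed request up to 3 times, waiting a little longer
-- between each attempt.
local MAX_ATTEMPTS = 3

local function callWithRetries(fn)
	local lastErr
	for attempt = 1, MAX_ATTEMPTS do
		local ok, result = pcall(fn)
		if ok then
			return true, result
		end
		lastErr = result
		wait(attempt) -- each retry yields the caller longer
	end
	return false, lastErr
end

-- usage:
local ok, value = callWithRetries(function()
	return store:GetAsync("Player_1234")
end)
```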

Hiding all DataStore errors does not sound great in my eyes…

  • What if getting data fails? Does it yield the thread for hours until it succeeds? This can also result in tens of thousands of requests queuing up during an outage. Programmer errors can still occur! Developers may set a default value and simply override it when the Get function returns a value. Meanwhile they could’ve queued another data-overriding Set request with the default data…
  • What about many concurrent Update requests on the same key? Do they keep retrying forever and yield potentially forever, taking down any future Update request on the same key too?
  • What to do when a server wants to shut down while there’s still a large queue of requests to be submitted? Session data loss isn’t completely avoidable (and making the user aware some DataStore issues are taking place is crucial too).

I think the examples on Documentation - Roblox Creator Hub need a lot of improvement. Error handling should be explained with very high priority, not as an afterthought (that’s how it’s presented currently, imo). Illustrating common programming mistakes could improve it a lot too.

3 Likes

I’m for making the system better, but I haven’t heard of any data loss from my players. I feel like I do DataStores right, and they work pretty well for me.

Get/Set use a local cache. There’s no such thing as yielding except when the data is first loaded. Trying to set unloaded data can be circumvented by providing an IsLoaded method (similar to Player.DataReady from DP) and erroring when setting data that isn’t loaded.
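
A minimal sketch of that guard (IsLoaded and the bookkeeping are hypothetical names, in the spirit of the interface sketched earlier):

```lua
local cache = {}  -- key -> cached value
local dirty = {}  -- key -> true while a save is pending
local loaded = {} -- key -> true once the initial GetAsync has finished

local function IsLoaded(key)
	return loaded[key] == true
end

local function Set(key, value)
	if not IsLoaded(key) then
		-- fail loudly instead of silently clobbering unloaded data
		error("Cannot set '" .. tostring(key) .. "' before it has loaded")
	end
	cache[key] = value
	dirty[key] = true
end
```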

Local cache. Update updates the local cache, and queues a save request. Save requests don’t stack. If I update a key 999 times within a couple of seconds, the data is only saved once, with the latest value, when the first update’s request rises to the top of the queue. There are no update requests queued after the first one because they see that there’s already a save request in the queue (see the sketch after the list below).

  1. The number of requests will be significantly reduced from what they’d be now since save requests don’t stack. It’s extremely unlikely that there will be a large queue of requests because the maximum number of save requests in the queue is the number of keys you’ve used.
  2. The server would save requests in a manner similar to OnClose. Hold off on shutting down for 30 seconds and try to save all the data. If all of the data can’t be saved because there’s a million save requests, then that data is lost.
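
A sketch of that non-stacking queue (structure and names are hypothetical): the queue holds key names rather than values, so repeated updates to a waiting key only touch the cache.

```lua
local cache = {}  -- key -> latest value
local queued = {} -- key -> true while a save is pending for that key
local queue = {}  -- FIFO of key names; length never exceeds the key count

local function Update(key, value)
	cache[key] = value
	if not queued[key] then
		queued[key] = true
		table.insert(queue, key) -- only the first update queues a save
	end
	-- later updates only touch the cache
end

-- called by the throttled saver loop
local function processNext(store)
	local key = table.remove(queue, 1)
	if not key then return end
	queued[key] = nil
	local ok = pcall(function()
		store:SetAsync(key, cache[key]) -- always the latest value
	end)
	if not ok then
		queued[key] = true
		table.insert(queue, key) -- re-queue on failure
	end
end
```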

No, we cannot prevent users from using DataStores incorrectly. If we passed on features because users could use API members incorrectly, ROBLOX wouldn’t exist. The goal is to prevent data loss under normal and intended use of DataStores.

You haven’t explained how we’re actually supposed to solve those problems. For example, caching won’t really work if your game is generating more data per second than the (third-party) database interface allows for. The data would just pile up – but where exactly? In the game server’s RAM? On the HDD? And what do we do with all the excess data when it’s time to shut down the server?

Same for “data server errors” – the database is provided by a third party, and if it’s malfunctioning, we can’t do anything about it. Same data-piling problem.

Not sure what you’re going on about. Aside from unrelated bugs, I’ve had 0 issues with caching data and saving when the server shuts down.

When the server shuts down, I yield OnClose until all save requests are saved. If there are too many requests to be saved in 30 seconds, then that data is lost, but that’s not an issue because the cache has already been saved to the database at set intervals, and even in the worst-case scenario the maximum number of saves I have to make is #keys, since changes update the local cache and not the database. If someone is saving to thousands of keys right as the server shuts down, that’s not really an issue with the implementation – no matter how DataStores function, that’s not going to work. As far as people who use DataStores normally are concerned, saving just works magically.
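
A sketch of that shutdown flush, assuming the cache/dirty tables from the earlier sketches (OnClose being the shutdown hook of the era):

```lua
-- Hold the server open (within the ~30 second grace period) while the
-- remaining dirty keys are written out; at most one request per key.
game.OnClose = function()
	for key in pairs(dirty) do
		pcall(function()
			store:SetAsync(key, cache[key])
		end)
	end
	-- anything that still failed at this point is lost
end
```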

For piling up, I imagine you mean servers that run for long periods of time (days) because a small cache of player data from 30 players doesn’t even scratch memory usage – it’s just another variable. When servers are running for long periods of time, keys can be pushed out of the cache whenever they’re not accessed within a certain period of time or whenever the cache gets full, similar to how memory works. As far as anyone using DataStores normally is concerned though, it just automagically works and they won’t even need to check to make sure a key is in the cache before accessing it because their cache is so small.

[quote]For piling up, I imagine you mean servers that run for long periods of time (days) because a small cache of player data from 30 players doesn’t even scratch memory usage – it’s just another variable. When servers are running for long periods of time, keys can be pushed out of the cache whenever they’re not accessed within a certain period of time or whenever the cache gets full, similar to how memory works. As far as anyone using DataStores normally is concerned though, it just automagically works and they won’t even need to check to make sure a key is in the cache before accessing it because their cache is so small.[/quote]
No, I’m talking specifically about a case where the game is constantly trying to submit more data per minute than the database is willing to accept. Since the data store bandwidth is limited, the excess data generated by the application has to be stored somewhere, right? It’s going to fill up the supposed cache with data – and then what? You can’t really discard keys in this scenario – you’re going to lose data if you do.

One thing you could do to make error handling more intuitive for DataStores is to give us explicit information on whether a request succeeded and, if not, exactly how it failed. To find out why a request failed right now, we have to parse the error string, which isn’t really convenient or forward-compatible.

Just to give an idea of what I’m envisioning:

```
[RequestStatus status, Variant data] GlobalDataStore:Get(string key)
```

Unlike GetAsync, it would never cause a Lua exception; instead it would return the status and data of the request. RequestStatus is an enum indicating whether the request succeeded, or for what reason it failed otherwise.

For example, if we’re hitting the limits, then we know not to immediately retry. If the request just happened to fail due to a connection issue, then we know that we can safely retry a few seconds later. If our data is faulty and can’t be saved at all, then we know to completely throw away the request and mark it in a log. Knowing the request status (and not having to wrap these calls in pcall/etc to protect against Lua exceptions) would make handling DataStores much more intuitive.
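
Today this can only be approximated by wrapping pcall and inspecting the error string, which is exactly the inconvenience described above. A hedged sketch (the status names and the string matching are illustrative, not an official API):

```lua
-- A wrapper that never throws and returns a status string plus data.
local function Get(store, key)
	local ok, result = pcall(function()
		return store:GetAsync(key)
	end)
	if ok then
		return "Success", result
	end
	local message = tostring(result)
	if message:find("throttl") or message:find("limit") then
		return "Throttled", nil -- back off before retrying
	end
	return "Error", nil -- e.g. connection issue; retry a bit later
end

local status, data = Get(playerStore, "Player_1234")
if status == "Throttled" then
	wait(5) -- don't immediately retry against the limit
end
```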

1 Like

This is a non-issue so long as the same keys are being saved to. Let’s say I have four keys. Each key is written to 2 times a minute. The database only accepts one change per key per minute. You’re imagining something like this will happen: eight requests enter the queue every minute while only four leave it, so the backlog grows without bound.

With the suggested implementation, this would be the actual behavior: each write only updates the cache and, at most, marks its key as needing a save, so the queue never holds more than four entries.

The maximum number of requests in the queue is #keys. You don’t get an extra request for each update – all the queue contains is “Hey this key needs to be updated”. There aren’t multiple cached values for each key either – each key only has a single cache. Key4 could be last in the queue, and I can update it 100 times before it reaches the top, but throughout that whole time there’s only one cached value for Key4.

That doesn’t help much because it’s just a glorified pcall, and all of the issues in the OP still apply. DataStores are still confusing, still tedious (everyone has to finish DataStores to be able to use them), and still unreliable (regardless of how detailed the error messages are, you’ll still have bugs in your code).

What exactly is wrong with the existing Lua modules that make the entire DataStore stuff higher level, like stravant’s, which fits the most common use cases for saving/loading data? It sounds like it’s pretty much what you’re looking for (assuming it handles all edge cases properly).

I’ve also always been using raw datastore calls myself and do not experience any data loss problems, but you obviously have to be aware of the different possible cases when loading/saving data.

2 Likes

In this case, a simple Lua table would work as your cache.

I’m talking about more complex scenarios where you need UpdateAsync() with a callback and concurrent writes.
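
For reference, the concurrent-write case looks like this (globalStore and the key are placeholders): UpdateAsync hands the callback the currently stored value, so two servers incrementing the same key don’t clobber each other the way racing GetAsync + SetAsync calls would.

```lua
-- Safe concurrent increment: the transform runs against the value
-- actually stored at write time, not a possibly stale local copy.
globalStore:UpdateAsync("TotalVisits", function(old)
	return (old or 0) + 1
end)
```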

DataStores are like FilteringEnabled:
One of the main limitations is the developer using it

The DataStore API is fine.
Issues are

  • The implementation by developers.
    Not such a big problem, as there are modules that make it a lot easier.
  • The data limit: Some games want to store lots of data (mostly saving player-built stuff)
    The limit of requests should be plenty for any game, if it’s properly used.
    (If you save data every second, you’re probably doing something wrong)
    If the 10s cache is a problem, you should probably use OnUpdate anyway (sketched after this list).
    (which should in theory fire ASAP and also update the cached value, not that that matters)
  • Server issues.
    This is actually the main problem for data stores.
    When the servers have their “issues”, no data can be saved or loaded.
    Developers can deal with it, but the player won’t have/keep their data.
    If the servers work 24/7 and requests don’t take longer than a few seconds, it’s perfect.
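
For reference, a minimal OnUpdate sketch (assuming OnUpdate behaves as documented at the time; the store and key are placeholders):

```lua
-- Subscribe to a key instead of polling GetAsync through its 10s cache;
-- the callback runs whenever the key's value changes.
local connection = store:OnUpdate("GlobalMessage", function(newValue)
	print("GlobalMessage is now:", newValue)
end)
-- connection:Disconnect() when the subscription is no longer needed
```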

There’s another issue that only applies for certain games, which can’t be easily fixed:
If for some reason, all/most servers UpdateAsync the same key, it’s a bit… bad…

Not related to issues, but a highly wanted feature: Get all keys (and values?) for a datastore.
It’s possible to store keys in a table on creation, but it isn’t very clean or safe…
(mostly because of that UpdateAsync issue)
Oh, and the ability to delete keys, or SetAsync(nil).

TL;DR: DataStores are fine, just need proper servers and a way to get all or delete keys.

If users have to use the same Lua code over and over again to be able to use a feature, that’s a sign that the feature doesn’t suit users’ needs and is incomplete. Your question and its answer are no different than “Someone made a good ScrollingFrame module so why do we need ScrollingFrames?”

Whether you know it or not, you’re using a DataStore interface. Whenever you place an item in your tycoons, or whenever you get one more visitor in your recently-released tycoon, you don’t immediately save each change to the DataStore. You have manual saving, and it looks like you have a system that iterates through the park and saves everything as a whole – this is your interface. As for not experiencing data loss issues, it’s a little too early to be saying that – your most recent release has only been out for about a month.

@Den_S @Seranok @einsteinK You can claim that DataStores are fine the way they are, but you can’t pretend the myriad of threads on both the developer forums and the ROBLOX forums regarding data loss and how annoying DataStores are to use don’t exist. Most notable among these threads are ones where developers are advised to move from gamepasses over to developer products. These threads are substantial evidence that DataStores are not fine the way they are. Will post links after I get home to my desktop.

Oh, I completely forgot to mention that the suggested API is not meant for cross-server communication. The suggested API is mainly for per-player data, since player statistics, inventories, etc. are the main, and sometimes only, use of DataStores, at least in the games I’ve adapted my own DataStore interface to, like @Beac_n’s Zombie Rush, @Causiticity’s Hooked, and a number of my games. Per-player DS interfaces are the things people are copying and pasting into all of their projects, whereas cross-server communication is more feature-specific and can’t be applied to every game.

Probably the reason stuff goes wrong early on.

After a proper implementation, data loss might still happen during one of ROBLOX’s famous breakdowns.