Pondering DataStores - Why DataStore2?

PurpleHorseMint · January 27, 2023, 5:08pm

Hello there, I’ve been a developer for a long time but recently started poking around the Roblox ecosystem. I found the Data Store limitations an interesting problem to solve and built a proof-of-concept implementation for my game project.

I don’t know if this is the right forum or considered good form, but I have questions about someone else’s (publicly available, published) code, specifically DataStore2.

Prior to discovering the DataStore2 package, I built my own simple caching solution that throttles updates to the store. It seems to work reasonably well as a proof-of-concept. Later, I discovered DataStore2 exists and given the number of references to it and claims of reliability, I started digging into the implementation.

Functionally our caching mechanism is very similar (trivially, hold a per-player table in memory) but beyond that, the design to prevent data loss diverges.

My concept was this: hold data store updates in memory just long enough to avoid hitting API quota limits, then write them. Then tune the frequency of writes to account for the number of data stores you want to write to.
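
A minimal sketch of that concept, written in Python rather than Luau so it runs outside Roblox. The class name `ThrottledCache`, the `write_fn` callback standing in for a data store write, and the interval value are all assumptions for illustration; this is not the implementation described in this post, and nothing here comes from DataStore2.

```python
import time


class ThrottledCache:
    """In-memory cache that writes a dirty key at most once per interval."""

    def __init__(self, write_fn, min_interval_s, clock=time.monotonic):
        self._write_fn = write_fn            # hypothetical stand-in for a data store write
        self._min_interval_s = min_interval_s
        self._clock = clock                  # injectable so the throttle can be tested
        self._values = {}
        self._dirty = set()
        self._last_write = {}

    def set(self, key, value):
        # Updates only touch memory; the key is marked for a later write.
        self._values[key] = value
        self._dirty.add(key)

    def get(self, key):
        return self._values.get(key)

    def flush_due(self):
        """Write every dirty key whose throttle interval has elapsed."""
        now = self._clock()
        written = []
        for key in sorted(self._dirty):
            last = self._last_write.get(key)
            if last is None or now - last >= self._min_interval_s:
                self._write_fn(key, self._values[key])
                self._last_write[key] = now
                written.append(key)
        self._dirty.difference_update(written)
        return written
```

Calling `flush_due()` from a periodic loop gives the "write occasionally" behavior; raising `min_interval_s` is the tuning knob for writing to more stores under the same quota.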

DataStore2 takes a different approach of storing everything in memory until the player leaves or the server shuts down and then writing (unless explicitly asked to save). They also use the so-called “OrderedBackups” or “berezaa’s method” of saving data which appears to be saving every instance of player data changes over time.

This raised some questions in my mind that I cannot find answers to.

NOTE: These questions are out of curiosity - I assume I may be missing something and/or that DataStore2 was developed with requirements or limitations I do not understand or are no longer relevant (it looks like it was developed a few years ago).

1. Why is it a good idea to only write data to the store when the player leaves (again, unless the save API is explicitly invoked, which is clearly not the “default/encouraged” way to use it) when the server could potentially shut down at any time? While it may be rare, I assume servers crash sometimes for any number of reasons (developer error, Roblox service problems), and a system that holds updates only in memory until the player leaves seems to have a significant hole in reliability. In theory, a trivial hedge against this risk is to occasionally write cached data.
2. Why is “OrderedBackups” or “berezaa’s method” of saving data needed at all when Versioning appears to do the same thing? I assume versioning is more performant given it is leveraging internal APIs, not to mention that there is now more support for managing the store through the DataStore cloud API.
3. Here the DataStore2 docs claim:

In normal data stores, you’d save all your data into one giant player data table to minimize data loss/throttling.

I believe this claim rests on the assumption that DB writes are happening very infrequently, given that in a trivial example, writing 256KB once to one store is less efficient than writing 16KB to three stores (assuming the bottleneck is moving the bits, not the connection setup/takedown). Is that the case? Are data store connections so fragile that making 1 write is that much more reliable than making 3 writes?

Expanding on this a bit, perhaps this is related, I’ve never quite understood this piece of Roblox Data Stores documentation, Create Fewer Data Stores. It says,

Data stores behave similarly to tables in databases. Minimize the number of data stores in an experience and put related data in each data store. This approach allows you to configure each data store individually (versioning and indexing/querying) to improve the service’s efficiency to operate the data. As more features become available, this approach also allows you to more easily manage data stores.

Specifically, why, “Minimize the number of data stores…”? Maybe I’m overthinking this but if data stores act just like tables in a database, I would find it strange to read a database doc saying, “Minimize the number of tables you make…”. The number of tables an app needs is dictated by the requirements of the app and the database design - nothing more, nothing less. So the only translation of this that makes sense to me is, “Do good database design”. Which feels out of place here at best. But maybe this documentation is just geared for folks that are inexperienced using databases. Or, again, maybe I don’t understand.

That’s all I’ve got for now… again, take this in the spirit it is intended - it is not a criticism of DataStore2. I am trying to learn more about the limitations and constraints Roblox developers must take into account when building a reliable interface to data stores. These are questions I’ve been unable to find answers to in documentation or other forum posts.

Thank you!

@Kampfkarren if you have a minute, would love your thoughts

Edit: here’s a link to a review of my own (possibly naive) caching implementation

Kampfkarren · January 27, 2023, 11:02pm

It’s usually a good idea to autosave, and especially save when someone purchases. I just don’t think it’s the place for the data store library to do this.

It didn’t exist at the time. The “Standard” saving method is recommended now.

Why do you ever want to risk the possibility of one data store write working, but others not? Why delay for every single get rather than on just one? There might as well not be a storage limit at all unless you’re doing something monumentally cool. It’s not a matter of anything other than logistics.

Kampfkarren · January 27, 2023, 11:03pm

IMO the biggest reason to use DataStore2 is that it is a dead simple API that has been battle tested to hell and back, and lets you not have to worry about throttling. The data store APIs are a terrible thing to get wrong.

PurpleHorseMint · January 28, 2023, 12:48am

@Kampfkarren thanks for taking the time to read and respond - appreciate it!

The data store APIs are a terrible thing to get wrong.

Couldn’t agree more - that’s why I posted

Broadly, your comments have confirmed what I thought. In summary, versioning was not around when the “OrderedBackups” method was developed, and you’re making tradeoffs in server-shutdown risk and database writes that largely make sense for many Roblox games.

I’ll respond to a couple of things you said, again, not to criticize the choices you made but only as a matter of elucidation for other readers. I actually admire DataStore2 - the code is well written, leverages OOP in a nice way and recreates JavaScript promises as well as several promise library / ES6 promise handlers that can be tricky! I know because I’ve done the same in TypeScript. Anyway, point is - DataStore2 is a nice, valuable piece of implementation. But every library must make choices and my previous questions (and following responses) were primarily to understand what drove those choices.

It’s usually a good idea to autosave, and especially save when someone purchases. I just don’t think it’s the place for the data store library to do this.

It depends on how well your target customer (of the API) understands the tradeoffs and risks. The getting started docs and guides clearly have a bent towards “no need to save”. And it is an interesting assertion that, paraphrasing, it isn’t the place for a data store library to, well, save data. I’d suggest a data store API should save data in whatever way best serves the purposes of the end API user. I assume you believe this is what it is doing but the point stands that saving occasionally as a backend process would fill a gap in server failure modes that doesn’t appear to exist today.

Why do you ever want to risk the possibility of one data store write working, but others not?

Because it’s all about tradeoffs - the choice you made has the downside of requiring larger writes, likely often with much of the data never having been updated. When coupled with the above (were you to add in occasional writes), this becomes an even bigger issue, because now you have to consider the performance of those larger writes while the game is running, probably under variable loads where sometimes it might be fine and sometimes it might not be.

A library could make 16 writes of 256 bytes quite efficiently over the course of a minute, avoid the “autosave” problem and probably have little performance impact. If I design everything to be in one blob, this becomes much more problematic.
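
One way to picture "16 writes over the course of a minute" is an evenly staggered schedule. The sketch below is in Python and the function name `stagger_writes` is hypothetical; it only computes when each write would start and makes no claim about how the Roblox data store service behaves under that schedule.

```python
def stagger_writes(keys, window_s):
    """Return (offset_seconds, key) pairs spreading the keys evenly over window_s."""
    if not keys:
        return []
    step = window_s / len(keys)  # gap between consecutive writes
    return [(i * step, key) for i, key in enumerate(keys)]
```

With 16 keys and a 60 second window, the writes land 3.75 seconds apart, so no single moment carries more than one small request.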

As you say, apparently DataStore2 works well so perhaps you made the right tradeoffs. I was largely interested to understand what tradeoffs were under consideration and/or if there was something I was largely missing. I think it’s just tradeoffs so I appreciate you enlightening me on this!

Before being aware of DataStore2 I wrote my own proof-of-concept for a caching mechanism that “writes occasionally”, which is the alternative approach I’ve been describing. Would love your thoughts on it if you have time!

Kampfkarren · January 29, 2023, 11:21am

There isn’t any problem with larger writes, and I don’t think you need to consider the performance at all. It’s all I/O. Nothing is blocked. Compare to the extremely non-theoretical problem of slower saving AND loading from multiple requests, as well as the added instability. If there’s a tradeoff there’s an absurdly clear winner here.

PurpleHorseMint · January 29, 2023, 5:11pm

Since the max table size is 4MB, any data store library should be doing the math based on objects near that size e.g. 3.9MB. So in some scenario where I store everything in one blob, at 3.9MB, change 1 string in that blob e.g. “yes” to “no” then call save, DataStore2 will write 3.9MB to the data store when exactly 3 BYTES changed.
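
To make the payload arithmetic concrete, here is a rough Python sketch that compares the bytes serialized when saving everything as one blob against saving only the changed fields. The helper names and the use of JSON as the wire format are assumptions for illustration. It measures payload size only; it says nothing about request latency or failure rates, which is the other side of this argument.

```python
import json


def blob_write_size(data):
    """Bytes serialized when the whole table is saved as a single value."""
    return len(json.dumps(data).encode("utf-8"))


def split_write_size(data, changed_keys):
    """Bytes serialized when only the changed top-level fields are saved."""
    return sum(len(json.dumps(data[key]).encode("utf-8")) for key in changed_keys)
```

For a table with one large field and one short string, changing only the short string costs a handful of bytes in the split layout and the full table size in the single-blob layout.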

Given that “extremely non-theoretical problem” it seems absurd to claim:

There isn’t any problem with larger writes, and I don’t think you need to consider the performance at all

And this is an assertion without any proof:

Compare to the extremely non-theoretical problem of slower saving AND loading from multiple requests, as well as the added instability.

I’ve given several specific examples of the advantages of using multiple writes - but this is the second time you’ve claimed they are “slower” and less stable without evidence. Setting up an HTTP(s) connection is trivial and as prone to failure as any other HTTP(s) call, of which HUNDREDS if not THOUSANDS are happening every minute within any Roblox game. Given that, to claim that making 16 calls to the data store in a minute is inherently “worse” than, e.g., making one, due only to the number of calls and without explaining why that would be true, particularly in light of the advantages of doing so, is facile.

You are clearly entrenched in your position. I suspect we don’t have much else to learn here. I wish you the best.