Have DataStores cache updates automatically

When a server goes past the set limit for a DataStore, it should update a local cache instead of causing an error. When all the players leave a server and there are still updates that haven’t gone through to the database, the server should not shut down. Instead, it should enter a state in which nobody can join and wait until it’s able to send all the updates to the database.

Of course, we can implement an update cache ourselves in Lua. It’s such an essential thing, though, that it should be hardcoded, like the GET caches are. Also, we can’t implement the “waiting to send updates” server state I described above, which is pretty essential. Imagine a new game with low update limits blowing up on the front page, and the last few players to leave lose their data because the server shuts down before the cached updates can write through to the database.
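
To make the idea concrete, here is a rough sketch of the behavior being asked for. It is written in Python rather than Roblox Lua so it can run anywhere, and everything in it (FakeDataStore, WriteBehindCache, the budget counter, can_shut_down) is a made-up stand-in, not a real Roblox API. It assumes the limit can be modeled as a simple count of remaining writes.

```python
from collections import OrderedDict


class ThrottledError(Exception):
    """Raised by the fake store when the write budget is used up."""


class FakeDataStore:
    """Hypothetical stand-in for the database: allows `budget` writes, then throttles."""

    def __init__(self, budget):
        self.budget = budget
        self.data = {}

    def set(self, key, value):
        if self.budget <= 0:
            raise ThrottledError(key)
        self.budget -= 1
        self.data[key] = value


class WriteBehindCache:
    """Queues writes that hit the limit instead of raising to the caller."""

    def __init__(self, store):
        self.store = store
        self.pending = OrderedDict()  # key -> latest unsaved value

    def set(self, key, value):
        # Keep only the newest value per key, so a flush costs one write per key.
        self.pending.pop(key, None)
        self.pending[key] = value
        self.flush()

    def flush(self):
        """Write as many pending entries as the budget allows; return how many remain."""
        for key in list(self.pending):
            try:
                self.store.set(key, self.pending[key])
            except ThrottledError:
                break  # out of budget, try again on the next flush
            del self.pending[key]
        return len(self.pending)

    def can_shut_down(self):
        # The "nobody can join, wait for writes" state would last while this is False.
        return not self.pending
```

The part we cannot do ourselves is the last method: a script can track that writes are still pending, but it has no way to keep the server alive until flush() returns 0.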

I agree. Roblox’s playerbase has been growing explosively, and DataStores don’t seem able to keep up beyond saving basic information.

I made a suggestion along these lines a while back, but this would be problematic for data accessed by multiple servers. If server A is out of requests and updates key 1 to “q”, server B still sees key 1 as what it was previously. DataStores really aren’t capable of using the same key across multiple servers in games with a large number of servers, but in rare cases keys may need to be accessed across servers on a scale that doesn’t break them (for instance, making a purchase on an auction item).

Most of the use DataStores see, at least in my experience, involves data only needed in the current server, but DataStores still need to support correct syncing across multiple servers. I believe the best way to do this is to leave the current xAsync methods as they are, but add Get, Update, and Set without the Async suffix, exhibiting the behavior you described.

You could use a cache coherence protocol for this. I’m pretty sure Roblox already uses one for its GET caches. How else would it know to invalidate something cached for GetAsync when another server writes to that key?

I think this is a necessary step they need to take because it will not only make DataStores safer but also reduce the traffic going to and from the database significantly.

I don’t really think doing this is the best way of doing things. The GetAsync caching the server currently does already makes Datastores hard to use. The major problem with throttling is, in my opinion, the lack of visibility into it. You don’t know if your request got throttled or failed for some other reason, so developers do not know if they are making too many requests. Developers also cannot easily calculate their current request budget. Caching writes automatically would just make this worse, as developers would think they can safely make more requests than they actually can, and they wouldn’t know when their requests are actually succeeding.

Another major problem with this is that the number of requests made by a game per player must be kept to a reasonable level. If there is no throttling enforced (cached sets will have to be made at a later time) a game could be using a ton of bandwidth.

I think the best way to make this easier for developers is providing a really good Datastore module that includes caching. This way developers would have visibility into exactly how it worked and would be able to change it to suit their needs.

If you need to cache more data than you can save in OnClose you are pretty much doomed anyway. What if the server crashes or if a player joins another server before their cached data has been saved?

It doesn’t. I don’t ever use GetAsync specifically because of the caching. The only way to get a player’s most recent data 100% of the time is to use UpdateAsync and update the data while reading it.
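
A toy model of the difference, in Python rather than Roblox Lua so it runs anywhere. The class and method names are invented for illustration, and the model assumes the worst case described above: a per-server read cache that is never invalidated by another server’s write. The real cache may behave differently.

```python
class FakeDataStore:
    """Hypothetical shared database with an atomic read-modify-write."""

    def __init__(self):
        self.data = {}

    def update(self, key, transform):
        # The transform always sees the value currently in the database.
        new_value = transform(self.data.get(key))
        self.data[key] = new_value
        return new_value


class ServerView:
    """One game server, with its own local read cache."""

    def __init__(self, store):
        self.store = store
        self.get_cache = {}

    def cached_get(self, key):
        # Assumption: once cached, the value is served locally and never refreshed.
        if key not in self.get_cache:
            self.get_cache[key] = self.store.data.get(key)
        return self.get_cache[key]

    def fresh_read(self, key):
        # "Update the data while reading it": write back the same value,
        # which forces a trip to the database and returns what is really there.
        return self.store.update(key, lambda current: current)
```

With two ServerView objects sharing one store, a write made through the second leaves the first one’s cached_get stale, while fresh_read returns the new value. The cost is that every read now counts as a write request.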

ROBLOX features needing 3rd-party modules in order to be used and/or understood sounds yucky, and there’s no coverage to make them accessible + known to everyone using the features. Is there any reason against improving the DataStore API outright?

Roblox should provide the module and support it. We need a better way for Roblox to distribute Lua code, like the player scripts and the new chat, for sure. We would also need a way to nicely document the module on the wiki the same way APIs are documented.

Providing a Datastore API that is both powerful and easy to use is difficult. The current API includes a lot of things that are meant to make it easier to use, and they mostly do make it easier to use, but not easier to use 100% correctly. Anything we add to a Lua module would be accessible for developers to edit as necessary to suit their needs and use cases. If we implement these same things in C++, developers will not be able to see exactly what they do or edit them.

If we use a Lua module, then we’re going to need something in the API that allows all the servers of a game to communicate with each other to keep their caches coherent. (This would also make common inter-server things like public announcements a lot easier.) At this point, with how fast Roblox is growing, we’re either going to have to have better, coherent DataStore caches or pay for more (unnecessary) traffic to and from the database. Using caches is a common-sense solution IMO. They’ll even make DataStore yield functions faster.

EDIT:
Actually, we wouldn’t need to have something in the API that lets servers talk to each other directly. That’s only if we’re going to implement coherency using snooping, which scales badly when you have a lot of servers. Directory-based cache coherence is probably the better option:

Anyhow, this would still require extra stuff in the API, which I think should just be handled in C++.
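
For anyone unfamiliar with the directory idea, here is a toy sketch in Python (not Roblox Lua, and not anything Roblox is known to implement). The Directory and Server classes are invented for illustration. The directory sits next to the database and remembers which servers hold a cached copy of each key, so a write only has to notify those servers instead of broadcasting to everyone. For simplicity it is write-through: every set goes straight to the directory.

```python
class Directory:
    """Hypothetical central directory: stores values and tracks who caches each key."""

    def __init__(self):
        self.data = {}
        self.sharers = {}  # key -> set of servers holding a cached copy
        self.invalidations_sent = 0

    def read(self, server, key):
        # Record that this server now holds a copy of the key.
        self.sharers.setdefault(key, set()).add(server)
        return self.data.get(key)

    def write(self, writer, key, value):
        self.data[key] = value
        # Only servers that actually cached the key are told to drop it.
        for server in self.sharers.get(key, set()) - {writer}:
            server.invalidate(key)
            self.invalidations_sent += 1
        self.sharers[key] = {writer}


class Server:
    """One game server with a local cache kept coherent by the directory."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.directory.read(self, key)
        return self.cache[key]

    def set(self, key, value):
        self.cache[key] = value
        self.directory.write(self, key, value)

    def invalidate(self, key):
        self.cache.pop(key, None)
```

With snooping, every write would have to reach every server. Here a server that never read the key is never contacted, which is why this approach scales better when most keys are only used by one server.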

Why is cache coherency an issue for you? If you are just saving player data, you can easily save the player’s data once a minute and when the player leaves. There is never a need for other servers to know what is in another server’s cache in this case.

If there are some cases where SetAsync caches, then it no longer guarantees that a call either saves the data or fails. It is very hard for developers to make their code work 100% correctly without this guarantee. For example, say I want to save some data about a player, then teleport that player to a different place and load the data on the other side. I could think the data was saved when it really was not, and the SetAsync will execute at some unknown time in the future.
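
The teleport case can be shown with a small Python model (again not Roblox Lua, and every name here is made up for illustration). strict_set behaves the way SetAsync is described above: saved on return, or an error. caching_set is the proposed behavior: it never errors, but the write may only be queued.

```python
class Database:
    """Hypothetical shared database that accepts a limited number of writes."""

    def __init__(self, writes_allowed):
        self.writes_allowed = writes_allowed
        self.data = {}

    def try_write(self, key, value):
        if self.writes_allowed <= 0:
            return False  # throttled
        self.writes_allowed -= 1
        self.data[key] = value
        return True


def strict_set(db, key, value):
    """Either the data is saved when this returns, or it raises."""
    if not db.try_write(key, value):
        raise RuntimeError("throttled: " + key)


def caching_set(db, queue, key, value):
    """Never raises; a throttled write is only queued for some later time."""
    if not db.try_write(key, value):
        queue[key] = value


def load_after_teleport(db, key):
    # The destination server can only see what reached the database.
    return db.data.get(key)
```

If the source server is throttled, caching_set returns normally and the destination server still loads the old value. With strict_set the caller at least gets an error and can hold off on the teleport.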