DataStore Data Limit Increase

This is awesome! This can be very helpful for bigger games and it gives us the opportunity to save data in a more suitable way if needed!

1 Like

Thank you, thank you, THANK YOU Roblox! It’s going to be a long time before I worry about save space again.

yay!!! DataStores are forever changed

1 Like

Wow! This is an awesome increase! :clap:

The above is correct. I would also like to point out that the limit is in bytes after UTF-8 conversion, not in characters. There’s little difference for ASCII characters, where each character encodes to one byte, but that’s not necessarily true for characters from other alphabets. Cyrillic characters, for example, encode to two UTF-8 bytes, so one can only save 2,097,152 repeats of the Cyrillic letter ‘ю’. This was true for the old limit as well.
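
To make the byte counting concrete, here’s a quick sketch (in Python rather than Lua, since the UTF-8 math is the same anywhere; 4,194,304 is the 4 MB limit expressed in bytes):

```python
# The DataStore limit counts UTF-8 bytes, not characters.
ascii_char = "a"
cyrillic_char = "ю"

print(len(ascii_char.encode("utf-8")))     # 1 byte per ASCII character
print(len(cyrillic_char.encode("utf-8")))  # 2 bytes per Cyrillic character

LIMIT = 4_194_304  # 4 MB limit, in bytes after UTF-8 conversion
print(LIMIT // len(cyrillic_char.encode("utf-8")))  # 2097152 'ю' characters fit
```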

3 Likes

Hey all,
Awesome to see everyone so excited about the release! To answer a few of your questions:

This does not replace Large Object Stores. Keep your eyes peeled for more DataStore upgrades throughout this year! We wanted to release this in the interim so developers can start using the increase now!

You’ll be able to write data at the same speed you were writing before. The six-second cooldown between writes still exists, so make sure to account for that.

OrderedDataStore can only store integer numbers, so this change is not applicable.

8 Likes

Isn’t it weird how Roblox will let users store large chunks of data per key free of charge, yet still charges ~300 Robux for a two-minute audio upload, which wouldn’t even come close to 4 MB under most circumstances?

Data stored in data stores doesn’t go through moderation. Audio does and it’s human moderation.

1 Like

Valid point. Do you have any sources showing that all audio is moderated by humans? There’s quite a community of users on this platform sharing ‘bypassed’ audio clips.

This is great for games with large user-generated structures. Beyond that, I’d like to see a feature that returns the percentage of storage used for a key, such as DataStore:StorageAvailable(key), returning a value from 0 to 100 depending on how much storage has been used.

1 Like

As a means of compression, I’ve designed a system that converts arrays of “bytes” (just tables in Lua where each value fits into a byte, e.g. {0x00, 0x05, 0x1F, ...}, with no value greater than 0xFF) into doubles, effectively collapsing each byte table into something 1/8th of the size.

Details on the process:

Let’s say I have an array: local byteArray = {0x40, 0x7F, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00}
What this function does is take those 8 “bytes” and merge them into a double-precision float, Lua’s numeric datatype. 0x407F220000000000 as a double is 498.125. That new double value is packed into a new table, and a table of these packed values is saved to the data store.
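
The same packing can be sketched outside Roblox with Python’s struct module (an illustration of the idea, not the poster’s actual Lua code):

```python
import struct

byte_array = [0x40, 0x7F, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00]

# Interpret the 8 bytes (big-endian) as one IEEE-754 double.
packed = struct.unpack(">d", bytes(byte_array))[0]
print(packed)  # 498.125

# Unpacking the double recovers the original byte table.
unpacked = list(struct.pack(">d", packed))
print(unpacked == byte_array)  # True
```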

When I write numbers into a data store, is any conversion done to the values e.g. if I save {123, 456, 789} into a data store, will the numbers themselves take up 24 bytes of data (8 bytes for each number), or is that converted to some other form that will make each number take more (or less) data? I’m seeing a lot about UTF-8 conversion but I don’t know if that applies here or not.

1 Like

Wow. I’m just starstruck. If my math is right, then thanks to this limit increase, my maximum ghost data size just went from 16 minutes up to a whopping 4 hours. I… I don’t even think I’ll ever need to save a replay that long, but… wow. Thank you guys so much for this. I don’t think I’ll ever need to worry about storage limits ever again.

I wouldn’t do that if I were you. On the surface, it seems fine, but there are about 9,007,199,254,740,987 values that are completely unrepresentable while doing that – the bit pattern of nan isn’t preserved, so doubles in the ranges [0x7ff0000000000001, 0x7fffffffffffffff] and [0xfff0000000000001, 0xffffffffffffffff] are all representable exactly the same in Lua.

See: https://float.exposed/0x7ff0000000000001 if you’re curious.
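
For illustration, both ends of that range decode to NaN (Python shown; note that Python’s struct happens to keep NaN payload bits around, which Lua does not guarantee — the point here is only that the whole range is NaN):

```python
import math
import struct

# Every bit pattern in [0x7ff0000000000001, 0x7fffffffffffffff] is a NaN,
# so distinct byte patterns in this range collapse into "the same" Lua value.
a = struct.unpack(">d", (0x7FF0000000000001).to_bytes(8, "big"))[0]
b = struct.unpack(">d", (0x7FFFFFFFFFFFFFFF).to_bytes(8, "big"))[0]
print(math.isnan(a), math.isnan(b))  # True True
```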

3 Likes

Values are stored as JSON, so numbers would be extremely inefficient. {123, 456, 789} would be stored literally as [123,456,789], or 13 bytes. Depending on the exact value, a single number can serialize to as many as 24 bytes.
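
You can check those sizes with any JSON library; a sketch in Python:

```python
import json

data = [123, 456, 789]
encoded = json.dumps(data, separators=(",", ":"))  # compact, no spaces
print(encoded)       # [123,456,789]
print(len(encoded))  # 13 bytes

# A full-precision double serializes far longer than its 8 in-memory bytes.
long_number = json.dumps(1 / 3)
print(long_number)       # 0.3333333333333333
print(len(long_number))  # 18 bytes for a single number
```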

The lowest overhead you can get is storing a string encoded in Base64 or something. I have a Base85 module written specifically for this purpose. It has an overhead of 25% (or 3,355,440 effective bytes), which beats Base64’s 33%, and certainly beats the overhead of an array of numbers.
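
The poster’s Base85 module isn’t shown here, but the overhead ratios are easy to verify with Python’s standard library (b85encode uses a different alphabet than a Roblox-safe module likely would, yet the 5-chars-per-4-bytes ratio is the same):

```python
import base64

raw = bytes(range(60))  # 60 arbitrary bytes of binary data

b64 = base64.b64encode(raw)  # 4 output chars per 3 input bytes
b85 = base64.b85encode(raw)  # 5 output chars per 4 input bytes

print(len(b64), len(b64) / len(raw))  # 80 -> ~33% overhead
print(len(b85), len(b85) / len(raw))  # 75 -> 25% overhead
```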

2 Likes

In my game, still under construction, I will have to store literally tens of thousands of objects, each with its X, Y, Z position and Y orientation. Even if I truncate the value of each axis to 4 decimal places, I will still need a large amount of space.

I’m using DataStore2.

I have already created a dictionary system, but I believe that this will still not be enough.
In my case, for example, I will have to store each object as for example:

{ObjName = "Object001", Position = {1.23456789, 2.34567890, 3.45678901}, OrientationY = -90}

… where, using a dictionary system and truncating the numbers to 4 decimal places, I would still have a JSON string for each object, like this:

[4321,[1.2346,2.3457,3.4568],-90]

… where elements 1, 2, and 3 correspond to “ObjName”, “Position”, and “OrientationY” via dictionary translation.

That is, each object will occupy at least 40 bytes; if I have 10,000 objects in the game, it will be at least 400,000 bytes (0.38 MB) PER PLAYER, in addition to other less bulky data that I will have to store.

This means that, even with this recent increase of the storage limit to about 4,000,000 characters, just 10 players with the same number of items would overflow my limit.

Am I right in these calculations?
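
As a rough sanity check of that estimate (Python used just to measure the JSON; real saves add overhead such as name lookup tables, so landing a bit under the ~40-byte figure is expected):

```python
import json

# One object: keys replaced by array positions, coordinates rounded to 4 places.
position = [round(c, 4) for c in (1.23456789, 2.34567890, 3.45678901)]
obj = [4321, position, -90]

encoded = json.dumps(obj, separators=(",", ":"))
print(encoded)                # [4321,[1.2346,2.3457,3.4568],-90]
print(len(encoded))           # 33 bytes per object
print(len(encoded) * 10_000)  # 330000 bytes for 10,000 objects
```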

You can greatly decrease the size by imposing limits in your game and then storing the data in a binary format. If you choose those limits with binary storage in mind, you’ll be able to store data much more compactly.

For example, let’s say you have a game where you can arrange furniture in a room. The room is flat; objects can’t be placed above or below each other, so that’s already a limit. You’re also not going to let the player place furniture outside the room, so there’s another limit. You also likely don’t need furniture to be placed with extreme precision; no player is going to notice or care that their couch isn’t placed exactly 0.01 studs off the wall.

You might decide that 1/4 of a stud is a fine amount of precision, and that a room sized 64 by 64 studs is plenty big enough. Choosing these particular limits would allow you to store each coordinate in just a single byte, or 2 bytes for the entire position of one object.
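
A minimal sketch of that quantization (Python for illustration; STEP and ROOM are the assumed limits from the example above):

```python
STEP = 0.25                       # placement precision, in studs
ROOM = 64.0                       # room is 64 x 64 studs
MAX_INDEX = int(ROOM / STEP) - 1  # 255, so one coordinate fits in one byte

def encode_coord(x: float) -> int:
    """Quantize a coordinate in [0, 64) to a single byte."""
    return min(MAX_INDEX, max(0, round(x / STEP)))

def decode_coord(b: int) -> float:
    return b * STEP

# The whole 2D position of one object fits in 2 bytes.
pos = (12.25, 3.5)
packed = bytes(encode_coord(c) for c in pos)
print(list(packed))                       # [49, 14]
print([decode_coord(b) for b in packed])  # [12.25, 3.5]
```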

I once designed a binary format for storing bricks of various size, color, material, and so on. With the previous limit of 256KB, I calculated that it would be able to store over 19,000 bricks. With the new limit, that would be over 300,000.

I’ve been meaning to write a post that goes into more detail on this sort of thing. Perhaps I should work on that.

5 Likes

Thanks, I’ll try your Base85 code also.
But I’d like to confirm: is the 4,000,000-character limit per PLAYER, or the sum across ALL PLAYERS?
I hope it’s a limit for each player, so I don’t have to worry whether I’ll have 1 or 10,000 players, since there would be no space limit on the SUM of all players, just 4,000,000 characters for each individual player, right?
I’m using DataStore2.

The limit is per DataStore key, and you get an unlimited number of keys. If you wanted to, you could store unlimited data (don’t, though). How keys, and the data within those keys, are organized is left entirely up to the developer.

I’m not sure how DataStore2 organizes data. It may be focused on the most common use-case where keys are assigned per player. You’d have to ask @Kampfkarren for more detail.

1 Like

This should only apply per player, not as a global cap for all players.

2 Likes

I was thinking of creating a compression system just for the 16 characters that appear in a JSON numeric array: the digits 0–9 plus six others (. , + - [ ]). Each symbol would fit into just 4 bits.
In this way, it would be possible to store 2 symbols in a single byte, i.e. 50% compression.
This would be excellent, especially when storing many numbers, CFrames, etc.

But now I see that this is impossible, because JSON does not accept every byte value from 0 to 255.
As you said, Base64 is used to ensure that any source byte results in JSON-safe characters; however, a simple test (https://cryptii.com/pipes/binary-to-base64) shows that 4 bytes of value 255 (11111111 11111111 11111111 11111111) become 8 Base64 characters (/////w==).
Thus, an efficient compression system is difficult due to these JSON limitations.
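
For what it’s worth, the 4-bit idea can still net a saving even after a Base64 pass makes the result JSON-safe: 0.5 × 4/3 ≈ 0.67 of the original size for long inputs. A sketch (the 16-symbol alphabet and the ‘]’ padding are made-up choices for illustration):

```python
import base64

ALPHABET = "0123456789.,+-[]"  # 16 symbols -> 4 bits each
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def pack(text: str) -> bytes:
    """Pack two 4-bit symbols into each output byte."""
    if len(text) % 2:
        text += "]"  # pad to an even number of nibbles (hypothetical choice)
    return bytes((INDEX[a] << 4) | INDEX[b] for a, b in zip(text[::2], text[1::2]))

json_array = "[1.25,-3.5,42]"
packed = pack(json_array)        # 14 characters -> 7 bytes
safe = base64.b64encode(packed)  # re-encoded so the data store accepts it

print(len(json_array), len(packed), len(safe))  # 14 7 12
```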

1 Like