I should warn you that Roblox’s DataStore limits in 2026 are going to make voxel games very challenging, or potentially unrealistic altogether.
The real issue here is the cap of 100 MB plus 1 MB per unique user who has played your experience.
Firstly, you won’t even get the full 4 MB that an individual DataStore key would normally allow:
Each player effectively gets a maximum of 1 MB worth of data. Yes, that initial 100 MB buffer will cover it for a while, and this won’t hurt 99% of experiences, but it hurts this genre a ton. Each player’s data for a given world grows as they keep playing, and they may also create multiple worlds, each of which grows the same way. Now yes, not every player will play actively or use their full 1 MB, but relying on that is not how you should approach this sort of thing. You should always plan for the worst-case scenario so you’re prepared for it.
But let’s say you didn’t have those limitations. Firstly, you’ll want to optimize your data as much as possible; the way you have it now, it’s going to fill up very quickly. I’ll use my own project, which is very similar to Minecraft, as an example. Each chunk is 16 blocks wide, 16 blocks deep, and 256 blocks tall, for a total of 16 × 16 × 256 = 65,536 blocks. The worst-case scenario for a chunk is that every single block is filled and differs from the seed. The most compact way to save a basic block is 5 bytes per block, aka 5 characters. Using an exact number of characters per block means no commas are needed as separators, which actually saves a lot of space. That comes to 327,680 bytes per chunk, leaving room for about 11 more chunks in a single DataStore key.
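To make the arithmetic above concrete, here’s the sizing worked out in Lua (the 4 MB figure is the standard per-key limit; the variable names are just illustrative):

```lua
-- Back-of-the-envelope chunk sizing against a 4 MB DataStore key.
local BLOCKS_PER_CHUNK = 16 * 16 * 256          -- 65,536 blocks per chunk
local BYTES_PER_BLOCK  = 5                      -- 5 characters per saved block
local bytesPerChunk    = BLOCKS_PER_CHUNK * BYTES_PER_BLOCK  -- 327,680 bytes
local keyLimit         = 4 * 1024 * 1024        -- 4,194,304 bytes per key
local chunksPerKey     = math.floor(keyLimit / bytesPerChunk)
print(bytesPerChunk, chunksPerKey)              -- 327680, 12
```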
So, why 5 bytes per block? Well, to answer that, I’ll first have to explain what a byte is. A byte is 8 bits, each of which can be a 0 or a 1. With 8 bits, that gives a total of 256 possible combinations of 0s and 1s. This is calculated by 2 ^ numBits, so 2 ^ 8 = 256. We can store a lot of data with that. The first two bytes of the block will be the positional data. But instead of storing large absolute positions (the further a block is from the world center, the larger its coordinates get), we’ll store an offset from the chunk’s origin instead. This means the block’s positional data only needs to cover 0 to 15 horizontally within the chunk, or 0 to 255 vertically.
This means the height can be one byte. The X and Z axes can actually share a byte by splitting it in half. If a byte is 8 bits, half a byte is 4 bits, and doing the math again, half a byte gets you 16 combinations (2 ^ 4 = 16). So X and Z each get half a byte, combined back into a single byte. The third byte is also halved: one half for the light level of 0 to 15, and the other half for the block direction. (That’s a bit generous, since a block can only face 6 directions, but the spare bits could also hold block states, and you can vary how many bits each field gets depending on what you actually want to store.) The final two bytes are for the block ID: 2 bytes gives 16 bits, which means 65,536 possible block IDs (2 ^ 16 = 65,536).
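Splitting a byte in half like this is just arithmetic: the first value goes in the upper 4 bits (multiply by 16) and the second in the lower 4 bits. A minimal sketch, with hypothetical helper names:

```lua
-- Pack two 4-bit values (each 0-15) into one byte, and unpack them again.
local function packNibbles(hi, lo)
	return hi * 16 + lo            -- hi occupies the upper 4 bits, lo the lower 4
end

local function unpackNibbles(byte)
	local hi = math.floor(byte / 16)
	local lo = byte % 16
	return hi, lo
end

print(packNibbles(15, 3))          -- 243
print(unpackNibbles(243))          -- 15, 3
```

The same pattern covers both the X/Z byte and the light/direction byte.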
All of this means you can fit a total of 12 chunks per DataStore key. However, this doesn’t account for entities, mobs, or even special block types like storage blocks; those all have their own methods of optimization as well. You actually have 262,144 bytes left over in the key on top of the 12 chunks, so you can use that to store the rest, but you’ll have to impose some sort of limit so the total can’t exceed the cap. This can be optimized much further, too. Obviously, you’ll be using a seed so that the only thing you need to save is the changes made to the world. And if we do that, we don’t need to fill a whole chunk’s worth of data for a single block change. So you could use a dynamic optimization system that groups chunk data together, covering far more chunks at once and greatly increasing the number of edited chunks you can load at one time.
I nearly forgot to clarify that these bytes are written as characters via string.char and read back with string.byte. You will need your own scheme for handling values larger than 255, though, since string.char accepts numbers from 0 to 255.
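Putting the whole 5-byte layout together with string.char/string.byte looks roughly like this. The function names and exact field order are my own illustration of the layout described above, not a fixed format:

```lua
-- Hypothetical encode/decode for one block using the 5-byte layout:
-- byte 1: Y offset (0-255), byte 2: X/Z nibbles, byte 3: light/direction
-- nibbles, bytes 4-5: 16-bit block ID split into high and low bytes.
local function encodeBlock(x, y, z, light, dir, id)
	return string.char(
		y,                      -- vertical offset within the chunk
		x * 16 + z,             -- X in the upper nibble, Z in the lower
		light * 16 + dir,       -- light level and facing direction
		math.floor(id / 256),   -- high byte of the 16-bit block ID
		id % 256                -- low byte of the 16-bit block ID
	)
end

local function decodeBlock(s, i)
	-- i is the 1-based index of the block's first character in the string
	local y, xz, ld, hi, lo = string.byte(s, i, i + 4)
	local x, z = math.floor(xz / 16), xz % 16
	local light, dir = math.floor(ld / 16), ld % 16
	return x, y, z, light, dir, hi * 256 + lo
end

local packed = encodeBlock(5, 200, 12, 15, 3, 1000)
print(#packed)                  -- 5
print(decodeBlock(packed, 1))   -- 5, 200, 12, 15, 3, 1000
```

Because every block is exactly 5 characters, block n of a chunk string always starts at character (n - 1) * 5 + 1, which is what makes the no-separator format work.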
But do keep the 2026 DataStore cap in mind, as it kind of kills this sort of project: within 1 MB you’d only be able to store 3 chunks plus 65,536 leftover bytes (at least before the dynamic grouping optimization). I really hope Roblox decides to go back on this update, as it makes this genre unviable unless you’re okay with severe limitations. I have ideas for how it could still work, but it’d definitely be a worse experience in comparison.
Now I’ll answer your other questions.
I’ve basically answered this already: you only need to save chunks that have altered data. Everything else is regenerated from the seed. So when the chunk loads from the seed, apply the chunk’s saved modifications on top of the default generated data.
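The load step above can be sketched like this; the table shapes and names here are illustrative stand-ins (a real implementation would decode the packed strings into whatever block representation you use):

```lua
-- Sketch: regenerate a chunk from the seed, then overwrite with saved edits.
-- seedBlocks and edits are both maps from a position key to a block ID.
local function loadChunk(seedBlocks, edits)
	local chunk = {}
	for key, blockId in pairs(seedBlocks) do
		chunk[key] = blockId           -- default world from the seed
	end
	for key, blockId in pairs(edits) do
		chunk[key] = blockId           -- saved changes win over seed output
	end
	return chunk
end

local chunk = loadChunk({ ["0,64,0"] = 1, ["1,64,0"] = 2 }, { ["1,64,0"] = 9 })
print(chunk["0,64,0"], chunk["1,64,0"])  -- 1, 9
```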
All of this is done with Perlin noise via math.noise. Noise can be used in many different ways: terrain, caves, ore, spawn locations, everything. And yes, this is a multi-stage process; even Minecraft generates in stages.
I also answered this already, but to explain more clearly: you’ll be doing a bit of both. The raw block data is a single long string containing every modified block, using the 5 characters I mentioned. But you’ll use a table structure to identify individual chunks, as well as to hold more specific chunk data, such as pointers to storage containers.
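As a rough picture of that hybrid layout (the key format and field names here are made up for illustration):

```lua
-- Hypothetical save structure: a table keyed by chunk coordinates, where each
-- entry pairs the packed block string with extra per-chunk data.
local function chunkKey(cx, cz)
	return cx .. "," .. cz
end

local save = {}
save[chunkKey(3, -2)] = {
	-- one packed 5-byte block record (y=200, x=5/z=12 nibbles, light/dir=0, id=1)
	blocks = string.char(200, 5 * 16 + 12, 0 * 16 + 0, 0, 1),
	containers = {},   -- e.g. storage-block contents, tracked separately
}

print(#save["3,-2"].blocks)   -- 5
```

The table gives you cheap lookups per chunk, while the strings inside keep the bulk block data as dense as possible.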