Compression for limited data (Datastore)

So I am trying to make a ‘Minecraft’-style block game with the ability to save the world.
I am trying out my new data saving system, as my older one cannot save the data within 1 key. I am currently saving the name of each block and its position within the world.
The scripts related to the data saving are down below.
I have not yet searched the dev forum for ways of compressing the data, but I do not think it is possible with such a limited amount of data.

If anyone has any ideas on how to compress this further, that would be great!
Blocks per chunk for grass: 256
Chunks generated for test: 50 x 50
256 * (50 * 50) = 640,000 blocks in total.
When I measure the world size, it takes 59 keys to save.
I CAN’T have a single player’s world take up that much space.

World data holder script
local HttpService = game:GetService("HttpService")

local module = {}
local GameData = {}

-- Chunks are 16 blocks wide; positions 1..16 map to chunk index 0.
local CHUNK_SIZE = 16

-- Left-pads a chunk index with zeros to at least 3 characters.
local function PadIndex(Index)
	local Text = tostring(Index)
	return string.rep("0", 3 - #Text) .. Text
end

local function GetChunkName(Position)
	-- Same result as the original counting loops for coordinates of 1 and above.
	-- Coordinates below 1 now get their own negative index instead of all sharing -1.
	local ChunkX = math.floor((Position.X - 1) / CHUNK_SIZE)
	local ChunkZ = math.floor((Position.Z - 1) / CHUNK_SIZE)
	return "Chunk|" .. PadIndex(ChunkX) .. "|" .. PadIndex(ChunkZ)
end

function module.GetData()
	return GameData
end

function module.GetChunk(X, ChunkName)
	return GameData[ChunkName]
end

-- Size of the whole world in characters once JSON-encoded.
function module.GetWorldDataSize()
	return #HttpService:JSONEncode(GameData)
end

function module.SetBlock(X, Block, Data)
	local BPosition = Data["Position"]
	if BPosition == nil then
		return
	end
	local BChunkName = GetChunkName(BPosition)
	local Chunk = GameData[BChunkName]
	if Chunk == nil then
		Chunk = {}
		GameData[BChunkName] = Chunk
	end
	-- Each block is keyed by its position string and stores {block name, extra data}.
	Chunk[tostring(BPosition)] = {tostring(Block), Data["Data"]}
end

return module

Assuming you’re generating your world with some kind of noise and a pseudo-random number generator, you should be able to save just the seed plus any changes made by the player. That way you wouldn’t need to save an entire chunk of data, since the seed should produce the same chunk every time for a given X, Z coordinate. Just throwing that out there.
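
A minimal sketch of the seed-plus-changes idea in plain Lua. `generateBlock`, `setBlock`, and `getBlock` are hypothetical names, and the height formula is only a stand-in for whatever noise function the game really uses; the one requirement is that it is deterministic for a given seed and coordinate.

```lua
-- Hypothetical stand-in for the real terrain generator.
-- It must return the same block for the same seed and coordinate.
local function generateBlock(seed, x, y, z)
	local height = (seed + x * 3 + z * 7) % 4 + 1
	if y > height then
		return "Air"
	elseif y == height then
		return "Grass_Block"
	end
	return "Dirt_Block"
end

-- Only the seed and the player's edits would go into the datastore.
local World = { seed = 1234, changes = {} }

-- Record an edit only when it differs from what the generator produces.
local function setBlock(x, y, z, name)
	local key = x .. "," .. y .. "," .. z
	if generateBlock(World.seed, x, y, z) == name then
		World.changes[key] = nil -- back to the generated state, nothing to save
	else
		World.changes[key] = name
	end
end

-- Edited blocks win; everything else is regenerated from the seed.
local function getBlock(x, y, z)
	local key = x .. "," .. y .. "," .. z
	return World.changes[key] or generateBlock(World.seed, x, y, z)
end
```

With this layout an untouched world costs one number, and the save only grows with the number of blocks the player has actually changed.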

Yeah, I was thinking about adding that. But I still want to cover the case where someone actually edits that many blocks within the game.

I have found this, related to disk sizes for Minecraft worlds.

But I still wonder whether I’m doing something wrong or whether this is already compressed as well as it can be. It seems I would always need more than one key for a player’s world data.


First a minor-ish improvement:
Store block IDs, not names.

Then a more major one:
Always store a chunk in the same order. This way, you only need to store the blocks, not their position
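
A rough sketch of the fixed-order idea in plain Lua. The chunk dimensions and the function names (`toIndex`, `toPosition`, `encodeChunk`, `decodeChunk`) are assumptions for illustration; ID 0 is assumed to mean air.

```lua
local SIZE_X, SIZE_Y, SIZE_Z = 16, 4, 16 -- hypothetical chunk dimensions

-- Position inside the chunk (0-based) -> slot in the flat list (1-based).
local function toIndex(x, y, z)
	return (y * SIZE_Z + z) * SIZE_X + x + 1
end

-- Slot in the flat list -> position inside the chunk.
local function toPosition(index)
	local i = index - 1
	local x = i % SIZE_X
	local z = math.floor(i / SIZE_X) % SIZE_Z
	local y = math.floor(i / (SIZE_X * SIZE_Z))
	return x, y, z
end

-- Write every slot in order, so no position has to be stored.
local function encodeChunk(ids)
	local parts = {}
	for index = 1, SIZE_X * SIZE_Y * SIZE_Z do
		parts[index] = tostring(ids[index] or 0)
	end
	return table.concat(parts, ",")
end

-- Read the IDs back; the slot number alone tells us where each block goes.
local function decodeChunk(text)
	local ids = {}
	for token in string.gmatch(text, "[^,]+") do
		ids[#ids + 1] = tonumber(token)
	end
	return ids
end
```

The trade-off is that empty slots are written too, so this pays off most when chunks are mostly full.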

You could compress this a lot more if you use something similar to a Huffman tree. If you don’t know what it is: given how often each symbol appears, it produces the most efficient prefix code that encodes one symbol at a time. Instead of text, we compress block IDs, so common blocks get short codes and rare blocks get longer ones. Combined with the fixed chunk block order, you would be able to read the terrain back. I’m bad at explaining, so I’m going to link this video. Key parts that you need to know about it:

  • it’s very efficient
  • you only need to store: a single string, and the tree.

If you actually decide to go for this, then please let me know so I can explain it a lot better.
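
For reference, a small sketch of Huffman coding over block IDs in plain Lua. The function names are hypothetical, and the output is a string of "0" and "1" characters purely for readability; a real save would still need to pack those bits into bytes or characters.

```lua
-- Build a Huffman code table (ID -> bit string) and the tree for a list of IDs.
local function buildCodes(ids)
	-- Count how often each ID appears.
	local counts = {}
	for _, id in ipairs(ids) do
		counts[id] = (counts[id] or 0) + 1
	end

	-- One leaf node per distinct ID.
	local nodes = {}
	for id, count in pairs(counts) do
		nodes[#nodes + 1] = { id = id, count = count }
	end

	-- Repeatedly merge the two rarest nodes until a single tree remains.
	while #nodes > 1 do
		table.sort(nodes, function(a, b) return a.count > b.count end)
		local right = table.remove(nodes)
		local left = table.remove(nodes)
		nodes[#nodes + 1] = { count = left.count + right.count, left = left, right = right }
	end

	-- Walk the tree: going left appends "0", going right appends "1".
	local codes = {}
	local function walk(node, prefix)
		if node.id ~= nil then
			codes[node.id] = (prefix == "") and "0" or prefix
		else
			walk(node.left, prefix .. "0")
			walk(node.right, prefix .. "1")
		end
	end
	if nodes[1] then
		walk(nodes[1], "")
	end
	return codes, nodes[1]
end

local function encode(ids, codes)
	local bits = {}
	for i, id in ipairs(ids) do
		bits[i] = codes[id]
	end
	return table.concat(bits)
end

local function decode(bits, tree)
	local ids = {}
	if tree.id ~= nil then
		-- Only one kind of block: every bit stands for that block.
		for _ = 1, #bits do
			ids[#ids + 1] = tree.id
		end
		return ids
	end
	local node = tree
	for i = 1, #bits do
		node = (string.sub(bits, i, i) == "0") and node.left or node.right
		if node.id ~= nil then
			ids[#ids + 1] = node.id
			node = tree -- leaf reached, restart from the root
		end
	end
	return ids
end
```

The tree (or the counts it was built from) has to be saved alongside the bit string, otherwise the codes cannot be read back.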


So from what you said, I guess it’s basically IDs for the blocks?
Such as 1 = Grass_Block, 2 = Dirt_Block and so on?

computers count from zero
Lua joined the chat
Yes, that’s indeed how you would store that. To get the blocks back from the data, you can store the blocks in an array and just do blocks[id]
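
A tiny sketch of that lookup in plain Lua. `Stone_Block` and the `idOf` table are made up for illustration; the important assumption is that the order of the list never changes once worlds have been saved, since the IDs are just positions in it.

```lua
-- ID -> name. Lua arrays start at 1, so Grass_Block is ID 1.
local blocks = { "Grass_Block", "Dirt_Block", "Stone_Block" }

-- name -> ID, built once so saving does not have to search the list.
local idOf = {}
for id, name in ipairs(blocks) do
	idOf[name] = id
end
```

Saving writes `idOf[name]`, and loading reads `blocks[id]`.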

K, I’ll try this idea and I’ll post what the result is. Thanks!

You can use a seed to generate the world and datastore the changes that the player made instead. When the player joins, you can load the seed and apply the changes that the player has made to achieve the same result.

Edit: I didn’t see CompilerError’s reply xd

Yeah, as @CompilerError was saying. But if the player has edited that many blocks, it will be the same issue.

This is the result:
It reduced the key count from 59 to 50. Any other ideas to lower this? If this is the lowest I can get, I can live with it until I get to handling a mass of edited blocks.

Make sure you aren’t storing this as JSON. A CSV format or some variation on that would likely be a lot more efficient space wise. Make sure there are no spaces unless you’re using that for separating blocks of data. You should avoid saving strings and also try to mitigate storing large numbers or numbers with lots of decimal places.

If you have any type of data that is repeated then you might benefit from making a way to create references to an object in the save data so you can shrink all of those down too.


I use IDs for the blocks, such as 1 = grass_block. As for how it’s saved,
I guess this is what my current data looks like.

WorldData = {
    ["Chunk|0000|0000"] = {
        ["0, 0, 0"] = {1, "'Data Table'"}
    }
}

EDIT:
I guess changing "Chunk|0000|0000" would be better as "00000000"

Could you just save the whole thing as a string?

This could be something as simple as:

WorldData = '00000000;0,0,0;1' -- Chunk, coordinates, id

Not sure what the ‘Data Table’ part is for but that could definitely be shortened too. Your save files don’t really need to be easily read.
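
A sketch of writing that format for more than one block, in plain Lua. The `|` separator between records and the `encodeWorld` name are assumptions; any character that never appears inside a record would do.

```lua
-- blocks is an array of records such as
-- { chunk = "00000000", x = 0, y = 0, z = 0, id = 1 }
local function encodeWorld(blocks)
	local records = {}
	for i, block in ipairs(blocks) do
		-- One record per block: chunk;x,y,z;id
		records[i] = string.format("%s;%d,%d,%d;%d",
			block.chunk, block.x, block.y, block.z, block.id)
	end
	-- "|" is an assumed separator between records.
	return table.concat(records, "|")
end
```

Compared with JSON, this drops the braces, brackets, and quotes around every entry.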

Wow. That would work a lot better. The only catch is loading blocks at an effective speed: I used tables to locate a block. But for shortening it, it should work a whole lot better.
As for the ‘Data Table’, it would hold the data for that item if edited, such as chest data or furnace data.

I would load it all in and convert it back into a cache table for quick referencing. It’s a bit more process-heavy, sure. But if data storage is the bigger issue, I think this is a better way to go about it.
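
A sketch of that loading step in plain Lua, assuming records shaped like `chunk;x,y,z;id` and separated by `|` (the separator and the `decodeWorld` name are assumptions, not something the thread settled on).

```lua
-- Turn the saved string back into cache[chunk]["x,y,z"] = id for fast lookups.
local function decodeWorld(text)
	local cache = {}
	for record in string.gmatch(text, "[^|]+") do
		local chunk, x, y, z, id =
			string.match(record, "^(%d+);(-?%d+),(-?%d+),(-?%d+);(%d+)$")
		if chunk then -- records that do not match the format are skipped
			cache[chunk] = cache[chunk] or {}
			cache[chunk][x .. "," .. y .. "," .. z] = tonumber(id)
		end
	end
	return cache
end
```

The parse runs once on load; after that, finding a block is a plain table lookup, same as before.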

OK, I’ll try what you suggested and I’ll reply with the edits and results. Thanks!


Roblox does have a bit library (bit32), which I’m sure would be a good fit for your use case.
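
One way this could look: packing a block’s position inside a chunk into a single number. The bit layout is an assumption (x and z in 0..15 take 4 bits each, y in 0..255 takes 8 bits), and `pack` and `unpack3` are hypothetical names. The arithmetic branch is only there so the sketch also runs where `bit32` is not available.

```lua
-- Assumed layout, from high bits to low bits: yyyyyyyy zzzz xxxx
local function pack(x, y, z)
	if bit32 then
		return bit32.bor(bit32.lshift(y, 8), bit32.lshift(z, 4), x)
	end
	return y * 256 + z * 16 + x -- same result without the bit library
end

-- Returns x, y, z from a packed number.
local function unpack3(n)
	if bit32 then
		return bit32.band(n, 15),
			bit32.rshift(n, 8),
			bit32.band(bit32.rshift(n, 4), 15)
	end
	return n % 16, math.floor(n / 256), math.floor(n / 16) % 16
end
```

A packed position is at most 65535, so it never needs more than five digits, where `x,y,z` written out can take up to nine characters.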

By removing whitespace within the positions and editing the chunk names, I reduced it from 50 to 45! Improvements!

What does your data look like? The code you’ve shared looks like it just makes names for keys in a table, and the actual data is stored in each key. 12345678;x;y;z;id is the name of the chunk; what’s actually in the chunk? Also, what is the purpose of chunks rather than just listing each individual block? How many blocks are you trying to store that take 45 datastore keys?