Compression for limited data (Datastore)

kingdom5 · January 17, 2020, 10:22pm

Generally speaking, you would not save the whole game, only the changes made by the player.

lemurmad · January 17, 2020, 10:23pm

As I have already said, I know. But what if a player edited that many blocks, such as by using TNT? I would run into the same issue.

kingdom5 · January 17, 2020, 10:25pm

You would track the TNT position then remove the blocks?

lemurmad · January 17, 2020, 10:26pm

Still, what if someone edited that many blocks? This is only one layer of each chunk.
Edit:
For one chunk layer (256 blocks), it’s 4KB.

kingdom5 · January 17, 2020, 10:27pm

You are not limited in the number of keys you can use. It might help to use multiple keys in this case.

There really are a lot of solutions available to you. I would first get a system in place then look at optimization.

1waffle1 · January 17, 2020, 10:28pm

It sounds like the chunks aren’t necessary to the saving process if you can just get which chunk a part belongs to based on its position. Chunks can be reconstructed in the loading process. If you were just storing positions and material types, the data could be pretty concise; the extra information complicates that. If you’re trying to store 640k blocks and the datastore key limit is 260k characters, then you would be lucky to need only 3 keys, and that assumes you could store one block per character. Depending on how repetitive your data is, actually compressing it should greatly reduce the size. I’ve shared a resource for that here: https://devforum.roblox.com/t/text-compression/163637
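To make the key arithmetic concrete, here is a minimal sketch of splitting one long save string across several keys. The helper names `splitIntoKeys` and `joinKeys` are hypothetical, the 260k limit is just the figure quoted above, and the actual DataStore calls are left out.

```lua
-- Per-key character limit as quoted in the post; check the current DataStore limit before relying on it.
local KEY_LIMIT = 260000

-- Hypothetical helper: cut a long string into pieces of at most `limit` characters.
local function splitIntoKeys(data, limit)
    local pieces = {}
    for start = 1, #data, limit do
        pieces[#pieces + 1] = string.sub(data, start, start + limit - 1)
    end
    return pieces
end

-- Hypothetical helper: rebuild the original string from the pieces, in order.
local function joinKeys(pieces)
    return table.concat(pieces)
end

-- 640k blocks at one character per block, all air ("0") for the sake of the example.
local world = string.rep("0", 640000)
local pieces = splitIntoKeys(world, KEY_LIMIT)
print(#pieces) -- 3
```

Each piece would then be saved under its own key (for example a base name plus the piece index), and loaded back in the same order before calling `joinKeys`.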

lemurmad · January 17, 2020, 10:28pm

Yes, but it’s best to try to compress the data to the point where it’s efficient and small, unless this ends up being the last option.

lemurmad · January 17, 2020, 10:30pm

I can try removing the chunks table and will reply with the results in an edit to this message.
EDIT:
One layer of a chunk went down from 4KB to 3KB.
Will edit again with results on 50 by 50.
EDIT 2:
From 45 keys I got 44. I will not be keeping this change, as it makes the maths for chunk generation more involved and it’s only a small gain.

kingdom5 · January 17, 2020, 10:38pm

Compression like the one @1waffle1 posted works better the larger the data is, as it builds the word dictionary over time. I would also take a look at LZSS if your data is repetitive, or simply use a hard-coded word list.

lemurmad · January 17, 2020, 10:45pm

I removed the table for data to see the result. 45 keys down to 40!
I am going to attempt to use the string compression and will edit this message with results.
EDIT:
Failed to compress. Game script timeout.
I will fix this tomorrow, as I have to go.

Pavalineox · January 17, 2020, 11:15pm

I’m really late to this thread, but if they haven’t already been implemented, I would consider the idea of a Huffman tree, or you could count runs of consecutive identical blocks and save the ID and the length of each run in order to reduce size.

The second suggestion was applied in this video that may help : https://www.youtube.com/watch?v=fqdTj27xVMM

This YouTuber is working on a voxel game and tackles problems very similar to yours.
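Counting consecutive blocks is usually called run-length encoding. A minimal sketch, assuming the blocks are already flattened into a list of numeric IDs; the function names and the `IDxCOUNT;` text format are made up for illustration and are not from the video.

```lua
-- Turn a list of block IDs into "IDxCOUNT" runs joined by ";".
local function rleEncode(ids)
    local out = {}
    local i = 1
    while i <= #ids do
        local id = ids[i]
        local count = 1
        -- Extend the run while the next block has the same ID.
        while ids[i + count] == id do
            count = count + 1
        end
        out[#out + 1] = id .. "x" .. count
        i = i + count
    end
    return table.concat(out, ";")
end

-- Expand the runs back into the original list of IDs.
local function rleDecode(text)
    local ids = {}
    for id, count in string.gmatch(text, "(%d+)x(%d+)") do
        for _ = 1, tonumber(count) do
            ids[#ids + 1] = tonumber(id)
        end
    end
    return ids
end

local encoded = rleEncode({0, 0, 0, 0, 1, 1, 0})
print(encoded) -- 0x4;1x2;0x1
```

A layer that is entirely one block type collapses to a single run, while a layer with no repeats gets longer than the plain list, so it is worth measuring on real chunks.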

lemurmad · January 18, 2020, 9:03am

So from what I’m getting, I should have a string that holds the values of the blocks? Say, 0 for air and 1 for grass within a 16 by 16 layer?

Dionysusnu · January 18, 2020, 10:05am

Do you need to store info about the blocks? If so, a Huffman tree isn’t possible. If not, then it is quite a complicated process, and storing the tree isn’t easy either, so you only really want to do this if it creates massive gains. Huffman coding is especially effective if there are very many blocks of the same type and very few blocks of other types in the world. E.g. a completely stone chunk would be very efficient, and a very mixed chunk would be less efficient (but still better than without Huffman).

lemurmad · January 18, 2020, 10:07am

I can store the information of the blocks in a different way. But how can I add this?

lemurmad · January 18, 2020, 10:21am

Update on current size.
59 keys was what I got at the start of this post.
It has basically halved to 33 keys!
I still have not added in compression and most of it remains as tables.

Dionysusnu · January 18, 2020, 10:24am

Are you sure that that other way will not take up too much space then?
Anyways, you can either listen to my (probably bad) explanation
Or watch the video I linked before, if you haven’t done that already.
Basically, Huffman coding is a way to compress characters (or other “single” tokens, like block IDs) into a long binary string. First you have to construct a Huffman tree; I’ll explain that a bit later on. It will look something like this:

How to decode the binary string:

1. Read the 1s and 0s from the start.
2. If it’s a 1, take the right branch of the tree; if it’s a 0, take the left branch.
3. Keep reading until you reach a character in the tree.
4. Add this character to the decoded string (or for blocks, place it in the world), then go back to the top of the tree.
5. Repeat this until you are at the end of the binary string.
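The decoding steps can be sketched roughly like this. The tree is hard-coded for three block IDs, where 0 and 1 follow the air/grass example earlier in the thread and 2 is a made-up third block. The bits are kept as a string of "0" and "1" characters to keep the sketch readable; for real savings they would need to be packed into bytes.

```lua
-- Hypothetical tree: a leaf has an `id`, a junction has `left` and `right`.
local tree = {
    left = { id = 0 },      -- code "0"
    right = {
        left = { id = 1 },  -- code "10"
        right = { id = 2 }, -- code "11"
    },
}

local function huffmanDecode(bits, root)
    local ids = {}
    local node = root
    for i = 1, #bits do
        -- A 1 takes the right branch, a 0 takes the left branch.
        if string.sub(bits, i, i) == "1" then
            node = node.right
        else
            node = node.left
        end
        -- Reached a leaf: record the block ID and restart from the top.
        if node.id ~= nil then
            ids[#ids + 1] = node.id
            node = root
        end
    end
    return ids
end

local decoded = huffmanDecode("0010110", tree)
print(table.concat(decoded, ",")) -- 0,0,1,2,0
```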

How to encode the binary string

1. For each character (or block), go from its location in the tree upwards until you reach the top.
2. Every time you get to a “junction” from the left side, write a 0 in your string.
3. Every time you get to a junction from the right side, write a 1.
4. Reverse the string, and add it to the encoded binary string.
5. Repeat until you have encoded all the blocks in the world.

(This may not be efficient or easy to implement in Lua; I’ll see if there’s a better way.)
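A minimal sketch of those encoding steps, assuming every node stores a link to its parent so you can walk upwards. The helper names and the three block IDs are hypothetical, and the output is again a string of "0" and "1" characters rather than packed bits.

```lua
local function leaf(id)
    return { id = id }
end

-- Join two nodes under a new junction and remember the parent on each child.
local function junction(left, right)
    local node = { left = left, right = right }
    left.parent = node
    right.parent = node
    return node
end

-- Hypothetical tree: air = 0, grass = 1, stone = 2.
local air, grass, stone = leaf(0), leaf(1), leaf(2)
local root = junction(air, junction(grass, stone))
local leaves = { [0] = air, [1] = grass, [2] = stone }

-- Walk from a leaf up to the top, writing 0 for "came from the left"
-- and 1 for "came from the right", then reverse the result.
local function codeFor(leafNode)
    local bits = {}
    local node = leafNode
    while node.parent do
        bits[#bits + 1] = (node.parent.left == node) and "0" or "1"
        node = node.parent
    end
    return string.reverse(table.concat(bits))
end

local function huffmanEncode(ids)
    local out = {}
    for i, id in ipairs(ids) do
        out[i] = codeFor(leaves[id])
    end
    return table.concat(out)
end

print(huffmanEncode({0, 0, 1, 2, 0})) -- 0010110
```

Walking the tree for every single block repeats work; one likely improvement is to call `codeFor` once per block ID, keep the results in a table, and look codes up from there.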

How to create a huffman tree

1. For each block type, count how many times it is used, and put this count, linked to the block, in a list.
2. Pick the two lowest items in the list, and connect them to a “junction”.
3. Count the combined occurrences, and put the junction back in the list.
4. Repeat steps 2 and 3 until there is only one junction remaining.
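The tree-building steps could look something like this. It is only a sketch: the function name is made up, and it re-sorts the list on every pass instead of using a proper priority queue, which is fine for a handful of block types.

```lua
local function buildHuffmanTree(ids)
    -- Count how many times each block ID is used.
    local counts = {}
    for _, id in ipairs(ids) do
        counts[id] = (counts[id] or 0) + 1
    end

    -- Put each ID and its count in a list as a leaf.
    local list = {}
    for id, count in pairs(counts) do
        list[#list + 1] = { id = id, count = count }
    end

    -- Keep joining the two lowest items until one node remains.
    while #list > 1 do
        table.sort(list, function(a, b)
            return a.count < b.count
        end)
        local a = table.remove(list, 1)
        local b = table.remove(list, 1)
        list[#list + 1] = { left = a, right = b, count = a.count + b.count }
    end
    return list[1]
end

-- Example: five of block 0, two of block 1, one of block 2.
local root = buildHuffmanTree({0, 0, 0, 0, 0, 1, 1, 2})
print(root.count) -- 8
```

If the world contains only one block type, the returned node is a single leaf with no junction above it, so that case needs special handling (for example, storing just the ID and a count).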

Again, Tom Scott (the youtuber I linked) is much better at explaining this.
If you need any help, just ask!

Dionysusnu · January 18, 2020, 10:25am

Just curious, which reductions have you implemented now? How effective were they?

lemurmad · January 18, 2020, 10:29am

Well, I started off with tables named Chunk|000|000 but renamed them to 000000.
I have also changed from saving the block’s name to saving a numeric ID, such as 1.
I’m currently watching the video you linked to see if I can get my head around this.
I’m also trying to see if I can get it to save as one whole string to help with size.

Dionysusnu · January 18, 2020, 10:31am

Using Huffman coding, you only need to store 2 things:

The encoded string
The huffman tree

lemurmad · January 18, 2020, 10:44am

Just wondering, would it be better if I just saved the IDs within that layer? It would be like:

Layer0 = "0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:"

Instead of this:

Table = {
    ["12341234"] = {
        ["0;0;0"] = 0,
        ["0;0;1"] = 0,
        ["0;0;2"] = 0,
        ["0;0;3"] = 0,
        ["0;0;4"] = 0,
        ["0;0;5"] = 0,
        ["0;0;6"] = 0,
    }
}

EDIT:
This is not a string. It’s a set of numbers for the block position ranging from 1 to 16.
UPDATE:
From 33 keys to 23 with string conversion.
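For reference, a minimal sketch of converting one layer to a string like the one above and back. It assumes the layer is held as 16 rows of 16 numeric IDs; the function names are hypothetical, and it writes ";" between blocks and ":" between rows without the trailing ":" or line breaks.

```lua
local SIZE = 16 -- blocks per side of a chunk layer, as in the thread

-- layer[row][column] = block ID
local function layerToString(layer)
    local rows = {}
    for row = 1, SIZE do
        rows[row] = table.concat(layer[row], ";")
    end
    return table.concat(rows, ":")
end

local function stringToLayer(text)
    local layer = {}
    for rowText in string.gmatch(text, "[^:]+") do
        local row = {}
        for id in string.gmatch(rowText, "[^;]+") do
            row[#row + 1] = tonumber(id)
        end
        layer[#layer + 1] = row
    end
    return layer
end

-- Example layer: all air (0) with a single grass block (1).
local layer = {}
for row = 1, SIZE do
    layer[row] = {}
    for column = 1, SIZE do
        layer[row][column] = 0
    end
end
layer[3][5] = 1

local text = layerToString(layer)
print(#text) -- 511
```

With single-digit IDs almost half of those 511 characters are separators, so dropping them (one character per block, with the position implied by the index) would roughly halve the size again, at the cost of limiting how many block IDs fit in one character.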