Compression for limited data (Datastore)

I can attempt removing the chunks table and will reply with the results in this message's edit.
EDIT:
From 4KB per chunk for one layer, it went down to 3KB.
Will edit again with results on 50 by 50.
EDIT 2:
From 45 keys I got 44. I will not be keeping this change, as it adds more maths to chunk generation and it's only a small saving.

Compression like @1waffle1 posted works better the larger the data is, as it builds the word dictionary over time. I would also take a look at LZSS if your data is repetitive, or simply have a hard-coded wordlist.
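For example, here's a minimal sketch of the hard-coded wordlist idea in Lua. The wordlist entries are made up; you'd pick them from whatever actually repeats in your data, and the token characters must never appear in the raw data:

local wordlist = {
    ["0;0;0;0;"] = "\1", -- hypothetical common run of air IDs
    ["1;1;1;1;"] = "\2", -- hypothetical common run of grass IDs
}

local function compress(s)
    -- swap each known word for its single-character token
    for word, token in pairs(wordlist) do
        s = string.gsub(s, word, token)
    end
    return s
end

local function decompress(s)
    -- swap the tokens back for the original words
    for word, token in pairs(wordlist) do
        s = string.gsub(s, token, word)
    end
    return s
end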

I removed the table for data to see the result. 45 keys to 40!
I am going to attempt to use the string compression and will edit this message with results.
EDIT:
Failed to compress. Game script timeout.
I will fix this by tomorrow, as I have to go.


I’m really late to this thread, but if they haven’t already been implemented, I would consider the philosophy of a Huffman tree, or you could count consecutive blocks and save the ID and the length of each run to reduce size (rough sketch at the end of this post).

The second suggestion is applied in this video, which may help: https://www.youtube.com/watch?v=fqdTj27xVMM

This YouTuber is working on a voxel game and tackles problems very similar to yours.
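A rough sketch of that second suggestion in Lua, assuming the blocks arrive as a flat array of IDs (both the IDs and the output format here are made up):

local function runLengthEncode(blocks)
    local out = {}
    local current, count = blocks[1], 1
    for i = 2, #blocks do
        if blocks[i] == current then
            count = count + 1 -- still in the same run
        else
            table.insert(out, current .. "x" .. count) -- run ended: save ID and length
            current, count = blocks[i], 1
        end
    end
    table.insert(out, current .. "x" .. count) -- save the final run
    return table.concat(out, ",")
end

print(runLengthEncode({0, 0, 0, 0, 1, 1, 0})) --> 0x4,1x2,0x1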

So from what I’m getting, I’d have a string that holds the values of blocks? Say 0 for air and 1 for grass within a 16 by 16 line?

Do you need to store info about the blocks? If so, a Huffman tree isn’t possible. If not, it is still quite a complicated process, and storing the tree isn’t easy either, so you only really want to do this if it creates massive gains. Huffman coding is especially effective when very many blocks are of the same type and very few are of other types: a completely stone chunk would compress very efficiently, while a very mixed chunk would be less efficient (but still better than without Huffman).

I can store the information about the blocks in a different way. But how can I add this?

Update on current size.
59 keys was what I got at the start of this post.
It has basically halved to 33 keys!
I still have not added compression, and most of it remains as tables.

Are you sure that other way won’t take up too much space, then?
Anyways, you can either listen to my (probably bad) explanation, or watch the video I linked before, if you haven’t done that already.
Basically, Huffman is a way to compress characters (or other “single” token stuff, like block IDs) into a long binary string. First you have to construct a Huffman tree; I’ll explain that a bit later on. It will look something like this:
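For instance, here is a tiny hypothetical tree for three block IDs, written as a Lua table (junctions hold left/right children, leaves hold a symbol; this is just one possible way to represent it):

-- codes: "0" = air (ID 0), "10" = grass (ID 1), "11" = stone (ID 2)
local tree = {
    left = { symbol = 0 },      -- air
    right = {
        left = { symbol = 1 },  -- grass
        right = { symbol = 2 }, -- stone
    },
}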

How to decode the binary string (see the sketch after these steps):

  1. Read the 1s and 0s from the start.
  2. If it’s a 1, take the right branch of the tree; if it’s a 0, take the left.
  3. Keep reading until you reach a character in the tree.
  4. Add this character to the decoded string. (or for blocks, place it in the world)
  5. Repeat this until you are at the end of the binary string.
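A minimal decode sketch in Lua, using the example tree table above (that node layout is an assumption, not the only way to store the tree):

local function huffmanDecode(bits, root)
    local decoded = {}
    local node = root
    for i = 1, #bits do
        -- step 2: a 0 takes the left branch, a 1 takes the right
        if string.sub(bits, i, i) == "0" then
            node = node.left
        else
            node = node.right
        end
        if node.symbol ~= nil then
            -- steps 3-4: reached a leaf, so emit its block ID
            table.insert(decoded, node.symbol)
            node = root -- step 5: start again for the next block
        end
    end
    return decoded
end

print(table.concat(huffmanDecode("010011", tree), ",")) --> 0,1,0,2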

How to encode the binary string (sketch below):

  1. For each character (or block), go from its location upwards, until you reach the top.
  2. Every time you get to a “junction” from the left side, write a 0 in your string.
    Every time you get to a junction from the right side, write a 1.
  3. Reverse the string, and add it to the encoded binary string.
  4. Repeat until you have encoded all the blocks in the world.

(This may not be efficient or easy to implement in Lua; I’ll see if there’s a better way.)
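One possible better way: instead of walking up from each leaf, precompute every block’s code by walking down the tree once, then just concatenate the codes. A sketch, using the same node layout as above:

local function buildCodes(node, prefix, codes)
    prefix = prefix or ""
    codes = codes or {}
    if node.symbol ~= nil then
        codes[node.symbol] = prefix -- a leaf: its path from the root is its code
        -- (a tree with only one block type would give an empty code and
        -- needs special-casing; ignored in this sketch)
    else
        buildCodes(node.left, prefix .. "0", codes)  -- left edges add a 0
        buildCodes(node.right, prefix .. "1", codes) -- right edges add a 1
    end
    return codes
end

local function huffmanEncode(blocks, root)
    local codes = buildCodes(root)
    local bits = {}
    for _, id in ipairs(blocks) do
        bits[#bits + 1] = codes[id]
    end
    return table.concat(bits)
end

print(huffmanEncode({0, 1, 0, 2}, tree)) --> 010011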

How to create a Huffman tree (sketch below):

  1. For each block, count how many times it is used, and put this count, linked to the block, in a list.
  2. Pick the two lowest items in the list, and connect them to a “junction”.
    Count the combined occurrences, and put the junction back in the list.
  3. Repeat step 2 until there is only one junction remaining.
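A sketch of those steps in Lua, assuming the input is a flat array of block IDs (re-sorting the whole list on every pass is wasteful, but fine for a first version):

local function buildTree(blocks)
    -- step 1: count how many times each block ID is used
    local counts = {}
    for _, id in ipairs(blocks) do
        counts[id] = (counts[id] or 0) + 1
    end
    local list = {}
    for id, n in pairs(counts) do
        table.insert(list, { count = n, node = { symbol = id } })
    end
    -- steps 2-3: join the two rarest entries under a junction until one remains
    while #list > 1 do
        table.sort(list, function(a, b) return a.count > b.count end)
        local lowest = table.remove(list)  -- fewest occurrences
        local second = table.remove(list)  -- second fewest
        table.insert(list, {
            count = lowest.count + second.count,
            node = { left = second.node, right = lowest.node },
        })
    end
    return list[1].node -- the root of the finished tree
end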

Again, Tom Scott (the YouTuber I linked) is much better at explaining this.
If you need any help, just ask!


Just curious, which reductions have you implemented now? How effective were they?

Well, I started off with tables named Chunk|000|000 but renamed them to 000000.
I have also changed from saving the block’s name to saving a numeric ID, such as 1.
I’m currently watching the video you linked to see if I can get my head around this.
I’m also trying to see if I can get it to save as one whole string to help with size.

Using Huffman coding, you only need to store 2 things (one way to store the tree is sketched after the list):

  • The encoded string
  • The huffman tree
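One hypothetical way to store the tree as a string is a preorder walk: write a 0 for each junction, and a 1 plus the block ID for each leaf. A sketch, using the same node layout as the earlier example tree:

local function serializeNode(node, out)
    if node.symbol ~= nil then
        table.insert(out, "1" .. tostring(node.symbol)) -- leaf: 1 + block ID
    else
        table.insert(out, "0") -- junction, followed by its left then right child
        serializeNode(node.left, out)
        serializeNode(node.right, out)
    end
end

local function serializeTree(root)
    local out = {}
    serializeNode(root, out)
    return table.concat(out, ",")
end

print(serializeTree(tree)) --> 0,10,0,11,12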

Just wondering, would it be better if I just saved the IDs within that layer? It would look like:

Layer0 = "0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0:"

Instead of this:

Table = {
    ["12341234"] = {
        ["0;0;0"] = 0,
        ["0;0;1"] = 0,
        ["0;0;2"] = 0,
        ["0;0;3"] = 0,
        ["0;0;4"] = 0,
        ["0;0;5"] = 0,
        ["0;0;6"] = 0,
    }
}

EDIT:
This is not a string; it’s a set of numbers for the block position, ranging from 1 to 16.
UPDATE:
From 33 keys to 23 with string conversion.

I don’t know what you mean, or what that data is… Could you explain how that string is formatted?

So I just get all the data within that table, such as block position and ID, and add it into the string that is saved within the chunk table (rough sketch below).
Edit:
1 layer of a chunk is 2KB instead of 4KB.
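As a sketch of what that conversion might look like, assuming each layer is a 16 by 16 table of block IDs indexed [x][z] (the exact layout is a guess), with ; between IDs and : between rows like the string above:

local function layerToString(layer)
    local rows = {}
    for x = 1, 16 do
        local ids = {}
        for z = 1, 16 do
            ids[z] = tostring(layer[x][z] or 0) -- default missing blocks to air (0)
        end
        rows[x] = table.concat(ids, ";")
    end
    return table.concat(rows, ":") .. ":" -- trailing : matches the format above
end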

Thanks to everyone for your help! I have reached the point where I think the save uses close to the least amount of data possible.
[image: the three outputs compared side by side]
The first 3 lines are the uncompressed data,
the second 3 are my string compression,
and the third 3 are from the module I used from the link by 1waffle1.
It has taken data that used to be 59 keys down to 11! That’s 18.6% of what it used to be!
I am still accepting other ideas, but they may not end up being used. Thanks again!


A Huffman table is very efficient when a lot of the data is the same, but if the majority of it varies, it is not as useful.
I recommend taking the whole table, converting it into a string, converting that string into a series of numbers, and then compressing those numbers with the Huffman table. Every 1 byte can stand for 3 bytes: since a 1 = 10000000, a 10000000 can equal 10110011 10110011 10110011, and a 01000000 can equal 10110011 10110011 10011111.
Or you can set the ratio so that 2 bytes equal 4 bytes, as 2 bytes have a far larger number of combinations than 1 byte: 8 bits = 256 - 1; 16 bits = 65536 - 1.

And to take it a step further: if you go through the table and the majority of it is the same, e.g. 2049 instances of 244 (11110100), you can change the byte-to-byte ratio, so a simple 10000000 can equal 2049 bytes. This is a very extreme, nearly impossible instance.
And to take it a step even further: since Huffman is mainly for binary, and since we are dealing with strings, we can instead use base 36, as that contains all the alphanumeric characters.
Thus a simple “a” (1 byte) can equal 20 bytes, “aa” (2 bytes) can equal another 20 bytes, etc…
You will have to make a bizarre Huffman table where 10 characters act as stepping stones, just like how no Huffman code is the start of another code.
1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ~ ` ! @ # $ % ^ & * ( ) - _ + = etc…

The only downside is that the data table may end up way larger than the actual thing you’re trying to compress.

You’re 6 months late ._.
That’s pretty much what I said about the Huffman table though…


Has it really been 6 months… wow.
I also used this system for my building game. Worked very well.
Also realised that you can save a place state, so I won’t need to worry about saving the world.