Serializing values into strings to save DataStore space

Hello everyone! Recently I wrote a module which lets you “compress” different kinds of values (CFrame, Vector3, double, bool) into strings. Why, you might ask? For example, to save DataStore space! Compressed values take way less space. Note that this is only really going to benefit you if you’re using JSON strings to store data (that’s what I tested it on).

Alright, so how much space does it actually save?
It takes one character to store a compressed bool.
It takes 1-8 characters to store a number (double); the number of characters depends on how many digits the number has.
It takes 5-24 characters to store a Vector3, same deal as with numbers.
It takes 17-74 characters to store a CFrame, same deal as with Vector3s: the more digits, the more characters. I only save the position, a look-at point (position plus look vector) and the up vector for CFrames, since these are the only vectors needed to fully reconstruct a CFrame.
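
For anyone curious where those ranges come from, here is a rough sketch of the arithmetic. It assumes the separators used in the source code (":" between components, "_" between vectors) and that a packed double is always 8 bytes. The constant names are made up for illustration.

```lua
-- Bounds for a single number: at least 1 character of text,
-- at most one packed 8-byte double.
local MIN_NUMBER = #tostring(0)
local MAX_NUMBER = #string.pack("d", 0)

-- Vector3: three numbers joined by two ":" separators in text form,
-- or three packed doubles with no separators.
local MIN_VECTOR3 = 3 * MIN_NUMBER + 2
local MAX_VECTOR3 = #string.pack("ddd", 0, 0, 0)

-- CFrame: three Vector3 strings joined by two "_" separators.
local MIN_CFRAME = 3 * MIN_VECTOR3 + 2
local MAX_CFRAME = 3 * MAX_VECTOR3 + 2

print(MIN_NUMBER, MAX_NUMBER)   -- 1 8
print(MIN_VECTOR3, MAX_VECTOR3) -- 5 24
print(MIN_CFRAME, MAX_CFRAME)   -- 17 74
```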

Source code:

local Compressor = {}

function Compressor:CompressBool(bool)
    return string.char(bool and 8 or 9)
end

function Compressor:DecompressBool(str)
    return string.byte(str) == 8
end

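-- Numbers whose decimal text is 7 characters or fewer are stored as text;
-- anything longer is stored as an 8-byte packed double.
function Compressor:CompressDouble(item)
    local str = tostring(item)
    return string.len(str) > 7 and string.pack("d", item) or str
end

-- A length of 8 marks a packed double; shorter strings are decimal text.
function Compressor:DecompressDouble(str)
    return string.len(str) > 7 and string.unpack("d", str) or tonumber(str)
end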

function Compressor:CompressVector3(vec)
    local strArray = {
        tostring(vec.X);
        tostring(vec.Y);
        tostring(vec.Z)
    }

    local concat = table.concat(strArray, ":")
    return string.len(concat) > 23 and string.pack("ddd", vec.X, vec.Y, vec.Z) or concat
end

function Compressor:DecompressVector3(str)
    local x, y, z
    if string.len(str) > 23 then
        -- 24 bytes: three packed doubles
        x, y, z = string.unpack("ddd", str)
    else
        -- Short form: "x:y:z" as decimal text
        local parts = string.split(str, ":")
        x, y, z = tonumber(parts[1]), tonumber(parts[2]), tonumber(parts[3])
    end

    return Vector3.new(x, y, z)
end

function Compressor:CompressCFrame(cf)
    local basePos = cf.Position

    -- Position, a point one stud along the look direction, and the up vector:
    -- the three arguments CFrame.lookAt expects.
    local vecArray = {
        basePos,
        basePos + cf.LookVector,
        cf.UpVector,
    }

    local strArray = table.create(3, "")
    for i, v in ipairs(vecArray) do
        strArray[i] = self:CompressVector3(v)
    end

    -- Caveat: a packed Vector3 is raw bytes and may itself contain "_" (byte 95),
    -- which would confuse the split in DecompressCFrame.
    return table.concat(strArray, "_")
end

function Compressor:DecompressCFrame(str)
    local vecArray = table.create(3, Vector3.new())

    -- Order matches CompressCFrame: position, look-at point, up vector
    for i, v in ipairs(string.split(str, "_")) do
        vecArray[i] = self:DecompressVector3(v)
    end

    return CFrame.lookAt(table.unpack(vecArray))
end

return Compressor

Put the code above into a module script.
The functions should be self-explanatory.
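
In case it helps, a minimal usage sketch. It assumes the ModuleScript is named "Compressor" and sits in ServerScriptService; that location is only an assumption for this example, so adjust the path to wherever you put it. It needs to run inside Roblox, since it relies on Vector3 and require.

```lua
-- Assumed location of the ModuleScript; change this to match your place.
local ServerScriptService = game:GetService("ServerScriptService")
local Compressor = require(ServerScriptService.Compressor)

-- Booleans round-trip through a single character.
local packedFlag = Compressor:CompressBool(true)
print(Compressor:DecompressBool(packedFlag)) -- true

-- Short vectors stay as readable text.
local packedPosition = Compressor:CompressVector3(Vector3.new(1, 2, 3))
print(packedPosition) -- 1:2:3
print(Compressor:DecompressVector3(packedPosition)) -- 1, 2, 3
```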

If you have any questions or concerns regarding the module, post a reply and I’ll answer asap!


I figure it’s worth mentioning that most experiences on Roblox will not need to employ any kind of compression. This would be highly useful for data-intensive experiences, especially those that allow plot building, since there’s quite a bit of data associated with that.

This is just to inform those who see this module and want to employ it but don’t actually have a use case for it; most likely you won’t miss out on anything by not using it. If you find yourself using too much data in a key-value pair you can start by disassociating information from saveable data (e.g. identify objects by an ID and only save data that can be modified in a session) first. If you need further improvements to your data sizes then compression can help you out.
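
As a rough sketch of what that disassociation could look like: the item catalogue, field names and helper functions below are all hypothetical and only illustrate the idea of saving an ID plus whatever can change in a session.

```lua
-- Hypothetical static item catalogue. It ships with the game code,
-- so none of it needs to be written to a DataStore.
local ITEM_DEFINITIONS = {
    iron_sword = { displayName = "Iron Sword", damage = 12, maxDurability = 100 },
    oak_shield = { displayName = "Oak Shield", armor = 8, maxDurability = 150 },
}

-- Keep only the ID and the fields that can change during a session.
local function toSaveData(inventory)
    local saved = {}
    for i, item in ipairs(inventory) do
        saved[i] = { id = item.id, durability = item.durability }
    end
    return saved
end

-- Rebuild full items by looking the static part up again on load.
local function fromSaveData(saved)
    local inventory = {}
    for i, entry in ipairs(saved) do
        local definition = ITEM_DEFINITIONS[entry.id]
        inventory[i] = {
            id = entry.id,
            durability = entry.durability,
            displayName = definition.displayName,
            maxDurability = definition.maxDurability,
        }
    end
    return inventory
end

local saved = toSaveData({
    { id = "iron_sword", durability = 73, displayName = "Iron Sword", maxDurability = 100 },
})
local restored = fromSaveData(saved)
print(saved[1].displayName)    -- nil
print(restored[1].displayName) -- Iron Sword
```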

I run a large RPG game with quite a sizeable data set and the game’s current top grinder with over 50K items still doesn’t reach even 10% of the 4MB limit per value. That’s also before we implement any inventory limiting systems since we don’t have any right now.

Still pretty cool though. Always a good idea to save space where you can and only save information that reasonably needs to be saved. Nice resource, thanks for sharing.


This isn’t really compression. Only serializing. You should change the title

function Compressor:CompressBool(bool)
    return string.char(bool and 1 or 2)
end

Be careful there. Those characters (\001 and \002) actually take up more than one byte (1 character = 1 byte) when encoded into JSON, up to 8 bytes once the escape sequence and the surrounding quotes are counted, which is worse than if you just left the booleans in their original form (true or false, which is about 4-5 bytes).
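
To make the byte counting concrete, here is a small sketch that estimates the size of a string once it is written as a JSON literal. It assumes the encoder follows standard JSON escaping (two-character escapes such as \b and \t where they exist, \u00XX for other control characters) and only handles ASCII input. I haven't verified that Roblox's encoder picks the short escapes, and the function name is made up.

```lua
-- Bytes that standard JSON can write as a two-character escape:
-- \b, \t, \n, \f, \r, \" and \\
local SHORT_ESCAPES = {
    [8] = true, [9] = true, [10] = true, [12] = true, [13] = true,
    [34] = true, [92] = true,
}

-- Estimated length of an ASCII string as a JSON string literal, quotes included.
local function jsonLiteralLength(s)
    local length = 2 -- opening and closing quotes
    for i = 1, #s do
        local byte = string.byte(s, i)
        if SHORT_ESCAPES[byte] then
            length = length + 2 -- e.g. \t
        elseif byte < 32 then
            length = length + 6 -- e.g. \u0001
        else
            length = length + 1
        end
    end
    return length
end

print(jsonLiteralLength(string.char(1))) -- 8
print(jsonLiteralLength(string.char(8))) -- 4
print(#"true", #"false")                 -- 4 5
```

Under those assumptions a one-character bool only ties with a bare true and beats false by a single byte, so the saving on booleans is small either way.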


Woah, didn’t know. I’ll change them up soon, thanks!

Like he said, this isn’t compression, except for maybe the “CompressBool” function. That one can get mixed up with numbers, though, so I wouldn’t use it unless I’m sure the value can only be a boolean. If that’s the case, go for it; it might add some complexity, but it should be fine.

However, if you’re using an overall tool that just converts everything it sees and stores what datatype it is, like “Vector3”, then you might want to consider using compression after serialization: you might have a lot of repeated “Vector3” strings, so you could save a good amount of space.


My main concern with a large volume of data is not so much the total space that will be occupied on the server, but mainly the TRANSFER TIME of this data between the server and the client.
Like a video stream, data is transmitted in compressed form from the server to the client and only decompressed on the client (LocalScript), and vice versa.
For this I created a dynamic dictionary system, which transforms table keys and values into dictionary indexes; of course, taking care to NOT compress items that will take up more space if they are compressed than if they were sent uncompressed (as in the case of booleans, for example, or numbers and strings that are too small).
Furthermore, I also made a request several months ago for a built-in compression layer (which I very much doubt will be done), but in any case, it shows the concern with performance and, naturally, with space reduction.
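
A very reduced sketch of the dictionary idea, just to show the shape of it. It only replaces table keys with indexes and leaves out the "skip it if it would get bigger" check. The function names and sample records are invented for this example and are not my actual system.

```lua
-- Replace each distinct key with a small integer index.
-- Returns the dictionary (index -> key) and the re-keyed records.
local function encodeKeys(records)
    local dictionary, indexOf = {}, {}
    local encoded = {}
    for r, record in ipairs(records) do
        local out = {}
        for key, value in pairs(record) do
            local index = indexOf[key]
            if index == nil then
                index = #dictionary + 1
                dictionary[index] = key
                indexOf[key] = index
            end
            out[index] = value
        end
        encoded[r] = out
    end
    return dictionary, encoded
end

-- Reverse the mapping on the receiving side.
local function decodeKeys(dictionary, encoded)
    local records = {}
    for r, packed in ipairs(encoded) do
        local record = {}
        for index, value in pairs(packed) do
            record[dictionary[index]] = value
        end
        records[r] = record
    end
    return records
end

local records = {
    { position = "1:2:3", rotation = 90 },
    { position = "4:5:6", rotation = 180 },
}
local dictionary, encoded = encodeKeys(records)
local restored = decodeKeys(dictionary, encoded)
print(#dictionary)          -- 2
print(restored[2].rotation) -- 180
```

Each key string is sent once in the dictionary instead of once per record, which is where the saving would come from when there are many records with the same keys.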


Interesting concern, though it sounds different from what the OP created this for. You’re concerned with transfer between the server and client, while the module is for DataStores. That’s not really the same scenario, is it? The backend is faster than our code, so you don’t quite need to worry about the time Roblox needs to deliver data to the data service.

In terms of speed, have you done any benchmarking that you could share as a reference, or encountered any live-game scenarios where sending data between the two environments has taken an egregious amount of time? I looked at the thread but I don’t see any actual analysis of the transfer. I can’t go off of words alone to see what the concern is; I’d need analytics that compare compression versus non-compression with both low and high values.
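
Something along these lines would already be a useful starting point. It is only a sketch: it measures payload size and encoding cost for a single number, not the actual network transfer, which would have to be timed in a live server. The helper name and iteration count are arbitrary.

```lua
-- Time a function over many iterations using CPU time.
local function benchmark(fn, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        fn()
    end
    return os.clock() - start
end

local value = 1234.56789
local asText = tostring(value)
local asPacked = string.pack("d", value)

local textTime = benchmark(function()
    return tostring(value)
end, 100000)
local packTime = benchmark(function()
    return string.pack("d", value)
end, 100000)

print(string.format("text:   %d bytes, %.4f s", #asText, textTime))
print(string.format("packed: %d bytes, %.4f s", #asPacked, packTime))
```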


Good point. I’ll do it.



Updated the boolean serialization to use characters which take less space