String Compression (zlib/deflate)

Introduction

I did some digging and found a pure-Lua implementation of the zlib/deflate compression library. After forking the code and editing it a bit, I got it working in Luau. From there, I created an easy-to-use compression library that takes an input string and outputs a compressed string.

I won’t go too in-depth about how zlib/deflate works (there are plenty of articles online). Following @1waffle1’s lead, I took the source code of all the chat and camera scripts and concatenated it into one string. With level 9 compression, those 286748 characters compressed to 58082 characters in 0.370 seconds. With level 1 compression, the same 286748 characters compressed to 73702 characters in 0.08 seconds.

This library supports compression levels from 0 (no compression) to 9 (most compression). Higher levels tend to produce smaller output but take longer to compress, so pick the level that best fits your use case. Here are my benchmark results (level : time : input length → output length, in characters):

level 0 : 10.4ms : 286748 → 286788
level 1 : 80.2ms : 286748 → 73702
level 2 : 85.5ms : 286748 → 70070
level 3 : 101.8ms : 286748 → 67777
level 4 : 137.0ms : 286748 → 62589
level 5 : 182.2ms : 286748 → 59749
level 6 : 274.1ms : 286748 → 58421
level 7 : 323.3ms : 286748 → 58160
level 8 : 383.8ms : 286748 → 58082
level 9 : 369.9ms : 286748 → 58082

As you can see, higher levels produce smaller output but take longer. The returns also diminish. Going from level 0 to level 4 shrinks the output far more (286788 → 62589) than going from level 4 to level 9 (62589 → 58082). Compression time also jumps sharply between level 5 (182.2ms) and level 6 (274.1ms).

Note that compression relies on repetition. How much a string shrinks depends heavily on how much repeated content it contains.

Disclaimer: This is a pure-Lua implementation of a simplified version of the zlib algorithm. Higher compression levels are not guaranteed to take longer, and they are not guaranteed to produce smaller output than lower levels, though both are highly likely. In the benchmark above, level 9 was actually slightly faster than level 8. Also, don’t rely on my benchmarks; benchmark your own use case for more accurate results.
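If you want to benchmark your own data, a small timing loop is enough. The sketch below is a hypothetical helper and is not part of the module. It accepts any compress function shaped like the documented Compress(data, configs) and times levels 0 through 9 with os.clock. For each level it records the average time and the output length. It assumes the configs table may omit strategy.

```lua
-- Hypothetical helper: times `compress(data, { level = n })` for levels 0-9.
-- `compress` is assumed to return the compressed string as its first value
-- (extra return values, such as paddedBits, are ignored).
local function benchmarkLevels(compress, data, runs)
	runs = runs or 5 -- average over several runs to smooth out noise
	local results = {}
	for level = 0, 9 do
		local output
		local start = os.clock()
		for _ = 1, runs do
			output = compress(data, { level = level })
		end
		local elapsedMs = (os.clock() - start) / runs * 1000
		results[#results + 1] = { level = level, ms = elapsedMs, size = #output }
	end
	return results
end
```

In Roblox, pass Compression.Zlib.Compress (or Compression.Deflate.Compress) as `compress` along with your own test string. Then print each entry's level, ms and size.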

Installation

This package is a single ModuleScript:
https://www.roblox.com/library/5649237524/Compression-zlib-deflate

You can also view the source code on pastebin:

Documentation

Note:

  • “Compression” is the required module found under Installation.

  • The original library has been edited for ease of use rather than functionality. That said, you can still access the original library through Compression.Library. The documentation is also available inside the Compression ModuleScript.

Configs table:

{
	level = 0; -- integer 0 -> 9 where 0 is no compression and 9 is most compression
	strategy = "" -- "huffman_only", "fixed", "dynamic"
}
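The level and strategy fields only accept the values listed above, so it can help to validate them before calling Compress. This is a hypothetical helper and is not part of the module. It assumes strategy may be left out (nil) so that the library uses its default.

```lua
-- Hypothetical helper: builds a configs table and rejects values outside
-- the documented ranges before they reach Compress.
local VALID_STRATEGIES = { huffman_only = true, fixed = true, dynamic = true }

local function makeConfigs(level, strategy)
	assert(type(level) == "number" and level % 1 == 0 and level >= 0 and level <= 9,
		"level must be an integer from 0 to 9")
	if strategy ~= nil then
		assert(VALID_STRATEGIES[strategy],
			"strategy must be \"huffman_only\", \"fixed\" or \"dynamic\"")
	end
	return { level = level, strategy = strategy }
end
```

Usage would look like Compression.Zlib.Compress(data, makeConfigs(9, "dynamic")).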

Method: Compression.Deflate.Compress(data, configs?):

  • Description: Compresses a string using the raw deflate format

  • Input:

    • String: data = The data to be compressed
    • table?: configs = The configuration table to control the compression
  • Output:

    • String: compressedData = The compressed data
    • int: paddedBits = The number of bits padded at the end of the output

Method: Compression.Deflate.Decompress(compressedData):

  • Description: Decompresses raw deflate-compressed data.

  • Input:

    • String: compressedData = The data to be decompressed
  • Output:

    • String: data = The decompressed data

Method: Compression.Zlib.Compress(data, configs?):

  • Description: Compresses a string using the zlib format

  • Input:

    • String: data = The data to be compressed
    • table?: configs = The configuration table to control the compression
  • Output:

    • String: compressedData = The compressed data
    • int: paddedBits = The number of bits padded at the end of the output

Method: Compression.Zlib.Decompress(compressedData):

  • Description: Decompresses zlib-compressed data.

  • Input:

    • String: compressedData = The data to be decompressed
  • Output:

    • String: data = The decompressed data
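The Deflate and Zlib methods differ only in framing. Per RFC 1950, a zlib stream is raw deflate data wrapped in a 2-byte header, followed by a 4-byte Adler-32 checksum of the uncompressed data. Zlib output should therefore typically be 6 bytes longer than Deflate output for the same input. In exchange, corruption can be detected. The helpers below are hypothetical and are not part of the module. They compute Adler-32 and check a stream's header and trailer, which can help when debugging stored data.

```lua
-- Adler-32 checksum (RFC 1950): two running sums modulo 65521.
local function adler32(data)
	local a, b = 1, 0
	for i = 1, #data do
		a = (a + data:byte(i)) % 65521
		b = (b + a) % 65521
	end
	return b * 65536 + a
end

-- A zlib stream starts with CMF and FLG bytes: compression method 8 (deflate)
-- sits in the low 4 bits of CMF, and CMF * 256 + FLG must be divisible by 31.
local function hasZlibHeader(stream)
	local cmf, flg = stream:byte(1, 2)
	if not cmf or not flg then
		return false
	end
	return cmf % 16 == 8 and (cmf * 256 + flg) % 31 == 0
end

-- The last 4 bytes hold the Adler-32 of the uncompressed data, big-endian.
local function checksumMatches(stream, decompressed)
	local b1, b2, b3, b4 = stream:byte(-4, -1)
	if not b4 then
		return false -- too short to contain a trailer
	end
	local stored = ((b1 * 256 + b2) * 256 + b3) * 256 + b4
	return stored == adler32(decompressed)
end
```

If the module emits standard zlib streams, hasZlibHeader(Compression.Zlib.Compress(data)) should return true. Raw Deflate output has no such header.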

Additional Information

Explanation of algorithm

You can view Mark Adler’s explanation of this algorithm here.

Strategies:

There are 3 strategies:

  • “fixed”: uses deflate blocks with the fixed (predefined) Huffman codes
  • “dynamic”: uses deflate blocks with dynamic Huffman codes built from the data
  • “huffman_only”: uses Huffman coding only, with no LZ77 string matching
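The huffman_only strategy codes each byte on its own, so the data's order-0 entropy gives a rough idea of how small its output can get. An optimal prefix code, which is what Huffman coding builds, averages between H and H + 1 bits per symbol, where H is the entropy. The helper below is hypothetical and estimates that floor in bytes. It treats the whole string as one block and ignores block headers and code tables. Deflate can also switch codes between blocks, so real output may differ in either direction.

```lua
-- Hypothetical helper: order-0 Shannon entropy of `data`, in bytes.
-- Roughly the best a byte-by-byte prefix code (e.g. Huffman) can do.
local function entropyBytes(data)
	local n = #data
	if n == 0 then
		return 0
	end
	-- Count how often each byte value occurs.
	local counts = {}
	for i = 1, n do
		local b = data:byte(i)
		counts[b] = (counts[b] or 0) + 1
	end
	-- Sum -log2(p) bits for every occurrence of every byte value.
	local bits = 0
	for _, count in pairs(counts) do
		local p = count / n
		bits = bits - count * math.log(p) / math.log(2)
	end
	return bits / 8
end
```

If entropyBytes(data) is close to #data, huffman_only has little to gain. Any remaining savings must come from the LZ77 matching used by the other strategies.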

Levels of compression:

Level 0:

  • uses no lazy evaluation
  • no previous good length
  • no max insert length or max lazy match
  • no nice length
  • no max hash chains

Level 1:

  • uses no lazy evaluation
  • no previous good length
  • max insert length and max lazy match of 4
  • nice length of 8
  • 4 max hash chains

Level 2:

  • uses no lazy evaluation
  • no previous good length
  • max insert length and max lazy match of 5
  • nice length of 18
  • 8 max hash chains

Level 3:

  • uses no lazy evaluation
  • no previous good length
  • max insert length and max lazy match of 6
  • nice length of 32
  • 32 max hash chains

Level 4:

  • uses lazy evaluation
  • previous good length of 4
  • max insert length and max lazy match of 4
  • nice length of 16
  • 16 max hash chains

Level 5:

  • uses lazy evaluation
  • previous good length of 8
  • max insert length and max lazy match of 16
  • nice length of 32
  • 32 max hash chains

Level 6:

  • uses lazy evaluation
  • previous good length of 8
  • max insert length and max lazy match of 16
  • nice length of 128
  • 128 max hash chains

Level 7:

  • uses lazy evaluation
  • previous good length of 8
  • max insert length and max lazy match of 32
  • nice length of 128
  • 256 max hash chains

Level 8:

  • uses lazy evaluation
  • previous good length of 32
  • max insert length and max lazy match of 128
  • nice length of 258
  • 1024 max hash chains

Level 9:

  • uses lazy evaluation
  • previous good length of 32
  • max insert length and max lazy match of 258
  • nice length of 258
  • 4096 max hash chains
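Levels 4 and up enable lazy evaluation (lazy matching). The sketch below illustrates what that means with a minimal, hypothetical LZ77 tokenizer. It is not the module's code. It finds matches by brute force instead of hash chains, so "max hash chains" and "nice length" have no equivalent here. It does use deflate's minimum match length of 3, maximum match length of 258, and 32 KB window. Greedy matching takes the longest match at the current position immediately. Lazy matching first checks whether a longer match starts one byte later. If one does, it emits a single literal instead.

```lua
-- Minimal LZ77 sketch: greedy vs lazy matching. Brute-force search, so it is
-- slow and only meant for illustration; the limits below are deflate's.
local MIN_MATCH, MAX_MATCH, WINDOW = 3, 258, 32768

-- Returns length and distance of the longest earlier match at `pos`, or 0, 0.
local function longestMatch(s, pos)
	local bestLen, bestDist = 0, 0
	local n = #s
	for start = pos - 1, math.max(1, pos - WINDOW), -1 do
		local len = 0
		-- Matches may overlap `pos` (e.g. distance 1 repeats one character).
		while len < MAX_MATCH and pos + len <= n
			and s:byte(start + len) == s:byte(pos + len) do
			len = len + 1
		end
		if len > bestLen then
			bestLen, bestDist = len, pos - start
		end
	end
	if bestLen < MIN_MATCH then
		return 0, 0
	end
	return bestLen, bestDist
end

-- Splits `s` into literal and {len, dist} tokens. With `lazy`, a match is
-- deferred when a longer one starts at the next byte.
local function tokenize(s, lazy)
	local tokens, pos, n = {}, 1, #s
	while pos <= n do
		local len, dist = longestMatch(s, pos)
		if lazy and len > 0 and pos < n then
			local nextLen = longestMatch(s, pos + 1)
			if nextLen > len then
				len = 0 -- emit a literal now; the longer match is found next loop
			end
		end
		if len > 0 then
			tokens[#tokens + 1] = { len = len, dist = dist }
			pos = pos + len
		else
			tokens[#tokens + 1] = { lit = s:sub(pos, pos) }
			pos = pos + 1
		end
	end
	return tokens
end

-- Rebuilds the original string by copying earlier output for each match.
local function detokenize(tokens)
	local out = {}
	for _, t in ipairs(tokens) do
		if t.lit then
			out[#out + 1] = t.lit
		else
			local start = #out - t.dist + 1
			for i = 0, t.len - 1 do
				out[#out + 1] = out[start + i]
			end
		end
	end
	return table.concat(out)
end
```

Take "abcQbcdeRabcde" as an example. At the second "a", greedy matching copies "abc" and then has to emit "d" and "e" as literals. Lazy matching emits "a" as a literal and then copies "bcde", which produces one token fewer.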

I would like to bring some attention to this, because this library is absurdly powerful. Would it work with DataStores, and if so, are there any caveats?

I have tested this before and it does an awesome job compressing everything.

Thankfully, DataStores now have a much higher limit (4 million characters, about 4 MB). That fixes a lot of problems, though large values might come with delays. DataStores also compress data internally, which is good.

If you want to store this in DataStores, I recommend converting the output to base 93 first. That accounts for characters that take more than 1 byte when stored in DataStores, and for the fact that you can’t use bytes above 127 in DataStores.

Other than that, you can definitely use it for this. It wasn’t the intended purpose, but it could easily be adapted.

Too laggy even at level 1. bruh

I managed to get around this by converting every symbol of the compressed text into its “internal numeric representation”. I thought I’d share this solution in case anyone wants to store compressed text in a DataStore, since Roblox Lua base93 encoders/decoders are really hard to find.

You can use string.byte to convert each symbol into its internal numeric representation, which gives every character a unique ID.

For example, byte 255 and byte 252 both display as the same symbol, �. Because different bytes look identical like this, storing them in a DataStore as-is would result in corrupted text. Storing their unique numeric IDs instead solves the issue.
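Here is a minimal sketch of the numeric-ID approach described above. It assumes the goal is a plain array of numbers that JSON-based storage can hold. The names toByteList and fromByteList are hypothetical. Note the cost: in JSON, each byte becomes 1 to 3 digits plus a separating comma, so expect roughly 2 to 4 characters per compressed byte.

```lua
-- Hypothetical helpers: string <-> array of byte values (0-255).
local function toByteList(s)
	local list = {}
	for i = 1, #s do
		list[i] = s:byte(i) -- one number per byte, so nothing can be corrupted
	end
	return list
end

local function fromByteList(list)
	local parts = {}
	for i, b in ipairs(list) do
		parts[i] = string.char(b)
	end
	return table.concat(parts)
end
```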

Edit: I found this post which is really helpful:


I would like to use this for my plugin. I have a large table with lots of numbers and I would like to compress it as much as possible. Without compression, it takes up about 190kb, and with compression, it takes up about 75kb.

However, the problem is that the compressed string cannot be parsed.

I tried converting the symbols in the string to the corresponding ASCII codes, but it took up a lot more space, and it was still unable to be parsed.

I know this is pretty old, but it would be incredibly useful if anyone can explain what I can do to get this to work.

I’m not 100% sure what the requirements are for plugin settings (and the developer documentation isn’t much help here either). I’d assume the problem is that this library outputs every byte value from 0 to 255, not just ASCII (0 to 127). You’d probably want to apply some kind of encoding: base128 or base93 if possible, though I’m fairly confident base64 will work.

I’ll look into this later when I get a chance and get back to you. For now I’d look into doing the above though, or if the article you listed helps, I’d use that instead.

Saving in a plugin is similar to saving in a DataStore: it requires a value that can be converted to JSON.

For anyone wondering, you can use this with a DataStore if you base64-encode the compressed string before storing it. You could also use base91/base93, but I have not been able to find modules for those.
Encoding makes the compressed string larger (base64 turns every 3 bytes into 4 characters, about 33% more), but usually not by enough to cancel out the compression, so it’s still worth it.
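Here is a minimal pure-Lua base64 encoder/decoder, in case you can't find a module. It uses the standard RFC 4648 alphabet with "=" padding. Because it uses only arithmetic (no bit library), it should run in Luau and plain Lua 5.1 alike. The function names are hypothetical, and the code hasn't been tuned for speed on very large strings.

```lua
local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
local DECODE = {}
for i = 1, 64 do
	DECODE[ALPHABET:byte(i)] = i - 1 -- character code -> 6-bit value
end

-- Encodes each group of 3 bytes as 4 characters, padding the last group with "=".
local function base64Encode(data)
	local out = {}
	for i = 1, #data, 3 do
		local a, b, c = data:byte(i, i + 2)
		local n = a * 65536 + (b or 0) * 256 + (c or 0) -- 24-bit group
		local c1 = math.floor(n / 262144) % 64
		local c2 = math.floor(n / 4096) % 64
		local c3 = math.floor(n / 64) % 64
		local c4 = n % 64
		out[#out + 1] = ALPHABET:sub(c1 + 1, c1 + 1) .. ALPHABET:sub(c2 + 1, c2 + 1)
			.. (b and ALPHABET:sub(c3 + 1, c3 + 1) or "=")
			.. (c and ALPHABET:sub(c4 + 1, c4 + 1) or "=")
	end
	return table.concat(out)
end

-- Decodes each group of 4 characters back into 1 to 3 bytes.
local function base64Decode(text)
	local out = {}
	for i = 1, #text, 4 do
		local n, pad = 0, 0
		for j = i, i + 3 do
			local ch = text:byte(j)
			if ch == 61 then -- "="
				pad = pad + 1
				n = n * 64
			else
				n = n * 64 + assert(DECODE[ch], "invalid base64 character")
			end
		end
		local a = math.floor(n / 65536) % 256
		local b = math.floor(n / 256) % 256
		local c = n % 256
		if pad == 0 then
			out[#out + 1] = string.char(a, b, c)
		elseif pad == 1 then
			out[#out + 1] = string.char(a, b)
		else
			out[#out + 1] = string.char(a)
		end
	end
	return table.concat(out)
end
```

To save, you would store base64Encode(Compression.Zlib.Compress(data)). After loading, you would call Compression.Zlib.Decompress(base64Decode(stored)).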

Should I give credit for using this?