Text compression

This was very informative. Thank you for writing this.

1 Like

I never considered compressing strings before. Since I save a lot of my data as strings this will probably come in handy sooner of later…

3 Likes

I found that some characters are not really a single byte when they’re stored in a datastore because they’re encoded in JSON.

here's a table of encodings
1 \u0001
2 \u0002
3 \u0003
4 \u0004
5 \u0005
6 \u0006
7 \u0007
8 \b
9 \t
10 \n
11 \u000B
12 \f
13 \r
14 \u000E
15 \u000F
16 \u0010
17 \u0011
18 \u0012
19 \u0013
20 \u0014
21 \u0015
22 \u0016
23 \u0017
24 \u0018
25 \u0019
26 \u001A
27 \u001B
28 \u001C
29 \u001D
30 \u001E
31 \u001F
34 \"
92 \\

I’ve updated the program to avoid using these characters. It will escape these characters before compression and replace them with \127 followed by a different character, and interpret the escaped characters upon decompression. The initial dictionary size is 93 instead of 127 now, so the compression rate is slightly worse (although technically it’s better now than it was before). @ColdSmoke’s example now compresses to 33.2% of the original size, or 182,258 characters, which is 70% of a datastore key’s maximum capacity.

6 Likes

I coded functions to change base of an unsigned integer, they looked so similiar to yours that I considered posting them!

local encode, decode do
	
	--offset=33 base=92 --default
	--offset=48 base=10 --decimal
	function encode(int, base, offset)
		local codepoints = {}
		local i = 0
		while int + 1 >= base^i do
			local code = math.floor(int/base^i % base + offset)
			table.insert(codepoints, 1, code)
			i = i + 1
		end
		return utf8.char(unpack(codepoints))
	end
	
	function decode(s, base, offset)
		local int = 0
		local length = #s
		for i, code in utf8.codes(s) do
			int = int + (code-offset)*base^(length-i)
		end
		return int
	end
end

Anyone interested feel free to use it.
Note: if you would like to use smaller bases, you may consider using
Variant tonumber(Variant arg, int base = 10).

1 Like

Can’t get it to work or am I doing something wrong?

local MS = require(script.ModuleScript)

local HttpService = game.HttpService

local Dictionary = {

Cash = 500,

Level = 1,

EXP = {0,50},

Kills = 0,

Deaths = 0,

Rank = 'Rookie',

Membership = 'None',

}

print(#HttpService:JSONEncode(Dictionary))

local Compressed; Compressed = MS.compress(Dictionary)

print(#Compressed)

attempt to call method ‘gsub’ (a nil value)

Stack Begin

Script ‘ServerScriptService.Script.ModuleScript’, Line 17 - upvalue escape

Script ‘ServerScriptService.Script.ModuleScript’, Line 66 - field compress

Script ‘ServerScriptService.Script’, Line 17

Stack End

1 Like

You need to compress the JSON string, not the table.

Something like:

local JSONstring = HttpService:JSONEncode(Dictionary)

local Compressed = MS.compress(JSONstring)
5 Likes

Let me try that and I’ll get back to you.

(Edit)

Okay it works!

But I did this and the results was

print(#JSONstring)

print(#Compressed)

92

166

print(JSONstring)

print(Compressed)

"Cash":500,"EXP":[0,50],"Deaths":0,"Membership":"None","Level":1,"Rank":"Rookie","Kills":0}

1,80|{! A C a s h!# : 5 0 0 ,!# E X P!) [!-!+ ]!. A D e a t h s!)!-!# M e m b e r!’ i p!)!# N o n e!#!8 L e v e l!) 1!8 R a n k!L A R o o k i!Q A!8 K i l l!? A : 0 }


Seems like it still needs some work

@1waffle1

2 Likes

The compression works better with longer strings that have a lot of repeating patterns.

For smaller strings, I’d just leave them as they are.

7 Likes

It doesn’t. It is impossible for any compression algorithm to map every input to a smaller output. Your string is already so small that it doesn’t need to be (and can’t be) compressed.

14 Likes

So, lets say I have a minecraft-like game, and I want to save block changes. An example of one block changes saved to my string format would be something like “B1/1000/100/1000/3/1/3999/a/b/c,”

  • B1 means the block type. Ex: B1,B2,B3
  • 1000/100/1000 is the grid coordinate (1grid=4studs)
  • 3/1/3999 would be the amount of times the block is repeated on respective axis (x,y,z)
  • a, b and c would be special properties if needed (ex: door would have a property as 1 or 0, to indicate if its opened)
    Would this be already too compressed for your code?

Information that has patterns or biases can be simplified. It looks like your format uses lots of / and numbers, which is an attribute which can be shortened by using more different types of characters to represent common repeated substrings like /1/. If you try feeding it a large example then you can see for yourself how much it can be compressed.

1 Like

Yeah, I was thinking that I could compress this even more. I thought of doing a dictionary with a lot of subdivisions. For example, a lot of “air” was created starting on the same x coordinate 300, but with different y and z. Instead of writing “Air/x1/y1/z1,Air/x1/y1/z2,Air/x1/y1/z3” over and over I do “Air={x1={y1={z1,z2,z3}},y2={…}},x2={…},x3={…}}”.

Combined with your compressor, I guess it will take a lot of time for some minecraft like game to fill the datastore with map changes.

I’ve been loving this module! Thank you so much for sharing this.
It can be a bit slow when dealing with large data, so I decided to try optimizing.

Decompression is now 34.8% faster and compression is 6.6% faster in my test.

Bench
local default = require(script.default)
local optimized = require(script.optimized)

local HttpService = game:GetService("HttpService")

return {

	ParameterGenerator = function()
		local data = {}

		for i=1, 30 do
			local characters = table.create(100)
			for n=1, 100 do
				characters[n] = math.random(40, 120)
			end

			data[HttpService:GenerateGUID(false)] = string.char(table.unpack(characters))
		end

		return HttpService:JSONEncode(data)
	end;

	Functions = {
		["optimized"] = function(Profiler, Input)
			Profiler.Begin("compress")
			local compressed = optimized.Compress(Input)
			Profiler.End()

			Profiler.Begin("Decompress")
			local decompressed = optimized.Decompress(compressed)
			Profiler.End()
		end;

		["default"] = function(Profiler, Input)
			Profiler.Begin("compress")
			local compressed = default.Compress(Input)
			Profiler.End()

			Profiler.Begin("Decompress")
			local decompressed = default.Decompress(compressed)
			Profiler.End()
		end;
	};
}

Changes:

  • Cached base10 and base93 conversions
  • Used string global functions instead of string methods (ie string.sub(s) instead of s:sub())
  • Used table.insert(t) instead of t[#t+1] since this is faster in latest Luau as it avoids checking table length
  • Used ipairs instead of numeric+indexing to loop over groups
  • Used Luau’s += where possible
  • Fixed shadowed variables
  • Used StyLua 0.11.3 for standardized formatting

Edit: go use the fixed version shared later in this thread

38 Likes

Thank you so much for providing this optimisation!

I was inefficiently sending a gigantic table of data over remote events between the server and client that must’ve been several megabytes large and was incredibly slow. I fixed this by converting the table into JSON using HttpService’s JSONEncode then compressing the JSON using this optimised module, then decompressing and JSONDecode back into a table again. It’s now lightning fast, with little noticeable changes in overall performance, thank you so much!

2 Likes

For some weird reason string.char(3) causes it to break

image

local test = require(script.ModuleScript)

local compressed = test.Compress("hey")

print(compressed)

local compressedbroken = test.Compress("hey" .. string.char(3))

print(compressedbroken)

Is there a way to prevent it from breaking the script?

image

3 Likes

excellent for datastore map saving systems.

Thanks for this!

It works great,with number compression from suphi,This was able to reduce the size by 60 percent and on one duplicated data case,94 percent!

This module works great on most data and compresses it just fine to a small size. Thank you for that! However, there are quite a few edge cases where the module outright breaks, which I ran into when trying to save string keys of a number (positive and negative), and also the one that rickje139 said earlier. Not sure how to fix these, in my case it fails in the tobase10 function when multiplying (line 71).

Edit: Looks like it was my own script’s fault :sweat_smile: however, rickje139’s issue is still present, and byte 127 is also unuseable.

1 Like

Out of curiosity, what’s the use case? I doubt any sort of compression algorithm is going to make a negligible difference unless there is a weird edge case.

1 Like

In my game I normally use only < 1 percent, But as every game itll get bigget, I grabbed this module, Try it on my game and it works, And the point is at that point I had alot of loop hole of duplicated data in my game, Instead of 12 percent, It compress that 12 percent down by 96 percent