Text compression

The compression works better with longer strings that have a lot of repeating patterns.

For smaller strings, I’d just leave them as they are.

It doesn’t. It is impossible for any compression algorithm to map every input to a smaller output. Your string is already so small that it doesn’t need to be (and can’t be) compressed.
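The counting argument behind this can be checked with a small sketch. The function names are made up for illustration; the code only counts how many strings exist at a given length versus how many strictly shorter strings there are.

```lua
-- Number of distinct strings of exactly `length` symbols.
local function countExactly(alphabetSize, length)
	return alphabetSize ^ length
end

-- Number of distinct strings strictly shorter than `length`
-- (the empty string is included).
local function countShorter(alphabetSize, length)
	local total = 0
	for len = 0, length - 1 do
		total = total + alphabetSize ^ len
	end
	return total
end

print(countExactly(2, 3), countShorter(2, 3))
```

With a 2-symbol alphabet there are 8 strings of length 3 but only 7 shorter ones, so at least one input of length 3 has no shorter output left to map to.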

So, let’s say I have a Minecraft-like game, and I want to save block changes. An example of one block change saved in my string format would be something like “B1/1000/100/1000/3/1/3999/a/b/c,”

  • B1 is the block type (e.g. B1, B2, B3)
  • 1000/100/1000 is the grid coordinate (1 grid = 4 studs)
  • 3/1/3999 is the number of times the block is repeated along each axis (x, y, z)
  • a, b and c are special properties if needed (e.g. a door would have a property of 1 or 0 to indicate whether it’s open)

Would this already be too compressed for your code?
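
For reference, a record in the format described above could be split back into its parts roughly like this. This is only a sketch: parseRecord and the field names in the returned table are hypothetical, and it assumes every record has the seven leading fields followed by zero or more properties.

```lua
-- Split one record such as "B1/1000/100/1000/3/1/3999/a/b/c," into fields.
local function parseRecord(record)
	local fields = {}
	-- Fields are separated by "/", and the record ends with ",".
	for field in string.gmatch(record, "[^/,]+") do
		table.insert(fields, field)
	end

	-- Everything after the seventh field is treated as a special property.
	local properties = {}
	for i = 8, #fields do
		table.insert(properties, fields[i])
	end

	return {
		blockType = fields[1],
		position = {
			x = tonumber(fields[2]),
			y = tonumber(fields[3]),
			z = tonumber(fields[4]),
		},
		repeatCount = {
			x = tonumber(fields[5]),
			y = tonumber(fields[6]),
			z = tonumber(fields[7]),
		},
		properties = properties,
	}
end

local block = parseRecord("B1/1000/100/1000/3/1/3999/a/b/c,")
print(block.blockType, block.position.y, block.repeatCount.z, #block.properties)
```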

Information that has patterns or biases can be simplified. Your format uses lots of / characters and digits, so it can be shortened by using a wider range of characters to represent common repeated substrings such as /1/. If you try feeding it a large example, you can see for yourself how much it can be compressed.
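
As a toy illustration of that idea (not how the module itself works), a common substring can be swapped for a single character that never appears in the data. The helper names and the choice of "~" as the stand-in character are assumptions for this sketch.

```lua
-- Escape punctuation so the text is matched literally by Lua patterns.
local function escapePattern(text)
	return (string.gsub(text, "%p", "%%%0"))
end

-- Replace every occurrence of `common` with the single character `token`.
local function shorten(s, common, token)
	assert(not string.find(s, token, 1, true), "token must not occur in the input")
	return (string.gsub(s, escapePattern(common), function()
		return token
	end))
end

-- Undo shorten() by expanding `token` back into `common`.
local function restore(s, common, token)
	return (string.gsub(s, escapePattern(token), function()
		return common
	end))
end

local record = "B1/1000/100/1000/3/1/3999/a/b/c,"
local short = shorten(record, "/1/", "~")
print(#record, #short, restore(short, "/1/", "~") == record)
```

A real compressor finds such substrings automatically and picks its codes based on how often they occur, which is why a large sample shows the benefit better than a single record.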

Yeah, I was thinking that I could compress this even more. I thought of doing a dictionary with a lot of subdivisions. For example, a lot of “air” was created starting on the same x coordinate 300, but with different y and z. Instead of writing “Air/x1/y1/z1,Air/x1/y1/z2,Air/x1/y1/z3” over and over I do “Air={x1={y1={z1,z2,z3}},y2={…}},x2={…},x3={…}}”.
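
The nesting described here could be built along these lines. This is a sketch only: groupPositions and the {x, y, z} input shape are assumptions, and it handles a single block type such as Air.

```lua
-- Group a flat list of positions into grouped[x][y] = { z1, z2, ... }
-- so that shared x and y values are stored once.
local function groupPositions(positions)
	local grouped = {}
	for _, p in ipairs(positions) do
		local byY = grouped[p.x]
		if not byY then
			byY = {}
			grouped[p.x] = byY
		end

		local zs = byY[p.y]
		if not zs then
			zs = {}
			byY[p.y] = zs
		end

		table.insert(zs, p.z)
	end
	return grouped
end

local air = groupPositions({
	{ x = 300, y = 10, z = 1 },
	{ x = 300, y = 10, z = 2 },
	{ x = 300, y = 11, z = 1 },
})
print(#air[300][10], #air[300][11])
```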

Combined with your compressor, I guess it will take a long time for a Minecraft-like game to fill the datastore with map changes.

I’ve been loving this module! Thank you so much for sharing this.
It can be a bit slow when dealing with large data, so I decided to try optimizing.

Decompression is now 34.8% faster and compression is 6.6% faster in my test.

Bench
local default = require(script.default)
local optimized = require(script.optimized)

local HttpService = game:GetService("HttpService")

return {

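	-- Builds the benchmark input: 30 entries of 100 random characters each
	-- (byte values 40 to 120), keyed by GUID and JSON-encoded into one string.
	ParameterGenerator = function()
		local data = {}

		for i=1, 30 do
			local characters = table.create(100)
			for n=1, 100 do
				characters[n] = math.random(40, 120)
			end

			data[HttpService:GenerateGUID(false)] = string.char(table.unpack(characters))
		end

		return HttpService:JSONEncode(data)
	end;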

	Functions = {
		["optimized"] = function(Profiler, Input)
			Profiler.Begin("compress")
			local compressed = optimized.Compress(Input)
			Profiler.End()

			Profiler.Begin("Decompress")
			local decompressed = optimized.Decompress(compressed)
			Profiler.End()
		end;

		["default"] = function(Profiler, Input)
			Profiler.Begin("compress")
			local compressed = default.Compress(Input)
			Profiler.End()

			Profiler.Begin("Decompress")
			local decompressed = default.Decompress(compressed)
			Profiler.End()
		end;
	};
}

Changes:

  • Cached base10 and base93 conversions
  • Used string global functions instead of string methods (i.e. string.sub(s) instead of s:sub())
  • Used table.insert(t, v) instead of t[#t+1] = v since this is faster in latest Luau as it avoids checking table length
  • Used ipairs instead of numeric+indexing to loop over groups
  • Used Luau’s += where possible
  • Fixed shadowed variables
  • Used StyLua 0.11.3 for standardized formatting
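
To illustrate what caching a base conversion can look like, here is a minimal sketch. It is not the module's actual code: the digit alphabet (93 consecutive bytes starting at byte 34) and the names toBase93 and cache are assumptions made for this example.

```lua
local BASE = 93
local FIRST_BYTE = 34 -- assumed start of the digit alphabet

-- Results are remembered so repeated conversions of the same number
-- become a single table lookup.
local cache = {}

local function toBase93(n)
	local cached = cache[n]
	if cached then
		return cached
	end

	local digits = {}
	local value = n
	repeat
		local remainder = value % BASE
		-- Prepend so the most significant digit ends up first.
		table.insert(digits, 1, string.char(FIRST_BYTE + remainder))
		value = math.floor(value / BASE)
	until value == 0

	local result = table.concat(digits)
	cache[n] = result
	return result
end

print(toBase93(92), toBase93(93))
```

The trade-off is memory: the cache grows with every distinct number converted, so it pays off mainly when the same values come up repeatedly.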

Edit: go use the fixed version shared later in this thread

Thank you so much for providing this optimisation!

I was inefficiently sending a gigantic table of data over remote events between the server and client that must’ve been several megabytes large and was incredibly slow. I fixed this by converting the table into JSON using HttpService’s JSONEncode then compressing the JSON using this optimised module, then decompressing and JSONDecode back into a table again. It’s now lightning fast, with little noticeable changes in overall performance, thank you so much!

For some weird reason string.char(3) causes it to break

image

local test = require(script.ModuleScript)

local compressed = test.Compress("hey")

print(compressed)

local compressedbroken = test.Compress("hey" .. string.char(3))

print(compressedbroken)

Is there a way to prevent it from breaking the script?

image

Excellent for datastore map-saving systems.

Thanks for this!

It works great. Combined with number compression from suphi, this was able to reduce the size by 60 percent and, in one case with duplicated data, by 94 percent!

This module works great on most data and compresses it just fine to a small size. Thank you for that! However, there are quite a few edge cases where the module outright breaks, which I ran into when trying to save string keys of a number (positive and negative), and also the one that rickje139 said earlier. Not sure how to fix these, in my case it fails in the tobase10 function when multiplying (line 71).

Edit: Looks like it was my own script’s fault :sweat_smile: However, rickje139’s issue is still present, and byte 127 is also unusable.
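
Until that is fixed in the module, one possible workaround is to check strings before compressing them. This sketch assumes the only problematic bytes are the two reported in this thread (3 and 127); findUnsupportedByte and the UNSUPPORTED table are hypothetical names, not part of the module.

```lua
-- Byte values reported in this thread as breaking the compressor.
-- This list is an assumption; there may be others.
local UNSUPPORTED = { [3] = true, [127] = true }

-- Returns the position and value of the first unsupported byte,
-- or nil if the string looks safe to compress.
local function findUnsupportedByte(input)
	for i = 1, #input do
		local byte = string.byte(input, i)
		if UNSUPPORTED[byte] then
			return i, byte
		end
	end
	return nil
end

print(findUnsupportedByte("hey"))
print(findUnsupportedByte("hey" .. string.char(3)))
```

A caller could then skip compression or strip the offending byte when a position is returned.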

Out of curiosity, what’s the use case? I doubt any sort of compression algorithm is going to make more than a negligible difference unless there is a weird edge case.

In my game I normally use less than 1 percent, but like every game it’ll get bigger. I grabbed this module, tried it on my game, and it works. At that point I had a lot of duplicated data in my game, and instead of 12 percent, it compressed that 12 percent down by 96 percent.

Looks like this module doesn’t work with emoji.
Screenshot_485

I’m currently using boatbomber’s version, but Waffle’s version also doesn’t work with emoji.

Any idea how I can make emoji work with this?

The only idea I can think of is having a table with ALL the emoji and assigning each one a special code (for example: [:grinning:] = “HappyEmoji”). Then, before I compress a string, I convert all emoji to their codes and compress the result using this module. After I decompress my string, I can just convert the codes back to emoji. BUT the issue with this method is that I would have to include a TON of emoji because there are a lot of them, and I just don’t like it.

Pretty much, yeah. You have to.

Emojis seem to work fine for me with both versions.
image

It’s just that string.char(3) creates an error.

In order to have emojis work with the compressor, you have to increase this number from 127 to 255
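
A likely reason this limit matters, assuming the number in the screenshot is the highest byte value the compressor accepts: emoji are stored in UTF-8 as several bytes, and every one of those bytes is above 127. The sketch below shows this for the grinning face emoji (U+1F600); maxByte is a helper made up for this example.

```lua
-- UTF-8 encoding of U+1F600, written as decimal byte escapes.
local grinning = "\240\159\152\128"

-- Returns the largest byte value found in the string.
local function maxByte(s)
	local highest = 0
	for i = 1, #s do
		local byte = string.byte(s, i)
		if byte > highest then
			highest = byte
		end
	end
	return highest
end

print(#grinning, maxByte(grinning), maxByte("hey"))
```

Plain ASCII text such as "hey" stays below 128, which would explain why it works with the lower limit while emoji do not.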
image

I noticed an issue with this compressor: it caused a duplicate character to appear, which made it unusable for me.

image
image
As you can see, “hey” changed to “heyy” at the end.

The original version from Waffle does not do this though.
image

That worked! Thank you so much! :heart:

Please read my latest post. I think emojis have an impact on the result, so you should still be careful with implementing it.

Waffle’s version of the compressor works correctly with emojis if you do the same trick.

Yeah, I already moved back to Waffle’s version because boatbomber’s version for some reason didn’t compress (or decompress?) properly, which caused my strings to differ from the ones I originally saved.
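
For anyone comparing versions, a round-trip check like the following sketch can catch this kind of corruption before real data is saved. It assumes only that the module exposes Compress and Decompress, as used earlier in this thread; findRoundTripFailures is a hypothetical helper name.

```lua
-- Returns the samples that do not survive Compress followed by Decompress,
-- including those that raise an error along the way.
local function findRoundTripFailures(codec, samples)
	local failures = {}
	for _, sample in ipairs(samples) do
		local ok, result = pcall(function()
			return codec.Decompress(codec.Compress(sample))
		end)
		if not ok or result ~= sample then
			table.insert(failures, sample)
		end
	end
	return failures
end

-- Stand-in codec so the sketch runs on its own; swap in the required module.
local identityCodec = {
	Compress = function(s)
		return s
	end,
	Decompress = function(s)
		return s
	end,
}

print(#findRoundTripFailures(identityCodec, { "hey", "hey" .. string.char(3) }))
```

Passing the required module in place of identityCodec, with samples that include emoji and the bytes mentioned above, shows which version handles which input.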