Text compression

This module works great on most data and compresses it just fine to a small size. Thank you for that! However, there are quite a few edge cases where the module outright breaks, which I ran into when trying to save string keys of a number (positive and negative), and also the one that rickje139 said earlier. Not sure how to fix these, in my case it fails in the tobase10 function when multiplying (line 71).

Edit: Looks like it was my own script’s fault :sweat_smile: however, rickje139’s issue is still present, and byte 127 is also unuseable.

1 Like

Out of curiosity, what’s the use case? I doubt any sort of compression algorithm is going to make a negligible difference unless there is a weird edge case.

1 Like

In my game I normally use only < 1 percent, But as every game itll get bigget, I grabbed this module, Try it on my game and it works, And the point is at that point I had alot of loop hole of duplicated data in my game, Instead of 12 percent, It compress that 12 percent down by 96 percent

Looks like this module doesn’t work with emoji.
Screenshot_485

I’m currently using boatbomber’s version, but Waffle’s version also doesn’t work with emoji.

Any idea how can i make emoji work with this?

The only idea i can think about is having a table with ALL the emoji and assign them special code (for example: [:grinning:] = “HappyEmoji”). Then, before I compress string, I convert all emojis to this code and then compress them using this module. And after i decompress my string i can just convert code back to emoji. BUT the issue with this method is that I would have to include a TON of emoji because there are a lot of them and i just don’t like it.

1 Like

Pretty much yeah. U have to. B

Emoji’s seem to work fine for me with both versions.
image

Its just that string.char(3) creates an error.

In order to have emoji’s work with the compressor you have to increase this number from 127 to 255
image

2 Likes

I noticed an issue with this compressor and it caused a duplicate character to appear which made it unuseable for me.

image
image
as you can see hey changed to heyy at the end.

The original version from Waffle does not do this though.
image

2 Likes

That worked! Thank you so much! :heart:

Please read my latest post, emoji’s have an impact on the result i think so you should still be careful with implementing it.

Waffle’s version of the compressor works correctly with emoji’s if you do the same trick.

1 Like

Yeah i already moved back to Waffle’s version because boatbomber’s version for some reason didn’t compressed (or decompressed?) properly and it caused my strings to become not the ones i originally saved.

i just checked the same exact code again today and for some reason it appears the duplicate character is gone now on boatbombers version, it might have been some weird instability or a roblox issue :man_shrugging:

2 Likes

works perfectly, completely halves build saves in my game

1 Like

This error occurs with the space character (" ") as well

local text = "hello there"
local comp = compress(text)
local decomp = decompress(comp)
print(comp)
print(decomp)
print(text == decomp)

1Waffle1’s code:
→ 7,3|hello t!! r e
→ hello there
→ true

@boatbomber’s code:
→ 7,3|helloot!! r e
→ helloothere
→ false

I think I have a fix for the bug in @boatbomber code until he notices. Inside the base93 function that caches information replace local value = b93Cache[n] with local value = b93Cache[n+1]
@nick_hz @rickje139

3 Likes

Thanks for finding it! I appreciate the effort and contribution!

2 Likes

Hi, I just found out about this. Is there any way we could have the same text compression algorithm for JavaScript. I want to be able to communicate with my server from a website.

After spending two days scrambling my head on why yours and @1waffle1’s implementation could not encode emojis,

I decided to fix (and optimize!) your script even further!

EDIT
Use this.

Don't use this
-- Module by 1waffle1 and boatbomber, optimized and fixed by iiau
-- https://devforum.roblox.com/t/text-compression/163637/37

local dictionary = {}
do -- populate dictionary
	local length = 0
	for i = 32, 127 do
		if i ~= 34 and i ~= 92 then
			local c = string.char(i)
			dictionary[c], dictionary[length] = length, c
			length = length + 1
		end
	end
end

local escapemap_126, escapemap_127 = {}, {}
local unescapemap_126, unescapemap_127 = {}, {}

local blacklisted_126 = { 34, 92, 126, 127 }
for i = 128, 180 do
	table.insert(blacklisted_126, i)
end

do -- Populate escape map
	-- represents the numbers 1-31, 34, 92, 126 and 127 (35 characters)
	-- https://devforum.roblox.com/t/text-compression/163637/5
	for i = 1, 31 + #blacklisted_126 do
		local b = blacklisted_126[i - 31]
		local s = i + 31

		-- Note: 126 and 127 are magic numbers
		local c = string.char(b or i)
		local e = string.char(s + (s >= 34 and 1 or 0) + (s >= 92 and 1 or 0))

		escapemap_126[c] = e
		unescapemap_126[e] = c
	end

	for i = 1, 255 - 181 do
		local c = string.char(i + 180)
		local s = i + 34
		local e = string.char(s + (s >= 92 and 1 or 0))

		escapemap_127[c] = e
		unescapemap_127[e] = c
	end
end

local function escape(s)
	-- escape the control characters 0-31, double quote 34, backslash 92 and DEL 127 (34 chars)
	-- escape characters 128-180 (53 chars)
	return string.gsub(string.gsub(s, '[%c"\\\127-\180]', function(c)
		return "\126" .. escapemap_126[c]
	end), '[\181-\255]', function(c)
		return "\127" .. escapemap_127[c]
	end)
end
local function unescape(s)
	return string.gsub(string.gsub(s, "\127(.)", function(e)
		return unescapemap_127[e]
	end), "\126(.)", function(e)
		return unescapemap_126[e]
	end)
end

local b92Cache = {}
local function tobase92(n)
	local value = b92Cache[n]
	if value then
		return value
	end

	local c = n
	value = ""
	repeat
		local remainder = n % 92
		value = dictionary[remainder] .. value
		n = (n - remainder) / 92
	until n == 0

	b92Cache[c] = value
	return value
end

local b10Cache = {}
local function tobase10(value)
	local n = b10Cache[value]
	if n then
		return n
	end

	n = 0
	for i = 1, #value do
		n = n + math.pow(92, i - 1) * dictionary[string.sub(value, -i, -i)]
	end

	b10Cache[value] = n
	return n
end

local function compress(text)
	assert(type(text) == "string", "bad argument #1 to 'compress' (string expected, got " .. typeof(text) .. ")")
	local dictionaryCopy = table.clone(dictionary)
	local key, sequence, size = "", {}, #dictionaryCopy
	local width, spans, span = 1, {}, 0
	local function listkey(k)
		local value = tobase92(dictionaryCopy[k])
		local valueLength = #value
		if valueLength > width then
			width, span, spans[width] = valueLength, 0, span
		end
		table.insert(sequence, string.rep(" ", width - valueLength) .. value)
		span += 1
	end
	text = escape(text)
	for i = 1, #text do
		local c = string.sub(text, i, i)
		local new = key .. c
		if dictionaryCopy[new] then
			key = new
		else
			listkey(key)
			key = c
			size += 1
			dictionaryCopy[new], dictionaryCopy[size] = size, new
		end
	end
	listkey(key)
	spans[width] = span
	return table.concat(spans, ",") .. "|" .. table.concat(sequence)
end

local function decompress(text)
	assert(type(text) == "string", "bad argument #1 to 'decompress' (string expected, got " .. typeof(text) .. ")")
	local dictionaryCopy = table.clone(dictionary)
	local sequence, spans, content = {}, string.match(text, "(.-)|(.*)")
	local groups, start = {}, 1
	for span in string.gmatch(spans, "%d+") do
		local width = #groups + 1
		groups[width] = string.sub(content, start, start + span * width - 1)
		start = start + span * width
	end
	local previous

	for width, group in ipairs(groups) do
		for value in string.gmatch(group, string.rep(".", width)) do
			local entry = dictionaryCopy[tobase10(value)]
			if previous then
				if entry then
					table.insert(dictionaryCopy, previous .. string.sub(entry, 1, 1))
				else
					entry = previous .. string.sub(previous, 1, 1)
					table.insert(dictionaryCopy, entry)
				end
				table.insert(sequence, entry)
			else
				sequence[1] = entry
			end
			previous = entry
		end
	end
	return unescape(table.concat(sequence))
end

return { compress = compress, decompress = decompress }

This library works with emojis now.

In order to escape the ascii values 128-255, I decided to use the tilde/ascii 126 character as another magic number. So this script actually encodes in base92 in contrast to the base93 that you two had, making it just a tiny bit more inefficient in compression but it still does its job well.

Furthermore, I also included the optimization math.pow instead of the ^ operator and fixed up a cache issue with tobase93, which has become tobase92.

Disclaimer: If you already used 1waffle1/boatbomber’s implementation to compress saved data, my implementation is not compatible if you try to decompress that same data.

2 Likes

great job :+1: . Looks really cool, do you mind sending performance benchmarks.

I made very small changes that should slightly increase performance.

What I changed:

  • removed ipairs, luau natively supports for loops over tables now
  • defined all built in functions at the start of script, having variables locally is faster

I didn’t change anything big but it still is faster!

-- Module by 1waffle1 and boatbomber, optimized and fixed by iiau
-- https://devforum.roblox.com/t/text-compression/163637/37

local dictionary = {}

-- save builtin libraries 
local char = string.char
local insert = table.insert
local gsub = string.gsub
local pow = math.pow
local sub = math.sub
local concat = table.concat
local rep = string.rep
local clone = table.clone
local gmatch = string.gmatch
local byte = string.byte

do -- populate dictionary
	local length = 0
	for i = 32, 127 do
		if i ~= 34 and i ~= 92 then
			local c = char(i)
			dictionary[c], dictionary[length] = length, c
			length = length + 1
		end
	end
end

local escapemap_126, escapemap_127 = {}, {}
local unescapemap_126, unescapemap_127 = {}, {}

local blacklisted_126 = { 34, 92, 126, 127 }
for i = 128, 180 do
	insert(blacklisted_126, i)
end

do -- Populate escape map
	-- represents the numbers 1-31, 34, 92, 126 and 127 (35 characters)
	-- https://devforum.roblox.com/t/text-compression/163637/5
	for i = 1, 31 + #blacklisted_126 do
		local b = blacklisted_126[i - 31]
		local s = i + 31

		-- Note: 126 and 127 are magic numbers
		local c = char(b or i)
		local e = char(s + (s >= 34 and 1 or 0) + (s >= 92 and 1 or 0))

		escapemap_126[c] = e
		unescapemap_126[e] = c
	end

	for i = 1, 255 - 181 do
		local c = char(i + 180)
		local s = i + 34
		local e = char(s + (s >= 92 and 1 or 0))

		escapemap_127[c] = e
		unescapemap_127[e] = c
	end
end

local function escape(s)
	-- escape the control characters 0-31, double quote 34, backslash 92 and DEL 127 (34 chars)
	-- escape characters 128-180 (53 chars)
	return gsub(gsub(s, '[%c"\\\127-\180]', function(c)
		return "\126" .. escapemap_126[c]
	end), '[\181-\255]', function(c)
		return "\127" .. escapemap_127[c]
	end)
end
local function unescape(s)
	return gsub(gsub(s, "\127(.)", function(e)
		return unescapemap_127[e]
	end), "\126(.)", function(e)
		return unescapemap_126[e]
	end)
end

local b92Cache = {}
local function tobase92(n)
	local value = b92Cache[n]
	if value then
		return value
	end

	local c = n
	value = ""
	repeat
		local remainder = n % 92
		value = dictionary[remainder] .. value
		n = (n - remainder) / 92
	until n == 0

	b92Cache[c] = value
	return value
end

local b10Cache = {}
local function tobase10(value)
	local n = b10Cache[value]
	if n then
		return n
	end

	n = 0
	for i = 1, #value do
		n = n + pow(92, i - 1) * dictionary[sub(value, -i, -i)]
	end

	b10Cache[value] = n
	return n
end

local function compress(text)
	local dictionaryCopy = clone(dictionary)
	local key, sequence, size = "", {}, #dictionaryCopy
	local width, spans, span = 1, {}, 0
	local function listkey(k)
		local value = dictionaryCopy[k]
		if not value then
			warn(byte(k))
		end
		value = tobase92(value)
		local valueLength = #value
		if valueLength > width then
			width, span, spans[width] = valueLength, 0, span
		end
		insert(sequence, rep(" ", width - valueLength) .. value)
		span += 1
	end
	text = escape(text)
	for i = 1, #text do
		local c = sub(text, i, i)
		local new = key .. c
		if dictionaryCopy[new] then
			key = new
		else
			listkey(key)
			key = c
			size += 1
			dictionaryCopy[new], dictionaryCopy[size] = size, new
		end
	end
	listkey(key)
	spans[width] = span
	return concat(spans, ",") .. "|" .. concat(sequence)
end

local function decompress(text)
	local dictionaryCopy = clone(dictionary)
	local sequence, spans, content = {}, match(text, "(.-)|(.*)")
	local groups, start = {}, 1
	for span in gmatch(spans, "%d+") do
		local width = #groups + 1
		groups[width] = sub(content, start, start + span * width - 1)
		start = start + span * width
	end
	local previous

	for width, group in groups do -- removed ipairs, that slows stuff down
		for value in gmatch(group, rep(".", width)) do
			local entry = dictionaryCopy[tobase10(value)]
			if previous then
				if entry then
					insert(dictionaryCopy, previous .. sub(entry, 1, 1))
				else
					entry = previous .. sub(previous, 1, 1)
					insert(dictionaryCopy, entry)
				end
				insert(sequence, entry)
			else
				sequence[1] = entry
			end
			previous = entry
		end
	end
	return unescape(concat(sequence))
end

return { compress = compress, decompress = decompress }
2 Likes

I tested it out and I was able to compress a 268064 character file down to 52776 in 0.095766 sec.

Edit: I optimized it a little bit more and was able to bring the time down to an average of 0.0678s compression and 0.0223s decompression. I’m running a pretty crappy laptop too and running it through the command bar in studio so I believe it should be faster on actual Roblox servers.

1 Like

I edited my previous comment with a modified version of your code, I didn’t make big changes like you did but still improvements.

But nice changes, I’ll probably use it in future projects!

1 Like