How do base 64, 91, and 93 encoders/decoders work?

au_tk · May 16, 2023, 12:37am

~~Hi. Recently I’ve been looking into methods to compress data, and I came across Zlib/deflate, which immediately caught my intention.~~
~~However, in order to properly function with datastores it needs to be encoded in apparently Base 64, 91, or 93.~~
Now I know nothing about manipulating the bytes of characters for text compression and base encryption etc. My assumption is that Base93 is the most compact out of the bunch since it has a bigger number after the base, (meaning a wider range of characters I assume?) and more characters for more dense combinations of characters.
~~The problem with me not knowing anything about byte stuff, is that I have to rely a module made by someone else that does the byte stuff for me.~~
~~Unfortunately, I couldn’t find a proper module that encoded/decoded strings to and from Base93. Leaving me with two options; learn it or ask for code.~~
Well, I couldn’t find any good tutorials or explanations (at all) about manipulating bytes and stuff for newbies of the topic like me. (Even if I did it would probably be too much for my potato brain to handle-

So basically, could any of you explain byte manipulation to me, and how to actually uh, do it?

If anyone has any good tutorials or explanations please link me it. Would be very much appreciated.

Prototrode · May 16, 2023, 1:01am

I’m gonna guess that you’re talking about BIT manipulation. Because Roblox doesn’t run on Lua 5.4 which has native support for bitwise operations, Roblox gives you the bit32 library instead. The documentation explains to you what all of the functions and operations do. You can also look up the individual bitwise operations online.

But essentially, you give it base10 numbers and it will change bits in base2 and then return the new composite number back out as base10.

For example, the base10 number 8 in binary is 1000. When you shift the bits one place to the left, it becomes 10000 with an extra zero. This translates to 16 in base10.

print(bit32.lshift(8, 1)) --> 16

Another example, The XOR operation will compare the bits of two numbers and the bit for each place will only be 1 if the bits are different. Lets say the XOR of 7 and 3:
7 in binary is 0111
3 in binary is 0011
Only the 2nd place has different bits, so the result is 0100, which translates to 4.

print(bit32.bxor(7, 3)) --> 4

Everything else you can look up yourself.

au_tk · May 16, 2023, 10:42pm

Sorry for the late response, but I was not referring to bit manipulation. Error on my bad for poorly writing the topic. I realized that “byte manipulation” wasn’t the correct term for string.byte and string.char.

To rephrase my my original question: How do Base 64, 91, 93 etc encoders/decoders work, and how can I make my own?

Prototrode · May 17, 2023, 10:51pm

Let’s go over the basics first. There are various number systems used to count… numbers. Humanity is most familiar with the decimal system, or base10, meaning there are 10 different digits (including zero) used to represent a value for each position of the number. 0, 1, 2, 3, up to 9, before you have to increment the next position and then continue counting. Different bases will just have a different amount of digits used to represent each position of the number.

Counting in binary (base2) goes like: 00, 01 10, 11 …
Counting in quaternary (base4): 00, 01, 02, 03, 10, 11, 12, 13, 20, 21 …
Counting in hexadecimal (base16): 00 … 09, 0A, 0B, 0C, 0D, 0E, 0F, 10 …
And counting in base64 will use the entire English alphabet, both upper and lower case, plus (usually) the characters - and _. Think of it like this: 10 decimal digits + 26 uppercase letters + 26 lowercase letters + 2 auxiliary characters = 64 total digits.

Converting between bases explanation

How do you convert between bases?

Let’s count in binary again.
Observe the times when it moves onto the next position and increments it once.

0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000

Do you notice a pattern?

You need to count twice as much as the previous time before arriving at a new position. You can also say that adding a new position to the number doubled its total capacity. But why double? Because the base of this number system means there’s two digits. The math for this looks like this:

capacity = base ^ position

And we can test it. Say what is the maximum capacity of a binary with 8 places?

capacity = 2 ^ 8 = 256

And to verify, we take the binary 11111111 and convert it to decimal using some online converter:

Reminder: Zero is also included, which is why it says 255 and not 256!

Finding the capacity of each position in a specific number system is the basis of base conversion. Because if you know how much each position can store, you know when you should increment the next one, and it lets you formulate the entire number that way.

Now, let’s think of it differently. Instead of each position representing a capacity, say that each position, if it was all alone with no other values, is exactly what value it holds.

10 -> 2
11 --> 3
100 -> 4
111 --> 7
1000 -> 8
1111 --> 15

value = base ^ (position - 1)

And as an example, the binary number 10000000 where the 1 is at the 8th place:

value = 2 ^ (8 - 1) = 2 ^ 7 = 128

Let’s try it out! We will be converting the decimal number 123 into binary. First, what’s the biggest value of any position that can still fit into 123? It should be 64, which happens to be the 5th place of our number:

01000000

After giving it the 5th place, the remaining number is 123 - 64, which is 59. We then repeat this process, finding the largest position who’s value can still fit into the decimal.

32 = 00100000
59 - 32 = 27
16 = 00010000
27 - 16 = 11
8 = 00001000
11 - 8 = 3
2 = 00000010
3 - 2 = 1
1 = 00000001
1 - 1 = 0
When we combine all these positions into the final number, we get 01111011. We can check if this is correct by converting it back to decimal with the converter tool:

But what if you want to work backwards yourself and convert binary back into decimal? Well, remember the value of each position? If they are filled, that means the value constitutes to the number.

0   1   1   1   1   0   1   1
.   64  32  16  8   .   2   1

If you add all of the values up, 64+32+16+8+2+1, you’ll end up with 123 again.

However, things will get more complicated when you work with higher bases. Let’s now try converting 1234 into a hexadecimal. Now, you’re trying to find the largest number that’s a multiple of the largest position’s value that can still fit into the decimal. Don’t understand? Watch:

First, here are the values of the smallest digits of 4 positions:

4096   256   16   1

The largest position who’s value can still fit into 1234 is the 3rd position.
The value of the 3rd position is 256.
What’s the largest multiple of 256 that can still fit into 1234?
floor(1234 / 256) = 4
256 * 4 = 1024
1234 - 1024 = 210 remainder
The largest position who’s value can still fit into 210 is the 2nd position.
The value of the 2nd position is 16.
What’s the largest multiple of 16 that can still fit into 210?
floor(210 / 16) = 13
16 * 13 = 208
210 - 208 = 2 remainder
The largest position who’s value can still fit into 2 is the 1st position.
The value of the 1st position is 1.
What’s the largest multiple of 1 that can still fit into 2?
floor(2 / 1) = 2
1 * 2 = 2
2 - 2 = 0 remainder, we are done!

We now know that the values of the hexadecimal positions are 4, 13, 2. But these are still in base10 and 13 cannot be represented by a single position! Luckily, you can easily convert these into hex digits since the math have already confirmed that they will each fit into a single position. Many CS students memorize that 13 is the digit D in hexadecimal.

So the final number is: 4D2. And checking it…

And once again, let’s work in reverse on our own.

4   D   2
256 16  1

We have to multiply the digits by their position’s value. Before that, translate each position back into decimal:

4   13  2
256 16  1

Now we multiply:

4 * 256 = 1024
13 * 16 = 208
2 * 1 = 2

And adding them up, 1024+208+2 = 1234.

Coding

We can devise a simple recursion formula to convert decimal into binary. We just need to write the logic with code:

local function decimalToBinary(n) 
    local largest = math.floor(math.log(n, 2)) --find the largest position that can fit into the number
	local result = table.create(largest+1)
    for i = largest, 0, -1 do --starting from the largest position:
        local value = 2^i --the value of the position
        table.insert(result, math.floor(n / value)) --record the number of times that value can fit
        n %= value --find the remainder and repeat
    end
    return table.concat(result)
end

print(decimalToBinary(12345)) --> 11000000111001

In fact, we can get it to convert decimal to ANY base if we provide the digits of that new base:

local function decimalToBase(n, baseDigits) 
	local base = #baseDigits
	local largest = math.floor(math.log(n, base))
	local result = table.create(largest+1)
    for i = largest, 0, -1 do
        local value = base^i
        table.insert(result, math.floor(n / value))
        n %= value
    end
    for k, v in result do
		result[k] = baseDigits[v+1]
	end
	return table.concat(result)
end

local hexList = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',}
print(decimalToBase(12345, hexList)) --> 3039

And here’s the way of converting it back:

local function baseToDecimal(n, baseDigits)
    local base = 0
	local length = #n
	for _ in baseDigits do base += 1 end
	
    local result = 0
    for i = 1, #n do
        local value = base^(i-1)
		local position = length - i + 1
        local digit = baseDigits[string.sub(n, position, position)] --convert the digit to decimal
        result += value*digit
    end
    return result
end

local hexList = {['0']=0, ['1']=1, ['2']=2, ['3']=3, ['4']=4, ['5']=5, ['6']=6, ['7']=7, ['8']=8, ['9']=9, ['A']=10, ['B']=11, ['C']=12, ['D']=13, ['E']=14, ['F']=15}
print(baseToDecimal('3039', hexList)) --> 12345

au_tk · May 19, 2023, 2:03am

The explanation you provided was very thorough and informative, so thank you for that. But I was referring to string conversion, not number (again, my fault for not specifying further AGAIN). If you don’t get what I mean see the image below:

I tried to code my own base93 string encoding/decoding thingy (it didn’t go very well).

Here’s the code if you’re interested, it’s super messy though since I just wrote it as a proof of concept:

--In a modulescript
local digits = {} 
for i = 33, 93+33 do --start at 33 because we will use space for padding, and everything before space is weird so we won't use it either
	local char = string.char(i)
	digits[i - 33], digits[char] = char, i - 33 --make it so if you index the ascii code you get the character, and vice versa (index character -> ascii code), NOTE: the 'ascii' code will actually be 33 less than it actually is for the character since i did "i - 33" to account for the i starting at 33
end
local base93 = {}

base93.encode = function(str: string): string
	local encoded = {}
	for i = 1, #str do
		local result = i ~= 1 and " " or "" -- padding
		local char = string.sub(str, i, i)
		local code = string.byte(char)
		local remainder = code
		while remainder ~= 0 do
			local quotient = math.floor(remainder/93)
			if quotient ~= 0 then
				remainder %= 93
				result ..= digits[quotient]
			else
				remainder %= 93
				result ..= digits[remainder]
				break
			end			
		end
		table.insert(encoded, result)
	end
	return table.concat(encoded)
end

base93.decode = function(str: string): string
	local decoded = {}
	for _, s in pairs(str:split(" ")) do
		local result = ""
		for i = 1, #s do
			local code = string.byte(string.sub(s, i, i)) - 33
			result ..= string.char(code)
		end
		table.insert(decoded, result)
	end
	return table.concat(decoded)
end

return base93
--This entire module is garbage (and doesn't work)

I’m not sure how I would incorporate the decimalToBase and baseToDecimal into the thingy since it can sometimes produce numbers way outside the ASCII character code range, but then again it’s in a different base so if you interpret it as if it’s in base10 of course it’s going to be massive. But if you convert it back to base10 then there’s just no point in converting it in the first place.

Prototrode · May 19, 2023, 3:00am

Converting strings to base64 is when you get the ASCII value of each character (string.byte, returns 0 to 127) and then converting that number to base64 with the base conversion I talked about. Two base64 characters will be used to represent one string character. It’s very straightforward.

nosgnal · May 19, 2023, 3:27am

In binary (base 2), information is represented using 2 symbols: 0 and 1. In base N representation, you use N symbols. A sequence of symbols is a string.

An example symbol set used for base 64:

Image taken from Base 64 Wikipedia

One bit has 2 unique configurations: 0 or 1.
Two bits have 4 unique configurations: 00, 01, 10, or 11.
.
.
.
N bits have 2^N unique configurations, because each bit can take on one of two symbols (0 or 1).

So for base N (N symbols), you need ceil(log_2 N) bits per symbol.
To convert string S in base A to base B:

Convert S to a number (for ASCII see tonumber)
Calculate how many bits you need per symbol in base B
Extract those number of bits
Convert those bits to your symbol in base B
Repeat 3-4 until you have converted all bits

Here’s a visualization of base 64 conversion from the Wikipedia page. With respect to the previous image,
Sextets = index
Character = char
Bits = binary

The algorithm is straightforward and generalizes to other bases. It’s intended to best explain the ideas, so you will have to figure out your own implementation. I hope this helps.

system · June 2, 2023, 3:27am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.