Bitpacking And Why Strings Suck

CoderHusk · December 19, 2021, 9:42pm

Capture2

A lot of you out there may have tried to make an art game where you’d basically bypass the moderation system by using GUIs and the UIGradient trick. However, you stopped as you couldn’t figure out a way to store this data. You would need to convert this image to a string so that you could store it on a datastore. Even if you had an array literal representing each color value and used a 1d array to minimize space complexity that literal for a 100x100 image would span well over the 200kb limit. Even if you are to partition the string into multiple chunks you will run into another issue, assuming you just stored the number and not the color3 you will figure out the horrible truth that Color3s do not have cache locality. It means that if you are spamming the creation of them, your GC will not be able to clear it immediately like a number or a vector3. So however shall we solve this?

It should be also important to note that strings are “interned” in Lua. Basically, it means that if you create a string it will search through the heap of the VM for a string with the same buffer (binary data). For example…

local a = "example"
local b = "ex".."ample"

a and b refer to the same buffer in memory, which is good as it helps reduce memory but the problem is that operation to check if it belongs somewhere in the heap and more specifically throwing it away and saying it refers to this buffer is expensive. If you are writing a tool for the naive implementation of image storage where you are writing out an array literal of the data you will not want to write them as color3s as some color3s may be the same. So one solution could be to preload a “palette” of color3s. However, you can’t just do something like saving their hash key (dictionary) to be like

local address = string.format("%0.2f", pixelVector.X).."a"..string.format("%0.2f", pixelVector.Y).."a"..string.format("%0.2f", pixelVector.Z)

as you run back to the original problem, since strings are interned and you create a string about half a million times a second (assuming you wanna spam load it like if you were writing a ray tracer or something). The solution is to not use strings. But how are we supposed to encode these numbers from 0 to 1 to an unsigned integer? RGB 24bit encoding literally states that each channel is 8 bits, or in other words has a range of 0-255. We could simply map that value from 0 to 1 to 0 to 255 by multiplying the double by 255 and rounding it. Moreover, with how the encoding schema works are, it follows this specification.
Capture
What bit packing is, is although sure, a Lua number does take 64bits, if the number is from 0 to 255 with no decimals, because of how the bit32 library works it will encode it as an integer. The bit32 library is how at least in Roblox you use bitwise operators. If you have Roblox-ts doing bitwise operands you’d see in other languages like python will convert it to the bit32 functions.

The bitwise operators in-general are

& : bitwise AND
| : bitwise OR
~ : bitwise XOR
>> : right shift
<< : left shift
~ : unary bitwise NOT

let’s just start off with what the bitwise or does. As you know with if statements an or works by returning true if either one of the given booleans are true. Consider 2 buffers
Capture

Think of a bit as a boolean, 1 means true, and 0 means false. You can think of a buffer as an array of booleans. It goes through each corresponding bit and does the bitwise operation as if it was an if statement and the outcome of that if statement is the value. Remember, 1 = true, 0 = false. You can extend this to bitwise XOR and bitwise AND. the bitwise not will simply flip all of the buffers bits, so 1001 becomes 0110. You could take advantage of that to figure out the math.huge value if you will for integers, it is just shorthand for ~0. Bit shifting refers to well, shifting your bits. If I shifted the buffer a with binary contents

0000 0000 0000 0000 0000 0000 0000 1011

by a displacement of 2 to the left it would transform the buffer to be

0000 0000 0000 0000 0000 0000 0010 1100

if I were to overflow the bits they would be “discarded” like if I had

1010 0000 0000 0000 0000 0000 0000 0000

and shifted it once to the left it would be

0100 0000 0000 0000 0000 0000 0000 0000

the same goes for if I were to underflow it by shifting them to the right. Now, how does this all come together? Well we have 3 numbers from 0 to 255 which in their binary form might be

r = 1010 0101
g = 0000 0110
b=  0110 0000

and we want to pack them into this format
Capture
we first need to make space for g and b and then shift r to its corresponding location. same for the rest.

local rDisplacement = r << 16
local gDisplacement = g << 8
local bDisplacement = b << 0 --simplfies to just b

we then want to combine them using the or bitwise operator. writing over the 0s essentially, simplifying down to

local bitPackedData = (r<<16) | (g<<8) | b

In this case, the compressed buffer is 10815072 in base 10. So that would be the address of that given color3. Now to sample or in other words extract the bit data from this packed data is relatively simple. You have to apply what is called a “bitmask” to it. Say I have

a=1011 0110

if i want to get 1011 i’d have to create a bitmask that spans that size, 1111 or 0x0F in hex. Then I would bitwise and it with the buffer.

a = a & 0x0F

However, because of how numbers work in the Arabic system the little-endian, or in other words the first number has to be at the end of the buffer. So we would have to find its displacement and shift it accordingly. Since it is 4 bits away we would shift it to the right by 4 and then apply the bitmask

a = (a>>4) & 0x0F

now our format specifies a span of 8 bits, so the bitmask would be 0xFF and we would have to shift it 8 * itemDisplacement times. Giving us

local extractedR = (bitPackedData >> 16) & 0xFF
local extractedG = (bitPackedData >> 8) & 0xFF
local extractedB = bitPackedData & 0xFF --simplification of  (bitPackedData >> 0) & 0xFF

Now, this is great as itself, however, we could squeeze even more out by treating the image as a collection of heightmaps, one for r, another for g, another for b. and since heightmaps are really just collections of numbers it makes it super easy to bit pack it. And once it is bit packed, just encode the array literal using base64 and upload each part to Roblox datastore.

Also, if you dont use typescript the corresponding functions can be found here.

doytnr · January 17, 2022, 6:57pm

Can it be optimized to bitmask strings? I suppose it would work if the strings are converted to binary.

CoderHusk · January 17, 2022, 7:01pm

So what you really want to use this for is to get around dictionaries when you are calling it several hundreds of thousands of times. suppose it was in the format of

this would be n00b as you have to check if the string exists everytime you index it. So you could get those floats and convert them to a range of integers from 0 to 255, manipulate and bitpack them knowing that they are unsigned integers.

doytnr · January 17, 2022, 7:07pm

Not saving colour3s as strings, but saving strings in bitpacked form.

such as 'Hello World' into a sequence of 8 bits.

And would storing numbers that go beyond 255 (such as parts of user’s data, like currency) in bitpacked form?

CoderHusk · January 17, 2022, 7:23pm

if you just want to shorten a string its better to just use base 64 at that point and dont bitpack at all. You can store numbers that go beyond 255/8bits but you’ll just need to accomadate your storage to the 32 bit limit

doytnr · January 17, 2022, 7:36pm

Thanks! I will sure have to look into base64

Also now I get what the Bitpacking actually does, it converts 3 bytes of data between 0 and 255 into a 32bit number, between 0 and 2^32.

AkaiHellerium · January 17, 2022, 8:09pm

Just for fun i tried this using robloxs bit32 operation, when I try to put the displaced numbers together into “bitPackedData” and print it, it returns 496365280 instead of 10815072. Any idea why? im assuming it has something to do with the bitwise or.

CoderHusk · January 17, 2022, 8:20pm

I’d have no clue unless I knew the source code, i should note that if you are doing what i showed in like actual lua source code for the numbers it wont work as they are encoded as base10 literals. their base 10 forms are

local r = 165
local g = 6
local b = 96

AkaiHellerium · January 17, 2022, 8:35pm

From what it seems its not but, Is it possible to bitpack text strings/ text?

CoderHusk · January 17, 2022, 9:14pm

I mean I guess you could if you treated them as ascii and used their decimal placement but at that point its better to just encode it into base64 as its a stream of data

doytnr · January 18, 2022, 5:30pm

Since 32 Bits can fit 4 bytes, it is possible to store 4 numbers from 0 to 255, puting the total number storage to 1020.

Incase anyone was wondering, this will make it possible to store decimals:
If we had, for example, byte A as 19 and byte B as 5:

0001 0010 0000 0101

If the purpose was B*(A/10), then that would mean that the output is 9.5, making it possible to (sort of) store decimals.

doytnr · January 18, 2022, 8:17pm

I also made a bit-pack function in my SimpleBit module.

(My module takes in four parameters, but if you are saving RGB, leave the forth parameter at 0)