Stravant's BitBuffer Module - Compact Storage Of Data [Tutorial]

Note: This is long, there are TL;DR notes put in here if you hate reading.

The BitBuffer module (Original post can be found here) is a module that was created by Stravant that allows for the reading and manipulation of binary based data storage, which is an essential module if you have a lot of data you need to store compactly.


The Problem With JSON and Other String Based Formats
For those who don’t know, strings are formed by sets of 8-bit ascii, each set of 8 bits (or 1 byte). In DataStores, you can have 64 Kilobytes max per key, so you can have 65,792 characters (1 Kilobytes = 1,024 Bytes) in a key. If you are saving a little bit of data, then JSON will work 100% fine, but if you are saving a whole farm, a series of ships, a tycoon, or hundreds of armor, then you will run out space without needing another key. The way you can save bits is by storing in numbers with a custom set of bits, but if you store “100” in a string, you will be saving as 3 characters, which is 24 bits, while saving as the minimum amount of bits is 7 bits. Basically, you can save storage space by storing numbers as bits rather than Ascii characters.

TL;DR: With JSON, numbers are stored as string, not numbers. This causes a lot of wasted data.


What Is The BitBuffer Module
The BitBuffer module is a simplified library that you can tell it to write X bits or read X bits and assemble it into whatever data you specify, such as a string, a positive integer, etc. For example, if I wanted to write a 9 bit number, I would have to tell it to read 9 bits to get the same number back. You save space already for not having any identifiers for when values separate, but you know how many bits to look for. After writing, you can get a string back in the Base64 format, which can be saved in a DataStore, and put back into the BitBuffer module and read from.

TL;DR: Library to write bits, read bits, and give strings to save and load as, all without you having to make a library to encode numbers and such.


Functions
BitBufferModule.Create()
To prevent the need of managing all of the read and writes that would lead to multiple scripts reading and writing files, which would cause them to be messed up, you have to create a buffer which will contain the needed functions and keep all of the data to only said buffer. It returns the following functions:

Buffer:WriteUnsigned(NumberOfBits,Number)
This writes a series of 1s and 0s for the length of the NumberOfBits, and store the given number in those number of bits. The number that is accepted has to be an integer and has to be positive. The numbers that can be stored with the given number of bits it 0 to 2^NumberOfBits - 1. To get the number back, you will need to use Buffer:ReadUnsigned(NumberOfBits).

Buffer:WriteSigned(NumberOfBits,Number)
This is like the unsigned numbers, but in this case, it allows for negative integers with the first bit being set for the positive or negative sign (so 1 bit is used for sign, and the rest for the number as if it has no sign). With this, the number you can store is -((NumberOfBits - 1)^2) to (NumberOfBits - 1)^2 - 1. Not 100% pretty but the reason that the negative number doesn’t have to subtract 1 at the end is because there is a built in system to prevent a -0 case (ex: if the signed bit was only written and the number was changed, you would have 0 and -0). Like unsigned bits, the reverse of it or reading it is Buffer:ReadSigned(NumberOfBits).

Buffer:WriteFloat(NumberOfMantissaBits,NumberOfExponentialBits,Number)
Floats are pretty much just any type of number, including decimals. This function allows you to store floats with a custom amount of bits for the decimal and exponent numbers. Floats are best described as like scientific notation (ex: 6.022 x 10^23). In the case, the first part would be stored in the Mantissa bits, and the number of decimal places to move would be stored in the amount of Exponential Bits stored. This should only be used over the Float32 or Float64 functions if you want a more precise number (but more bits stored) or a more compact number for less bits. To reverse this, you would use Buffer:ReadFloat(NumberOfMantissaBits,NumberOfExponentialBits)

Buffer:WriteFloat32(Number) and Buffer:WriteFloat64(Number)
These are for the more commonly used floats, 32 bit (floats) and 64 bit (doubles). If you don’t want to put in and remember the NumberOfMantissaBits and NumberOfExponentialBits values, you can just store as the built in 32 bit and 64 bit numbers. You don’t need to input a number of bits to read or write because it is automatically 32 or 64 bits with each respective function. To reverse them, you use Buffer:ReadFloat32() and Buffer:ReadFloat64(). If you need to save space and use lower numbers, use 32 bit floats. If you need more precise numbers such as those in Lua scripts, use 64 bit floats.

Buffer:WriteBool(Boolean)
Very simple function, it just stores a boolean as 1 bit, which is either a true or a false. The reverse of this is simply Buffer:ReadBool().

Buffer:WriteString(String)
This function is simple to use function that writes strings as 7-bit characters, then writes a 7-bit character as a null character at the end to have the buffer to stop reading the string. So, it automatically reads 7 bits per character, and 7 more as a null character, as long as your characters fit in 7 characters. There is a future-proof set of logic to store as 8 bit if 8 bits are needed. To reverse this, just use Buffer:ReadString().

Buffer:WriteRotation(CFrame)
This function is used to store the rotation matrix of a CFrame, compactly, stores the rotation matrix in 64 bits, rather than storing 3 32-bit numbers for the axis angles. To reverse this to get the rotation matrix back, use Buffer:ReadRotation(). This does not store the X, Y, and Z parts of a CFrame, so those have to be stored differently.

Buffer:WriteBrickColor(BrickColor)
This function stores a brick color as a 6 bit unsigned integer which contains the pallete id for the given brick color. Do not use this function for future-proof code, the pallete changes over time in size and in what each color is, results will vary over time. To undo this function, use Buffer:ReadBrickColor().

Buffer:ToString()
This function returns a string that can be printed out by what is currently stored in the buffer, and can be used to save the data, and loaded into a new buffer using Buffer:FromString(String), but don’t use this in DataStores because they don’t store and return the returned strings since the strings can’t be printed out properly.

Buffer:ToBase64()
This function is like the ToString() method above, but returns a string as Base64 (typable characters via a keyboard), which can be printed out in full and saved in a DataStore. To load from this string, you would use Buffer:FromBase64(String).

Buffer:ResetPtr()
This function is only used for reading. It resets the pointer to the start of the “file” to read from.

Buffer:Reset()
This function resets all of the stored data in the buffer to an empty buffer to write or read from again.
TL;DR: You can read and write data given the functions above. Unsigned = Positive Integer, Signed = Integer, Float = Decimal, Float32 = 32 bit float, Float64 = 64 bit float, Rotation = CFrame rotation matrix


Using The BitBuffer Module
The way you use it for saving is you write your data then get the Base64 string, and to read, you load from the Base64, then read everything in the exact same order with the correct number of bits if needed, which is probably the hardest thing to figure out in cases. An example code can be found in the original post linked above, but here is a small example:

local Wins,Money,Title,WasAlive
local SavedString
local BitBuffer = require(game.Workspace.BitBuffer) --Requires bit buffer
function Save(Wins,Money,Title,WasAlive)
	local Buffer = BitBuffer.Create() --Creates a buffer to use
	Buffer:WriteUnsigned(32,Wins) --Writes a 32 bit integer that has no sign, so max is over 4 billion
	Buffer:WriteSigned(32,Money) --Writes a 32 bit signed integer, 1 bit for sign (can be in debt), 31 bits for number, so over 2 billion
	Buffer:WriteString(Title) --Writes a string as given, nothing else to do
	Buffer:WriteBool(WasAlive) --Writes a 1 bit for the bool
	SavedString = Buffer:ToBase64() --Saves the string as a Base64
end

function Load()
	local Buffer = BitBuffer.Create() --Creates a buffer to use
	Wins = Buffer:ReadUnsigned(32) --Reads an integer with no sign that was written as 32 bits
	Money = Buffer:ReadSigned(32) --Reads an integer with a sign that was written as 32 bits
	Title = Buffer:ReadString() --Reads 7 bit characters until a null character is read, no bit width to give
	WasAlive = Buffer:ReadBool() --Reads 1 bit to represent as a true or false
end

Save(1000,-20,"> Title Goes Here <",true)
Load()

TL;DR: Read binary data in the same order as you write them


Minimum Bits To Store A Number
This section is more for those good at math. If you want to find the max unsigned number you can store a number, you can use this equation:
MaxNumber = 2^NumberOfBits - 1

Now, to invert this to get the number of bits, you need to get the NumberOfBits alone. To start, add 1 to both sides.
Number + 1 = 2^NumberOfBits

Ok, so this is where it gets complicated. It is a late Algebra topic known as logarithms. This tutorial isn’t about them, so I suggest finding a separate tutorial, but this is how it will end.
log2(Number + 1) = NumberOfBits

Unfortunately, in Lua 5.1, we don’t have a “log of base” equation, so the final function turns into this in lua, and you also need to round up.
NumberOfBits = math.ceil(math.log10(Number + 1)/math.log10(2))

TL;DR: NumberOfBits = math.ceil(math.log10(Number + 1)/math.log10(2))



How Does This Save Space For Numbers?
This goes back to the topic of store 1s and 0s for a number vs a set of 8 for each base-10 character. Here is a picture to help clarify it:
In this case, it is comparing the bit usage for an ascii string containing the number 218 and storing as a binary number.





Conclusion
This module can be extremely useful for storing a lot of numbers, but if you are just storing strings, then you will need to optimize how data is stored. Also, if you want to store in a harder to read format, you can just write a file and store it, but don’t store anything that one shouldn’t be able to de-code (look into key encryption for that). This module becomes easier as you practice.

TL;DR: Use it if you are storing a lot of numbers. Strings aren’t that much smaller without optimizing (below).


Optimizing And Tips
Over a lot of time of developing and testing with it, I have figured out a few ways you can optimize encoding, and other safety measures for future-proofing code.

Future-proofing
If you create a game fully centered on this, there is a good chance that updates will come and the format will need to change, so my suggestion is to store a version number at the beginning of all formats. I suggest 6-8 bits in case you need to change it a lot. Roblox Battle Remastered has had to change the format 3 times with live versions existing, so the version is read first and then loaded via the respected decoder.

TL;DR: Store a version at the beginning.

Optimizing Bit Widths
For safety, it is common to just store every integer as 32 bits. However, if all your numbers are small, then there is a lot of wasted space. To fix this, you can calculate what is the minimum amount of bits to store the given numbers, store that number at the beginning (6 bits is enough, it allows to store 63 as the max number), and then read/write the numbers with that bit width.

TL;DR: Store a number at the beginning to read/write the numbers with the bit width. Number should be smallest amount of bits to store given numbers.

Optimizing string encoding
Bits are stored in 7 bits per character to allow for the first 128 characters, but some cases are a lot more predictable. In the case of Roblox Battle Remastered, indexes can store “[“, “]”, “,”, “0”, “1”, “2”, “3”, and “4”. For this, I store a table for each used character, and write an unsigned number for each key that has a matching value. However, this can only be used with a more limited form of characters, since you need 26 characters for lowercase characters, 26 characters for uppercase characters, 10 for numbers, and 36 or so for each other typable character on the standard qwerty keyboard. Personally, I store a number at the beginning for how many characters to read, but can take a good amount of bits to store if it is long enough.

TL;DR: Store a table for each character you store, save the key. Must be stored in 6 bits or less to be more compact than strings.

Ok, hope this is good and useful. This took 4 hours to type and a 6 page google docs document. :stuck_out_tongue:
Here is a link to an image that you can publicly share that is the entire post, in case you want: http://imgur.com/EfQCiNJ

52 Likes

This is extremely compact, but it is also extremely slow.

3 Likes

For my use cases, speed hasn’t exactly been an issue.
This is pretty much a common point with all forms of compressing data - Trade disk space for computing time.

5 Likes

I started working on an efficient serializer a while back, but I stopped working on it because data store doesn’t accept certain bytes/certain orders of bytes, and I ended up doing something completely different.
Here’s the code I had though:

local string_char = string.char

local function foo()
	
	local strings = {}
	local stringCount = 0
	
	local bytes = {}
	local byteCount = 0
	
	local bits = 0
	local bitLevel = 256
	
	local function writeBit(bit)	
		bitLevel = bitLevel * 0.5
		if bit then
			bits = bits + bitLevel
		end
		if bitLevel <= 1 then
			bitLevel = 256
			byteCount = byteCount + 1
			bytes[byteCount] = bits
			bits = 0
		end
	end
	
	local function resetBytes()
		if byteCount == 0 then
			return
		end
		stringCount = stringCount + 1
		strings[stringCount] = string_char(unpack(bytes))
		bytes = {}
		byteCount = 0
	end
	local function writeByte(byte)
		byteCount = byteCount + 1
		bytes[byteCount] = byte
	end
	
	local function writeString(str)
		resetBytes()
		stringCount = stringCount + 1
		strings[stringCount] = str
	end
	
	-- do stuff
	
	resetBytes()
	return table.concat(strings)
	
end
2 Likes

What you probably want to do is encode it to Base64, since all of characters in Base64 are store-able in DataStores. That is how the BitBuffer works.

I think something like characters 1-127 can be used

Sounds about right. Stravant told me about 8 months ago that 7-bit ascii (so yes, 1-127, 0 being a null character) characters should work.

This is great. I’ve been using BitBuffer for quite a while, and I have some recommendations/notes for anyone trying it for the first time.

  • Loading to, and reading from BitBuffer is expensive. Use BitBuffers with caution.
  • You can trade some security for speed (significant boosts in optimization)
  • Specifically, optimizing the Base64 read/write is a significant boon for this optimization

All in all, BitBuffer’s are a good way to go about saving large amounts of data, in my opinion. If you want to learn more, studying ROBLOX’s binary file encoding is a great way to go about doing it.

This is what kills this for me. Large files hang servers, and files have a tendency to get corrupted if the server runs out of memory mid-save.

I just wanna save a 500 KB build someone made as a .rbxm like DataPersistence, but without DP’s ludicrously low cap (60 KB LOL) and with DataStore’s reliability over DP.

Here’s the rub, a BitBuffer is just an easier way to write to a binary file. In the end, a binary file is what is getting written to DP or DataStore in terms of pure computer architecture.

We’ve just abstracted it to the point of ridiculousness.

4 Likes