Buffers, save more with less

metatablecatmaid · May 3, 2025, 12:19am

Buffers are a low-level way of representing data using a raw chunk of memory. You interface with them using the buffer library.

Understanding buffers

Buffers represent a raw chunk of memory, so it helps to understand how the methods operate on that chunk.

Think of each buffer as an array of numbers between 0 and 255 (one per byte). Each method in buffer takes a 0-based offset that tells it where to start reading or writing, plus a length (most of the time this is fixed internally by the function) that determines how many bytes to copy.

buffer.readu8(b, 3)

For this, we just get 72, which is the ASCII code for H.
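Here's a minimal sketch of that read. The buffer contents are an assumption on my part (a hypothetical 8-byte buffer where the text "Hello" starts at offset 3), picked so that offset 3 holds an H:

```luau
-- Hypothetical layout: 3 unused bytes, then "Hello" starting at offset 3.
local b = buffer.create(8) -- every byte starts out as 0
buffer.writestring(b, 3, "Hello")

local first = buffer.readu8(b, 3) -- the byte at offset 3
local second = buffer.readu8(b, 4) -- the byte at offset 4

print(first, string.char(first)) -- 72 and "H"
print(second, string.char(second)) -- 101 and "e"
```

Offsets 0 to 2 were never written, so they still read back as 0.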

The write methods work pretty much the same way, except the offset tells them where to write the data rather than where to read it from. Here’s the single-precision (f32) encoding of pi (I don't have enough space to encode it as a double):

buffer.writef32(b, 1, math.pi)

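Buffers do bounds checks to prevent buffer overflows, so a read or write that runs past the end throws an error. Make sure you allocate the correct amount of data before working on them: a buffer's size is fixed once it is created, so "resizing" means allocating a new buffer and copying the old data over, which is very expensive.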
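A small sketch of the write and the bounds check, assuming a 5-byte buffer (the real size of b above isn't shown). The "grow" step at the end is just one way to do it, using buffer.copy:

```luau
local b = buffer.create(5) -- valid offsets are 0..4

-- An f32 takes 4 bytes, so offset 1 uses bytes 1..4 and fits exactly.
buffer.writef32(b, 1, math.pi)
local stored = buffer.readf32(b, 1)
print(stored) -- close to math.pi, but rounded to single precision

-- Offset 2 would need bytes 2..5, which runs past the end, so the call errors.
local ok = pcall(buffer.writef32, b, 2, math.pi)
print(ok) -- false

-- Growing means allocating a bigger buffer and copying the old bytes into it.
local bigger = buffer.create(buffer.len(b) * 2)
buffer.copy(bigger, 0, b)
print(buffer.len(bigger)) -- 10
```

The copy touches every byte of the old buffer, which is why it pays to compute the size up front.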

Figuring out how large data is ahead of time

Because buffers are much lower level than a string, you need to know how much data to use ahead of time. The advantage with this is that the size of numbers is known ahead of time, and strings are trivial to get the size of. It gets a bit harder with tables, but again, if you know the shape of the struct ahead of time, it’s trivial to get the total size of the buffer.

Here’s the code to get a buffer length from HWIWriter.luau

local function allocateWriterBuffer(
	pkgName,
	entryName,
	s1: number, s2: number,
	contentSizes: {number},
	sumOffset: number?
)
	-- computes the size to allocate, then we can just write data to it
	local pkl, enl = string.len(pkgName), string.len(entryName)

	local sum = (sumOffset or 0)
		+ BasicTypes.ULEB128Size(pkl) + pkl
		+ BasicTypes.ULEB128Size(enl) + enl
		+ BasicTypes.ULEB128Size(s1)
		+ BasicTypes.ULEB128Size(s2)
	for _, num in contentSizes do
		sum += BasicTypes.ULEB128Size(num) + num
	end

	return buffer.create(sum)
end

Here we know some of the data sizes ahead of time, but we still need to work out the sizes of the rest of the data.

ULEB128 is a variable-length integer encoding and is out of the scope of this guide. What this code builds is a length-prefixed string, which is explained later in this guide, although here a different type is used for the length.
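If you don't need ULEB128, the same idea works with fixed-width prefixes. This is a simplified sketch, not the HWIWriter code: the record shape (two u32-length-prefixed strings plus a u16) and the name recordSize are made up for illustration.

```luau
-- Hypothetical record: two u32-length-prefixed strings followed by a u16.
local U32_SIZE = 4
local U16_SIZE = 2

local function recordSize(name: string, description: string): number
	return U32_SIZE + #name -- length prefix + bytes of the name
		+ U32_SIZE + #description -- length prefix + bytes of the description
		+ U16_SIZE -- trailing u16 field
end

local size = recordSize("sword", "A sharp blade")
local b = buffer.create(size) -- exactly large enough for the whole record
print(size, buffer.len(b)) -- 28 and 28
```

Fixed-width prefixes waste a few bytes on short strings, but the size math stays trivial.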

Storing numbers

There are five basic number types you can store in a buffer (eight if you count the signed integer variants i8, i16 and i32). They all have different byte widths, and it's up to you to decide how large a number needs to be. Additionally, most methods store an integer, NOT a float. For floats, you need to use f32 and f64.

The following table shows the width and upper limit of each number type (no limit is listed for f32 and f64 because their range is a little fuzzy)

Number Type	Width	Upper Limit
`u8`	`1 byte`	255
`u16`	`2 bytes`	65,535
`u32`	`4 bytes`	4,294,967,295
`f32`	`4 bytes`
`f64`	`8 bytes`
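The upper limits in the table follow from the byte width: an unsigned integer that is n bytes wide tops out at 2^(8n) - 1. A quick sketch that checks this by round-tripping each limit (unsignedLimit is a helper name I made up):

```luau
-- Upper limit of an unsigned integer that is `width` bytes wide.
local function unsignedLimit(width: number): number
	return 2 ^ (8 * width) - 1
end

local b = buffer.create(7) -- 1 + 2 + 4 bytes
buffer.writeu8(b, 0, unsignedLimit(1)) -- offset 0
buffer.writeu16(b, 1, unsignedLimit(2)) -- offsets 1..2
buffer.writeu32(b, 3, unsignedLimit(4)) -- offsets 3..6

print(buffer.readu8(b, 0)) -- 255
print(buffer.readu16(b, 1)) -- 65535
print(buffer.readu32(b, 3)) -- 4294967295
```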

`u` and `i`

Buffers use the Rust naming convention for integers: u refers to an unsigned integer and i refers to a signed integer.

A signed integer allows for negative numbers. The details are outside the scope of this guide, but a quick way to calculate the bounds is to take the unsigned limit, add 1 and halve it. The lower bound is that value negated, and the upper bound is that value minus 1 (to make room for 0).

For a single byte, this gives i8 a range of -128 to 127.
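Here is that rule as a small sketch (signedBounds is a hypothetical helper, not part of the buffer library). It also shows that the same byte reads back differently depending on whether you use the u or the i method:

```luau
-- Bounds of the signed integer with the same width as the given unsigned limit.
local function signedBounds(unsignedLimit: number): (number, number)
	local half = (unsignedLimit + 1) / 2
	return -half, half - 1
end

local low, high = signedBounds(255)
print(low, high) -- -128 and 127

local b = buffer.create(1)
buffer.writeu8(b, 0, 255)
print(buffer.readu8(b, 0)) -- 255
print(buffer.readi8(b, 0)) -- -1: the same byte, interpreted as signed
```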

For most use cases, u32 is more than enough, but if you can guarantee that a number never goes out of its bounds, you can save data by decreasing the byte width.

f32 and f64 are floats. These allow you to store decimals, since the other methods store integers.
They are always signed.
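To get a feel for the trade-off between the two, here's a quick sketch that stores 0.1 both ways. Luau numbers are doubles, so f64 round-trips exactly while f32 rounds:

```luau
local b = buffer.create(12) -- 4 bytes for the f32, 8 for the f64
buffer.writef32(b, 0, 0.1)
buffer.writef64(b, 4, 0.1)

local single = buffer.readf32(b, 0)
local double = buffer.readf64(b, 4)

print(double == 0.1) -- true: nothing is lost in an f64
print(single == 0.1) -- false: the value was rounded to single precision
print(math.abs(single - 0.1) < 1e-7) -- true: still very close
```

If a small rounding error like that is fine for your data, f32 halves the storage cost.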

Strings

Strings are a little more complex because they don't have a defined length. In most low-level languages, a string is actually a char array. Because of this, you need to figure out how to tell a reader when the string ends.

There are two agreed upon ways of doing this

Length-Prefixed (better)

Length prefixed means you put a length just before the string, which tells the reader how many bytes to copy after it as a string.

If we store a String as the struct

u32 size
char[] data

We can first read the size, then copy everything in data to a string.
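A minimal sketch of that struct in Luau. The function names writeLString and readLString are made up for this example; both return the next free offset so calls can be chained:

```luau
-- Writes a u32 length followed by the string bytes; returns the next free offset.
local function writeLString(b: buffer, offset: number, value: string): number
	buffer.writeu32(b, offset, #value)
	buffer.writestring(b, offset + 4, value)
	return offset + 4 + #value
end

-- Reads a u32 length, then that many bytes; returns the string and the next offset.
local function readLString(b: buffer, offset: number): (string, number)
	local size = buffer.readu32(b, offset)
	return buffer.readstring(b, offset + 4, size), offset + 4 + size
end

local text = "Hello World!"
local b = buffer.create(4 + #text) -- 4 bytes of prefix + 12 bytes of data
local written = writeLString(b, 0, text)
local result, nextOffset = readLString(b, 0)
print(result, nextOffset) -- Hello World! and 16
```

Because the reader never scans the data itself, the string can contain any byte, including NUL.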

NUL Termination (C style)

In C and C++, strings are terminated by a NUL byte, \00. This still works, but doesn’t allow you to store NUL bytes inside the string if you absolutely need to. This is a completely valid way to store strings if you can guarantee ahead of time that NUL bytes will never appear in the data.

To read these, you simply read characters until you hit a NUL, then stop (don't include the NUL byte in the string).

Writing them is much easier: copy the data, then allocate one extra byte for a \00.

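local text = "Hello World!"
local b = buffer.create(string.len(text) + 1) -- one extra byte for the NUL
buffer.writestring(b, 0, text)
buffer.writeu8(b, string.len(text), 0) -- terminator goes right after the last character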
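For the reading side, a sketch of the scan-until-NUL loop might look like this (readCString is a name I picked; it also stops at the end of the buffer so a missing terminator can't cause an out-of-bounds read):

```luau
local function readCString(b: buffer, offset: number): string
	local finish = offset
	-- Scan forward until a NUL byte or the end of the buffer.
	while finish < buffer.len(b) and buffer.readu8(b, finish) ~= 0 do
		finish += 1
	end
	-- Copy everything before the NUL; the NUL itself is not included.
	return buffer.readstring(b, offset, finish - offset)
end

local text = "Hello World!"
local b = buffer.create(#text + 1)
buffer.writestring(b, 0, text)
buffer.writeu8(b, #text, 0)

print(readCString(b, 0)) -- Hello World!
print(readCString(b, 6)) -- World!
```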

Bitflags

Bitflags allow you to store multiple boolean values in one byte. The idea is each bit acts as 0 (false) or 1 (true) value.

You store them by shifting bits into their correct bit position using bit32, and read them by extracting bit values and comparing them. Luau provides a function, bit32.btest, which makes this trivial. It returns true if the bitwise AND of its arguments is non-zero, meaning at least one of the bits in the mask you pass is set in the value.
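A short sketch of packing three booleans into one byte. The flag names are hypothetical. Note that bit32.btest answers "is any of these bits set", so checking for several flags at once needs a bit32.band comparison instead:

```luau
-- Hypothetical flags, one bit each.
local FLAG_VISIBLE = bit32.lshift(1, 0) -- 0b001
local FLAG_LOCKED = bit32.lshift(1, 1) -- 0b010
local FLAG_ANCHORED = bit32.lshift(1, 2) -- 0b100

local flags = bit32.bor(FLAG_VISIBLE, FLAG_ANCHORED) -- 0b101 = 5

local b = buffer.create(1)
buffer.writeu8(b, 0, flags)

local stored = buffer.readu8(b, 0)
print(bit32.btest(stored, FLAG_VISIBLE)) -- true
print(bit32.btest(stored, FLAG_LOCKED)) -- false
print(bit32.btest(stored, FLAG_ANCHORED)) -- true

-- To require several flags at once, compare the masked value to the mask.
local both = bit32.bor(FLAG_VISIBLE, FLAG_LOCKED)
print(bit32.band(stored, both) == both) -- false: LOCKED is not set
```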

Enums

Sometimes, you know every value that a string can take. It's much more efficient in this case to store a u8 (or u16 if you’re Enum.Material in the RBXM spec, for some reason) that points back to that string value. In Luau, you can do this with a mixed table that maps each string value to its number and each number back to its string value:

{
  -- string bindings
  None = 0,
  Single = 1,
  Double = 2,

  [0] = "None",
  [1] = "Single",
  [2] = "Double"
}
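Using a table like that with a buffer could look something like this (the table name Quote is just a placeholder for the example):

```luau
-- Hypothetical enum: name -> number and number -> name in one table.
local Quote = {
	None = 0, Single = 1, Double = 2,
	[0] = "None", [1] = "Single", [2] = "Double",
}

local b = buffer.create(1)
buffer.writeu8(b, 0, Quote.Double) -- stores the number 2 instead of a 6-byte string

local name = Quote[buffer.readu8(b, 0)] -- map the number back to its name
print(name) -- Double
```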

Usage with Roblox APIs

DataStores accept buffers directly. When you save a buffer, Roblox compresses it and then stores it in a special JSON format before sending it to their servers. You can get this format yourself by using JSONEncode and JSONDecode, which also accept buffers.

Yarik_superpro · May 3, 2025, 8:12pm

Very good tutorial
However, I do think that part explaining offsets needs to be explained further (in my opinion)

metatablecatmaid · May 4, 2025, 2:16am

Buffers start at an offset of 0. My personal way of thinking about an offset is as the number of bytes away from the start of the memory region.

So 2 means 2 away from the start, which is the third byte in the buffer. If you write a u32 (4 bytes) at offset 3, it fills byte offsets 3-6, starting from the least significant byte of that number and ending with the most significant byte (look up endianness, Luau uses little-endian).
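To make that concrete, here's a small sketch that writes a u32 at offset 3 and then reads the individual bytes back. The value 0x11223344 is arbitrary, chosen so each byte is easy to tell apart:

```luau
local b = buffer.create(8) -- offsets 0..7, all zero to begin with
buffer.writeu32(b, 3, 0x11223344) -- fills offsets 3, 4, 5 and 6

for offset = 3, 6 do
	print(offset, string.format("0x%02X", buffer.readu8(b, offset)))
end
-- offset 3 holds 0x44 (least significant byte)
-- offset 4 holds 0x33
-- offset 5 holds 0x22
-- offset 6 holds 0x11 (most significant byte)
```

Offsets 0 to 2 and offset 7 are untouched and still read as 0.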