is this related to why MemoryStoreHashMaps are currently unavailable?
Someone recently pointed out this new Luau feature to me and I couldn't resist using the buffer type in my Luau msgpack library.
The speed gains are impressive. Previously I did manage to decode msgpack data quicker than JSONDecode could decode an equivalent dataset, which mostly comes down to msgpack being simpler to decode since all objects have a fixed size, but I couldn't provide a faster encode implementation. Now, with the buffer type, I can encode msgpack data on average 4.5x faster than the native JSONEncode function.
What's even more striking is that the encode function has very consistent execution-time statistics, most likely because I can get away with a single dynamic allocation, while JSONEncode probably produces more temporary dynamic allocations.
Here is the project for anyone interested: GitHub - cipharius/msgpack-luau: A pure MessagePack binary serialization format implementation in Luau
Another interesting route to explore with this new datatype is a pure Luau implementation of structs.
By itself that wouldn't be anything too interesting, but once you're dealing with many structs laid out contiguously in a long binary buffer, we can actually start talking about the effects of cache locality even in Luau! For certain applications this would give dramatic performance improvements, perhaps 10x-20x faster.
This idea came up in the context of Entity-Component-System architecture, and I think binary-buffer-backed component storage could bring the ECS performance benefits to Luau instead of just helping with data modeling.
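To make the idea concrete, here is a minimal sketch of what buffer-backed component storage could look like. The component layout, names, and stride are my own assumptions, not part of any existing library:

```lua
-- One "Position" component per entity: x, y, z as f32, packed contiguously.
local STRIDE = 12 -- 3 fields * 4 bytes each
local capacity = 1000
local positions = buffer.create(capacity * STRIDE)

local function setPosition(index, x, y, z)
	local base = (index - 1) * STRIDE
	buffer.writef32(positions, base, x)
	buffer.writef32(positions, base + 4, y)
	buffer.writef32(positions, base + 8, z)
end

local function getPosition(index)
	local base = (index - 1) * STRIDE
	return buffer.readf32(positions, base),
		buffer.readf32(positions, base + 4),
		buffer.readf32(positions, base + 8)
end
```

Iterating over all positions then touches memory sequentially, which is where the cache-locality benefit would come from.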
Are there any plans to use a more efficient buffer storage method in datastores? For example, base93.
Sorry, but I don’t know what you are referring to.
We have no plans to change the current representation.
Hm, does that mean that if I need to store more than 3 MB, I have to use the old "string" method? Or do strings also have caveats that lower the limit that much?
Yes, if you need to store up to 4MB, you have to continue using strings.
I use string.pack extensively and I started experimenting with this recently. Overall it's a great addition. I appreciate that static methods are used instead of a metatable; it's the right solution for low-level functionality in a gradually typed language.
These features would be helpful:
- Range parameters for buffer.writestring. Sometimes I need to write part of a string to the buffer, and looping over each byte is clunky.
- Vector3 read/write (three f32s). The name could be buffer.readvec3, buffer.readvector3, or buffer.readVector3. This would be useful for simulations, and would go nicely with native Vector3 optimizations.
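In the meantime, Vector3 read/write can be approximated in user code with the existing f32 accessors. This is a hedged sketch of what the proposed functions might do; the helper names are my own:

```lua
local function readVector3(b, offset)
	-- Three consecutive little-endian f32s: X, Y, Z
	return Vector3.new(
		buffer.readf32(b, offset),
		buffer.readf32(b, offset + 4),
		buffer.readf32(b, offset + 8)
	)
end

local function writeVector3(b, offset, v)
	buffer.writef32(b, offset, v.X)
	buffer.writef32(b, offset + 4, v.Y)
	buffer.writef32(b, offset + 8, v.Z)
end
```

A native version could presumably skip the three separate bounds checks, which is part of the appeal.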
I'm on the fence about whether i24/u24 should be included. While it doesn't seem "correct", it would still be a practical feature. Use cases for byte arrays tend to be low level, and buffer.readu24(buf, p) seems like it would perform better than bit32.bor(bit32.lshift(buffer.readu16(buf, p + 1), 8), buffer.readu8(buf, p)). It could be a slippery slope that leads to supporting i48/u48 as well.
I don’t think we need bit-level control. The buffer data type already gets us 90% of the way there. In my opinion it’s better to work with byte alignment, and pack bits together into shared bytes when it makes sense.
Buffer attributes would be useful, simply for fewer allocations when reading/writing data. It would be even better to be able to read the memory without needing to make a copy (even if it's read-only). Unless access like that is definitely impossible, it might be worth waiting for.
Does the buffer get compressed if it’s being sent in a dictionary to the client through a RemoteEvent?
local dictionary = {
	["OtherData"] = "QWERTYUIOP",
	["buffer"] = buf,
}
You should just write all the data as a buffer. It'll still get compressed, but you'll lose bytes by not writing the entire thing as a buffer.
Yes, the buffer is compressed even if it's an element of another type.
I really hope that Roblox adds bit-level manipulation for buffers.
I've been trying to write my own Huffman coding algorithm on Roblox, and this is where buffers fall short: the algorithm requires writing into a buffer a sequence of bits for each character, determined by its frequency, and then dumping the result as a string.
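Until bit-level APIs exist, a bit writer can be layered on top of a buffer with bit32. This is a hedged sketch of the kind of helper a Huffman encoder needs today; all names are my own:

```lua
local BitWriter = {}
BitWriter.__index = BitWriter

function BitWriter.new(sizeInBytes)
	return setmetatable({
		buf = buffer.create(sizeInBytes),
		byte = 0, -- current byte offset
		bit = 0,  -- bits already used in the current byte
	}, BitWriter)
end

-- Write the low `count` bits of `value` (count <= 32), least significant first.
function BitWriter:write(value, count)
	for i = 0, count - 1 do
		local bitValue = bit32.extract(value, i, 1)
		local current = buffer.readu8(self.buf, self.byte)
		buffer.writeu8(self.buf, self.byte,
			bit32.replace(current, bitValue, self.bit, 1))
		self.bit += 1
		if self.bit == 8 then
			self.bit = 0
			self.byte += 1
		end
	end
end
```

Writing one bit at a time like this is exactly the overhead a native bit read/write API could avoid.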
I’m starting to really love buffers. After some playing around, I discovered how to create, write, and read buffers. I also discovered how to convert or mirror table data to a buffer (serializing and deserializing). I figured they both served a purpose - tables and buffers. Using a combination of both seems to be an approach I’m taking. Such as items and inventory management.
It seems more practical to use tables/OOP directly in the script for manipulating the object/item. Updating the buffer version of it when any changes occur, but it’s probably better to update the buffer when you need to send it across the network. Going off the note that buffers take less space when sent across the network, that’s an advantage I’m looking forward to using.
I also learned that it’s far better to understand the actual space your game’s data will take instead of basically “freestyling” your game together. You gain better insight on how impactful a specific buffer or data has on memory (in terms of space) and it helps you understand and use the proper signed and unsigned integers. Even if you were unsure about the max size certain data will hold, using a standard reasoning would help you limit it. Something like an item name shouldn’t require too many bytes. At the most about 50 bytes but that seems even a bit too much.
But I appreciate buffers because when I had a large data structure using tables and nested tables inside the main table, I noticed a distinct delay/lag when sending the table across the network. This would only get worse once your inventory started to build up. Now I can create an inventory that can store 256 items, with each item having enough properties to make its byte size sum up to 580+ bytes.
That's 256 items x 580 bytes = 148,480 bytes, which translates to a mere 0.1416 megabytes!
That's a significant size reduction in memory to store that many items with each item having a variety of properties.
Keeping track of the position where you want to edit the memory is crucial to making buffers flow easier. You can create functions that automatically track or update the offset. I'm still in the stage of automating how I interact with buffers so that it's not a tedious process.
local offset = (index - 1) * self.buffer_size
Managing offsets seems to be the most important thing when you want to build around buffers. I like to use an index to compute my offset based on the total buffer size. I keep track of the previous offset, then use a table that stores the buffer size for each property, using the index to increment and add to the offset. This keeps the code clean and easy to manage.
self.offset_count += 1
prev_offset = prev_offset + self.buffer_size_table_i[self.offset_count]
return prev_offset
If you have lots of properties or a complex system, you will be adding a lot of times just to get to the correct position in memory. Make a function to automate that!
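As one possible shape for such a function, here is a small sketch of an offset tracker built around a size table, based on the snippets above. The field sizes and names are illustrative assumptions:

```lua
-- Per-property sizes in bytes, in layout order:
-- e.g. a u16 id, an f32 value, and a 20-byte name field.
local sizes = { 2, 4, 20 }

local function makeOffsetTracker(sizeTable)
	local count = 0
	local offset = 0
	-- Each call returns the offset of the next property and advances.
	return function()
		count += 1
		local current = offset
		offset += sizeTable[count]
		return current
	end
end

local nextOffset = makeOffsetTracker(sizes)
local idOffset = nextOffset()    -- 0
local valueOffset = nextOffset() -- 2
local nameOffset = nextOffset()  -- 6
```

The closure keeps the running offset private, so the calling code never touches the arithmetic directly.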
I tried submitting my response from my phone but ran into issues. Now that I’m home in front of my PC I can post it without a hitch ( hopefully )
Accessing certain parts of a buffer to read/write can be made easier if you take a planned approach. You wish to be able to write part of a string to a buffer; that can be easily resolved by defining where the string is located in the buffer memory using tables.
Key Concepts:
- Predefine Data Sizes: Set fixed sizes for each data type.
- Calculate Offsets: Determine memory offsets based on these sizes.
- Centralize Buffer Operations: Use a dedicated module to manage these operations.
- Abstract Access: Create functions for easy data access.
Simplified Framework:
Define Data Sizes: Start by defining the sizes for each type of data in your buffer. For example, if you have an item with a name and a description, you might define their sizes like this:
--Array for ipairs and other usage
local sizes = {
["item"] = {
{name = "name", size = 20}, -- 20 bytes for name
{name = "description", size = 200} -- 200 bytes for description
}
}
--Dictionary based for lookups
local buffer_size_tbl = {
["item"] = {
name = 20, -- 20 bytes for name
description = 200, -- 200 bytes for description
}
}
Calculate Offsets: Calculate the offsets for each piece of data based on their sizes. This allows you to know exactly where each piece of data is located in the buffer and keep track.
local offsets = {}
function calculateOffsets(structure)
	local current_offset = 0
	for _, segment in ipairs(structure) do
		offsets[segment.name] = current_offset
		current_offset = current_offset + segment.size
	end
end
calculateOffsets(sizes["item"])
Create Read/Write Functions: Develop functions in the buffer module that use the calculated offsets to read and write data. These functions should take the buffer, the name of the data, and the type of operation as arguments.
function readstring(buff, name, offset_type)
return buffer.readstring(buff, offsets[name], buffer_size_tbl[offset_type][name])
end
function writestring(buff, name, data, offset_type)
--we clear this portion of the buffer first, because if we write smaller data over larger old data (e.g. "Toy" over "SomeLongName") we would get "ToyeLongName"
buffer.fill(buff, offsets[name], 0, buffer_size_tbl[offset_type][name]) -- Clear the buffer space first
buffer.writestring(buff, offsets[name], data)
end
Use the Functions: With these functions in place, reading and writing to the buffer becomes straightforward. Simply call the appropriate function with the name of the data you want to access.
local buf = some_buffer_object -- named buf so it doesn't shadow the buffer library
writestring(buf, "description", "This new item description", "item")
print(readstring(buf, "description", "item")) --would print the new description
This example only shows a basic overview of a deeper system to give you an idea. I use arrays and dictionaries for different purposes. But I hope this leads you in the right direction to expand upon.
Is it better to use buffer.tostring and send it over the network or just send the buffer itself?
It's best to send the buffer directly.
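For illustration, a sketch of the two options side by side; `remoteEvent` here is a placeholder for an actual RemoteEvent instance:

```lua
local buf = buffer.create(4)
buffer.writeu32(buf, 0, 123456)

remoteEvent:FireServer(buf)                  -- preferred: sent as a buffer
remoteEvent:FireServer(buffer.tostring(buf)) -- works, but the receiver gets a string
```

Sending the buffer itself also means the receiving side can read it with the buffer API directly, without converting back.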
Please add buffer.writeu4(), buffer.readu4(), buffer.writei4(), buffer.readi4() for nibbles
Actually, now that I think about it, why can we not have the option to read and write any bit width we want? Take rtsk's bitbuffer module, for example: you can do BitBuffer:WriteUInt(bits, value) and it will write that value within those bits at wherever the cursor is.
There might be a bit read/write function in the future for convenience, proposal here rfcs/docs/function-buffer-bits.md at 5194dcc12604237acf15d1647e8650d90808c3f7 · luau-lang/rfcs · GitHub
We don’t have a priority for it right now, but it’s on our radar.
One thing to note is that developers who use bit packing are often aiming for maximum performance as well, and manual bit packing using the bit32 library followed by a full 8/16/32-bit write will likely be faster than even a native buffer.writebits function, because writebits has to do bounds checking each time.
It will still be more convenient though.
wowza! Lots of optimizations incoming!
It's currently 7 AM and I haven't slept for a while now.
Can someone simplify this thing? My brain isn't working properly right now.