Introducing Luau buffer type [Beta]

DataStore buffer support? As in, we can write binary data to a DataStore now without issues?

1 Like

Are there plans for MemoryStoreService to support buffers for its data structures?

No, the new feature is that you can have ‘buffer’ objects in the value field of SetAsync (directly or inside a table).
And you will get ‘buffer’ objects back from GetAsync etc.

You still can’t have non-UTF-8 strings as values.

Yes, it is also supported in MemoryStoreService (MemoryStoreHashMap).
I forgot to put it in the list.

I think someone worked out the answer in the Open Source Discord server but I want to ask just for everyone else’s sake: what does the expansion rate for buffers look like in datastores then? I assume they’re encoded in some capacity, so what does that look like?

Sorry if I’m asking a question that will be answered by the documentation. I’m just sure I’m not the only one who’s excited for this, and I’d rather dash everyone’s hopes and dreams with reality immediately.

Right now, the expansion rate approaches 4/3, so for every 3 buffer bytes, 4 bytes of the DataStore value are used.
There are also a few bytes of additional overhead.

To put it simply, buffer values should be kept slightly below 3MB.

Buffer data is still being compressed, so the absolute maximum buffer size that can be stored is 50MB, but only if it can be compressed below 3MB by the engine.
Because compression ratio depends on the data being stored, we recommend keeping the uncompressed buffer size below 3MB to avoid unexpected failures.
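As an illustration, the ~4/3 expansion can be estimated directly from the buffer length (this helper is illustrative, not an official API, and it ignores the small fixed overhead and any compression the engine may apply):

```lua
-- Rough estimate of how many bytes of the DataStore value a buffer will
-- consume after the ~4/3 expansion described above.
local function estimateStoredSize(b: buffer): number
	return math.ceil(buffer.len(b) / 3) * 4
end

print(estimateStoredSize(buffer.create(3_000_000))) -- 4000000
```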

And as the DataStore documentation mentions, you can use the JSONEncode function to check how large the stored value actually is.

1 Like

Will we ever be able to write instances to a buffer? Or have each instance get a unique integer id to send with buffers?

1 Like

No, it will not be possible to write an Instance into a buffer directly.

There are also no plans to have unique ids in-game at this time. If we do add them, they will most likely be represented as 36-byte strings, which is unlikely to be useful for efficient networking.

4 Likes

What if you want to store booleans?
Since the smallest datatype we can write is 8 bits, and a boolean can be represented in 1 bit, wouldn’t that mean that if we wanted a buffer to hold booleans, we’d have to use an 8-bit integer?

You can store 8 booleans together in a single byte by combining them using bit32 library functions.
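For illustration, a minimal sketch of packing eight booleans into one byte and reading one back (variable names are made up):

```lua
local flags = {true, false, true, true, false, false, true, false}

-- Pack the eight flags into one byte.
local byte = 0
for i, flag in flags do
	if flag then
		byte = bit32.bor(byte, bit32.lshift(1, i - 1))
	end
end

local b = buffer.create(1)
buffer.writeu8(b, 0, byte)

-- Test whether the third flag is set.
print(bit32.btest(buffer.readu8(b, 0), bit32.lshift(1, 2))) -- true
```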

4 Likes

Could we use buffers for HttpService:RequestAsync’s body?

Can there be an option to receive HttpService:RequestAsync’s response body as a buffer?

Does MessagingService encode buffers to base 64 internally, or is it kept as a sequence of bytes? Internal JSON conversion with a limit of 1KB seems perilous.

Would it be possible to have a special frozen buffer that shares memory across threads/Actors, while keeping the improved access speed? The idea is that you have big chunks of static data that lots of actors need to be able to read efficiently. Use cases include:

  1. Multithreaded animation systems with lots of animation data.
  2. Behavior trees for NPC AI.
  3. Simple machine learning demos.

For GetAttribute/SetAttribute, I’d like to see either:

  1. Support for buffers as a type.
  2. Option for GetAttribute to result in a buffer instead of a string. SetAttribute(name, buffer) can cast to a string automatically.

String attributes are already practically buffers internally. It may be best to keep them as strings, unless there’s some way to make the instance’s attribute actually share memory with the buffer (which would be awesome, but I’m not sure how it would work with respect to ChangeHistoryService and AttributeChanged.)

Maybe GetAttribute returns a frozen buffer that actually shares memory with the attribute (unless the attribute changes, in which case new data would be allocated for the instance.) Imagine someone’s in Studio working with >10MB chunks of data in attributes; you wouldn’t want to copy this around more than needed.

Side note: It’s disappointing that instance:GetAttributes() returns a dictionary instead of an array of keys. This is quite slow for instances with lots of data.


These are just ideas. What’s important is performance, simplicity, and the ability to use it with existing Roblox APIs.

3 Likes

That’s not possible right now, but the buffer library has functions to convert to a string and back.
While that causes an extra allocation and copy, those operations should be pretty fast even at large sizes.
This API request can be posted on the feature request forum.
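A sketch of that conversion round-trip for an HTTP request (the URL is a placeholder):

```lua
local HttpService = game:GetService("HttpService")

-- Build some binary payload in a buffer.
local payload = buffer.create(4)
buffer.writeu32(payload, 0, 0xDEADBEEF)

-- Convert to a string to use as the request body (one extra copy).
local response = HttpService:RequestAsync({
	Url = "https://example.com/upload", -- placeholder endpoint
	Method = "POST",
	Body = buffer.tostring(payload),
})

-- Wrap the response body back into a buffer for byte-level reads.
local body = buffer.fromstring(response.Body)
print(buffer.len(body))
```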

Yes, base64 is used today, and sometimes the buffer is compressed beforehand (if it’s compressible). That does limit the reliable buffer size to around 700 bytes.

Unfortunately no, that would be incompatible with how Luau VM data is organized.

A buffer would also be unable to point at the Instance’s attribute data without a copy, which raises the question of how useful buffer attributes would be if they can’t cover the use-cases that need to avoid copies.

1 Like

I’d really like to see Roblox start using a proprietary binary JSON format internally. It could be as simple as checking whether the data starts with an illegal JSON byte (there are >200 to choose from); if it does, try decoding using a simple-but-cleverly-optimized binary format.

It would save a fraction of file storage costs, but more importantly take load off of busy servers that would otherwise waste time checking bytes one at a time to escape strings, instead of just storing the length before the data so it can be skipped over.

If I were designing it, a value/object would be preceded by a byte, and byte ranges could be reserved for short string/buffer/array/dictionary lengths, with special cases for compact values:

0: false
1: true
2: null
3: f64
-- There are loads of ways to optimize number cases. It would need to be tuned using real data.


128-156: string (lengths 0 to 28)
157: string (u8) (lengths 29 to 284)
158: string (u16) (lengths 285 to 65820)
159: string (u32) (lengths 65821 to 2^32-1)

160-188: array (lengths 0 to 28)
189: array (u8) (lengths 29 to 284)
190: array (u16) (lengths 285 to 65820)
191: array (u32) (lengths 65821 to 2^32-1)

192-220: dictionary (lengths 0 to 28)
221: dictionary (u8) (lengths 29 to 284)
222: dictionary (u16) (lengths 285 to 65820)
223: dictionary (u32) (lengths 65821 to 2^32-1)

224-252: buffer (lengths 0 to 28)
253: buffer (u8) (lengths 29 to 284)
254: buffer (u16) (lengths 285 to 65820)
255: buffer (u32) (lengths 65821 to 2^32-1)

Some binary JSON alternatives do string deduplication, which can make sense to do for dictionary keys.
If enforcing valid unicode, there are various ways codepoints can be compressed. Otherwise developers could be allowed to send or store arbitrary binary strings.

HttpService:JSONEncode/Decode could still use the "m":null trick to produce a human-readable version. There could even be support for Vector3.
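As a sketch, the short-string case of this hypothetical scheme could be emitted like so (nothing here is an existing Roblox format; the function name is made up):

```lua
-- Emit one short string under the hypothetical tag-byte scheme:
-- tags 128-156 carry the length directly (0 to 28).
local function encodeShortString(out: buffer, pos: number, s: string): number
	local len = #s
	assert(len <= 28, "use the u8/u16/u32 forms for longer strings")
	buffer.writeu8(out, pos, 128 + len)
	buffer.writestring(out, pos + 1, s)
	return pos + 1 + len -- next free position
end

local b = buffer.create(64)
local nextPos = encodeShortString(b, 0, "hello") -- tag byte 133, then "hello"
print(nextPos) -- 6
```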

1 Like

Is this related to why MemoryStoreHashMaps are currently unavailable?

Someone recently pointed out to me this new Luau feature and I couldn’t resist but to use the buffer type in my Luau msgpack library.

The speed gains are impressive. Previously I did manage to decode msgpack data quicker than JSONDecode could decode an equivalent dataset (which mostly comes down to msgpack being simpler to decode, since all objects have a fixed size), but I couldn’t provide a faster encode implementation. Now, with the buffer type, I can encode msgpack data on average 4.5x faster than the native JSONEncode function.

What’s even more shocking is that the encode function has very consistent execution times, most likely because I can get away with a single dynamic allocation, while JSONEncode probably produces more temporary dynamic allocations.

Here is the project for anyone interested: GitHub - cipharius/msgpack-luau: A pure MessagePack binary serialization format implementation in Luau


Another interesting route to explore with this new datatype is a pure Luau implementation of structs.

By itself that wouldn’t be anything too interesting, but once you’re dealing with many structs laid out contiguously in a long binary buffer, we can actually start talking about the effects of cache locality even in Luau! For certain applications this could give dramatic performance improvements, perhaps 10x-20x.

This idea came up in the context of Entity-Component-System architecture, and I think binary-buffer-backed component storage could bring ECS performance benefits to Luau, beyond just helping with data modeling.
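For instance, a component store for a hypothetical { x: f32, y: f32, hp: u16 } struct (10 bytes per entity) might look like this sketch:

```lua
local STRIDE = 10 -- f32 + f32 + u16

local function createStorage(count: number): buffer
	return buffer.create(count * STRIDE)
end

local function setEntity(store: buffer, i: number, x: number, y: number, hp: number)
	local base = i * STRIDE
	buffer.writef32(store, base, x)
	buffer.writef32(store, base + 4, y)
	buffer.writeu16(store, base + 8, hp)
end

local function getHp(store: buffer, i: number): number
	return buffer.readu16(store, i * STRIDE + 8)
end

local storage = createStorage(1000)
setEntity(storage, 0, 1.5, -2.0, 100)
print(getHp(storage, 0)) -- 100
```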

3 Likes

Are there any plans to use a more efficient buffer storage method in datastores? For example, base93.

Sorry, but I don’t know what you are referring to.

We have no plans to change the current representation.

Hm, does that mean that if I need to store more than 3MB, I have to use the old “string” method? Or do strings also have limitations that lower the limit that much?

1 Like

Yes, if you need to store up to 4MB, you have to continue using strings.

3 Likes

I use string.pack extensively and I started experimenting with this recently. Overall it’s a great addition. I appreciate that static methods are used instead of a metatable; it’s the right solution for low-level functionality in a gradually typed language.

These features would be helpful:

  • Range parameters for buffer.writestring. Sometimes I need to write part of a string to the buffer, and looping over each byte is clunky.
  • Vector3 read/write (3 f32s.) The name could be buffer.readvec3, buffer.readvector3, or buffer.readVector3. This would be useful for simulations, and go nicely with native Vector3 optimizations.

I’m on the fence about whether i24/u24 should be included. While it doesn’t seem “correct”, it would still be a practical feature. Use cases for byte arrays tend to be low level, and a hypothetical buffer.readu24(buf, p) seems like it would perform better than bit32.bor(bit32.lshift(buffer.readu16(buf, p + 1), 8), buffer.readu8(buf, p)). It could be a slippery slope that leads to supporting i48/u48 as well.

I don’t think we need bit-level control. The buffer data type already gets us 90% of the way there. In my opinion it’s better to work with byte alignment, and pack bits together into shared bytes when it makes sense.

Buffer attributes would be useful, simply for fewer allocations when reading/writing data. It would be more ideal to be able to read the memory without needing to make a copy though (even if it’s read only.) Unless access like this is definitely impossible, it might be worth waiting.

3 Likes