Introducing Luau buffer type [Beta]

This likely wouldn’t be faster, since passing a large amount of information through the reflection layer once is cheaper than passing a small amount of information through it many times. Having to call into Lua for each pixel individually would end up being pretty slow.
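To make the batching point concrete, here is a minimal sketch that does all per-pixel work inside one Luau loop over a buffer, so that only a single bulk call would need to cross into the engine afterwards. The RGBA8 layout, the `invertPixels` helper and the final hand-off are assumptions for illustration, not a confirmed EditableImage API.

```luau
-- Assumed layout: 4 bytes per pixel (R, G, B, A), tightly packed.
local WIDTH, HEIGHT = 4, 2
local pixels = buffer.create(WIDTH * HEIGHT * 4) -- zero-initialised

-- Hypothetical helper: invert the colour channels in place, leave alpha alone.
local function invertPixels(buf: buffer)
	for offset = 0, buffer.len(buf) - 4, 4 do
		buffer.writeu8(buf, offset + 0, 255 - buffer.readu8(buf, offset + 0))
		buffer.writeu8(buf, offset + 1, 255 - buffer.readu8(buf, offset + 1))
		buffer.writeu8(buf, offset + 2, 255 - buffer.readu8(buf, offset + 2))
	end
end

invertPixels(pixels)
-- One bulk call would hand `pixels` to the engine here,
-- instead of one engine call per pixel.
```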

That’s unfortunate. I wonder how other programs like Photoshop manage to do this stuff. Or real shaders? They’re so quick somehow, even when it’s just changing each pixel and potentially the pixels around it as well.

Do you think that EditableImages will ever be fast enough to do this type of thing? There’s a limit to how much I can optimize my own scripts!

Luau buffer type and library are now available for use in live experiences.
Buffer replication over the network is also available.

Support for buffer type in DataStores, MessagingService and TeleportData is coming early next year.
Replication of the buffer type will receive updates to reduce data size even further.

Based on the feedback we received, additional library functions and support for buffer type in SetAttribute are being discussed internally and we will bring you updates on future developments here and in Luau Recaps.

Luau code is bound to run on the CPU, since it’s specialized for sequential logic. Photoshop and shaders make use of the GPU, either through carefully designed bulk API calls or through shader languages that are compiled to run on the GPU. Since GPUs are heavily optimized for parallel workloads (like running the same set of operations for every pixel), they perform extremely well at this task.

There isn’t really a fix for this: we can’t run Luau on GPUs because they’re not suited to heavy sequential logic, and we can’t get access to shader languages for compatibility reasons. Maybe one of those fancy post-processing graphs that some other 3D engines have could work, but I don’t know if Roblox’s engineering team has made any comments about it before.

Yeah, I suspected that using the GPU would fix this. But, that would be so hard to implement well. If they did, though, I’d imagine that it would be like this:

task.desynchronize() --> Switch to Parallel using CPU cores
task.synchronize() --> Switch to Series using CPU cores
task.desynchronize(true) --> Switch to Parallel using GPU cores

Reading from and writing to instances would probably be even less safe on the GPU, so perhaps only writing to local variables would be allowed in that state.

It’s not that simple; the GPU is not made for running code the same way as the CPU. Additionally, it’s a separate device with separate memory. One is for sequential, conditional, and logical processing, and the other is for parallel, pure number computation.

It would require, at the very least, Luau to be compiled to GPU code, which is probably impossible because it’s not at all designed for this. On top of that, you’d need to either copy all Luau-related memory to the GPU every time you switch contexts or queue up changes to be sent back later. Both of these are probably very slow in the context of a GPU. Lastly, you’d somehow need to synchronize the Lua state across two separate devices with two different architectures, which is also not going to happen.

Will there be a BufferValue Instance that would act like NumberValue or StringValue?

This is the first time someone mentioned that kind of class (even internally) so I’m not sure a lot of people are interested in that.

I uh, hmm. Is there a bug here? Is this accidentally encoding to UTF-16 or something? In my test of sending 30-byte buffers, the actual transmitted size was ~60 bytes. When I transmitted the tostring() version instead, it was the expected size?

It’s probably covered by:

But I can always use an example to check things out. You can just list the 30 bytes as numbers if they are not secret.
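For anyone putting together such an example, a small helper along these lines can list a buffer’s bytes as numbers. This is only a sketch: `dumpBytes` and the sample values are made up for illustration, and it shows the in-memory contents only, not what the replication layer actually transmits.

```luau
-- Hypothetical helper: render every byte of a buffer as a decimal number.
local function dumpBytes(buf: buffer): string
	local parts = table.create(buffer.len(buf))
	for i = 0, buffer.len(buf) - 1 do
		parts[i + 1] = tostring(buffer.readu8(buf, i))
	end
	return table.concat(parts, " ")
end

-- Sample 30-byte payload with a couple of arbitrary values written into it.
local sample = buffer.create(30)
buffer.writeu32(sample, 0, 123456)
buffer.writef32(sample, 4, 1.5)

print(buffer.len(sample))        -- payload size in bytes
print(#buffer.tostring(sample))  -- length of the equivalent binary string
print(dumpBytes(sample))         -- the bytes, ready to paste into a reply
```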

Waaait what - is there a layer of compression running on this?

Do you perhaps have a more accurate release date for this? I am planning to implement this into my game and would plan to have it out with this feature by end of January / mid February

BINARY DATASTORES!

The next step is being able to serialize Instances to and from an RBXM buffer, so we can save in-game Instances to DataStores. That would be amazing. Or, if that’s too low-level, just the ability to save them directly (and have it silently convert under the hood).

Would bit-level access be possible at some point in the future?
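Until something like that exists natively, bit-level access can be layered on top of the byte functions with `bit32`. A minimal sketch, assuming bit 0 is the least significant bit of byte 0; `readBit` and `writeBit` are hypothetical helper names, not part of the buffer library.

```luau
-- Hypothetical helpers: address individual bits inside a buffer.
-- Assumption: bit 0 is the least significant bit of byte 0.
local function readBit(buf: buffer, bitIndex: number): boolean
	local byteOffset = math.floor(bitIndex / 8)
	local mask = bit32.lshift(1, bitIndex % 8)
	return bit32.band(buffer.readu8(buf, byteOffset), mask) ~= 0
end

local function writeBit(buf: buffer, bitIndex: number, value: boolean)
	local byteOffset = math.floor(bitIndex / 8)
	local mask = bit32.lshift(1, bitIndex % 8)
	local byte = buffer.readu8(buf, byteOffset)
	if value then
		byte = bit32.bor(byte, mask)
	else
		byte = bit32.band(byte, bit32.bnot(mask))
	end
	buffer.writeu8(buf, byteOffset, byte)
end

-- Sixteen boolean flags packed into two bytes.
local flags = buffer.create(2)
writeBit(flags, 3, true)
writeBit(flags, 9, true)
```

Each access costs a read-modify-write of the containing byte, so this trades some speed for density.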

Do you have a use case for a value object when the attribute option is being discussed? I’ve moved away from values in favor of attributes because they require less code to set values (4 lines for object creation, or more with checking for existing instances, vs SetAttribute) and to get values (1-3 lines with a nil check vs GetAttribute).

I thought I could store voxel data in it, but I forgot that attributes existed. Although it would be cool if this worked differently from attributes. Maybe it could store compressed data the same way that data is compressed before being sent to the client/server.

For a quaternion you probably want to normalize it, then you can get the maximum amount of precision by encoding the components as integers. Using a float16 would waste a lot of potential precision.
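As a rough worked comparison of that precision argument: a half-precision float has 10 mantissa bits, so values in [0.5, 1) are spaced 2^-11 apart, while a signed 16-bit integer spread uniformly over [-1, 1] has a spacing of 1/32767. The sketch below computes both and measures one round-trip error; `roundTripI16` and the sample component are illustrative assumptions.

```luau
local I16_MAX = 32767

-- Quantise a component in [-1, 1] to a signed 16-bit integer and back.
local function roundTripI16(value: number): number
	local quantised = math.floor(value * I16_MAX + 0.5)
	return quantised / I16_MAX
end

-- Spacing between neighbouring representable values.
local i16Step = 1 / I16_MAX -- uniform across the whole [-1, 1] range
local f16Step = 2 ^ -11     -- half floats in [0.5, 1): 10 mantissa bits, exponent -1

-- Arbitrary sample component of a unit quaternion.
local component = 0.7071
local err = math.abs(roundTripI16(component) - component)

print(i16Step, f16Step, err)
```

Near 1.0 the integer encoding is therefore roughly 16 times finer than float16, at the same two bytes per component.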

Thanks for this info @tnavarts, this wasn’t something I had considered


For anyone else wanting to do the same:

P.S. Please do note that I’m not completely certain this is correct, but based on what Stravant suggested I’m somewhat certain the method would look something like the below

local FP_EPSILON = 1e-6
local I16_PRECISION = 32767                 -- int16 range { -32,768, 32,767 }
local BUFF_CFRAME_SIZE = (3*4) + (1 + 3*2)  -- i.e. 3x f32, 1x u8 and 3x i16

local function getNormalisedQuaternion(cframe)
  local axis, angle = cframe:ToAxisAngle()
  axis = axis.Magnitude > FP_EPSILON and axis.Unit or Vector3.xAxis

  local ha = angle / 2
  local sha = math.sin(ha)

  local x = sha*axis.X
  local y = sha*axis.Y
  local z = sha*axis.Z
  local w = math.cos(ha)

  local length = math.sqrt(x*x + y*y + z*z + w*w)
  if length < FP_EPSILON then
    return 0, 0, 0, 1
  end

  return x / length,
         y / length,
         z / length,
         w / length
end

local function compressQuaternion(cframe)
  local qx, qy, qz, qw = getNormalisedQuaternion(cframe)

  local index = -1
  local value = -math.huge

  local sign
  for i = 1, 4, 1 do
    local val = select(i, qx, qy, qz, qw)
    local abs = math.abs(val)
    if abs > value then
      index = i
      value = abs
      sign = val
    end
  end
  sign = sign >= 0 and 1 or -1

  local v0, v1, v2
  if index == 1 then
    v0 = math.floor(qy * sign * I16_PRECISION + 0.5)
    v1 = math.floor(qz * sign * I16_PRECISION + 0.5)
    v2 = math.floor(qw * sign * I16_PRECISION + 0.5)
  elseif index == 2 then
    v0 = math.floor(qx * sign * I16_PRECISION + 0.5)
    v1 = math.floor(qz * sign * I16_PRECISION + 0.5)
    v2 = math.floor(qw * sign * I16_PRECISION + 0.5)
  elseif index == 3 then
    v0 = math.floor(qx * sign * I16_PRECISION + 0.5)
    v1 = math.floor(qy * sign * I16_PRECISION + 0.5)
    v2 = math.floor(qw * sign * I16_PRECISION + 0.5)
  elseif index == 4 then
    v0 = math.floor(qx * sign * I16_PRECISION + 0.5)
    v1 = math.floor(qy * sign * I16_PRECISION + 0.5)
    v2 = math.floor(qz * sign * I16_PRECISION + 0.5)
  end

  return index, v0, v1, v2
end

local function decompressQuaternion(index, v0, v1, v2)
  v0 /= I16_PRECISION
  v1 /= I16_PRECISION
  v2 /= I16_PRECISION

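  -- Clamp at zero: rounding can push the squared sum slightly above 1, which would make sqrt return NaN
  local d = math.sqrt(math.max(0, 1 - (v0*v0 + v1*v1 + v2*v2)))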
  if index == 1 then
    return d, v0, v1, v2
  elseif index == 2 then
    return v0, d, v1, v2
  elseif index == 3 then
    return v0, v1, d, v2
  end

  return v0, v1, v2, d
end

local function write(buf, offset, input)
  buffer.writef32(buf, offset + 0, input.X)
  buffer.writef32(buf, offset + 4, input.Y)
  buffer.writef32(buf, offset + 8, input.Z)

  local qi, q0, q1, q2 = compressQuaternion(input)
  buffer.writeu8(buf, offset + 12, qi)
  buffer.writei16(buf, offset + 13, q0)
  buffer.writei16(buf, offset + 15, q1)
  buffer.writei16(buf, offset + 17, q2)

  return BUFF_CFRAME_SIZE
end

local function read(buf, offset)
  local x = buffer.readf32(buf, offset + 0)
  local y = buffer.readf32(buf, offset + 4)
  local z = buffer.readf32(buf, offset + 8)

  local qi = buffer.readu8(buf, offset + 12)
  local q0 = buffer.readi16(buf, offset + 13)
  local q1 = buffer.readi16(buf, offset + 15)
  local q2 = buffer.readi16(buf, offset + 17)

  local qx, qy, qz, qw = decompressQuaternion(qi, q0, q1, q2)
  return CFrame.new(x, y, z, qx, qy, qz, qw),
         BUFF_CFRAME_SIZE
end

return {
  readCFrame = read,
  writeCFrame = write,
}

Yippeee! Data compression les go!

Have they been tested and debugged enough for them to be safe and reliable to use for replicating data?
What’s the current compression ratio for big buffers?

Sorry if this has been asked before but how efficient is this “on the wire” compression of buffers and how does it work?

If I write a 32-bit number to a buffer that looks like 00000000 00000000 00001010 00000111 does it like recognize the zeroes and compress them?
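Whatever the replication layer does with those zeroes, it can help to see how that value sits in the buffer. Buffers store integers little-endian, so the least significant byte comes first. The sketch below writes the number from the question (0x0A07, decimal 2567) as a 32-bit and as a 16-bit value; choosing the narrower width is one way to avoid the zero bytes regardless of any compression. It makes no claim about the on-the-wire format.

```luau
local value = 0x0A07 -- binary 00001010 00000111, decimal 2567

-- 32-bit write: four bytes, two of which are zero.
local wide = buffer.create(4)
buffer.writeu32(wide, 0, value)
for i = 0, 3 do
	print(i, buffer.readu8(wide, i)) -- little-endian: low byte at offset 0
end

-- 16-bit write: the same number fits in two bytes.
local narrow = buffer.create(2)
buffer.writeu16(narrow, 0, value)
print(buffer.len(narrow), buffer.readu16(narrow, 0))
```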

I could think of a few ways to do this with buffers actually.

I would probably manually serialize instances into buffers instead since it gives more control over what you want to store.
Storing every single property of an instance would likely be… perhaps a bit impractical.

I don’t think you want to store what the name of an instance was (1 byte per character) or all of its physical properties.
That is, if you want to keep the data small and as efficiently compressed as possible.
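A minimal sketch of what such manual serialization could look like, storing only a hand-picked set of fields. The `PartRecord` shape, its field names and the byte layout are all assumptions made up for this example; a real game would pick its own.

```luau
-- Hypothetical record: only the fields we decide are worth saving.
type PartRecord = { id: number, x: number, y: number, z: number, tag: string }

-- Assumed layout: u16 id | 3x f32 position | u8 tag length | tag bytes
local function serialize(record: PartRecord): buffer
	local tagLength = #record.tag
	assert(tagLength <= 255, "tag must fit a one-byte length prefix")
	local buf = buffer.create(2 + 12 + 1 + tagLength)
	buffer.writeu16(buf, 0, record.id)
	buffer.writef32(buf, 2, record.x)
	buffer.writef32(buf, 6, record.y)
	buffer.writef32(buf, 10, record.z)
	buffer.writeu8(buf, 14, tagLength)
	buffer.writestring(buf, 15, record.tag)
	return buf
end

local function deserialize(buf: buffer): PartRecord
	local tagLength = buffer.readu8(buf, 14)
	return {
		id = buffer.readu16(buf, 0),
		x = buffer.readf32(buf, 2),
		y = buffer.readf32(buf, 6),
		z = buffer.readf32(buf, 10),
		tag = buffer.readstring(buf, 15, tagLength),
	}
end

local saved = serialize({ id = 42, x = 1.5, y = -2, z = 0.25, tag = "door" })
local loaded = deserialize(saved)
print(buffer.len(saved), loaded.id, loaded.tag)
```

A short tag or a numeric id in place of the full instance name is what keeps the record small here.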

RBXM already compresses quite well and, minus its implementation quirk hell, it’s a decent format for what it needs to do.

Ports of this aren’t practical in Lua because there’s a lot of stuff in the file that it can’t access without Roblox throwing security errors, or for which there is no feasible API to utilise (MeshParts).