What do you mean by "function specially"? Do you mean rotation interpolation? The code I wrote indeed doesn't have it, so you'll need to add the code for it in that function if you want it.
Aside from including orientation,
Why do you need such checks? I thought your waypoints (which I assume you have at the start of the path and at the end of each path segment) contain the vertical position as well so the interpolation should gradually move the enemy upwards along the slope if the end point of a segment is higher than its start point.
They do not (as you can see in the video)
My goal with this was to make it as dynamic as possible, so that if I (or anything else) ever adjust or change the maps abruptly in the future, it won't be an issue.
Well, yes, looks like you’ll need to modify the logic I wrote for calculating the position and rotation.
But wouldn't I only need to do it on the server…? Or would I need to do it on both?
You’d need to do it on both the server and the client if you do the replication in the way that I suggested. I assumed that the normalized position (one number that is repeatedly replicated) and the path waypoints (which I believe are constant and thus don’t need to be replicated repeatedly) were enough information for calculating the CFrame. But apparently they are not.
Is there no easy way to have the client figure out the Y value of its position without having the same logic on the client, then?
No, I don’t think so. Is having the same logic (raycasts?) on the client a problem for some reason?
Any way at all?
And no, not exactly. I mean, I'd rather avoid giving the client a lot more work to do, aside from other things that don't matter as much. Usually I'd be okay with interpolating the enemies exactly to their next waypoint, but that makes it look less realistic, it becomes a hassle to lay out waypoints, etc. I need something that looks somewhat natural.
If the server can’t calculate the CFrame knowing only the waypoints and the normalized position, and the client needs to be able to calculate approximately the same CFrame with the same amount of data as the server, then I don’t think there’s any way to avoid having the same logic on the client.
If doing hundreds of raycasts every frame is a problem, then I suppose you could do some kind of preprocessing step before the game starts (and this could be done on both server and client). For a great number of sample normalized positions (maybe thousands) at constant intervals along the path, you’d do the raycast and/or other logic and store additional data that will reduce the amount of work that needs to be done every frame. I can think of two options.
- For all these normalized positions you’d calculate the position (or entire CFrame?) and store these in a table. Since the normalized positions for which this data is stored are at constant intervals, you can later calculate indices in this array from the normalized position of an enemy. When calculating the position of an enemy, you can get the closest earlier position (or CFrame) and the closest later position (or CFrame) from the array and interpolate between those. This obviously does take some memory. This is basically a way to add a huge number of new waypoints so that interpolating between waypoints gives almost exactly the same result as doing the raycast logic would (since these additional waypoints are calculated using the raycast logic).
- Perhaps, instead of storing data for a huge number of waypoints, you could somehow use the results calculated for two consecutive sample normalized positions to figure out whether you should add a new waypoint. For example, if the CFrame calculation logic for two consecutive sample normalized positions results in approximately the same y-value, you could probably conclude that you don’t need to add an additional waypoint but if there is a y-change, then you’d add a new waypoint. That way you could automatically add additional waypoints to locations where the y-position changes and avoid adding unnecessary waypoints to areas where the movement is just linear horizontal movement.
The preprocessing step should probably be done in the course of multiple frames, maybe even a few seconds (hundreds of frames), so that it doesn’t cause a lag spike.
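To make the first option concrete, here's a rough sketch in Luau. `computeCFrameAt(path, t)` is a placeholder for whatever raycast-based logic currently produces a CFrame for a normalized position `t`, and the sample count is an arbitrary assumption; treat this as an illustration rather than drop-in code.

```lua
-- Rough sketch of option 1: precompute a dense table of CFrames along the
-- path, then interpolate between neighbouring samples at runtime.
-- `computeCFrameAt(path, t)` stands in for the existing raycast logic.
local SAMPLE_COUNT = 2048 -- assumed; tune for memory vs. accuracy

local function buildSampleTable(path, computeCFrameAt)
    local samples = table.create(SAMPLE_COUNT + 1)
    for i = 0, SAMPLE_COUNT do
        samples[i + 1] = computeCFrameAt(path, i / SAMPLE_COUNT)
        if i % 64 == 0 then
            task.wait() -- spread the preprocessing over many frames
        end
    end
    return samples
end

local function sampleCFrame(samples, t: number): CFrame
    -- Map the normalized position onto the sample grid and lerp between
    -- the closest earlier and closest later samples.
    local scaled = math.clamp(t, 0, 1) * SAMPLE_COUNT
    local i0 = math.floor(scaled)
    local i1 = math.min(i0 + 1, SAMPLE_COUNT)
    return samples[i0 + 1]:Lerp(samples[i1 + 1], scaled - i0)
end
```

Since `buildSampleTable` yields via `task.wait`, it should be run in its own thread (e.g. via `task.spawn`) during loading, on both the server and the client.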
But if you can keep the network traffic at a viable level while sending the entire position (and rotation?) then maybe all this is unnecessary and you can just do the replication as you did earlier instead of doing it in the way I suggested.
I also agree with @DominusHeburius's suggestion. Replicating the positions every frame (tens of times per second) is probably unnecessary. Doing the replication only a few times per second as he suggested will greatly reduce network traffic, and if you interpolate between the replicated positions, the reduced replication rate probably won't make a noticeable visual difference.
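For illustration, here's a minimal client-side sketch of that snapshot interpolation. The 0.2-second send interval and the `trackEnemy` helper are my own assumptions, not anything from the thread; it just shows one way the lerping between replicated positions could look.

```lua
-- Rough client-side sketch: the server replicates a CFrame snapshot a few
-- times per second and the client lerps from the previous snapshot to the
-- latest one each render step.
local RunService = game:GetService("RunService")

local SEND_INTERVAL = 0.2 -- assumed server send rate (5 Hz)

local function trackEnemy(model: Model)
    local previous: CFrame? = nil
    local latest: CFrame? = nil
    local latestAt = 0

    RunService.RenderStepped:Connect(function()
        if previous and latest then
            local alpha = math.clamp((os.clock() - latestAt) / SEND_INTERVAL, 0, 1)
            model:PivotTo(previous:Lerp(latest, alpha))
        end
    end)

    -- Call the returned function whenever a snapshot for this enemy arrives.
    return function(cf: CFrame)
        previous = latest or cf
        latest = cf
        latestAt = os.clock()
    end
end
```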
Jumping in late, but it's worth considering using Euler angles/quaternions + position as a compression step, which uses roughly half as much data as sending the full CFrame. Combined with the other solutions in this thread, this would significantly reduce bandwidth.
I appreciate your efforts to assist me; you've been extremely helpful. Many of your solutions, such as the initial idea of using normalized positions, address common issues and will probably help many others. However, I'm still very interested in achieving maximum compression while ensuring the client receives the data it needs to unpack; that was the main reason for the post. I want to keep as much movement logic as possible off the client, other than positioning the enemies where they're told. I know I'm setting an absurdly high and possibly unreasonable standard, but I'm still open to other ideas. As a last resort, if all else fails, I'll likely go with what we've discussed. If anyone would like to see how I currently serialize and compress CFrames, I'd be glad to share.
Yes, I'm aware, and so I've heard. I already break up the CFrame somewhat, removing unnecessary precision, but I should note that I'm not very familiar with these concepts.
In most of my use cases there's no clean way of compressing the positional data aside from delta encoding, so the biggest cost-saving measure for me tends to be compressing the rotation matrix, e.g. …
Quaternion compression
```lua
--[[
    i.e. compress CFrame rotation and write to a buffer with a size of either:
        a) 1x uint8
        or; b) 1x uint8 + 3x int16
]]

-- forward
local fsin = math.sin
local fcos = math.cos
local fabs = math.abs
local fsqrt = math.sqrt
local floorf = math.floor

-- const
local INF = math.huge
local EPSILON = 1e-6
local I16_PRECISION = 32767

-- impl
local function cfToNormalisedQuaternion(cf)
    local axis, angle = cf:Orthonormalize():ToAxisAngle()

    -- normalise the rotation axis
    local d = axis.X*axis.X + axis.Y*axis.Y + axis.Z*axis.Z
    if fabs(d) > EPSILON then
        axis *= (1 / fsqrt(fabs(d)))
    end

    -- axis-angle -> quaternion
    local h = angle / 2
    local s = fsin(h)
    axis = axis * s

    local x, y, z, w = axis.X, axis.Y, axis.Z, fcos(h)

    -- normalise the quaternion
    d = x*x + y*y + z*z + w*w
    if fabs(d) > EPSILON then
        d = fsqrt(d)
        return x/d, y/d, z/d, w/d
    end
    return 0, 0, 0, 1 -- identity
end

local function compressQuaternion(cf)
    local qx, qy, qz, qw = cfToNormalisedQuaternion(cf)

    -- find the largest component; "smallest three" encoding drops it
    local index = -1
    local value = -INF
    local element, v0, v1, v2, val, abs
    for i = 1, 4, 1 do
        val = select(i, qx, qy, qz, qw)
        abs = fabs(val)
        if abs > value then
            index = i
            value = abs
            element = val
        end
    end

    -- one component is (almost) ±1: a single byte (index + 4) is enough
    if fabs(1 - value) < EPSILON then
        return index + 4
    end

    -- store the three remaining components, sign-flipped so the dropped
    -- (largest) component is always non-negative
    local sign = element >= 0 and 1 or -1
    if index == 1 then
        v0 = floorf(qy * sign * I16_PRECISION + 0.5)
        v1 = floorf(qz * sign * I16_PRECISION + 0.5)
        v2 = floorf(qw * sign * I16_PRECISION + 0.5)
    elseif index == 2 then
        v0 = floorf(qx * sign * I16_PRECISION + 0.5)
        v1 = floorf(qz * sign * I16_PRECISION + 0.5)
        v2 = floorf(qw * sign * I16_PRECISION + 0.5)
    elseif index == 3 then
        v0 = floorf(qx * sign * I16_PRECISION + 0.5)
        v1 = floorf(qy * sign * I16_PRECISION + 0.5)
        v2 = floorf(qw * sign * I16_PRECISION + 0.5)
    elseif index == 4 then
        v0 = floorf(qx * sign * I16_PRECISION + 0.5)
        v1 = floorf(qy * sign * I16_PRECISION + 0.5)
        v2 = floorf(qz * sign * I16_PRECISION + 0.5)
    end
    return index, v0, v1, v2
end

local function decompressQuaternion(qi, qa, qb, qc)
    -- single-byte case: one component is ±1, the rest are zero
    if qi > 4 then
        if qi == 5 then
            return 1, 0, 0, 0
        elseif qi == 6 then
            return 0, 1, 0, 0
        elseif qi == 7 then
            return 0, 0, 1, 0
        elseif qi == 8 then
            return 0, 0, 0, 1
        end
    end

    qa /= I16_PRECISION
    qb /= I16_PRECISION
    qc /= I16_PRECISION

    -- recover the dropped (largest) component from the unit-length constraint
    local d = fsqrt(1 - (qa*qa + qb*qb + qc*qc))
    if qi == 1 then
        return d, qa, qb, qc
    elseif qi == 2 then
        return qa, d, qb, qc
    elseif qi == 3 then
        return qa, qb, d, qc
    end
    return qa, qb, qc, d
end

-- example usage
local function testExample(cf)
    if typeof(cf) ~= 'CFrame' then
        cf = CFrame.lookAlong(Vector3.zero, Random.new():NextUnitVector())
    end

    -- compress & write to buffer
    local qi, qa, qb, qc = compressQuaternion(cf)
    local buf
    if qi > 4 then
        buf = buffer.create(1)
        buffer.writeu8(buf, 0, qi)
    else
        buf = buffer.create(7)
        buffer.writeu8(buf, 0, qi)
        buffer.writei16(buf, 1, qa)
        buffer.writei16(buf, 3, qb)
        buffer.writei16(buf, 5, qc)
    end

    -- read from buffer & decompress
    local qx, qy, qz, qw
    qi = buffer.readu8(buf, 0)
    if qi > 4 then
        qx, qy, qz, qw = decompressQuaternion(qi)
    else
        qa = buffer.readi16(buf, 1)
        qb = buffer.readi16(buf, 3)
        qc = buffer.readi16(buf, 5)
        qx, qy, qz, qw = decompressQuaternion(qi, qa, qb, qc)
    end

    local out = CFrame.new(0, 0, 0, qx, qy, qz, qw)
    local success = (
        out.RightVector:FuzzyEq(cf.RightVector, 5e-2)
        and out.UpVector:FuzzyEq(cf.UpVector, 5e-2)
        and out.LookVector:FuzzyEq(cf.LookVector, 5e-2)
    )
    print('Success?', success)
end

testExample() --> Success? true
```
Though, in your case, there are a bunch of cost-saving measures you could implement, for example:
1. Throw away everything

Note: Just to be clear, this method will produce the smallest packet and will be the least annoying to work with if you're adamant about having the server replicate each of the NPCs' current positions instead of syncing the client-server time.

First, I would think that the biggest question should be: "Do I really need to send a transform here?"

All of your NPCs follow a linear track, i.e. we can always describe an NPC as being somewhere along `Node_a` and `Node_b`.

Assuming the client has access to the node graph that describes the track, you could send the NPC's transform via a single `f32` which describes the distance that NPC has traveled along `Node_a` and `Node_b`.
- This means that it would be as simple as interpolating between both nodes to derive both the position and rotation, e.g. `Node_a:Lerp(Node_b, alpha)`, where `alpha` = the `f32` described above.

Then, whenever the NPC changes to a new track, you send a single integer - I would imagine a `u8` would do in your case - to describe the next `Node_a` and `Node_b` line the NPC is situated upon.
- Note: if the track length is out of range for a `u8`, then you could perform 7-bit encoding to avoid having to deal with it in the future, see here.

Packet size would be determined by what's changed, so we're looking at something along the lines of:
- No changes → No packet to send
- Track changed but no distance change → 1x `u8`, i.e. 1 byte
- Distance along the track changed, but no track change → 1x `f32`, i.e. 4 bytes
- Track and distance changed → 1x `u8` and 1x `f32`, i.e. 5 bytes
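A minimal sketch of the full 5-byte packet for this option, assuming the client already holds `nodes`, an ordered array of waypoint CFrames where track `i` runs from `nodes[i]` to `nodes[i + 1]`; the function names are placeholders of my own.

```lua
-- Minimal sketch: track index (u8) + distance along the track (f32).
local function encodeTrackPosition(trackIndex: number, alpha: number): buffer
    local buf = buffer.create(5)
    buffer.writeu8(buf, 0, trackIndex) -- which Node_a/Node_b pair (1 byte)
    buffer.writef32(buf, 1, alpha)     -- progress along that pair, 0..1 (4 bytes)
    return buf
end

local function decodeTrackPosition(nodes: {CFrame}, buf: buffer): CFrame
    local trackIndex = buffer.readu8(buf, 0)
    local alpha = buffer.readf32(buf, 1)
    -- Derive both position and rotation by interpolating between the pair.
    return nodes[trackIndex]:Lerp(nodes[trackIndex + 1], alpha)
end
```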
2. Throw away the excess

Looking at your image examples: your NPCs will always be upright, so we can scrap all rotation components aside from Yaw.

Even better, alongside delta encoding, you can throw away positional precision by using an `f16` + measuring the NPC's position relative to the map's origin, since your turret map is likely to be fairly small.
- You could additionally encode the yaw by remapping the angle to a range of 0 - 1 and writing it as an `i16`, but this all depends on how much precision you're willing to sacrifice.

Assuming you delta encode + use half precision + encode the yaw …
- No changes → No packet to send
- Positional changes but no rotational changes → 3x `f16`, i.e. 6 bytes
- Rotational changes but no positional changes → 1x `i16`, i.e. 2 bytes
- Positional and rotational changes → 3x `f16` and 1x `i16`, i.e. 8 bytes
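A rough sketch of this option. Note that Roblox's `buffer` library has no native `f16` write, so this version stands in for half precision with `i16` fixed-point (1/100 stud) relative to an assumed `mapOrigin`; the exact quantization, scales, and names are my assumptions, not anything prescribed above.

```lua
-- Rough sketch of option 2: position quantized relative to the map origin
-- plus a yaw remapped to 0..1. Position fits as long as enemies stay within
-- roughly ±327 studs of mapOrigin at this scale.
local POSITION_SCALE = 100 -- centistuds
local YAW_SCALE = 32767    -- yaw remapped to 0..1, stored in an i16

local function encodeUpright(mapOrigin: Vector3, cf: CFrame): buffer
    local offset = cf.Position - mapOrigin
    local _, yaw = cf:ToOrientation() -- rotation about Y, in radians
    local buf = buffer.create(8)
    buffer.writei16(buf, 0, math.floor(offset.X * POSITION_SCALE + 0.5))
    buffer.writei16(buf, 2, math.floor(offset.Y * POSITION_SCALE + 0.5))
    buffer.writei16(buf, 4, math.floor(offset.Z * POSITION_SCALE + 0.5))
    buffer.writei16(buf, 6, math.floor((yaw + math.pi) / (2 * math.pi) * YAW_SCALE + 0.5))
    return buf
end

local function decodeUpright(mapOrigin: Vector3, buf: buffer): CFrame
    local x = buffer.readi16(buf, 0) / POSITION_SCALE
    local y = buffer.readi16(buf, 2) / POSITION_SCALE
    local z = buffer.readi16(buf, 4) / POSITION_SCALE
    local yaw = buffer.readi16(buf, 6) / YAW_SCALE * (2 * math.pi) - math.pi
    return CFrame.new(mapOrigin + Vector3.new(x, y, z)) * CFrame.Angles(0, yaw, 0)
end
```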
3. Keep everything

If you don't want to throw any of the transform away and will just delta encode:

Packet size would then be determined by what's changed:
- No changes → No packet to send
- Positional changes but no rotational changes → 3x `f32`, i.e. 12 bytes
- Rotational changes but no positional changes → either (a) best case: 1x `u8`, i.e. 1 byte; or (b) worst case: 1x `u8` + 3x `i16`, i.e. 7 bytes
- Positional and rotational changes → either (a) best case: 3x `f32` + 1x `u8`, i.e. 13 bytes; or (b) worst case: 3x `f32` + 1x `u8` + 3x `i16`, i.e. 19 bytes
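To tie this variant together, here's a hedged sketch of how the delta-encoded packet could be assembled, reusing `compressQuaternion` from the snippet earlier in the thread. The one-byte change-flag header is my own assumption (it adds 1 byte on top of the totals above), and only the send side is shown.

```lua
-- Rough sketch of option 3: a change-flag byte followed by position (3x f32)
-- and/or the compressed rotation from the earlier quaternion snippet.
local FLAG_POSITION = 0b01
local FLAG_ROTATION = 0b10

local function encodeDelta(cf: CFrame, positionChanged: boolean, rotationChanged: boolean): buffer?
    if not (positionChanged or rotationChanged) then
        return nil -- nothing changed, nothing to send
    end
    local flags = (positionChanged and FLAG_POSITION or 0)
        + (rotationChanged and FLAG_ROTATION or 0)
    -- Worst case: 1 (flags) + 12 (position) + 7 (rotation) = 20 bytes.
    local scratch = buffer.create(20)
    buffer.writeu8(scratch, 0, flags)
    local offset = 1
    if positionChanged then
        buffer.writef32(scratch, offset, cf.X)
        buffer.writef32(scratch, offset + 4, cf.Y)
        buffer.writef32(scratch, offset + 8, cf.Z)
        offset += 12
    end
    if rotationChanged then
        -- compressQuaternion is the function from the earlier snippet.
        local qi, qa, qb, qc = compressQuaternion(cf)
        buffer.writeu8(scratch, offset, qi)
        offset += 1
        if qi <= 4 then
            buffer.writei16(scratch, offset, qa)
            buffer.writei16(scratch, offset + 2, qb)
            buffer.writei16(scratch, offset + 4, qc)
            offset += 6
        end
    end
    -- Trim to the bytes actually used before sending.
    local out = buffer.create(offset)
    buffer.copy(out, 0, scratch, 0, offset)
    return out
end
```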