Summary
Hello! This is a summary of some of the bandwidth optimizations we made in Astro Force, our Roblox Accelerator project (link here)! Astro Force is a real-time strategy (RTS) game @loravocado and I are working on, and the goal is to have a system efficient enough to handle hundreds of independent units. While a lot of these bandwidth optimizations are very specific to our game, I thought it’d still be cool to share what we did! This is also my first time doing bandwidth optimizations, so feel free to share your thoughts and suggestions.
Here’s a quick video of 400 units moving around: Roblox RTS - 400 Unit Stress Test - YouTube
First, let’s start off with some stats using 100 moving units to benchmark!
Bandwidth was measured with the menu that pops up when you press Shift+F3. As a general rule of thumb, bandwidth usage below 40-50 KB/s is exceptional; for comparison, a game of Phantom Forces or Arsenal typically uses around 50-80 KB/s. As you can see from the image above, we’ve cut our bandwidth usage by a factor of about 60 since V1. Here’s how we did it!
Version 1
Version 1 involved representing units using a part on the server and having the server CFrame them to a goal position. The client then rendered the character models on top of this part. Essentially, we let Roblox take care of all the replication for us. This was simple, but took an enormous amount of bandwidth as seen in the previous image. In general, CFraming parts is extremely costly in terms of bandwidth.
Furthermore, the core loop of the game ran each heartbeat cycle, which meant parts were being CFramed about 60 times a second. This contributed to the insane bandwidth usage we saw earlier.
Version 2
We completely scrapped V1 and rebuilt the game from the ground up. Instead of using parts on the server, we store the positions of units inside a server script and manually send positioning data to each client. We also created a fully custom collision system specifically adapted for our game, which is what produced the massive reduction in CPU usage from V1 to V2. With this new system, we had a much higher degree of control over everything, including replication.
Reducing rate of replication
One of the biggest ways V2 saved bandwidth was by reducing the number of times replication happened. Instead of CFraming (and sending the corresponding data) 60 times a second, we made replication only happen 10 times a second, and smoothed out the movement on the client using linear interpolation.
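Here is a minimal sketch of that client-side smoothing, assuming each unit keeps its previous and latest server snapshots (the `Time`, `X`, `Z`, and `Orientation` fields and the helper names are hypothetical, not our actual code). It renders one update behind: when an update arrives, the unit glides from the previous snapshot to the new one over the next 0.1 seconds. Interpolating the orientation along the shortest arc is an extra choice made here so a unit turning from 6.2 to 0.1 radians doesn’t spin the long way around.

```lua
-- Minimal sketch: smooth 10 Hz server updates on the client with linear interpolation.
-- Snapshot fields (Time, X, Z, Orientation) are hypothetical.
local TWO_PI = 2 * math.pi
local UPDATE_INTERVAL = 0.1 -- seconds between server updates (10 per second)

local function Lerp(a, b, t)
	return a + (b - a) * t
end

-- Interpolates along the shortest arc, so 6.2 -> 0.1 radians turns forward through 0.
local function LerpAngle(a, b, t)
	local diff = (b - a + math.pi) % TWO_PI - math.pi
	return (a + diff * t) % TWO_PI
end

-- Renders one update behind: at Latest.Time the unit is drawn at Previous,
-- and it reaches Latest one UPDATE_INTERVAL later.
local function SampleUnit(Previous, Latest, Now)
	local t = (Now - Latest.Time) / UPDATE_INTERVAL
	t = math.max(0, math.min(1, t)) -- clamp: never extrapolate past the latest update
	return Lerp(Previous.X, Latest.X, t),
		Lerp(Previous.Z, Latest.Z, t),
		LerpAngle(Previous.Orientation, Latest.Orientation, t)
end
```

On Roblox, something like this could run every frame (for example from a `RunService.RenderStepped` connection with `os.clock()` as `Now`), while the server keeps sending only 10 updates per second.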
Sending less data
A huge part of the bandwidth savings also came from not sending data we can determine on the client. For example,
- The Y-coordinate of each unit can be determined on the client. Since the terrain in our game conforms to a grid and there aren’t any places where two heights are possible, we were able to make a heightmap where the Y-coordinate of the terrain can be queried based on the X and Z coordinate of the unit. Raycasting would have also worked; however, our heightmap is about 10x faster than raycasting.
- We only need one component for the orientation of the unit rather than all three, since units in our game only rotate about the Y axis. So we don’t even need to send the X and Z components of the orientation!
So, instead of sending 6 numbers to update the position of a unit (XYZ coordinates and XYZ orientation), we only need to send three: the X and Z coordinates and the Y orientation. That’s half as much positioning data, which cuts the bandwidth used by position updates by roughly 50%!
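A heightmap lookup like the one described above could look roughly like this. This sketch assumes heights sampled at the corners of a regular grid with a hypothetical 4-stud spacing starting at the world origin, and blends the four surrounding samples (bilinear interpolation). The post doesn’t say how our heightmap is laid out; if every grid cell were flat, a single table lookup per cell would be enough.

```lua
-- Minimal heightmap sketch: heights sampled at the corners of a regular grid.
-- CELL_SIZE, the origin at (0, 0), and the Heights layout are hypothetical choices.
local CELL_SIZE = 4 -- studs between height samples

local Heightmap = {}
Heightmap.__index = Heightmap

-- Heights[row][col] is the terrain height at world (X = (col - 1) * CELL_SIZE, Z = (row - 1) * CELL_SIZE).
function Heightmap.new(Heights)
	return setmetatable({Heights = Heights}, Heightmap)
end

-- Returns the terrain height at world (X, Z) by blending the four surrounding samples.
-- X and Z must lie inside the sampled area (no bounds checks in this sketch).
function Heightmap:GetHeight(X, Z)
	local gx, gz = X / CELL_SIZE, Z / CELL_SIZE
	local col, row = math.floor(gx), math.floor(gz)
	local fx, fz = gx - col, gz - row -- fractional position inside the cell, 0..1
	local rows = self.Heights
	-- Lua arrays are 1-indexed, so sample (row, col) lives at rows[row + 1][col + 1].
	local h00 = rows[row + 1][col + 1]
	local h10 = rows[row + 1][col + 2]
	local h01 = rows[row + 2][col + 1]
	local h11 = rows[row + 2][col + 2]
	local near = h00 + (h10 - h00) * fx
	local far = h01 + (h11 - h01) * fx
	return near + (far - near) * fz
end
```

Either way, the client can rebuild the Y coordinate locally, so the server never has to send it.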
Only sending needed position updates
In order to avoid sending unnecessary data, the server only sends data about units that are currently moving. The replication loop for positioning data looks something like this:
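```lua
-- Units and PositionChangedEvent (a RemoteEvent) are defined elsewhere.
local function SendPositioningData()
	-- This function is called about 10 times a second.
	local Packet = {}
	for _, Unit in pairs(Units) do
		-- Only units that moved since the last replication step are included.
		if Unit.PositionChanged then
			Packet[#Packet + 1] = Unit:GetPositionData()
			Unit.PositionChanged = false
		end
	end
	-- Skip the remote call entirely when nothing moved.
	if #Packet > 0 then
		PositionChangedEvent:FireAllClients(Packet)
	end
end
```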
where `Unit:GetPositionData()` returns a table that looks something like this:
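```lua
function Unit:GetPositionData()
	return {
		self.Hash, -- unique ID so the client knows which unit to move
		self.Orientation, -- rotation about the Y axis, in radians
		self.X,
		self.Z,
	}
end
```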
Essentially, if the unit’s position changed this replication loop, we send the unit’s positioning data, consisting of the unit’s hash (a unique identifier for the unit so that the client knows which unit the server is trying to move), XZ coordinates, and orientation – a total of 4 numbers.
Using Vector2int16s and Vector3int16s
The last thing we did to optimize bandwidth in V2 was using two somewhat obscure Roblox types: `Vector2int16` and `Vector3int16` (credit to this post for this idea!). This is definitely a micro-optimization, but it brought another 70% bandwidth reduction: the 100-unit moving test consumes 35 KB/s without this optimization and 10 KB/s with it.
By default, numbers sent through RemoteEvents are sent as 64-bit floating point numbers. So each position update carries four numbers (Hash, X, Z, and Orientation) at 64 bits each: 256 bits in total, plus a bit of overhead for the table. How do we reduce the number of bits we send?
Our answer lies in `Vector2int16`s and `Vector3int16`s. A `Vector2int16` stores two `int16`s (16-bit signed integers), and a `Vector3int16` is the same thing except it stores three. A single `int16` can only store integers, in the range [-32,768, 32,767].
So we can store the hash of each unit in one slot of a `Vector2int16`, since the hash is always an integer. We also don’t expect there to be more than a couple thousand unique units in the game at once, so the hash should always fit in the range of an `int16`. But what about X, Z, and orientation, which are more than likely decimal values?
In Astro Force, all computations such as distance checks, collisions, etc. that require accurate numbers are done on the server. The client doesn’t technically need to be super accurate with the position or orientation. So we’re okay if the unit’s position on the client is accurate within 0.2 studs and the orientation is accurate within 0.01 radians.
What we can do is multiply each of the numbers we want to send on the server by some multiplier, round the result to the nearest integer, send it to the client, and then simply have the client divide the number it receives by that same multiplier! To make this process easy, we wrote encoder and decoder functions just for this purpose. They look something like this:
```lua
local Encoder = {}

local COORD_MULTIPLIER = 5 -- Coordinates are rounded to the nearest 0.2 studs
local ORIENTATION_MULTIPLIER = 100 -- Orientation is rounded to the nearest 0.01 radians

-- Server side: packs one unit's positioning data into two Vector2int16s.
function Encoder.EncodePositioningData(Hash, Orientation, X, Z)
	-- Orientation has range [0, 2pi), so it encodes to at most 628.
	-- math.floor(v + 0.5) rounds to the nearest integer.
	Orientation = math.floor(ORIENTATION_MULTIPLIER * Orientation + 0.5)
	X = math.floor(COORD_MULTIPLIER * X + 0.5)
	Z = math.floor(COORD_MULTIPLIER * Z + 0.5)
	return {
		Vector2int16.new(Hash, Orientation),
		Vector2int16.new(X, Z),
	}
end

-- Client side: reverses the encoding by dividing by the same multipliers.
function Encoder.DecodePositioningData(PositioningData)
	local Block1 = PositioningData[1]
	local Block2 = PositioningData[2]
	local Hash = Block1.X
	local Orientation = Block1.Y / ORIENTATION_MULTIPLIER
	local X = Block2.X / COORD_MULTIPLIER
	local Z = Block2.Y / COORD_MULTIPLIER
	return Hash, Orientation, X, Z
end

return Encoder
```
As you can see, instead of sending four 64-bit numbers, we now send four 16-bit numbers. In theory, this cuts 256 bits down to 64 bits, a 75% reduction! In-game, we saw a 70% bandwidth reduction; we assume it’s not a perfect 75% because of a bit of overhead from the `Vector2int16`s we used.
Of course, doing this has some limitations and downsides:
- This is not a good way to send precise numbers. In our case, we can get away with it since we don’t need that precision.
- The range of valid positions is now limited to `[-32768 / COORD_MULTIPLIER, 32768 / COORD_MULTIPLIER)`, which in our case is about [-6553, 6553). This works for Astro Force since most maps never exceed around 3000x3000 studs, but may not work for many other games.
- Overall readability of the code goes down a bit.
- If Roblox removes `Vector2int16`s and `Vector3int16`s, then we’ll be sad.
At the end of the day, the 70% bandwidth reduction outweighed these downsides, so we decided to use this hacky method to our advantage.
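To sanity-check the precision and range claims, here’s a portable sketch of the same multiply-and-round step without the Roblox types (`Quantize` and `Dequantize` are hypothetical names). Because `math.floor(v + 0.5)` rounds to the nearest integer, the worst-case error is half a step: 0.1 studs for coordinates with a multiplier of 5, and 0.005 radians for orientation with a multiplier of 100. The range check corresponds to the map-size limitation listed above.

```lua
-- Portable sketch of the multiply-and-round quantization, without Roblox types.
local COORD_MULTIPLIER = 5
local INT16_MIN, INT16_MAX = -32768, 32767

local function Quantize(Value, Multiplier)
	local q = math.floor(Multiplier * Value + 0.5) -- round to the nearest integer
	-- Values outside the int16 range don't fit in a Vector2int16 slot.
	assert(q >= INT16_MIN and q <= INT16_MAX, "value out of int16 range")
	return q
end

local function Dequantize(Quantized, Multiplier)
	return Quantized / Multiplier
end
```

For example, `Quantize(123.456, 5)` gives 617, which decodes to 123.4: about 0.056 studs off, inside the 0.1-stud worst case.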
Version 3
We were already quite happy with the bandwidth usage of V2: 100 units moving around only consumed 10 KB/s. But we wanted to take it to the next level with another 50% reduction in bandwidth.
Warning: this section of optimizations requires a bit of knowledge about how integers are represented.
Thanks to the bit32 library now being on Roblox, we used a technique called bit packing in order to save data by manipulating bits.
In V2, each positioning update for a unit takes four `int16`s on the bit level: 16 bits each for the hash, orientation, X, and Z, for 64 bits in total.
But we can do much better. We came to a few logical conclusions to save bits:
- We don’t expect to have more than 2000 unique units in a single game. We could just use 11 bits (allowing 2^11 = 2048 unique units) and save 5 bits.
- Our orientation doesn’t need so much precision; we’re happy if it’s within a few degrees of its actual value on the server. Let’s dedicate 7 bits to the orientation, giving us accuracy within about 2.8 degrees (2^7 = 128 unique values, and 360/128 = 2.8125 degrees per step). This saves 9 bits.
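The 7-bit orientation could be encoded along these lines (the function names are hypothetical). The angle is normalized to [0, 2π), scaled to 128 steps, and floored, so the decoded angle is at most one step (2π/128 radians, about 2.8 degrees) below the original. Rounding to the nearest step instead would halve that error.

```lua
-- Sketch: quantize an angle in radians to 7 bits (128 steps) and back.
local TWO_PI = 2 * math.pi
local ORIENTATION_STEPS = 128 -- 2^7 distinct values

local function EncodeOrientation(Angle)
	local normalized = Angle % TWO_PI -- Lua's % is floored, so negatives land in [0, 2pi)
	-- The final % guards against floating-point rounding producing 128.
	return math.floor(normalized / TWO_PI * ORIENTATION_STEPS) % ORIENTATION_STEPS
end

local function DecodeOrientation(Code)
	return Code / ORIENTATION_STEPS * TWO_PI
end
```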
However, we run into a problem with the X and Z coordinates: we have little room to shave off bits without losing even more precision or severely limiting map size. But what if we can somehow reduce the size of the position coordinates? Instead of sending the global position of each unit for a position update, what if we just send the displacement from some given point – say, the displacement from the corner of a grid cell a unit is currently inside? This could result in sending much smaller numbers!
To do this, we first place a grid over the map where each cell spans 8x8 studs. We then enumerate each of the grid cells with a unique ID. This process is done on both the server and client.
Now, a position coordinate’s displacement from the corner of its grid cell always lies in [0, 8).
If we multiply the displacement by 8 and take the floor, we’re guaranteed that the value is 63 or less (if the displacement were 8 or more, the unit would be in the next grid cell). This is perfect, since the range of a 6-bit unsigned number is [0, 63]! For example, if the unit’s X displacement from the corner of its grid cell is 6.51, we multiply this value by 8 to get 52.08. We then take the floor and get 52, which can be represented with just 6 bits.
We can transmit this number (52) to the client, have the client divide 52 by 8, and we get 6.5, which is indeed very close to the original 6.51! Using this method, we achieve a precision of 1/8 = 0.125 studs, which is actually more precise than the 0.2 studs of V2!
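The grid-cell encoding could look like the sketch below. It assumes the map starts at the world origin, cells are numbered row by row, and there are 512 cells per row (`GRID_WIDTH` is a hypothetical value; 512 × 8 = 4096 studs); the post doesn’t specify how the cells are actually enumerated. Because 8 is a power of two, the divisions and multiplications here are exact in floating point, so for non-negative coordinates each displacement is guaranteed to land in 0 to 63.

```lua
-- Sketch: split a world position into a grid cell ID plus 6-bit displacements.
local CELL_SIZE = 8 -- studs per grid cell side
local DISPLACEMENT_MULTIPLIER = 8 -- [0, 8) studs * 8 -> 0..63 (6 bits)
local GRID_WIDTH = 512 -- hypothetical: number of cells per row

-- Server: returns the cell ID and the X/Z displacement codes (0..63 each).
local function EncodeLocal(X, Z)
	local col = math.floor(X / CELL_SIZE)
	local row = math.floor(Z / CELL_SIZE)
	local dx = math.floor((X - col * CELL_SIZE) * DISPLACEMENT_MULTIPLIER)
	local dz = math.floor((Z - row * CELL_SIZE) * DISPLACEMENT_MULTIPLIER)
	return row * GRID_WIDTH + col, dx, dz
end

-- Client: rebuilds an approximate world position (error below 0.125 studs).
local function DecodeLocal(CellId, Dx, Dz)
	local col = CellId % GRID_WIDTH
	local row = (CellId - col) / GRID_WIDTH
	return col * CELL_SIZE + Dx / DISPLACEMENT_MULTIPLIER,
		row * CELL_SIZE + Dz / DISPLACEMENT_MULTIPLIER
end
```

With the example above, a unit at X = 30.51 sits in column 3 with an X displacement of 6.51, which encodes to 52 and decodes to 30.5.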
Now we can bring everything together. If we dedicate 11 bits to the hash, 7 bits to the orientation, and 6 bits each to the X and Z displacements from the corner of the grid cell the unit is currently in, a position update needs 11 + 7 + 6 + 6 = 30 bits.
That fits in the 32 bits of a single `Vector2int16` instead of the 64 bits of two `Vector2int16`s, resulting in half the bandwidth usage!
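Here’s one way to pack those 30 bits, written with plain arithmetic (multiplying and dividing by powers of two) so it also runs outside Roblox; inside Luau, `bit32.lshift`, `bit32.bor`, and `bit32.extract` would do the same job. The field order (hash in bits 0 to 10, orientation in bits 11 to 17, X displacement in bits 18 to 23, Z displacement in bits 24 to 29) is an assumption. With that order the orientation straddles the two 16-bit halves, and the top two bits stay unused. The two returned numbers are what you would pass to `Vector2int16.new`.

```lua
-- Sketch: pack hash (11 bits), orientation (7), dx (6), dz (6) into two signed int16s.
-- Converts 0..65535 to the signed -32768..32767 range a Vector2int16 slot holds.
local function ToInt16(Unsigned)
	if Unsigned >= 32768 then
		return Unsigned - 65536
	end
	return Unsigned
end

-- Inverse of ToInt16; Lua's % is floored, so negatives wrap into 0..65535.
local function ToUint16(Signed)
	return Signed % 65536
end

local function PackUpdate(Hash, Orientation, Dx, Dz)
	-- 2048 = 2^11, 262144 = 2^18, 16777216 = 2^24 (the assumed field offsets)
	local value = Hash + Orientation * 2048 + Dx * 262144 + Dz * 16777216
	local low = value % 65536 -- bits 0..15: hash + low 5 bits of orientation
	local high = math.floor(value / 65536) -- bits 16..31: rest of orientation, dx, dz
	return ToInt16(low), ToInt16(high)
end

local function UnpackUpdate(A, B)
	local value = ToUint16(A) + ToUint16(B) * 65536
	local hash = value % 2048
	local orientation = math.floor(value / 2048) % 128
	local dx = math.floor(value / 262144) % 64
	local dz = math.floor(value / 16777216) % 64
	return hash, orientation, dx, dz
end
```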
The last thing we need to deal with is knowing which grid cell the unit is currently in. We wrote some code to detect when a unit changes grid cells, and when it does, we simply append the ID of the grid cell the unit is moving into. Since units change grid cells relatively infrequently, the vast majority of position updates don’t involve a grid change. When there is one, we send a `Vector3int16` instead of the usual `Vector2int16` for that position update, which gives us 48 bits instead of 32.
Of those 48 bits, 30 carry the same fields as a normal update (hash, orientation, and X/Z displacement), and the remaining 18 are dedicated to the grid ID. This means we can have maps with up to 2^18 = 262,144 grid cells. Assuming our map is square, that’s 512x512 cells of 8x8 studs each, so our map can be up to 4096x4096 studs large, plenty for all our maps in Astro Force. If we ever have larger maps, the grid cell size can easily be tuned to 10x10 or even 16x16 studs at the cost of precision (with 16x16 cells, the same 6 bits give 0.25-stud precision).
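A sketch of the grid-change case, again with plain arithmetic: the 30-bit value from a normal update is kept as-is, and the 18-bit grid ID is placed in bits 30 to 47, so the update splits into three 16-bit slots instead of two. `LastCellId`, `BuildPayload`, `ReadPayload`, and these bit positions are hypothetical. On Roblox, the two- and three-slot cases would become a `Vector2int16` and a `Vector3int16`, and the client could tell them apart by type instead of by slot count.

```lua
-- Sketch: updates normally use 2 int16 slots (32 bits); a grid change adds a third slot.
-- Assumed layout: bits 0-29 hold hash/orientation/displacements, bits 30-47 hold the grid ID.
local TWO_16 = 65536
local TWO_30 = 1073741824

-- Converts 0..65535 to the signed -32768..32767 range an int16 slot holds.
local function ToInt16(Unsigned)
	if Unsigned >= 32768 then
		return Unsigned - TWO_16
	end
	return Unsigned
end

-- Splits a non-negative integer into Count signed 16-bit slots, lowest bits first.
local function SplitInt16(Value, Count)
	local slots = {}
	for i = 1, Count do
		local chunk = Value % TWO_16
		slots[i] = ToInt16(chunk)
		Value = (Value - chunk) / TWO_16
	end
	return slots
end

-- Reassembles the integer; % TWO_16 turns each signed slot back into 0..65535.
local function JoinInt16(Slots)
	local value, scale = 0, 1
	for i = 1, #Slots do
		value = value + (Slots[i] % TWO_16) * scale
		scale = scale * TWO_16
	end
	return value
end

-- Server: Base is the 30-bit update; the grid ID is only sent when the cell changed.
local function BuildPayload(Unit, Base, CellId)
	if CellId ~= Unit.LastCellId then
		Unit.LastCellId = CellId
		return SplitInt16(Base + CellId * TWO_30, 3) -- 48 bits, Vector3int16-sized
	end
	return SplitInt16(Base, 2) -- 32 bits, Vector2int16-sized
end

-- Client: returns the 30-bit base value and the new grid ID (nil if unchanged).
local function ReadPayload(Slots)
	local value = JoinInt16(Slots)
	local base = value % TWO_30
	if #Slots == 3 then
		return base, (value - base) / TWO_30
	end
	return base, nil
end
```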
Bit packing example
To end off this thread, here’s an example of how you could pack two 8-bit integers into a larger 16-bit integer!
Let’s say we have two 8-bit unsigned integers, x = 32 and y = 145. In our 16-bit result, we can dedicate the lowest 8 bits (bits 0 to 7) to x and the highest 8 bits (bits 8 to 15) to y. In binary, x = 0010_0000 and y = 1001_0001, so the packed result is 1001_0001_0010_0000 (y’s bits followed by x’s bits).
So now we can set up some initial constants! We can use the `0b` prefix on a number to tell Luau that we’re writing a binary number.
```lua
local X_BITS = 0b0000_0000_1111_1111 -- mask for the lower 8 bits (x)
local Y_BITS = 0b1111_1111_0000_0000 -- mask for the upper 8 bits (y)
```
First, let’s deal with x. Since x occupies the lowest 8 bits, we don’t need to shift its bits. All we need to do is perform `bit32.band` to make sure x does not overflow into the bits dedicated to y! This is what that looks like:
```lua
local x = 32
local y = 145
local result = 0
result += bit32.band(x, X_BITS) -- keep only x's 8 bits
print(result) -- Prints 32!
```
Now we need to deal with y. It’s a little bit more work since we need to “shift” the bits of y over by 8 places. We use `bit32.lshift` for this! It shifts the bits of y left by 8 places so that they occupy the upper 8 bits of the result. After shifting, we also perform `bit32.band` on y’s shifted bits to make sure we only affect the bits dedicated to y.
```lua
-- Shift y left by 8 bits, then mask so it only touches the upper byte.
result += bit32.band(bit32.lshift(y, 8), Y_BITS)
print(result) -- Prints 37152!
```
Now we can send this to the client, for example inside a `Vector2int16`. If we had something else to send, we could store it in the second `int16` slot.
```lua
-- FireAllClients sends to every player; use FireClient(player, ...) for just one.
exampleEvent:FireAllClients(Vector2int16.new(
	result,
	0 -- We could store something else here! :)
))
```
Now the fun part, decoding the result! We first retrieve the result from the Vector2int16 as follows:
```lua
exampleEvent.OnClientEvent:Connect(function(Data)
	local result = Data.X
	print(result) -- Prints -28384!
end)
```
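Notice the number is negative. This is because an `int16` can only hold values in [-32,768, 32,767], so the original result (37152, which is bigger than 32,767) wraps around to 37152 - 65536 = -28384. However, this doesn’t matter since all we care about are the bits: the 16 bits themselves are unchanged, and only their interpretation differs. Essentially, we treat the highest bit of the integer (bit 15, normally the sign bit) as just a regular bit, as if it were an unsigned 16-bit integer. The `bit32` functions work on their arguments modulo 2^32, so bits 0 to 15 of -28384 are exactly the bits of 37152.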
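The wrap-around can be checked with plain arithmetic: Lua’s `%` operator is floored, so taking the received value modulo 65536 recovers the unsigned 16-bit number, and dividing by a power of two then taking a remainder does the same job as `bit32.extract` for non-negative inputs. `ToUnsigned16` and `Extract` are hypothetical helper names.

```lua
-- Reinterpret the signed int16 read from Vector2int16.X as an unsigned 16-bit value.
local function ToUnsigned16(Signed)
	return Signed % 65536 -- floored %, so -28384 becomes 37152
end

-- Arithmetic equivalent of bit32.extract(N, Field, Width) for non-negative N.
local function Extract(N, Field, Width)
	return math.floor(N / 2 ^ Field) % 2 ^ Width
end
```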
We can now “unpack” the bits. For this, we use the `bit32.extract` function, which takes three arguments:
- The result,
- The starting position to extract bits, and
- The number of bits to extract from the starting point.
For example, we know that within the result, x starts at bit 0 and takes up 8 bits. So, extracting x looks something like this:
```lua
local x = bit32.extract(result, 0, 8) -- 8 bits starting at bit 0
print(x) -- Prints 32! :D
```
To finish up, let’s extract y! We know y starts at bit 8 and also takes up 8 bits, so:
```lua
local y = bit32.extract(result, 8, 8) -- 8 bits starting at bit 8
print(y) -- Prints 145! :)))
```
Hopefully this example gave you a good idea of how to perform bit packing! In my bit arrangement from earlier, you may have noticed that the orientation bits are split over two `int16`s. To split bits over two integers, all you need to do is use `bit32.extract` to pull out the part of the bits you want to store in each integer.
Conclusion
Hope you enjoyed reading! We did a lot of hacky stuff, but the bandwidth savings were definitely worth it. Hopefully in the future, Roblox will give us more control over networking, such as being able to specify that you’re sending an `int16` instead of always sending numbers as 64-bit floating point numbers!
Definitely let me know if you think of any more optimizations or if you see a mistake!
Also, feel free to check out this article on how we’re implementing fog of war: Fog of War in Astro Force (RTS)