Remote Packet Size Counter - accurately measure the amount of bytes for remotes!

PysephDEV · May 5, 2023, 5:02pm

Update

With the release of UnreliableRemoteEvents, which have a payload limit of 900 bytes, this module has gained a pretty good use case: making sure payloads stay under the maximum size! I’ll keep updating this module for the foreseeable future to cover new datatypes and possible inconsistencies in calculating datatype sizes. If you find any bugs, please report them!

Introduction

Originally, this code was part of the Packet Profiler plugin, but it has now been split out into a standalone resource to allow for usage in analytic environments. Additionally, this will make it easier to add requested or new unsupported types.

Functionality

The packet size counter calculates the size of remote packet data in bytes, given the packet data. It can calculate the byte size of a whole packet, but also individual data types. Examples:

-- Use the GetDataByteSize function to get the size of a single argument from a remote
PacketSizeCounter.GetDataByteSize(Value)

-- Use the GetPacketSize function to get the size of all arguments from a remote
PacketSizeCounter.GetPacketSize({
    RunContext = "Client", -- Client or Server, remotes send different data based on context
    RemoteType = "RemoteEvent", -- RemoteFunctions have an additional size offset
    PacketData = {...} -- Array of remote packet data, supports most types
})
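As one possible use, per-argument sizes can be summed and compared against a byte budget, such as the 900-byte UnreliableRemoteEvent payload limit. This is only a minimal sketch: `fitsInBudget` is a hypothetical helper, and `stubSizeOf` is a made-up stand-in for `PacketSizeCounter.GetDataByteSize` so that the snippet runs outside Roblox. Its numbers are assumptions, not the module's real values, and remote overhead is not included.

```lua
local UNRELIABLE_PAYLOAD_LIMIT = 900 -- bytes, the limit quoted in this thread

-- Hypothetical stand-in for PacketSizeCounter.GetDataByteSize.
-- The sizes returned here are assumptions for illustration only.
local function stubSizeOf(value)
    if type(value) == "string" then
        return #value + 2 -- assumed: 1 type byte + 1 length byte
    elseif type(value) == "number" then
        return 9 -- assumed: 1 type byte + 8-byte double
    end
    error("unsupported type in stub: " .. type(value))
end

-- Sums the size of every argument and checks the total against a limit.
-- Returns whether the payload fits, plus the summed size in bytes.
local function fitsInBudget(sizeOf, args, limit)
    local total = 0
    for i = 1, #args do
        total = total + sizeOf(args[i])
    end
    return total <= limit, total
end

local fits, total = fitsInBudget(stubSizeOf, { "hello", 42 }, UNRELIABLE_PAYLOAD_LIMIT)
print(fits, total)
```

In a real project, `PacketSizeCounter.GetDataByteSize` would presumably be passed in place of the stub.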

The full API can be found here:

Calculation

Roblox doesn’t offer any way to measure incoming packets, and so the only way to find out how large they are is by manually calculating the size based on input data.
The calculation for some data types, such as strings, is not fully accurate, because it is hard to replicate undocumented behavior exactly.

The byte size for every supported data type has been measured as such:

Send data through remotes and log the outcome with Roblox’s Network panel data
Perform the measurement three times, and then average the result
Do this measurement on three data amounts (10, 100, 1000), then perform linear regression to figure out the scale coefficient.
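The averaging and regression steps above can be sketched roughly as follows. The readings are made-up numbers chosen to lie on a straight line, not real measurements, and `mean` and `linearFit` are hypothetical helper names. The fitted slope is the byte cost per value sent, and the intercept is the fixed overhead.

```lua
-- Average of repeated readings for the same configuration
local function mean(values)
    local sum = 0
    for _, v in ipairs(values) do
        sum = sum + v
    end
    return sum / #values
end

-- Ordinary least-squares fit of y = intercept + slope * x
local function linearFit(points)
    local n = #points
    local sumX, sumY, sumXY, sumXX = 0, 0, 0, 0
    for _, p in ipairs(points) do
        sumX = sumX + p.x
        sumY = sumY + p.y
        sumXY = sumXY + p.x * p.y
        sumXX = sumXX + p.x * p.x
    end
    local slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX)
    local intercept = (sumY - slope * sumX) / n
    return slope, intercept
end

-- Made-up byte totals for 10, 100 and 1000 sends, three runs each
local runs = {
    { count = 10, readings = { 249, 250, 251 } },
    { count = 100, readings = { 1599, 1600, 1601 } },
    { count = 1000, readings = { 15099, 15100, 15101 } },
}

local points = {}
for _, run in ipairs(runs) do
    points[#points + 1] = { x = run.count, y = mean(run.readings) }
end

local bytesPerValue, fixedOverhead = linearFit(points)
print(bytesPerValue, fixedOverhead)
```

With three points, a fit like this also shows whether one measurement is an outlier, since it will sit visibly off the line through the other two.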


There exist edge-cases and differing behaviors of data types such as CFrames and strings, and as such the Rojo binary model was additionally used as reference. Through this, it can be found that CFrames have 24 special cases where axis-aligned CFrames are encoded as only 13 bytes instead of the usual 21. These edge-cases can be found here:
https://dom.rojo.space/binary.html#cframe
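The count of 24 matches the number of axis-aligned rotations in 3D: rotation matrices whose entries are all -1, 0 or 1. A small sketch that enumerates them, using plain nested tables instead of Roblox CFrames so that it runs anywhere (`det3` and `countAxisAlignedRotations` are hypothetical helper names):

```lua
-- Determinant of a 3x3 matrix stored as nested tables
local function det3(m)
    return m[1][1] * (m[2][2] * m[3][3] - m[2][3] * m[3][2])
        - m[1][2] * (m[2][1] * m[3][3] - m[2][3] * m[3][1])
        + m[1][3] * (m[2][1] * m[3][2] - m[2][2] * m[3][1])
end

-- Each row of an axis-aligned matrix has one non-zero entry (+1 or -1),
-- and the non-zero columns form a permutation of {1, 2, 3}.
local permutations = {
    { 1, 2, 3 }, { 1, 3, 2 }, { 2, 1, 3 },
    { 2, 3, 1 }, { 3, 1, 2 }, { 3, 2, 1 },
}
local signs = { 1, -1 }

local function countAxisAlignedRotations()
    local count = 0
    for _, perm in ipairs(permutations) do
        for _, sx in ipairs(signs) do
            for _, sy in ipairs(signs) do
                for _, sz in ipairs(signs) do
                    local rowSigns = { sx, sy, sz }
                    local m = { { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 } }
                    for row = 1, 3 do
                        m[row][perm[row]] = rowSigns[row]
                    end
                    -- Determinant +1 keeps rotations and rejects reflections
                    if det3(m) == 1 then
                        count = count + 1
                    end
                end
            end
        end
    end
    return count
end

print(countAxisAlignedRotations())
```

Of the 48 signed permutation matrices, half are reflections, which leaves the 24 rotations.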

Installation

The resource’s source code can either be copied from the GitHub repository, or installed through Wally:

Contribution

You may look at how to contribute to the resource here:
https://github.com/Pyseph/RemotePacketSizeCounter/blob/main/CONTRIBUTING.md

blue_king789 · May 5, 2023, 5:32pm

I believe the way you’re calculating data size is incorrect, based on some research I did. At one point I also needed to measure data for a project, and I referred to a different set of rules that I believe are more accurate. Correct me if I am wrong; reference here.

An ASCII character in 8-bit ASCII encoding is 8 bits (1 byte), though it can fit in 7 bits.
An ISO-8859-1 character in ISO-8859-1 encoding is 8 bits (1 byte).
A Unicode character in UTF-8 encoding is between 8 bits (1 byte) and 32 bits (4 bytes).
A Unicode character in UTF-16 encoding is between 16 (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.
A Unicode character in UTF-32 encoding is always 32 bits (4 bytes).
An ASCII character in UTF-8 is 8 bits (1 byte), and in UTF-16 - 16 bits.
The additional (non-ASCII) characters in ISO-8859-1 (0xA0-0xFF) would take 16 bits in UTF-8 and UTF-16.

Now this is open for discussion. I’m not saying I am right or that you are wrong, and it looks like you put some work into this. I noticed you didn’t reference any sources, so I am curious how you came up with your calculations. Based on simple computer knowledge, a Vector3 value’s data size would be based on each character, not the Vector value itself, if that makes sense.

PysephDEV · May 5, 2023, 5:40pm

I’m not exactly sure how these references apply to Roblox, because Luau’s string length operator (#) returns the raw number of bytes in a string, not the number of characters. The two differ because, as you wrote, a character may take more than 1 byte.
Roblox supports the utf8 library, which can count characters rather than bytes and shows this difference:

local TestString = "😎" -- This is written as "\240\159\152\142" in numerical representation

print(#TestString) -- 4
print(utf8.len(TestString)) -- 1

As such, using the Lua length operator on strings will provide the correct amount of bytes which the string is packed as.
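The same comparison can be run across all four UTF-8 widths. This is a small sketch using only the standard `utf8` library, with code points picked as illustrative samples; `byteLength` is a hypothetical helper name.

```lua
-- One sample code point for each UTF-8 width (1 to 4 bytes)
local samples = {
    { name = "Latin capital A", codepoint = 0x41 },
    { name = "e with acute accent", codepoint = 0xE9 },
    { name = "euro sign", codepoint = 0x20AC },
    { name = "sunglasses emoji", codepoint = 0x1F60E },
}

-- Number of bytes the UTF-8 encoding of a code point occupies
local function byteLength(codepoint)
    return #utf8.char(codepoint)
end

for _, sample in ipairs(samples) do
    local encoded = utf8.char(sample.codepoint)
    -- #encoded counts bytes, utf8.len counts characters (always 1 here)
    print(sample.name, #encoded, utf8.len(encoded))
end

-- The four bytes of the emoji, matching the escapes used above
print(string.byte(utf8.char(0x1F60E), 1, -1))
```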

As elaborated in the post, I used a combination of analytical benchmarking as shown in the Calculation section of the post, along with consulting the Rojo binary model format, which is the documentation for Roblox’s official .rbxm and .rbxl binary files.

Regarding your last section of the post about Vector3s, I am unsure what you mean.

blue_king789 · May 5, 2023, 5:45pm

If we are discussing data size, an emoji written as "\240\159\152\142" would be 16 bytes. I understand what you are saying, but I just don’t understand how you could get anything else when talking about raw data. Yes, the emoji is one character, but when you see the emoji, you don’t see the formatting behind the scenes. In my experience, measuring data per character was pretty accurate and worked fine for me. I am curious to learn more about what you are talking about…

Edit: I believe we may also be discussing two different things. The way networks handle and packetize data is very different because of the added protocols and formatting, but as far as the data itself goes, I don’t see how it could be any different from what I explained.

PysephDEV · May 5, 2023, 5:50pm

To clarify, it would be 4 bytes because Lua converts each decimal escape such as "\240" into the single character (byte) with that numeric code. So, "\240\159\152\142" in reality is:

string.char(240) .. string.char(159) .. string.char(152) .. string.char(142)

Which is 4 bytes.

The string length operator returns the ‘raw’ byte size of the string, which is what matters when calculating how many bytes a string will take up when being sent through remotes.

Counting the emoji as a single character is correct when e.g. printing it out in Studio or displaying it on a TextLabel, but the ‘raw size’ of the emoji is the 4 bytes that describe it, rather than 1.

I’m not exactly sure what you mean - yes, Roblox may or may not have some internal compression applied to strings like it does with axis-aligned CFrames, but overall the calculation used is correct. This can be verified by sending strings of varying sizes through remotes and measuring how much the Receive rate in the Network tab increases, and then comparing this increase with this resource’s theoretical increase.

blue_king789 · May 5, 2023, 5:56pm

Interesting to note: you are correct about the emoji byte count, I just looked that up. Very interesting, but why do you have to do this:

Data doesn’t change; you can read the data and translate it to the correct byte size. I don’t understand why you need to use these three increments and then get the scale coefficient. Assuming you’re sending the same thing, for example Vector3.new(100, 1, 100), it should be the same size of data every single time, correct? In short, I am wondering why you are doing this.

PysephDEV · May 5, 2023, 5:58pm

To clarify, I am sending the exact same value 10, 100, and 1000 times through a remote (e.g. calling FireServer a different number of times). This is so that if there is a discrepancy in one of the measurements, the margin of error is hopefully eliminated with the help of the other two measurements.

blue_king789 · May 5, 2023, 6:02pm

Wait wait wait… you’re calling FireServer() 1000 times… To my knowledge, that is very bad for performance. In my opinion this still seems very unnecessary when you can literally calculate the data size yourself without even firing an event to verify it. Or was this just for your testing purposes?

PysephDEV · May 5, 2023, 6:03pm

It was performed only to measure the data sizes. It is of course not recommended in any production environment.

PysephDEV · June 9, 2023, 9:10pm

2.1.0

added support for nil datatype (byte size of 0)

PysephDEV · July 19, 2023, 3:12pm

2.2.0

Added two required fields in CounterData for the GetPacketSize function: RunContext and RemoteType.
RunContext must be either "Client" or "Server", and changes the remote byte size overhead based on this due to the additional user info.
RemoteType must be either "RemoteEvent" or "RemoteFunction", and also changes byte size overhead due to RemoteFunctions needing more bytes for packet order processing.
CounterData.IgnoreRemoteOffset has been removed; to get the byte size of raw data, use GetDataByteSize instead.
Renamed RemoteOverhead field to BaseRemoteOverhead.
Added fields for RemoteFunctionOverhead and ClientToServerOverhead.
Update Wally package version to v2.2.0.

PysephDEV · July 20, 2023, 9:55am

v2.3.0

Change calculation method for string byte size to support VLQ length format (fixes #7)
Change calculation method for tables to reflect array & mixed-table behavior in remotes, and add support for VLQ length format (fixes #8)
Append type overhead byte in PacketSizeCounter.GetDataByteSize to better reflect a data type’s true size over remotes
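For readers unfamiliar with VLQ (variable-length quantity) length prefixes, here is a rough sketch of how such a prefix grows with string length. It assumes the common encoding of 7 payload bits per byte. The exact layout Roblox uses is undocumented, and `vlqByteCount` and `estimateStringSize` (including its single type byte) are illustrative guesses rather than the module's actual code.

```lua
-- Bytes needed to store a non-negative integer as a VLQ,
-- assuming 7 payload bits per byte
local function vlqByteCount(n)
    local count = 1
    while n >= 128 do
        n = math.floor(n / 128)
        count = count + 1
    end
    return count
end

-- Illustrative estimate: type byte + VLQ length prefix + raw string bytes
local function estimateStringSize(str)
    local typeByte = 1 -- assumed type overhead byte
    return typeByte + vlqByteCount(#str) + #str
end

print(estimateStringSize(string.rep("a", 127))) -- length fits in 1 prefix byte
print(estimateStringSize(string.rep("a", 128))) -- length needs 2 prefix bytes
```

Under these assumptions the prefix only grows at lengths of 128, 16384 and so on, so most strings sent over remotes would carry a one- or two-byte length.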

PysephDEV · July 20, 2023, 4:55pm

v2.3.1

Update non-axis-aligned CFrames to use 6 bytes of rotational data instead of 8 (fixes #9)

PysephDEV · November 30, 2023, 2:54pm

Update

With the release of UnreliableRemoteEvents, which have a payload limit of 900 bytes, this module has gained a pretty good use case: making sure payloads stay under the maximum size! I’ll keep updating this module for the foreseeable future to cover new datatypes and possible inconsistencies in calculating datatype sizes. If you find any bugs, please report them!

PysephDEV · December 2, 2023, 3:56pm

v2.4.0

Add rudimentary support for the buffer datatype. Note: buffers compress data after a size threshold, and as such may not be accurate past a certain size!

PysephDEV · December 2, 2023, 4:04pm

v2.4.1

Mention the module’s name in the warning it emits when passed an unsupported datatype. This makes for easier debugging and less confusion, as there’s otherwise no stack trace pointing towards it.

PysephDEV · January 16, 2024, 11:35am

Small Update

Fixed a broken link. I changed my GitHub username from PysephWasntAvailable to just Pyseph (it became available), which broke links using my old name. Let me know if there are any other broken links!