How to compress or send less data through a remote event?

Pretty interesting knowledge.

Now the 10 bits was more of an example, because the largest 10-bit number is 1023 and I don’t think that’s big or precise enough for bullets or raycast data in a game.

I will most likely be using relative positions or something, since local space numbers tend to be smaller than world space numbers.
Not sure what would happen if I gave bullets/raycasts in a game let’s say… 1 decimal of precision.
And some weapons might shoot really far, possibly 3000+ studs.

Now let’s say I really don’t want to use too much unnecessary bandwidth in remote events so I do a little bit of light-weight compression.

Should I try to combine 4 x 16-bit numbers into a single 64-bit number or should I put 2 x 16-bit numbers into a 32-bit number?
I’m a bit unsure of what goes through Roblox’s remote events.
Maybe Roblox internally already compresses some data before it is sent, but I don’t know and can’t find much about it online.

I do know for a fact that most weapons will likely send a start and end position through a RE + whatever might get hit on the way.

I suppose I could also try to combine 2 Vector3s into a single Vector3, Vector3s use 32-bits for X, Y and Z, right?
But 16-bit numbers also can’t get very large. An unsigned 16-bit value tops out at 65,535, so with 2 decimals of precision the map can’t be larger than about 655 studs. With only 1 decimal of precision the limit is still only about 6,500 studs, and 1/10th of a stud might become really noticeable.
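To make the range versus precision trade-off concrete, here is a minimal sketch of fixed-point quantization into an unsigned 16-bit range. The function names and the `step` parameter are illustrative assumptions, not anything Roblox provides.

```lua
-- Sketch: store a coordinate as an unsigned 16-bit integer (0..65535).
-- `step` is the smallest distance we want to represent, in studs (assumed).
local MAX_U16 = 65535

local function quantize(value, step)
	-- Round to the nearest step, then clamp so the result always fits in 16 bits.
	local q = math.floor(value / step + 0.5)
	return math.max(0, math.min(MAX_U16, q))
end

local function dequantize(q, step)
	return q * step
end

-- Largest coordinate that fits for a given step:
local rangeOneDecimal = MAX_U16 * 0.1    -- about 6553.5 studs
local rangeTwoDecimals = MAX_U16 * 0.01  -- about 655.35 studs

local q = quantize(1234.56, 0.1)         -- 12346
local restored = dequantize(q, 0.1)      -- close to 1234.6, off by at most half a step
```

The round trip loses at most half a step, so the step size is the thing to tune against how far weapons can shoot.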

Well, first I would like to say that nothing in Roblox is truly 16, 32 or 64 bits.
But like I said, you can always find some way to convert a single float to a string with 4 characters and use that.
I do not think it is a good idea to combine 2 Vector3s into 1 Vector3.

Also, I do not think data is compressed by remote events. From what I know it uses JSON, which is not really meant to compress data and usually makes data bigger.

Also, in my code you can just add more 1’s after the 0b11_1111_1111,
and for the 0x3FF do the exact same thing but in hexadecimal.

But I think no map should be 2^52 studs, because at the end of the map all the precision is basically gone.
At most it should be something like 2^14 (about 16k) studs.

So probably something like 22 bits, and 16 bits for the height of the map, because nothing really goes beyond 1024 studs plus 1/64-stud or decimal precision.

So maybe my compression method is still valid, just a bit modified.
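One way to arrive at bit counts like these is to count how many distinct steps a range contains. The sketch below is an assumption about how such a budget could be worked out; `bitsNeeded` is a hypothetical helper, and the step sizes are examples.

```lua
-- Sketch: how many bits are needed to cover `range` studs at `step` precision?
local function bitsNeeded(range, step)
	local levels = range / step
	local bits = 0
	-- Find the smallest power of two that covers every level.
	while 2 ^ bits < levels do
		bits = bits + 1
	end
	return bits
end

local heightBits = bitsNeeded(1024, 1 / 64)      -- 16
local coarseBits = bitsNeeded(16384, 1 / 64)     -- 20
local fineBits = bitsNeeded(16384, 1 / 256)      -- 22

-- Two 22-bit axes plus a 16-bit height is 60 bits. A double only holds
-- integers exactly up to 2^53, so all three would not fit in one number.
local totalBits = fineBits * 2 + heightBits      -- 60
```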


I see, thank you!
I shall look into and explore some of these methods and see which one will eventually be the best choice.

If you have any tips, such as for compressing big arrays (a shotgun that fires 20 pellets per shot, for example) or designing systems in a way where less data is required for the server to know what a player fired at, I’d love to hear them!
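One design that is sometimes used for the shotgun case is to send a single seed instead of 20 pellet directions, and let both sides regenerate the same spread. The sketch below is only an illustration under that assumption: the generator is a hand-written Park-Miller generator in plain Lua, and `pelletOffsets`, the seed value, and the spread angle are all made up for the example.

```lua
-- Park-Miller "minimal standard" generator. The multiplier is small enough
-- that state * 16807 stays below 2^53, so the arithmetic is exact in doubles.
local function makeRng(seed)
	local state = seed % 2147483647
	if state == 0 then
		state = 1
	end
	return function()
		state = (state * 16807) % 2147483647
		return state / 2147483647 -- a value strictly between 0 and 1
	end
end

-- Builds the same list of angular offsets for the same seed.
local function pelletOffsets(seed, pelletCount, maxSpread)
	local rng = makeRng(seed)
	local offsets = {}
	for i = 1, pelletCount do
		local yaw = (rng() * 2 - 1) * maxSpread
		local pitch = (rng() * 2 - 1) * maxSpread
		offsets[i] = { yaw = yaw, pitch = pitch }
	end
	return offsets
end

-- The client would send only the seed (plus origin and aim direction);
-- the server rebuilds the identical spread from it.
local clientPellets = pelletOffsets(12345, 20, 0.05)
local serverPellets = pelletOffsets(12345, 20, 0.05)
```

The server still has to validate the origin and direction it receives, since the seed only removes the per-pellet data.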

I see that there is a lot of confusion here about integers and floats.

Integer

First of all, unlike decimal numbers, integers are stored as radix 2 (binary) numbers, because each position in a binary integer can only be 0 or 1. For decimal numbers (radix 10), each position in a number can be 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. So the number 10^6 is equivalent to 1,000,000. For a binary integer, 2^16 is 65,536 and 2^32 is 4,294,967,296. These are the counts of distinct values for unsigned numbers. For signed numbers, the maximum is 2^(x-1) - 1 and the minimum is -2^(x-1). So for a signed 32-bit integer, the value range is -2,147,483,648 to 2,147,483,647. The maximum positive value for any unsigned integer is 2^x - 1 because you still have to represent 0.

Note that the exponent represents the hard limit as to the maximum values that an integer can hold. Unpredictable results can occur if the limit is exceeded.
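The limits follow directly from the bit count. A small sketch in plain Lua, with helper names chosen only for illustration:

```lua
-- Largest value of an unsigned integer with `bits` bits: 2^bits - 1.
local function unsignedMax(bits)
	return 2 ^ bits - 1
end

-- Range of a signed (two's complement) integer: -2^(bits-1) .. 2^(bits-1) - 1.
local function signedRange(bits)
	return -(2 ^ (bits - 1)), 2 ^ (bits - 1) - 1
end

local u10 = unsignedMax(10)            -- 1023
local u16 = unsignedMax(16)            -- 65535
local u32 = unsignedMax(32)            -- 4294967295
local s32Min, s32Max = signedRange(32) -- -2147483648, 2147483647
```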

As for the OP’s question, you can split and combine integers if you can guarantee that they will fit within 8, 16, or 32 bits. Roblox does not support 64-bit integers at this time. The way to do this is as follows:

-- Splits a 32-bit integer into two 16-bit integers.
local function split32to16(x)
	local low = bit32.band(0x0000FFFF, x)
	local high = bit32.band(0x0000FFFF, bit32.rshift(x, 16))
	return low, high
end

-- Combines two 16-bit integers into a 32-bit integer.
local function comb16to32(low, high)
	return bit32.bor(bit32.band(0x0000FFFF, low), bit32.band(0xFFFF0000, bit32.lshift(high, 16)))
end

-- Splits a 16-bit integer into two 8-bit integers.
local function split16to8(x)
	local low = bit32.band(0x000000FF, x)
	local high = bit32.band(0x000000FF, bit32.rshift(x, 8))
	return low, high
end

-- Combines two 8-bit integers into a single 16-bit integer.
local function comb8to16(low, high)
	return bit32.bor(bit32.band(0x000000FF, low), bit32.band(0x0000FF00, bit32.lshift(high, 8)))
end

Disclaimer: There are a few things that you need to keep in mind when using these.

  • There might be some errors to this since I did this from memory. I wrote these routines and quite a few others some time ago in C/C++.
  • If you try to combine numbers greater than what it’s looking for, those extra bits will be masked off, so you may get a number you weren’t expecting.
  • No error checking is done.
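The masking caveat can be seen with a small arithmetic-only sketch. It mirrors the 16-bit combine and split above using % and math.floor instead of bit32, so it also runs outside Roblox; the function names are illustrative.

```lua
-- `% 65536` keeps only the lowest 16 bits, like bit32.band(x, 0xFFFF).
local function combine16(low, high)
	return (low % 65536) + (high % 65536) * 65536
end

local function split32(x)
	return x % 65536, math.floor(x / 65536)
end

local packed = combine16(0x1234, 0xABCD) -- 0xABCD1234
local low, high = split32(packed)        -- 0x1234, 0xABCD

-- 70000 does not fit in 16 bits, so the extra bit is silently dropped:
local lossy = combine16(70000, 0)        -- 70000 - 65536 = 4464
```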

Another thing to consider is endianness, or byte order. Although Lua insulates us from this, in other languages it can be a concern when dealing with CPUs other than the little endian Intel/AMD/Cyrix family. ARM CPUs (most, if not all, mobile devices) can be set to either byte order: 1234 (big endian) or 4321 (little endian). Other CPUs such as MIPS, Sparc, and IBM’s Z-Processor are big endian devices. Furthermore, byte order on the network is also big endian. Endianness has to do with the order in which the bytes of a multi-byte integer are stored in memory with respect to increasing memory addresses. For instance, the 32-bit number 0x12345678 is stored as 0x12, 0x34, 0x56, 0x78 in memory on big endian machines. On little endian machines, it’s backwards: 0x78, 0x56, 0x34, 0x12. So make sure you get your byte order right.
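The byte orders described above can be reproduced with plain arithmetic. This is a minimal sketch with made-up helper names; it only shows the ordering and is not tied to any Roblox API.

```lua
-- Splits a 32-bit value into four bytes, most significant byte first.
local function toBytesBigEndian(x)
	local b1 = math.floor(x / 16777216) % 256 -- bits 24..31
	local b2 = math.floor(x / 65536) % 256    -- bits 16..23
	local b3 = math.floor(x / 256) % 256      -- bits 8..15
	local b4 = x % 256                        -- bits 0..7
	return b1, b2, b3, b4
end

-- Same bytes, least significant byte first.
local function toBytesLittleEndian(x)
	local b1, b2, b3, b4 = toBytesBigEndian(x)
	return b4, b3, b2, b1
end

local be1, be2, be3, be4 = toBytesBigEndian(0x12345678)    -- 0x12, 0x34, 0x56, 0x78
local le1, le2, le3, le4 = toBytesLittleEndian(0x12345678) -- 0x78, 0x56, 0x34, 0x12
```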

Floating Point

Now the floating point specification is the IEEE-754 standard. It specifies the layout of floating point numbers in 16, 32, 64, 128, and 256 bit formats, also known as precision (someone did mention that). For all the formats, the basic layout is the same regardless of the width of the fields.

  1. The sign bit. When it’s a 1, the number is negative.
  2. The exponent. The exponent is encoded using offsets, so a 0 exponent is not 0 but another value. So for a double, it’s 0b01111111111 (0x3FF). 0 and 0x7FF have special meanings which are mentioned in the double document on Wikipedia.
  3. The mantissa or fraction. The leading 1 is always assumed, but the first bit of the mantissa is 1/2, the second is 1/4, the third is 1/8, and on down the line for however many bits the mantissa is.
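To see how the three fields fit together, the sketch below rebuilds the value of a normal double from its sign, biased exponent, and fraction. It works on the field values as ordinary numbers and does not read the bits of an existing float; `decodeDouble` is a hypothetical name.

```lua
-- sign: 0 or 1
-- exponent: the 11-bit biased field (1..2046 for normal numbers, bias 1023)
-- fraction: the 52-bit fraction field as an integer
local function decodeDouble(sign, exponent, fraction)
	local mantissa = 1 + fraction / 2 ^ 52 -- the leading 1 is implied
	local value = mantissa * 2 ^ (exponent - 1023)
	if sign == 1 then
		value = -value
	end
	return value
end

local one = decodeDouble(0, 0x3FF, 0)                     -- 1
local three = decodeDouble(0, 0x400, 2 ^ 51)              -- 1.5 * 2 = 3
local negHalf = decodeDouble(1, 0x3FE, 0)                 -- -0.5
local oneAndThreeQuarters = decodeDouble(0, 0x3FF, 2 ^ 51 + 2 ^ 50) -- 1 + 1/2 + 1/4
```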

A word of warning though. Lua does not support direct manipulation of floating point types at the bit level. I have written code in C/C++ that does do this for a big number library (numbers that are so big they do not fit into a native CPU register). It can get quite complicated depending on what you are trying to do.

Another way you can shoot yourself in the foot with floats is comparison. It is not recommended to directly compare two floats using == or ~= (!= in C/C++). In fact, C/C++ compilers can warn you of this. The best way to handle this is as follows:

local x = 0.33298575
local y = 0.33298243

if math.abs(x - y) < 0.0000000001 then
	-- Do something
else
	-- Do something else
end
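The same idea can be wrapped in a helper. This is a minimal sketch; `nearlyEqual` and its default tolerance are assumptions, and an absolute tolerance like this suits values of modest size, while very large values need a tolerance scaled to their magnitude.

```lua
-- Returns true when a and b differ by no more than epsilon.
local function nearlyEqual(a, b, epsilon)
	epsilon = epsilon or 1e-9
	return math.abs(a - b) <= epsilon
end

local sum = 0.1 + 0.2
local exactMatch = (sum == 0.3)        -- false: sum is 0.30000000000000004
local closeMatch = nearlyEqual(sum, 0.3) -- true
```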

Hopefully this helps people.


I might have foreshadowed it, but I still forgot about integer overflow. Like I said, when a signed 10-bit integer goes over 511 it wraps around to -512, which may be something people could exploit to shoot someone across the map. A way to defend against this is probably something like checking whether the value goes over, or using math.clamp.
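A sketch of both defenses in plain Lua, assuming a signed 10-bit field (range -512 to 511). The local `clamp` stands in for Roblox’s math.clamp so the example runs anywhere; the other names are illustrative.

```lua
local MIN_S10, MAX_S10 = -512, 511

local function clamp(value, minValue, maxValue)
	return math.max(minValue, math.min(maxValue, value))
end

-- Sender side: round and clamp before packing, so the value can never wrap.
local function toSigned10(value)
	return clamp(math.floor(value + 0.5), MIN_S10, MAX_S10)
end

-- Receiver side: reject anything that is not a whole number within range.
local function isValidSigned10(value)
	return value == math.floor(value) and value >= MIN_S10 and value <= MAX_S10
end

local tooFar = toSigned10(600)      -- 511 instead of wrapping negative
local tooLow = toSigned10(-9999)    -- -512
local ok = isValidSigned10(511)     -- true
local bad = isValidSigned10(512)    -- false
```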


@apictrain0 @C_Corpze

It’s not necessarily 64 bits. It usually is in the normal course of things, but I’ve run into situations where I had numbers like 2^1024 properly represented with complete precision. Variables in Lua and Roblox are variant types, so what I think it’s doing is setting the data type of the variable on the fly to meet the needs of the data. A variant type like in PHP and JavaScript is something like the following.

  1. One or two bytes to denote the type.
  2. One to four bytes to denote the size.
  3. The data.

The value of field 2 depends on what the value of 1 is. Although on the command line, when I do print(2^1022), it prints 4.49423283715579e+307. But if I copy an inf value from a constraint in the workspace, I get this:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

That number is 2^1024 as I confirmed it on a big number calculator. I have an open bug report about that because the constant values are missing from the documentation of the math library.

And math.huge = 2^1024 = inf


Thanks for correcting me, but I noticed that the numbers document says there are 3 types of numbers, and it is missing a type that could reach 2^1024.

This 2^1024 type of number doesn’t appear anywhere in the doc either.

The doc says a number is a double.
I tried using type() and typeof() but both return number.
I had always assumed, even from the doc, that int and int64 would convert to a double at runtime, but I appear to be wrong since a 2^1024 number could exist.

But so far, full representation seems to only be in the workspace. So something about how it’s represented in the workspace is different than how it’s represented in the code. You can do constraint.force = math.huge and it will show as inf in the constraint when you view it in explorer. If you happen to copy the inf value (which is what I did) you get that big number in the previous post.

To fully represent 1024 bits requires 128 bytes, which is in the big number arena (and that is an old standard for RSA crypto back in the early 1990’s). So what’s represented in the constraint is for the physics engine to use and it may require the full 128 bytes. Either way, it’s not the same datatype as what’s used in the scripts.

I think math.huge is the full constant and is a big number datatype because if you type this in the console, you get this:

  19:18:48.517  > print(2^1024)  -  Studio
  19:18:48.518  inf  -  Edit
  19:19:25.834  > print(math.huge)  -  Studio
  19:19:25.835  inf  -  Edit
  19:19:35.788  > print(math.huge == 2^1024)  -  Studio
  19:19:35.789  true  -  Edit
  19:19:39.752  > print(math.huge == 2^1023)  -  Studio
  19:19:39.752  false  -  Edit

It’s definitely a unique situation.

EDIT: On a hunch, I did this:

print(179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368)  -  Studio
  19:21:52.387  1.7976931348623157e+308  -  Edit

Yes, I told it to print that big number and it came back with what appears to be the maximum value that you can have for a double before it overflows to infinity.
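For reference, the long number above appears to match the largest finite double, (2 - 2^-52) * 2^1023, rather than 2^1024 itself, which overflows to inf. A minimal check in plain Lua, assuming numbers are IEEE-754 doubles as they are in Luau:

```lua
-- Largest finite double: all 52 fraction bits set, largest normal exponent.
local maxDouble = (2 - 2 ^ -52) * 2 ^ 1023

local isMax = (maxDouble == 1.7976931348623157e308) -- same value as the printed result
local isFinite = (maxDouble < math.huge)            -- still below infinity
local overflowed = (2 ^ 1024 == math.huge)          -- 2^1024 is not representable
local doubled = (maxDouble * 2 == math.huge)        -- anything larger becomes inf
```

This is consistent with the console output: math.huge == 2^1024 is true only because 2^1024 has already overflowed to inf.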


Actually, afaik a number takes up 9 bytes, i.e. 72 bits. And yes, as @apictrain0 pointed out, they aren’t stored as integers but as IEEE signed doubles, which should technically make them 8 bytes. However, in other posts it’s stated as 9 bytes, so I guess we have to take their word.

For additional info, check out the wiki page.

Now coming back to your question, from the post I linked we know that a Vector3 takes up 13 bytes but if we were to send the components as numbers, it’d take up 9 * 3 bytes which is 27 bytes so that isn’t an option. However if you looked at the post, you will notice that a string of length 1 is 1 + 2 = 3 bytes which is pretty good. So what if we encode the components of Vector3 to a character?
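A minimal sketch of that idea, using two characters (16 bits) per component so a position becomes a 6-character string. It uses plain x, y, z numbers instead of a Vector3 so it runs anywhere; `STEP`, `OFFSET`, and the function names are assumptions for illustration, and it assumes the transport preserves arbitrary byte values.

```lua
local STEP = 0.1      -- precision in studs (assumed)
local OFFSET = 3276.8 -- shifts -3276.8..3276.7 into 0..65535 (assumed map bounds)

-- Quantizes one coordinate to 16 bits and stores it as two characters.
local function encodeComponent(value)
	local q = math.floor((value + OFFSET) / STEP + 0.5)
	q = math.max(0, math.min(65535, q))
	return string.char(math.floor(q / 256), q % 256) -- high byte, low byte
end

-- Reads two characters starting at `index` and restores the coordinate.
local function decodeComponent(str, index)
	local high, low = string.byte(str, index, index + 1)
	return (high * 256 + low) * STEP - OFFSET
end

local function encodePosition(x, y, z)
	return encodeComponent(x) .. encodeComponent(y) .. encodeComponent(z)
end

local function decodePosition(str)
	return decodeComponent(str, 1), decodeComponent(str, 3), decodeComponent(str, 5)
end

local packed = encodePosition(10, 2850, -565.3) -- 6 characters
local x, y, z = decodePosition(packed)          -- each within half a step of the input
```

Going by the sizes quoted in this thread, a single 6-character string is not much smaller than a Vector3, so the saving would mostly come from concatenating many positions into one string and paying the string overhead once.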


So I should basically just encode vectors into strings to save on bandwidth if I for some reason had to replicate a ridiculous amount of vectors in an array?

To be honest I haven’t done a lot of number <> character conversions before.

I know some string manipulation like filtering, formatting, splitting, etc for admin command systems and whatnot.
But my knowledge on using hex code or turning big numbers into small text is limited.

But I’d love to learn more about it because I know it’s a very powerful tool and strings in Lua seem pretty optimized so it might actually be a very viable way of compressing vectors and big numbers.

Knowing how to write the code is cool but I’d love knowing how it works as well so I’m not just copying over things like a parrot without knowing the meaning or logic behind it.

Try this

local bigInteger = v3.X + v3.Y + v3.Z

The problem with this method is that you can’t separate them.
This just adds numbers together, it doesn’t combine them in the way I intended where you can split them later without losing information.

Can you explain a little bit more about separating them?

Let’s say I have the numbers 10, 2850 and 565.
I want to combine them into a single value so it’s compressed.

Once it reaches the other side (the server), I want to turn that single value back into the numbers 10, 2850 and 565 without losing important information.
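A minimal sketch of a reversible combine for this example, assuming each value is a whole number from 0 to 4095 (12 bits). Each value gets its own 12-bit slot, so the three together use 36 bits, which is well within the 53 bits a double can hold exactly. The names are illustrative.

```lua
local FIELD = 4096 -- 2^12: each value must be a whole number in 0..4095

-- Places each value in its own 12-bit slot of a single number.
local function pack3(a, b, c)
	return a + b * FIELD + c * FIELD * FIELD
end

-- Reverses pack3 by peeling the slots back off.
local function unpack3(packed)
	local a = packed % FIELD
	local b = math.floor(packed / FIELD) % FIELD
	local c = math.floor(packed / (FIELD * FIELD))
	return a, b, c
end

local packed = pack3(10, 2850, 565) -- 9490800650
local a, b, c = unpack3(packed)     -- 10, 2850, 565
```

Simply adding the values loses information because the slots overlap; multiplying by the field size first is what keeps them separable.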

I plan to possibly also use this method for compressing data in datastores if it gets really big for some reason.

If you are looking to compress the data, I would suggest looking at either LZW or PK compression methods. Both have their strengths and weaknesses. I’m sure there are Lua implementations out there you can use.

If you can read code, then 7-zip source code is available for download.

I’ve actually looked into that but it had me wondering if this would actually result in smaller data.

Because to compress data, you also need a dictionary of keys and values to decompress it later but the dictionary of course also takes up space which might result in the compressed data becoming larger if it already is small.

An array of vectors and numbers already is relatively small.

The problem is that if we want to “zip” it, it might compress the data itself, but since we now also have to send a dictionary to the server for decompression, the dictionary might use more data, making the zipping inefficient and just slower.

Like I said, strengths and weaknesses. In some cases, it’s probably best to just leave things alone.

Updated title to be a bit more relevant and general because I might have to seek different solutions.

I did actually come across a library that uses remote events in a more optimized way but I don’t want to rely on 3rd party libraries since I might write my own library instead to use for multiple projects.

Trying to learn techniques instead of just copying what someone else already did.


If you’re sending a large amount of data at one time, then compression makes sense. The Huffman table is 256 integers which contain the counts for each byte value that appears in the data stream. You just send the table along with the compressed data and the receiving end recreates the tree. (LZW itself does not need a table to be sent; the receiver rebuilds the dictionary while decoding.)

GZIP style algorithms work by creating a dictionary. However, to keep the dictionary small and dynamic, a new dictionary is created for every 8KB to 16KB of data or so. This allows the algorithm to adapt to content changes in the data stream. For example, take an ELF executable which has a multitude of different types of data. The binary data won’t compress very well. However, text data does compress well.

With data compression, the more data you have, the more it makes sense to use it. My understanding is that Roblox converts data to JSON before sending it on the wire. They might even compress and encrypt it. It would make sense to do so. Have you studied any materials relating to information theory and information content? I have some books on data compression. They are outdated, but they cover the basics of what you need to know.


Roblox converting stuff to JSON before sending it through a remote did leave me with questions.
I saw a post earlier that showed how much data every data type in Roblox used.

If I recall it was something like…
A number was 9 bytes.
A string 2 bytes + its length.

A Vector3 was roughly 12 or 14 bytes which surprised me because that implies that sending a vector is cheaper than 3 separate numbers.
As 3 x 9 would be 27 bytes, which is way more than a Vector3 uses apparently.

What ESPECIALLY baffled me is that sending a boolean, a value that is normally only 1 bit and can only be 1 or 0, takes up…

4 bytes

Yes a boolean apparently is 4 bytes, which left me wondering if the JSON theory is true.
because the word “true” on itself already is 4 characters long which would make up for the 4 bytes.

But “false” is actually 5 characters, and if I recall, a boolean uses 4 bytes regardless of its value, which is really weird because if Roblox truly does put everything in a JSON table then this should be 5 bytes, right?

I did stumble upon a public library/resource called BridgeNet2.
Apparently this library is really good at optimizing networking, and I’m just trying to figure out how it manages to use less data.

I’ve looked through its code on GitHub but can’t find which script or function does the “compression” or “optimizing” of data.
I don’t want to rely on 3rd party libraries because I will likely want to develop my own at some point, and not straight up copy what someone else wrote.
I seek to learn how things work so I can eventually pass on the knowledge.

Blank remote call: ~9 bytes

string (len 0): 2 bytes
string (len 1): 4 bytes
string (len 2): 8 bytes
string (len 3): 9 bytes
string (len 4): 10 bytes
string (len 5): 11 bytes
string (len 6): 12 bytes
string (len 8): 14 bytes
string (len 16): 22 bytes
string (len 32): 36 bytes

boolean: 2 bytes
number: 9 bytes

table (empty): 2 bytes
table (array with 4 numbers): 38 bytes

EnumItem: 4 bytes
Vector3: 13 bytes
CFrame (axis aligned): 14 bytes
CFrame (random rotation): 20 bytes

I did find this post, by @Tomarty.

Oh, here it apparently says a boolean is just 2 bytes, huh? Maybe I got 2 sources mixed up.
But that is still a lot of bytes for something that is basically only on or off.

A CFrame apparently is 20 bytes which also absolutely blows my mind because CFrames hold a position which is 3 values AND a rotation which in Euler form (I think) also has 3 or 4 values depending on if it uses quaternions.

6 numbers should be 6 x 9, right? funny sum.
Wouldn’t that be 54 bytes? That’s more bytes than a string with 32 characters.
Howwwwwww?

Does this imply I could compress strings by putting characters inside CFrame components?
The more I learn, the less I seem to know about the subject.
Seems like I might not know as much as I initially knew.
The Roblox engine surely has its mysteries under the hood.
