Network Optimization (2021) - Preventing High Latency & Reducing Lag

Hello developers! :smile:
You may have read one of my old articles on this topic with a similar title:
Network Optimization - Preventing High Latency & Reducing Lag - Resources / Community Tutorials - DevForum | Roblox

This is my rebooted version of that now fairly out-of-date article. Over time, the way Roblox transmits remotes has changed quite a bit, and it now works wildly differently than it used to. Because of this, most of the content in my old article is no longer accurate. This post is meant to carry the latest information on network optimization.

This article is based on my own testing and prior knowledge of Roblox’s data transfer. These are things I have a lot of experience with and principles I follow all of the time, but not everything here will be perfectly accurate, and behavior can change a lot as time goes on. Some of my assumptions about how Roblox’s data queue works may be incorrect; however, since this is based on testing, the “what to do and why” information should still be pretty accurate.

Remote data transfer

Remotes transmit requests at variable rates. Remote data is transmitted up to 60 times per second; however, most mass remote requests are coalesced into one big request. The throughput limit of remotes depends entirely on the network speed of the server and client and is highly variable, but it still respects Roblox’s global throughput limit per player.

IMPORTANT Unit Discrepancy

In this article I use these units:

  • kbi - kilobits (This “should” be kb)
  • kb - kilobytes (This “should” be kB)

The limit for this data was previously listed in my article and on the devhub as 50 kb/s (50 kilobits/s) for any player. That is one eighth of 50 kB/s (50 kilobytes/s), which is the actual data limit as shown through testing. This confusion came from the fact that kB is often improperly written as kb. Additionally, Roblox’s network debugger lists “KB/s”, which is again unclear and technically improper, since capitalization matters. Roblox’s network tab actually measures kilobytes per second (kB).
[image: Roblox’s network debugger listing “KB/s”]

The proper units would be kb vs. kB, but unfortunately this convention is not well respected by anyone (I personally disagree with these units being written this way and don’t respect it myself). In this article, for clarity, I will use kbi to refer to kilobits and kb to refer to kilobytes, as this is how I prefer to write my units.

How much data can Roblox transfer?

Roblox has a soft transfer limit of 50 kb/s to or from each player’s client; that’s 1024 * 50 bytes per second. A Roblox server has no global limit on how much data it can send or receive, but you are prevented from sending or receiving more than this limit for any given player. It’s important that you stay well below this data limit; otherwise, you will slow down (or even halt!) all replication throughput. Since data is effectively queued, the connection to the client is not lost, so your ping can climb well above the 30 second timeout, even minutes behind. Sending too close to this limit is a no-no, since the limit encompasses all data throughput, not just remotes.

What can go wrong if you send too much data?

Here is an example of what can go wrong. Let’s say you send 60 kb/s via remotes constantly. Roblox will coalesce many of your requests into one bigger one, which usually allows Roblox to send its own data for a few frames before some of your data is sent.

In this case Roblox is likely only sending 50 kb of your data every few seconds, meaning most of your data is going into the queue. Most of the queue is then taken up by your data, and, due to how Roblox prioritizes packets in the queue, the queue can eventually get big enough that Roblox’s own data isn’t really prioritized anymore, since there is a lot of old remote data from several seconds ago that Roblox sees as data that needs to be sent sooner.

This can result in measured pings alone reaching over 100k ms after only a few minutes. But that’s only measured ping; it doesn’t include any information about how data is prioritized. That means that even though measured ping is 100k ms, which is over a minute and a half, the effective ping for things like replication could be ten, twenty, or thirty minutes, making your game completely unplayable in only a few minutes.

How can you manage your data throughput better?

I would recommend first giving yourself a goal limit. I would say 25 kb/s is a reasonable hard limit. This reserves half of Roblox’s data throughput for yourself and half for Roblox & replication. It’s okay if you occasionally go above your limit in rare cases, or even above the 50 kb/s limit, but you never want to stay above your limit for more than a few seconds. Going over the 50 kb/s limit will gradually increase ping as more and more of the data throughput is allocated to your remotes and more and more replication data queues up.

You should take into account not only remote requests but also property changes. If you change more than a few properties at once or modify more than a few instances at once, you should unparent the target instance(s) first with as few .Parent sets as possible, set your properties, and then reparent them. This turns what could be hundreds or thousands of replicated property changes into a few property changes (or rather, ancestry changes) and a few instances being sent.
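As a minimal sketch of that pattern, here is a hypothetical `recolorModel` helper that batches many property changes behind two ancestry changes (the function name and the choice of the Color property are mine, not from the article):

```lua
-- Hypothetical example: recolor every part in a model with only two
-- replicated ancestry changes instead of one replicated change per part.
local function recolorModel(model, color)
	local originalParent = model.Parent
	model.Parent = nil -- one ancestry change replicates

	for _, descendant in ipairs(model:GetDescendants()) do
		if descendant:IsA("BasePart") then
			descendant.Color = color -- not replicated while unparented
		end
	end

	model.Parent = originalParent -- one more ancestry change replicates everything
end
```

The same shape works for any batch of property edits: unparent once, mutate freely, reparent once.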

For example, let’s say your game has an entity system with coins that spawn in a folder in the workspace called Coins, and that you clear the coins on your map at the end of a round. What you should not do is loop over each coin and delete it individually. Instead, you should :Destroy() your Coins folder and create a new, empty one.
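A sketch of that, assuming the Coins folder lives directly under Workspace as the article describes:

```lua
local Workspace = game:GetService("Workspace")

local function clearCoins()
	local coins = Workspace:FindFirstChild("Coins")
	if coins then
		coins:Destroy() -- one ancestry change instead of one per coin
	end

	-- Recreate an empty folder for the next round
	local newFolder = Instance.new("Folder")
	newFolder.Name = "Coins"
	newFolder.Parent = Workspace
end
```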

Continuing the example, let’s say you want your coins to spin or change color. What you should not do is have the server apply this behavior. Instead, the client should do the property setting, with the server occasionally telling the client “hey, here’s a list of coins and what colors you should make them in the future.” Even better would be to simply keep this behaviour on the client entirely.
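One way this could look, as a sketch: the server sends one small message describing the desired state, and the client applies it locally. `CoinColorEvent` is a hypothetical RemoteEvent in ReplicatedStorage; the color values are placeholders.

```lua
-- Server script: describe the desired coin colors in one small message.
local ReplicatedStorage = game:GetService("ReplicatedStorage")
local coinColorEvent = ReplicatedStorage.CoinColorEvent -- assumed RemoteEvent

local changes = {}
for _, coin in ipairs(workspace.Coins:GetChildren()) do
	table.insert(changes, { name = coin.Name, color = Color3.fromRGB(255, 200, 0) })
end
coinColorEvent:FireAllClients(changes) -- one request, not one per coin

-- LocalScript: apply the changes locally so nothing has to replicate.
local clientEvent = game:GetService("ReplicatedStorage").CoinColorEvent
clientEvent.OnClientEvent:Connect(function(changeList)
	for _, change in ipairs(changeList) do
		local coin = workspace.Coins:FindFirstChild(change.name)
		if coin then
			coin.Color = change.color
		end
	end
end)
```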

Doing all of this might require restructuring of your game’s code, or even rewriting how entities work in your game. But, there is unfortunately no way around it. This is similar to having good game security, if your game is designed without security in mind, and you want to improve security in the future, it could require large changes to how your game works.

This isn’t just about ping!

Good network practices can also massively improve the FPS and general performance of your players’ clients. This is because processing incoming network data is expensive, and can be extremely performance heavy in mass quantities. Processing one big packet is easier than processing one thousand small packets, since there is a small CPU, network, and memory overhead for every packet sent. 10,000 packets (also referring to things that might be combined into one packet) times an overhead of 0.01 each is 100, but 1 packet times an overhead of 0.01 is still only 0.01.

This is exactly why Roblox coalesces your remote requests into one big request every second or so. It might increase perceived ping a little, but a lot less data is sent, a lot less CPU is used on the server and client, a lot less memory is required, and generally a lot less of everything is needed. You should take inspiration from this property of data transfer wherever you can.

Conclusion & Special note on instance based terrain, voxel & entity systems

Terrain and entities are both cases where you can expect to make many thousands of changes. Often (and in the case of terrain, always) you will find that parenting things after you do the work is surprisingly fast. For example, let’s say you generate some voxel-style terrain. If you parent each voxel to nil, and when you’re done generating terrain, parent each voxel to the workspace, you will get better performance than if you parented each voxel to the workspace immediately. On top of that, if you parent each voxel to a folder parented to nil, and when you’re done generating the terrain, parent that folder to the workspace, you might see close to a 100x speedup!
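The folder variant can be sketched like this (the grid layout and part sizes are arbitrary placeholders):

```lua
-- Generate voxels into a Folder that is not in the DataModel,
-- then parent the folder once so everything replicates as one batch.
local function generateTerrain(voxelCount)
	local container = Instance.new("Folder")
	container.Name = "GeneratedTerrain"
	-- container.Parent stays nil while we build

	for i = 1, voxelCount do
		local voxel = Instance.new("Part")
		voxel.Anchored = true
		voxel.Size = Vector3.new(4, 4, 4)
		voxel.Position = Vector3.new((i % 100) * 4, 0, math.floor(i / 100) * 4)
		voxel.Parent = container -- no replication yet; container is outside the game
	end

	container.Parent = workspace -- one .Parent set replicates the whole batch
end
```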

This is again due to the overhead property described above. There is almost always overhead to having a large quantity of things, even when you least expect it. You can always expect to see better, or at least equal, performance by coalescing things together into bigger chunks; you’ll never see worse overall performance.

The caveat is that if your chunks are too big, you’ll see a lot of stuttering, which can be more distracting than overall performance being a little low. For example, say you’re getting 100 FPS in a game, but every second you get a lag spike that takes you down to 1 FPS. This can be a lot more distracting, and a lot less enjoyable, than a stable 50 FPS.

So the takeaway is that you should do as much as you can while still maintaining balance. The more you practice balancing these things, the better you will become at it, and you might find that adapting your style in favor of these behaviors also makes it easier to develop performant games, reducing your overall time cost since you’ll spend less time going back and optimizing.

90 Likes

:smiley: glad you updated the Topic but where is your evidence to backup your findings?

Do you have a test place that we can see for ourselves? It’s not easy to blindly trust someone without them providing any proof; I’m not calling you a liar though.

You’d be more credible and trustworthy if you provide a repro file.

(I also hate that the thread is a giant wall of text but I read through anyways)

Use ClearAllChildren

4 Likes

Here is the place file I used to collect data on remotes as well as test how replication is affected. I spent roughly two hours working with this place file while writing this article. Additionally, some of this stuff is from older devhub content that doesn’t exist anymore, e.g. the 50 kbps bit; I’m not sure if any archives exist. RemoteData.rbxl (23.6 KB)

Additionally, does :ClearAllChildren() not send individual AncestryChanged events over the network? From my understanding, the reason :Destroy()ing the top level instance is so much less intensive is that only one ancestry change is sent for the top level instance. If :ClearAllChildren() only sends a single piece of data, that’s great news for me. I’m not sure how I might effectively test :ClearAllChildren(). (To test :Destroy(), I basically took an average of frame time and network transfer rate for the two options, which I did roughly six months ago while working on the terrain gen for one of my games; I’d probably have to take the same approach again.)

From my understanding, :ClearAllChildren() isn’t any different from looping over instance:GetChildren() and calling :Destroy() on each child, which is intensive because of the overhead of sending each change event individually. The looping approach also has extra CPU overhead, which is easy to test, since :Destroy()ing the top level instance is again only one event.

I’ll have to do a benchmark and see how topLevelInstance:ClearAllChildren() compares with topLevelInstance:Destroy().

14 Likes

Great post!

Thanks for the specifications regarding the data limit and insight on how Roblox handles netcode!

Could you possibly provide more examples on how one would go about reducing data size?

2 Likes

So does the server have any limits when it comes to the network? Let’s say I spam a remote to the server; does the client’s network go down, or the server’s? Or both?

The server and client both limit their outgoing transfer rates to 50 kilobytes per second per player, as I discussed in the post. Even without a limit, you’d most likely hit a CPU bottleneck on either end before you could actually cause problems for someone. So if you are concerned about DoS attacks specific to Roblox, don’t be; there is pretty much no way to actually transmit enough data to do this.

Primarily, network limits are likely in place simply to reduce network strain on Roblox’s end and introduce predictability, not to increase security. By limiting the bandwidth of each server to something small and only allowing a certain amount of bandwidth per player, Roblox can easily allocate server bandwidth for different sized servers, letting them scale up or down easily. If you think about it, a 10 player server requires at most 100 kB/s times 10 players, so at most one megabyte per second.

Basically, it gives Roblox a good way to know how much they might need to allocate to a given server or their entire server network.

@Avallachi I can certainly add more info, but, unfortunately I don’t have a good way to test transmission size at the moment. I am planning to do a deep dive into Roblox’s network protocol to really find out myself exactly what’s going on, but, we’ll see, I have been short on time lately.

1 Like

I find this article a bit helpful, but it still doesn’t answer the question of why Roblox games have been laggier than usual from the first quarter of 2020 to now. This is all physics and server-to-client replication lag, which includes TweenService, NPC movement, et cetera. For example, just yesterday I revisited a game of mine from 2018 of a brick being tweened back and forth at random intervals. I don’t know why I thought it was cool back in the day, but I remember there was absolutely no lag in the game. Coming back, the brick movement was extremely choppy: every second or so it paused, then skipped a few positions, and so on. It’s not the only game this phenomenon occurred in. Pretty much every game that uses server-to-client replicated visuals instead of solely client visuals has this issue. I find it quite difficult to make every significant dynamic visual client-sided only. This method can yield syncing issues, and will be especially detrimental to physics-based visuals like rollercoasters or swings.

Hi there great post, I have a few questions,

  1. When you say you shouldn’t go over 50 kb/s, what would one RemoteEvent firing look like? Let’s say you wanted to send a table of info, how many kb would that be?

  2. I use for loops a lot and do exactly what you are saying not to do. How important is changing that vs the remote event stuff? Would it be worth changing a lot of my game?

  3. Would RemoteFunctions have the same effect as events?

It depends on the data in the table, and that has probably changed over time if different types serialize differently.

I don’t know exactly what you might be referring to since I’ve kind of forgotten what I’ve written, but generally it’s alright to fire an event in a loop, or even every frame in Heartbeat, on the server or the client; you just don’t want to be sending a lot of data repeatedly.
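A sketch of one way to keep per-frame firing cheap: accumulate small updates and flush them in one remote call per second. `UpdateEvent` is a hypothetical RemoteEvent; the one-second interval is an arbitrary choice.

```lua
local ReplicatedStorage = game:GetService("ReplicatedStorage")
local RunService = game:GetService("RunService")

local updateEvent = ReplicatedStorage.UpdateEvent -- assumed RemoteEvent
local pending = {}
local elapsed = 0

-- Callers queue updates instead of firing the remote directly.
local function queueUpdate(update)
	table.insert(pending, update)
end

-- Flush the accumulated updates as one request roughly once per second.
RunService.Heartbeat:Connect(function(dt)
	elapsed += dt
	if elapsed >= 1 and #pending > 0 then
		updateEvent:FireAllClients(pending) -- one request instead of many
		pending = {}
		elapsed = 0
	end
end)
```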

Sort of. RemoteFunctions are closer to firing an event and then firing an event back from the other side. It’s also possible for one side to permanently halt the other by yielding forever, which makes RemoteFunctions not ideal for getting data from the client, since they’ll halt the invoking code, like a wait call would, until the data is sent back.

Typically I like to avoid RemoteFunctions altogether, but that’s just personal preference because I like building my event stuff from scratch. Generally, the difference is that RemoteFunctions take one network frame for the sender to send the request, one network frame on the receiver to produce the return info, and lastly one more frame for the sender to get the return data back.

An event will take one network frame for the sender, and one on the receiver because of how the data gets processed.

1 Like

Hello, I have a question that may be related: should I be worried and change how I disable PointLights in my game called Maze Generation? The current way it works is:

  • On server start, every time the server generates a 120x120 grid cell, tag the PointLight with a tag using CollectionService
  • When a blackout/blood hour event occurs at a random time, the server first gets an array of the PointLights from CollectionService with the PointLight tag, then iterates through them, checks whether each is really a PointLight, and disables them
  • After some time, the server does the same as the step above: we get the PointLights and iterate through them, but this time we enable them back (returning the maze to a normal state)

At the moment there is no noticeable lag yet (from me or my friends; even my friend with a Pentium PC says it is playable). However, should I be worried and move how this works to the client side?

The reason I am asking is that every time the maze generates, there can be over 180k instances (including the PointLights) (this may not be accurate, but I did optimize the cells into fewer instances). Additionally, there’s a random chance for a PointLight to be removed every time a cell is generated when the server first starts.

If you wanted to optimize for bandwidth during a change, when you enter a blackout/blood hour event you could fire a remote letting the client know that it should visually enter the blackout/blood hour state. Then on the client you can collect the list of PointLights and disable them, or change their color.

The result is that, rather than sending a thousand or so property changes, one for each light, you send a small string, and the client makes the same changes. The only performance costs would be any overhead of using CollectionService to get all the point lights, which probably isn’t much if there is any cost to getting the list at all, and maybe the overhead of setting the properties directly on the client, which isn’t likely to be much of anything either.
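A sketch of the client side of that, assuming a hypothetical RemoteEvent called `BlackoutEvent` in ReplicatedStorage and the PointLight tag described above:

```lua
-- LocalScript: toggle all tagged PointLights locally when the server
-- signals a blackout. Nothing here replicates back over the network.
local CollectionService = game:GetService("CollectionService")
local ReplicatedStorage = game:GetService("ReplicatedStorage")

local blackoutEvent = ReplicatedStorage.BlackoutEvent -- assumed RemoteEvent

blackoutEvent.OnClientEvent:Connect(function(enabled)
	for _, light in ipairs(CollectionService:GetTagged("PointLight")) do
		if light:IsA("PointLight") then
			light.Enabled = enabled -- local change only
		end
	end
end)
```

The server then only fires `blackoutEvent` with `false` when the blackout begins and `true` when it ends.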

But it’s up to you whether or not you think the bandwidth savings are worth it, because moving this to the client likely won’t affect performance positively or negatively, at least in any obvious way. It might give you more network room to do other things when you’re entering a blackout or blood hour event.

If you’re using StreamingEnabled, there could be less overhead or something for a lot of tagged instances depending on how CollectionService is implemented, but that’s just speculation; I don’t know anything about how CollectionService works.

1 Like

Yes, the game does use StreamingEnabled with default settings and the low-memory setting option thing.

So, I send an array containing the PointLight instances to the client along with telling them that a blood hour/blackout will happen? I think I get what you mean, thank you for making the post!

1 Like