Networking latency with combat system cooldowns

So I am designing a combat system that works as follows: the client presses keys to perform actions, basic sanity checks are done on the client, the effects are displayed instantly for optimal UX, and a cooldown debounce is put in place. Through a RemoteEvent, the server does its own sanity checks, such as checking server-side cooldowns for that action, and then puts its own debounce cooldown in place. The problem is that the server and client are out of sync due to varying latency, so some actions that the client has already validated may be rejected on the server, giving the user false feedback.
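
Roughly, the current setup looks like this; the RemoteEvent name and the playAttackEffects/applyAttack helpers are just placeholders for my actual code:

```lua
-- LocalScript (client)
local ReplicatedStorage = game:GetService("ReplicatedStorage")
local actionRequest = ReplicatedStorage:WaitForChild("ActionRequest") -- placeholder RemoteEvent

local COOLDOWN = 0.4
local lastUsed = 0

local function onAttackPressed()
    local now = os.clock()
    if now - lastUsed < COOLDOWN then
        return -- client-side debounce
    end
    lastUsed = now
    playAttackEffects() -- placeholder: show the attack instantly for good UX
    actionRequest:FireServer("attack")
end

-- Script (server)
local actionRequestServer = game:GetService("ReplicatedStorage"):WaitForChild("ActionRequest")
local lastUsedByPlayer = {}

actionRequestServer.OnServerEvent:Connect(function(player, action)
    local now = os.clock()
    local last = lastUsedByPlayer[player] or 0
    if now - last < COOLDOWN then
        return -- rejected here even though the client already showed the effect
    end
    lastUsedByPlayer[player] = now
    applyAttack(player, action) -- placeholder: server-side damage, sanity checks, etc.
end)
```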

I've done some looking around and found a method for resolving issues like the one described above, something called the "leaky bucket algorithm", posted here. While this does seem like a good approach for cooldowns, I don't see any way to prevent exploiters from performing multiple actions at once (i.e. using a damage-dealing skill and an attack at the same time), because the cooldowns are independent from each other and the server has no means of verifying whether an action is already in progress.

How do developers tend to deal with the networking latency issue described in the first segment? And if you use the leaky bucket, is there a way the server can check if an action is being performed in order to prevent exploiters who want to cheat using the vulnerability I described with this algorithm?

10 Likes

The leaky bucket algorithm will still work well in your case; however, some parameters will need to be adjusted. The rate of decay should be exactly the inverse of the maximum rate of input. The threshold, though, needs to be tuned to strike a balance between absorbing network latency and limiting how much exploiters can cluster actions in the way you describe.
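
For concreteness, here is a minimal sketch of one common formulation of the leaky bucket (not necessarily identical to the post you linked): each request adds one drop, the bucket drains at the inverse of the action's cooldown, and the threshold is the slack that absorbs latency and small bursts. The values at the bottom are placeholders.

```lua
local LeakyBucket = {}
LeakyBucket.__index = LeakyBucket

function LeakyBucket.new(drainRate, threshold)
    return setmetatable({
        level = 0,             -- drops currently in the bucket
        drainRate = drainRate, -- drops removed per second (1 / cooldown)
        threshold = threshold, -- maximum drops before requests are rejected
        lastUpdate = os.clock(),
    }, LeakyBucket)
end

function LeakyBucket:allow()
    local now = os.clock()
    -- drain whatever has leaked out since the last request
    self.level = math.max(0, self.level - (now - self.lastUpdate) * self.drainRate)
    self.lastUpdate = now
    if self.level + 1 > self.threshold then
        return false -- bucket would overflow: reject this request
    end
    self.level = self.level + 1
    return true
end

-- e.g. a 0.4 s attack: one drop drains every 0.4 s, with ~2 drops of slack for latency
local attackBucket = LeakyBucket.new(1 / 0.4, 2)
```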

There are many ways to formulate the problem: as a statistical problem, a machine learning problem, or even a physical controller problem. You can perform either real-time or design-time analytics to determine the optimal threshold. I'd recommend a mix: gather client data, analyse it to create a model, and apply it dynamically based on learned per-user parameters. If you don't want a client-specific model (but instead one that works in every case), it will likely not fit the data as well and will produce false positives or negatives due to varying network noise. It may be worth evaluating which endpoints are mission critical and must be exploit resistant, and which can accept some abuse within reason.

For your data, if you choose to collect it, I would gather the following (a sketch of one such record follows the list):

  • user id of the requesting player
  • session id (if the player leaves and comes back they may use a different network route, or even device!)
  • server id (different locations)
  • endpoint hit
  • sender timestamp
  • receiver timestamp
  • quantity of data
  • any endpoint-specific data that influences the maximum rate (for example, converting a scalar "energy" resource into a percentage reduction of cooldowns; the distance of an interaction may also influence the cooldown of the action by costing more "energy")
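
For example, one logged interaction might look something like this; the field names are just suggestions, and sessionId, clientTimestamp, and the endpoint-specific fields would come from your own code:

```lua
local record = {
    userId = player.UserId,
    sessionId = sessionId,                            -- assigned when the player joins
    serverId = game.JobId,                            -- identifies this server instance
    endpoint = "HeavyAttack",
    senderTimestamp = clientTimestamp,                -- sent by the client with the request
    receiverTimestamp = workspace:GetServerTimeNow(), -- when the server received it
    payloadSize = payloadSize,                        -- rough quantity of data sent
    -- endpoint-specific fields that influence the effective cooldown
    energySpent = energySpent,
    distance = distance,
}
```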

This data should be sufficient for any offline technique to solve this problem. You may want to collect more, since extra data is always useful for other kinds of analytics. If you want a quick and dirty solution, simply pick the threshold that strikes a good balance between the percentage of interactions accepted, the ability it gives exploiters to cluster actions, and the importance of securing that endpoint. BTW, this assumes that most, if not all, of your data is gathered from players who are not exploiting. You may want to simulate exploits and measure the percentage of exploits accepted versus real interactions rejected.

However, since there are lots of ways to solve the problem, what would you like to do? How important is security versus performance for you? Are these endpoints hit very frequently, and thus in need of a quicker algorithm, or is there a mix? How important is player convenience versus security, and does it need to be specified per endpoint? Would you like a more complex method or a simpler one? Do you prefer machine learning, statistics, or controllers like the leaky bucket? Are you opposed to gathering data and would prefer an online algorithm? Is the performance and complexity impact of a client-specific controller worth it to you?

P.S.
I'm in a master's program in CS and commonly deal with issues like this, so that is the perspective I'm approaching this with.

5 Likes

Best case scenario, I'd like even users with latency to have a near flawless experience in terms of feedback, while exploiters still have to abide by the cooldowns placed in the game. I don't really need any data, as I can work with experimental values for thresholds. In my case, I'm using this for a combat system with attacks as quick as 0.4 seconds, meaning their cooldown is also 0.4. I also have several abilities that work with a stamina system and their own cooldowns, lasting up to ~20 seconds. I prefer algorithm-based solutions that don't require much work outside of programming, and the leaky bucket would work well for these individual cooldowns. On top of the individual cooldowns, though, I'd like a global cooldown that works the same way as a debounce boolean "canActions" being set from true to false and back to true with a wait(t): each action checks whether it can be performed by testing, with an if statement on that boolean, that another action isn't already in progress.
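
To illustrate, the kind of debounce I mean looks something like this (startAction just stands in for the actual animations/effects):

```lua
local canActions = true

local function performAction(name, duration)
    if not canActions then
        return false -- another action is still in progress
    end
    canActions = false
    startAction(name) -- placeholder: animations, hitboxes, effects, etc.
    wait(duration)    -- or task.wait(duration)
    canActions = true
    return true
end
```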

Because of the network delay, however, I can no longer do this with a simple debounce. I understand that I can use the leaky bucket for the individual cooldowns of actions and for remote spam prevention, but I can't see how to make use of the way this algorithm works to do server checks that ensure one move cannot be used on top of another by an exploiter firing remotes.

Besides this issue, I understand everything, and the leaky bucket would be a great solution for the original network latency problem. I'd just prefer not to sacrifice any security with this algorithm in exchange for good UX. Are you aware of any coding solutions that would work alongside the leaky bucket algorithm and allow me to check on the server whether an action is already being performed by the player, in a similar fashion to the debounce example?

As I understand it, you would like to use the leaky bucket algorithm without offline analysis, with an endpoint-sensitive threshold. What about client-specific parameters? They add some overhead, but I'll assume you want them. You can have a global cooldown that must accept a request first before the endpoint-specific cooldowns are consulted.
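
As a sketch (reusing the LeakyBucket from my earlier post; the rates and thresholds are placeholders, and in practice you would keep one set of buckets per player):

```lua
local globalBucket = LeakyBucket.new(1 / 0.4, 3)   -- overall action rate for this player
local endpointBuckets = {
    QuickAttack = LeakyBucket.new(1 / 0.4, 2),
    HeavySkill  = LeakyBucket.new(1 / 20, 1),
}

local function tryAccept(endpoint)
    -- the global cooldown must accept the request first...
    if not globalBucket:allow() then
        return false -- too many actions overall, regardless of which endpoints were hit
    end
    -- ...and only then is the endpoint-specific cooldown consulted
    -- (note: in this sketch a rejected endpoint still consumes a global drop)
    local bucket = endpointBuckets[endpoint]
    return bucket ~= nil and bucket:allow()
end
```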

As for a nice UX method on top of leaky buckets, you can set another threshold which determines whether an action should be added to a queue and performed when it becomes available. I'd set this threshold to something like 0.1 seconds plus the mean client latency, measured from the time the cooldown would have become available.
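
A sketch of that queue threshold; timeUntilAvailable and meanLatency are hypothetical helpers you would implement on top of your buckets and ping measurements:

```lua
local QUEUE_BASE = 0.1 -- fixed slack in seconds

local function handleRequest(player, endpoint, runAction)
    local remaining = timeUntilAvailable(player, endpoint) -- hypothetical: time left on the cooldown
    local queueWindow = QUEUE_BASE + meanLatency(player)   -- hypothetical: smoothed client latency
    if remaining <= 0 then
        runAction() -- cooldown already over: perform immediately
    elseif remaining <= queueWindow then
        task.delay(remaining, runAction) -- only slightly early: queue it instead of rejecting
    else
        -- far too early even accounting for latency: reject (and count it toward rejection history)
    end
end
```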

These are all the parameters to be adjusted (an example configuration follows the list):

  • maximum threshold
  • % of the threshold attributed to server-wide latency vs player-specific latency
  • desired % of interactions that should be denied (since we don't know the boundary between exploits and clean interactions, we simply reject the interactions most likely to be exploits)
  • % difference between the desired rejection rate and the rate actually rejected that triggers an update (the update may also be continuous if you prefer)
  • amount of server and player-specific history to keep
  • threshold to not perform an action
  • endpoint security level
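
For instance, such a configuration might look like this (all names and values are placeholders to be tuned):

```lua
local config = {
    maxThreshold = 4,          -- hard cap on any bucket threshold, in drops
    serverLatencyShare = 0.5,  -- fraction of the slack attributed to server-wide latency
    targetRejectRate = 0.02,   -- desired % of "early" requests to deny
    updateTolerance = 0.01,    -- drift from targetRejectRate that triggers an update
    historyLength = 200,       -- per-server and per-player samples to keep
    queueWindow = 0.1,         -- base of the "queue instead of reject" threshold (plus mean latency)
    securityLevel = {          -- per-endpoint weight applied to the values above
        QuickAttack = 1.0,
        HeavySkill = 2.0,
    },
}
```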

Given an existing sane threshold to perform an action, it can be adjusted in real time using this algorithm:

  • If there is not enough history, return
  • Scale the desired rejection % (and optionally maximum threshold) by the endpoint security level
  • If the server-wide rejection history does not match the desired rejection %, then:
    • adjust the thresholds server-wide up or down (up to the server's share of the maximum)
  • If the player's rejection history does not match the desired rejection %, then:
    • adjust that player's threshold up or down (up to the player's share of the maximum)

These threshold updates can be performed at any desired rate. The thresholds and the accepted/rejected history should only track interactions that arrived before they were supposed to. All calls made after a cooldown has expired are always accepted and don't count toward the desired %.
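
A sketch of that update step, assuming `history` is a list of booleans (true = rejected) covering only requests that arrived early; run it once with the server-wide history and once per player with that player's history. The 0.1 step size and `config` fields mirror the placeholder configuration above.

```lua
local function rejectionRate(history)
    local rejected = 0
    for _, wasRejected in ipairs(history) do
        if wasRejected then
            rejected = rejected + 1
        end
    end
    return rejected / #history
end

local function updateThreshold(bucket, history, config, securityLevel)
    if #history < config.historyLength then
        return -- not enough history yet
    end
    local target = config.targetRejectRate * securityLevel
    local actual = rejectionRate(history)
    if actual > target + config.updateTolerance then
        -- rejecting too many requests: loosen the bucket (up to the maximum)
        bucket.threshold = math.min(bucket.threshold + 0.1, config.maxThreshold)
    elseif actual < target - config.updateTolerance then
        -- rejecting too few: tighten the bucket
        bucket.threshold = math.max(bucket.threshold - 0.1, 1)
    end
end
```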

If we had data and performed analysis offline, we could try to determine the optimal % threshold, or the difference between an exploit and noise, but this is a decent online algorithm. As with any online algorithm, attackers may be able to gain up to the maximum allowance by exploiting more often than their normal interactions, because all an online algorithm has to work with is the current history. I'd adjust the history length, the server vs client threshold split, and the maximum threshold until you reach a point you are comfortable with.

1 Like

Ok, so a few questions:

  1. How would I get the server to verify an action without delaying the client through the request verification you've described above?
  2. By another threshold, do you mean another "bucket" loop, but this time one that tracks actions somehow, or are you referring to another sort of cooldown variable on the server? And how would you get the mean client latency? More info on this process would be greatly appreciated.
  3. How would we be able to tell which requests came in before they were supposed to? Do you mean an action that hasn't been verified by the server with the additional threshold you described in Q2, or when the "bucket" for the action is full, or something else?

Also, could you give a step-by-step example where all of these procedures are performed (mainly the global cooldown that accepts requests, the queue threshold check, and the server-to-client communication that takes place), starting from when a user presses a keybind and sanity checks are done on the client, through to the server verifying the action and replicating it? I'd greatly appreciate it, as it may help clear up anything I'm not understanding from what you've stated thus far. Thank you for your help!

I'm a bit of a beginner, but I would just focus on the client locking itself out with its own cooldown, so that it can't cast another action and only LOOKS as if it's doing the two attacks. Then I'd send the message to the server to work out the attack, damage, etc., and have it refuse another attack calculation until about 2 seconds before the client cooldown drops, allowing 2 seconds for latency.

That way the server does all the work and the client just sends the message saying "Do the real attack stuff", while the client shows off all the fake eye candy to make it look instantaneous for the player.

That way, if they hack their own cooldown, they can send as many messages as they want to the server, but it's busy handling its first message. The player might look like it's doing attack after attack, but the server is ignoring it all. They can have all the eye candy they want and make it look awesome, like machine-gun spells, but only one message will be acted on by the server. And if the server does get multiple messages in a row (someone hacking), it can kick them.
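
Roughly what I mean on the server (the event, the doRealAttack helper, the cooldown, and the 2-second number are just placeholders):

```lua
local busyUntil = {}
local COOLDOWN = 6 -- example ability cooldown in seconds

attackEvent.OnServerEvent:Connect(function(player)
    local now = os.clock()
    if busyUntil[player] and now < busyUntil[player] then
        return -- still busy with the first message; ignore the extras
    end
    -- free up again ~2 seconds before the client cooldown ends, to allow for latency
    busyUntil[player] = now + math.max(COOLDOWN - 2, 0)
    doRealAttack(player) -- placeholder: the real damage/attack calculation
end)
```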

But I could be wrong.

1 Like

This wouldn't really work, because it relies on a debounce on the server, which is always going to be out of sync depending on the user's latency. It also wouldn't work for quick attacks that last a mere 0.4 seconds, like a quick punch or sword swing, because 2 seconds is far greater than that. You need to account for a 'burst' of events arriving all at once, because the timing with which events arrive on the server differs from the timing with which they were fired relative to each other on the client.

Also, kicking players for exploits is never really a good solution, because with this problem exploiters tend to look just like players with really high latency, and the worst thing to do would be to accuse a player of something they didn't do. I'd delay any further events until the earlier ones are processed/cleared instead, because that works for everyone.

I appreciate you trying to help, though! This problem is really quite a complex one.

1 Like

He's saying to use something like a leaky bucket, except that when something exits the bucket, that's when you perform your action.
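
In other words, something like this, where MAX_QUEUE and DRAIN_INTERVAL are placeholders:

```lua
local queue = {}
local MAX_QUEUE = 2        -- how many actions may wait in the bucket
local DRAIN_INTERVAL = 0.4 -- one action leaves the bucket per cooldown period

local function enqueue(action)
    if #queue >= MAX_QUEUE then
        return false -- bucket is full: drop (or flag) the request
    end
    table.insert(queue, action)
    return true
end

-- drain loop: the action is only performed as it exits the bucket
task.spawn(function()
    while true do
        local action = table.remove(queue, 1)
        if action then
            action()
        end
        task.wait(DRAIN_INTERVAL)
    end
end)
```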