What makes skill? Need help for leaderboard

How can skill accurately be determined?

1. What do you want to achieve? Keep it simple and clear!

I want to make three leaderboards that rank players based on their skill, with each leaderboard being for a specific mini game.

2. What is the issue? Include screenshots / videos if possible!

How do I determine skill? Here is a list of fake generated statistics for 2 players. Can you determine who is better in each category?


Lynnlo -

Bomb Run:

```
Total wins: 40
Total losses: 18
W/L percentage: 69%
Fastest Time: 17.084s
Average Time: 5m 43.725s
```

Sword Fight:

```
Total wins: 22
Total losses: 45
W/L percentage: 33%
Total kills: 154
Total deaths: 340
K/D ratio: .45
```

Brick Battle:

```
Total wins: 44
Total losses: 44
W/L percentage: 50%
Total kills: 308
Total deaths: 308
K/D ratio: .50
```

Dev_Diablo -

Bomb Run:

```
Total wins: 30
Total losses: 15
W/L percentage: 67%
Fastest Time: 20.836s
Average Time: 33.725s
```

Sword Fight:

```
Total wins: 20
Total losses: 0
W/L percentage: 100%
Total kills: 140
Total deaths: 0
K/D ratio: 1
```

Brick Battle:

```
Total wins: 55
Total losses: 55
W/L percentage: 50%
Total kills: 385
Total deaths: 385
K/D ratio: .50
```

As you can see, in Bomb Run, Lynnlo is better in total wins, W/L %, and fastest time. However, their average time (5m 43.725s against Dev_Diablo’s 33.725s) is so much worse that they would almost certainly lose if they faced Dev_Diablo. We can’t base the leaderboards on ratios though; that would be the worst option, because a new player can win once, quit, and stay on top with a 100% ratio.

Pertaining to the second question, who is better for each mini game?

Bomb Run

  • Lynnlo
  • Dev_Diablo

Sword Fight

  • Lynnlo
  • Dev_Diablo

Brick Battle

  • Lynnlo
  • Dev_Diablo

3. What solutions have you tried so far? Did you look for solutions on the Developer Hub?

There is no accurate solution, to my knowledge. So I need to go with the closest thing to a solution that I can get.

We really have two options. Since ratios and percentages are definitely the worst answer, we can either use one umbrella metric for every leaderboard or a narrower metric tailored to each mini game.

That is, we can base every leaderboard on the number of wins; or we can rank Bomb Run by shortest time, and rank Sword Fight and Brick Battle by most kills.

The question is, which is better? As shown in the mock demo for Bomb Run, although Lynnlo has more wins and looks better overall, their average time shows they would lose to Dev_Diablo. More wins doesn’t necessarily indicate more skill. Having the most kills or the fastest time doesn’t necessarily indicate it either: in the Bomb Run example, Lynnlo got very lucky once and is a slow player for the most part.

Out of the methods I named, which is better?

  • Top wins for each leaderboard
  • Top speed for Bomb Run, and top kills for Sword Fight and Brick Battle

Please leave a comment stating what the best method is to determine skill, and if I haven’t already named the way you would determine it, please state what you would do.

The concept of “skill” is really more of an arbitrary idea than a measurable characteristic. For example, it’s impossible to say whether LeBron James is more skillful than, say, Lionel Messi, because they play two very different sports that require different skills. It’s even hard to compare skill between two players of the same sport who play different positions.

In most cases where skill is given a numerical value, it’s calculated with cobbled-together stats that just work “well enough”. Take the following baseball statistic as an example: Runs Created

Granted, I looked for the most confusing stat I could find, but what is the point of that * 0.55? It just works.

So what I’m getting at here is that, when it comes to measuring skill as a single number for each minigame, you can really do whatever you like as long as you’re happy with the general results. In your case, you’re weighing multiple different stats that contribute to the overall skill rating, so which stats do you think should carry the most weight?

For example, Fastest Time could be a fluke such as in your Lynnlo Bomb Run example. Obviously, average time should have some more sway in determining the skill rating than the fastest time.

Win/loss percentage should be another big factor, as it really determines how effective a player is at that minigame. However, a high W/L paired with a low K/D might mean the player is more passive and waits for the others to take each other out, so they may not be as skillful as the W/L suggests.

You should really just create an equation that you feel accurately captures the skill level of the player, putting a higher emphasis on the stats that you feel are more important. Here’s an example equation for a Bomb Run player’s skill (with little thought put into it):

skill = (100 / (AverageTime - FastestTime)) * 5 + Wins / (Wins + Losses) * 100

The basic idea here is that I put more weight on the W/L percentage, so I multiplied it by a larger arbitrary number. I used 100 / (average - fastest) because a skillful player should have a smaller difference between those two numbers. Why 100 over? Because it works.

Based on this very rudimentary equation, with times in seconds, Lynnlo would have a Bomb Run skill rating of about 70.5 and Dev_Diablo would have a skill rating of about 105.5. Looking at their stats, you might agree, because their W/L is about the same but Lynnlo has an incredibly high average time. You can do something like this, but with more fine-tuning.
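To make the arithmetic concrete, here is a minimal Python sketch of the example equation above, applied to the two Bomb Run stat lines from the question. The function name `bomb_run_skill` and its parameter names are made up for illustration, and all times are assumed to be in seconds.

```python
def bomb_run_skill(wins, losses, fastest_time, average_time):
    """Example Bomb Run rating from the equation above (times in seconds)."""
    # Consistency term: a smaller gap between average and fastest time scores higher.
    # Assumes average_time > fastest_time; equal values would divide by zero.
    consistency = (100 / (average_time - fastest_time)) * 5
    # Win percentage on a 0-100 scale.
    win_pct = wins / (wins + losses) * 100
    return consistency + win_pct


# Stat lines from the question; 5m 43.725s is converted to seconds.
lynnlo = bomb_run_skill(40, 18, 17.084, 5 * 60 + 43.725)
dev_diablo = bomb_run_skill(30, 15, 20.836, 33.725)
print(round(lynnlo, 2), round(dev_diablo, 2))
```

Changing the `5` or the `100` is the "fine tuning" part: they only control how much each term counts.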

The worry that a new player can win once and stay on top with a 100% ratio is obviously a valid concern, because a single game does not dictate a person’s skill level. Statistics has a rule of thumb to keep things like this from skewing the data: the Large Enough Sample Condition. Roughly, it says that with a sample of at least 30, the sample average can be treated as approximately normally distributed, so the sample is taken as a reasonable representation of the data.

Now, you may not want players to have to play the minigame 30 times to get a skill rating, so you can apply some other outside adjustment to the values. I was thinking of creating a confidence interval, but unfortunately, with my very limited statistics knowledge, I don’t know how to do this without population data, which would be a nightmare to collect. You can do something where, if the sample size n is less than 30, you multiply the result by n / 30 (as an example), which lowers the skill rating based on how close the data is to “normal”.
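For what it's worth, one interval that needs only a player's own wins and losses is the Wilson score interval for a proportion. The sketch below is a swapped-in technique, not the method this post goes on to use: the function name `wilson_lower_bound`, the choice of z = 1.96 (roughly 95% confidence), and the 1-0 `NewPlayer` record are all assumptions for illustration. It scores each player by the lower end of the interval, so a short record gives a wide interval and a low score.

```python
import math


def wilson_lower_bound(wins, losses, z=1.96):
    """Lower bound of the Wilson score interval for the win proportion."""
    n = wins + losses
    if n == 0:
        return 0.0  # no games played, so no evidence either way
    p = wins / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Sword Fight records from the question, plus a hypothetical 1-0 newcomer.
records = {"Lynnlo": (22, 45), "Dev_Diablo": (20, 0), "NewPlayer": (1, 0)}
for name, (wins, losses) in records.items():
    print(name, round(wilson_lower_bound(wins, losses) * 100, 1))
```

With this scoring a 20-0 record stays well ahead of a 1-0 record even though both have a 100% win ratio. The simpler n / 30 scaling is what the rest of this post works through.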

Let’s try the n / 30 scaling for three players on Sword Fight, based only on the W/L percentage for simplicity: Lynnlo (22-45), Dev_Diablo (20-0), and MayorGnarwhal (1-0).

  • Lynnlo has 67 games played, more than 30. We can keep his skill at 33.
  • Dev_Diablo has 20 games played, less than 30. We can get his skill rating with 100 * (20/30), giving him 66.7.
  • MayorGnarwhal has 1 game played, less than 30. We can get his skill rating with 100 * (1/30), giving him 3.3.
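The three ratings above can be reproduced with a short Python sketch. The function name `scaled_win_skill` and the `min_games` parameter are hypothetical names chosen for this example; the rule itself is just the n / 30 scaling described above.

```python
def scaled_win_skill(wins, losses, min_games=30):
    """Win percentage, scaled down when fewer than min_games were played."""
    games = wins + losses
    if games == 0:
        return 0.0  # no record yet
    win_pct = wins / games * 100
    # Full credit at or above min_games, proportional credit below it.
    penalty = min(games / min_games, 1.0)
    return win_pct * penalty


records = {"Lynnlo": (22, 45), "Dev_Diablo": (20, 0), "MayorGnarwhal": (1, 0)}
for name, (wins, losses) in records.items():
    print(name, round(scaled_win_skill(wins, losses), 1))
```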

As you can see in Dev_Diablo’s case, this is quite cruel to him, as 20-0 is a very impressive record. You may want to replace (n/30) with a quadratic equation or something that tapers off in the higher numbers. This may also be too cruel for MayorGnarwhal, as 1-0 shouldn’t be comparable to a 1-30 record.
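As a rough sketch of a penalty that tapers off, the Python below swaps the straight n / 30 factor for a quadratic curve, 1 - (1 - n/30)^2. That particular curve is only one possible choice and is an assumption of this example, as are the names `tapered_penalty` and `tapered_win_skill`.

```python
def tapered_penalty(games, min_games=30):
    """Quadratic ease-out: climbs quickly at first, flattens near min_games."""
    x = min(games / min_games, 1.0)
    return 1 - (1 - x) ** 2


def tapered_win_skill(wins, losses, min_games=30):
    """Win percentage multiplied by the tapered sample-size penalty."""
    games = wins + losses
    if games == 0:
        return 0.0  # no record yet
    return wins / games * 100 * tapered_penalty(games, min_games)


# Same Sword Fight records as in the bullets above.
for name, (wins, losses) in {"Dev_Diablo": (20, 0), "MayorGnarwhal": (1, 0)}.items():
    print(name, round(tapered_win_skill(wins, losses), 1))
```

Under this curve a 20-0 record keeps most of its rating, while a 1-0 record is still heavily discounted, so it only addresses the first of the two concerns.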

Much needed tl;dr:

Skill is a very hard thing to measure, and measuring it will require a lot of subjectivity. The best you can do for a complex skill rating system is to make up equations that you feel accurately depict the players’ skill levels.

or just do a top wins leaderboard lol (which will 100% be mostly those who are constantly playing)