[Partial Open Source] ChaWatcher - Anti-Cheat / Anomaly Detector Components For Roblox! Uses Machine Learning From DataPredict!

MYOriginsWorkshop · October 12, 2023, 12:30pm

Also known as Character Watcher!

Updates (Version 1.1) (a.k.a. Distributed Computing Update)

Further reduce your server’s load using distributed computing! Recommended for games with a very large number of players!

Clients do all the calculations for a certain number of players. Only simple calculations are made on the server.
Clients do not calculate their own movement data; instead, they focus on other players. This stops hackers from directly influencing their own data. The only exceptions are when there is only one player in the server or when you are collecting data.
Related to the above, I will warn you that if there are only two clients and one of them is a hacker, the anti-cheat has nothing to validate against and will accept everything from the hacker as truth. Implement your own solution for this.
Checks if any clients access the remote events and will run relevant functions when necessary.
For the anomaly detector, the average difference of predicted values will be used to find out if a client has altered the data or predicted values.
If a client attempts to send a predicted value for a player that isn’t supposed to be watched by that client, then the anomaly detector marks it as a client that has access to remote events.
If the round-trip network ping (sent to the client and returned back) is greater than 0.3 seconds, the predicted values received from the client will be ignored. Those predicted values are not up-to-date information, which may lead to inaccuracies.
Whenever a client joins or leaves, all clients will watch a new set of random clients. This makes it more difficult for hackers to figure out which client controls the hacker’s data. It also gives the clients a balanced number of watchers. This reassignment is done on the server side.
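
To make the reassignment and the ping rule above more concrete, here is a minimal sketch in plain Lua. It is not ChaWatcher's actual code: the function names, the `reports` table shape, and the ring-based assignment are all assumptions made for illustration. Averaging the fresh predicted values is also only one possible way to combine reports from several watchers.

```lua
-- Hypothetical sketch; none of these names come from ChaWatcher itself.

local MAX_PING_SECONDS = 0.3 -- threshold quoted in the update notes

-- Fisher-Yates shuffle on a copy, so the caller's list is left untouched.
local function shuffled(list)
	local copy = {}
	for i = 1, #list do
		copy[i] = list[i]
	end
	for i = #copy, 2, -1 do
		local j = math.random(1, i)
		copy[i], copy[j] = copy[j], copy[i]
	end
	return copy
end

-- Returns a map: watched client -> array of clients that watch it.
-- Clients are placed on a shuffled ring and each one is watched by the next
-- `watchersPerClient` clients on that ring, so nobody watches themselves and
-- every client watches the same number of others.
local function assignWatchers(clients, watchersPerClient)
	local count = #clients
	local k = math.min(watchersPerClient, count - 1)
	local ring = shuffled(clients)
	local assignment = {}
	for i = 1, count do
		local watchers = {}
		for offset = 1, k do
			watchers[offset] = ring[(i - 1 + offset) % count + 1]
		end
		assignment[ring[i]] = watchers
	end
	return assignment
end

-- Averages the predicted values reported for one watched client, skipping
-- reports whose round-trip ping makes them stale. Returns nil if none remain.
local function averageFreshPredictions(reports)
	local sum, used = 0, 0
	for _, report in ipairs(reports) do
		if report.ping <= MAX_PING_SECONDS then
			sum = sum + report.predictedValue
			used = used + 1
		end
	end
	if used == 0 then
		return nil
	end
	return sum / used
end
```

Calling `assignWatchers` again whenever a client joins or leaves produces a fresh random ring, which matches the "new set of random clients" behaviour described above. With only two clients, each one is watched solely by the other, which is exactly the weak case the warning above describes.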

Features

Uses BasePart’s and Model’s Velocity, Orientation and Position. Does not depend on HumanoidState, WalkSpeed, JumpPower, JumpHeight!
Only need to collect normal player data!
Contains code for data collection, model training and anomaly / outlier detection!
Easy to use API (to the point that I don’t need to write an API reference!)
Developers handle the abnormal players upon detecting abnormal behaviours!
Save your data and models offline and online!
Default model parameters and data given for the original version!
Custom version is given to adapt to your needs! For example, kill streaks and aim bots!
Sell and distribute your own modified version of ChaWatcher library! (With certain limits of course.)
IT’S FREE!!!

What makes this a high quality product?

Thorough analysis when choosing the kernel functions and the kernel parameters for support vector machine.
Multiple iterations involving different models (e.g. Expectation-Maximization) were analyzed for performance, accuracy and ease of use. One-Class Support Vector Machine showed the best results.
Data analysis was performed by looking at which features significantly contribute to the accuracy of the anti-cheat. Any features that are non-significant or decrease the accuracy are removed.
1 year’s worth of machine learning system development prior to working at my first job. I’m also passionate about machine learning in general, and I took related courses prior to joining a university.

Example Code (Taken From ChaWatcher's Documentation in GitHub)

Anomaly Detection (Original Version)


local ServerScriptService = game:GetService("ServerScriptService")

local MatrixL = require(ServerScriptService.MatrixL)

local ChaWatcher = require(ServerScriptService.ChaWatcher)

local AnomalyDetector = ChaWatcher.Original.AnomalyDetector.new() -- Setting to default.

-- First argument above is the normal threshold. If the predicted value is less than the normal threshold, then the player is considered not normal.

AnomalyDetector:bindToOutlierFound(function(Player, predictedValue, fullDataVector) -- Runs a function if player's data is an outlier.

	print(Player.Name .. " has an outlier data!")

end)

AnomalyDetector:bindToHeartbeat(function(Player, predictedValue, fullDataVector) -- Runs a function on every heartbeat.

	print(Player.Name .. "\'s data has been collected!")

	local distance = fullDataVector[14]

end)

AnomalyDetector:bindToMissingData(function(Player) -- Runs a function if cannot create a data vector.

	print(Player.Name .. " has missing data!")

	local currentDataVector, previousDataVector = AnomalyDetector:getPlayerDataVectors()

end)

AnomalyDetector:start() -- Starts detecting outlier data.
AnomalyDetector:stop()  -- Stops detecting outlier data.
AnomalyDetector:start() -- Starts detecting outlier data. Again!

Links

Read the documentation at GitHub here!

Get the library from GitHub here!

Please report any bugs here.

You can see the thread for my DataPredict library here!

The original version checks for any abnormal movement behaviours, while the custom version contains an empty template for you to use for your own needs.

Don’t forget to leave a like!

MYOriginsWorkshop · October 12, 2023, 12:57pm

Now I shall wait for this library to explode!

VSCPlays · October 12, 2023, 12:57pm

Finally, a machine learning anti cheat, but can you explain to me what “machine learning” is?

MYOriginsWorkshop · October 12, 2023, 1:06pm

Short version and non-mathematical version:

Machine learning is a way for computers to find patterns in the data. The computer then exploits these patterns to create a prediction.

Long version and mathematical version:

Machine learning uses differentiation to iterate from non-optimal parameters towards optimal (best) parameters. During training, the difference between the predicted value and the actual value (e.g. predicted value - actual value) is used to improve the parameters. Once the optimal parameters are found, we can use them to predict values.
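
As a rough illustration of that idea, here is a minimal gradient descent sketch in plain Lua. It fits a single weight for `y = weight * x`. This is a generic textbook example, not how ChaWatcher or DataPredict trains its support vector machine, and the function name and sample data are made up.

```lua
-- Fits y = weight * x by gradient descent on the mean squared error.
local function fitWeight(inputs, targets, learningRate, iterations)
	local weight = 0 -- start from a non-optimal parameter
	for _ = 1, iterations do
		local gradient = 0
		for i = 1, #inputs do
			local predicted = weight * inputs[i]
			local errorValue = predicted - targets[i] -- predicted value - actual value
			-- Derivative of errorValue^2 with respect to weight.
			gradient = gradient + 2 * errorValue * inputs[i]
		end
		-- Step against the average gradient to reduce the error.
		weight = weight - learningRate * gradient / #inputs
	end
	return weight
end

-- Made-up data that follows y = 2x, so the weight should approach 2.
local weight = fitWeight({1, 2, 3, 4}, {2, 4, 6, 8}, 0.01, 1000)
print(weight)
```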

Synitx · October 12, 2023, 1:08pm

Love your machine learning stuff, can you please make few more examples/resources using your data predict library, it seems a little bit complex to me (since I suck at machine learning)

MYOriginsWorkshop · October 12, 2023, 1:09pm

Haha sure! I just need to find some free time though. >_>

Synitx · October 12, 2023, 2:24pm

Hey, so I was playing with the module a bit.
But what does this mean?

Like if predicted value is above 0.5 then it means the player is exploiting? or does that mean something else?

MYOriginsWorkshop · October 12, 2023, 2:26pm

A player is considered normal if the player’s predicted value is above the normal threshold value. If the predicted value is lower than 0.5, then the player is exploiting.

Synitx · October 12, 2023, 2:35pm

it gives value below .5 whenever i am walking normally.


MYOriginsWorkshop · October 12, 2023, 2:39pm

Ah. That is just an example. If you really want to test it out, set every argument to nil.

It will use the normal threshold from the default model settings. Putting a value there will just override the default model settings.

Once you set that, you should see less of it being an outlier.

Synitx · October 12, 2023, 2:40pm

Oh I see, thanks for your time!

xChris_vC · October 12, 2023, 4:16pm

How’s performance in production usage? This seems like an interesting concept but likely full of false-positives and high activity impact.

MYOriginsWorkshop · October 12, 2023, 4:43pm

For the false positives, it heavily depends on the model settings you set as well as the normal threshold. For example, when choosing kernel functions, you can expect the radial basis function (RBF) to give fewer false positives compared to the sigmoid function. That is what I have observed from building and testing my library. I recommend using RBF first before trying anything else.
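
For reference, the two kernels mentioned above have standard textbook definitions, sketched below in plain Lua. This is not the code used inside ChaWatcher or DataPredict; `gamma` and `offset` are placeholder parameter names. One thing the sketch makes visible is that the RBF kernel only returns values between 0 and 1, while the sigmoid kernel returns values between -1 and 1, which may be part of why sensible thresholds differ between the two.

```lua
-- Squared Euclidean distance between two equal-length vectors.
local function squaredDistance(a, b)
	local sum = 0
	for i = 1, #a do
		local difference = a[i] - b[i]
		sum = sum + difference * difference
	end
	return sum
end

local function dotProduct(a, b)
	local sum = 0
	for i = 1, #a do
		sum = sum + a[i] * b[i]
	end
	return sum
end

-- RBF kernel: exp(-gamma * ||a - b||^2). Returns 1 for identical vectors
-- and decays towards 0 as they move apart.
local function rbfKernel(a, b, gamma)
	return math.exp(-gamma * squaredDistance(a, b))
end

-- Sigmoid kernel: tanh(gamma * (a . b) + offset). tanh is written out with
-- math.exp because math.tanh is not available in every Lua version.
local function sigmoidKernel(a, b, gamma, offset)
	local z = gamma * dotProduct(a, b) + offset
	local e = math.exp(2 * z)
	return (e - 1) / (e + 1)
end

print(rbfKernel({1, 2}, {1, 2}, 0.5))
print(sigmoidKernel({0, 0}, {1, 1}, 1, 0))
```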

In addition, you have to choose the normal threshold, and different kernel functions have different normal thresholds. For example, the lowest normal threshold value for RBF is somewhere between 0.25 and 0.4. Meanwhile, the sigmoid function will have a threshold of around -0.1. So, you need to observe and fine-tune the normal threshold until you are comfortable with the amount of false positives.
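
One simple way to do that fine-tuning is to collect predicted values from players you know are playing normally and check how many of them each candidate threshold would flag. The sketch below is plain Lua with made-up sample values and a hypothetical helper name. It only assumes the rule stated earlier in the thread: a predicted value below the normal threshold counts as an outlier.

```lua
-- Fraction of known-normal predicted values that would be flagged as
-- outliers (predicted value < normalThreshold).
local function falsePositiveRate(normalPredictedValues, normalThreshold)
	local total = #normalPredictedValues
	if total == 0 then
		return 0
	end
	local flagged = 0
	for _, predictedValue in ipairs(normalPredictedValues) do
		if predictedValue < normalThreshold then
			flagged = flagged + 1
		end
	end
	return flagged / total
end

-- Made-up predicted values from legitimate play, for illustration only.
local samples = {0.62, 0.48, 0.55, 0.31, 0.71, 0.44, 0.58, 0.39}

for _, threshold in ipairs({0.25, 0.4, 0.5}) do
	print(threshold, falsePositiveRate(samples, threshold))
end
```

With these sample values, raising the threshold from 0.25 to 0.5 flags more and more normal players, which is the trade-off to watch while tuning.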

In my experience, I managed to get a very low false positive rate. The default model settings that I have in the library managed to catch a player with 30 walkspeed (the default is 16) when combined with other movements, while maintaining next to no false positives for normal players.

With proper fine-tuning, you can really get good results.

Now in terms of performance, expect model training to be very resource intensive. It must not be done in a live server.

But for anomaly detection and data collection? It is quite negligible. I applied quite a bit of Big-O analysis to ensure the algorithms run as efficiently as possible.

Also, one of the biggest reasons why I made this library open-sourced is so that others are able to share their own ChaWatcher libraries with lower false positive rates than the ones we currently have here.

umamidayo · October 13, 2023, 6:51am

I see the video you uploaded and it shows it running on an interval, how is the performance?

And also, this looks more like a tool for developing an anti-cheat rather than an anti-cheat itself because you’re using the abnormalities to find the upper and lower bounds of the tolerance. Wouldn’t it be wiser to use it as a tool for this purpose?

MYOriginsWorkshop · October 13, 2023, 9:12am

I have stated the performance here:

Also the interval you are talking about, it uses Roblox’s RunService.

For the anti-cheat naming issue… Well, I’m just using the game industry standard where these “tools” are considered anti-cheat despite only taking advantage of machine learning element. Though, I did consider naming it as a “tool”, but I wanted to market my product.

MYOriginsWorkshop · October 13, 2023, 9:22am

Just added in some updates for the support vector machine proprietary source code. The accuracy should be slightly better now.

Don’t need to update the whole library, just change that particular file named “SupportVectorMachine” under “AqwamProprietary source code”

Also, I accidentally published a previous, weaker set of default settings. Might want to update that as well.

umamidayo · October 13, 2023, 10:00am

Also the interval you are talking about, it uses Roblox’s RunService.

Yes, I know it is RunService.Heartbeat. I’m talking about the time interval for your project, because Heartbeat passes delta time as the event parameter. In your code, you’re running the code for every heartbeat / physics frame. This is a bad practice because in DataCollector.lua, the script is raycasting 3.1 studs below the HumanoidRootPart for each player that’s alive. Server-sided raycasting on heartbeat is terrible for performance and should be used more sparsely. I recommend that you add a parameter for the developers to set the interval, so that it doesn’t lag the game.

Another thing: almost every variable is being instantiated on heartbeat. A lot of your variables for heartbeat functions can be stored in a table or dictionary of character information, not created for every heartbeat. Since you’re creating new variables on heartbeat, this increases server memory usage. I recommend that you instantiate a variable in a dictionary and change them with the heartbeat.
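
A minimal sketch of both suggestions, in plain Lua so it can be tried outside Roblox. Everything here is hypothetical (the function names, the record fields, the interval value) and is not taken from DataCollector.lua. Whether it actually helps should be confirmed by benchmarking.

```lua
-- Returns a step function meant to be called with Heartbeat's deltaTime.
-- `callback` runs at most once per `interval` seconds and receives the
-- time accumulated since its previous run.
local function makeThrottledStep(interval, callback)
	local elapsed = 0
	return function(deltaTime)
		elapsed = elapsed + deltaTime
		if elapsed < interval then
			return false -- not time yet; skip the expensive work
		end
		callback(elapsed)
		elapsed = 0
		return true
	end
end

-- One reusable record per character, updated in place instead of being
-- rebuilt on every run.
local characterInfo = {}

local function updateCharacterInfo(name, x, y, z)
	local info = characterInfo[name]
	if info == nil then
		info = { x = 0, y = 0, z = 0 }
		characterInfo[name] = info
	end
	info.x, info.y, info.z = x, y, z
	return info
end

-- Simulated frames: 12 steps of 0.25 s with a 1 s interval.
local runs = 0
local step = makeThrottledStep(1, function()
	runs = runs + 1
	updateCharacterInfo("ExamplePlayer", runs, 0, 0)
end)
for _ = 1, 12 do
	step(0.25)
end
print(runs)
```

In a Roblox script, `step` would be passed to `RunService.Heartbeat:Connect(...)`, with the raycast placed inside the callback so that it runs at the chosen interval rather than on every frame.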

Source code: GitHub - AqwamCreates/ChaWatcher: A Machine-Learning Anti-Cheat / Anomaly Detector For Roblox. Uses DataPredict Library.

For the anti-cheat naming issue… Well, I’m just using the game industry standard where these “tools” are considered anti-cheat despite only taking advantage of machine learning element. Though, I did consider naming it as a “tool”, but I wanted to market my product.

I understand that you’d like to market your product, but you can’t forget that misleading developers with the name of the product can be a negative factor in your marketing. This project works more like a tool than an anti-cheat because an anti-cheat is used to not only detect, but to prevent and to remove cheaters from your game. In this case, your anti-cheat project is more of a tool to detect cheating, which is used by developers to develop preventative measures and remove cheaters.

Other than that, I think this is a pretty solid resource; it’s doing what the creator advertised, but I would advise people to create their own anti-cheat if they want something that won’t bog down the performance. 7.5/10.

MYOriginsWorkshop · October 13, 2023, 10:19am

Ah. Thank you very much for pointing out the performance flaws to me. It’s just that I’m really limited in terms of performance knowledge and can’t really find resources that talk about these kinds of things on a deeper level. Do you have any posts or resources that I can read? Preferably ones that have speed comparisons. It would be useful for my next DataPredict library update since I’ll be focusing on optimizations.

This is another reason why I made the library somewhat open-sourced: to allow people to create an improved version of ChaWatcher library with better performance.

umamidayo · October 13, 2023, 10:36am

Unfortunately, I don’t have good resources for you to read about the performance differences on Heartbeat raycasting and declaring variables on Heartbeat. However, you may find the results by testing / benchmarking through the developer tools (F9). My word on this is not credible because I don’t have a resource / link / research on the topics, but you can test this yourself and try to find the difference between the two methods.

Use the ScriptProfiler to analyze your script’s data performance. It will give you the exact line numbers and functions for data performance. This allows you to find which function is creating large quantities.

Use the memory tab to search for “LuaHeap” and/or “Script” to know how much memory your scripts are using; you can also look for the name of your script to see how much memory is being allocated. Since you’re constantly declaring variables, this method may not matter.

Lastly, the Script Activity should tell you the cost of each script’s process. Of course, this should be used as an overall analysis of your script’s performance, so once you’ve optimized each portion, this would be used to see how performant your script is. Typically anything under 2-3% is OK.

MYOriginsWorkshop · October 14, 2023, 1:11pm

New update!

Introducing distributed computing version! Have a look at the first post for more details!