[Release 1.15] DataPredict - Machine And Deep Learning Library (Learning AIs, Generative AIs, and more!)

I guess trying the self-play method is the safest option here. When I had the neural network fight the hard-coded swordfight bot, from what I know, it would take an extremely long time for it to progress even one step.

Alright, let me know if you have any success because I’ve been trying to do this for a year already :sob:.

AlphaZero’s chess model, developed by Google DeepMind, was trained for around 9 hours using 64 TPUs. Those 9 hours are equivalent to hundreds of hours of training on a general-purpose laptop like mine.

Edit: I realized that comparing chess to swordfighting is not realistic, since chess is much more complicated than swordfighting.

You know what the crazy thing is? I’m pretty sure Roblox Studio has a cap on CPU usage, yet @PhoenixSigns was able to make a sword-fighting AI that learns in just a few hours using self-play and what I suspect to be some type of DQN algorithm. I’ve been trying to contact him, but it seems impossible. You can find it here: PhoenixSigns on X: “AI learns to sword fight through deep reinforcement learning! It improves by fighting against itself and past versions. Here is a short video of its training process and results. #Roblox #RobloxDev https://t.co/e5FbovAoEr” / X (twitter.com)

The main problem with my sword-fighting AIs is the reward calculation. How do you avoid sparse rewards while encouraging the AIs to walk towards each other? It sounds like a simple problem: just add a distance-based reward, right? Well, what if the enemy gets closer to the agent and not the other way around? That complicates things to a whole other level, because the agent can be rewarded for simply doing nothing while the enemy closes the distance. I’m not exactly sure how to solve this, sadly. Maybe inverse reinforcement learning? A neural network trained to mimic human players? That would limit its exploration, but it could work. Where would you gather the data from, though?
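
For the distance-reward part specifically, one workaround is to credit only the component of the agent’s own displacement that points toward the enemy, so the reward stays at zero when the agent stands still and the enemy walks closer. A minimal sketch (the part references, update interval, and the 0.1 scale are all assumptions, nothing from the library):

local lastAgentPosition = nil

local function getApproachReward(agentRootPart: BasePart, enemyRootPart: BasePart)
	local agentPosition = agentRootPart.Position
	local enemyPosition = enemyRootPart.Position

	if lastAgentPosition == nil then
		lastAgentPosition = agentPosition
		return 0
	end

	-- Displacement the agent itself made since the previous step.
	local agentDisplacement = agentPosition - lastAgentPosition

	-- Unit vector pointing from the agent toward the enemy.
	local directionToEnemy = (enemyPosition - agentPosition).Unit

	-- Only the part of the agent's movement that points toward the enemy counts,
	-- so the enemy walking closer on its own contributes nothing.
	local approachDistance = agentDisplacement:Dot(directionToEnemy)

	lastAgentPosition = agentPosition

	return 0.1 * math.max(approachDistance, 0)
end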

I guess the solution might be to not add intuitive rewards; although that might slow things down by a lot, the neural network would end up with a deeper understanding of swordfighting.

I would reward the neural network based on how much health it has, times the negative of how much health its opponent has.
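
A tiny sketch of that idea (names are mine; I have written it as a difference, own health minus opponent health, which captures the same intent, since the literal product form would also penalize the agent’s own health):

-- Reward rises with the agent's health and falls with the opponent's health.
-- The exact combination (difference, product, scaling) is a tuning choice.
local function getHealthReward(agentHumanoid: Humanoid, enemyHumanoid: Humanoid)
	return agentHumanoid.Health - enemyHumanoid.Health
end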


Here is my attempt using a genetic algorithm, which failed completely.

(I trained for 12 hours straight, encoded the neural network into a string, woke up the next day, decoded the string back into the neural network, ran it, and got this result.) (Ignore the error message; that came from the hard-coded bot AI.)

Probably more successful than mine. Mine experienced single-action collapse, where it chose the same action regardless of the state, and apparently even the idle punishment I set up didn’t stop that. I didn’t implement DQN in the conventional way, though, so maybe that’s what’s wrong. The conventional way is really laggy, because it trains after every action once the experience replay buffer has reached a certain number of experiences.
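
For context, the conventional setup referred to here is roughly: every transition goes into a replay buffer, and once the buffer holds enough experiences, a random mini-batch is sampled and trained on after every single action, which is where the lag comes from. A rough sketch of that bookkeeping (none of this is DataPredict’s API; the sizes are just typical defaults):

local replayBuffer = {}
local maxBufferSize = 10000
local minExperiencesBeforeTraining = 1000
local batchSize = 32

local function storeExperience(state, action, reward, nextState, isTerminal)
	table.insert(replayBuffer, {state, action, reward, nextState, isTerminal})

	-- Drop the oldest experience once the buffer is full.
	if #replayBuffer > maxBufferSize then
		table.remove(replayBuffer, 1)
	end
end

local function sampleMiniBatch()
	local miniBatch = {}

	for _ = 1, batchSize do
		table.insert(miniBatch, replayBuffer[math.random(#replayBuffer)])
	end

	return miniBatch
end

-- After every action: storeExperience(...), and once #replayBuffer exceeds
-- minExperiencesBeforeTraining, run one training step on sampleMiniBatch().
-- Doing this every step is what makes the conventional approach feel laggy.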

@Cffex and @ImNotKevPlayz, let me give you guys some tips. If you are planning to do long-term training, take advantage of BaseModel’s getModelParameters() and setModelParameters() functions, which are inherited by the reinforcement learning neural networks.

The getModelParameters() function returns the tables of matrices that hold all the weight parameters used by the neural network models.

Then you can save them in a DataStore or something.
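
For example, something along these lines (only getModelParameters() and setModelParameters() come from the library; the DataStore name and keys are placeholders):

local DataStoreService = game:GetService("DataStoreService")

local modelDataStore = DataStoreService:GetDataStore("SwordFightModel")

local function saveModel(model, key)
	-- getModelParameters() returns plain tables of weight matrices,
	-- so the result can be written to a DataStore directly.
	modelDataStore:SetAsync(key, model:getModelParameters())
end

local function loadModel(model, key)
	local modelParameters = modelDataStore:GetAsync(key)

	if modelParameters then
		model:setModelParameters(modelParameters)
	end
end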


I will attempt this again using DataPredict. I hadn’t gotten the chance to do it since every day has been busy for me, but not today.


Hello, I was wondering if it is possible to convert the matrices to a JSON string using HttpService?

Yea. The thing is those matrices are just regular tables of tables containing values. Nothing too special.

It was designed that way to avoid the hassle of converting stuff.
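
So a round-trip through JSON is just this (assuming `model` is one of the reinforcement learning models you already created):

local HttpService = game:GetService("HttpService")

-- The weight matrices are ordinary tables of tables of numbers,
-- so HttpService can serialize them without any special handling.
local jsonString = HttpService:JSONEncode(model:getModelParameters())

-- ...save or send jsonString somewhere, then later:

model:setModelParameters(HttpService:JSONDecode(jsonString))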


You can encourage the agent to fight enemies rather than idle around by:

  • Punishing the agent with a very small negative value per period of idle time. Idle time can be quite lengthy, so using large negative values wouldn’t be appropriate.

  • Rewarding the agent with very large values when it damages enemies. The reason for the large values is that hurting enemies is a rare occurrence; this should encourage the agent to seek out enemies instead of idling around (a small sketch follows this list).
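
Putting those two points together, the per-step reward could look something like this (all magnitudes are placeholders to tune, not values from the library):

-- Small penalty for idling, large reward scaled by damage dealt this step.
local IDLE_PENALTY = -0.001
local DAMAGE_REWARD_SCALE = 10

local function getReward(didMoveOrAttack: boolean, damageDealtThisStep: number)
	local reward = 0

	if not didMoveOrAttack then
		reward += IDLE_PENALTY
	end

	reward += DAMAGE_REWARD_SCALE * damageDealtThisStep

	return reward
end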

I can give you one:

local RagdollModule = {}

local ReplicatedStorage = game:GetService("ReplicatedStorage")
local Players = game:GetService("Players")
local PhysicsService = game:GetService("PhysicsService")

local Events = ReplicatedStorage:WaitForChild("Events")

local RemoteEvent = Events.Ragdoll

local Table = {"Left Shoulder","Right Shoulder","Left Hip","Right Hip","Neck"}

-- Ragdolls the character, or recovers it when bool is false and the character is alive and currently ragdolled.
function RagdollModule:Ragdoll(Character : Model, bool : boolean)
	local Humanoid : Humanoid = Character.Humanoid
	local Player = Players:GetPlayerFromCharacter(Character)
	if Character.Humanoid.Health > 0 and (Character:FindFirstChild("Torso") and not Character.Torso.Neck.Enabled) and bool == false then
		if Player then
			RemoteEvent:FireClient(Player,false)
		else
			Character.Animate.Disabled = false
			Humanoid:SetStateEnabled(Enum.HumanoidStateType.GettingUp,true)
			Humanoid:ChangeState(Enum.HumanoidStateType.GettingUp)
		end
		for _,v : Instance in ipairs(Character:GetDescendants()) do
			if v:IsA("Motor6D") then
				v.Enabled = true
			elseif v:IsA("BallSocketConstraint") then
				v.Enabled = false
			end
		end
	else
		if Player then
			RemoteEvent:FireClient(Player,true)
		else
			Humanoid:SetStateEnabled(Enum.HumanoidStateType.GettingUp,false)
			Humanoid:ChangeState(Enum.HumanoidStateType.Ragdoll)
			for _,v in ipairs(Humanoid.Animator:GetPlayingAnimationTracks()) do 
				v:Stop(0)
			end
			Character.Animate.Disabled = true
		end
	
		for _,v : Instance in ipairs(Character:GetDescendants()) do
			if v:IsA("Motor6D") and RagdollModule:Check(v.Name)then
				v.Enabled = false
			elseif v:IsA("BallSocketConstraint") then
				v.Enabled = true
			end
		end
	end
	return
end

-- Returns whether a joint name is one of the joints listed in Table.
function RagdollModule:Check(Name : string)
	if table.find(Table,Name) then
		return true
	else
		return false
	end
end

-- Prepares a character for ragdolling: creates disabled BallSocketConstraints for the listed joints and assigns collision groups.
function RagdollModule:Joints(Character : Model)
	local Humanoid : Humanoid = Character.Humanoid
	Humanoid.BreakJointsOnDeath = false
	Humanoid.RequiresNeck = false
	
	for _,v : Instance in ipairs(Character:GetDescendants()) do
		if v:IsA("Motor6D") and RagdollModule:Check(v.Name) then
			local BallSocketConstraint = Instance.new("BallSocketConstraint")
			local Attachment0 = Instance.new("Attachment")
			local Attachment1 = Instance.new("Attachment")
			
			Attachment0.CFrame = v.C0
			Attachment1.CFrame = v.C1
			Attachment0.Parent = v.Part0
			Attachment1.Parent = v.Part1
			
			BallSocketConstraint.Attachment0 = Attachment0
			BallSocketConstraint.Attachment1 = Attachment1
			BallSocketConstraint.LimitsEnabled = true
			BallSocketConstraint.TwistLimitsEnabled = true
			BallSocketConstraint.Enabled = false
			BallSocketConstraint.Parent = v.Parent
		elseif v:IsA("BasePart") then
			v.CollisionGroup = "RagdollA"
			if v.Name == "HumanoidRootPart" then
				v.CollisionGroup = "RagdollB"
			elseif v.Name == "Head"then
				v.CanCollide = true
			end
		end
	end
end

return RagdollModule

This is an R6 ragdoll I beautified for you to use easily.
Source code : Ragdoll Script R15 and R6 [PC, Mobile, Xbox] - Resources / Community Resources - Developer Forum | Roblox


I have added an Expected SARSA Neural Network to the existing Release 1.2 / Beta 1.15.0.
Too lazy to add it to the newer versions.


I’ve already tried that, but it doesn’t seem to necessarily solve the issue of sparse reward. It still collapsed to a single action. Maybe I did something wrong?

Hmm… Probably calculation issues with your code?

Maybe try my library to check whether it is the sparse reward issue, as opposed to calculation issues in your code.

Oddly enough, the AI learns how to walk toward the enemy AI if the enemy AI is frozen, so the code seems to be working. But when the enemy AI is allowed to move, the learning agent can’t learn, and the reward graph just looks random. Maybe this is because the environment is stochastic in a sense; I’ve heard that DQN is not able to learn stochastic policies.

I should try your library; I just haven’t done it yet for some reason. It looks a bit more complicated than the other library I tried, so I thought it might take some time to learn. If you could link some tutorials and code examples, that would help greatly.

I’ll just put the link to the sample source code of the model and its environment here.

Also, it isn’t really that hard to use, to be honest. You can see the example code here:

local DataPredict = require(game.ServerScriptService:WaitForChild("DataPredict  - Release Version 1.2"))

local QLearningNeuralNetwork = DataPredict.Models.QLearningNeuralNetwork.new(100, 0.1, 1000, 10, 0.45, 0.1, 0.945)

QLearningNeuralNetwork:addLayer(2, true, 'LeakyReLU') --// input
QLearningNeuralNetwork:addLayer(6, true, 'LeakyReLU') --// hidden 1
QLearningNeuralNetwork:addLayer(5, true, 'LeakyReLU') --// hidden 2
QLearningNeuralNetwork:addLayer(3, false, 'sigmoid') --// output

QLearningNeuralNetwork:setClassesList({"A", "B", "C"}) --// output's neuron classes

QLearningNeuralNetwork:setPrintReinforcementOutput(false)

local prediction, probability = QLearningNeuralNetwork:reinforce({{1, 2, 3}}, -0.01, false)

print(prediction, probability)

You can have a look at the functions in the API documentation. Make sure you also read the NeuralNetwork one as well, since all the reinforcement learning neural networks inherit their properties from it.


Wait, looking at the code, you’re using sigmoid for the output layer of the DQN, right? Aren’t you supposed to leave the output layer without an activation function, since DQN outputs the expected discounted cumulative reward for each action, essentially treating RL as a regression problem?
