@Cffex and @ImNotKevPlayz, let me give you guys some tips. If you are planning to do long-term training, take advantage of BaseModel’s getModelParameters() and setModelParameters() functions, which are inherited by the reinforcement learning neural networks.
getModelParameters() returns the tables of matrices that hold all the weight parameters used by the neural network model, and setModelParameters() loads a saved set of those matrices back into a model.
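As a rough sketch of what I mean (the constructor arguments and require path below just mirror the example further down this thread; only the getModelParameters() / setModelParameters() calls are the point here):
--// Hedged sketch: reuse trained weights across sessions instead of retraining from scratch.
local DataPredict = require(game:GetService("ServerScriptService"):WaitForChild("DataPredict - Release Version 1.2"))

local Model = DataPredict.Models.QLearningNeuralNetwork.new(100, 0.1, 1000, 10, 0.45, 0.1, 0.945)

--// ...add your layers and run the training episodes here...

--// Grab the tables of weight matrices once training is done (or on a timer while it runs).
local ModelParameters = Model:getModelParameters()

--// Later, after rebuilding a model with the same layer structure,
--// load the saved matrices back in so you don't start from random weights again.
Model:setModelParameters(ModelParameters)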
You can encourage the agent to fight enemies instead of idling around by doing two things (a rough code sketch follows below):
Punish the agent with very small negative values for each period of time it spends idling. The idle time can be quite lengthy, so using large negative values wouldn’t be appropriate here.
Reward the agent with very large values when it damages enemies. The values are large because hurting an enemy is a rare occurrence, so this should encourage the agent to seek out enemies instead of idling around.
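In code, that shaping could look something like this (the magnitudes, the environment vector and the damagedEnemy flag are placeholders; the model is the one built in the example further down):
--// Hedged sketch of the reward shaping described above; values and variables are illustrative.
--// QLearningNeuralNetwork here is the model constructed as in the example further down this thread.
local IDLE_PENALTY = -0.01  --// small, because the agent idles on almost every step
local DAMAGE_REWARD = 10    --// large, because landing a hit is a rare event

local damagedEnemy = false  --// set this to true from your damage-dealing code

while task.wait(0.1) do
	local environmentVector = {{0, 0, 0}} --// replace with your real observations

	local reward = IDLE_PENALTY
	if damagedEnemy then
		reward = DAMAGE_REWARD
		damagedEnemy = false
	end

	--// Feed the shaped reward to the model on every step.
	QLearningNeuralNetwork:reinforce(environmentVector, reward)
end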
local RagdollModule = {}

local ReplicatedStorage = game:GetService("ReplicatedStorage")
local Players = game:GetService("Players")
local PhysicsService = game:GetService("PhysicsService")

local Events = ReplicatedStorage:WaitForChild("Events")
local RemoteEvent = Events.Ragdoll

--// Joints that get swapped between Motor6D and BallSocketConstraint when ragdolling
local Table = {"Left Shoulder", "Right Shoulder", "Left Hip", "Right Hip", "Neck"}

function RagdollModule:Ragdoll(Character: Model, bool: boolean)
	local Humanoid: Humanoid = Character.Humanoid
	local Player = Players:GetPlayerFromCharacter(Character)

	if Humanoid.Health > 0 and (Character:FindFirstChild("Torso") and not Character.Torso.Neck.Enabled) and bool == false then
		--// Unragdoll: hand control back to the client for players, or restore the NPC directly
		if Player then
			RemoteEvent:FireClient(Player, false)
		else
			Character.Animate.Disabled = false
			Humanoid:SetStateEnabled(Enum.HumanoidStateType.GettingUp, true)
			Humanoid:ChangeState(Enum.HumanoidStateType.GettingUp)
		end

		for _, v: Instance in ipairs(Character:GetDescendants()) do
			if v:IsA("Motor6D") then
				v.Enabled = true
			elseif v:IsA("BallSocketConstraint") then
				v.Enabled = false
			end
		end
	else
		--// Ragdoll: disable the listed Motor6Ds and enable the ball socket constraints
		if Player then
			RemoteEvent:FireClient(Player, true)
		else
			Humanoid:SetStateEnabled(Enum.HumanoidStateType.GettingUp, false)
			Humanoid:ChangeState(Enum.HumanoidStateType.Ragdoll)

			for _, v in ipairs(Humanoid.Animator:GetPlayingAnimationTracks()) do
				v:Stop(0)
			end

			Character.Animate.Disabled = true
		end

		for _, v: Instance in ipairs(Character:GetDescendants()) do
			if v:IsA("Motor6D") and RagdollModule:Check(v.Name) then
				v.Enabled = false
			elseif v:IsA("BallSocketConstraint") then
				v.Enabled = true
			end
		end
	end
end

function RagdollModule:Check(Name: string)
	return table.find(Table, Name) ~= nil
end

function RagdollModule:Joints(Character: Model)
	local Humanoid: Humanoid = Character.Humanoid
	Humanoid.BreakJointsOnDeath = false
	Humanoid.RequiresNeck = false

	for _, v: Instance in ipairs(Character:GetDescendants()) do
		if v:IsA("Motor6D") and RagdollModule:Check(v.Name) then
			--// Build a disabled BallSocketConstraint that mirrors each listed Motor6D joint
			local BallSocketConstraint = Instance.new("BallSocketConstraint")
			local Attachment0 = Instance.new("Attachment")
			local Attachment1 = Instance.new("Attachment")

			Attachment0.CFrame = v.C0
			Attachment1.CFrame = v.C1
			Attachment0.Parent = v.Part0
			Attachment1.Parent = v.Part1

			BallSocketConstraint.Attachment0 = Attachment0
			BallSocketConstraint.Attachment1 = Attachment1
			BallSocketConstraint.LimitsEnabled = true
			BallSocketConstraint.TwistLimitsEnabled = true
			BallSocketConstraint.Enabled = false
			BallSocketConstraint.Parent = v.Parent
		elseif v:IsA("BasePart") then
			v.CollisionGroup = "RagdollA"

			if v.Name == "HumanoidRootPart" then
				v.CollisionGroup = "RagdollB"
			elseif v.Name == "Head" then
				v.CanCollide = true
			end
		end
	end
end

return RagdollModule
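In case it helps, here is a hypothetical way to use that module from a server script (the module location and the collision group setup are assumptions on my part):
--// Hypothetical usage sketch; the module path and the group rules are assumed.
local PhysicsService = game:GetService("PhysicsService")
local ServerScriptService = game:GetService("ServerScriptService")

local RagdollModule = require(ServerScriptService:WaitForChild("RagdollModule"))

--// The module assigns these collision groups, so they need to be registered somewhere.
PhysicsService:RegisterCollisionGroup("RagdollA")
PhysicsService:RegisterCollisionGroup("RagdollB")
--// ...set whatever collidability rules you want between the two groups here...

local function setupCharacter(character: Model)
	RagdollModule:Joints(character) --// build the ball socket constraints once per character

	character:WaitForChild("Humanoid").Died:Connect(function()
		RagdollModule:Ragdoll(character, true) --// true = ragdoll, false = stand back up
	end)
end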
I’ve already tried that, but it doesn’t really seem to solve the sparse reward issue. The agent still collapsed to a single action. Maybe I did something wrong?
Oddly enough, the AI learns how to walk toward the enemy AI if the enemy AI is frozen, so the code seems to be working. But when the enemy AI is allowed to move, the learning agent can’t learn, and the reward graph looks random. Maybe this is because the environment is stochastic in a sense. I’ve heard that DQN is not able to learn stochastic policies.
I should try your library; I just haven’t gotten around to it yet. It looks a bit more complicated than the other library I tried, so I thought it might take some time to learn. If you can link some tutorials and code examples, that would help greatly.
I’ll just put the link to the sample source code of the model and its environment here.
Also it isn’t really that hard to use to be honest. You can see the example code here:
local DataPredict = require(game:GetService("ServerScriptService"):WaitForChild("DataPredict - Release Version 1.2"))
local QLearningNeuralNetwork = DataPredict.Models.QLearningNeuralNetwork.new(100, 0.1, 1000, 10, 0.45, 0.1, 0.945)
QLearningNeuralNetwork:addLayer(2, true, 'LeakyReLU') --// input
QLearningNeuralNetwork:addLayer(6, true, 'LeakyReLU') --// hidden 1
QLearningNeuralNetwork:addLayer(5, true, 'LeakyReLU') --// hidden 2
QLearningNeuralNetwork:addLayer(3, false, 'sigmoid') --// output
QLearningNeuralNetwork:setClassesList({"A", "B", "C"}) --// output's neuron class
QLearningNeuralNetwork:setPrintReinforcementOutput(false)
local prediction, probability = QLearningNeuralNetwork:reinforce({{1, 2, 3}}, -0.01, false)
print(prediction, probability)
You can have a look at the functions in the API documentation. Make sure you read the NeuralNetwork page as well, since all the reinforcement learning neural networks inherit their properties from it.
Wait, looking at the code, you’re using sigmoid for the output layer of the DQN, right? Aren’t you supposed to leave the output layer without an activation function? A DQN outputs the expected discounted cumulative reward for each action, essentially treating RL as a regression problem, so a sigmoid would squash those values into the 0 to 1 range.
Perhaps the data you are feeding it isn’t the data you want. A more consistent pattern would be the enemy’s offset relative to the bot’s current position. That way, instead of getting some seemingly random position in space, you get a much more consistent stream of pattern data.
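A tiny sketch of what I mean (bot and enemy stand in for your two character models; the helper name is made up):
--// Illustrative only: feed the enemy's offset from the bot rather than raw world positions.
local function getEnemyOffset(botCharacter: Model, enemyCharacter: Model): Vector3
	--// The same relative situation now produces the same input,
	--// no matter where on the map the fight happens.
	return enemyCharacter.HumanoidRootPart.Position - botCharacter.HumanoidRootPart.Position
end

local offset = getEnemyOffset(bot, enemy) --// bot/enemy are your two character models
local environmentVector = {{offset.X, offset.Y, offset.Z}} --// pass this to reinforce()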
I’m already doing that. That isn’t necessarily the issue. The issue in my opinion is either a bug in the code that only appears when switching to a more complex environment (enemy AI allowed to move) or an issue with the reward structure. I’m thinking of recoding the entire project using this library to prevent bugs and simplify the code.
The bot definitely has to train to learn, and by nature it will make a lot of mistakes until it has enough data.
It’s important that you save your model’s data.
As your model grows in size, keep in mind that each data store save key has a maximum size of 4 MB, so compacting your data structure in any way you can is best.
function SaveDataPairs(Datastorkey, ChatBotData, Querie)
	-- Create an empty table to store the data pairs
	local DataPairs = {}

	-- Iterate over the children of the Queries folder
	for _, child in ipairs(Querie:GetChildren()) do
		-- Check if the child is a string value object
		if child:IsA("StringValue") then
			-- Add the child's name and value to the table as a key-value pair
			DataPairs[child.Name] = child.Value
		end
	end

	-- Save the table to the data store under the given key
	local success, result = pcall(function()
		ChatBotData:SetAsync(Datastorkey, DataPairs)
	end)

	if success then
		print("Data pairs saved successfully")
	else
		warn("Data pairs failed to save: " .. result)
	end
end
datachange = true
local datachangehandle = {}

function LoadDataPairs(Datastorkey, ChatBotData, Querie)
	-- Load the table from the data store with the same key that was used to save it
	local success, result = pcall(function()
		return ChatBotData:GetAsync(Datastorkey)
	end)

	if success then
		if result then
			print("Data pairs loaded successfully")

			-- Iterate over the table
			for name, value in pairs(result) do
				-- Create a new string value object in the Queries folder with the name and value from each pair
				if Querie:FindFirstChild(name) == nil then
					local datachanged = true
					local StringValue = Instance.new("StringValue", Querie)
					StringValue.Name = name
					StringValue.Value = value

					datachangehandle[Datastorkey] = function()
						if datachanged == true then
							datachanged = false
							return Datastorkey, ChatBotData, Querie
						else
							return false
						end
					end

					-- Learning function: flag the data as changed whenever the value updates
					StringValue:GetPropertyChangedSignal("Value"):Connect(function()
						datachanged = true
						datachange = true
					end)
				else
					Querie:FindFirstChild(name).Value = value

					Querie:FindFirstChild(name):GetPropertyChangedSignal("Value"):Connect(function()
						datachanged = true
						datachange = true
					end)
				end
			end
		else
			print("No data pairs found for this key")
		end
	else
		warn("Data pairs failed to load: " .. result)
	end
end
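A hypothetical way to wire those two functions up (the data store name, the key, and the Queries folder location are placeholders):
--// Hypothetical usage; the names below are placeholders.
local DataStoreService = game:GetService("DataStoreService")
local ServerStorage = game:GetService("ServerStorage")

local ChatBotData = DataStoreService:GetDataStore("ChatBotData")
local Queries = ServerStorage:WaitForChild("Queries")

--// Restore the previous session's pairs on startup...
LoadDataPairs("ChatBotQueries", ChatBotData, Queries)

--// ...then periodically write back whenever something was flagged as changed.
task.spawn(function()
	while task.wait(60) do
		if datachange then
			datachange = false
			SaveDataPairs("ChatBotQueries", ChatBotData, Queries)
		end
	end
end)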
You can convert the model’s matrix tables to a string using a table-to-string serialization method before saving them.
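For instance, a rough sketch that combines getModelParameters() with JSON serialization and a data store (the store name and key are placeholders; keep the 4 MB per-key limit mentioned above in mind):
--// Rough sketch; "ModelStore" and "BotModel" are placeholder names.
local HttpService = game:GetService("HttpService")
local DataStoreService = game:GetService("DataStoreService")

local ModelStore = DataStoreService:GetDataStore("ModelStore")

local function saveModel(model)
	--// getModelParameters() returns the tables of weight matrices; JSONEncode turns them into one string.
	local serialized = HttpService:JSONEncode(model:getModelParameters())

	local success, errorMessage = pcall(function()
		ModelStore:SetAsync("BotModel", serialized)
	end)
	if not success then warn("Model save failed: " .. tostring(errorMessage)) end
end

local function loadModel(model)
	local success, saved = pcall(function()
		return ModelStore:GetAsync("BotModel")
	end)
	if success and saved then
		model:setModelParameters(HttpService:JSONDecode(saved))
	end
end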
The bot simply doesn’t learn at all. I’ve trained it for 3000+ rounds before, which is more than enough data, and yet it failed to learn anything beyond taking the same action again and again. This points to an issue with the implementation, the hyperparameters, or the reward structure; maybe it’s all of them. A DQN shouldn’t take that long to converge on a policy, and if it keeps taking the same action over and over, something is wrong. Plus, the loss never showed signs of increasing or fluctuating the way a normal DQN’s would.
Yeah, that’s what I thought as well. The loss looked like an exponential curve dipping down and never really spiked at all. But like I said earlier, the strange thing is that the loss looked healthy when I restarted training with the enemy AI frozen. It was even able to learn how to walk toward the enemy.