Not sure if this is the right category for this, but basically I need help designing a reward function that encourages two sword-fighting AIs to walk toward each other. This is for my deep reinforcement learning sword-fighting AI project. The problem sounds simple on paper, but it gets more complex once you realize the enemy is a moving target too. So I am a bit lost on how to design a reward function that rewards, say, agent A for walking toward agent B. A simple distance-based function doesn’t seem to work, because agent A could stand still and do nothing while agent B walks toward it, rewarding agent A for doing nothing. There’s also the question of whether agent A should still be rewarded if it moved some number of steps toward agent B, but agent B moved away from it in the same timeframe.
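One idea I’m toying with that I think sidesteps both issues is to reward only my own agent’s velocity component toward the enemy: standing still scores exactly zero no matter what the enemy does, and walking toward a retreating enemy still scores positive. A rough sketch (Self and Enemy are the same rigs as in my code below; the 16 normalizer is just the default Humanoid WalkSpeed, so it’s a guess for my setup):

local SelfHRP = Self.HumanoidRootPart
local EnemyHRP = Enemy.HumanoidRootPart
-- Flatten the direction so height differences don't matter
local ToEnemy = (EnemyHRP.Position - SelfHRP.Position) * Vector3.new(1, 0, 1)
if ToEnemy.Magnitude > 0.001 then
    -- Project my own velocity onto the direction to the enemy:
    -- positive when closing the gap, zero when idle, negative when backing off
    local ClosingSpeed = SelfHRP.AssemblyLinearVelocity:Dot(ToEnemy.Unit)
    Reward += ClosingSpeed / 16
end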
For more context, I have a gif showing the environment the AIs are trained in: https://gyazo.com/0e6ba8cd39039c60bb9450b05c92323e
There’s also a weird glitch where the AIs somehow fling themselves super far at the start of the round. Not sure how this happens. Might be because of AlignOrientation.
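If it does turn out to be the AlignOrientation, one thing I might try is capping its torque instead of leaving it rigid, so the initial correction can’t launch the rig. These are stock AlignOrientation properties, but the exact values are guesses:

local Align = Self.HumanoidRootPart:FindFirstChildOfClass('AlignOrientation')
if Align then
    Align.RigidityEnabled = false -- rigid mode applies whatever force is needed, which can snap hard on spawn
    Align.MaxTorque = 10000 -- cap the corrective torque (guess)
    Align.MaxAngularVelocity = 10 -- limit how fast it can spin the rig (guess)
    Align.Responsiveness = 20 -- softer approach to the target orientation (guess)
end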
Reward function right now:
if AgentMoved then
    local Vel = Self.HumanoidRootPart.AssemblyLinearVelocity
    if Vel.Magnitude > Threshold then
        LastIdleTime = os.clock()
        if NewDistance < OldDistance then
            Reward = 1 / NewDistance
        elseif NewDistance > OldDistance then
            -- 1572.77186483 normalizes the squared distance
            Reward = (DefaultParams.MovingBackPunishmentFactor * (NewDistance^2 / 1572.77186483) - 0.1)
        end
        --Reward = -0.00001 * NewDistance
    else
        -- Penalize being idle for 2 seconds or more
        if LastIdleTime == nil or os.clock() - LastIdleTime >= 2 then
            Reward = -0.1
        end
    end
end
-- Facing reward: small penalty when the squared rotation error grows, reward when it shrinks
local CurRotDif = (Target - CurRot)^2
if CurRotDif > OldRotDif then
    Reward += -0.000001 * CurRotDif
else
    Reward += 0.5 / (1 + CurRotDif)
end
if NewHealth ~= OldHealth or NewEnemyHealth ~= OldEnemyHealth then
    local HealthLostReward = -(1 - (NewHealth / Self.Humanoid.MaxHealth)^2)
    local DamagedReward = 1 - (NewEnemyHealth / Enemy.Humanoid.MaxHealth)^2
    Reward = HealthLostReward + DamagedReward -- note: this overwrites the movement/rotation reward above
end
local Terminate = false
if Self.Humanoid.Health <= 0 and Enemy.Humanoid.Health > 0 then
    Reward = -1 -- lost the round
    Terminate = true
elseif Self.Humanoid.Health > 0 and Enemy.Humanoid.Health <= 0 then
    Reward = 1 -- won the round
    Terminate = true
end
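For the distance terms specifically, another option I’ve read about is potential-based reward shaping (Ng et al., 1999): adding F = Gamma * Phi(new) - Phi(old) with Phi = -distance is provably policy-invariant, so even though the agent still collects some shaping reward while the enemy walks toward it, the shaping telescopes over an episode and can’t make standing still the optimal policy. A minimal sketch on top of my existing OldDistance/NewDistance variables (Gamma is my discount factor and ShapingScale is a hypothetical tuning weight; neither is in my code yet):

-- Potential-based shaping: total shaping reward depends only on start and end distance
local Phi_Old = -OldDistance
local Phi_New = -NewDistance
Reward += ShapingScale * (Gamma * Phi_New - Phi_Old)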
State function:
local function CalculateInputs()
    local Inputs = {}
    assert(Self ~= Enemy, 'Cannot set self to enemy!')
    local SelfHRP = Self.HumanoidRootPart
    local EnemyHRP = Enemy.HumanoidRootPart

    -- Position offset to the enemy, normalized per axis
    table.insert(Inputs, (SelfHRP.Position.X - EnemyHRP.Position.X) / 29.15)
    table.insert(Inputs, (SelfHRP.Position.Y - EnemyHRP.Position.Y) / 7.5)
    table.insert(Inputs, (SelfHRP.Position.Z - EnemyHRP.Position.Z) / 29.15)

    -- Yaw of each agent in degrees, then the normalized difference
    local RotY = math.atan2(SelfHRP.CFrame.LookVector.X, SelfHRP.CFrame.LookVector.Z)
    RotY = math.deg(RotY) % 360
    local EnemyRotY = math.atan2(EnemyHRP.CFrame.LookVector.X, EnemyHRP.CFrame.LookVector.Z)
    EnemyRotY = math.deg(EnemyRotY) % 360
    local StandardizedDif = (RotY - EnemyRotY) / 208.15
    table.insert(Inputs, StandardizedDif)

    -- Attack flags, inferred from whether the swing/lunge sounds are playing
    local SelfAttacking = 0
    if script.Parent.ClassicSword.Handle.SwordSlash.Playing or script.Parent.ClassicSword.Handle.SwordLunge.Playing then
        SelfAttacking = 1
    end
    table.insert(Inputs, SelfAttacking)

    local EnemyAttacking = 0
    local FoundSword = Enemy:FindFirstChildWhichIsA('Tool')
    if FoundSword and string.match(string.lower(FoundSword.Name), 'sword') ~= nil then
        if FoundSword.Handle.SwordSlash.Playing or FoundSword.Handle.SwordLunge.Playing then
            EnemyAttacking = 1
        end
    end
    table.insert(Inputs, EnemyAttacking)

    -- Enemy linear velocity, normalized per axis
    table.insert(Inputs, EnemyHRP.AssemblyLinearVelocity.X / 20.5)
    table.insert(Inputs, EnemyHRP.AssemblyLinearVelocity.Y / 53.15)
    table.insert(Inputs, EnemyHRP.AssemblyLinearVelocity.Z / 22.1)

    -- Enemy yaw angular velocity, normalized
    table.insert(Inputs, EnemyHRP.AssemblyAngularVelocity.Y / 84.75482177734375)

    -- Health fractions
    table.insert(Inputs, Self.Humanoid.Health / Self.Humanoid.MaxHealth)
    table.insert(Inputs, Enemy.Humanoid.Health / Enemy.Humanoid.MaxHealth)

    -- Jump flags
    table.insert(Inputs, Self.Humanoid.Jump and 1 or 0)
    table.insert(Inputs, Enemy.Humanoid.Jump and 1 or 0)

    return Inputs
end
I have also looked at this implementation: Roblox Sword Fighting AI (Q-Learning) - YouTube, and I am considering using the same type of reward structure they used, which is essentially a step function that takes the distance to the enemy as input:
Reward = -1
if Magnitude > 30 then
    Reward -= 1000
elseif Magnitude > 15 then
    Reward -= (Magnitude - 15)^2
end
My primary question about this method, though, is why it works. Doesn’t it suffer from the same issue I described above? (The agent gets rewarded for doing nothing and simply waiting for the enemy to come to it, which doesn’t encourage it to learn to move toward the enemy.) After all, it intuitively seems easier for the agent to learn to stand still than to walk toward the enemy.