AIs not Improving Genetic Algorithm Bullet Shooter Simulation

So, I’ve been trying to implement a successful genetic algorithm for my little bullet shooter game. The rules are simple: there are boxes, each containing 2 AIs that fight 1v1 against each other. There is no health; instead, the fitness of each individual is determined by the number of points scored (bullets landed on the enemy) and the amount of damage taken (-3 fitness every time they get hit by an enemy bullet).

The AIs fight each other for 10 seconds, then the algorithm performs basic tournament selection: it chooses 2 random individuals and includes the victor in a mating pool. It repeats this process 4 times. Then it picks random individuals from the mating pool and performs crossover & mutation to produce the next generation.
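The tournament step described above can be sketched in plain Lua like this (a simplified sketch; `individuals` and `fitnesses` stand in for whatever structures the real code uses):

```lua
-- Basic tournament selection: repeatedly pit two random individuals
-- against each other and put the victor into the mating pool.
local function TournamentSelect(individuals, fitnesses, rounds)
	local matingPool = {}
	for _ = 1, rounds do
		local a = math.random(#individuals)
		local b = math.random(#individuals)
		if fitnesses[a] >= fitnesses[b] then
			table.insert(matingPool, individuals[a])
		else
			table.insert(matingPool, individuals[b])
		end
	end
	return matingPool
end
```

Crossover and mutation would then draw random parents from the returned pool.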

The issue is, I left the simulation running for quite a long time and the AIs didn’t seem to improve at all. I think the most likely cause would be the fitness function, but I don’t know a way to resolve this. I’m not sure exactly how to define a good fitness function to evaluate the performance of each individual without it having issues. Right now, the AI seems to value spinning in circles and mass shooting or doing random movements in general.

Video:
https://gyazo.com/d504f4f54ca85928a0abf54aa5dff3a3

Is there any input to the algorithm? Do agents have vision or a concept of where things are?

What do your weights, which you’re training, represent in this model?

Have you tried other values for your fitness function?

It’s hard to help much more without code to see 🙂

Hi, thanks for the reply. The agents have 15 inputs. The first 12 are little lines that detect things such as bullets or the enemy. Here is an image to demonstrate what I mean:
(screenshot: the 12 sensor lines around the agent)
The last 3 inputs are:
CanShoot - Tells the AI whether it can shoot or not
DifInX - The difference in “X”, really just a normalized absolute difference along the Z axis.
DifInY - The difference in “Y”, again a normalized absolute difference along the X axis.

What do your weights, which you’re training, represent in this model?

I’m not sure exactly what you mean by this question. Each agent uses a neural network, so the weights are essentially the connection strengths between the neurons of the network. Some clarification would help.

Also, I have tried other values for my fitness function. I tried encouraging the AIs to shoot by giving them a small reward for each bullet fired, but that caused spam-like shooting behavior. My original fitness function was based purely on points scored, i.e. bullets landed on the enemy. My latest idea was to penalize the agents for each movement, to hopefully encourage more efficient movement and discourage spinning in circles and spamming bullets. Also, sorry for not providing code earlier. It’s quite a bit of code, but here it is:

script.Parent.AlignPosition.Position = Vector3.new(script.Parent.Position.X,0.5,script.Parent.Position.Z)

local SS = game:GetService('ServerStorage')
local LastShot = nil
local MovementSpeed = 2.5
local TurnSpeed = 5

local function MoveForward(Amount)
	script.Parent.AlignPosition.Position += Vector3.new(script.Parent.CFrame.LookVector.X,0,script.Parent.CFrame.LookVector.Z) * Amount
end

local function MoveBackward(Amount)
	-- flatten the Y component so backward movement stays level, matching MoveForward
	script.Parent.AlignPosition.Position -= Vector3.new(script.Parent.CFrame.LookVector.X,0,script.Parent.CFrame.LookVector.Z) * Amount
end

local function TurnLeft(Amount)
	script.Parent.AlignOrientation.CFrame = CFrame.Angles(0,math.rad(script.Parent.Orientation.Y + Amount),math.rad(-90))
end

local function TurnRight(Amount)
	script.Parent.AlignOrientation.CFrame = CFrame.Angles(0,math.rad(script.Parent.Orientation.Y - Amount),math.rad(-90))
end

local function Shoot()
	LastShot = os.clock()
	local Clone = SS.RedBullet:Clone()
	Clone.CFrame = script.Parent.CFrame + script.Parent.CFrame.LookVector * 4
	Clone.Parent = workspace
	
	coroutine.wrap(function()
		--script.Parent.BillboardGui.Score.Text = tostring(script.Parent.BillboardGui.Score.Text + 0.1)
		
		repeat
			if Clone.Parent == nil then break end
			local TouchingParts = workspace:GetPartsInPart(Clone,OverlapParams.new())

			for i,v in pairs(TouchingParts) do
				if v.Parent.Name == 'Blue' then
					Clone:Destroy()
					
					script.Parent.Parent.Parent.Blue.Head.BillboardGui.Score.Text = tostring(script.Parent.Parent.Parent.Blue.Head.BillboardGui.Score.Text - 3)
					script.Parent.BillboardGui.Score.Text = tostring(script.Parent.BillboardGui.Score.Text + 3)
					break -- stop after the first hit so the score isn't applied twice
				end
			end
			
			task.wait()
		until Clone.Parent == nil
	end)()
end

local function UpdateInputs()
	local RetTable = {}
	
	for i = 1,#script.Parent.Parent.Inputs:GetChildren() do
		local SelectedPart = script.Parent.Parent.Inputs[tostring(i)]
		local TouchingParts = workspace:GetPartsInPart(SelectedPart,OverlapParams.new())
		local Value = 0
		
		for _,part in pairs(TouchingParts) do
			if part.Name == 'BlueBullet' then
				Value = -1
				break
			elseif part.Parent.Name == 'Blue' then
				Value = -3
				break
			end
		end
		
		-- always insert exactly one value per sensor so the input vector stays aligned,
		-- even when the sensor touches nothing
		table.insert(RetTable,Value)
	end
	
	if LastShot == nil or os.clock() - LastShot >= 0.25 then
		table.insert(RetTable,1)
	else
		table.insert(RetTable,0)
	end
	
	local DifInX = math.abs(script.Parent.Position.Z - script.Parent.Parent.Parent.Blue.Head.Position.Z)
	local DifInY = math.abs(script.Parent.Position.X - script.Parent.Parent.Parent.Blue.Head.Position.X)
	
	DifInX = (DifInX - 30) / 30
	DifInY = (DifInY - 30) / 30
	
	table.insert(RetTable,DifInX)
	table.insert(RetTable,DifInY)
	
	return RetTable
end

local function Sigmoid(x)
	return 1 / (1 + math.exp(-x))
end

local function Tanh(x)
	return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
end

local function Relu(x)
	if x > 0 then
		return x
	else
		return 0
	end
end

local NumberofInputs = 15
local NumberofHiddenNodes = NumberofInputs + 2
local NumberofOutputNodes = 3
local Rand = Random.new()
local RandomStart = true
local Individuals = {}
local Fitnesses = {}
local Weights = {}
local Stop = script.Parent.Parent.Parent.Stop
local Winner = script.Parent.Parent.Parent.Winner
-- REMEMBER TO INCLUDE BIASES, NOT JUST WEIGHTS

local function RandomizeWeightsBiases()
	Weights = {
		['In-Hidden1'] = {},
		['Hid1-Hidden2'] = {},
		['Hid2-Output'] = {}
	}
	
	for i = 1,NumberofInputs * NumberofHiddenNodes + NumberofHiddenNodes do
		table.insert(Weights['In-Hidden1'],Rand:NextNumber(-3,3))
	end

	for i = 1,NumberofHiddenNodes * NumberofHiddenNodes + NumberofHiddenNodes do
		table.insert(Weights['Hid1-Hidden2'],Rand:NextNumber(-3,3))
	end

	for i = 1,NumberofHiddenNodes * NumberofOutputNodes + NumberofOutputNodes do
		table.insert(Weights['Hid2-Output'],Rand:NextNumber(-3,3))
	end
end

script.Parent.Parent.ChangeWeightsAndBiases.Event:Connect(function(WeightsBiases)
	if not WeightsBiases then
		RandomizeWeightsBiases()
	else
		Weights = WeightsBiases
	end
end)

script.Parent.Parent.RequestWeightsAndBiases.OnInvoke = function()
	return Weights
end

script.Parent.Parent.Parent.StartSimulation.Event:Connect(function(WeightsBiases)
	repeat
		local Inputs = UpdateInputs()
		local InHid1 = Weights['In-Hidden1']
		local Counter = 1
		local HidLay1Vals = {}
		local WS = 0
		
		for i,v in pairs(InHid1) do
			if Counter < NumberofInputs + 1 then
				WS += Inputs[Counter] * v
			else
				WS += v
				Counter = 0
				WS = Relu(WS)
				table.insert(HidLay1Vals,WS)
				WS = 0
			end
			
			Counter += 1
		end
		
		local Hid1Hid2 = Weights['Hid1-Hidden2']
		Counter = 1
		local HidLay2Vals = {}
		WS = 0
		
		for i,v in pairs(Hid1Hid2) do
			if Counter < NumberofHiddenNodes + 1 then
				WS += HidLay1Vals[Counter] * v
			else
				WS += v
				Counter = 0
				WS = Relu(WS)
				table.insert(HidLay2Vals,WS)
				WS = 0
			end
			
			Counter += 1
		end
		
		local Hid2Output = Weights['Hid2-Output']
		Counter = 1
		local RawOutputVals = {}
		WS = 0
		
		for i,v in pairs(Hid2Output) do
			if Counter < NumberofHiddenNodes + 1 then
				WS += HidLay2Vals[Counter] * v
			else
				WS += v
				Counter = 0
				table.insert(RawOutputVals,WS)
				WS = 0
			end
			
			Counter += 1
		end
		
		for i,v in pairs(RawOutputVals) do
			RawOutputVals[i] = v / 12
		end
		
		local RefinedOutputVals = {}
		local FinalOutputs = {}
		
		table.insert(RefinedOutputVals,Tanh(RawOutputVals[1]))
		table.insert(RefinedOutputVals,Tanh(RawOutputVals[2]))
		table.insert(RefinedOutputVals,Sigmoid(RawOutputVals[3]))
		
		table.insert(FinalOutputs,math.round(RefinedOutputVals[1]))
		table.insert(FinalOutputs,math.round(RefinedOutputVals[2]))
		table.insert(FinalOutputs,math.round(RefinedOutputVals[3]))
		
		MoveForward(FinalOutputs[1] * MovementSpeed)
		TurnRight(FinalOutputs[2] * TurnSpeed)
		
		if FinalOutputs[3] == 1 then
			if LastShot == nil or os.clock() - LastShot >= 0.25 then
				Shoot()
			end
		end
		
		task.wait()
	until Stop.Value == true
end)

(Code for Red AI, the code for the Blue AI is essentially the same, with just names switched)

To my understanding of the code, it seems that you are rounding the output values. In my experience with neural networks, rounding is harmful because it prevents the network from expressing the precise values it needs in order to improve gradually. Also, you could try different activation functions (like leaky ReLU, sigmoid, etc.); some activation functions perform better than others.
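To illustrate the rounding issue with a self-contained sketch (plain Lua, with a stand-in for Luau’s `math.round`): a whole band of meaningfully different activations collapses to the same action, so small mutations to the weights produce no change in behavior at all.

```lua
local function tanh(x)
	return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
end

-- stand-in for Luau's math.round
local function round(x)
	return math.floor(x + 0.5)
end

-- tanh(0.1) ~ 0.10 and tanh(0.4) ~ 0.38 are quite different signals,
-- but both round to 0: the agent either moves at full speed or not at all.
print(round(tanh(0.1)), round(tanh(0.4)))
```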

A genetic algorithm is DIFFERENT from backpropagation. Both aim for the same goal but use different methods, and they’re used in different situations. A genetic algorithm is used in cases where you cannot determine the correct answer to a problem but you can evaluate the fitness of the AI, which is closely related to reinforcement learning. A genetic algorithm uses evolution to improve the neural network instead of calculating the partial derivative of the cost function with respect to each weight and bias. Also, I don’t think the problem lies within the neural network structure itself. Rather, I think this random-looking behavior is caused by a lack of useful inputs and a bad fitness evaluation.

Honestly, the fitness algorithm is such a pain in the head for me. I might just resort to manual selection. For a bullet shooter simulation like this, I don’t think there is a good way to determine the score of each individual without running into problems. Right now, the majority of AIs get a score of 0, or a super inflated score due to them going behind the enemy and mass shooting by pure luck.

Maybe you could try editing your fitness function to encourage shooting near the enemy. For example, give the agents a score depending on how close their bullets came to hitting the enemy. This would encourage the agents to shoot at the enemy even if they miss, instead of relying on being lucky enough to hit. Right now, it seems your main issue is that the genetic algorithm is essentially guessing network parameters (weights and biases) to maximize fitness instead of continuously improving.
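A minimal sketch of that kind of shaping, assuming distances are measured in studs and `maxDist` is a tuning constant (not something from the original code):

```lua
-- Near-miss reward: 1 for a direct hit, falling off linearly to 0
-- once the bullet's closest approach reaches maxDist.
local function NearMissReward(closestDist, maxDist)
	return math.max(0, 1 - closestDist / maxDist)
end
```

Summed or averaged over all bullets, this rewards near misses instead of relying on the binary luck of actual hits.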

Also, overfitting is a potential issue. What is your model size (number of layers and the size of each)?

I actually did try doing that, but I don’t think I implemented it very well. Once a bullet was deleted, I would measure the distance from the bullet to the enemy and add fitness accordingly. There were a lot of problems, however. One of the main ones was that the bullet’s accuracy wasn’t measured at the right moment: by the time the bullet touched a wall, the enemy could have already moved away and then returned to the same position, and it would still count as an accurate shot. I don’t really know a good way to measure the accuracy of the bullets, but if I find one it would be a massive help for the fitness function.

Also, to answer your second question: there are currently 2 hidden layers of 17 nodes each. I’m not sure if this is too much. There are 15 inputs, so I want to keep it at 2 layers, but I might consider cutting down on the number of hidden nodes per layer.

Maybe try doing this:
For every bullet, find the closest the bullet ever gets to the opponent throughout its entire lifetime (track the distance to the opponent every frame). Then, when the round is over, take the average of those closest distances across all bullets and incorporate that average into the fitness.
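A sketch of that bookkeeping in plain Lua; in Roblox the per-frame distance would be something like `(Bullet.Position - Enemy.Head.Position).Magnitude`, but here distances are plain numbers:

```lua
-- Record the smallest opponent distance each bullet reaches over its lifetime,
-- then average those minima at the end of the round.
local BulletMinDist = {}

local function RecordBulletFrame(bulletId, distToEnemy)
	local best = BulletMinDist[bulletId]
	if best == nil or distToEnemy < best then
		BulletMinDist[bulletId] = distToEnemy
	end
end

local function AverageClosestDistance()
	local sum, count = 0, 0
	for _, d in pairs(BulletMinDist) do
		sum = sum + d
		count = count + 1
	end
	if count == 0 then
		return nil -- no shots fired this round; handle that case separately
	end
	return sum / count
end
```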

It doesn’t seem like overfitting is an issue here, but the fitness function alone might not be the problem either. Are you sure you don’t have another bug? For example, if the fitness is reversed, the genetic algorithm will find the worst possible agent. That’s just an example, though.

That sounds like a good suggestion, and I will definitely try implementing it soon. Also, I don’t really think it’s an issue with anything else besides the fitness function, though that might not be the case, since the fitness function is the first obstacle I have to overcome. There could be future problems with the genetic algorithm that only become apparent after I fix the fitness function. I don’t really think that will happen though, and I’m pretty sure the selection method I’m using (tournament selection) is a good choice for my genetic algorithm.


Seems like my code is working so far, but I’ve run into another issue. The fitness keeps coming out as “nan” because it’s doing 0/0. For some reason, even though I have a NaN check (if score ~= score), some of the scores still show up as NaN. Also, the problem of AIs getting behind the enemy by pure chance still appears. I’ve combatted this with tournament selection and averaging the scores obtained across tournaments, which should favor participants that are more well-balanced.

Edit: I’m using this code currently:

local fitness = tostring(BulletAccuracySum / ShotsFired + DamageTaken)

if fitness ~= fitness then
	fitness = 0
else
	print(fitness)
	print(BulletAccuracySum / ShotsFired)
	print(DamageTaken)
end

And it seems like it falls into the “else” section sometimes. When it does, it has this as the output

  00:52:45.460  nan  -  Server - BlueScript:273
  00:52:45.460  nan  -  Server - BlueScript:274
  00:52:45.460  0

This is really confusing. What could be happening here? I will try setting the fitness variable to just BulletAccuracySum / ShotsFired then doing fitness ~= fitness.


Nevermind, I found the issue. I was doing tostring on the fitness value.
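For reference, the `x ~= x` trick only works while the value is still a number; after `tostring` the comparison is between identical strings, so the guard never fires:

```lua
local nan = 0 / 0
assert(nan ~= nan)       -- NaN is the only value not equal to itself

local s = tostring(nan)  -- "nan" (exact text is platform-dependent)
assert(s == s)           -- a string always equals itself, so the NaN check silently passes
```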

Edit: I found another issue. I should be rewarding the AI more if the bullet is more accurate (closer to the enemy) instead of just setting the fitness as the closest distance the bullet was to the enemy. I plan on doing this by doing 1 / (closest dist. average) + damagetaken. (damagetaken is a negative value)

Edit 2: I found another issue again. This is really frustrating. Doing 1 / 0 returns infinity. Now I need a way to circumvent this.
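One common way to guard against both failure modes, as a sketch and not necessarily the exact fix that was used: skip the division entirely when no shots were fired (avoiding 0/0), and clamp the average distance away from zero before taking the reciprocal (avoiding 1/0). The `epsilon` constant here is an assumption:

```lua
local function ComputeFitness(bulletAccuracySum, shotsFired, damageTaken)
	-- no shots fired: dividing would give 0/0 = NaN, so skip it
	if shotsFired == 0 then
		return damageTaken
	end

	local avgClosestDist = bulletAccuracySum / shotsFired
	-- clamp away from zero so a perfect hit gives a large but finite reward
	local epsilon = 0.1 -- assumed tuning constant
	return 1 / math.max(avgClosestDist, epsilon) + damageTaken
end
```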

Edit 3: I fixed it with the help of vlekje513#0001 on the roblox script assistance discord. Thanks, vlekje513#0001 !


After about 7 generations, the AIs seem to have learned a strategy that doesn’t seem to be what I’m aiming for. The AI now for some reason favors staying in place and shooting while spinning a lot. I’m not sure how exactly to describe this behavior. Here’s a video:

cool effects - Roblox Studio (gyazo.com)

Hello,

The rbxl works. I’ll check it out and let you know what I find.

Thanks
