Neural Network Library (Obsolete)

The second iteration of this library was released recently: it is superior to this library in every way except raw speed. Check it out!

After seeing a friend (@Auxintic) playing around with Roblox neural networks and learning how to make them, a sense of competition drove me to outdo him for absolutely no reason other than as an excuse to understand machine learning myself. For the last few weeks, I have researched neural networks from the ground up and have designed a neural network library whose sole purpose is to be as customizable and open source as possible.
To my knowledge, this is the first Roblox module that encompasses the features below while being easy to use in any application and open source.
Now, most of you are probably scratching your heads: what are neural networks, and what are they for?

Neural Network Use

Neural networks are usually used where a normal behaviour-tree-based AI is impractical or far too difficult to code: AI that adapts to the player mid-interaction, AI that predicts what a player will do, AI that finds hidden trends in a pile of data to identify something, self-driving cars, etc.
Many of us have seen such examples, like a bot that learned how to play perfect chess/checkers, bots that use images of our brainwaves to picture what we are thinking about (real thing by the way!), and of course, self-driving cars.
On Roblox, by far the most famous example is @ScriptOn’s AI car project:

Though the training will be tedious, this is all possible with neural networks. This library, for example, is capable of doing anything and everything ScriptOn used in his project, even the visualizer. But first, let’s answer the other question: what ARE neural networks?

Artificial Neural Networks: Explained

Neural networks are digital mechanisms that mimic how neurons work in our brains. The main and most common type of NN (neural network) is the “feedforward” network, which primarily works on the concept of “weights” and “biases”. But what are those?


You can see that a “feedforward” NN consists of an input layer, a multitude of hidden layers, and an output layer. Each layer has a varied number of nodes, our artificial little neurons. Each one of these nodes contains some data, namely the “weights” and “bias”.
The “weights” are numbers that are multiplied against a previous node’s output to increase or decrease its importance and impact. A weight is assigned to every incoming connection from the previous layer’s nodes, as seen in the image above. A weight of -0.012 would mean that the connection in question has little impact, while a weight of 5.7 would make it very important, for example. The number of weights in a node is equal to the number of nodes in the previous layer.
Unlike weights, every node has exactly one bias. The bias is a value used to offset the node’s output, like a filter: it increases or decreases the node’s sensitivity to the inputs. A bias of -2.4 would make the node fire only when enough positive inputs are given, while a bias of 5.3 would make the node fire very easily, maybe even when no inputs are given!
In the image below, you can see how every connection has its associated weight while every node has its own bias. The number on the top left of every node is its output. But how do we take all of these weights, inputs, and biases to get that output?

For this, we use an activation function. This is any function that squashes the sum of the weighted inputs plus the bias into a usable (preferably small) output. Almost any function will do, but usually a “ReLU” function is used (see “Activator Functions” in the documentation). The output of the node is then passed to the nodes in the next layer and the process repeats until you reach the output layer. The outputs of the nodes in that layer serve as the final outputs of the network.
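To make that concrete, here is a minimal sketch of what a single hidden node computes. This is just an illustration of the idea, not the library's internal code, and it assumes a ReLU activation and made-up numbers:

local inputs  = {0.5, 0.2, 0.9}  -- outputs coming from the previous layer
local weights = {1.3, -0.7, 0.4} -- one weight per incoming connection
local bias    = 0.5              -- this node's single bias

local sum = bias
for i = 1, #inputs do
	sum = sum + inputs[i] * weights[i]
end

local output = math.max(0, sum) -- ReLU: negative sums become 0
print(output) -- this value is what gets passed to every node in the next layer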
To train networks, we can either sort, pick, and breed them with the genetic algorithm, or train them directly with backpropagation. Put simply, backpropagation walks backwards through the network and adjusts its parameters so that, for the given inputs, its outputs move closer to the given targets.

See? It’s not that difficult. That’s because, with this library, you won’t have to delve into the extremely complex math behind all of this! You should now have a basic understanding of how feedforward NNs work; enough to use the library, at least.
Before we get to the documentation, let’s go over what you can actually do.

Features

If you don’t understand some of the following terms, please read the section above.

This library allows for (as of right now):

  • Creation of deep NNs (neural networks)
  • Creation of vanilla recurrent NNs
  • Forward propagation (running them)
  • Backward propagation (training them, info below)
  • Swappable activation functions (changing of functions that determine a node’s activation state)
  • Genetic sorting/breeding/mutating (genetic algorithm, info below)
  • Saving/loading networks
  • Complete data validation for every function to make sure you enter the correct data
  • NN character size estimation
  • Complete NN visualization with generated UI

Created NNs are nested arrays built to be as compact as possible; they will look like a mess of numbers and brackets with no obvious identifiers at all. This keeps the logic simple and conserves space. For example:

[2,[[[[257.954171596558182955050142482,-47.9329605569352565908047836274],[21.2074097087458284249805728905,38.6520296545417423317303473596],[-42.4222303464519470139748591464,6.85683244327564267450725310482]],[-136.463560000000114769136416726,21.9540700000000157388058141805,-61.9059000000000168029146152548]],[[[-100.745423146666624347744800616,-196.992169456771591740107396618,-198.703435243651796326957992278],[149.765747803782403479999629781,11.3767919729767026382205585833,-85.3957527910369975643334328197],[-142.571721831055413076683180407,-187.762893398775787545673665591,-485.631061596044446559972129762]],[-129.142590000000268446456175297,-154.52082000000004313733370509,-34.9673200000000790055310062598]],[[[-61.1807320003023065169145411346,473.094462928015843772300286219,612.418483683289650798542425036]],[-191.613499999999731926436652429]]],false,"LeakyReLU"]

All created networks can have any number (to a reasonable degree) of input nodes, output nodes, hidden layers, and hidden nodes. When creating a network, you can also set the default bias value for all nodes. By default, this number is set to 0.5 because some activation functions rely on it being above 0 at the start to avoid dead nodes.

Forward propagation simply takes the given NN alongside an array of inputs and runs it. It is practically instantaneous and works flawlessly. The result is an array containing the network’s outputs.

Backward propagation is where it gets pretty tricky. Simply put, it takes an array of inputs and an array of the desired outputs and uses something called an optimizer algorithm to nudge all of the NN’s parameters in the right direction, scaled by the given learning rate. The learning rate is just a number that scales down how fast the network trains; accuracy over speed. Now, the optimizer algorithm is the tricky bit. There are many algorithms out there, but most are very difficult to implement, let alone understand, so I chose to just go with the standard SGD (stochastic gradient descent) algorithm.
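At its core, a single SGD update just nudges each parameter against its gradient, scaled by the learning rate. A tiny illustrative sketch, not the library's internal code, with a made-up gradient value:

local learningRate = 0.01
local weight = 1.3
local gradient = 0.42 -- hypothetical gradient of the error with respect to this weight

weight = weight - learningRate * gradient -- 1.3 - 0.0042 = 1.2958
-- a smaller learning rate means smaller, more precise steps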
I have also managed to implement the Adam algorithm, but it is rather slow and quadruples network sizes due to the data needed.

The activation function is one of the most important moving parts of a NN. It is what decides how the nodes behave throughout training. This function can be almost anything, but some functions should only be used in certain applications. For the heck of it, I decided to implement as many as I possibly could.
Currently, the activation functions my library supports are:
Identity, Binary Step, Sigmoid (duh), Hyperbolic Tan, Arc Tan, Sin, Sin Cardinal, Inverse Hyperbolic Sin, Soft Plus, Bent Identity, ReLU, Soft ReLU, Leaky ReLU (x0.1), Google’s Swish, Elliot’s Sign, Gaussian, and SQ-RBF.
These functions, however, are only for the hidden layers. The output layer uses the standard sigmoid function as I have not yet implemented SoftMax.

To use the genetic algorithm, you first have to create a generation of networks. For this, you specify a folder, the number of networks, and their sizes. Once this is done, you have to score them somehow; check how well each network does at the given task. With the networks and the scores, we can run them through the genetic algorithm. This algorithm simply marks the best network, kills the worst 60%, breeds the top 40% with chances depending on their scores, applies a slight noise and mutation to all children networks, and presents the new generation to you. It is designed to be easy to use; the only thing left up to you is scoring the networks.
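In code, the whole cycle boils down to a skeleton like this. It is only a rough sketch: it assumes the library sits in workspace, that a Folder named Networks exists in workspace, and the scoring function is a trivial placeholder you would replace with your own task:

local module = require(workspace.KNNLibrary)

-- 10 networks with 2 inputs, 1 hidden layer, 3 hidden nodes, and 1 output each
local nets = module.createGenNet(workspace.Networks, 10, 2, 1, 3, 1)

-- Placeholder scoring: reward networks whose output for {0.5, 0.5} is close to 1.
-- Replace this with however you measure success at your own task.
local function scoreNetwork(network)
	local output = module.forwardNet(network, {0.5, 0.5})[1]
	return 1 - math.abs(1 - output)
end

for generation = 1, 50 do
	local scores = {}
	for i = 1, #nets do
		scores[i] = scoreNetwork(module.loadNet(nets[i]))
	end
	module.runGenNet(nets, scores) -- sort, kill, breed, mutate; nets now hold the next generation
	wait() -- breathe between generations so the script doesn't hang
end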

Recurrent networks, also called RNNs, are another type of NN (alongside feedforward) that lets the network remember each node’s activation from the last timestep (the last time it was run). This makes NNs better suited for circumstances where previous decisions should have a bearing on what the network does presently. This includes AI that experiences the concept of time, like NPCs. Full-on implementations of RNNs are called LSTMs: long short-term memory networks. They are quite a bit more complex, so we’ll stick with simple vanilla RNNs for now.
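Conceptually, a recurrent node just mixes its remembered activation from the previous run back into the current weighted sum. A rough sketch of the idea only, not the library's internals, with a made-up recurrent weight:

local previousOutput = 0    -- remembered between timesteps
local recurrentWeight = 0.8 -- hypothetical weight on the remembered value

local function recurrentNode(weightedInputSum, bias)
	local sum = weightedInputSum + recurrentWeight * previousOutput + bias
	local output = math.tanh(sum) -- a bounded activation helps keep RNNs stable
	previousOutput = output       -- remember this activation for the next timestep
	return output
end

print(recurrentNode(0.3, 0.1)) -- first run: no memory yet
print(recurrentNode(0.3, 0.1)) -- same inputs, different output, because it remembers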

With my visualizer, you can now see the networks you create. This visualizer has 2 versions: one shows you the current state of the network’s parameters (red being negative and green being positive), while the other shows the network’s activations as it is being run with the given inputs!

Now, for the fun part.

Documentation

*Note: whenever I write "NETWORK", it means that the value also accepts StringValues whose Value is the string form of a network, as generated by createGenNet(). "CONTAINER" means the same thing but for the folder that contains said StringValues, also made by createGenNet().

module.createNet(inputs, hiddenL, hiddenN, outputs, activator, recurrent, defaultBias, warn)

local network = module.createNet(2, 2, 3, 1, "ReLU", false, 0.2)
--Value types:
--createNet(integer, integer, integer, integer, string [OPTIONAL], boolean [OPTIONAL], decimal [OPTIONAL], boolean [OPTIONAL])

This is the function responsible for creating networks. You have to provide the input count, hidden layer count, hidden node count (in each hidden layer), output count, activator function identifier, whether or not it should be recurrent, and the default bias. All numbers have to be integers except for the bias; it can be a decimal.
The activator chosen is linked to the network and should never be changed afterwards. For info on what can be put in this string, please refer to Activator Functions at the end of the documentation below.
Whether it should be recurrent is a boolean value. This will, as described above, allow NNs to influence future actions with past ones. RNNs do not work well for problems where time is not relevant. However, one thing is extremely important. If you are going to backpropagate a recurrent network, you cannot use any activation function that has an infinite range. You have to use Sigmoid or Tanh. This is due to exploding gradients that make RNNs untrainable using backpropagation in the long run. If you want to use other activators, use the genetic algorithm, instead (probably preferable anyway to be honest).
As for the weight initialization, all weights are set using the ‘He’ initializing method.
The ‘warn’ value is just a boolean for whether or not you want recurrent-related warnings to be hidden. Try not to get into the habit of setting it to true unless you have to.

module.forwardNet(network, inputs, giveCache)

local network = module.createNet(2, 2, 3, 1)
local out = module.forwardNet(network, {1,4})[1]
--Value types:
--forwardNet(array/NETWORK, array, boolean [OPTIONAL])

This function is responsible for running the networks. It takes the network and inputs to generate an output. The output is always an array so you will have to index it as such.

Extremely important note about inputs: they absolutely have to be scaled beforehand to a reasonable range like 0 to 1. You cannot insert values like 285 and expect it to work properly. Depending on the activation function used, the range can extend to -1 to +1. This has to be done by the user because the network has no idea what the maximum/minimum values of your inputs are or how it should scale the numbers. I will add a function that does this for you soon.
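In the meantime, a scaling helper is only a couple of lines to write yourself. This is not part of the library, just a sketch of a simple linear rescale into the 0 to 1 range:

local function scaleInput(value, min, max)
	return (value - min) / (max - min)
end

-- e.g. a raw value of 142 that can range from 0 to 285 becomes roughly 0.498
local scaled = scaleInput(142, 0, 285)
print(scaled)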

As for debugging, you can pass true for giveCache to make the function instead return an array whose first entry is the output array and whose second entry is an array containing the activations of all the nodes in the network when run. This is strictly for debugging and is not recommended for normal use.

module.backwardNet(network, rate, input, target)

local network = module.createNet(2, 2, 3, 1)
module.backwardNet(network, 0.1, {1,4}, {4,1})
--Value types:
--backwardNet(array/NETWORK, decimal, array, array)

This function is responsible for training the networks. It takes the given network, runs it, uses the SGD algorithm to determine where it needs to adjust the parameters according to the target values, and uses the given rate to take a step in the right direction.
It is recommended to keep the rate at a low number to improve accuracy over speed; somewhere between 0.01 and 0.1.
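Keep in mind that one call only takes a single small step; training means calling it over and over. A quick sketch of what that looks like in practice, assuming the library is in workspace and using made-up inputs and targets:

local module = require(workspace.KNNLibrary)

local network = module.createNet(2, 2, 3, 1)
for i = 1, 5000 do
	-- repeatedly nudge the network toward outputting {1} for the inputs {0.2, 0.8}
	module.backwardNet(network, 0.05, {0.2, 0.8}, {1})
end
print(module.forwardNet(network, {0.2, 0.8})[1]) -- should now be close to 1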

module.createGenNet(folder, networkCount, inputs, hiddenL, hiddenN, outputs, activator)

local networks = module.createGenNet(workspace.Folder, 10, 2, 2, 3, 1)
--Value types:
--createGenNet(object, integer, integer, integer, integer, integer, string [OPTIONAL])
--RETURNS: (array)

This function is responsible for creating the NN setup needed for the genetic algorithm according to the given settings. It is nearly the same as createNet(), except it now requires the number of networks to be created and the folder that will contain them. The major difference is that this function returns an array containing references to each network object in the given container (StringValues, to be specific). This is because keeping the networks in an array on their own is bad for memory, so it’s better to save all of the networks to StringValues and put those into the array instead.
The StringValues are accepted into any function that can take “NETWORK” and the folder is accepted by any function that can take “CONTAINER”.

module.runGenNet(networks, scores, giveBestScore)

local networks = module.createGenNet(workspace.Folder, 5, 2, 2, 3, 1)
module.runGenNet(networks, {22,54,97,33,13})
--Value types:
--runGenNet(array/CONTAINER, array, boolean [OPTIONAL])

This function is responsible for running the genetic algorithm. It takes the given networks, sorts them from best to worst according to the scores, kills the worst 60%, breeds the best 40% with chances according to how well they performed, adds a random ±0.01 noise to all the biases of the children, and mutates the parameters of 2 random nodes for all the networks. The best network is excluded from the mutation to ensure that the generation doesn’t degenerate (though this does hinder performance slightly).
For tweaking this algorithm, I had to do lots of testing and sampling. For the graph-lovers, here is the chart of my experiments!


“King Mode” means making the best network exempt from mutations for the given generation

module.saveNet(network) / module.loadNet(string)

local network = module.createNet(2, 2, 3, 1)
network = module.saveNet(network)
network = module.loadNet(network)
--Value types:
--saveNet(array/NETWORK)
--loadNet(string/NETWORK)

These functions are responsible for saving and loading networks with Roblox’s JSON encoding. This is by far the easiest and quickest method of saving the NNs in some form for later use.
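Since saveNet() gives you a plain string, one possible way to persist a network between sessions is a DataStore. This is only a sketch: it assumes a server Script, that Studio access to API services is enabled, and the store/key names are just examples:

local DataStoreService = game:GetService("DataStoreService")
local store = DataStoreService:GetDataStore("NetworkStore")
local module = require(workspace.KNNLibrary)

local network = module.createNet(2, 2, 3, 1)

store:SetAsync("MyNetwork", module.saveNet(network)) -- save the string form

local saved = store:GetAsync("MyNetwork")            -- load it back later
if saved then
	network = module.loadNet(saved)
end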

module.hardCode(network)

local network = module.createNet(2, 2, 3, 1)
network = module.hardCode(network)
--Value types:
--hardCode(array/NETWORK)

This function gives a string identical to saveNet(), but with curly brackets instead of square brackets. This makes it easy to copy-paste the network into a script for later use.

module.getAproxLength(inputs, hiddenL, hiddenN, outputs, active, recurrent)

print(module.getAproxLength(2, 2, 3, 1, "ReLU", false)) 
-->> 823
--Value types:
--getAproxLength(integer, integer, integer, integer, string, boolean)

This function is responsible for estimating the size of a neural network in characters when saved in string form. This is very helpful when predicting sizes of potential networks and knowing when to stop adding more layers. This function will always estimate a little bit more than the real value because it is better to be safe than sorry.
Fun fact, did you know that a network with 1000 inputs, 100 layers, 100 nodes, and 100 outputs would have a length of around 35.5 million characters?!
(could be fun to add support for such behemoth networks… hmmm…)
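On a practical note, a quick way to use getAproxLength() is as a sanity check before committing to a layout. The character budget below is just an arbitrary example for illustration, not an official limit:

local module = require(workspace.KNNLibrary)

local budget = 200000 -- hypothetical limit for wherever you plan to store the string
local estimate = module.getAproxLength(2, 4, 30, 1, "ReLU", false)
if estimate > budget then
	warn("This network would likely be too large to store: ~"..estimate.." characters")
end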

module.getVisual(network)

local network = module.createNet(2, 2, 3, 1)
local visual = module.getVisual(network)
visual.Parent = game.StarterGui
--Value types:
--getVisual(array/NETWORK)

This function is responsible for creating the visual UI of a given network. This UI has a built-in aspect ratio so you can resize it however you want without it losing shape! It currently does not support recurrent networks, and it is not suggested to try it with networks that have 20+ nodes per layer. It will automatically use updateVisualState() to colour the visual.

module.updateVisualState(network, visual)

local network = module.createNet(2, 2, 3, 1)
local visual = module.getVisual(network)
visual.Parent = game.StarterGui
module.updateVisualState(network, visual)
--Value types:
--updateVisualState(array/NETWORK, ScreenGui)

This function is responsible for updating and colorizing the given visual using the given network. Will give a result similar to this:

module.updateVisualActive(network, visual, inputs, range)

local network = module.createNet(2, 2, 3, 1)
local visual = module.getVisual(network)
visual.Parent = game.StarterGui
module.updateVisualActive(network, visual, {0,1}, 2)
--Value types:
--updateVisualActive(array/NETWORK, ScreenGui, array, integer [OPTIONAL])

This function is responsible for running the network with the given inputs and greyscaling the visual to match the node/connection activities. Basically, it shows how the network runs live. The result will look something like this:
Running it with slowly changing inputs results in an effect shown in the features section at the top of the page.
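For example, a rough sketch of driving that effect yourself, assuming the library is in workspace and this runs from a server Script:

local module = require(workspace.KNNLibrary)

local network = module.createNet(2, 2, 3, 1)
local visual = module.getVisual(network)
visual.Parent = game.StarterGui

for t = 0, 10, 0.05 do
	-- sweep the two inputs smoothly (kept in the 0 to 1 range) and redraw the activations
	module.updateVisualActive(network, visual, {(math.sin(t)+1)/2, (math.cos(t)+1)/2}, 1)
	wait(0.05)
end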

Activator Functions

This is the string responsible for setting the activation function. The accepted values are:

"Identity", "Binary", "Sigmoid", "Tanh", "ArcTan", "Sin", "Sinc", "ArSinh", "SoftPlus", "BentIdentity", "ReLU", "SoftReLU", "LeakyReLU", "Swish", "ElliotSign", "Gaussian", "SQ-RBF"

When choosing your activation function, many factors come into play. Some activators prefer certain situations and some don’t work for anything but those niche circumstances. Most of the time, a simple ReLU will do (it is the most common one). By default, when creating NNs, it is set to “LeakyReLU”.

Backpropagation Example Script

As an example, the following code uses backpropagation to create a simple network that calculates whether a set of X,Y coordinates is above or below an x³+2x² cubic function. To make this example work, all you need to do is place the library in workspace. Try it out!

local module=require(workspace.KNNLibrary) --Activating and getting the library

local network = module.createNet(2,2,3,1,"LeakyReLU") 	--Creating a network with 2 inputs, 2 hidden layers, 3 nodes per hidden layer, 1 output node,
														--and with LeakyReLU as the activation function
local vis = module.getVisual(network) 	--Creating the visual
vis.Parent = game.StarterGui			--and parenting it to StarterGui so you can see it as the Server
										
local counter=100000	--We will train 100,000 times
local tim=tick()
for i=1, counter do
	local xCoo,yCoo=math.random(-400,400)/100,math.random(-400,400)/100 --For this test, our precision is between -4.00 and +4.00
	local correctAnswer=1
	if 2*(xCoo)^2+xCoo^3<yCoo then 			--The function we're using for this test is x^3 + 2x^2. We want the network to tell us whether or not
		correctAnswer=0						--a set of X,Y coordinates is above or below the function's line
	end
	module.backwardNet(network,0.01,{xCoo,yCoo},{correctAnswer})
	if tick()-tim>=0.1 then  --To avoid timeouts, we add a wait() every tenth of a second the script runs. This is to make sure even potato processors will work
		tim=tick()
		wait()
		print(i/counter*(100).."% trained.")
		module.updateVisualState(network,vis)
	end
end
print(module.saveNet(network)) --We print out the network just for fun and demonstration
local wins=0
local tim=tick()
for i=-400,399 do				--Here, we cycle through every coordinate between -4.00,-4.00 and +4.00,+4.00
	for d=-400,399 do
		local xCoo,yCoo=(d)/100,(i)/100
		local answer=module.forwardNet(network,{xCoo,yCoo})[1] --Since the output is an array, we have to index the number we want, in this case, 1
		local out=1
		if 2*(xCoo)^2+xCoo^3<yCoo then
			out=0
		end
		if math.abs(answer-out)<=0.4 then 	--Though this bit is completely up to the user, I set a tolerance of +-0.4
			wins=wins+1						--If the output is 0.6 while the answer is 1, I mark it correct.
		end
		--[[If you want that really cool fading effect like what I demoed, enable this code. Note that it will never finish
			testing with it.
			 
		if d%5==0 then
			module.updateVisualActive(network,vis,{xCoo,yCoo},1)
			wait()
		end	
		
		]]
	end
	if tick()-tim>=0.5 then --Let's add a wait() here too so we don't lag up too much
		tim=tick()
		print("Testing... "..(i+400)/(8).."%")
		wait()
		module.updateVisualActive(network,vis,{math.random(-400,400)/100,math.random(-400,400)/100},1)
	end
end

print((wins/640000*100).."% correct!") 	--This tells us overall, how accurate our network is at calculating whether points are above or
										--below a x^3+2x^2 cubic function

Genetic Algorithm Example Script

For the genetic algorithm, here is an example script that does the same thing as the one above, but using generations. It is far easier, more universal, and has more potential, but pays the price by being very slow compared to backpropagation. Like the other example, just paste this script in workspace along with the library (as seen in the require()) and enjoy!

local module=require(workspace.KNNLibrary) --Activating and getting the library

local nets = module.createGenNet(workspace.Networks,20,2,2,3,1,"LeakyReLU") 	--Creating a generation of 20 networks with 2 inputs, 2 hidden layers, 3 nodes per hidden layer, 
																				--1 output node, and with LeakyReLU as the activation function
local vis = module.getVisual(nets[1])
vis.Parent = game.StarterGui
																
for g=1, 1000000 do 			--We will run for 1 million generations, which is basically forever 
	print("Generation: "..g)	--Print the generation number
	local scores = {}			--Array storing the scores for every network
	local step = 8		--step value used for lowering the resolution of the scoring. Lower step means higher resolution but far slower
	local tim=tick()			
	for z=1,#nets do	--Looping through every network in the generation
		local network = module.loadNet(nets[z])	--We load up the StringValue of the network in question
		local wins=0
		for i=-400,399,step do				--Instead of using math.random(), we simply cycle through every possible coordinate from
			for d=-400,399,step do			-- -4.00,-4.00 to +4.00,+4.00 and score based off of that
				local cac,cac2=d/100,i/100
				local answer=module.forwardNet(network,{cac,cac2})[1] 	--We run the network and get a response from it
				local out=1
				if 2*(cac)^2+cac^3<cac2 then			--The function we're using for this test is x^3 + 2x^2. We want the network to tell us whether or not
					out=0								--a set of X,Y coordinates is above or below the function's line
				end
				if math.abs(answer-out)<=0.4 then		--Though this bit is completely up to the user, I set a tolerance of +-0.4
					wins=wins+1							--If the output is 0.6 while the answer is 1, I mark it correct.
				end
				if tick()-tim>=0.1 then	--To avoid timeouts, we add a wait() every tenth of a second the script runs. This is to make sure even potato processors will work
					tim=tick()			--If you want a little more speed over the pretty effects, switch the 0.1 into a 0.5	
					wait()
					module.updateVisualActive(network,vis,{cac,cac2},1)
				end
			end
		end
		table.insert(scores,wins)	--With the score calculated, we insert it into the scores array
	end
	local best = module.runGenNet(nets,scores) 			--With all of the networks scored, we run the next generation
	module.updateVisualState(nets[best],vis)			--For every new generation, we show how the best network looks 
														--for a split second
	
	table.sort(scores)									--Purely for demo purposes, I sort the scores to find the best one
	--We calculate the best possible score using our step value and print out the success rate %
	print("Best network success rate: "..scores[#scores]/(800/step)^2*(100).."%")
end

Challenge: Try to reach 99.999% or 100% with this script while the step is set to 1! If you manage to pull through such a task or get surprisingly close, feel free to use hardCode() to paste the network in the comments below!

The Module

The most important thing is probably the module itself, right? Here’s the link to it:

I tried to comment as much as I could throughout the code to make it a bit more readable and easier to take apart. As I have only recently converted the library into module form, there may be a few bugs that I am not aware of; let me know right away of any issues or of any suggestions and feedback! I will post the main info of any updates in the comments below.
Make sure to post anything you make with it too! I’d be very curious to see them.

382 Likes

WOW. This is beyond comprehension, this module is so epic! I was going to go through the pain and struggle of making another module like this but now we have yours and it’s open-sourced! Thank you for this truly epic module, I will play around with this later on.

23 Likes

Finally giving me a reason to understand how NNs work. I will be looking at how this functions. Thanks for this resource!

8 Likes

This looks like an awesome resource! For people like me who learn better through looking at examples, I’m sure it would be greatly appreciated if you could put together a small example of this module being applied to a problem/task. This would demonstrate the capabilities of the module better and would make learning more about it easier, at least in my opinion.

17 Likes

Very interesting, I really love the concept and Idea!

-KEEP UP THE WORK!

7 Likes

Definitely will be learning how this works.

4 Likes

Just added a simple backpropagating example that I personally used when testing.
Will soon add a genetic version of the same problem!

3 Likes

One question I do wanna ask, is… any reason you put (ANN) in your title? What does it stand for / mean?

4 Likes

It stands for “Artificial Neural Network”. This is because there are many types of neural nets meant for different purposes. There are ANNs, CNNs (convolutional), and many more. This library, however, focuses only on ANNs.

4 Likes

What is ANN’s purpose? Like could I make this neural network learn where a zombie / npc would get stuck in the map, and find out which direction to turn to get unstuck?

1 Like

In general, ANNs work for any problem that has a sane amount of inputs/outputs (for example, not images), and when it doesn’t need to be too complex.
A better way to summarize it: ANNs are the first network you try. If for whatever reason it doesn’t fit, you proceed to another type of network. ANNs will most likely work well for anything on Roblox.
Think of it like algebra; it’s the first thing you try. If it doesn’t work, you try calculus and so on.

2 Likes

I don’t want to be mean, but I don’t understand what sane means. Did you mean same?

4 Likes

Sane meaning within reason.
On Roblox, the max inputs/outputs you can really work with is about 500-700. Past that, you’re on your own.
This is why ANNs don’t work well for images; you can’t have an input for every color of every pixel. Plus it would take centuries to train it properly. This is why Convolutional Neural Networks exist: for image recognition. However, their structure is very different and is a huge task on its own to create.

6 Likes

This is a wonderful module! Great work!

I used to do neural networks back in the day (made a library from scratch in JavaScript, but mostly just stole the backpropagation math lol). However, a few questions about the genetic algorithm:

  • Have you considered adding a completely random network every generation? This may assist in nudging you out of local minimums (for others out there, a local minimum is when a network has perfected a task with the resources given, but still has room for improvement. Think of it like a minefield: you have to push it out of one deep hole to get it into the deepest one. Adding more randomness is one simple way to prevent this). Unfortunately though, near the end of training it may do more harm than good by eating resources.
  • Why not clone the best network and tweak the clone? This both preserves the original and still allows the genetic algorithm to attempt to perfect it.

Great work my friend, I’m glad you were able to pull this off!

Personal question: Do you understand the math behind it fully? I myself watched a lot of Coding Train and took the back propagation algorithm from him (gotta love Dan). I’m considering trying to learn the math this semester in CS, but we’ll see. Thanks!

5 Likes

In your example, what would 0.01 be?

3 Likes

Yes, I have considered adding a completely random network every generation. However, I decided not to because it would do very poorly with small generation sizes and is basically a more extreme form of the mutations. If you want more randomization, you can just adjust the mutation parameters manually (I will probably add an option for that later).

Though I do save the best network as is for the next generation, any more attention to it would be bad for the network. If you stick too much to 1 network, you will hit a local minimum and won’t be able to get out of it. This is why instead of having the best network as one of the parents for every child, I do it randomly. This adds the measure of variance that makes genetic algorithms useful.

Though the SGD algorithm is rather tricky, watching enough 3Blue1Brown videos alongside some written math explanations allowed me to understand the math behind it. You won’t absorb all of the information from one single video; you have to watch a whole series of them until you start to understand this complex concept.
Thanks for the support, hope you enjoy it!

5 Likes

0.01 would be the learning rate. 0.01 is a safe bet because, though it does make progress very slow, precision is extremely important when working with math function problems. There is a low chance you’ll hit a point from which you can’t progress (a local minimum; @DatOneRandomDude explains it well above). However, you can easily increase it to 0.1 and get more or less the same results.

2 Likes

So does that mean the higher this is, the faster it will learn?

4 Likes

Yes, but this comes at the cost of lower precision; by setting a higher rate, you’re limiting how precise the network can get. This is why there are optimizer algorithms other than SGD that fix this problem, like “Momentum”.

2 Likes

Also, how would I properly put your example code in a while true do loop? Because if I run it I would have to hit stop when it is done, then hit run again.

2 Likes