I’ve been playing with neural networks for the past couple of days. I got the structure of the network working and tweaked it into a color-matching demo: it starts with 3 initial colors (red, green, and blue every time) and has a set of goal colors, and a bunch of weights are tweaked until the produced output matches the goal output. The optimal setup appears to be RGB input only and no hidden layers, meaning a direct link between the individual RGB components and each goal color, which makes sense. Adding intermediate layers or other starting colors makes the process slower, gets it stuck, or makes it unsolvable.
Play with the code: change the inputs, or change the hidden table (currently empty, meaning no hidden layers). The hidden table lists the number of nodes in each hidden layer.
this is really just a feedforward neural network implementation of something very simple, but I thought it was a neat way to try out the structure I made for networks.
source:
goes in ServerScriptService, runs when you press play (character unnecessary)
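The paste itself isn’t reproduced here, so as a rough sketch of the idea in Python (not the original Lua; all names are illustrative): a direct weight from each RGB input channel to each output channel, tuned by keeping random tweaks only when they reduce the error.

```python
import random

random.seed(0)  # make the run repeatable

# Illustrative setup: one input pattern and one goal color per channel.
inputs = [1.0, 0.0, 0.0]   # e.g. the "red" channel pattern
goal = [0.5, 0.25, 0.0]    # an arbitrary goal color (r, g, b)

# One weight per (input, output) pair -- a 3x3 matrix, no hidden layers.
weights = [[random.random() for _ in range(3)] for _ in range(3)]

def forward(inp, w):
    # Each output channel is a weighted sum of the input channels.
    return [sum(inp[i] * w[i][o] for i in range(3)) for o in range(3)]

def error(out, goal):
    # Squared distance between produced color and goal color.
    return sum((o - g) ** 2 for o, g in zip(out, goal))

# Random-tweak hill climbing: nudge one weight, keep the change
# only if the error went down.
best = error(forward(inputs, weights), goal)
for _ in range(5000):
    i, o = random.randrange(3), random.randrange(3)
    old = weights[i][o]
    weights[i][o] += random.uniform(-0.1, 0.1)
    e = error(forward(inputs, weights), goal)
    if e < best:
        best = e
    else:
        weights[i][o] = old  # revert a tweak that didn't help

print(best)  # small after enough accepted tweaks
```

With a direct input-to-output mapping like this, every weight affects the error independently, which is why the random-tweak approach converges quickly here and struggles once hidden layers couple the parameters together.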
Using one node in the hidden layer, it generally makes a grey color, pretty much the average of the hues of the goal colors. That one node has a weight attached to each output representing its lightness. With two nodes, they tend to become inverse colors (e.g. red and cyan, or green and magenta). With three nodes, they tend to become red, green, and blue.
With multiple layers, the random changes aren’t really enough to get it anywhere. There are too many parameters, and changing one throws off others, so the local minima are stickier. The next step, I suppose, would be backpropagation. Could somebody explain backpropagation and how I could implement it here?
I’m having tons of trouble with backpropagation. I have a feeling it doesn’t make sense or apply to this case where I’m using colors, because the values aren’t exactly numeric: they’re stored in triples, and distances/differences live in 3 dimensions instead of on a single scale. How can I make it learn faster and not get stuck in a bad local minimum as much when there are multiple layers?
This is cool! I don’t know much about neural networks, but there’s something about visualization that strikes a chord in me.
I have a feeling it doesn’t make sense or apply to this case where I’m using colors, because the values aren’t exactly numeric; they’re stored in triples, and distances/differences are in 3 dimensions instead of a scale.
RGB coordinates are just a parameterization of the color space. Using the Euclidean metric with those coordinates is an OK approximation for small distances, but the full color space is something different.
This Wikipedia article has some equations you might be interested in.
It seems that all color models use at least 3 parameters. I could use the hex form to represent each color as a single number, but then everything becomes extremely strange: red is 16711680, green is 65280, blue is 255. The encoding is modular, so each time the value crosses a multiple of 256, the blue component wraps back to zero and the green component increments. This number would also need to be on a scale from 0 to 1, so each color would be represented by number/16777216, and the network would need to learn something significantly more complicated.
In that color system, half of green isn’t dark green; it’s half green, half blue. The weights would be strange.
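A quick check in Python illustrates the point about the packed-integer encoding:

```python
def unpack(n):
    # Split a packed 24-bit color number back into (r, g, b).
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF

green = 0x00FF00             # 65280 as an integer
print(unpack(green))         # (0, 255, 0) -- pure green
print(unpack(green // 2))    # (0, 127, 128) -- half green, half blue, not dark green
```

Halving the integer doesn’t darken the color; it bleeds the low bits of green into the blue channel, which is exactly why a single scalar weight on this encoding can’t behave sensibly.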
I still don’t see the bias terms, or am I missing something?
Basically, you can treat all of the vectors as being one element longer: [x1 x2 x3 1], where the last element is a constant 1. This allows you to include constants (aka biases) in the network nodes.
Because your input is biased, this could be important.
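That augmented-vector trick, sketched in Python (illustrative names, not the original Lua): append a constant 1 to the input, and the weight attached to it acts as the bias.

```python
def node_output(inputs, weights):
    # weights has one extra entry: the weight on the constant 1 is the bias.
    augmented = inputs + [1.0]
    return sum(x * w for x, w in zip(augmented, weights))

# Three real inputs plus one bias weight (the last value, 0.5).
print(node_output([0.2, 0.4, 0.6], [1.0, 1.0, 1.0, 0.5]))
# 0.2 + 0.4 + 0.6 from the inputs, plus the 0.5 bias
```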
The biases are actually included in the network itself instead of the set of weights. Line 17 of the paste, in the createnetwork function, makes some random 0-1 biases where it says layer[i]=math.random(). The bias is counted on lines 92/93 in readn, where ‘v’ is the sum of the output values of the nodes in the previous layer, each multiplied by its weight to the current node. Instead of starting v at 0, it starts at that initial default node value, which represents the bias.
The weights are set to random values, but so is the initial value before the outputs are all added up. The bias is just a pretend output value of 1 with a weight attached to it. Instead of simulating an additional node, I set what a bias would do directly as the default value of the node.
With a bias node, the outputs of all the nodes in one layer are calculated; then, for each node in the next layer, every node in the previous layer (including the bias node) multiplies its output by its weight to that node and adds the result to that node’s value. The bias node always has an output of 1 and a weight. Instead of an extra node, I give each node in the network a base value, which has the same effect as attaching an extra node with a constant output of 1 and a weight.
Line 17 is where the initial bias is set randomly; line 25 is for the actual weights. The alternetwork function keeps a list of numbers it can change (biases are added on line 45, weights on line 54), and it picks a random thing in the network and changes it.
My advice is to avoid this optimization for debugging purposes and add in all of the bias terms explicitly. Every hidden node and every output node has an independent bias term. I’m not proficient with Lua, but it seems like you are tuning a reduced set of parameters.
Every node does have its own bias, or its own default value. There’s a table called “network”, which contains a number of layers, plus a separate “weights” table. The first layer is the set of inputs and the last layer is the set of outputs; within each layer is a number for each node. If the bias were represented as a node, each of these numbers would just be zero until the network was given an input to run through it, but instead they’re all set to a random bias from the start. The weights table contains a set of weights connecting each layer to the next, where each set is a table of the weights from one node in the previous layer to a node in the next layer. (I turned off the bias for the output nodes only in this network because it felt like it was slowing down this particular model of colors.)
I could change up the structure of the network so that the bias looked like it was a node, but I thought of this first and it worked. Backpropagation and autoencoders are the next challenge.
Does alternetwork update the bias? It seems to update the weights for inputs, but not the bias.
Getting a canonical version of the network up and running is still my advice. As it stands, you have to convert every new piece you add into your format, which makes it more difficult to debug any issues that arise.
alternetwork has a list called “paths” where I put every number it can change. It adds all of the weights on line 54 and all of the biases on line 45, then picks a random entry from that list and changes it on line 67.
Now that I’ve read up on backpropagation and made my own implementation (albeit with limited success), I’ll do my best to explain it. You should be familiar with multivariable calculus.
Let F be the network function, and consider a training example (I, O), denoting the input and output respectively. We wish to minimize the difference between F(I) and O, which we will represent as an error function E.
In order to do this, we tune the weights until E is sufficiently close to zero. Because I and O are constants, E depends only on the weights. We take the gradient of E, which at first is expressed in terms of the network output, and use the chain rule to rewrite it in terms of the weights. From there, we can perform gradient descent on the configuration space of the weights.
The Wikipedia article on backpropagation has the standard equations for a sigmoid activation function. I’d start there.
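To make the chain-rule step concrete, here is a minimal backpropagation sketch in Python (illustrative, not the original Lua) for one hidden layer with sigmoid activations, following the standard equations for a squared-error loss. All sizes and names are assumptions for the example.

```python
import math
import random

random.seed(1)  # repeatable run

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One training example: input I and target O (e.g. an rgb triple each).
I = [1.0, 0.0, 0.0]
O = [0.5, 0.25, 0.0]

n_in, n_hid, n_out = 3, 4, 3
w1 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
b1 = [0.0] * n_hid
w2 = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid)]
b2 = [0.0] * n_out
lr = 0.5  # learning rate

for _ in range(2000):
    # Forward pass: hidden activations, then outputs.
    h = [sigmoid(b1[j] + sum(I[i] * w1[i][j] for i in range(n_in)))
         for j in range(n_hid)]
    y = [sigmoid(b2[k] + sum(h[j] * w2[j][k] for j in range(n_hid)))
         for k in range(n_out)]

    # Backward pass: delta = dE/d(node input), with E = sum (y - O)^2 / 2.
    d_out = [(y[k] - O[k]) * y[k] * (1 - y[k]) for k in range(n_out)]
    d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w2[j][k] for k in range(n_out))
             for j in range(n_hid)]

    # Gradient descent step on every weight and bias.
    for j in range(n_hid):
        for k in range(n_out):
            w2[j][k] -= lr * d_out[k] * h[j]
    for k in range(n_out):
        b2[k] -= lr * d_out[k]
    for i in range(n_in):
        for j in range(n_hid):
            w1[i][j] -= lr * d_hid[j] * I[i]
    for j in range(n_hid):
        b1[j] -= lr * d_hid[j]

err = sum((yk - ok) ** 2 for yk, ok in zip(y, O))
print(err)  # small after training
```

Note that the hidden deltas reuse the output deltas through the weights, which is the chain rule at work: this is what lets you train multiple layers instead of relying on random tweaks.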
Trying to make real AI using neural networks is like trying to build a house from splinters.
Neural networks are adaptable functions. Run them a few times, adjusting what you want the output to be, and they will ever so slowly adjust to produce that output. Backpropagation doesn’t make them smart.
It doesn’t replicate, it doesn’t understand, it’s simply a function with x amount of input adjusting a finite amount of variables to give desired output.