I think I managed to find some good settings nonetheless; the agent no longer turns into walls with A2C and, surprisingly, reaches the goal part on the straight stretch.
It might look like it's only using its first output, but after further testing I found out that that is in fact not true. I don't know why it is so well-behaved that it doesn't want to steer away from the goal, yet it still steers either to the left or the right at times, which is a case of a bad reward function on my part.
It is less random, to say the least; that might have to do with my epsilon setting.
Great improvement I saw: the agent actually speeds up, then tries to slow down by putting its throttle into negative. That slows the car down, but not enough, so it eventually hits the wall. I will hook up a brake function to let it slow down and see what happens! (PS: not talking about the video, I just trained a fresh agent not too long ago.) And I think I figured out why my agent kept turning into the wall a few days ago… you see, my epsilon value for ReinforcementLearningQuickSetup was set above 0, which I presume was injecting random exploratory actions and turning otherwise correct data into incorrect data. After setting it to 0, it is much more accurate!
Hmm, could I ask if you're planning on maybe adding neuroevolution or genetic algorithms to your module? From my experience, Roblox has a tough time keeping up with deep learning calculations; as you had previously said, Luau's calculation speed is not up to par with the rest of the script, which is where the error comes from. Doing some research, I have seen a very successful AI attempt in Roblox that came from a genetic algorithm inside an autonomous police AI agent structure. I think you've seen his Twitter posts before, but I'll quote his DevForum link regardless:
He has done a brilliant job at creating his piece of AI, which leads me to suggest: since your module already has a wonderful neural network setup with matrices and such, would it be possible for you to maybe implement a genetic algorithm or neuroevolution on top of it?
Nope. No plans. The thing with genetic algorithms / neuroevolution is that there are a lot of ways to do it. For me to build code that covers everything is pretty much impossible. Here are the questions related to this issue.
Do I choose a single value or a single layer (matrix) to evolve?
How do I adjust the parameters that represent evolution when combining two parameters? Is it by choosing the maximum, average or minimum values?
Do I construct a new neural network architecture from an existing one to represent it as evolution?
And many more.
There are so many things you can do, and I think it is best not to implement it, for the sake of flexibility and easier code maintenance.
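To make that concrete: for the "combining two parameters" question alone, something as simple as averaging the two parents is only one of the possible answers. A hypothetical sketch, assuming each parameter matrix is a plain nested Luau table of numbers:

```lua
-- One of many possible crossover choices: element-wise averaging of two
-- parent parameter matrices. Swapping in math.max or math.min here would
-- already give a different genetic algorithm.
local function averageCrossover(parentMatrixA, parentMatrixB)
	local childMatrix = {}

	for rowIndex, row in ipairs(parentMatrixA) do
		childMatrix[rowIndex] = {}

		for columnIndex, value in ipairs(row) do
			childMatrix[rowIndex][columnIndex] = (value + parentMatrixB[rowIndex][columnIndex]) / 2
		end
	end

	return childMatrix
end
```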
Hmm, yeah, I agree, it is truly flexible; the number of ways you can accomplish this is remarkable. Apparently scripton rescripted his work 10 times or so, according to his tweets. Well, do you have any tips for using your module to build a genetic algorithm/neuroevolution simulation on top of?
Take advantage of the Matrix library of mine. Unlike the DataPredict code structure, MatrixL is pretty easy to use and you can read the code behind it easily. You may want to use these functions:
getSize()
applyFunction()
Also keep in mind that the neural networks store a table of matrices for their model parameters.
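As a very rough illustration of how those pieces could fit together for a mutation step (this is just a sketch, not an official DataPredict feature: the getModelParameters()/setModelParameters() calls, the applyFunction() signature and the require path are assumptions you should verify against your copies of the libraries):

```lua
-- Hypothetical neuroevolution mutation step built on top of MatrixL.
-- Assumes: the model exposes getModelParameters()/setModelParameters(),
-- the parameters are a table of matrices (as noted above), and
-- applyFunction(fn, matrix) applies fn element-wise and returns a new matrix.
local MatrixL = require(script.Parent.AqwamMatrixLibrary) -- adjust to your setup

local mutationRate = 0.05     -- chance of mutating each weight
local mutationStrength = 0.1  -- maximum size of a single mutation

local function mutateValue(value)
	if math.random() < mutationRate then
		return value + (math.random() * 2 - 1) * mutationStrength
	end

	return value
end

local function mutateModel(model)
	local modelParameters = model:getModelParameters()
	local mutatedParameters = {}

	for index, weightMatrix in ipairs(modelParameters) do
		mutatedParameters[index] = MatrixL:applyFunction(mutateValue, weightMatrix)
	end

	model:setModelParameters(mutatedParameters)
end
```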
Hilo! The NEAT agent I trained manages to reach the goal point, but I can't seem to train it to stop there. It either slows down before reaching the goal and then reverses away from it, or drives full throttle into the wall that's in front of the goal after reaching it. Any thoughts?
Hmm, I changed my settings a bit and opted for raw output to control the car's throttle, steering and brake boolean, but my main problem is that the car likes to throttle and then de-throttle right afterwards, making it seem like the car is stationary. I implemented a penalty that imposes -0.01 for every 0.01 seconds the car remains stationary, but it takes a long time for it to stop going back and forth rapidly and pick a direction to drive in, and even after that it drives in the wrong direction due to lack of training and the time wasted in the rapid acceleration/deceleration that keeps the car stationary. I was wondering if you knew any settings or parameters I could generally play around with (e.g. the learning rate) to minimise this sort of behaviour?
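For reference, the stationary penalty I described looks roughly like the sketch below; getCarSpeed() and the 0.5 studs/second threshold are just placeholders for however you actually measure the car's velocity.

```lua
-- Rough sketch of the stationary penalty described above (placeholder names).
local STATIONARY_SPEED_THRESHOLD = 0.5 -- studs per second
local STATIONARY_PENALTY = -0.01
local PENALTY_INTERVAL = 0.01 -- seconds

local function getCarSpeed(car)
	return car.PrimaryPart.AssemblyLinearVelocity.Magnitude
end

local function getStationaryPenalty(car, deltaTime)
	if getCarSpeed(car) < STATIONARY_SPEED_THRESHOLD then
		-- scale the -0.01 penalty to however much time has actually passed
		return STATIONARY_PENALTY * (deltaTime / PENALTY_INTERVAL)
	end

	return 0
end
```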
Please only use raw values when you fully understand the differences between reinforcement learning algorithms and the structure of neural networks (particularly the activation functions). Seriously. It's going to take up much more time than it should.
If you insist on doing this, make sure your last layer isn’t a softmax layer. I’m too lazy to even explain at this point.
Yep, I understand; softmax is only for labelling since it spreads out the probability, which is not what I need for raw values. Also, I'm not using a reinforcement learning algorithm this time, it's a NEAT algorithm.
For anyone looking to understand neural networks a bit before reinforcement learning or generally machine learning, here’s a series I found super helpful.
Although it is in JavaScript, you should be able to understand through the graphics and physical examples he uses in this video.
If you are interested, you can view the playlist course from his channel that explains step-by-step how to build the simulator he scripted in HTML, CSS and JavaScript.
To be fair, when I first started learning about this concept, almost every source that wasn't some kind of research paper (YouTube, etc.) simply explained it in terms of the actual 'neural network' in the brain.
Which complicated things a lot.
They did not specify that traditional neural networks are literally just a function with parameters in it. The goal of gradient descent is to calculate the optimal direction for each parameter relative to the others, in order to bring down the cost function as much as possible.
In RL for games, I imagine it as a function that fits itself to a high-dimensional curve, like a quadratic function trying to adjust its a, b, c coefficients to imitate a target quadratic curve. The target is like the curve that describes the decision of the agent for every input value.
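To make that intuition concrete, here is a tiny self-contained toy (plain Luau, unrelated to DataPredict) where gradient descent nudges the a, b, c coefficients of a quadratic towards a target curve:

```lua
-- Fit y = a*x^2 + b*x + c to a target quadratic by gradient descent
-- on the mean squared error.
local targetA, targetB, targetC = 2, -3, 1
local a, b, c = 0, 0, 0
local learningRate = 0.01

-- sample the target curve on a few points
local samples = {}
for x = -2, 2, 0.5 do
	table.insert(samples, {x = x, y = targetA * x^2 + targetB * x + targetC})
end

for step = 1, 5000 do
	local gradA, gradB, gradC = 0, 0, 0

	for _, sample in ipairs(samples) do
		local prediction = a * sample.x^2 + b * sample.x + c
		local errorValue = prediction - sample.y

		-- derivatives of the squared error with respect to a, b and c
		gradA += errorValue * sample.x^2
		gradB += errorValue * sample.x
		gradC += errorValue
	end

	-- move each coefficient in the direction that lowers the cost
	a -= learningRate * gradA / #samples
	b -= learningRate * gradB / #samples
	c -= learningRate * gradC / #samples
end

print(a, b, c) --> close to 2, -3, 1
```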