[Added SAC, DDPG and TD3] DataPredict [Release 2.0] - Machine Learning And Deep Learning Library (Learning AIs, Generative AIs, and more!)

Take advantage of my Matrix library. Unlike the DataPredict code structure, MatrixL is pretty easy to use and you can read the code behind it easily. You may want to use these functions:

  • getSize()

  • applyFunction()

Also keep in mind that the neural networks store a table of matrices for the model parameters.
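For example, if you grab that table of matrices and want to inspect or tweak it with MatrixL, it could look roughly like the sketch below. The exact calling conventions of getSize() and applyFunction(), and the getModelParameters() accessor, are assumptions here, so double-check them against the MatrixL and DataPredict source before copying.

-- Sketch only. Assumptions: getSize() returns the number of rows and columns,
-- applyFunction() maps a function over every element, and the model parameters
-- are obtained as a table of weight matrices.
local MatrixL = require(script.Parent.MatrixL) -- adjust the path to wherever MatrixL lives

local modelParameters = NeuralNetwork:getModelParameters() -- assumed accessor; swap in however you actually obtain the table of matrices

for layerIndex, weightMatrix in ipairs(modelParameters) do

	local numberOfRows, numberOfColumns = MatrixL:getSize(weightMatrix)

	print("Layer " .. layerIndex .. ": " .. numberOfRows .. " x " .. numberOfColumns)

	-- Example tweak: scale every weight down slightly (a crude form of weight decay).
	modelParameters[layerIndex] = MatrixL:applyFunction(function(value)
		return value * 0.99
	end, weightMatrix)

end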

1 Like

Hilo! The NEAT agent I trained manages to reach the goal point, but I can’t seem to train it to stop there. It either slows down before reaching the goal and then reverses away from it, or reaches the goal and drives full throttle into the wall that’s in front of it. Any thoughts?

I don’t know. Maybe use a little touch of reinforcement learning to fine-tune the model parameters?

1 Like

Hmm, what about any reward functions?

All I can think of is to reward based on the distance between the outer edges of the car and the edges of that goal box, I suppose.
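To make that concrete, a hypothetical reward sketch in plain Luau could look like this. The goalBox and carModel names and the bounding-box maths are just my illustration, not anything from your setup; the idea is simply that the reward grows as the car’s edges get closer to the box’s edges.

-- Hypothetical reward shaping: reward increases as the car's bounding box
-- approaches the goal box. "carModel" and "goalBox" are placeholder instances.
local function getEdgeDistance(carModel: Model, goalBox: BasePart): number

	local carCFrame, carSize = carModel:GetBoundingBox()

	-- Distance between the two centres, minus half of each bounding box,
	-- so touching edges give a distance close to zero.
	local centreDistance = (carCFrame.Position - goalBox.Position).Magnitude

	return math.max(centreDistance - (carSize.Magnitude + goalBox.Size.Magnitude) / 2, 0)

end

local function getReward(carModel: Model, goalBox: BasePart): number

	local edgeDistance = getEdgeDistance(carModel, goalBox)

	-- Closer edges => larger reward, capped at 1 when the edges overlap.
	return 1 / (1 + edgeDistance)

end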

1 Like

Hmm, I changed my settings a bit and opted for raw outputs to control the car’s throttle, steering and brake boolean. My main problem now is that the car likes to throttle and then dethrottle right afterwards, making it look like the car is stationary. I implemented a penalty of -0.01 for every 0.01 seconds the car remains stationary, but it takes a long time for it to stop going back and forth rapidly and pick a direction to drive in, and even then it drives in the wrong direction because of the training time wasted on that rapid acceleration/deceleration. I was wondering if you knew any settings or parameters I could generally play around with (e.g. the learning rate) to minimise this sort of behaviour?
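For reference, the penalty is computed roughly like this (simplified sketch; the speed threshold and the way the velocity is read are placeholder choices on my end):

-- Hypothetical stationary penalty: -0.01 for every 0.01 seconds the car barely moves.
local STATIONARY_SPEED_THRESHOLD = 0.2 -- studs per second; assumed value
local PENALTY_PER_INTERVAL = -0.01
local PENALTY_INTERVAL = 0.01 -- seconds

local function getStationaryPenalty(carPrimaryPart: BasePart, deltaTime: number): number

	local isStationary = carPrimaryPart.AssemblyLinearVelocity.Magnitude < STATIONARY_SPEED_THRESHOLD

	if not isStationary then return 0 end

	-- -0.01 for every 0.01 seconds spent (almost) stationary during this step.
	return PENALTY_PER_INTERVAL * (deltaTime / PENALTY_INTERVAL)

end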

Meh, can’t really help without a video.

I found that raising the spawn point a bit higher motivates the vehicle to move a bit more, hence the subtle bouncing when it spawns.

Please only use raw values when you fully understand the differences between the reinforcement learning algorithms and the structure of neural networks (particularly the activation functions). Seriously. It’s going to take up much more time than it should.

If you insist on doing this, make sure your last layer isn’t a softmax layer. I’m too lazy to even explain at this point.

1 Like

Yep, I understand, softmax is only for labelling since it spreads out the probability, which is not what I need for raw values. Also, I’m not using a reinforcement learning algorithm this time :sweat_smile:, it’s a NEAT algorithm.
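For anyone else reading, the difference is easy to see with a quick snippet (plain Luau, nothing library-specific): softmax squashes whatever raw outputs you have into probabilities that sum to 1, which is fine for picking an action label but useless when you need the values directly as throttle or steering.

-- Softmax turns any set of raw outputs into values in (0, 1) that sum to 1.
local function softmax(outputs)

	local maxValue = math.max(table.unpack(outputs))

	local exponentials = {}
	local sum = 0

	for index, value in ipairs(outputs) do
		exponentials[index] = math.exp(value - maxValue) -- subtract the max for numerical stability
		sum += exponentials[index]
	end

	for index, value in ipairs(exponentials) do
		exponentials[index] = value / sum
	end

	return exponentials

end

local rawOutputs = {2.0, -1.0, 0.5} -- e.g. throttle, steering, brake as raw values

print(table.concat(softmax(rawOutputs), ", ")) -- roughly 0.79, 0.04, 0.18: no longer usable as raw throttle/steering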

For anyone looking to understand neural networks a bit before tackling reinforcement learning or machine learning in general, here’s a series I found super helpful.

Although it is in JavaScript, you should be able to understand through the graphics and physical examples he uses in this video.

If you are interested, you can view the playlist course from his channel that explains step-by-step how to build the simulator he scripted in HTML, CSS and JavaScript.

1 Like

To be fair, when I first started learning about this concept, almost every source that wasn’t some kind of research paper (YouTube, etc.) simply explained it in terms of the actual ‘neural network’ in the brain.

Which complicated things a lot.

They did not specify that traditional neural networks are literally just a function with parameters in it. The goal of gradient descent is to calculate the optimal direction for each parameter relative to the others in order to bring the cost function down as much as possible.

In RL for games, I imagine it as a function that fits itself to a high-dimensional curve, like a quadratic function adjusting its a, b, c coefficients to imitate a target quadratic curve. The target is like the curve that describes the agent’s decision for every input value.
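If it helps anyone, that analogy maps almost directly onto code. Here is a tiny, self-contained gradient descent sketch (plain Luau; the target coefficients and learning rate are arbitrary numbers I picked) that nudges a, b, c toward a target quadratic, which is the same mechanic a neural network applies to its weights.

-- Fit y = a*x^2 + b*x + c to a target quadratic by gradient descent on squared error.
local targetA, targetB, targetC = 2, -3, 1 -- arbitrary target curve
local a, b, c = 0, 0, 0                    -- our parameters, starting from zero
local learningRate = 0.01

for _ = 1, 5000 do

	-- Average the gradients over a few sample points on the curve.
	local gradA, gradB, gradC = 0, 0, 0
	local samples = {-2, -1, 0, 1, 2}

	for _, x in ipairs(samples) do
		local prediction = a * x ^ 2 + b * x + c
		local target = targetA * x ^ 2 + targetB * x + targetC
		local errorValue = prediction - target

		-- Derivatives of the squared error with respect to a, b, c.
		gradA += errorValue * x ^ 2
		gradB += errorValue * x
		gradC += errorValue
	end

	-- Step each parameter against its gradient direction.
	a -= learningRate * gradA / #samples
	b -= learningRate * gradB / #samples
	c -= learningRate * gradC / #samples

end

print(a, b, c) -- ends up close to 2, -3, 1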

Just sharing some of my experiences.

Hi guys, I have added a survey to the main post to check how satisfied you are with this library.

Do let me know what your thoughts are!

4 Likes

Well, his method of explaining does pay off when you continue down his series and write the code with him. I’d side with you that research papers are the key to concisely learning machine learning, but for anyone not wanting to sit and read stacks of papers, this video is great for getting a base understanding of what to expect from a neural network and from training one, with proven proof-of-concept results. (Also, it’s one of the few videos that doesn’t throw the hidden layers into a black box and leave it at that: he uses matrices to explain some of it, and otherwise a bunch of graphical aids like function graphs that suit visual learners like me :3)

Did I forget to mention it’s from a Karelia University of Applied Sciences professor? I’d watch it anyways : )

1 Like

Hello, I am trying to train sword-fighting bots, but they just jump around in circles constantly. I’ve trained them for a few hours but there is no improvement. There are 32 agents running on a single model, and they seem to just use all the possible outputs at all times.

I used some of the code from example version 5, but I modified it to use a single model with multiple agents.

Yeah. Don’t use a single model. The thing is that all the reinforcement learning algorithms require data from previous frames, and that data must be sequential in order to work. Otherwise, you are just screwing up the training.

If you really still want to use a single model, at least use separate ReinforcementLearningQuickSetup instances and don’t use the ProximalPolicyOptimization model.

1 Like

The reason we are using a single model is training speed, so we can train all the agents simultaneously. For example, like this. If that’s not the correct approach, then how can I reduce the training time?

Well, there are technically multiple ways you can do it. Here are some below:

Population-Based Self-Play

The sword-fighting self-learning AI code uses this method: every once in a while, it chooses the AI that received the highest reward and copies its model parameters to the others. That should theoretically eliminate any useless AIs.
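A rough sketch of that copy step is below (simplified; use your own reward bookkeeping, and check the parameter accessor names against your setup):

-- Hypothetical population-based self-play copy step.
-- "agentModels" is a table of neural network models (one per agent) and
-- "totalRewards" is a table of the reward each agent accumulated this round.
local function copyBestAgentToOthers(agentModels, totalRewards)

	local bestIndex = 1

	for index, reward in ipairs(totalRewards) do
		if reward > totalRewards[bestIndex] then bestIndex = index end
	end

	-- Assumed accessors: getModelParameters() returns the table of weight matrices,
	-- setModelParameters() overwrites them. In practice, deep-copy bestParameters
	-- before assigning so the agents don't end up sharing the same tables.
	local bestParameters = agentModels[bestIndex]:getModelParameters()

	for index, model in ipairs(agentModels) do
		if index ~= bestIndex then model:setModelParameters(bestParameters) end
	end

end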

Use A Single Neural Network, But Have It Used By Different Reinforcement Learning Algorithms

There’s a clear separation between the neural network code and the reinforcement learning algorithm code. You can use a single neural network and have it shared by multiple RL algorithms, each handling its own sequence of data at any time. Also, each of the RL algorithms must have its own ReinforcementLearningQuickSetup if you want to use one.

You might initialize it something like this:


local NeuralNetwork = DataPredict.Models.NeuralNetwork.new() -- Let's just assume we already set up the layers.

local AdvantageActorCritic = DataPredict.Models.AdvantageActorCritic

local A2C_1 = AdvantageActorCritic.new()
local A2C_2 = AdvantageActorCritic.new()

local ReinforcementLearningQuickSetup = DataPredict.Others.ReinforcementLearningQuickSetup

local RLQS_1 = ReinforcementLearningQuickSetup.new()
local RLQS_2 = ReinforcementLearningQuickSetup.new()

A2C_1:setModel(NeuralNetwork) -- Two different reinforcement learning algorithms using the same neural network
A2C_2:setModel(NeuralNetwork)

RLQS_1:setModel(A2C_1) -- Each reinforcement learning algorithm has its own "Quick Setup"
RLQS_2:setModel(A2C_2)

Then there we go!

Shared Gradients

You can create a copy of the generated neural network model parameters (which we will call the parent model parameters) and ensure that all the neural networks start from those specific model parameters. When a neural network “updates”, you send its gradients to the central (parent) model parameters. Every once in a while, you upload the “parent” model parameters back to the individual neural networks.
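Very roughly, and working on plain tables of matrices (nested Lua tables) instead of any specific DataPredict calls, the idea looks something like this:

-- Hypothetical shared-gradients sketch on plain nested tables (layer -> row -> column).
-- None of this is actual DataPredict API; it only illustrates the flow described above.
local parentParameters = { { {0.1, 0.2}, {0.3, 0.4} } } -- e.g. a single 2x2 weight matrix

local function deepCopy(parameters)
	local copy = {}
	for layerIndex, matrix in ipairs(parameters) do
		copy[layerIndex] = {}
		for rowIndex, row in ipairs(matrix) do
			copy[layerIndex][rowIndex] = table.clone(row)
		end
	end
	return copy
end

-- 1. Each worker neural network starts from a copy of the parent parameters.
local workerParameters = deepCopy(parentParameters)

-- 2. When a worker "updates", send its change (new minus old) into the parent parameters.
local function applyWorkerUpdateToParent(oldWorkerParameters, newWorkerParameters)
	for layerIndex, matrix in ipairs(parentParameters) do
		for rowIndex, row in ipairs(matrix) do
			for columnIndex in ipairs(row) do
				row[columnIndex] += newWorkerParameters[layerIndex][rowIndex][columnIndex]
					- oldWorkerParameters[layerIndex][rowIndex][columnIndex]
			end
		end
	end
end

-- 3. Every once in a while, push the parent parameters back down to the worker.
local function refreshWorker()
	workerParameters = deepCopy(parentParameters)
end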

The full setup is a bit more complicated than that, and I’m not sure if it is worth writing the whole thing out here. Let me know if you are interested in this approach once you’ve tried the others.

1 Like

Thank you for the very detailed response; I am just confused about the second option. The AdvantageActorCritic model has no setModel function. Currently I am setting a critic model and an actor model. Do you mean that I should use the same critic and actor models for all the AdvantageActorCritic models?

Ah sorry. I completely forgot about that.

All critic models will use a single neural network, and all the actor models will use another single neural network.
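So, adjusting the earlier snippet, it would look roughly like this (using the same setActorModel / setCriticModel calls you’re already using to attach those models):

local ActorNeuralNetwork = DataPredict.Models.NeuralNetwork.new() -- shared by every actor
local CriticNeuralNetwork = DataPredict.Models.NeuralNetwork.new() -- shared by every critic

local A2C_1 = DataPredict.Models.AdvantageActorCritic.new()
local A2C_2 = DataPredict.Models.AdvantageActorCritic.new()

-- The attach calls you mentioned already using for the actor and critic models.
A2C_1:setActorModel(ActorNeuralNetwork)
A2C_1:setCriticModel(CriticNeuralNetwork)

A2C_2:setActorModel(ActorNeuralNetwork)
A2C_2:setCriticModel(CriticNeuralNetwork)

-- Each algorithm still gets its own ReinforcementLearningQuickSetup, as before.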

1 Like