Well, there are technically multiple ways you can do it. Here are some below:
Population-Based Self-Play
The sword-fighting self-learning AI code uses this method: every once in a while, it picks the AI that received the highest reward and copies it to the others. That should theoretically eliminate any useless AIs.
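Here’s a minimal sketch of that copy step, assuming each agent’s neural network exposes getModelParameters() and setModelParameters() (check your DataPredict version for the exact names) and that you track a running reward total per agent yourself:

-- Hypothetical sketch: copy the best-performing agent's parameters to everyone else.
-- `agents` is assumed to be a list of { model = <neural network>, totalReward = <number> }.
local function copyBestAgentToOthers(agents)
    local bestAgent = agents[1]
    for _, agent in ipairs(agents) do
        if (agent.totalReward > bestAgent.totalReward) then bestAgent = agent end
    end
    local bestParameters = bestAgent.model:getModelParameters() -- You may want to deep-copy this depending on the library version.
    for _, agent in ipairs(agents) do
        if (agent ~= bestAgent) then agent.model:setModelParameters(bestParameters) end
        agent.totalReward = 0 -- Restart the comparison window.
    end
end

task.spawn(function() -- Run the copy step every once in a while, e.g. every 60 seconds.
    while true do
        task.wait(60)
        copyBestAgentToOthers(agents)
    end
end)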
Use a Single Neural Network, But Make It Shared By Different Reinforcement Learning Algorithms
There’s a clear separation between the Neural Network code and the Reinforcement Learning Algorithm code. You can use a single neural network that handles the same logic and have it shared by multiple RL algorithms, each handling its own sequence of data at any time. Also, each of the RL algorithms must have its own ReinforcementLearningQuickSetup if you want to use them.
You might initialize it something like this:
local NeuralNetwork = DataPredict.Models.NeuralNetwork.new() -- Let's just assume we already setup the layers.
local AdvantageActorCritic = DataPredict.Models.AdvantageActorCritic
local A2C_1 = AdvantageActorCritic.new()
local A2C_2 = AdvantageActorCritic.new()
local ReinforcementLearningQuickSetup = DataPredict.Others.ReinforcementLearningQuickSetup
local RLQS_1 = ReinforcementLearningQuickSetup.new()
local RLQS_2 = ReinforcementLearningQuickSetup.new()
A2C_1:setModel(NeuralNetwork) -- Two different reinforcement learning algorithms using the same neural network
A2C_2:setModel(NeuralNetwork)
RLQS_1:setModel(A2C_1) -- Each reinforcement learning algorithm has its own "Quick Setup"
RLQS_2:setModel(A2C_2)
Then there we go!
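From there, each agent just feeds its own environment vector and reward into its own quick setup. The reinforce() call below is an assumption on my part, so match it to whatever function your DataPredict version uses to step the quick setup:

-- Hypothetical per-agent usage; both quick setups drive the same underlying neural network.
-- environmentVector1/2 and reward1/2 come from each agent's own game loop.
local action1 = RLQS_1:reinforce(environmentVector1, reward1)
local action2 = RLQS_2:reinforce(environmentVector2, reward2)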
Shared Gradients
You can create a copy of the generated neural network model parameters (which we will call the parent model parameters) and ensure that all neural networks use those specific model parameters. When a neural network “updates”, you send its gradients to the central (parent) model parameters. Every once in a while, you upload the parent model parameters back to the individual neural networks.
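A very rough outline of the idea (the gradient hook and parameter structure below are placeholders I made up for illustration, not actual DataPredict functions):

-- Hypothetical shared-gradients outline; treat every helper here as a placeholder.
local parentModelParameters = NeuralNetwork:getModelParameters()

-- Recursively apply a gradient table (same nested shape as the parameters) to the parent parameters.
local function applyGradients(parameters, gradients, learningRate)
    for key, value in pairs(gradients) do
        if (type(value) == "table") then
            applyGradients(parameters[key], value, learningRate)
        else
            parameters[key] = parameters[key] - (learningRate * value)
        end
    end
end

-- Whenever an individual neural network “updates”, send its gradients here instead.
local function onAgentGradients(gradients)
    applyGradients(parentModelParameters, gradients, 0.001)
end

-- Every once in a while, push the parent parameters back down to each individual neural network.
local function redistributeParentParameters(agentNetworks)
    for _, network in ipairs(agentNetworks) do
        network:setModelParameters(parentModelParameters)
    end
end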
It’s a bit complicated to set up and I’m not sure if it is worth writing the whole thing here. Let me know if you are interested in this method once you’ve tried the others.
Thank you for the very detailed response; I am just confused about the second option. The AdvantageActorCritic model has no setModel function. Currently I am setting a critic model and an actor model. Do you mean that I should use the same critic and actor models for all of the AdvantageActorCritic models?
We were able to implement the single neural network method and the results are better, but we are having an issue where after some time, all the agents stop moving. We have done some debugging and found some information:
The output action is reported as nil after the issue begins,
Occasionally in NewAdvantageActorCriticModel:setEpisodeUpdateFunction, the advantage and actionProbabilty values are nil, and this leads to an error that causes the Model:getModel():reset() function to be called on every 60th call instead of the nil output being returned.
The environment vector looks normal both when it succeeds and when it fails, including when the output is not nil.
We were able to get them working and they are now training against hard-coded bots, so it’s just a matter of refining the reward function. There’s an issue where, after a while, they all stop moving and the output from the ReinforcementNeuralNetworkQuickSetup is reported as nil. If I restart the server, it goes back to normal, even though the data is saved.
This one isn’t related to Model:reset(), but I think it’s a similar issue. We will also try Q-learning to see if it resolves it.
To be honest, I just think the neural network calculation isn’t fast enough between the reinforcement learning models. Maybe all you need is to add a pcall() around it?
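Something like this around whatever call returns the action (reinforce() here is just my assumption for that call; use your actual one):

local previousAction = 1 -- Whatever default or last-known action you want to fall back to.

-- Wrap the quick setup call so a failed or nil calculation doesn't leave the agent stuck.
local success, action = pcall(function()
    return RLQS_1:reinforce(environmentVector, reward) -- environmentVector and reward come from your agent loop.
end)

if (not success) or (action == nil) then
    action = previousAction -- Reuse the last known action instead of doing nothing.
else
    previousAction = action
end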
Well, this isn’t really a decoration suggestion, but have you heard of KAN? It’s said to make machine learning training up to 10x faster, from what I heard. Instead of a multitude of fixed activation functions, KAN uses B-splines (basis splines) as learnable activations, which supposedly speeds up the process, among other things. I think this technique is interesting and would be greatly helpful for module users if you manage to implement it.

It is also said that KAN is more readable than an MLP, so users could see what the bot is actually trying to do and how it is learning, thanks to the spline structure. Something else that caught my eye is that KAN is said to beat the curse of dimensionality, which is most likely the reason for the nil values that keep popping up when people try training with your RL models, so this might improve that too. KAN also reportedly needs less data and converges faster than an MLP (around 200 parameters for a KAN compared to the 300,000 parameters of an MLP it outperforms).
I hope you are able to look into it. :> Sorry for the lack of activity.
Oh, I thought you knew about KAN (Kolmogorov-Arnold Network). It’s a mind-blowing recent invention that people speculate will take over from MLPs in future AI implementations. You should read the paper on it; it’s amazing.