We were able to implement the single neural network method and the results are better, but we are running into an issue where, after some time, all the agents stop moving. We have done some debugging and found the following:
- The output action is reported as nil after the issue begins.
- Occasionally, in NewAdvantageActorCriticModel:setEpisodeUpdateFunction, the advantage and actionProbability are nil, and this leads to an error that causes the Model:getModel():reset() function to be called every 60th time rather than the nil output being provided (see the guard sketch after this list).
- The environment vector looks normal both when it succeeds and when it fails, and also when the output is not nil.
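For reference, this is roughly the kind of nil guard we slotted in while debugging. It is only a sketch with our own made-up variable names; the actual spot where these values are used inside the module may look different.

```lua
-- Sketch only (our own debugging change, variable names made up):
-- check the values right before the episode update uses them, so a nil
-- does not throw an error and end up triggering Model:getModel():reset().
if (advantage == nil) or (actionProbability == nil) then

	warn("Nil advantage/actionProbability in episode update:", advantage, actionProbability)

	return

end
```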
We were able to get them working and they are now training against hard-coded bots, so it’s just a matter of refining the reward function. There’s still an issue where, after a while, they all stop moving and the output from the ReinforcementNeuralNetworkQuickSetup is reported as nil. If I restart the server it goes back to normal, even when the data is saved.
This one isn’t related to Model:reset(), but I think it’s a similar issue. We will also try Q-learning to see if it resolves it.
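For now we are working around the nil output with a fallback action so the agents keep moving. This is just a rough sketch of our own workaround: it assumes the reinforce() call is where the action comes back (that is how we are calling it on our end), and the action list is specific to our game.

```lua
-- Rough workaround sketch (our code, not part of the module):
-- if the returned action comes back nil, pick a random action instead
-- of letting the agent freeze until the server restarts.
local actionList = {"MoveForward", "MoveBackward", "TurnLeft", "TurnRight", "Shoot"}

local function getAction(quickSetup, environmentVector, rewardValue)

	local action = quickSetup:reinforce(environmentVector, rewardValue)

	if (action == nil) then

		warn("QuickSetup returned a nil action, using a random fallback.")

		action = actionList[math.random(#actionList)]

	end

	return action

end
```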
To be honest, I just think the neural network calculation isn’t fast enough between the reinforcement learning models. Maybe all you need to do is add a pcall() around it?
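Something roughly like this, just as a sketch (swap in whatever call is actually producing the nil on your side):

```lua
-- Sketch of the pcall idea: if the calculation errors mid-step, catch it
-- and log it instead of letting the whole agent loop die silently.
local success, result = pcall(function()

	return quickSetup:reinforce(environmentVector, rewardValue)

end)

if (not success) then warn("Reinforcement step errored: " .. tostring(result)) end
```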
Well, this isn’t really a decoration suggestion, but have you heard of KAN? From what I’ve heard, it’s said to make machine learning training up to 10x faster. Instead of a multitude of fixed activation functions, KAN’s algorithm uses B-splines (basis splines) to learn the activation shapes directly, which supposedly speeds up the process, among other benefits. I think this technique is interesting and would be greatly helpful for module users if you manage to implement it. KAN is also said to be more interpretable than an MLP, so users could actually see what the bot is trying to do and how it is learning, thanks to the spline structure.

Something else that caught my eye is the claim that KAN beats the curse of dimensionality, which I think is most likely the reason for the nil values that keep popping up when people train with your RL models, so this would improve that too. KAN also reportedly needs less data and converges faster than an MLP (the paper reports a roughly 200-parameter KAN outperforming a roughly 300,000-parameter MLP).
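To give a rough idea of the core trick, here is a toy sketch I put together. It is not the paper’s actual implementation (the paper uses cubic B-splines plus a base function); it only shows the idea that every connection learns its own small 1D function instead of a plain weight feeding into a fixed activation.

```lua
-- Toy illustration of the KAN idea (not the paper's implementation):
-- each edge carries a learnable 1D function phi(x) built from spline basis
-- functions, here simplified to piecewise-linear "hat" basis functions.
local NUM_BASIS = 8        -- basis functions per edge
local X_MIN, X_MAX = -1, 1 -- input range the spline grid covers

-- phi(x) = sum_i coefficients[i] * B_i(x), with B_i triangular hat functions
-- centered on a uniform grid; the coefficients are the trainable parameters.
local function evaluateEdge(coefficients, x)

	local step = (X_MAX - X_MIN) / (NUM_BASIS - 1)
	local value = 0

	for i = 1, NUM_BASIS do

		local center = X_MIN + (i - 1) * step
		local basis = math.max(0, 1 - math.abs(x - center) / step)
		value = value + coefficients[i] * basis

	end

	return value

end

-- A KAN node just sums its incoming edge functions; there is no separate
-- fixed activation applied on the node afterwards.
local function kanNodeOutput(edgeCoefficients, inputs)

	local total = 0

	for j, x in ipairs(inputs) do

		total = total + evaluateEdge(edgeCoefficients[j], x)

	end

	return total

end
```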
I hope you are able to look into it. :> Sorry for the lack of activity.
Oh, I thought you knew about KAN (Kolmogorov-Arnold Network). It’s a mind-blowing recent invention that people speculate will take over from MLPs in future AI implementations. You should read the paper on it, it’s amazing.