I'm trying to run some experiments and added an ExperienceReplay, but I end up with this error. I think you forgot to implement the "update" function in ReinforcementLearningBaseModel.
That should be categoricalUpdate(). Not sure why I missed that. I'll go and fix it in the original library.
ProximalPolicyOptimizationClip doesn't work with UniformExperienceReplay
local function buildActorCriticRLQSModel(ID)

	-- PPO-Clip model with separate actor and critic networks
	local MainModel = DataPredict.Models.ProximalPolicyOptimizationClip.new()

	local AModel = buildActorModel(ID)
	local CModel = buildCriticModel(ID)

	MainModel:setActorModel(AModel)
	MainModel:setCriticModel(CModel)

	-- Categorical policy quick setup plus a uniform experience replay buffer
	local MainModelQuickSetup = DataPredict.QuickSetups.CategoricalPolicy.new(60, 0, "Sample")

	local ExperienceReplay = DataPredict.ExperienceReplays.UniformExperienceReplay.new(1, 5, 30)

	MainModelQuickSetup:setModel(MainModel)
	MainModelQuickSetup:setPrintOutput(false)
	MainModelQuickSetup:setClassesList(classesList)
	MainModelQuickSetup:setExperienceReplay(ExperienceReplay)

	table.insert(ReinforcementLearningQuickSetupArray, MainModelQuickSetup)

	if includeRND then
		table.insert(RNDModelArray, buildRNDModel(ID))
	end

	return MainModelQuickSetup

end
Well technically, that is how it should be if we're following existing RL theory: PPO is an on-policy algorithm, so its updates have to come from data collected by the current policy, which is exactly what replaying old experiences breaks. I won't go into the full explanation here. So stick with the variants of Deep Q-Learning, Deep SARSA, and Deep Expected SARSA if you really want to use the experience replays; see the sketch below.
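A rough, untested sketch of that swap, reusing the quick-setup pattern from the code above. The DeepQLearning class name, its setModel() call, and the buildOffPolicyRLQSModel wrapper name are assumptions on my part, so verify them against the DataPredict documentation; everything else mirrors the snippet posted earlier.

local function buildOffPolicyRLQSModel(ID)

	-- Off-policy model, so pairing it with an experience replay is valid
	local MainModel = DataPredict.Models.DeepQLearning.new()

	MainModel:setModel(buildActorModel(ID)) -- single network instead of actor + critic

	local MainModelQuickSetup = DataPredict.QuickSetups.CategoricalPolicy.new(60, 0, "Sample")

	local ExperienceReplay = DataPredict.ExperienceReplays.UniformExperienceReplay.new(1, 5, 30)

	MainModelQuickSetup:setModel(MainModel)
	MainModelQuickSetup:setClassesList(classesList)
	MainModelQuickSetup:setExperienceReplay(ExperienceReplay)

	return MainModelQuickSetup

end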
How is your reward “nan”?!?! That could have led to the “nan” model parameters.
I would like to know what PC you’re using.
Yeah, I was wondering why you're hitting all these errors while I'm not. There could be a difference in how your device handles the numbers.
Since I'm travelling (not at home), I'm using a laptop (ASUS Vivobook S 14 OLED) with a Ryzen 5 7535HS. It doesn't have a discrete GPU, just the integrated one, with 16 GB of DDR5 RAM…
I think I had to add
return math.max(-1e3, reward)
to prevent the NaN issue.
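For reference, a small self-contained sketch of that idea: clamp the reward on both sides so a single huge value can't push the parameters to NaN. The clampReward name and the ±1e3 bounds are just placeholders to adjust.

local function clampReward(reward)

	-- NaN is the only value that is not equal to itself
	if reward ~= reward then return 0 end

	-- Keep the reward inside a sane range before feeding it to the model
	return math.clamp(reward, -1e3, 1e3)

end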
That’s very strange. You have pretty similar specs to my PC, yet you’re the one having a lot of problems… Nobody else in this thread has complained about the same thing.
Keep it the same, don’t change anything else.
I just wanted to remove the “jump” class… so they focus on attack & movement…
Okay yeah, you can do that. Just make sure the final layers stay as LeakyReLU for all models.
Why doesn’t the RND work (it breaks everything)? Whenever I turn it on and start the run, it makes the NPC models spin around…

The rewards generated by RND are likely too high, so the intrinsic (curiosity) reward drowns out the environment reward. Try scaling it down, as in the sketch below.
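A rough sketch of one way to do that: scale the RND reward before adding it to the environment reward. The combineRewards function and the 0.01 factor are placeholders, not part of the DataPredict API; tune the factor until the NPCs stop spinning.

local function combineRewards(environmentReward, rndReward)

	-- Small scale factor so the curiosity reward cannot drown out the task reward
	local intrinsicRewardScale = 0.01

	return environmentReward + intrinsicRewardScale * rndReward

end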
I have spent almost 2 days on this with no result, can you help check?
training model 3.rbxl (341.0 KB)
Also, does the frame rate affect the result? 60 FPS vs. 240 FPS…
And you can see that even when the reward isn’t NaN, the parameters can still be NaN…
Is there an adaptive learning rate? And how do I implement it here?
Yeah, there is an adaptive learning rate. You can either use the optimizers on their own or combine them with ValueSchedulers. If you choose the latter, the optimizers have the setLearningRateScheduler() function.
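A rough sketch of that second option. Only setLearningRateScheduler() is named above; the AdaptiveMomentEstimation and TimeDecay class names are guesses on my part, so check the DataPredict docs for the exact optimizer and ValueScheduler names in your version.

-- Build an optimizer and attach a learning-rate scheduler to it
local Optimizer = DataPredict.Optimizers.AdaptiveMomentEstimation.new()

local Scheduler = DataPredict.ValueSchedulers.TimeDecay.new()

Optimizer:setLearningRateScheduler(Scheduler)

-- Then pass the optimizer into whichever layers or models accept one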
Yes. I think the calculations might be too slow to keep up at 240 FPS (if we’re talking about running them on RunService.Heartbeat and the like; not too sure about the “visualization” one).
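If you want the training rate to be independent of FPS, one option is to accumulate the Heartbeat delta time and only step the model at a fixed interval. A rough sketch: RunService.Heartbeat is standard Roblox API, but stepModel() and the names here are just placeholders for your own update code.

local RunService = game:GetService("RunService")

local UPDATE_INTERVAL = 1 / 60 -- seconds between model updates, regardless of render FPS
local accumulatedTime = 0

local function stepModel()
	-- placeholder for the actual reinforcement-learning update
end

RunService.Heartbeat:Connect(function(deltaTime)

	accumulatedTime += deltaTime

	if accumulatedTime >= UPDATE_INTERVAL then

		accumulatedTime -= UPDATE_INTERVAL

		stepModel()

	end

end)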
So should I stick with 60 or 240?