DataPredict [Release 1.17] - General Purpose Machine And Deep Learning Library (Learning AIs, Generative AIs, and more!)

Let’s use PPO-Clip then. You may want to set numberOfReinforcementsPerEpisode for your ReinforcementLearningQuickSetup.

Any recommended settings for PPOClip, the ReinforcementLearningQuickSetup, or the layers?

Ah. When you use any reinforcement learning model whose name ends with NeuralNetwork, it inherits from the Neural Network model by default. I plan to remove this in the next version.

For PPO, please refer to the Sword fighting AIs code.

Actor/Critic base, right?

Yep. PPO uses the Actor/Critic ones.

image
Errm… never mind, I don’t know how to fix it…

Okay, let’s go back to Advantage Actor-Critic (A2C) then…

Sorry, I can’t really help right now since I am making changes to the documentation.

The :getActor() function is broken for the A2C. I don’t think I need it, though; calling :reset() or not gives the same results either way.


I gave it a goal to drive towards, but it still keeps shifting between throttleup and throttledown.

image
Same layers for the critic as well.

Is my reward function maybe not optimal?

Can you try calling :episodeUpdate() from the A2C (not the ReinforcementLearningQuickSetup) whenever the car resets? I think it doesn’t update properly.

This time it almost reached the curve before crashing, but other than that, not many other improvements.

Here’s a minute-long recording I took of it training. Maybe you could take a look and analyse it for any faults?

It keeps repeating these movements over and over: it either almost reaches the curve or crashes to its right or left, always in the exact same spot.

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

That should encourage the car to move straight.
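
The idea above can be sketched roughly as follows. This is a language-agnostic illustration in Python, not DataPredict code: the class name `SteadyRayReward`, the `tolerance`, `hold_seconds`, and `reward` values, and the way "stays the same" is measured are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence, Tuple


@dataclass
class SteadyRayReward:
    """Pays a reward once the side ray lengths have stayed steady long enough."""

    tolerance: float = 0.5     # assumed: largest change still counted as "the same"
    hold_seconds: float = 1.0  # assumed: how long the lengths must stay steady
    reward: float = 1.0        # assumed: size of the reward
    _reference: Optional[Tuple[float, ...]] = field(default=None, init=False)
    _steady_for: float = field(default=0.0, init=False)

    def step(self, side_rays: Sequence[float], dt: float) -> float:
        """Return the reward for this tick; never returns a negative value."""
        rays = tuple(side_rays)
        steady = (
            self._reference is not None
            and len(rays) == len(self._reference)
            and all(abs(a - b) <= self.tolerance
                    for a, b in zip(rays, self._reference))
        )
        if steady:
            self._steady_for += dt
        else:
            # Lengths changed (for example at a corner): restart the timer.
            # No punishment is given, only a zero reward.
            self._reference = rays
            self._steady_for = 0.0
        return self.reward if self._steady_for >= self.hold_seconds else 0.0
```

Calling `step` once per loop iteration with the current left and right ray lengths gives zero until the lengths have held steady for `hold_seconds`, then `reward` on every tick until they change again.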

Each time I give it a reward, should I call :reinforce() right then (e.g. after 5 seconds of constant raycast distance), or increase its total reward value and pass that to :reinforce() every 0.01 seconds in the while loop?

Increase its total reward. Calling reinforce() will just update the model.
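
A minimal sketch of that loop shape, in Python rather than Luau. The `reinforce` argument here is a stand-in callable, not the real DataPredict method, whose exact signature may differ. The thread also does not say whether the total is cleared after each call, so the sketch exposes that choice as a hypothetical `reset_each_tick` flag.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def run_ticks(
    ticks: Iterable[Tuple[Sequence[float], float]],
    reinforce: Callable[[Sequence[float], float], None],
    reset_each_tick: bool = False,
) -> List[float]:
    """Call `reinforce` on every tick with the running reward total.

    Each tick is a (features, reward_earned_this_tick) pair.
    Returns the reward values that were passed to `reinforce`.
    """
    total_reward = 0.0
    passed: List[float] = []
    for features, earned in ticks:
        total_reward += earned             # earning a reward only changes the total
        reinforce(features, total_reward)  # the model update happens every tick
        passed.append(total_reward)
        if reset_each_tick:                # assumption: optionally clear the total
            total_reward = 0.0
    return passed
```

The point of the design is that earning a reward and updating the model are separate steps: the reward logic only adjusts a number, and the single call inside the loop is what hands it to the model.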

It wants to stay still when it gets rewarded, though. What do I do about that?

Already have that: distance from goal = punishment.

One thing I have noticed is that when you reinforce the AI, you need to make sure the reinforcement is provided in a truly valid area. Don’t mix in bad data.

Okay let’s modify what I said before:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

to something like:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time, provided that the car exceeds a certain throttle speed. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.
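
The speed condition can be expressed as a simple gate on the reward. The Python sketch below is only an illustration: the function name `gated_steady_reward` and the `hold_seconds`, `min_speed`, and `reward` values are made up for the example, and `steady_seconds` is assumed to be tracked elsewhere as the time the side ray lengths have stayed the same.

```python
def gated_steady_reward(
    steady_seconds: float,
    speed: float,
    *,
    hold_seconds: float = 1.0,  # assumed: required time with steady ray lengths
    min_speed: float = 5.0,     # assumed: speed the car must exceed
    reward: float = 1.0,        # assumed: size of the reward
) -> float:
    """Reward steady side rays only while the car is actually moving."""
    if steady_seconds >= hold_seconds and speed > min_speed:
        return reward
    return 0.0  # zero reward otherwise, never a punishment
```

With this gate, a parked car whose ray lengths never change earns nothing, so standing still stops being the easiest way to collect the reward.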

So far all it does is crash at these corners, as you can see here, but I think Aqwam’s advice worked a bit: the vehicle managed to at least reach the corner 2 times, as you can see in the video, which it has never been able to do before. Other than that, I don’t know how else I can improve it.

I’ll try this.