DataPredict [Release 1.22] - Machine Learning And Deep Learning Library (Learning AIs, Generative AIs, and more!)

Actor/Critic based, right?

Yep. PPO uses the Actor/Critic ones.

image
Erm… never mind, I don’t know how to fix it…

Okay, let’s go back to advantage actor critic then…

Sorry, I can’t really help right now since I am making changes to the documentation.


The :getActor() function is broken for the A2C. I don’t think I need it though; calling :reset() or not gives the same results either way.


I gave it a goal to drive towards, but it still keeps shifting between throttleup and throttledown.

image
same layers for the critic as well

Is my reward function maybe not optimal?

Can you try calling :episodeUpdate() from the A2C (not the ReinforcementLearningQuickSetup) whenever the car resets? I think it doesn’t update properly.


This time it almost reached the curve before crashing, but other than that, not many other improvements.


Here’s a minute-long recording I took of it training. Maybe you could take a look and analyse it for any faults?

It keeps repeating these movements over and over: it either almost reaches the curve or crashes to its right or left, always in the exact same spot.

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

That should encourage the car to move straight.


Each time I give it a reward, should I call :reinforce() right then (e.g. after 5 seconds of constant raycast distance), or increase its total reward value and :reinforce() with that every 0.01 seconds in the while loop?

Increase its total reward. Calling reinforce() will just update the model.
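To make the "increase its total reward" idea concrete, here is a rough, language-agnostic sketch in Python (not DataPredict's Lua API). `RewardAccumulator`, `run_steps` and `update_model` are hypothetical names made up for illustration, with `update_model` standing in for whatever call updates the model each loop iteration. It also assumes the running total is reset after every update, which the thread does not confirm.

```python
class RewardAccumulator:
    """Collects reward between model updates (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0

    def add(self, amount):
        # Called whenever a rewarding event happens, e.g. stable raycasts.
        self.total += amount

    def flush(self):
        # Hand the accumulated reward to the caller and start from zero.
        # Assumption: the total is reset after each model update.
        reward, self.total = self.total, 0.0
        return reward


def run_steps(events_per_step, update_model):
    """events_per_step: one list of reward amounts per loop iteration.
    update_model: stand-in for the per-step model update call."""
    acc = RewardAccumulator()
    for step_events in events_per_step:
        for amount in step_events:
            acc.add(amount)
        # Exactly one update per loop iteration, using the summed reward.
        update_model(acc.flush())
```

The point of the split is that reward events only change a number, while the model update still happens once per loop iteration no matter how many events occurred.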


It wants to stay still when it gets rewarded though. What do I do about that?

Already have that, distance from goal = punishment

One thing I have noticed is when you try to reinforce the AI, you need to make sure the reinforcement is provided in a truly valid area, don’t mix bad data.


Okay let’s modify what I said before:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

to something like:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time provided that the car exceeds certain throttle speed. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.
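A minimal sketch of that modified rule, written in Python as a language-agnostic illustration rather than as DataPredict code. The class name, the window length, the tolerance and the throttle threshold are all assumptions chosen for the example. "Stays the same" is interpreted here as "the spread of recent readings is within a small tolerance".

```python
from collections import deque


class StraightDrivingReward:
    """Reward when both side raycast lengths stay roughly constant over a
    window of steps AND the throttle is above a threshold. Otherwise the
    reward is zero; there is never a punishment."""

    def __init__(self, window=50, tolerance=0.5, min_throttle=0.5, reward=1.0):
        self.left = deque(maxlen=window)    # recent left raycast lengths
        self.right = deque(maxlen=window)   # recent right raycast lengths
        self.tolerance = tolerance          # allowed spread within the window
        self.min_throttle = min_throttle    # hypothetical throttle threshold
        self.reward = reward

    def step(self, left_length, right_length, throttle):
        self.left.append(left_length)
        self.right.append(right_length)

        # A car that sits still has perfectly stable raycasts, so gate on throttle.
        if throttle < self.min_throttle:
            return 0.0

        # Wait until a full window of readings is available.
        if len(self.left) < self.left.maxlen:
            return 0.0

        stable = all(
            max(history) - min(history) <= self.tolerance
            for history in (self.left, self.right)
        )
        return self.reward if stable else 0.0
```

The throttle gate is what addresses the "it wants to stay still" problem: standing still no longer earns anything, and turning at a corner simply yields zero instead of a penalty.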



So far all it does is crash at these corners, as you can see here, but I think Aqwam’s advice worked a bit: the vehicle managed to at least reach the corner twice, as you can see in the video. It has never been able to do that before. Other than that, I don’t know how else I can improve it.

I’ll try this.

It definitely is working! I was a bit lazy and set it to if Throttle==1 (I definitely need to add a speed variable), but it still made the car drive in an almost perfectly straight path and almost clear the entire curve before crashing into the straight road in front of it. I have definitely not made it this far before, especially not just 4 seconds into training. I will leave it here for today and test it again tomorrow. I’ll let you know how it goes as usual!


Release 1.16 Version Update!

  • Refactored and renamed Deep Q-Learning, Deep SARSA and Deep Expected SARSA. This includes the variants.

  • Made some bug fixes and removed redundant code for some of the algorithms stated above.

  • That’s pretty much it…


Hello guys!

I have uploaded version 5 of the sword-fighting AI code. Version 5 brings back some of the code from version 1 so that the AIs can learn more advanced tactics. It is also combined with version 4, since the AIs in that version learnt much faster than in the previous versions.

Also, credit to @noisecooldeadpool362 for providing the code improvements related to angle calculations. These will be applied to future versions of the sword-fighting AI code.
