DataPredict [Release 1.19] - General Purpose Machine And Deep Learning Library (Learning AIs, Generative AIs, and more!)

noisecooldeadpool362 · May 22, 2024, 2:03pm

Actor/Critic base, right?

MYOriginsWorkshop · May 22, 2024, 2:04pm

Yep. PPO uses the Actor/Critic ones.

noisecooldeadpool362 · May 22, 2024, 2:10pm

Errm… never mind, I don’t know how to fix it…

MYOriginsWorkshop · May 22, 2024, 2:20pm

Okay, let’s go back to advantage actor critic then…

Sorry, I can’t really help right now since I am making changes to the documentation.

noisecooldeadpool362 · May 22, 2024, 2:24pm

The :getActor() function is broken for the A2C. I don’t think I need it though; calling :reset() or not gives the same results either way.

noisecooldeadpool362 · May 22, 2024, 2:27pm

I gave it a goal to drive towards, but it still keeps shifting between throttleup and throttledown.

Same layers for the critic as well.

Is my reward function maybe not optimal?

MYOriginsWorkshop · May 22, 2024, 2:31pm

Can you try calling :episodeUpdate() from the A2C (not the ReinforcementLearningQuickSetup) whenever the car resets? I think it doesn’t update properly.

noisecooldeadpool362 · May 22, 2024, 2:41pm

This time it almost reached the curve before crashing, but other than that, not many other improvements.

Here’s a minute-long recording I took of it training. Maybe you could take a look and analyse it for any faults?

It keeps repeating these movements over and over: it either almost reaches the curve or crashes to its right or left, always in the exact same spot.

MYOriginsWorkshop · May 22, 2024, 2:49pm

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

That should encourage the car to move straight.
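
A minimal sketch of that reward rule, written in Python purely for illustration (it ports directly to Luau). The class name, the `tolerance`, `hold_time` and `reward` parameters, and the two-ray setup are all assumptions, not part of DataPredict. It pays a reward only once both side ray lengths have stayed within a tolerance for long enough, and never returns a negative value.

```python
class StableRayReward:
    """Hypothetical helper: rewards steady side ray lengths, never punishes."""

    def __init__(self, tolerance=0.5, hold_time=1.0, reward=1.0):
        self.tolerance = tolerance    # max allowed change in ray length (assumed units: studs)
        self.hold_time = hold_time    # seconds the lengths must stay steady
        self.reward = reward          # value paid while the condition holds
        self.reference = None         # ray lengths the timer is measured against
        self.stable_for = 0.0         # seconds the lengths have stayed near the reference

    def step(self, left, right, dt):
        """Return the reward for one tick of length dt seconds."""
        current = (left, right)
        drifted = self.reference is None or any(
            abs(c - r) > self.tolerance for c, r in zip(current, self.reference)
        )
        if drifted:
            # Lengths changed (for example in a corner): restart the timer, give zero.
            self.reference = current
            self.stable_for = 0.0
            return 0.0
        self.stable_for += dt
        return self.reward if self.stable_for >= self.hold_time else 0.0
```

Calling `step` once per loop iteration with the current left and right ray lengths gives the value to add to that tick’s reward.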

noisecooldeadpool362 · May 22, 2024, 2:51pm

Each time I give it a reward, would I call :reinforce() right away (e.g. 5 seconds with constant raycast distance, then :reinforce()), or increase its total reward value and then :reinforce() that every 0.01 seconds in the while loop?

MYOriginsWorkshop · May 22, 2024, 2:52pm

Increase its total reward. Calling :reinforce() will just update the model.
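
Roughly, that means one update per loop tick, with every reward term folded into a single value first. Here is a sketch in Python for illustration only. The `update_model` callback stands in for whatever call updates the model, and the tick fields, `stable_bonus` and `distance_weight` are made-up names and values, not DataPredict API. The distance term is an assumption based on the distance-from-goal punishment mentioned elsewhere in this thread.

```python
def run_ticks(ticks, update_model, stable_bonus=1.0, distance_weight=0.01):
    """Combine all reward terms into one total per tick, then update once.

    ticks: iterable of dicts with hypothetical keys
           'rays_stable' (bool) and 'distance_to_goal' (float).
    update_model: callback that receives the total reward for the tick.
    """
    for tick in ticks:
        total_reward = 0.0
        if tick["rays_stable"]:
            # Bonus term: added to the total instead of triggering its own update.
            total_reward += stable_bonus
        # Assumed penalty term that grows with distance from the goal.
        total_reward -= distance_weight * tick["distance_to_goal"]
        # Exactly one model update per tick, using the combined value.
        update_model(total_reward)
```

The point of the design is that the bonus never causes an extra update on its own; it only changes the number handed to the regular update.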

noisecooldeadpool362 · May 22, 2024, 2:56pm

It wants to stay still when it gets rewarded, though. What do I do about that?

noisecooldeadpool362 · May 22, 2024, 2:58pm

Already have that, distance from goal = punishment

RaterixRGL · May 22, 2024, 2:59pm

One thing I have noticed is that when you reinforce the AI, you need to make sure the reinforcement is given in a truly valid situation; don’t mix in bad data.

MYOriginsWorkshop · May 22, 2024, 3:03pm

Okay let’s modify what I said before:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.

to something like:

Let’s try giving a reward if the length of side ray casts stays the same for a period of time, provided that the car exceeds a certain throttle speed. Otherwise give zero rewards. Do not use punishments because the ray cast will definitely change its length if it was turning at the corners.
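
As a rough sketch of the modified rule (Python for illustration; the function name, `hold_time`, `min_speed` and `reward` are hypothetical values, not anything from DataPredict): the reward is paid only when the rays have been steady long enough and the car is actually moving, so standing still earns nothing.

```python
def gated_straight_reward(stable_for, speed, hold_time=1.0, min_speed=5.0, reward=1.0):
    """Reward steady ray lengths only while the car is above a speed threshold.

    stable_for: seconds the side ray lengths have stayed the same.
    speed: current speed of the car (assumed units: studs per second).
    """
    if stable_for >= hold_time and speed > min_speed:
        return reward
    # Either condition failing gives zero, never a punishment.
    return 0.0
```

A parked car with perfectly steady rays now gets `0.0`, which removes the incentive to stop moving.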

noisecooldeadpool362 · May 22, 2024, 3:04pm

So far all it does is crash at these corners, as you see here, but I think Aqwam’s advice worked a bit and the vehicle managed to at least hit the corner 2 times, as you see in the video. It has never been able to do that before. Other than that, I don’t know how else I can improve it.

noisecooldeadpool362 · May 22, 2024, 3:04pm

I’ll try this.

noisecooldeadpool362 · May 22, 2024, 3:09pm

It is definitely working! I was a bit lazy and just checked if Throttle==1 (I definitely need to add a speed variable), but it still made the car drive in an almost perfectly straight path and almost clear the entire curve before crashing into the straight road in front of it. I have definitely not made it this far before, and it got that far just 4 seconds into training. I will leave it here for today and test it again tomorrow. I’ll let you know how it goes, as usual!

MYOriginsWorkshop · May 22, 2024, 8:16pm

Release 1.16 Version Update!

Refactored and renamed Deep Q-Learning, Deep SARSA and Deep Expected SARSA. This includes the variants.
Made some bug fixes and removed redundant code for some of the algorithms stated above.
That’s pretty much it…

MYOriginsWorkshop · May 23, 2024, 11:31am

Hello guys!

I have uploaded version 5 of the sword-fighting AI code. Version 5 brings back some of the code from version 1 so that the AIs can learn more advanced tactics. It is also combined with version 4, since the AIs in that version learnt things much faster than in the previous versions.

Also, credit to @noisecooldeadpool362 for providing the code improvements related to angle calculations. These will be applied to future versions of the sword-fighting AI code.