[Added SAC, DDPG and TD3] DataPredict [Release 2.0] - Machine Learning And Deep Learning Library (Learning AIs, Generative AIs, and more!)

Apologies for the late reply. I’ll have a look at it and make changes to it.

2 Likes

First off, you might want to fix the parameters inside .new() for the QLearning model. Apparently I missed an error in the documentation, and you ended up following the wrong information.

3 Likes

For the Goal variable, does it represent only a single “goal” block of yours or both of them?

1 Like

Sorry for the lack of clarity; the Goal variable is only tied to the part on the right. The part on the left is where the NPC spawns at the start, or teleports back to once it gets penalised too much.
There should be no reason for the NPC to head to any block besides the Goal block.

1 Like

To be honest, I think it’s because:

  • Doesn’t have a “none” action so that it stops moving when reaches the goal, which causing the NPC to “overshoot” its movement away from the goal.

  • It seems like the NPC didn’t learn how to fix the mistake whenever it “overshoots”, causing it to go on a full detour. I guess giving it more training time would help?

1 Like

Hmm, the NPC does have a ‘None’ action that makes it run an empty function, and I did let it train for roughly 45-60 minutes. I’ll try again for another hour and see what happens then.

Also, I updated the model a bit, and I’m working on an autosave function right now that saves the model parameters to a DataStore once the NPC goes a certain amount of time without being respawned (i.e. it does not get penalised enough to be respawned). Does the modified model look alright? ^
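Roughly, the autosave idea looks like this (just a sketch, not the final code; the DataStore name, the survival threshold, and the getModelParameters() call are placeholders based on my reading of the docs):

-- Rough sketch of the autosave idea. "Model" stands for the trained
-- DataPredict model; the DataStore name, key and survival threshold
-- are placeholders.
local DataStoreService = game:GetService("DataStoreService")
local RunService = game:GetService("RunService")

local modelStore = DataStoreService:GetDataStore("NPCModelParameters")

local SAVE_AFTER_SECONDS = 120 -- how long the NPC must survive without respawning
local timeSinceLastRespawn = 0

RunService.Heartbeat:Connect(function(deltaTime)
	timeSinceLastRespawn += deltaTime
	if timeSinceLastRespawn < SAVE_AFTER_SECONDS then return end
	timeSinceLastRespawn = 0
	local success, errorMessage = pcall(function()
		modelStore:SetAsync("LittleTimmy", Model:getModelParameters())
	end)
	if not success then warn("Autosave failed: " .. errorMessage) end
end)

-- Reset the timer whenever the NPC gets penalised enough to respawn,
-- so a bad run never gets saved.
local function onRespawn()
	timeSinceLastRespawn = 0
end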

I read somewhere that having a Softmax activation function at the output layer helps in distributing values and providing better results. Is that true?

1 Like

Replace 100 with 1 for the first parameter. If I were you, I’d leave the other two parameters empty, because they will default to the values recommended by research papers.
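In other words, roughly (the model path and parameter order are from memory, so double-check them against the documentation):

-- First parameter set to 1; the other parameters are left out so the
-- library falls back to its recommended defaults. The model path and
-- parameter order here are assumptions; verify them in the DataPredict docs.
local QLearningModel = DataPredict.Models.QLearningNeuralNetwork.new(1)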

image
This happens if I don’t keep my values as they are (the values become NaN or inf). Is there something else that’s causing this?

Oof, I just realized something is wrong with your inputs. You need to do this instead:

environmentVector = {{ 1, currRotationErrorY,  currDistanceFromGoal }}
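-- The leading 1 is the bias input, so only the two real features
-- (currRotationErrorY and currDistanceFromGoal) count as input neurons.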

Then set the number of neurons for the first layer to 2.

1 Like

Aha! There we go!

Now little Timmy is behaving properly within a few seconds of starting training.
All of those 60 minutes were wasted :sob:

There we go. I’m guessing the bias part is the confusing one, isn’t it?

Yep, sure is. I also have a hard time grasping optimal neural network structures or knowing when to use which activation function. So far, trying out Brilliant’s free trial course has helped a little.

Well, now that I have learnt how to train a single agent… what about multiple agents, i.e. a MARL (Multi-Agent Reinforcement Learning) approach? Is this possible with your library? How would I go about doing something like this? For example, a bunch of AIs that work together to solve a problem (like the hide-and-seek AI OpenAI developed). I heard that there are some algorithms specifically catered to this MARL approach; do you have any in your library?

And is it possible for the trained model to work with a dynamic number of NPCs? So if it was originally trained to work with 2 agents and 2 NPCs, could it work with 5 agents and 5 NPCs?

Yeah, that is possible with MARL. There are different types though: Designing Multi-Agents systems - Hugging Face Deep RL Course

Read up on the distributed training tutorial in the API documentation. You need to know about the DistributedGradients part. You’re going to have to share each agent’s model parameters with the main model’s parameters.

Also look into extendUpdateFunction() and extendEpisodeUpdateFunction() for ReinforcementLearningQuickSetup.

For your second question, I don’t know lol. Why not experiment and find out first?
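For the parameter sharing, the gist is something like this (a rough sketch; see the distributed training tutorial for the exact calls):

-- Minimal sketch of syncing each agent's model with the main model's
-- parameters. MainModel and AgentModels are placeholders for your own
-- model objects; getModelParameters()/setModelParameters() are the calls
-- assumed from the distributed training tutorial.
local mainModelParameters = MainModel:getModelParameters()

for _, agentModel in ipairs(AgentModels) do
	agentModel:setModelParameters(mainModelParameters)
end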

1 Like

Well, I was thinking of using MARL to create an NPC ecosystem; think of it as a bunch of soldier NPCs working together to patrol a base and defend it from player intervention. With this idea in mind, I don’t have much experience forming proper inputs for this sort of AI. Do you perhaps know any starter inputs I can give them to train a foundation?

Will there be only one base or multiple bases? Because if it’s multiple bases, there’s a different set of inputs and actions they will be taking.

Hi guys!

If anyone wants to hire me as a freelance machine learning engineer for your large project, you can do so at Fiverr here:

https://www.fiverr.com/s/W9YopQ

Also, please do share this on your social media! I would greatly appreciate it!

1 Like

Think of it as Metal Gear Solid 5’s setup: multiple outpost bases all around a map, with AI reinforcements coming in from other bases to support nearby ones where the player is spotted trying to infiltrate or demolish. Sort of like one whole giant hive with separate houses. Each base distributes its troops equally and only sends out a few footmen per infiltration detected around it. Sorry if it sounds very confusing, I’m not very good at explaining things :sweat_smile:

I have a couple of ideas for how agents could be used here: maybe one main agent that manages all of the outposts at the same time? Or several agents in a MARL environment that control their own bases and communicate values when they need help. But whenever I think about any of these approaches my mind goes blank, because the inputs that would be required and the reward functions needed don’t come to my mind at all.

If this sort of concept isn’t possible, I totally understand, and I have another idea: I also wanted to train an agent to work alongside other soldiers to form a sort of artificial squad/brotherhood of 3-5 agents. They would stick together and deal with threats using their own set of policies. Essentially, groups of foot soldiers that can communicate and deal with player threats in an unexpected and smart manner, using what they learn and evolve through training.

Okay let’s try this then.

Settings for the agent that controls the strategies for all outposts

  • Use a single AdvantageActorCritic, PPO or PPO-Clip model. (Do not use any other models.)

  • The Actor will use stable softmax for the last layer. You will also need to set returnOriginalOutput to true in ReinforcementLearningQuickSetup.

  • Each of the Actor’s outputs will represent a single outpost.

  • Since we’re using stable softmax, it will output probabilities between 0 and 1. We will multiply the total number of troops across all outposts by each probability, which gives the number of troops that should stay in that outpost (see the sketch after this list).

  • Once the number of troops has been determined, use Roblox’s pathfinding to reach the outpost.

  • For the reward, maybe punish based on the number of troops lost and reward based on the total damage done to the enemy. Also punish if an outpost is destroyed, or for any attempt to send troops to a destroyed outpost.

  • For the input, you might want to reserve some input slots that indicate whether each outpost is destroyed or not, so that the agent can recognise when not to send troops to a destroyed outpost.
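As a quick sketch of the troop-allocation step (the probabilities and troop count below are made-up numbers):

-- Illustrative numbers only: three outposts, 30 troops in total.
local actorOutput = {0.5, 0.3, 0.2} -- stable softmax probabilities, one per outpost
local totalTroops = 30

local troopsPerOutpost = {}
for outpostIndex, probability in ipairs(actorOutput) do
	troopsPerOutpost[outpostIndex] = math.floor(totalTroops * probability + 0.5)
end
-- troopsPerOutpost is now {15, 9, 6}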

Settings for agents controlling individual troops

  • Use any models that you see fit. Make sure the model parameters are shared.

  • Once the troops reach the outpost using Roblox pathfinding, use the usual reinforcement learning technique, unless they are attacked midway (see the sketch after this list).

  • For the reward, I think you can do this easily.
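A rough sketch of that hand-off between Roblox pathfinding and the reinforcement learning loop (troopIsUnderAttack() and the return values are placeholders for your own logic):

-- Placeholder sketch: move a troop to its outpost with Roblox pathfinding,
-- but hand control back to the reinforcement learning loop if it gets
-- attacked midway. troopIsUnderAttack() is a hypothetical helper.
local PathfindingService = game:GetService("PathfindingService")

local function moveTroopToOutpost(troop, outpostPosition)
	local path = PathfindingService:CreatePath()
	path:ComputeAsync(troop.HumanoidRootPart.Position, outpostPosition)
	if path.Status ~= Enum.PathStatus.Success then
		return "noPathFound"
	end
	for _, waypoint in ipairs(path:GetWaypoints()) do
		if troopIsUnderAttack(troop) then
			return "switchToReinforcementLearning"
		end
		troop.Humanoid:MoveTo(waypoint.Position)
		troop.Humanoid.MoveToFinished:Wait()
	end
	return "arrivedAtOutpost"
end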

1 Like

Excellent! Thank you so much, I’ll start by training the agent to manage individual troops. One thing though: I’m not very sure what inputs I can give the agent controlling the troops, since the number of soldiers is dynamic and won’t really be the same all the time, especially when they die (e.g. in order to teach them to stick together in squads, ambush together, or set their foundation in a way such that they can later learn expert combat techniques like automatic dynamic ambushes or cutting off choke points).

And what type of model would you advise me to use for them? I am thinking of experimenting with PPO, since I read that it makes small policy updates that help agents refine their correct actions rather than over-exploring wrong ones, and later learn more complex actions. So far I have tried out A2C, but it does take a while for the agents to learn compared to DoubleQLearning.