Oof, very tricky stuff there. But let’s try anyway. I still don’t have enough context, but we’ll press on regardless.
You may want to change the model to the AdvantageActorCritic model, because Double Q-Learning isn’t quite suitable for that task. I really don’t want to explain the differences between the models, so just trust me.
We will also split the location-based abilities and the movement into two separate models. Otherwise, I expect it will make training longer and harder.
For location-based abilities, take the three outputs from the first model as your vector. Only call reinforce() a few seconds after you deploy the location-based ability, so that you have time to calculate the reward. You may also want to set “return original output” to true and use those values instead.
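Roughly, the ability loop could look like the sketch below. This is only a minimal sketch: the reinforce(featureVector, rewardValue, returnOriginalOutput) call order and the row-matrix output format are assumptions based on the description above, and npcIsAlive(), collectFeatureVector(), calculateAbilityReward() and deployAbilityAt() are hypothetical game-side helpers, so check the DataPredict documentation before copying anything.

```lua
-- Minimal sketch for the location-based ability model (not production code).
local REWARD_DELAY = 3 -- seconds between deployments, so the reward can be measured

while npcIsAlive() do -- hypothetical game-side condition
	local featureVector = collectFeatureVector() -- hypothetical sensing helper

	-- Reward for what happened since the previous ability was deployed
	-- a few seconds ago (hypothetical helper).
	local reward = calculateAbilityReward()

	-- Ask for the original output so we get the three raw values directly.
	-- Assumed to come back as a single-row matrix.
	local outputVector = abilityModel:reinforce(featureVector, reward, true)

	-- Use the three outputs as the location vector for the ability.
	local abilityPosition = Vector3.new(outputVector[1][1], outputVector[1][2], outputVector[1][3])
	deployAbilityAt(abilityPosition) -- hypothetical game-side helper

	task.wait(REWARD_DELAY)
end
```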
For movement, just reuse the code I gave you in the sword-fighting AIs (version 2), though you may want to adjust it.
I recommend that you all update the whole DataPredict library (Release 1.10) right now, because there were issues with the reward calculations for a number of models belonging to actor-critic architectures. This resulted in slightly slower training of the AIs.
Once you update, you should see your AIs train much faster!
Hey. I’m looking to utilize this module for a horror game NPC, but I’m not sure where to start.
I currently have an AI that’s hard-coded with SimplePath to find and detect players. When it detects a player, it initiates a Stalking mode. The stalking mode has the following stages:
0 - Wandering, no target.
1 - Found target, slowly engages. Doesn’t get too close, backs up if the target approaches. Goes to Stage 2 after a few seconds.
2 - Goes to a hiding spot near the player. After a few seconds, if the player hasn’t found them, they will initiate a chase.
3 - Initiates a chase. Lasts until the target is dead.
What I’d like for the AI to do, in order of how I’d like to code it:
Choose to do certain actions: he can choose to delay advancing from a stalking stage, waiting for a good opportunity before advancing from Stage 2 to Stage 3.
Adapt in real time to player statistics: he will naturally prioritize players who are weakened or less equipped.
I’d like to keep a good majority of the AI hard-coded, but I’d like to make it so he can choose to perform certain actions to make him seem natural. Is this possible to do with DataPredict?
Tutorials are given in the documentation. You may want to read:
Creating Our First Model
Using Neural Networks Part 1 & 2
Using Neural Networks With Reinforcement Learning
For the models, you can choose the ones that are under “Classification + Reinforcement Learning”.
My recommended model, which is probably versatile in most use cases, is the Advantage Actor Critic. It may be a hassle to set up, though, so have a look at the source code provided for the self-learning sword-fighting AIs (version 2) for reference.
Hello everyone! It seems I will stop maintaining this library, as there are no more features to add and there are no known bugs.
I’d like to know if you have found any bugs on your side before I focus completely on my other personal projects. Once I do, expect very slow bug fixes for this library.
All models have had the target cost removed from their parameters and moved to the BaseModel class. To change the target cost, please use the setTargetCost() function from the BaseModel class.
All models can now check for convergence (i.e. the cost staying the same for a number of iterations). To set this, please use the setNumberOfIterationsToCheckIfConverged() function from the BaseModel class.
Expect a lot of API-breaking changes due to the target cost parameter removal. Please update your code accordingly.
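For anyone migrating, the change looks roughly like the sketch below. Only the two BaseModel function names come from the announcement above; the constructor is a placeholder model choice, the require path is yours to adjust, and the setTargetCost() argument is an assumption, so check the BaseModel documentation.

```lua
-- Migration sketch only; verify exact arguments against the BaseModel documentation.
local DataPredict = require(script.Parent.DataPredict) -- adjust the path to your copy

local model = DataPredict.Models.LogisticRegression.new() -- placeholder model choice

-- Previously the target cost was a model parameter;
-- now it is set through the BaseModel class after construction.
model:setTargetCost(0.001) -- assumed argument: the cost value at which training stops

-- New: treat the model as converged if the cost stays the same for this many iterations.
model:setNumberOfIterationsToCheckIfConverged(10)
```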
Since people like the reinforcement learning (a.k.a. self-learning AI) content in this library, would you like me to make video courses related to it? These would help you understand the concepts and apply them effectively to your games so that you can build more effective AIs.
Do you want me to create those videos?
Yes.
No.
If yes, what kind of video content would you prefer to see?
Multiple short videos, where each video covers a section of reinforcement learning.
One long video that covers all the sections for reinforcement learning.
If yes to the first question, would you like to see practical applications in those videos using DataPredict?
Yes, at the end of every section.
Yes, within every section (e.g. show code while teaching theory)
Also, before recording any video, you should write a script that you can read from. This helps you avoid stuttering and “uh”/“um” sounds, and gives the audience a more coherent and concise lesson.
I’m not sure if DataPredict would fit this use case, but would it be possible to make some sort of adaptive NPC?
So let’s say we have a break-in style game with an NPC that tries to catch the player when they break into its house. What I’m thinking is that the NPC would be pre-trained to a certain point, that model would be used at the start of the game, and as the player progresses the NPC would keep adapting to match the player. This creates a consequence for carelessness: if the player fools around and keeps getting caught by the NPC, the NPC has time to adapt, compared to beating the game quickly and strategically, which involves avoiding getting caught. If the player isn’t inside the NPC’s house, the NPC would just be doing normal tasks like patrolling, etc.
How would we create something like that? Would we simply train the NPC model to predict the player’s future position and have it path-find there? Or would we give the NPC full autonomy over its model (the NPC would rely on raycasts for walls/doors/etc. alongside other inputs such as the player’s position relative to its own, and would move based on the model’s outputs rather than moving to a predicted position)? Whichever approach is correct, would it be alright if you could provide a step-by-step explanation of its implementation (if you want to)? I’m kind of new to this; I did look at some YouTube videos, but I’m still a bit lost.
It seems possible in theory, but I question the amount of training data it would need, and I mean that for both approaches.
For your full autonomy approach:
Stick with the Advantage Actor Critic model.
Try stimulating “curiosity” using Random Network Distillation (shown in a tutorial under the documentation). However, this is only in the beta version of the library.
You may want to make the inputs take in previous data as well as the newly collected data.
For your player-position-prediction approach:
Any model can work, though I would still lean towards the Advantage Actor Critic model.
You may want to make the inputs take in previous data as well as the newly collected data (see the sketch after these lists).
Simulating “curiosity” might not be needed.
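For the “include previous data” point in both lists, a simple way is to keep the last observation around and concatenate it with the newest one before feeding the model. The sketch below is only an illustration: collectObservation() is a hypothetical sensing function, and wrapping the result as a single-row matrix at the end is an assumption about the expected input format.

```lua
-- Keep the last observation and stack it with the current one.
local previousObservation = nil

local function buildFeatureVector()
	local currentObservation = collectObservation() -- hypothetical, e.g. {relativeX, relativeZ, distance}

	-- On the very first step there is no history yet, so reuse the current one.
	previousObservation = previousObservation or currentObservation

	local featureVector = {}

	for _, value in ipairs(previousObservation) do
		table.insert(featureVector, value)
	end

	for _, value in ipairs(currentObservation) do
		table.insert(featureVector, value)
	end

	previousObservation = currentObservation

	return {featureVector} -- assumed single-row matrix format
end
```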
Unfortunately, I won’t be giving any tutorials right now. I’m working on the release version of the library, trying to bring in useful research papers and implement them here, like the most recently added “curiosity”.
Hello again! It’s been a while. I started craving programming earlier and decided to start with neural networks. I used your module for a “car chase” in which the cars have to chase a target part.
I let them train for 4–5 hours, came back, and found that they did not converge or improve. Is something wrong with my implementation?
Added RandomNetworkDistillation for stimulating curiosity!
Changes
All version 1 double-network reinforcement learning models no longer have the setModelParametersArray() and getModelParametersArray() functions. They have been replaced with the setModelParameters1(), setModelParameters2(), getModelParameters1() and getModelParameters2() functions.
Improved PrioritizedExperienceReplay.
Changed the way QLearningNeuralNetwork, StateActionRewardStateActionNeuralNetwork, ExpectedStateActionRewardStateActionNeuralNetwork and their variants calculate loss values internally.
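If you were saving and loading the double-network models, the change is roughly the one sketched below. Only the function names come from the changelog above; the save/load helpers wrapped around them are just an example.

```lua
-- Illustrative only: "model" is an existing version 1 double-network model instance.
local function saveModelParameters(model)
	-- Before: local modelParametersArray = model:getModelParametersArray()
	-- Now each network is read separately:
	return model:getModelParameters1(), model:getModelParameters2()
end

local function loadModelParameters(model, modelParameters1, modelParameters2)
	-- Before: model:setModelParametersArray(modelParametersArray)
	-- Now each network is restored separately:
	model:setModelParameters1(modelParameters1)
	model:setModelParameters2(modelParameters2)
end
```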
Recently, I made some improvements to neural network calculations and made some minor internal changes to some models. I recommend you guys replace the old version of the library with the new one. This applies to the latest release and beta versions. Make sure to update it as soon as possible.
Do you have any advice on how to modify the layers/configuration/senses functions of my model to make it better? I am working on creating some fighting NPCs. This is my code: NPCSenses.lua · GitHub. I can’t really tell if the model is improving during training.
I am just training them out in the open like this:
Something I noticed is that the parameter weights all look the same, and they slowly keep getting more and more negative the longer I train them (comparing a screenshot at the start with one taken after about 8 minutes).
I’ll list the issues from highest to lowest importance.
Issue 1: Bad inputs.
You’re using the current position. Remove that, since it does not contain useful information and will likely make the NPC learn much more slowly.
Remove the Y rotation or change how it is calculated. Currently it is the NPC’s own rotation rather than its rotation relative to the target, which is useless information for the NPC and makes learning slower (see the relative-inputs sketch at the end of this list).
Issue 2: Reward values probably need a little bit of tweaking.
Rarer events = higher reward value. Right now it is biased towards negative values, since the NPC has probably encountered negative rewards far too often (see the reward table sketch below).
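As a toy illustration of the “rarer events = higher reward value” rule (the event names and numbers below are made up, not taken from your code):

```lua
-- Frequent small penalties should not drown out the rare events you actually care about.
local REWARD_VALUES = {
	LandedHit = 5, -- fairly common, moderate reward
	TookHit = -2, -- very common, so keep the penalty small
	KilledTarget = 100, -- rare, large reward
	Died = -50, -- rare, large penalty
}
```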
Issue 3: Wrong algorithm.
Switch from DQN to Advantage Actor Critic / Proximal Policy Optimization.
Currently the issue is that DQN only takes into account the transition from one state to the next, not the whole timeline. You’re expecting the NPC to learn the connection between two states, but in reality the actions depend on the whole timeline, since they require future planning.
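Here is a rough sketch of the relative-inputs idea from Issue 1. The function and variable names are placeholders for your own senses code, not DataPredict API; the point is to feed the model values expressed relative to the target instead of absolute world position and raw Y rotation.

```lua
-- Replace "current position" and "own Y rotation" with target-relative values.
local function getRelativeInputs(npcRootPart, targetRootPart)
	-- Offset to the target in the NPC's local space, so "forward" means the
	-- same thing to the network regardless of where the fight happens.
	local relativeOffset = npcRootPart.CFrame:PointToObjectSpace(targetRootPart.Position)

	-- Signed yaw (in radians) the NPC would need to turn to face the target;
	-- this replaces the raw world-space Y rotation.
	local angleToTarget = math.atan2(-relativeOffset.X, -relativeOffset.Z)

	local distance = relativeOffset.Magnitude

	return {relativeOffset.X, relativeOffset.Z, angleToTarget, distance}
end
```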