The bot definitely has to train to learn, and naturally it will make a lot of mistakes until it has gathered enough data.
It's important that you save your model's data.
As your model grows in size, keep in mind that each DataStore key has a maximum size of 4 MB, so serializing (or splitting) your data structure as compactly as you can is the best approach.
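For example, a rough size check before calling SetAsync might look like the sketch below. The helper name is made up for illustration, and HttpService:JSONEncode only approximates how the data store actually serializes the table.

-- Minimal sketch: estimate whether a table fits under the 4 MB per-key limit.
-- "canFitInOneKey" is a hypothetical helper, not part of any existing script.
local HttpService = game:GetService("HttpService")

local MAX_KEY_SIZE = 4 * 1024 * 1024 -- 4 MB per DataStore key

local function canFitInOneKey(data)
    -- JSONEncode gives a rough byte count for the payload being saved
    local encoded = HttpService:JSONEncode(data)
    return #encoded <= MAX_KEY_SIZE, #encoded
end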
function SaveDataPairs(Datastorkey, ChatBotData, Querie)
    -- Create an empty table to store the data pairs
    local DataPairs = {}
    -- Iterate over the children of the Queries folder
    for _, child in ipairs(Querie:GetChildren()) do
        -- Check if the child is a StringValue object
        if child:IsA("StringValue") then
            -- Add the child's name and value to the table as a key-value pair
            DataPairs[child.Name] = child.Value
        end
    end
    -- Save the table to the data store under the given key (e.g. "ChatBotQueries")
    local success, result = pcall(function()
        ChatBotData:SetAsync(Datastorkey, DataPairs)
    end)
    if success then
        print("Data pairs saved successfully")
    else
        warn("Data pairs failed to save: " .. result)
    end
end
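For reference, calling it could look something like this; the data store name, the key, and the location of the Queries folder are assumptions for the example, not fixed requirements.

local DataStoreService = game:GetService("DataStoreService")
local ServerStorage = game:GetService("ServerStorage")

-- Assumed setup: a data store for the bot and a folder of StringValue queries
local ChatBotData = DataStoreService:GetDataStore("ChatBotData")
local Queries = ServerStorage:FindFirstChild("Queries")

SaveDataPairs("ChatBotQueries", ChatBotData, Queries)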
local datachange = true
local datachangehandle = {}

function LoadDataPairs(Datastorkey, ChatBotData, Querie)
    -- Tracks whether any value tied to this key has changed since the last save
    local datachanged = false
    -- Load the table from the data store with the same key that was used to save it
    local success, result = pcall(function()
        return ChatBotData:GetAsync(Datastorkey)
    end)
    if success then
        if result then
            print("Data pairs loaded successfully")
            -- Register a handle that reports this key only when its data has changed
            datachangehandle[Datastorkey] = function()
                if datachanged == true then
                    datachanged = false
                    return Datastorkey, ChatBotData, Querie
                else
                    return false
                end
            end
            -- Iterate over the loaded table
            for name, value in pairs(result) do
                local existing = Querie:FindFirstChild(name)
                if existing == nil then
                    -- Create a new StringValue in the Queries folder for each pair
                    local StringValue = Instance.new("StringValue")
                    StringValue.Name = name
                    StringValue.Value = value
                    StringValue.Parent = Querie
                    -- Learning function: flag the data as changed whenever the value updates
                    StringValue:GetPropertyChangedSignal("Value"):Connect(function()
                        datachanged = true
                        datachange = true
                    end)
                else
                    existing.Value = value
                    existing:GetPropertyChangedSignal("Value"):Connect(function()
                        datachanged = true
                        datachange = true
                    end)
                end
            end
        else
            print("No data pairs found for this key")
        end
    else
        warn("Data pairs failed to load: " .. result)
    end
end
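One way to use datachangehandle is a periodic autosave loop that only rewrites keys whose values have actually changed. The sketch below assumes SaveDataPairs and the handles registered by LoadDataPairs; the 60-second interval is just an example.

-- Sketch of an autosave loop driven by the change handles registered above.
task.spawn(function()
    while true do
        task.wait(60) -- example interval; keep it generous to respect DataStore rate limits
        if datachange then
            datachange = false
            for _, handle in pairs(datachangehandle) do
                local key, store, queries = handle()
                if key then
                    SaveDataPairs(key, store, queries)
                end
            end
        end
    end
end)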
You can convert the model's vector/matrix table to a string using a table-to-string serialization method.
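If the weights live in a nested Lua table, HttpService's JSON functions are one straightforward way to do that conversion; the weights variable below is a placeholder, not the actual model.

local HttpService = game:GetService("HttpService")

-- Placeholder weight matrix standing in for the model's real parameters
local weights = {
    {0.12, -0.87, 0.33},
    {0.45, 0.09, -0.51},
}

-- Table -> string before saving, string -> table when loading back
local serialized = HttpService:JSONEncode(weights)
local restored = HttpService:JSONDecode(serialized)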
The bot simply doesn't learn at all. I've trained it for 3000+ rounds before, which is more than enough data, and yet it failed to learn anything beyond taking the same action again and again. This points to an issue with the implementation, the hyperparameters, or the reward structure, or possibly all of them. A DQN shouldn't take that long to converge on a policy, and if it starts taking the same action over and over, something is wrong. On top of that, the loss never showed signs of increasing or fluctuating the way a normal DQN's loss would.
If your loss is not fluctuating… There’s a really good chance that there is something wrong with your implementation…
Yeah, that's what I thought as well. The loss looked like an exponential curve dipping down and never really spiked at all. But like I said earlier, the strange thing is that the loss looked healthy when I restarted the training and froze the enemy AI. It was even able to learn how to walk toward the enemy.
(Enemy AI not frozen, epsilon ≈ 0.308)
https://gyazo.com/aa273fa2dc8fc074dc90fb963728e8c1 (Red agent is learning, blue is sort of argmax random policy so it creates an easy target)
I don’t really have any graphs of when the enemy AI was frozen apparently, but here’s what I could find:
One, where do you get that awesome visualization graph?
Two, based on the graph, I think your model has gotten stuck in a local optimum or at a saddle point (google these terms if you don't understand them).
Have you tried other exploration/exploitation strategies instead of relying on a single policy? (e.g. choosing a random action every N actions)
I’m using boatbomber’s graphing module: Graph Module- Easily draw graphs of your data
Oh, so you mean other action selection policies. Yes, I've tried epsilon-greedy and the Boltzmann softmax policy. Neither really solves the issue.
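For anyone following along, a Boltzmann (softmax) selection over a table of Q-values looks roughly like the sketch below; the temperature parameter and the shape of qValues are placeholders rather than the code from this project.

-- Sketch of Boltzmann (softmax) action selection over a table of Q-values.
local function boltzmannSelect(qValues, temperature)
    -- Subtract the max Q-value for numerical stability before exponentiating
    local maxQ = math.max(table.unpack(qValues))
    local exps, total = {}, 0
    for i, q in ipairs(qValues) do
        exps[i] = math.exp((q - maxQ) / temperature)
        total = total + exps[i]
    end
    -- Sample an action index in proportion to its softmax probability
    local r = math.random() * total
    for i, e in ipairs(exps) do
        r = r - e
        if r <= 0 then
            return i
        end
    end
    return #qValues
end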
It seems like kind of a waste to use machine learning for pathing when you can use default pathfinding and get the same results with a robust system.
I think a more effective use would be to give the function the tools it needs to execute the NPC's abilities, similar to how our minds handle movement without conscious thought. Then you could also utilize an LLM or other AI to provide commentary with action context. Reward the NPC for successful pathing, landing hits on players, and dodging when the player is attacking.
Did you reply to the wrong person? I'm not really doing pathfinding; I'm making sword-fighting AIs.
Yea… likely an implementation issue in that case. Usually, those action policies should at least get the model "unstuck".
I recommend you have a look at the QLearningNeuralNetwork and NeuralNetwork source code to see if there are any issues with your implementation. I used research papers and examples from other programming languages to get the implementation right.
Oh well, it seemed we were talking about pathfinding. Regardless, I'm just saying that the easier you make it for the machine to learn, the better results you will get. I think a sword-fighting AI would be cool! You could do it by manipulating the joints of the character, so you would give the AI model access to the joints and let it do its thing.
I'm doing classic sword-fighting using the Roblox classic sword. Giving it access to the joints would make it more complicated than necessary, because this is just one of my first few DQN projects. That sounds like it would add a continuous action space, which DQN can't handle.
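To illustrate what a discrete action space for the classic sword could look like (these specific actions are just an example, not the project's actual list):

-- Example discrete action set; the DQN outputs one Q-value per entry
local ACTIONS = {
    "MoveForward",  -- walk toward the opponent
    "MoveBackward", -- back away
    "StrafeLeft",
    "StrafeRight",
    "Attack",       -- swing the classic sword
}

local function actionFromIndex(index)
    return ACTIONS[index]
end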
Hmm… I would ignore him if I were you. This guy is a bit too confident in his AI knowledge, but it's more like filler (empty) knowledge than actually trying to understand how the whole thing works.
Lol, you literally added me on ROBLOX? I don't appreciate you trying to discount my credibility. You have no idea the levels of research I pursue every day in the fields of AI and coding in general. So tread lightly.
The strange thing is I basically copy-pasted the code from my DQN snake implementation, which was very successful. For some reason I didn't make many videos of the fully trained version, but I have some graphs:
"Research". I have certifications and a degree in computer science with AI. It's all on my LinkedIn. Your "research" means nothing if you don't understand it.
Hmm… yea. Not too sure what the difference in performance is. Anyways, again, try using my DQN for your use case.
I also might need to close further discussion for now.
Complaints? You mean, out of the 900 people who viewed it, the one person who posted gaslighting because they did not understand the title? It is a text-vision module for an LLM. It constructs situational data such as identifying itself, the player, and its surroundings.
I'm not talking about the simplicity of what your code can do, but rather I am referring to the code itself.
Sir, that is not a Large Language Model. That is a bunch of "if" statements. I can provide you the mathematics for an LLM if you want. LLMs don't use if statements 90% of the time.
It is designed to construct a table of observations. I never said it's an LLM; it's an LLM utility, meant to give an LLM awareness of its surroundings in text format.