[Added SAC, DDPG and TD3] DataPredict [Release 2.0] - Machine Learning And Deep Learning Library (Learning AIs, Generative AIs, and more!)

There is more to it; it’s just a question of whether it’s compatible, since there will be more.

Oh, alright then. Just note that it will take quite a while to train. I recommend you either create a hard-coded AI that represents the player, or use this library to create AIs that learn, which would then represent the player.

Alright, thank you so much. Will try that.

One more thing: do you have any recommended sources for learning deep learning and reinforcement learning? I usually learn best from visual sources on YouTube; what do you think?

Unfortunately, I don’t have any recommendations that suit your criteria. Most of my knowledge came from extensive reading of articles and research papers.

But if you really want to understand deep learning from the start, then I’d recommend you look into Andrew Ng’s Deep Learning courses.


What Neuron Structure or settings do you recommend in my scenario to get optimised learning?

Enable the bias for the first layer.

Then change QLearningNeuralNetwork to DoubleQLearningNeuralNetwork. Apparently, the original version has issues according to several research papers; it tends to overestimate action values, which Double Q-Learning was designed to fix.


Assuming you’re referring to DoubleQLearningNeuralNetworkV2, I switched to that and flipped my first bias to true; however, the results are still the same. Is such a simple task really going to take a long time to train, or is my RL agent still not working properly?

Yes. It will take quite a bit of time to train.

I did something similar to yours (the combination of numbers part) and pretty much managed to get 70% accuracy.

That being said, accuracy isn’t the best metric for measuring performance in reinforcement learning. We usually use average reward per time step.
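If you want to track that, it’s just a running total divided by the number of steps. Something like this works (the variable names here are only placeholders; call recordReward() wherever your environment hands the agent a reward):

-- Running tally for average reward per time step (placeholder names).
local totalReward = 0
local numberOfTimeSteps = 0

local function recordReward(reward)
	totalReward += reward
	numberOfTimeSteps += 1
end

local function getAverageRewardPerTimeStep()
	if numberOfTimeSteps == 0 then return 0 end
	return totalReward / numberOfTimeSteps
end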


Well then, moving on from my old experiment, what about humanoids? I previously experimented with a genetic algorithm where I managed to optimise the time-to-train by letting the agent spawn and train with multiple NPC humanoids at the same time. Is such an approach, or a similar one, possible with Q-Learning/PPO?

I have an idea in mind for an RL NPC whose goal is walking towards a point. For example, what if there was a goal node somewhere on the map and an RL agent needed to figure out how to get there with the NPC? Could I spawn in multiple NPCs for the RL agent to train with to shorten the training time, or would that be impossible, or just the same as training with one NPC?

Yes, there is actually. I purposely designed the library to handle multiple agents at once, but I’ll give you the easiest approach, which is similar to a genetic algorithm. You’ll have to code it yourself, since there are too many variations of environments for me to cover if I were to implement this functionality directly.

Here are the steps:

  1. Create multiple agents, each with its own model parameters. Also keep track of the total reward each agent receives over a given time interval.

  2. Run the agents in the environment and let them collect rewards.

  3. After each time interval, copy the model parameters of the agent with the highest total reward and load them into the other agents. Then reset the total reward values for all agents (a sketch of this is shown after the list).

  4. Repeat steps 2 and 3 until you feel the performance is good.
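Here is a rough sketch of how steps 3 and 4 could look in a script. It assumes each agent wraps one of the library’s models that inherits BaseModel (so getModelParameters() and setModelParameters() are available); the agents table, the reward tracking, and the interval are placeholders you would replace with your own environment logic:

-- Each entry wraps one model plus a running reward total for the current interval.
-- Populate this with however many agents/NPCs you spawn (placeholder structure).
local agents = {} -- e.g. { { model = someModel, totalReward = 0 }, ... }

local SYNC_INTERVAL_SECONDS = 60 -- how often to copy the best parameters around (placeholder value)

local function syncAgentsToBest()
	if #agents == 0 then return end

	-- Step 3: find the agent with the highest total reward for this interval.
	local bestAgent = agents[1]
	for _, agent in ipairs(agents) do
		if agent.totalReward > bestAgent.totalReward then
			bestAgent = agent
		end
	end

	-- Copy its model parameters into every other agent.
	local bestParameters = bestAgent.model:getModelParameters()
	for _, agent in ipairs(agents) do
		if agent ~= bestAgent then
			agent.model:setModelParameters(bestParameters)
		end
	end

	-- Reset the total reward values for all agents.
	for _, agent in ipairs(agents) do
		agent.totalReward = 0
	end
end

-- Step 4: while your agents keep collecting rewards (step 2) elsewhere,
-- repeat the copy-and-reset on a fixed interval.
while true do
	task.wait(SYNC_INTERVAL_SECONDS)
	syncAgentsToBest()
end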


Just curious, is this the method you used for your sword-fighting AI? I recognise it from when I was browsing through your code.

Also, by different model parameters, how would I generate new ones or swap them out?

Not really. Now that you mention it, though, I think I should do that.

Anyway, for the model parameters, just call getModelParameters() and setModelParameters(). These functions are only available for models that inherit the BaseModel class, so you’ll have to look at the API documentation to determine which ones do. Though if you’re working with the NeuralNetwork directly, I’m sure you can call them.

Also, the generation of model parameters is automatic upon training.


Nice, and would it be better if I used PPO instead for this exercise? I read a paper earlier about OpenAI using this algorithm to train their Hide and Seek AI. They said it was beneficial for them compared to traditional Q-Learning and that they would make it the default learning algorithm for their future AI-related projects.

Initially, I did try using the PPO implementation from your library, but I got confused after hooking up the Actor and Critic models, since they did not work when I tried training an agent with them.

What are your thoughts?

I recommend Advantage Actor-Critic (A2C) over PPO. I really don’t want to go into too much detail, but how the mathematics interacts with the real world matters very much, and I don’t think the researchers realized that.

PPO might be more sample-efficient than A2C, but it has limitations on where you can implement it.

For general purpose use, stick with A2C.


Also, if you’re confused about how to use actor-critic methods, you can refer to the sword-fighting AI’s code in the main post.


I think for starters I’ll stick to DoubleQLearning to familiarise myself first before moving on.

I’ll be back once I get it working!

I think I noticed a mistake in your sword-fighting code. I’m not sure if you’ve fixed it yet, but relativeRot is likely to slow your training because it compares how close the directions the two parts are facing are to each other, rather than whether the NPC is actually facing the target. So, if your NPC and the target are facing 90 degrees perpendicular to each other, relativeRot would be 0, but if the NPC faced the target directly while the target wasn’t facing the NPC, relativeRot would not be 0. I’m not sure if this was intentional, but here’s a heads-up.


The same applies to ‘rotationValueNeededToFaceTheEnemy’.

Here is a corrected version of your script to calculate the RotationError:

-- Function to calculate the angle between two vectors
local function calculateAngle(vector1, vector2)
	local dotProduct = vector1:Dot(vector2)
	local magnitudeProduct = vector1.Magnitude * vector2.Magnitude
	local cosineOfAngle = math.clamp(dotProduct / magnitudeProduct, -1, 1) -- Clamp to avoid NaN from floating-point error
	local angle = math.acos(cosineOfAngle)
	return math.deg(angle) -- Convert from radians to degrees
end

-- Function to calculate the required rotation for the NPC to face the target part
local function getTurnAngle(BasePart, targetPart)
	local npcPosition = BasePart.Position
	local targetPosition = targetPart.Position

	-- Calculate the direction vector from the NPC to the target part
	local directionToTarget = (targetPosition - npcPosition).Unit

	-- Get the NPC's current forward direction (assuming NPC is oriented along the Z-axis)
	local npcForward = BasePart.CFrame.LookVector

	-- Calculate the angle between the NPC's forward direction and the direction to the target
	local angle = calculateAngle(npcForward, directionToTarget)

	-- Determine if the target is to the left or right of the NPC
	local crossProduct = npcForward:Cross(directionToTarget)
	if crossProduct.Y < 0 then
		angle = -angle -- A negative Y component means the target is to the right of the NPC's forward direction
	end

	return angle
end

Notes are inside in case anyone wants to understand the script.
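If anyone wants to try it out, a call would look something like this (the NPC and GoalNode names are just placeholders for whatever your NPC’s root part and target part actually are):

-- Hypothetical usage; replace the paths with your own parts.
local npcRootPart = workspace.NPC.HumanoidRootPart
local goalPart = workspace.GoalNode

local turnAngle = getTurnAngle(npcRootPart, goalPart)
print("Degrees to turn (negative means the target is to the right):", turnAngle)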


I let it train for a while with the goal of guiding the NPC to the node on the left, and it does do that most of the time, but for some reason it makes a huge detour and goes completely away from the node before coming back to it.

How do I fix it? What’s wrong?