My neural network is too slow for real-time use on NPCs

I’m currently busy writing a neural network library for smart/adaptive NPCs.

Eventually I’d like to be able to use it to emulate fake players in a game: for instance, when there are too few players in a server, I could spawn in a few bots that were trained to behave like real players.

The problem I am currently facing however is that the neural network is too slow for real-time usage.

Doing 100 calculations (forward passes) takes roughly 0.12 seconds, so about 1.2 ms per pass.

The time varies with complexity, but at that rate I might not even be able to run more than 5 bots at once without Parallel Luau, and even with Parallel Luau this is still way too slow.

I am, however, trying to improve single-core performance so I can potentially simulate at least 16 bots at once, each running at least 20 forward passes per second. That is 320 passes per second in total, which at ~1.2 ms per pass would already eat roughly 0.38 s of CPU time every second.

The script I use to test the code:

local nnmod = require(game.ReplicatedStorage.neuranet)

local nn = nnmod.new_neuralnet(6, {8, 8, 8, 6})
print(nn)
task.wait(1)

local t = os.clock()

task.desynchronize()

for i = 1, 100 do
	-- Fill the input layer with random values, then forward once.
	for index, node in nn.input do
		node.v = math.random()
	end

	nnmod.forward(nn, true)
end

task.synchronize()

local delta = os.clock() - t

warn("[NeuroNet] Took " .. delta .. " seconds to complete forwarding.")

nnmod.size_of(nn)

Below is the code primarily responsible for forwarding and processing.

--!optimize 2



local tymo      = require(script.Parent.base._types)

local funcs     = require(script.Parent.base._functions)

local _settings = require(script.Parent.base._settings)



local module = {}


-- Shortcuts.

local abs  = math.abs
local sin  = math.sin
local cos  = math.cos
local exp  = math.exp

local rand = math.random





--[[
	========================================
	
	Node functions.
	
	========================================
]]--


-- Activate the node with its assigned function.

@native
function module.activate(
	no  : tymo.node,
	val : number
)

	no.v = funcs.func_list[no.f](val)

end




--[[
	========================================
	
	Layer functions.
	
	========================================
]]--



--[[
	Arguably one of the most important functions.
	Processes all the nodes from one layer towards the target node.
	This function is responsible for running the neural network.
]]


@native
function module.forward_node(
	from_layer  : tymo.node_layer,
	target_node : tymo.node
)

	-- We sum everything together here.
	local sum : number = target_node.bw


	-- Loop through every node in the previous layer.
	for i = 1, #from_layer
	do
		-- Cache values from node for code readability.

		local from_val   : number = from_layer[i].v  or 0 -- Value from starting node.

		local target_con : number = target_node.c[i] or 0 -- Target node's connection to previous layer.
		local target_wei : number = target_node.w    or 0 -- Target node's weight.


		-- Perform the maths.

		sum += (
			(from_val * target_con) * target_wei
		)
	end


	-- Check bias.
	if abs(sum) < abs(target_node.b)
	then
		sum += target_node.b -- Add if less than bias.
	end


	-- Finally activate the node.
	module.activate(target_node, sum)

end





--[[
	This function basically just wraps the forward_node() function into a loop.
	Makes coding infinitely easier and simplifies functions by splitting up logic.
]]


@native
function module.forward_layer(
	from_layer :  tymo.node_layer,
	to_layer   : {tymo.node}
)

	-- LoOoOoOoP where we forward to every node in the target layer!

	for index, target_node  in  to_layer
	do
		module.forward_node(
			from_layer, target_node -- From layer > target node.
		)
	end

end






--[[
	========================================
	
	Neural network.
	
	========================================
]]



-- Runs the entire neural network, yippee!

@native
function module.forward(net : tymo.neuralnetwork, debugging : boolean?)

	-- Forward each layer in order; the first layer reads from the input layer.

	for i, layer in net.layers
	do
		module.forward_layer(
			net.layers[i - 1] or net.input,
			layer
		)
	end


	-- Code below is just for debugging.

	if not debugging then return end


	local last_layer : tymo.node_layer = net.layers[ #net.layers ]

	for k, v  in  last_layer
	do
		print("Node " .. k .. " = " .. v.v)
	end

end




-- Reports the size (total node count) of the neural network.

function module.size_of(net : tymo.neuralnetwork) : number

	local node_count : number = #net.input


	for index, layer  in  net.layers
	do
		node_count += #layer
	end


	warn("Size of neural network: " .. node_count ..  " nodes.")

	return node_count
end






return module

The function module that contains the various activation / value-processing functions:

--!optimize 2


local module = {}


local tymo = require(script.Parent._types)

local _settings = require(script.Parent._settings)





-- Shortcuts.

local abs  = math.abs
local sin  = math.sin
local cos  = math.cos
local exp  = math.exp

local rand = math.random










-- Real business starts beyond this point.


--[[
	========================================
	
	Utility functions.
	
	========================================
]]--




-- 2 functions for validating node connections after addition/removal.

function module.validate_addition(
	target_node     : tymo.node,
	connected_layer : tymo.node_layer?
)

	if not connected_layer
	then
		error("Layer " ..tostring(connected_layer).. " does not exist.")

	elseif (#target_node.c + 1) > #connected_layer
	then
		error("Connections count wouldn't match.")
	end

end


function module.validate_removal(
	target_node     : tymo.node,
	connected_layer : tymo.node_layer?
)

	if not connected_layer
	then
		error("Layer " ..tostring(connected_layer).. " does not exist.")
		
	elseif (#target_node.c - 1) < #connected_layer
	then
		error("Connections count wouldn't match.")
	end

end






--[[
	========================================
	
	Simple math utility functions.
	
	========================================
]]--





-- Returns a random number with range (-1 .. 1).

@native
local function random() : number

	--return ( rand() + rand() ) - 1

	return ( rand() - rand() ) -- Might be slightly faster? Results seem the same as above.
end



-- Converts (0 .. 1) to (-1 .. 1).

@native
local function normalize( v : number ) : number
	return (v * 2) - 1
end



-- Converts (-1 .. 1) to (0 .. 1).

@native
local function unnormalize( v : number ) : number
	return (v + 1) / 2
end



-- Wrap a number so it never goes outside the (-1 .. 1) range.

@native
local function limit( v : number ) : number

	local num : number = unnormalize(v) % 1

	return normalize(num)
end


module.random      = random
module.normalize   = normalize
module.unnormalize = unnormalize
module.limit       = limit









--[[
	========================================
	
	-- Node activation functions.
	
	========================================
]]



-- Modified hyperbolic tangent.

--local function htan_plus(x : number, a : number, b : number, c : number, d : number) : number
--	return
--		(exp(x * a) - exp(-x * b)) /
--		(exp(x * c) + exp(-x * d))
--end



-- Linear curve with clamping.

local function linear(x : number)
	return math.clamp(x, -9.999, 9.999)
end



-- Sigmoid curve.

@native
local function sigmoid(x : number) : number
	return 1 / (1 + exp(-x))
end



-- Wrap map curve.

@native
local function wrap(x : number)
	return ((x + 1) % 2) - 1
end



-- ReLU curve.

local function relu(x : number) : number
	return math.max(x, 0)
end



-- Reverse ReLU curve.

local function rev_relu(x : number) : number
	return math.min(x, 0)
end


-- Pi sine curve.

@native
local function pi_sine(x : number) : number
	return sin(x * math.pi)
end



--[[
	Random activation function.
	Can be used to add "randomness" to a neural network.
	
	Useful if you want the result to be different
	even when the input is the same.
]]


@native
local function aran(x : number)
	return x + (random() * _settings.def.random_scale)
end









--[[
	A list of all activation functions.
	
	WARNING: modifying this list will have consequences
	and break every neural network that depends on this specific order.
	
	If you wish to add your own functions
	you should always append TO THE BOTTOM of the list.
	
	Neural networks are not backwards-compatible with older versions of this list.
]]

module.func_list = { -- Whole table is made all at once since it might be more optimized.

	linear;
	sigmoid;
	wrap;
	relu;
	rev_relu;
	pi_sine;
	aran;
	math.tanh;
	math.sin;
	math.cos;
	math.round;
}
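
-- Example (hypothetical): appending a custom activation function.
-- Always append to the BOTTOM of the list so saved networks keep working:
--
-- local function leaky_relu(x : number) : number
-- 	return x >= 0 and x or x * 0.01
-- end
--
-- table.insert(module.func_list, leaky_relu)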



-- Picks a random function index from the function list.

function module.random_function() : number

	local size = #module.func_list

	return math.random(1, size)
end






return module

The output.

 12:22:44.210  [NeuroNet]   Created input layer with 6 inputs.  -  Server - constructor:142
  12:22:44.210   ▶ [NeuralNet]   Created layer with 8 nodes. (x3)  -  Server - constructor:115
  12:22:44.210  [NeuralNet]   Created layer with 6 nodes.  -  Server - constructor:115
  12:22:44.211   ▶ {...}  -  Server - neural net test:8

  12:22:45.231  Node 1 = 0  -  Server - processor:180
  12:22:45.231  Node 2 = -0.05600904476120605  -  Server - processor:180
  12:22:45.231  Node 3 = 0  -  Server - processor:180
  12:22:45.231  Node 4 = 0.18197623951841124  -  Server - processor:180
  12:22:45.231  Node 5 = 0.6186482173094424  -  Server - processor:180
  12:22:45.231  Node 6 = -0.42393142440776876  -  Server - processor:180

...

  12:22:45.327  [NeuroNet] Took 0.09615569999914442 seconds to complete forwarding.  -  Server - neural net test:36
  12:22:45.327  Size of neural network: 36 nodes.  -  Server - processor:201

I’m also well aware that some activation functions could be slow.
For the most part they are necessary, however: this algorithm is inspired by NEAT, and I wanted to give every node the potential to have its own unique activation function, allowing more complex interactions between nodes while requiring a much smaller neural network.

I might need some approximations or cheaper alternatives that can give roughly the same results.

Thanks in advance.

Files

If you need to look at the module itself, I can DM it; I’d prefer not to share it here since it’s not fully done and ready to be open-sourced just YET.

Eventually I might want to release this module to the public so that more developers can create “intelligent” NPCs.

Real-time performance is very important since you’re essentially supposed to use this for sword fighting, shooting and parkour NPCs that learn and adjust to their environments.

By the neural network being too slow, do you mean that the NPCs’ actions will be delayed?
I didn’t understand that well, so sorry if it’s very obvious.

Neural networks are slow and expensive, and overall I do not believe it is practical to use them for standard NPCs, which technically have millions of potential outputs yet are comparatively simple to program manually. Text-only NNs already require so much computing power, and there is no doubt in my mind that a 100% NN-powered NPC trying to navigate a 3D environment will be millions of times more computationally expensive and difficult to fine-tune than manually programmed behaviors.

Of course, I cannot force you to do anything, but I would advise against spending a significant amount of time on something unpredictable for a small part of your game.

And, on the actual question, I am somewhat surprised to see the --!optimize 2, but not --!native. This code seems like a good candidate for Native CodeGen, as it is primarily on the Luau-side and doesn’t interact much with Engine APIs.
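
For reference, a minimal sketch of what that would look like (the directive sits at the top of the ModuleScript, next to the one already there):

--!optimize 2
--!native

-- Every function in the module is then eligible for native compilation
-- by default, with no per-function annotations needed.
local module = {}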

The reason for this is that I manually mark functions as native using the @native attribute.

I know for a fact that not every function needs or benefits from native codegen, and it saves memory to not use it on every single function. The mutate functions, for instance, are not that slow.

This is actually not entirely true.
These neural networks are by no means supposed to have millions of nodes/neurons.

These neural networks are supposed to be low-complexity, making them lightweight enough to run on a CPU without the need for GPU acceleration.

The idea

The general idea is that the AI is trained using raycasts and sensors that capture only the most basic information, such as distance or which direction an enemy is moving in, so a network will typically only have about 6 - 12 inputs.
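
As a rough illustration of that kind of sensor (a hypothetical sketch, not code from the module): six raycasts around the NPC, each normalized into one (0 .. 1) input.

local SENSE_RANGE = 50 -- Studs; an assumption for this sketch.

local function sense(npcRoot : BasePart) : {number}
	local inputs = {}
	for i = 1, 6 do
		-- Cast rays in a circle around the NPC.
		local angle = (i - 1) * (math.pi * 2 / 6)
		local direction = Vector3.new(math.cos(angle), 0, math.sin(angle)) * SENSE_RANGE
		local hit = workspace:Raycast(npcRoot.Position, direction)

		-- Normalize: 0 = touching something, 1 = nothing within range.
		inputs[i] = hit and (hit.Distance / SENSE_RANGE) or 1
	end
	return inputs
end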

Most neural networks used in games have no more than 50 nodes; the complexity is nowhere near that of ChatGPT or Stable Diffusion, for instance.

Game engines like Unity and Unreal have neural network plugins that allow you to train simple, low-complexity AI models that can perform tasks in real time.

I’m trying to replicate such a system in Roblox and I KNOW for sure that this is possible.

I’ve also seen and played games that have adaptive AI where a neural network is used to make AI less predictable and learn from the player’s actions.

It is true that most basic behavior can be hard-coded into an NPC, but I wish to create something that is much harder to do with manual coding and works better with a neural network.

Neural networks like this will mostly be used for AI that learns how to parkour or sword fight in realistic and adaptable ways. I want a system where the AI must adapt to the game’s environment to create more immersive and realistic gameplay.

So basically, the current problem is that if you do too many calculations per second, it will essentially hog all the CPU resources, leaving no room for other gameplay logic.

Having a few neural networks doing a few calculations per second is fine.
A single neural network can finish processing its information within a single frame.

This is fast enough for maybe a few NPCs.

If I do this with Parallel Luau, the heavy workload is put on a different, less busy CPU core.
But this will still become a problem if I have let’s say… 30 NPCs.

I have a 16-core processor, so running multiple neural networks is not an issue for me, but it may be on Roblox servers and for client-sided neural networks, where I assume I’ll have fewer cores to work with.

I need to find a way to optimize the processing logic just enough so I can have a few dozen NPCs doing 20 calculations per second for each neural network.
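
One direction worth testing for single-core speed (a sketch under assumptions, not the library’s actual layout): store each layer as flat number arrays instead of one table per node, so the hot loop is pure array indexing with no allocations. This assumes a conventional weighted sum, and every name below is hypothetical:

@native
local function forward_layer_flat(
	prev : {number},            -- Values of the previous layer.
	out  : {number},            -- Values of this layer (reused every pass).
	w    : {number},            -- Flat weight array, #prev * #out entries.
	b    : {number},            -- One bias per node.
	act  : {(number) -> number} -- One activation function per node.
)
	local n = #prev
	local k = 0
	for j = 1, #out do
		local sum = b[j]
		for i = 1, n do
			k += 1
			sum += prev[i] * w[k]
		end
		out[j] = act[j](sum)
	end
end

Flat arrays also serialize cheaply, and they avoid the per-node table lookups (from_layer[i].v, target_node.c[i]) that the current forward_node does in its inner loop.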

Hm, maybe replace sigmoid with ReLU?
You could also use the Roblox MicroProfiler to check which parts of your code take the most time, so you can see where the problem is.
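
For reference, custom labels show up in the MicroProfiler via debug.profilebegin/profileend, so you can time just the forward pass (nnmod and nn as in the test script above):

debug.profilebegin("nn_forward")
nnmod.forward(nn)
debug.profileend()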

And maybe precompute the weights and biases outside of the per-frame calculations (if possible); I don’t know if you have done that or not.

(Note: I only learned about neural networks like 2 months ago, so I may be wrong.)

But maybe it won’t optimize it enough; I’m not sure.

Well, I cannot exactly replace any activation function since then the resulting math would be drastically different.

Activation functions basically process the final value of a node and transform it using some sort of “curve”.
And sigmoid functions produce an S-shaped curve.

What I COULD potentially do, however, is make slightly less accurate but faster alternatives for every activation function.

I’ve considered whether math.sin() could replace sigmoid, but its shape might be too different.

But it’s not just the sigmoid function that needs to be optimized; the other functions do as well. My neural network differs from the typical one.

Usually every node in a layer uses the same function, but my neural network allows every node to have its own activation function. This potentially allows more complex calculations with fewer nodes, but I still have to test that.

But that’s not the point here; I just need to find a way to make the neural network library itself faster, and to see if there’s a way to squeeze more performance out of the forwarding and activation functions.

I think I already posted the code here that matters most really, the rest of the modules are pretty irrelevant.

Regardless of what you do, or how many cores you have, Roblox is capped at 2 threads IIRC.

This looks really cool though… worlds ahead of my coding knowledge of Luau.

From my knowledge, Roblox distributes actors across all available processor cores. My computer has 16 cores, so Roblox can distribute actors across 32 threads in total.

After a bit more testing I found that my neural network is not as slow as I thought; doing a few hundred calculations per second is entirely doable.

I’m looking into faster, approximate alternatives for the math.tanh and sigmoid functions, since those are likely the slowest.
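
For example, the well-known softsign approximation gives tanh- and sigmoid-shaped curves with one division and no exp(). A sketch (the shapes are close but not identical, so trained networks may need retraining):

local abs = math.abs

-- Roughly tanh-shaped; saturates more slowly than the real tanh.
@native
local function fast_tanh(x : number) : number
	return x / (1 + abs(x))
end

-- Sigmoid-shaped: rescale the softsign from (-1 .. 1) to (0 .. 1).
@native
local function fast_sigmoid(x : number) : number
	return 0.5 + 0.5 * (x / (1 + abs(x)))
end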

My module is pretty much ready for use now and I’ve published it, though I’ll leave this thread open in case people can help optimize what I currently have. Oh, and there might still be some bugs in it, of course.

I’m considering opening a feedback post on the forums where people can help test and try out this module and suggest potential features or improvements to it.

So far it can already do these things:

  • Accept input and return an output array.

  • Mutate its entire structure, including the number of layers and the number of nodes per layer.

  • JSON encode/decode the neural network so you can copy/paste/save it (untested; see the sketch after this list).

  • Multi-threaded by default: every neural network has its own actor instance, and communication happens through bindable events and functions.

  • Both client and server compatible (untested, but it should work flawlessly in theory since the code is identical on both ends).
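
Since the JSON part is untested, here is roughly what the round-trip would look like, assuming the network is plain data (numbers and tables only, with activation functions stored as indices, which is how func_list is referenced):

local HttpService = game:GetService("HttpService")

-- Serialize a network to a string you can store or copy.
local function save_net(net) : string
	return HttpService:JSONEncode(net)
end

-- Rebuild a network from a saved string.
local function load_net(json : string)
	return HttpService:JSONDecode(json)
end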

#help-and-feedback:code-review

Because you can’t efficiently batch on Roblox, any machine learning model you attempt to run directly in Roblox will end up being slow. The ideal solution would be for Roblox to support something like WebSockets or the HTTP CONNECT method, allowing a live relay between your NPCs and a model hosted on your own server that feeds them information on what to do.

With all that said, a neural network is not at all necessary to simulate fake players, though I understand the desire to make these bots indistinguishable from a normal player.

Fake players are honestly just an example use case.
These neural networks are intended to be usable on both the client and the server in Roblox, for any purpose really.

It’s a general-purpose self-modifying neural network for making experimental games and NPCs.

Yes, any decent programmer could manually code most AI behavior.
The point of this, however, is to add unpredictability and adaptability that would otherwise be extremely hard to code manually.

This neural network doesn’t even need to be used 30 times a second.
You could also have things like movement, attacking, etc. pre-coded into an AI and simply use the neural network for decision making (e.g. which target to attack first, or when to reload a weapon).
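
For instance, a hypothetical decision step on top of the output array could just pick the strongest output node (the action names here are made up):

local ACTIONS = {"attack", "retreat", "reload"} -- Hypothetical action set.

local function decide(outputs : {number}) : string
	local bestIndex, bestValue = 1, -math.huge
	for i, v in outputs do
		if v > bestValue then
			bestIndex, bestValue = i, v
		end
	end
	return ACTIONS[bestIndex]
end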

It’s all general-purpose; its complexity changes during mutation.
You don’t need much knowledge of neural networks to train it, because it will find its ideal structure.

I’m trying to design this thing to be as lightweight as possible so it can run relatively fast on, for instance, a dual-core CPU.

Theoretically it SHOULD be fast, because most of these neural networks have no more than 4 - 5 layers and 6 - 8 nodes per layer, which is normally quick to run.

Some of this might also be a Luau limitation: it is slightly slower than C++, and there’s some potential overhead from the garbage collector (I’ve tried to minimize this by not creating objects or tables during forwarding).

I want to prove that neural networks in Roblox are entirely possible and COULD be worth it if used right.

I know at least that running 10 - 20 neural networks simultaneously on an old laptop is possible and doesn’t have a huge performance impact, if you spread the forward passes out over a whole second instead of trying to do everything in a single frame.
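
Spreading the work out can be as simple as a round-robin over all the networks, something like this sketch (bots, the 20 Hz target, and the input-filling step are assumptions; nnmod as in the test script):

local RunService = game:GetService("RunService")

local bots = {} -- Each entry: a neural net whose inputs are filled elsewhere.
local cursor = 1

RunService.Heartbeat:Connect(function()
	if #bots == 0 then return end

	-- At 60 fps, forwarding a third of the bots per frame gives each
	-- bot roughly 20 forward passes per second.
	local slice = math.ceil(#bots / 3)

	for _ = 1, slice do
		nnmod.forward(bots[cursor])
		cursor = (cursor % #bots) + 1
	end
end)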

My approach to making neural networks on Roblox has been to train them offsite and import the weights to reconstruct the model. Any limitation by Luau shouldn’t be a problem.

What I would instead think about is what your end goal for this is. Seamless integration would require sub-millisecond compute times for multiple actions and a prediction accuracy high enough that it wouldn’t seem off to the average player. As far as the projects I have seen, this just hasn’t been done before.

In terms of collecting data to train (and test) a model like this, I would record key inputs from preexisting players and use those as your training set, applying modern techniques like flash attention to connect future bot actions to their action history. This is a really big undertaking for just one person (the data collection, cleaning, and interpretation alone), so I would probably recommend a different use case: chatbots.

One of the biggest hurdles in the Roblox game space is user retention. Figuring out a way to integrate neural networks into an existing game as a sort of new-player guide would be a lot more influential, I believe. You would have a better time using transfer learning here, as you could take preexisting models that are already state of the art, import their weights and rebuild them, alongside some post-training based on the specific experience. I’ve seen some examples that use external servers to tackle this, but you can 100% do this within Roblox.

I’m a little tired, so please let me know if any of this doesn’t make sense and I’ll try to explain it better.

Chatbots require way too much complexity to be efficient here.

My neural network is actually based on NEAT and also Unity/Unreal Engine plugins that allow you to use reinforcement learning to train AI for games.

You could theoretically train a chatbot with this but it would be too resource-intensive and take way too long because it’s generation-based.

I developed this little project so you can make game AI that learns parkour, sword fighting, shooting, or other simple tasks and adapts to the players in a server.

It’s a game AI, nothing more or less than that really.

Neural networks can potentially also be used for procedural level generation where a neural network decides how the level is built and structured but that would of course require a dataset and a lot of manual training.

Teaching an AI to sword fight or parkour is actually not that hard; it can be done by spawning small batches of NPCs and picking the best one from each generation.
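
That generation loop, sketched with hypothetical helpers (evaluate, clone and mutate stand in for whatever the module actually exposes):

local POPULATION = 16
local GENERATIONS = 100

local population = {}
for i = 1, POPULATION do
	population[i] = nnmod.new_neuralnet(6, {8, 8, 8, 6})
end

for gen = 1, GENERATIONS do
	-- Run every bot through the task and score it.
	local bestNet, bestScore = nil, -math.huge
	for _, net in population do
		local score = evaluate(net) -- Hypothetical: run the NPC, measure fitness.
		if score > bestScore then
			bestNet, bestScore = net, score
		end
	end

	-- Next generation: mutated copies of the winner.
	for i = 1, POPULATION do
		population[i] = mutate(clone(bestNet)) -- Hypothetical helpers.
	end
end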