My neural network is too slow for real-time use on NPCs

I’m currently busy writing a neural network library for smart/adaptive NPCs.

Eventually I’d like to be able to use it to emulate fake players in a game: for instance, when there are too few players in a server, I could spawn in a few bots that were trained to behave like real players.

The problem I am currently facing however is that the neural network is too slow for real-time usage.

Doing 100 calculations (forward passes) takes roughly 0.12 seconds, so about 1.2 ms per pass.

The time varies with complexity, but at that rate I might not even be able to run more than 5 bots at once without Parallel Luau, and even with Parallel Luau this is still way too slow.

I am, however, trying to improve single-core performance so I can potentially simulate at least 16 bots at once, each running at least 20 forward passes per second. That is 320 passes per second in total, which at ~1.2 ms per pass would already eat roughly 0.38 s of CPU time every second.

The script I use to test the code:

local nnmod = require(game.ReplicatedStorage.neuranet)

local nn = nnmod.new_neuralnet(6, {8, 8, 8, 6})
print(nn)
task.wait(1)

local t = os.clock()

task.desynchronize()

for i = 1, 100 do
	-- Fill the input layer with random values, then forward once.
	for index, node in nn.input do
		node.v = math.random()
	end

	nnmod.forward(nn, true)
end

task.synchronize()

local delta = os.clock() - t

warn("[NeuroNet] Took " .. delta .. " seconds to complete forwarding.")

nnmod.size_of(nn)

Below is the code primarily responsible for forwarding and processing.

--!optimize 2



local tymo      = require(script.Parent.base._types)

local funcs     = require(script.Parent.base._functions)

local _settings = require(script.Parent.base._settings)



local module = {}


-- Shortcuts.

local abs  = math.abs
local sin  = math.sin
local cos  = math.cos
local exp  = math.exp

local rand = math.random





--[[
	========================================
	
	Node functions.
	
	========================================
]]--


-- Activate the node with its assigned function.

@native
function module.activate(
	no  : tymo.node,
	val : number
)

	no.v = funcs.func_list[no.f](val)

end




--[[
	========================================
	
	Layer functions.
	
	========================================
]]--



--[[
	Arguably one of the most important functions.
	Processes all the nodes from one layer towards the target node.
	This function is responsible for running the neural network.
]]


@native
function module.forward_node(
	from_layer  : tymo.node_layer,
	target_node : tymo.node
)

	-- We sum everything together here.
	local sum : number = target_node.bw


	-- Loop through every node in the previous layer.
	for i = 1, #from_layer
	do
		-- Cache values from node for code readability.

		local from_val   : number = from_layer[i].v  or 0 -- Value from starting node.

		local target_con : number = target_node.c[i] or 0 -- Target node's connection to previous layer.
		local target_wei : number = target_node.w    or 0 -- Target node's weight.


		-- Perform the maths.

		sum += (
			(from_val * target_con) * target_wei
		)
	end


	-- Check bias.
	if abs(sum) < abs(target_node.b)
	then
		sum += target_node.b -- Add if less than bias.
	end


	-- Finally activate the node.
	module.activate(target_node, sum)

end





--[[
	This function basically just wraps the forward_node() function into a loop.
	Makes coding infinitely easier and simplifies functions by splitting up logic.
]]


@native
function module.forward_layer(
	from_layer :  tymo.node_layer,
	to_layer   : {tymo.node}
)

	-- LoOoOoOoP where we forward to every node in the target layer!

	for index, target_node  in  to_layer
	do
		module.forward_node(
			from_layer, target_node -- From layer > target node.
		)
	end

end






--[[
	========================================
	
	Neural network.
	
	========================================
]]



-- Runs the entire neural network, yippee!

@native
function module.forward(net : tymo.neuralnetwork, debugging : boolean?)

	-- Forward each layer in order; the first layer reads from the input layer.

	for i, layer in net.layers
	do
		module.forward_layer(
			net.layers[i - 1] or net.input,
			layer
		)
	end


	-- Code below is just for debugging.

	if not debugging then return end


	local last_layer : tymo.node_layer = net.layers[ #net.layers ]

	for k, v  in  last_layer
	do
		print("Node " .. k .. " = " .. v.v)
	end

end




-- Reports the size (total node count) of the neural network.

function module.size_of(net : tymo.neuralnetwork) : number

	local node_count : number = #net.input


	for index, layer  in  net.layers
	do
		node_count += #layer
	end


	warn("Size of neural network: " .. node_count ..  " nodes.")

	return node_count
end






return module

The function module that contains the various activation / value-processing functions:

--!optimize 2


local module = {}


local tymo = require(script.Parent._types)

local _settings = require(script.Parent._settings)





-- Shortcuts.

local abs  = math.abs
local sin  = math.sin
local cos  = math.cos
local exp  = math.exp

local rand = math.random










-- Real business starts beyond this point.


--[[
	========================================
	
	Utility functions.
	
	========================================
]]--




-- 2 functions for validating node connections after addition/removal.

function module.validate_addition(
	target_node     : tymo.node,
	connected_layer : tymo.node_layer?
)

	if not connected_layer
	then
		error("Layer " ..tostring(connected_layer).. " does not exist.")

	elseif (#target_node.c + 1) > #connected_layer
	then
		error("Connections count wouldn't match.")
	end

end


function module.validate_removal(
	target_node     : tymo.node,
	connected_layer : tymo.node_layer?
)

	if not connected_layer
	then
		error("Layer " ..tostring(connected_layer).. " does not exist.")
		
	elseif (#target_node.c - 1) < #connected_layer
	then
		error("Connections count wouldn't match.")
	end

end






--[[
	========================================
	
	Simple math utility functions.
	
	========================================
]]--





-- Returns a random number with range (-1 .. 1).

@native
local function random() : number

	--return ( rand() + rand() ) - 1

	return ( rand() - rand() ) -- Might be slightly faster? Results seem the same as above.
end



-- Converts (0 .. 1) to (-1 .. 1).

@native
local function normalize( v : number ) : number
	return (v * 2) - 1
end



-- Converts (-1 .. 1) to (0 .. 1).

@native
local function unnormalize( v : number ) : number
	return (v + 1) / 2
end



-- Wrap a number so it never goes outside the (-1 .. 1) range.

@native
local function limit( v : number ) : number

	local num : number = unnormalize(v) % 1

	return normalize(num)
end


module.random      = random
module.normalize   = normalize
module.unnormalize = unnormalize
module.limit       = limit









--[[
	========================================
	
	-- Node activation functions.
	
	========================================
]]



-- Modified hyperbolic tangent.

--local function htan_plus(x : number, a : number, b : number, c : number, d : number) : number
--	return
--		(exp(x * a) - exp(-x * b)) /
--		(exp(x * c) + exp(-x * d))
--end



-- Linear curve with clamping.

local function linear(x : number)
	return math.clamp(x, -9.999, 9.999)
end



-- Sigmoid curve.

@native
local function sigmoid(x : number) : number
	return 1 / (1 + exp(-x))
end



-- Wrap map curve.

@native
local function wrap(x : number)
	return ((x + 1) % 2) - 1
end



-- ReLU curve.

local function relu(x : number) : number
	return math.max(x, 0)
end



-- Reverse ReLU curve.

local function rev_relu(x : number) : number
	return math.min(x, 0)
end


-- Pi sine curve.

@native
local function pi_sine(x : number) : number
	return sin(x * math.pi)
end



--[[
	Random activation function.
	Can be used to add "randomness" to a neural network.
	
	Useful if you want the result to be different
	even when the input is the same.
]]


@native
local function aran(x : number)
	return x + (random() * _settings.def.random_scale)
end









--[[
	A list of all activation functions.
	
	WARNING: modifying this list will have consequences
	and break every neural network that depends on this specific order.
	
	If you wish to add your own functions
	you should always append TO THE BOTTOM of the list.
	
	Neural networks are not backwards-compatible with older versions of this list.
]]

module.func_list = { -- Whole table is made all at once since it might be more optimized.

	linear;
	sigmoid;
	wrap;
	relu;
	rev_relu;
	pi_sine;
	aran;
	math.tanh;
	math.sin;
	math.cos;
	math.round;
}
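
-- Example (hypothetical): appending a custom activation function.
-- Always append to the BOTTOM of the list so saved networks keep working:
--
-- local function leaky_relu(x : number) : number
-- 	return x >= 0 and x or x * 0.01
-- end
--
-- table.insert(module.func_list, leaky_relu)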



-- Picks a random function index from the function list.

function module.random_function() : number

	local size = #module.func_list

	return math.random(1, size)
end






return module

The output.

 12:22:44.210  [NeuroNet]   Created input layer with 6 inputs.  -  Server - constructor:142
  12:22:44.210   ▶ [NeuralNet]   Created layer with 8 nodes. (x3)  -  Server - constructor:115
  12:22:44.210  [NeuralNet]   Created layer with 6 nodes.  -  Server - constructor:115
  12:22:44.211   ▶ {...}  -  Server - neural net test:8

  12:22:45.231  Node 1 = 0  -  Server - processor:180
  12:22:45.231  Node 2 = -0.05600904476120605  -  Server - processor:180
  12:22:45.231  Node 3 = 0  -  Server - processor:180
  12:22:45.231  Node 4 = 0.18197623951841124  -  Server - processor:180
  12:22:45.231  Node 5 = 0.6186482173094424  -  Server - processor:180
  12:22:45.231  Node 6 = -0.42393142440776876  -  Server - processor:180

...

  12:22:45.327  [NeuroNet] Took 0.09615569999914442 seconds to complete forwarding.  -  Server - neural net test:36
  12:22:45.327  Size of neural network: 36 nodes.  -  Server - processor:201

I’m also well aware that some activation functions could be slow.
For the most part they are necessary, however: this algorithm is inspired by NEAT, and I wanted to give every node the potential to have its own unique activation function, allowing more complex interactions between nodes while requiring a much smaller neural network.

I might need some approximations or cheaper alternatives that can give roughly the same results.

Thanks in advance.

Files

If you need to look at the module itself, I can DM it; I’d prefer not to share it here since it’s not fully done and ready to be open-sourced just YET.

Eventually I might want to release this module to the public so that more developers can create “intelligent” NPCs.

Real-time performance is very important since you’re essentially supposed to use this for sword fighting, shooting and parkour NPCs that learn and adjust to their environments.

By the neural network being too slow, do you mean that the NPCs’ actions will be delayed?
I didn’t understand that well, so sorry if it’s very obvious.

Neural networks are slow and expensive, and overall I do not believe it is practical to use them for standard NPCs, which technically have millions of potential outputs yet are comparatively simple to program manually. Text-only NNs already require so much computing power, and there is no doubt in my mind that a 100% NN-powered NPC trying to navigate a 3D environment will be millions of times more computationally expensive and difficult to fine-tune than manually programmed behaviors.

Of course, I cannot force you to do anything, but I would advise against spending a significant amount of time on something unpredictable for a small part of your game.

And, on the actual question, I am somewhat surprised to see the --!optimize 2, but not --!native. This code seems like a good candidate for Native CodeGen, as it is primarily on the Luau-side and doesn’t interact much with Engine APIs.
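
For reference, a minimal sketch of what that would look like (the directive sits at the top of the ModuleScript, next to the one already there):

--!optimize 2
--!native

-- Every function in the module is then eligible for native compilation
-- by default, with no per-function annotations needed.
local module = {}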

The reason for this is that I manually mark functions as native using the @native attribute.

I know for a fact that not every function needs or benefits from native codegen, and it saves memory to not use it on every single function. The mutate functions, for instance, are not that slow.

This is actually not entirely true.
These neural networks are by no means supposed to have millions of nodes/neurons.

These neural networks are supposed to be low-complexity, making them lightweight enough to run on a CPU without the need for GPU acceleration.

The idea

The general idea is that the AI is trained using raycasts and sensors that capture only the most basic information, such as distance or which direction an enemy is moving in, so a network will typically only have about 6 - 12 inputs.
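
As a rough illustration of that kind of sensor (a hypothetical sketch, not code from the module): six raycasts around the NPC, each normalized into one (0 .. 1) input.

local SENSE_RANGE = 50 -- Studs; an assumption for this sketch.

local function sense(npcRoot : BasePart) : {number}
	local inputs = {}
	for i = 1, 6 do
		-- Cast rays in a circle around the NPC.
		local angle = (i - 1) * (math.pi * 2 / 6)
		local direction = Vector3.new(math.cos(angle), 0, math.sin(angle)) * SENSE_RANGE
		local hit = workspace:Raycast(npcRoot.Position, direction)

		-- Normalize: 0 = touching something, 1 = nothing within range.
		inputs[i] = hit and (hit.Distance / SENSE_RANGE) or 1
	end
	return inputs
end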

Most neural networks used in games have no more than 50 nodes; the complexity is nowhere near that of ChatGPT or Stable Diffusion, for instance.

Game engines like Unity and Unreal have neural network plugins that allow you to train simple, low-complexity AI models that can perform tasks in real time.

I’m trying to replicate such a system in Roblox and I KNOW for sure that this is possible.

I’ve also seen and played games that have adaptive AI where a neural network is used to make AI less predictable and learn from the player’s actions.

It is true that most basic behavior can be hard-coded into an NPC, but I wish to create something that is much harder to do with manual coding and works better with a neural network.

Neural networks like this will mostly be used for AI that learns how to parkour or sword fight in realistic and adaptable ways. I want a system where the AI must adapt to the game’s environment to create more immersive and realistic gameplay.

So basically, the current problem is that if you do too many calculations per second, it will essentially hog all the CPU resources, leaving no room for other gameplay logic.

Having a few neural networks doing a few calculations per second is fine.
A single neural network can finish processing its information within a single frame.

This is fast enough for maybe a few NPCs.

If I do this with Parallel Luau, the heavy workload is put on a different, less busy CPU core.
But this will still become a problem if I have let’s say… 30 NPCs.

I have a 16-core processor, so running multiple neural networks is not an issue for me, but it may be on Roblox servers and for client-sided neural networks, where I assume I’ll have fewer cores to work with.

I need to find a way to optimize the processing logic just enough so I can have a few dozen NPCs doing 20 calculations per second for each neural network.
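
One direction worth testing for single-core speed (a sketch under assumptions, not the library’s actual layout): store each layer as flat number arrays instead of one table per node, so the hot loop is pure array indexing with no allocations. This assumes a conventional weighted sum, and every name below is hypothetical:

@native
local function forward_layer_flat(
	prev : {number},            -- Values of the previous layer.
	out  : {number},            -- Values of this layer (reused every pass).
	w    : {number},            -- Flat weight array, #prev * #out entries.
	b    : {number},            -- One bias per node.
	act  : {(number) -> number} -- One activation function per node.
)
	local n = #prev
	local k = 0
	for j = 1, #out do
		local sum = b[j]
		for i = 1, n do
			k += 1
			sum += prev[i] * w[k]
		end
		out[j] = act[j](sum)
	end
end

Flat arrays also serialize cheaply, and they avoid the per-node table lookups (from_layer[i].v, target_node.c[i]) that the current forward_node does in its inner loop.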

Hm, maybe replace sigmoid with ReLU?
You could also use the Roblox MicroProfiler to check which parts of your code take the most time, so you can see where the problem is.
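
For reference, custom labels show up in the MicroProfiler via debug.profilebegin/profileend, so you can time just the forward pass (nnmod and nn as in the test script above):

debug.profilebegin("nn_forward")
nnmod.forward(nn)
debug.profileend()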

And maybe precompute the weights and biases outside of the per-frame calculations (if possible); I don’t know if you have done that or not.

(Note: I only learned about neural networks like 2 months ago, so I may be wrong.)

But maybe it won’t optimize it enough; I’m not sure.

Well, I cannot exactly replace any activation function since then the resulting math would be drastically different.

Activation functions basically process the final value of a node and transform it using some sort of “curve”.
And sigmoid functions produce an S-shaped curve.

What I COULD potentially do, however, is make slightly less accurate but faster alternatives for every activation function.

I’ve considered whether math.sin() could replace sigmoid, but its shape might be too different.

But it’s not just the sigmoid function that needs to be optimized; the other functions do as well. My neural network differs from the typical one.

Usually every node in a layer uses the same function, but my neural network allows every node to have its own activation function. This potentially allows more complex calculations with fewer nodes, but I still have to test that.

But that’s not the point here; I just need to find a way to make the neural network library itself faster, and to see if there’s a way to squeeze more performance out of the forwarding and activation functions.

I think I already posted the code here that matters most really, the rest of the modules are pretty irrelevant.

Regardless of what you do, or how many cores you have, Roblox is capped at 2 threads IIRC.

This looks really cool though… worlds ahead of my coding knowledge of Luau.

From my knowledge, Roblox distributes actors across all available processor cores. My computer has 16 cores, so Roblox can distribute actors across 32 threads in total.

After a bit more testing I found that my neural network is not as slow as I thought; doing a few hundred calculations per second is entirely doable.

I’m looking into faster, approximate alternatives for the math.tanh and sigmoid functions, since those are likely the slowest.
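
For example, the well-known softsign approximation gives tanh- and sigmoid-shaped curves with one division and no exp(). A sketch (the shapes are close but not identical, so trained networks may need retraining):

local abs = math.abs

-- Roughly tanh-shaped; saturates more slowly than the real tanh.
@native
local function fast_tanh(x : number) : number
	return x / (1 + abs(x))
end

-- Sigmoid-shaped: rescale the softsign from (-1 .. 1) to (0 .. 1).
@native
local function fast_sigmoid(x : number) : number
	return 0.5 + 0.5 * (x / (1 + abs(x)))
end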

My module is pretty much ready for use now and I’ve published it, though I’ll leave this thread open in case people can help optimize what I currently have. Oh, and there might still be some bugs in it, of course.

I’m considering opening a feedback post on the forums where people can help test and try out this module and suggest potential features or improvements to it.

So far it can already do these things:

  • Accept input and return an output array.

  • Mutate its entire structure, including the number of layers and the number of nodes per layer.

  • JSON encode/decode the neural network so you can copy/paste/save it (untested; see the sketch after this list).

  • Multi-threaded by default: every neural network has its own actor instance, and communication happens through bindable events and functions.

  • Both client and server compatible (untested, but it should work flawlessly in theory since the code is identical on both ends).
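
Since the JSON part is untested, here is roughly what the round-trip would look like, assuming the network is plain data (numbers and tables only, with activation functions stored as indices, which is how func_list is referenced):

local HttpService = game:GetService("HttpService")

-- Serialize a network to a string you can store or copy.
local function save_net(net) : string
	return HttpService:JSONEncode(net)
end

-- Rebuild a network from a saved string.
local function load_net(json : string)
	return HttpService:JSONDecode(json)
end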

#help-and-feedback:code-review

Because you can’t efficiently batch on Roblox, any machine learning model you attempt to run directly in Roblox will end up being slow. The ideal solution would be for Roblox to support something like WebSockets or the HTTP CONNECT method, allowing a live relay between your NPCs and a model hosted on your own server that feeds them information on what to do.

With all that said, a neural network is not at all necessary to simulate fake players, though I understand the desire to make these bots indistinguishable from a normal player.

Fake players are honestly just an example use case.
These neural networks are intended to be usable on both the client and the server in Roblox, for any purpose really.

It’s a general-purpose self-modifying neural network for making experimental games and NPCs.

Yes, any decent programmer could manually code most AI behavior.
The point of this, however, is to add unpredictability and adaptability that would otherwise be extremely hard to code manually.

This neural network doesn’t even need to be used 30 times a second.
You could also have things like movement, attacking, etc. pre-coded into an AI and simply use the neural network for decision making (e.g. which target to attack first, or when to reload a weapon).
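
For instance, a hypothetical decision step on top of the output array could just pick the strongest output node (the action names here are made up):

local ACTIONS = {"attack", "retreat", "reload"} -- Hypothetical action set.

local function decide(outputs : {number}) : string
	local bestIndex, bestValue = 1, -math.huge
	for i, v in outputs do
		if v > bestValue then
			bestIndex, bestValue = i, v
		end
	end
	return ACTIONS[bestIndex]
end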

It’s all general-purpose; its complexity changes during mutation.
You don’t need much knowledge of neural networks to train it, because it will find its ideal structure.

I’m trying to design this thing to be as lightweight as possible so it can run relatively fast on, for instance, a dual-core CPU.

Theoretically it SHOULD be fast, because most of these neural networks have no more than 4 - 5 layers and 6 - 8 nodes per layer, which is normally quick to run.

Some of this might also be a Luau limitation: it is slightly slower than C++, and there’s some potential overhead from the garbage collector (I’ve tried to minimize this by not creating objects or tables during forwarding).

I want to prove that neural networks in Roblox are entirely possible and COULD be worth it if used right.

I know at least that running 10 - 20 neural networks simultaneously on an old laptop is possible and doesn’t have a huge performance impact, if you spread the forward passes out over a whole second instead of trying to do everything in a single frame.
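
Spreading the work out can be as simple as a round-robin over all the networks, something like this sketch (bots, the 20 Hz target, and the input-filling step are assumptions; nnmod as in the test script):

local RunService = game:GetService("RunService")

local bots = {} -- Each entry: a neural net whose inputs are filled elsewhere.
local cursor = 1

RunService.Heartbeat:Connect(function()
	if #bots == 0 then return end

	-- At 60 fps, forwarding a third of the bots per frame gives each
	-- bot roughly 20 forward passes per second.
	local slice = math.ceil(#bots / 3)

	for _ = 1, slice do
		nnmod.forward(bots[cursor])
		cursor = (cursor % #bots) + 1
	end
end)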

My approach to making neural networks on Roblox has been to train them offsite and import the weights to reconstruct the model. Any limitation by Luau shouldn’t be a problem.

What I would instead think about is what your end goal for this is. Seamless integration would require sub-millisecond compute times for multiple actions and a prediction accuracy high enough that it wouldn’t seem off to the average player. As far as the projects I have seen, this just hasn’t been done before.

In terms of collecting data to train (and test) a model like this, I would record key inputs from preexisting players and use those as your training set, applying modern techniques like flash attention to connect future bot actions to their action history. This is a really big undertaking for just one person (the data collection, cleaning, and interpretation alone), so I would probably recommend a different use case: chatbots.

One of the biggest hurdles in the Roblox game space is user retention. Figuring out a way to integrate neural networks into an existing game as a sort of new-player guide would be a lot more influential, I believe. You would have a better time using transfer learning here, as you could take preexisting models that are already state of the art, import their weights and rebuild them, alongside some post-training based on the specific experience. I’ve seen some examples that use external servers to tackle this, but you can 100% do this within Roblox.

I’m a little tired, so please let me know if any of this doesn’t make sense and I’ll try to explain it better.

Chatbots require way too much complexity to be efficient here.

My neural network is actually based on NEAT and also Unity/Unreal Engine plugins that allow you to use reinforcement learning to train AI for games.

You could theoretically train a chatbot with this but it would be too resource-intensive and take way too long because it’s generation-based.

I developed this little project so you can make game AI that learns parkour, sword fighting, shooting, or other simple tasks and adapts to the players in a server.

It’s a game AI, nothing more or less than that really.

Neural networks can potentially also be used for procedural level generation where a neural network decides how the level is built and structured but that would of course require a dataset and a lot of manual training.

Teaching an AI to sword fight or parkour is actually not that hard; it can be done by spawning small batches of NPCs and picking the best one from each generation.
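
That generation loop, sketched with hypothetical helpers (evaluate, clone and mutate stand in for whatever the module actually exposes):

local POPULATION = 16
local GENERATIONS = 100

local population = {}
for i = 1, POPULATION do
	population[i] = nnmod.new_neuralnet(6, {8, 8, 8, 6})
end

for gen = 1, GENERATIONS do
	-- Run every bot through the task and score it.
	local bestNet, bestScore = nil, -math.huge
	for _, net in population do
		local score = evaluate(net) -- Hypothetical: run the NPC, measure fitness.
		if score > bestScore then
			bestNet, bestScore = net, score
		end
	end

	-- Next generation: mutated copies of the winner.
	for i = 1, POPULATION do
		population[i] = mutate(clone(bestNet)) -- Hypothetical helpers.
	end
end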