Optimizing Thread Performance

Hello, I am currently creating a parallel projectile solver that uses around 8 VMThreads. However, even with that many threads, large framerate stutters occur when solving 10,000 or more projectiles. The current algorithm is shown below. It uses spatial hashing to assign each projectile to a worker. I have temporarily placed the hashing step before the worker threads run, to see whether that would help performance. I suspect the hashing is causing most of the lag, but I believe the task.spawn loop is also contributing, because the scheduler still has to run a loop for each worker thread concurrently.

Any tips for improving the performance?

As a side note, each worker calculates BOTH the collision check and the physics step.
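The spatial-hashing step described above comes down to mapping each projectile's grid cell to a worker index. The following is a minimal plain-Lua sketch under stated assumptions: `GRID_UNIFORM`, `NUM_WORKERS`, and the three primes are illustrative values, and `workerSlot`/`bucketBySlot` are hypothetical helpers. It also swaps the post's `bit32.bxor` mixing for additive mixing so it runs on any Lua; that substitution is a choice made here, not the author's method. The slot is computed once per projectile and reused for both the bucket and the load count.

```lua
-- Hypothetical values for illustration; the post does not state GRID_UNIFORM.
local GRID_UNIFORM = 16
local NUM_WORKERS = 8

-- Assumed mixing primes (a common choice for 3D spatial hashing).
local P1, P2, P3 = 73856093, 19349663, 83492791

-- Map a world position to a worker slot in 1..NUM_WORKERS.
-- Positions in the same grid cell always map to the same slot.
local function workerSlot(px, py, pz)
	local cx = math.floor(px / GRID_UNIFORM)
	local cy = math.floor(py / GRID_UNIFORM)
	local cz = math.floor(pz / GRID_UNIFORM)
	-- Additive mixing instead of bit32.bxor; Lua's % is floored,
	-- so the result is non-negative even for negative cell coordinates.
	local h = cx * P1 + cy * P2 + cz * P3
	return h % NUM_WORKERS + 1
end

-- Group projectile ids by worker slot and count the load per slot,
-- computing each projectile's slot exactly once.
local function bucketBySlot(positions)
	local buckets, loads = {}, {}
	for w = 1, NUM_WORKERS do
		buckets[w] = {}
		loads[w] = 0
	end
	for id, pos in ipairs(positions) do
		local slot = workerSlot(pos[1], pos[2], pos[3])
		local bucket = buckets[slot]
		bucket[#bucket + 1] = id
		loads[slot] = loads[slot] + 1
	end
	return buckets, loads
end
```

Because every projectile in a cell lands on the same worker, a dense cluster can still overload one slot; the per-slot `loads` table makes that imbalance easy to inspect.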

```lua
local RunService = game:GetService("RunService")

WORKER_LOAD = {}

RunService.Stepped:Connect(function()
	debug.profilebegin("hashingProjectiles")
	for _, v in projectiles do -- Fairly optimized, but how do I optimize this further?
		if v.collided() then continue end

		local data = v.data() -- Returns a reference, not a copy

		local x = math.floor(data.position.X / GRID_UNIFORM)
		local y = math.floor(data.position.Y / GRID_UNIFORM)
		local z = math.floor(data.position.Z / GRID_UNIFORM)
		-- bit32 results are always in [0, 2^32 - 1], so math.abs is unnecessary
		local h = bit32.bxor(x * 92837111, y * 689287499, z * 283923481)
		local slot = (h % NUM_WORKERS) + 1 -- Compute the worker index once

		WORKERS[slot]:ScheduleWork(data.position, data.velocity, data.id)
		WORKER_LOAD[slot] = (WORKER_LOAD[slot] or 0) + 1
	end
	debug.profileend()
end)

task.wait(2)
RunService.Heartbeat:Connect(function() -- TODO: implement a fixed time step.
	debug.profilebegin("workerSpawner")
	for _, worker in WORKERS do
		task.spawn(function()
			local work = worker:Work()
			if work == nil then return end
			for _, v in work do
				local projectile = reverse_search[v[3]] -- Look up once, not four times
				if projectile == nil then continue end -- Already removed
				projectile.updatePos(v[1], v[2])
				projectile.updateCollision(v[4])
				projectile.update()
				if projectile.collided() then -- This can be optimized 100%
					--table.remove(projectiles, table.find(projectiles, projectile)) -- Not performant
					reverse_search[v[3]] = nil
				end
			end
		end)
	end

	WORKER_LOAD = {}
	debug.profileend()
end)
```
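The commented-out `table.remove(projectiles, table.find(projectiles, ...))` line is slow because `table.find` scans the array and `table.remove` shifts every later element, so each removal costs O(n). A common alternative is swap-and-pop: keep an id-to-index map, move the last element into the hole, and drop the tail. Below is a minimal sketch assuming each projectile record is a table with a unique `id` field; `ProjectileList` and its methods are hypothetical names, not part of the author's module.

```lua
-- Dense array of projectiles plus an id -> index map (assumed record layout).
local ProjectileList = {}
ProjectileList.__index = ProjectileList

function ProjectileList.new()
	return setmetatable({ items = {}, indexOf = {} }, ProjectileList)
end

function ProjectileList:add(p)
	local n = #self.items + 1
	self.items[n] = p
	self.indexOf[p.id] = n
end

-- O(1) removal: overwrite the removed slot with the last element, then pop the tail.
-- Iteration order is not preserved, but nothing is shifted or searched.
function ProjectileList:remove(id)
	local i = self.indexOf[id]
	if i == nil then return false end -- Unknown or already removed
	local items = self.items
	local last = #items
	local tail = items[last]
	items[i] = tail
	self.indexOf[tail.id] = i
	items[last] = nil
	self.indexOf[id] = nil -- Also correct when i == last (tail is the removed item)
	return true
end
```

For example, after adding ids 1 to 5 and removing id 2, the array holds ids 1, 5, 3, 4, and the index map is updated so that id 5 now points to slot 2.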


Honestly, I think the Heartbeat connection above (the "workerSpawner" part) is what's causing the lag.

I fixed this after a while. It was a combination of problems: the module had an infinite loop that would randomly start when no work was provided, and the spatial-hashing algorithm I was using wasn't needed, so I now just distribute all the projectiles evenly across the workers. I also split each process into stages across PreAnimation and PreSimulation to reduce the overall load. I can now simulate around 30k projectiles on the server at a decent framerate.
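Distributing projectiles evenly can be as simple as giving each worker a contiguous slice of the projectile array whose sizes differ by at most one. The following is a minimal sketch of that idea; `evenRanges` is a hypothetical helper, and it only computes index ranges. How each range is handed to a worker is left out, and nothing here is meant to reproduce the author's actual module.

```lua
-- Split `count` items into `numWorkers` contiguous ranges whose sizes differ by at most one.
local function evenRanges(count, numWorkers)
	local ranges = {}
	local base = math.floor(count / numWorkers)
	local extra = count % numWorkers -- The first `extra` workers take one more item
	local start = 1
	for w = 1, numWorkers do
		local size = base + (w <= extra and 1 or 0)
		-- An empty range has last == first - 1, so `for i = first, last` does nothing
		ranges[w] = { first = start, last = start + size - 1 }
		start = start + size
	end
	return ranges
end
```

For example, `evenRanges(10, 4)` yields the ranges 1..3, 4..6, 7..8, and 9..10. Contiguous slices keep each worker's reads disjoint, and a worker with an empty range has nothing to loop over, which also avoids special-casing the "no work" path.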

No matter how efficient you make your game, the client will always need a powerful enough PC to render it. You won't get around performance issues by optimizing code when you're simulating thousands of moving pieces in a physics simulation, because most people play on mobile devices or laptops.


I agree with what you said. I ended up completely revamping my logic so that the actors are completely independent of the main thread. I'd say I got at least a 10x performance increase out of it. But again, I agree with you: most devices won't even be able to take advantage of it.