Parallel Execution Raycasting

MysteriousVagabond · February 13, 2023, 3:16am

I know I’m not the first to ask this, and I’ve even used a comparable module to great effect in other projects, but I’m struggling to make an effective module for performing batches of raycasts in parallel execution.

Below is what I have so far. It takes in arrays of origins and directions and batches them to the various actors, delivering each batch via a BindableEvent. The actors process the raycasts and fire back sparse arrays of the results along separate BindableEvents, which then get combined into a “results” table and delivered back to calling code.

The primary module:

--!strict

--//ParallelRaycastUtility//
--Allows for the execution of raycasts to be performed in parallel execution.

local BATCH_SIZE = 100
local ADJUSTED_BATCH_SIZE = BATCH_SIZE - 1

local IS_SERVER = game:GetService("RunService"):IsServer()

------------------------------------------------------------------------------------------------------------------------------------
--//ACTOR CONSTRUCTION//------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------

local ACTOR_COUNT: number = script:GetAttribute("ActorCount")
if type(ACTOR_COUNT) ~= "number" or ACTOR_COUNT < 1 then
	ACTOR_COUNT = 64
end

local input_events = table.create(ACTOR_COUNT)
local output_events = table.create(ACTOR_COUNT)

local actor_directory = Instance.new("Folder")
actor_directory.Name = "RaycastActors"

--TODO: Create separate client/server objects.
local actor_base = script:WaitForChild("Actor")

for index = 1, ACTOR_COUNT do
	local actor = actor_base:Clone()
	input_events[index] = actor.Input
	output_events[index] = actor.Output.Event
	
	actor.Handler.Enabled = true
	actor.Parent = actor_directory
end

actor_directory.Parent = IS_SERVER and game:GetService("ServerScriptService") or game:GetService("ReplicatedFirst")

------------------------------------------------------------------------------------------------------------------------------------
--//RUNNER METHODS//----------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------

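local actor_index = 1
local next_request_id = 0

local function raycast_same_params(origins: {Vector3}, directions: {Vector3}, params: RaycastParams?): {RaycastResult?}
	local raycast_count = #directions
	assert(#origins == raycast_count, "origins and directions must have the same length")

	--workspace:Raycast returns nil on a miss, so the results array may be sparse.
	local results: {RaycastResult?} = table.create(raycast_count)
	local result_count = 0

	local current_thread = coroutine.running()

	--Maps each request id issued by this call to the index of the first ray in its batch.
	local pending: {[number]: number} = {}
	--At most one connection per actor per call, keyed by actor index.
	local connections: {[number]: RBXScriptConnection} = {}

	local function process_request(request_id: number, raycast_results: {RaycastResult?}): ()
		local start_index = pending[request_id]
		if start_index == nil then
			--This result belongs to another call that shares the same actor.
			return
		end
		pending[request_id] = nil

		local count = math.min(BATCH_SIZE, raycast_count - (start_index - 1))
		table.move(raycast_results, 1, count, start_index, results)

		result_count += count
		if result_count == raycast_count then
			task.spawn(current_thread)
		end
	end

	local working_origins = table.create(BATCH_SIZE)
	local working_directions = table.create(BATCH_SIZE)
	local index = 1

	while index <= raycast_count do
		local end_index = math.min(index + ADJUSTED_BATCH_SIZE, raycast_count)

		table.move(origins, index, end_index, 1, working_origins)
		table.move(directions, index, end_index, 1, working_directions)

		next_request_id += 1
		pending[next_request_id] = index

		--Stacking Once listeners on one actor would run all of them on its first result,
		--so connect a single listener per actor and match results by request id instead.
		if connections[actor_index] == nil then
			connections[actor_index] = output_events[actor_index]:Connect(process_request)
		end
		input_events[actor_index]:Fire(next_request_id, working_origins, working_directions, params)

		--Fire copies table arguments, so the working buffers can be reused.
		table.clear(working_origins)
		table.clear(working_directions)

		actor_index = actor_index % ACTOR_COUNT + 1
		index += BATCH_SIZE
	end

	if result_count < raycast_count then
		coroutine.yield()
	end

	for _, connection in connections do
		connection:Disconnect()
	end

	return results
end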

------------------------------------------------------------------------------------------------------------------------------------
--//EXPOSED FUNCTIONS//-------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------

local exposed_functions = table.freeze({
	raycast_same_params = raycast_same_params
})

return exposed_functions

The actors’ handler:

--!strict

--//Handler//
--Processes raycast requests and returns results.

local input = script.Parent.Input
local output = script.Parent.Output

input.Event:ConnectParallel(function(request_index: number, origins: {Vector3}, directions: {Vector3}, params: RaycastParams?): ()
	local results = table.create(#origins)
	for index, origin in pairs(origins) do
		results[index] = workspace:Raycast(origin, directions[index], params)
	end
	output:Fire(request_index, results)
end)

The system works, and it can speed up requests, but with some pretty big caveats. When testing on an empty baseplate with randomly generated vectors (max magnitude of 10), this module only starts to beat plain serial raycasting at about 5,000 raycasts. I guess that's a win, but that's usually more raycasts than I'd ever need to make at once.
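
For anyone who wants to reproduce the comparison, here is a rough sketch of the kind of benchmark described above. It is not the exact test script: the require path, the one-second startup wait, and the 5,000-ray count are assumptions made for illustration.

```lua
--Benchmark sketch. The require path is hypothetical; adjust it to wherever the module lives.
local ParallelRaycastUtility = require(script.Parent.ParallelRaycastUtility)

--Assumption: give the cloned actor scripts a moment to start and connect their listeners.
task.wait(1)

local RAY_COUNT = 5000
local MAX_MAGNITUDE = 10

--Random origins and directions with a magnitude of at most MAX_MAGNITUDE.
local rng = Random.new()
local origins = table.create(RAY_COUNT)
local directions = table.create(RAY_COUNT)
for index = 1, RAY_COUNT do
	origins[index] = rng:NextUnitVector() * rng:NextNumber(0, MAX_MAGNITUDE)
	directions[index] = rng:NextUnitVector() * rng:NextNumber(0, MAX_MAGNITUDE)
end

local params = RaycastParams.new()

--Serial baseline: every raycast runs on the calling thread.
local serial_start = os.clock()
for index = 1, RAY_COUNT do
	workspace:Raycast(origins[index], directions[index], params)
end
local serial_time = os.clock() - serial_start

--Parallel version: the measured time includes the yield while waiting for the parallel phase.
local parallel_start = os.clock()
ParallelRaycastUtility.raycast_same_params(origins, directions, params)
local parallel_time = os.clock() - parallel_start

print(string.format("serial: %.3f ms, parallel: %.3f ms", serial_time * 1000, parallel_time * 1000))
```

Because the parallel timing wraps the whole call, it measures what calling code actually experiences, not just the time the actors spend raycasting.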

The primary issue is that the system waits a full frame between firing the original BindableEvents and actually beginning to process the requests in parallel (regardless of signal behavior). In terms of pure execution time, the parallel version is nearly three times faster, but I'm essentially trying to make an asynchronous function, so that yield time is indistinguishable from execution time as far as calling code is concerned.

Another issue is that I have my batch size set to 100, which is fairly high, but I guess finding the right balance between breaking things up to make better use of actors and reducing overhead will be a matter of trial and error.
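
To make the batch-size trade-off concrete, here is a small helper (hypothetical, not part of the module) that works out how a request is spread across actors under the round-robin scheme above. It only does the arithmetic; it says nothing about the per-batch overhead, which still has to be measured.

```lua
--Hypothetical helper: how many batches a request produces and how they spread over the actors.
local function batch_layout(raycast_count, batch_size, actor_count)
	local batch_count = math.ceil(raycast_count / batch_size)
	--Round-robin assignment: no more actors can be busy than there are batches.
	local busy_actors = math.min(batch_count, actor_count)
	--The most heavily loaded actor receives this many batches.
	local max_batches_per_actor = math.ceil(batch_count / actor_count)
	return batch_count, busy_actors, max_batches_per_actor
end

--5,000 rays at 100 per batch with 64 actors: 50 batches, 50 busy actors, 1 batch each.
print(string.format("%d batches, %d busy actors, up to %d per actor", batch_layout(5000, 100, 64)))

--The same request at 25 per batch: 200 batches, all 64 actors busy, up to 4 batches each.
print(string.format("%d batches, %d busy actors, up to %d per actor", batch_layout(5000, 25, 64)))
```

With the defaults, a 5,000-ray request leaves 14 of the 64 actors idle, so a smaller batch size would use more actors at the cost of more BindableEvent traffic.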

Am I missing some sort of trick to deliver my data more efficiently, or am I completely misunderstanding the point of parallel Luau?

MysteriousVagabond · February 15, 2023, 6:56pm

I wanted to come back to this and share some of what I've learned since, in case it helps others who are having problems with this.

I did not realize initially that parallel execution happens at a specific point within a frame, and that point seems to be between garbage collection and simulation processing (I couldn't find documentation to verify this, but that's what appears to be happening according to the microprofiler).

This means that requests made during input processing/rendering are processed in the same frame they were made, which lets the system actually speed up raycasting. This is helpful on the client, but it doesn't do anything to rectify the situation for server raycasting, which is where all my original testing was happening and why I was having so many problems.
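
As a rough illustration of the client-side case, the sketch below issues a request from a render-step handler and counts how many frames pass before the results come back. It is based only on the microprofiler observation above, not on documented behavior; the require path, ray count, and ray layout are hypothetical.

```lua
--LocalScript sketch. The require path and the ray setup are hypothetical.
local RunService = game:GetService("RunService")
local ParallelRaycastUtility = require(script.Parent.ParallelRaycastUtility)

local RAY_COUNT = 1000
local ORIGIN = Vector3.new(0, 50, 0)

local rng = Random.new()
local params = RaycastParams.new()

local frame_id = 0
local busy = false

RunService.PreRender:Connect(function()
	frame_id += 1

	--Skip this frame if the previous request has not returned yet.
	if busy then
		return
	end
	busy = true

	local origins = table.create(RAY_COUNT, ORIGIN)
	local directions = table.create(RAY_COUNT)
	for index = 1, RAY_COUNT do
		directions[index] = rng:NextUnitVector() * 100
	end

	local request_frame = frame_id
	local results = ParallelRaycastUtility.raycast_same_params(origins, directions, params)

	--If the request was processed in the same frame, no further PreRender has fired.
	local frames_waited = frame_id - request_frame
	if frames_waited > 0 then
		warn(string.format("results arrived %d frame(s) late", frames_waited))
	end

	busy = false
end)
```

If the warning never fires, the request is completing within the frame it was made in, which matches what the microprofiler showed on the client.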

If anyone has a solution for the server-side issue, please let me know.