Multithreaded voxel game framework with LoD system

Hi, it’s me again. I’ve done voxel stuff in the past, but I think this one is big enough to deserve its own topic.

Quick terminologies:

Voxel - A 3-dimensional pixel; usually represented as blocks. Minecraft, for example, is a voxel game.
Multithreading - This project uses Parallel Luau to multithread certain processes to make them faster.
LoD - Level of Detail; a common feature in games that makes faraway objects render at a lower quality to save performance. You may know this in Roblox as RenderFidelity.

LoD system demo video:

Multithreading demo video:
(There are 343 chunks, each 16x16x16 in size, meaning there’s over 1.4 million voxels!)

Basic explanation on how it works

How does it work?

Chunks use several 3-dimensional tables to represent the voxels. For simplicity, I will be calling them tensors, and they are organized in tensor[x][y][z] = value format. The reason why they have multiple tensors is because each tensor will represent each level of LoD (there are 4 levels).

When a chunk is created, all 4 tensors will be filled with booleans to represent the physical world; true means the voxel is occupied, and false means air. A 3D Perlin noise algorithm is used to generate those booleans for each voxel based on the voxel’s position in the world.

--for the voxel located at (1, 2, 3)
tensor[1][2][3] = perlinNoise3D(1, 2, 3) --> true or false

To save performance, instead of loading in every single occupied voxel, we can just only load in the surface voxels. It’s going to be like marching cubes but everything will remain cubical. Surface voxels are just voxels that are next to air, and we will loop through the entire tensor to find them.

Note that marching cubes is not the only optimization method. A better one would be greedymeshing, but for the purpose and use case of this project, I deem it unfeasible. Note that I do have an older voxel project that does use greedymeshing if you’re interested.

Multithreading

The tensor-generating process is all multithreaded. There is an arbitrary number of actors (workers). When you create a chunk, it will first pick the laziest worker (least amount of current tasks) and instruct it to do work via a BindableEvent. It will take time, so the chunk will actually wait until the worker returns back the result.

function newChunk(...)
  ...
  local worker = getLazyActor() --find a lazy worker

  local results
  worker.ReturnSignal:Once(function(data) --listen for the returned results
    results = data
  end)

  worker.InstructSignal:Fire(chunkData) --give them instructions to do work
  while results == nil do task.wait() end --wait until they are done with the homework
  local tensors = results.stuff --continue doing stuff
  ...
end

Data Compression

But, you may ask, isn’t it very expensive to send super-big tables across BindableEvents? You are right, which is why the data is actually being compressed! It uses LZW compression, the same compression system I used in my older voxel project (link already shared above).
Basically, there is a massive lookup table storing every possible voxel position relative to the chunk (A 16x16x16 chunk has exactly 4096 voxels). This allows the voxels to be easily converted between Vector3 and a 3-character string. The tensor is encoded into a string by the worker before returning it, and then on arrival, it gets decoded back into a tensor.

After all the tensors are ready, you can just tell the chunk which one to load in. It will read the entire tensor and create parts based on the coordinates.

for x, y, z, v in tensor:iter() do
  newPart(x, y, z)
end

Below are some (and parts) of the code I used. I put them here to hopefully give any aspiring prospector some ideas if they are doing anything similar to what I made.

Code for the "Tensor" class object, used to store voxels
type class = {
	new: <T>(tensorValue<T>?) -> tensor<T>,
}

export type array<T> = {[number]: T}
export type matrix<T> = {[number]: array<T>}
export type tensorValue<T> = {[number]: matrix<T>}
export type tensor<T> = { --later aliased in scripts to become Tensor<T> with an uppercase
	_ : tensorValue<T>,
	set: (self: tensor<T>, x: number, y: number, z: number, value: T) -> (),
	get: (self: tensor<T>, x: number, y: number, z: number) -> T?,
	iter: (self: tensor<T>) -> () -> (number, number, number, T),
}

local Tensor = {}
Tensor.__index = Tensor

function Tensor.new<T>(tensor: tensorValue<T>?): tensor<T>
	return setmetatable({_ = tensor or {}}, Tensor)::any
end

function Tensor.set<T>(self: tensor<T>, x: number, y: number, z: number, value: T): ()
	local m: matrix<T>? = self._[x]
	if m then
		local a: array<T>? = m[y]
		if a then
			a[z] = value
			return
		end
		m[y] = {[z] = value}
		return
	end	
	self._[x] = {[y] = {[z] = value}}
	return
end

function Tensor.get<T>(self: tensor<T>, x: number, y: number, z: number): T?
	local m: matrix<T>? = self._[x]
	if m then
		local a: array<T>? = m[y]
		if a then
			return a[z]
		end
	end
	return nil
end

function Tensor.iter<T>(self: tensor<T>): () -> (number, number, number, T)
	return coroutine.wrap(function()
		for x, m in self._ do
			for y, a in m do
				for z, v in a do
					coroutine.yield(x, y, z, v)
				end
			end
		end
	end)
end

return (Tensor::any)::class

Use example of the Tensor data structure

local Tensor = require(script.Tensor)
type Tensor<T> = Tensor.tensor<T>

local thing: Tensor<Vector3> = Tensor.new() --generic type accepts anything
thing:set(1, 2, 3, Vector3.one) --set a value
print(thing:get(1, 2, 3)) --read a value
thing:set(1, 2, 4, Vector3.zero)
for x, y, z, value in thing:iter() do --iterate through the tensor
  print(x, y, z, '|', value)
end
Snippet of the voxel/tensor compression system
...

local encode: {[Vector3]: string} = {}
local decode: {[string]: Vector3} = {}

local timeBegin: number = os.clock()
local counter: number = 0
for x = 1, 32 do
	for y = 1, 32 do
		for z = 1, 32 do
			counter += 1
			local vec: Vector3 = Vector3.new(x, y, z)
			local hash: string = base93.tobase93(counter)
			if #hash == 1 then
				hash = [[\\]]..hash
			elseif #hash == 2 then
				hash = [[\]]..hash
			end
			encode[vec] = hash
			decode[hash] = vec
		end
	end
	if x % 4 == 0 then --so u dont crash lol
		task.wait()
	end
end
print(`Time taken to init tensor encoder: {os.clock() - timeBegin}`)

local function encodeBinaryTensor(tensor: Tensor<boolean>): string
	debug.profilebegin [==[encode binary tensor]==]
	
	local hashes: {string} = {}
	for x, y, z, v in tensor:iter() do --this is a coroutine iterator btw
		if v == true then
			table.insert(hashes, encode[Vector3.new(x, y, z)]) --encode is {[Vector3]: string}
		end
	end
	
	debug.profileend()
	return table.concat(hashes) --table.concat is excluded from profiler
end

local function decodeBinaryTensor(code: string): Tensor<boolean>
	debug.profilebegin [==[decode binary tensor]==]
	
	local t: Tensor<boolean> = Tensor.new()
	for i = 1, #code, 3 do --split the long string into triplets (each triplet represents one voxel)
		local vec: Vector3 = decode[string.sub(code, i, i+2)] --decode is {[string]: Vector3}
		t:set(vec.X, vec.Y, vec.Z, true)
	end
	
	debug.profileend()
	return t
end

...
Snippet of the actor scripts
...
instructEvent.Event:ConnectParallel(function(instruction: taskInstruction)
	print(`Processor {id} received a new task`)
	local encodeds: {string} = {}
	
	for k, corner: Vector3 in instruction.corners do
		local voxelDim: number = wCFG.lodVoxelDims[k]
		local chunkDim: number = wCFG.lodChunkDims[k]
		local scalarField: Tensor<boolean> = Tensor.new() --these are random names lol
		local surfaceMap: Tensor<boolean> = Tensor.new()
		
		local function getVoxel(x: number, y: number, z: number): boolean
			local filled: boolean? = scalarField:get(x, y, z)
			if filled == nil then --if the voxel doesn't exist, make one and store it
				filled = perlin.noiseBinary(corner.X+x*voxelDim, corner.Y+y*voxelDim, corner.Z+z*voxelDim)
				scalarField:set(x, y, z, filled::boolean)
			end
			return filled::boolean
		end
		
		local function isSurfaceVoxel(x: number, y: number, z: number): boolean --just check to see if the voxel is next to air (nothing)
			if not getVoxel(x+1, y, z) then return true end
			if not getVoxel(x-1, y, z) then return true end
			if not getVoxel(x, y+1, z) then return true end
			if not getVoxel(x, y-1, z) then return true end
			if not getVoxel(x, y, z+1) then return true end
			if not getVoxel(x, y, z-1) then return true end
			return false
		end
			
		for x = 1, chunkDim do
			for y = 1, chunkDim do
				for z = 1, chunkDim do
					if getVoxel(x, y, z) and isSurfaceVoxel(x, y, z) then
						surfaceMap:set(x, y, z, true)
					end
				end
			end
		end
		encodeds[k] = tensorEncoder.encodeBinaryTensor(surfaceMap)
	end
	
	local _r: taskResult = {
		id = instruction.id,
		encodedTensors = encodeds
	}
	returnEvent:Fire(_r)
end)
...
3D Perlin noise implementation
local seed: number = 123456
local genScale: number = 0.01

local noise = math.noise
local function noiseBinary(x: number, y: number, z: number): boolean
	local trueY: number = y
	y += 28.48675
	x *= genScale; y *= genScale; z *= genScale
	local _x = noise(y - 3.95382, z + 1.26932, seed)
	local _y = noise(x + 7.40134, z - 5.48274, seed)
	local _z = noise(x - 2.11045, y + 4.88563, seed)
	return _x+_y+_z >= .5 --this number should change to make the world more spacious or full
end

local module = {
	noiseBinary = noiseBinary,
}

return module

11 Likes

I was working on my own voxel system but I was only going to implement basic greedy meshing for rendering, being more focused on finding a way to store chunks and compress/serialize them to allow for ultra large terrain and premade worlds. This is really impressive, well done!

1 Like

Wow, this is Great! Only one thing tho, it seems like when the LoD changes, the Voxels become bigger, but their colour is random, I think it would be better if the colour would be the most used colour in h at place, very good tho!

Thanks! The coloring is just for demonstration purposes to tell apart the voxels. In production the voxels will actually become spheres to make the terrain look less blocky too.

After nearly a month of procrastination hardwork, this project is somewhat ready for testing! Here’s the game link for anyone interested:

The ugly truth…

Roblox really DOES NOT like big parts. Huge lagspikes from internal lighting and geometry updates thingamajig. For this reason, the world is a lot smaller than it should be for the sake of not blowing up your computer. As such, the experience will seem underwhelming, but just imagine yourself as an ant!

Also, I’ve decorated the microprofiler a decent amount so you can see exactly what is happening and where the bottlenecks are!

I’ll check this out right now and give you feedback (please don’t take this down because ill edit it)(I’m wandering what happens if I join on phone…)
I was actually able to play on phone and I didn’t have any problems (only a bit of lag while loading) and I tried with 1 thread and 6; threads and there’s no difference, it’s very impressive of how good this is…

I’ve heard others talk about Parallel Luau not working correctly or at all on mobile and console devices (because they tend to use obscure CPUs?). It’s mainly designed for desktops AFAIK.

You can tell if it’s actually multithreading for you by inspecting the microprofiler.

How do you do that on phone?
I don’t have any idea of ho to do it
(I don’t even have any idea of how to do it I’m Pc :rofl::rofl::rofl:)

It would be nice if Roblox provided us with a method to retrieve the number of CPU cores to limit how many threads we use, similar to what Python has in its os module, os.cpu_count().
This way instead of a hardcoded value of threads, we can setup a ratio of task distribution that adjusts based on the number of available cores.

2 Likes