Super Optimised Pixel Raytracer with Textures! (120+ FPS!)

Hey all. This is something I have wanted to make for a long time: a pixelated view effect that is raytraced in real time. I managed to get it from 18 FPS to well over 120 FPS (and if I cap the rendering at 60 FPS, my machine can even get over 250 FPS!).

Have a look!

Comparing to a typical raytracer:

Typical raytracer, 1 raycast per pixel, single-threaded (benchmark) | 10-18 FPS at 100x100

My raytracer, raycasts for 20-30% of pixels per frame, multi-threaded | 110-133 FPS at 100x100


Features:

As far as raytracer features go, this has full texturing and transparency support, all of which runs very nicely.
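Transparency in a raytracer is often handled by letting the ray continue through semi-transparent surfaces and blending their colours front to back. The sketch below shows only that blending step. It is not taken from the author's code. It assumes the hits along one ray have already been collected nearest-first as tables with hypothetical `r`, `g`, `b` (0-1) and `alpha` fields.

```lua
-- Front-to-back compositing of the surfaces a single ray passes through.
-- `hits` is a hypothetical list ordered nearest-first; each hit has
-- colour channels r, g, b in [0, 1] and an opacity `alpha` in [0, 1].
local function compositeHits(hits, background)
	local r, g, b = 0, 0, 0
	local transmittance = 1 -- fraction of light still passing through

	for _, hit in ipairs(hits) do
		local weight = transmittance * hit.alpha
		r = r + hit.r * weight
		g = g + hit.g * weight
		b = b + hit.b * weight
		transmittance = transmittance * (1 - hit.alpha)

		-- Stop early once the ray is effectively blocked
		if transmittance < 0.001 then
			break
		end
	end

	-- Any light left over comes from the background/sky colour
	r = r + background.r * transmittance
	g = g + background.g * transmittance
	b = b + background.b * transmittance
	return r, g, b
end
```

Because this runs once per ray, the early exit means a fully opaque wall costs no more than a single hit.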

The textures implemented so far are Brick, Wood, Grass and Corroded Metal:

Since the renderer uses interlacing, there is also motion blur to hide the interlace slices when moving:
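One simple way to get this kind of blur is to blend each newly traced colour into the colour already on screen instead of overwriting it. This is an assumption, not something the post confirms. Rows skipped by the interlacing keep their old value, and freshly traced rows ease toward the new one, which softens the seams between slices. `blendFactor` is a hypothetical tuning parameter.

```lua
-- Exponential-moving-average blend used as a cheap motion blur.
-- blendFactor = 1 overwrites (no blur); smaller values blur more.
local function blendColour(previous, traced, blendFactor)
	return {
		r = previous.r + (traced.r - previous.r) * blendFactor,
		g = previous.g + (traced.g - previous.g) * blendFactor,
		b = previous.b + (traced.b - previous.b) * blendFactor,
	}
end
```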


With all these rendering tricks and optimisations combined, you can get some quite impressive results.

A 200x200 shot with all optimisation options enabled, running at 30+ FPS:

I took advantage of interlacing, native Luau code generation, multi-threading, and pixel sleeping for pixels whose colour stays the same to achieve the high framerate.
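Here is a minimal scheduler sketch in plain Lua (also valid Luau) that makes the interlacing and pixel-sleeping ideas more concrete. It is not the author's code, and it rests on two assumptions. First, interlacing is taken to mean tracing every `interlaceStep`-th row per frame while cycling the starting row. Second, pixel sleeping is taken to mean that a pixel whose traced colour comes back unchanged `sleepThreshold` times in a row is skipped for the next `sleepFrames` frames. Colours are compared with `==`, so they are assumed to be packed values such as integers. All names and parameters are hypothetical.

```lua
-- Hypothetical frame scheduler: decides which pixels get a raycast this frame.
local Scheduler = {}
Scheduler.__index = Scheduler

function Scheduler.new(width, height, interlaceStep, sleepThreshold, sleepFrames)
	local self = setmetatable({}, Scheduler)
	self.width, self.height = width, height
	self.interlaceStep = interlaceStep   -- trace 1 of every N rows per frame
	self.sleepThreshold = sleepThreshold -- unchanged results before sleeping
	self.sleepFrames = sleepFrames       -- frames a sleeping pixel is skipped
	self.frame = 0
	self.lastColour = {} -- [index] = last traced colour
	self.sameCount = {}  -- [index] = consecutive unchanged results
	self.sleepUntil = {} -- [index] = frame number when the pixel wakes up
	return self
end

-- Returns pixel indices (1-based, row-major) to raycast this frame.
function Scheduler:pixelsToTrace()
	self.frame = self.frame + 1
	local offset = self.frame % self.interlaceStep
	local list = {}
	for y = offset, self.height - 1, self.interlaceStep do
		for x = 0, self.width - 1 do
			local index = y * self.width + x + 1
			if (self.sleepUntil[index] or 0) <= self.frame then
				list[#list + 1] = index
			end
		end
	end
	return list
end

-- Records a traced colour; returns true if the canvas needs a redraw.
function Scheduler:report(index, colour)
	if self.lastColour[index] == colour then
		self.sameCount[index] = (self.sameCount[index] or 0) + 1
		if self.sameCount[index] >= self.sleepThreshold then
			self.sleepUntil[index] = self.frame + self.sleepFrames
			self.sameCount[index] = 0
		end
		return false -- colour unchanged, so the draw call can be skipped
	end
	self.lastColour[index] = colour
	self.sameCount[index] = 0
	return true
end
```

With `interlaceStep = 4`, at most 25% of the pixels are traced in any frame, and sleeping pixels push that lower. That is in the same range as the 20-30% of raycasts per frame quoted above.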


Benchmarking - Average FPS:

Low-end Office PC - 47 FPS
  • DESCRIPTION: Originally an office computer. Memory was recently upgraded from 8 GB to 16 GB. The computer is a 2011-2012 Dell model.

  • CPU: Intel Core i7-2600 (4-core)

  • Memory: 16 GB

    • 100x100 Interlaced:
      25 - 28 FPS (Average: 27)

    • 100x100 Interlaced + Pixel Sleeping:
      31 - 52 FPS (Average: 35)

    • 100x100 Interlaced + Pixel Sleeping (No Textures):
      40 - 61 FPS (Average: 47)

Mid-end Laptop - 57 FPS
  • (Specs not available)

    • 100x100 Interlaced:
      24 - 30 FPS (Average: 26)

    • 100x100 Interlaced + Pixel Sleeping:
      25 - 50 FPS (Average: 35)

    • 100x100 Interlaced + Pixel Sleeping (No Textures):
      45 - 61 FPS (Average: 57)

High-end Gaming PC - 130 FPS
  • DESCRIPTION: My personal computer. I bought this machine early this year.

  • CPU: AMD Ryzen 7 5700X (8-core)

  • Memory: 32 GB

    • 100x100 Interlaced:
      70 - 80 FPS (Average: 75)

    • 100x100 Interlaced + Pixel Sleeping:
      72 - 130 FPS (Average: 90)

    • 100x100 Interlaced + Pixel Sleeping (No Textures):
      112 - 157 FPS (Average: 130)


Future plans:

Until the EditableImage beta is out, I will keep developing this renderer and optimising it as best I can.

Once EditableImage releases, and depending on the end result, this will end up in a game of mine, be sold as an asset for other developers to use, or both.


Nice job! Looks amazing! What are you planning to use it for? I’m pretty curious.


At the moment, I’m not planning to use it at all. I just plan to get this to look good and run at 60 FPS.

Depending on how this turns out, I might even open source it for others.


Nice. That’s a good item. You should sell it if you do it well.


Have you multithreaded it? If you haven’t already, doing so will give you even more performance.


I have no idea how to multithread, or how to do it effectively.


Well, essentially, if you have four cores, you’d get roughly 2-4x the performance (not all cores are equal, and it depends on other factors). Oftentimes we have more cores than that.

So it would be worth it.
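The usual explanation for why four cores give "roughly 2-4x" rather than exactly 4x is Amdahl's law. Only the parallel fraction `p` of each frame gets faster; the serial part (for example, drawing to the canvas on the main thread) does not. The helper below is added for illustration and is not from the thread.

```lua
-- Amdahl's law: ideal speedup on n workers when a fraction p of the
-- work is parallelisable and the remaining (1 - p) stays serial.
local function amdahlSpeedup(p, n)
	return 1 / ((1 - p) + p / n)
end

-- The same formula solved for p, given a measured speedup s on n workers.
local function parallelFraction(s, n)
	return (1 - 1 / s) / (1 - 1 / n)
end
```

For example, if 90% of a frame is parallel, four workers give at most 1 / (0.1 + 0.9 / 4) ≈ 3.1x.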

But essentially, in Roblox we use an actor system. What you want to do is use multiple actors to raycast. If you had 4 cores, you could divide the screen into 4 sections and have each actor raycast only its own section in desynchronized (parallel) mode. Alternatively, you could have each actor do every Nth pixel, etc. It’s best to use a lot of actors, though, so it scales well on computers with many cores.
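The two ways of splitting the screen described above can be sketched as a pure function that lists the rows a given worker owns. In a real Roblox setup, each actor would receive its `workerId` in a message. Here the function is plain Lua so it can be tested on its own, and its name and parameters are hypothetical.

```lua
-- Rows (0-based) that worker `workerId` (1..workerCount) should raycast.
-- "bands":       contiguous blocks, e.g. 100 rows / 10 workers = 10 rows each
-- "interleaved": every workerCount-th row, starting at row workerId - 1
local function rowsForWorker(workerId, workerCount, height, mode)
	local rows = {}
	if mode == "bands" then
		local first = math.floor((workerId - 1) * height / workerCount)
		local last = math.floor(workerId * height / workerCount) - 1
		for y = first, last do
			rows[#rows + 1] = y
		end
	elseif mode == "interleaved" then
		for y = workerId - 1, height - 1, workerCount do
			rows[#rows + 1] = y
		end
	else
		error("unknown mode: " .. tostring(mode))
	end
	return rows
end
```

Bands keep each actor's pixels contiguous. Interleaved rows tend to balance the load better when one part of the screen, such as a heavily textured wall, is much more expensive to trace than the rest.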

Btw, what you’re doing is called raytracing, not pixelating, in case you weren’t aware. Other people, myself included, have messed around with this before, so there should be some resources on the DevForum that are either uncopylocked or explain how to do multithreaded raytracing.

Here is a staff post (warning: it’s old, but it is open source) from when Parallel Luau came out in 2020.

The biggest performance improvement possible is multithreading.

There are some other techniques that can optimize it (btw, I like your idea of using interlacing and pixel sleeping, that was smart!), but multithreading is the holy grail here.


I did some research and I’m currently trying to implement it, and it’s the most painful thing I’ve ever had to do. I’m having so many issues. And why do I have to spread it across multiple scripts and actors?

This might take a while…


OMG I FIGURED IT OUT!

It took over a year before I got any kind of multithreading working in any of my pixel-based projects, but thanks to this documentation, which shows how to clone one script across actors to reuse the same code with multi-threading, I DID IT!

We now have multi-threading on this bad boy!


Here are some tests at 200x200

BEFORE MULTI-THREADING:

Approximately 13 FPS

AFTER MULTI-THREADING (10 actors):

Approximately 31 FPS


That’s an insanely large performance boost: about 2.4x faster!!


Omg, this is amazing, I would LOVE to buy this! Why? Trust me, the stuff people would DO with this is amazing. I love making horror games and this would def be awesome! I’m unsure how to help you, but keep at it! It looks amazing! Can’t wait to see the finished product!


I think it can be faster. What does your code look like? I’m not asking for the Lua code, more how you actually send information between actors and things like that.


I can make LEGO Island with this


The first video looks way better because yours has those weird visual bugs. You should probably fix that before you use or sell it.


The multithreading part was half yoinked from this page in the Roblox documentation.

Here’s their code:

```lua
-- Parallel execution requires the use of actors
-- This script clones itself; the original initiates the process, while the clones act as workers

local actor = script:GetActor()
if actor == nil then
	local workers = {}
	for i = 1, 32 do
		local actor = Instance.new("Actor")
		script:Clone().Parent = actor
		table.insert(workers, actor)
	end

	-- Parent all actors under self
	for _, actor in workers do
		actor.Parent = script
	end

	-- Instruct the actors to generate terrain by sending messages
	-- In this example, actors are chosen randomly
	task.defer(function()
		local rand = Random.new()
		local seed = rand:NextNumber()

		local sz = 10
		for x = -sz, sz do
			for y = -sz, sz do
				for z = -sz, sz do
					workers[rand:NextInteger(1, #workers)]:SendMessage("GenerateChunk", x, y, z, seed)
				end
			end
		end
	end)

	-- Exit from the original script; the rest of the code runs in each actor
	return
end

-- Build a numDim-dimensional nested table with `size` entries per dimension,
-- with every leaf set to elemValue
local function makeNdArray(numDim, size, elemValue)
	if numDim == 0 then
		return elemValue
	end
	local result = {}
	for i = 1, size do
		result[i] = makeNdArray(numDim - 1, size, elemValue)
	end
	return result
end

-- Generate a 4x4x4 grid of materials and occupancy values for one chunk
local function generateVoxelsWithSeed(xd, yd, zd, seed)
	local matEnums = {Enum.Material.CrackedLava, Enum.Material.Basalt, Enum.Material.Asphalt}
	local materials = makeNdArray(3, 4, Enum.Material.CrackedLava)
	local occupancy = makeNdArray(3, 4, 1)

	local rand = Random.new()

	for x = 0, 3 do
		for y = 0, 3 do
			for z = 0, 3 do
				occupancy[x + 1][y + 1][z + 1] = math.noise(xd + 0.25 * x, yd + 0.25 * y, zd + 0.25 * z)
				materials[x + 1][y + 1][z + 1] = matEnums[rand:NextInteger(1, #matEnums)]
			end
		end
	end

	return {materials = materials, occupancy = occupancy}
end

-- Bind the callback to be called in parallel execution context
actor:BindToMessageParallel("GenerateChunk", function(x, y, z, seed)
	local voxels = generateVoxelsWithSeed(x, y, z, seed)
	local corner = Vector3.new(x * 16, y * 16, z * 16)

	-- Currently, WriteVoxels() must be called in the serial phase
	task.synchronize()
	workspace.Terrain:WriteVoxels(
		Region3.new(corner, corner + Vector3.new(16, 16, 16)),
		4,
		voxels.materials,
		voxels.occupancy
	)
end)
```

I basically used the exact same concept. I have 10 actors, each of which renders one tenth of the screen (at 100x100, that’s a 100x10 strip each).

As for the code that runs in parallel: that’s the main raytracing plus the pixel and colour management. A bindable event is then fired for every pixel back to the core script, which draws the pixel to the canvas.


Those artefacts/visuals are part of the reason the FPS is so high. I’m only doing about 30% of the calculations compared to the previous video.


I think you should avoid sending so much data. Can you change the pixels after the parallel phase, without needing to send the data to the main script?
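One way to act on this suggestion is for each actor to send its results once per frame instead of firing an event per pixel. The sketch below packs a worker's pixel colours into a single flat array that can be sent in one message. The packing format (index followed by RGB bytes) and the function names are hypothetical. The author reports in the next reply that removing the per-pixel event barely changed FPS, so the gain may be small.

```lua
-- Pack a worker's results into one flat array:
-- {index, r, g, b, index, r, g, b, ...} with r, g, b as integers 0-255.
local function packResults(results)
	local packed = {}
	for _, pixel in ipairs(results) do
		packed[#packed + 1] = pixel.index
		packed[#packed + 1] = pixel.r
		packed[#packed + 1] = pixel.g
		packed[#packed + 1] = pixel.b
	end
	return packed
end

-- Receiving (main) side: walk the flat array and apply each pixel.
-- `setPixel` is a hypothetical callback that writes one pixel to the canvas.
local function applyPacked(packed, setPixel)
	for i = 1, #packed, 4 do
		setPixel(packed[i], packed[i + 1], packed[i + 2], packed[i + 3])
	end
end
```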


Could buffer serialization help at all?


Unfortunately not. I have found that this is the best method. Also, skipping the bindable event entirely, and not even setting the pixel, barely changes the FPS. It’s about a 1% difference, so that part can’t be improved much.


Next up… Textures!


Buffer serialisation is slower than just reading/writing to an array, as far as I know.
