How to optimize ball fill code?

Hello guys. I’m trying to improve roblox terrain editor. While I improved results of shapes, I also degrade a lot of performance (256 stud ball with my plugin drop fps up to 7, while roblox built-in one is stable 59)
Can someone help me, and say, how I can optimize this code?

local function GetShapeAffect(Shape, BrushSize, BrushCFrame, VP_X, VP_Y, VP_Z)
	if Shape == "Ball" then
		local X = VP_X - BrushCFrame.X
		local Y = VP_Y - BrushCFrame.Y
		local Z = VP_Z - BrushCFrame.Z
		local Distance = (X*X + Y*Y + Z*Z)^0.5
		if Distance <= BrushSize - 2 then return 1 end
		local BrushOccupancy = math.clamp((BrushSize - Distance) / 4 + 0.5, 0, 1)
		return if BrushOccupancy >= 0.15 then BrushOccupancy else 0
--	...other shapes, but I want understand what I'm doing wrong with ball for now.
	end
end

--...code which uses this function

if BrushPart.Shape == Enum.PartType.Ball then
	local Radius = ToolSettings[Tool]["Brush"]["Size"][2].X / 2 --Getting brush RADIUS (not diameter)

	local RegionStart = Vector3.new(math.floor((BrushPart.Position.X - BrushPart.Size.X / 2) / 4) * 4, math.floor((BrushPart.Position.Y - BrushPart.Size.Y / 2) / 4) * 4, math.floor((BrushPart.Position.Z - BrushPart.Size.Z / 2) / 4) * 4) - Vector3.new(4, 4, 4)
	local RegionEnd = Vector3.new(math.ceil((BrushPart.Position.X + BrushPart.Size.X / 2) / 4) * 4, math.ceil((BrushPart.Position.Y + BrushPart.Size.Y / 2) / 4) * 4, math.ceil((BrushPart.Position.Z + BrushPart.Size.Z / 2) / 4) * 4) + Vector3.new(4, 4, 4)
	
	local RegionCenter = RegionStart:Lerp(RegionEnd, 0.5)
	local Voxels = Region3.new(RegionStart, RegionEnd)
	local Materials, Occupancies = workspace.Terrain:ReadVoxels(Voxels, 4)
	local OccupanciesModified = {} --Cloning Voxels
	for X = 1, #Occupancies, 1 do
		OccupanciesModified[X] = {}
		for Y = 1, #Occupancies[1], 1 do
			OccupanciesModified[X][Y] = {}
			for Z = 1, #Occupancies[1][1], 1 do
				OccupanciesModified[X][Y][Z] = Occupancies[X][Y][Z]
			end
		end
	end
	local BrushPosition = BrushPart.Position
	local TargetMaterial = ToolSettings[Tool]["Materials"]["Materials"][2]
	
	for X = RegionStart.X + 2, RegionEnd.X - 2, 4 do
		for Y = RegionStart.Y + 2, RegionEnd.Y - 2, 4 do
			for Z = RegionStart.Z + 2, RegionEnd.Z - 2, 4 do
				local Result = GetShapeAffect("Ball", Radius, BrushPosition, X, Y, Z)
				if Result > 0 then
					local CeilX = (X - RegionStart.X) / 4 + 0.5
					local CeilY = (Y - RegionStart.Y) / 4 + 0.5
					local CeilZ = (Z - RegionStart.Z) / 4 + 0.5
					if Materials[CeilX][CeilY][CeilZ] == Enum.Material.Air then
						Materials[CeilX][CeilY][CeilZ] = Enum.Material[TargetMaterial]
					end
					OccupanciesModified[CeilX][CeilY][CeilZ] = math.max(Result, OccupanciesModified[CeilX][CeilY][CeilZ])
				end
			end
		end
	end

	workspace.Terrain:WriteVoxels(Voxels, 4, Materials, OccupanciesModified)
else ...
3 Likes

Here are a couple things you could try to see if they make any difference.

First don’t divide by 2(or 4), multiply by .5(or .25) instead.

Create local functions like:

local mfloor = math.floor
local mceil = math.ceil
local v3new = Vector3.new

at the top of your file and use these aliases in the code as being local functions they will run faster.

You could also create a variable for the Vector3.new(4,4,4) as more of a constant and just use that value instead of creating a new one twice each time through the code:

local v3new444 = Vector3.new(4,4,4)

You do this code twice for each X, Y, and Z. You could make 3 variables and just do it once instead of the calculation twice so for instance:

local halfBrushX = BrushPart.Size.X * .5
local halfBrushY = BrushPart.Size.Y * .5
local halfBrushZ = BrushPart.Size.Z * .5

Then your RegionStart and RegionEnd would look like so:

	local RegionStart = v3new(mfloor((BrushPart.Position.X - halfBrushX) * .25) * 4, mfloor((BrushPart.Position.Y - halfBrushY) * .25) * 4, mfloor((BrushPart.Position.Z - halfBrushZ) * .25) * 4) - v3new444
	local RegionEnd = v3new(mceil((BrushPart.Position.X + halfBrushX) * .25) * 4, mceil((BrushPart.Position.Y + halfBrushY) * .25) * 4, mceil((BrushPart.Position.Z + halfBrushZ) * .25) * 4) + v3new444

Hopefully those might make at least a little improvement.

This is not actually faster in luau, and is awful to read. Please don’t do this. :frowning:

This is actually really negligible even with the O(n^3) scaling (you save probably a couple of microseconds when the lag is many milliseconds). Trying to optimize against float division doesn’t make sense when most things already have overhead that shadow it significantly, except in rare cases. This is more something you’d do when you’re expecting millions or billions of iterations.

You could also take advantage of SIMD:

local halfBrush = BrushPart.Size / 2

I believe this is ever so slightly faster than doing BrushPart.Size.X / 2. It does all three simultaneously using SIMD CPU instructions, and you save a couple variable reads/writes, not that that adds to a lot of performance cost.

@OP
I don’t think you are really doing anything “wrong” with the ball code. I think the main cause of lag is just that you have those nested for loops, it is O(n^3), so small performance differences result in big outcomes. Going from 23 voxels on each axis is already a 3.375x slowdown (3^3/2^3).

Couple things that can impact performance quite a bit there:

  • Your nested tables mean a lot more allocation costs, in addition to them just having a lot of allocation costs to begin with. You should use table.create there, especially because you already know the table size (e.g. it’d look something like table.create(((RegionEnd.X - 2) - (RegionStart.X + 2)) / 4)), or alternatively using one table with a Vector3 key which can probably save you a decent bit of time while making the code a little easier to work with.
  • There are several spots where you could easily take advantage of those Vector3 SIMD instructions by doing your math using Vector3s instead of individually per component which is significantly slower. E.g. in your Ball’s X, Y, and Z math, and in the CeilX, CeilY, and CeilZ calculation.
  • .Magnitude is significantly faster in comparison to (x*x + y*y + z*z)^0.5 since the native Vector3 changes. I think this is probably your biggest slowdown in the GetShapeAffect code… IIRC x*x is actually slower than x^2 in luau. I believe this is because it does two variable reads, and I also know ^2 and ^0.5 have some sort of optimization, but I don’t really know much about that, so that is probably also a contributor to that.
2 Likes

I tried to use Vector3.magnitude, (x^2 + y^2 + z^2)^0.5 and other functions. Looks like that (x^2 + y^2 + z^2)^0.5 is FASTEST possible way - 15 TIMES FASTER than magnitude! (1m iterations):

local startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = Vector3.new(21, 3, 14).Magnitude
end
local endTime = os.clock()
print("Built-in function used " .. endTime - startTime)
startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = (21^2 + 3^2 + 14^2)^0.5
end
endTime = os.clock()
print("Square function used " .. endTime - startTime)
startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = math.sqrt(math.pow(21, 2) + math.pow(3, 2) + math.pow(14, 2))
end
endTime = os.clock()
print("Functions used " .. endTime - startTime)
local x, y, z = 21, 3, 14
startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = Vector3.new(x, y, z).Magnitude
end
endTime = os.clock()
print("Built-in function with variables used " .. endTime - startTime)
startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = (x^2 + y^2 + z^2)^0.5
end
endTime = os.clock()
print("Square function with variables used " .. endTime - startTime)
startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = math.sqrt(math.pow(x, 2) + math.pow(y, 2) + math.pow(z, 2))
end
endTime = os.clock()
print("Functions with variables used " .. endTime - startTime)

test code
image
(Sorry for SO long response)

Creating a Vector3 just to read the magnitude will obviously be slower than just doing maths on 3 numbers

even if you will pre-made vector like in my case, it will be slower (6 times slower)

I think Hexcede meant that math.sqrt(Vector.X^2 + Vector3.Y^2 + Vector3.Z^2) is slower than Vector.Magnitude

					if Materials[CeilX][CeilY][CeilZ] == Enum.Material.Air then
						Materials[CeilX][CeilY][CeilZ] = Enum.Material[TargetMaterial]
					end

this indexes a table 6 times in the worst case, you should put Materials[CeilX] and Materials[CeilY] as a variable in the loops

					local CeilX = (X - RegionStart.X) / 4 + 0.5
					local CeilY = (Y - RegionStart.Y) / 4 + 0.5
					local CeilZ = (Z - RegionStart.Z) / 4 + 0.5

can’t these variables be moved up? also, put RegionStart.X, Y and Z values in a variable

local function GetShapeAffect(Shape, BrushSize, BrushCFrame, VP_X, VP_Y, VP_Z)

instead of doing BrushCFrame.X, .Y, .Z the function should be passed the X Y Z (which should be stored in a variable) and so won’t need to read them every time
also, isn’t BrushCFrame actually a Vector3 here? and the function should be called Effect, not Affect

= Enum.Material[TargetMaterial]
this should be stored as a variable, so it’s not indexed every time

GetShapeAffect should be split into separate functions like GetBallEffect, and the other shapes, so it’s not comparing strings every time + it’s more readable

Is OccupanciesModified supposed to be a clone of Occupancies? If so, use table.clone instead

Your test is flawed in that you construct a new Vector3 every time. The majority of the time cost you are experiencing in your .Magnitude test is taken up by the construction of the Vector3.

Creating a constant Vector3 that you reference like this will give you more accurate results:

local startTime = os.clock()
local vector = Vector3.new(21, 3, 14)
for a = 1, 1000000, 1 do
	local magnitude = vector.Magnitude
end

And, if you want to take into account caching, doing math on an existing Vector3 is faster than creating it from scratch iirc due to SIMD:

local startTime = os.clock()
local vector = Vector3.new(0.5, 1, 0.8)
for a = 1, 1000000, 1 do
	local magnitude = (vector * a).Magnitude
end

This is also not true due to SIMD. Doing math on a Vector3 is roughly as slow as doing math on a single number, because the instructions are done simultaneously using SIMD instructions. This is what the native Vector3 update added, and it is also the update where .Magnitude’s speed greatly exceeded the speed of doing the calculation manually.

This is true when you already have the variables available, but for an existing Vector3, it isn’t faster anymore, because you have to access each component individually.

The main additional bulk of time in doing .Magnitude is simply the index operation, doing .Magnitude. If you try to do the same benchmark like this it will be much slower than doing .Magnitude:

local vector = Vector3.new(21, 3, 14)
local startTime = os.clock()
for a = 1, 1000000, 1 do
	local magnitude = (vector.X^2 + vector.Y^2 + vector.Z^2)^0.5
end
endTime = os.clock()
print("Square function with variables used " .. endTime - startTime)

What? You just said the test was flawed for that reason

Your test is flawed in that you construct a new Vector3 every time.

When you construct a brand new Vector3 during the test via Vector3.new, yes. Doing math on Vector3s is close to the cost of doing math on numbers. Constructing a new Vector3 is a lot more expensive.

(For example, Vector3.one * 3 is faster than Vector3.new(3, 3, 3))

P.s. I don’t know why this is the case, and, I don’t believe it is significant in practice, but, for these tests it does contribute to a large percentage of the time (since we’re sampling over millions of iterations).

This Is my attempt at optimizing the code though I can’t find any other issues
hope this helps


function GetShapeAffect(Shape, BrushSize, BrushCFrame, VP_X, VP_Y, VP_Z)
    if Shape == "Ball" then 
        local X = VP_X - BrushCFrame.X
        local Y = VP_Y - BrushCFrame.Y
        local Z = VP_Z - BrushCFrame.Z
        local Distance = (X*X + Y*Y + Z*Z)^0.5
        
        if Distance <= BrushSize - 2 then 
            return 1 
        end
        
        local BrushOccupancy = math.clamp((BrushSize - Distance) / 4 + 0.5, 0, 1)
        
        return (BrushOccupancy >= 0.15) and BrushOccupancy or 0
        
    end 
end

if BrushPart.Shape == Enum.PartType.Ball then
    local Radius = ToolSettings[Tool]["Brush"]["Size"][2].X / 2

    local RegionStart = Vector3.new(
        math.floor((BrushPart.Position.X - BrushPart.Size.X / 2) / 4) * 4,
        math.floor((BrushPart.Position.Y - BrushPart.Size.Y / 2) / 4) * 4,
        math.floor((BrushPart.Position.Z - BrushPart.Size.Z / 2) / 4) * 4
    ) - Vector3.new(4, 4, 4)
    
    local RegionEnd = Vector3.new(
        math.ceil((BrushPart.Position.X + BrushPart.Size.X / 2) / 4) * 4,
        math.ceil((BrushPart.Position.Y + BrushPart.Size.Y / 2) / 4) * 4,
        math.ceil((BrushPart.Position.Z + BrushPart.Size.Z / 2) / 4) * 4
    ) + Vector3.new(4, 4, 4)

    local RegionCenter = RegionStart:Lerp(RegionEnd, 0.5)
    local Voxels = Region3.new(RegionStart, RegionEnd)
    local Materials, Occupancies = workspace.Terrain:ReadVoxels(Voxels, 4)

    local OccupanciesModified = {} 
    for X = 1, #Occupancies, 1 do
        OccupanciesModified[X] = {}
        for Y = 1, #Occupancies[1], 1 do
            OccupanciesModified[X][Y] = {}
            for Z = 1, #Occupancies[1][1], 1 do
                OccupanciesModified[X][Y][Z] = Occupancies[X][Y][Z]
            end
        end
    end

    local BrushPosition = BrushPart.Position
    local TargetMaterial = ToolSettings[Tool]["Materials"]["Materials"][2]

    for X = RegionStart.X + 2, RegionEnd.X - 2, 4 do
        for Y = RegionStart.Y + 2, RegionEnd.Y - 2, 4 do
            for Z = RegionStart.Z + 2, RegionEnd.Z - 2, 4 do
                local Result = GetShapeAffect("Ball", Radius, BrushPosition, X, Y, Z)
                if Result > 0 then
                    local CeilX = (X - RegionStart.X) / 4 + 0.5
                    local CeilY = (Y - RegionStart.Y) / 4 + 0.5
                    local CeilZ = (Z - RegionStart.Z) / 4 + 0.5
                    if Materials[CeilX][CeilY][CeilZ] == Enum.Material.Air then
                        Materials[CeilX][CeilY][CeilZ] = Enum.Material[TargetMaterial]
                    end
                    OccupanciesModified[CeilX][CeilY][CeilZ] = math.max(Result, OccupanciesModified[CeilX][CeilY][CeilZ])
                end
            end
        end
    end

    workspace.Terrain:WriteVoxels(Voxels, 4, Materials, OccupanciesModified)

else
end