TD game entity system, help on optimization

OK but you’re spawing 1K parts, close together. They still have events they fire, thousands of times.

How could i turn those events off then? i don’t really need them (never had to turn thosr stuff off)

image
oh wait

Anchor the part, turn off collisions, etc.

i have already done all of that tho

Enum.BulkMoveMode.FireCFrameChanged should also turn of the rest


didn’t change anything sadly

I haven’t looked through your code super closely, as this article seems quite lengthy. So forgive any ignorance I might have in making this post, I see this as more of an educational post for anyone who might be curious about this topic.

Bones, Skinned Meshes
One thing that will help a lot, is to reduce the number of parts on your character models. If you use Bones you can have an entire, animated “R6” rig in a single part. Additionally, you can have a “control bone” which can be used to control position and orientation of your character entities. Bones are very efficient in terms of CPU draw calls and are actually quite fast to move. I have an ocean simulation with 3.4k bones and it takes about 1.2ms to set their Transform each frame (if I actually move all of them). You cannot use BulkMoveTo with bones, however, they are still VERY fast to set the transforms, since this requires no update to their physics data, only rendering information.

Since you likely don’t require accurate collision information, like what you would need for raycast collisions, this should work quite well.

Bones are almost free to render as well. With a Skinned Mesh via control bone, using code similar to what is shown in the example below, I was able to move 10k R6 bodies and the update render step took just shy of 0.6ms, compare that with parts! In fact, I even had shadows on my 10k bodies and the entire render time was still less than 3ms.

Native Code Generation
Take a look at Luau Native Code Generation for more details on this.

This makes a significant performance improvement for arithmetic operations for this particular use case. In my testing I saw about a 2x performance improvement by utilizing it.

Interpolation
Assuming you aren’t already doing this, having a CFrame at each point direction change in your track would help to cut down on the calculation cost for determining the entity’s next position. Given that you have a sequence of points, from the start of the track to the end of the track, each entity needs to keep track of the current index point they’re on (the furthest point index along the track that they have passed.), given that index, you know the past waypoint and the next waypoint. Each entity also needs to track when they crossed the last waypoint. Given that information, and a known distance (magnitude) between waypoints you can determine for any given time exactly where the entity should be along the entire track.

This will provide you with a very fast, very cacheable position lookup. It would look a bit like this:

--!strict
--!native

-- Notice we're using native code generation here,
-- This will, in most cases, drastically speed up arithmetic operations.
-- Note that you'll need to enable this, as it is only a beta feature currently.
-- See https://devforum.roblox.com/t/luau-native-code-generation-preview-studio-beta/2572587
-- for details about native code gen.

type Waypoint = {
	Position: Vector3,
	DistanceToNext: number,
	Index: number,
	Entities: { [number]: number },
}

type TrackSegment = {
	StartWaypoint: Waypoint,
	EndWaypoint: Waypoint,
}

type Entity = {
	Index: number,
	Speed: number,
	Health: number,
	MaxHealth: number,
	ControlBone: Bone,
	WaypointReachedTime: number,
	NextWaypointReachedTime: number,
	WaypointAlpha: number, -- The alpha value we last lerped by.
	TotalAlpha: number, -- The cumulative alpha value this entity has traversed.
	WaypointIndex: number,
}

type TargetType = 'First' | 'Last' | 'Strongest' | 'Weakest'

type TrackIntersection = {
	WaypointIndex: number,
	MinAlpha: number,
	MaxAlpha: number,
}

-- This would probably be generated at runtime for each track,
-- waypoints should be inserted (or sorted) from beginninng of the track to the end.
local waypoints: {Waypoint} = {}

local track_segments: {TrackSegment} = {} -- Represents the edge (line) between two vertices (waypoints) along the track.

local entities: {Entity} = {}
local entity_index: {[number]: Entity} = {}
--- Sets up necessary information for the entity to traverse to the next waypoint.
-- This function should only be called once when the entity reaches the waypoint.
-- @returns a boolean representing whether the entity has reached the end of the track.
local function waypoint_reached(entity: Entity, waypoint_reached_time: number): boolean
	local next_waypoint = waypoints[entity.WaypointIndex + 1]
	if (next_waypoint) then
		entity.WaypointReachedTime = waypoint_reached_time
		entity.WaypointIndex += 1
		entity.WaypointAlpha = 0
		
		--next_waypoint.Entities[entity.Index] = entity.WaypointAlpha

		if (next_waypoint.DistanceToNext > 0) then
			entity.NextWaypointReachedTime = waypoint_reached_time + (next_waypoint.DistanceToNext / entity.Speed)
		end
		return true
	end

	return false
end

local function get_entity_transform(current_time: number, entity: Entity): Vector3
	local alpha = (entity.WaypointReachedTime - current_time) / (entity.WaypointReachedTime - entity.NextWaypointReachedTime)
	
	entity.TotalAlpha += math.min(alpha, 1) - entity.WaypointAlpha
	
	if (alpha >= 1) then
		-- Get the corrected time in which we reached the next waypoint (assuming alpha is > 1 then we actually overshot in this simulation.)
		-- To get the correct time, we correct by (t1 + (t2 - t1) / alpha)
		local has_next_waypoint = waypoint_reached(entity, entity.WaypointReachedTime + ((entity.NextWaypointReachedTime - entity.WaypointReachedTime) / alpha))
		
		if (not has_next_waypoint) then
			entity.WaypointAlpha = 1
			return waypoints[entity.WaypointIndex].Position
		end

		-- We need to get a new transfrom since we crossed over into the next waypoint.
		return get_entity_transform(current_time, entity)
	end

	local wp1 = waypoints[entity.WaypointIndex]
	local wp2 = waypoints[entity.WaypointIndex + 1]

	entity.WaypointAlpha = alpha
	
	-- You may actually want to break up Position and Orientation to control the rotation of the entity separate from the position.
	-- You may benefit from doing the arithmetic operations on the X, Y, and Z values
	-- independently. This is because native code generation does not support Vector3 (yet) or CFrame (yet? probably never.)
	-- This would likely be a micro-optimization.
	return wp1.Position:Lerp(wp2.Position, alpha)
end

local function update_entities()
	local current_time = workspace:GetServerTimeNow()

	for _, entity in entities do
		-- You can also add support for rotation here.
		entity.ControlBone.Transform = CFrame.new(get_entity_transform(current_time, entity))
	end
end

local function add_entity(instance: BasePart, ControlBone: Bone, speed: number, health: number): Entity?
	local current_time = workspace:GetServerTimeNow()

	total_entities += 1

	instance.CFrame = CFrame.identity -- This will make converting Bone Transform to world space trivial.

	local entity: Entity = {
		Index = total_entities,
		WaypointIndex = 0, -- We'll start this at 0 so that the waypoint_reachd function can be called to handle setup for us.
		Speed = speed,
		MaxHealth = health,
		Health = health,
		ControlBone = ControlBone,
		NextWaypointReachedTime = 0,
		TotalAlpha = 0,
		WaypointAlpha = 0,
		WaypointReachedTime = 0,
	}

	waypoint_reached(entity, current_time)

	table.insert(entities, entity)
	entity_index[total_entities] = entity
	
	return entity
end

-- Adds a waypoint to the track provided a position, and an index (it's waypoint index along the track.)
-- Optionally takes a from_waypoint Waypoint object which denotes the waypoint linked tp this one previously.
-- If a from_waypoint is passed in, a TrackSegment is also created for the track.
local function add_waypoint(position: Vector3, index: number, from_waypoint: Waypoint?): Waypoint
	local distance_to_next = 0
	
	if (from_waypoint) then
		distance_to_next = (from_waypoint.Position - position).Magnitude
		from_waypoint.DistanceToNext = distance_to_next
	end
	
	local waypoint: Waypoint = {
		Position = position,
		DistanceToNext = 0,
		Index = index,
		Entities = {},
	}
	
	if (from_waypoint) then
		local track_segment: TrackSegment = {
			StartWaypoint = from_waypoint,
			EndWaypoint = waypoint,
		}
		
		table.insert(track_segments, track_segment)
	end
	
	table.insert(waypoints, waypoint)
	
	return waypoint
end

Bear in mind that this is pseudocode.

With a similar version to this code, I was able to move 150,000 entities (with no moving bodies) every frame and maintain just below 16ms (still 60FPS). A more reasonable 10,000 entities with moving bodies takes about 6.5ms on my computer. You mentioned it took 21-26ms on average for 20,000 entities with no bodies, I was able to run about 250,000 in that same timeframe (~23ms). It’s totally worth mentioning that native code generation gives about a 2x performance improvement here as well.

“Micro Optimizations”

In most cases, diving into what a lot of people call “micro optimizations” is a bad thing. However, in this particular case, I believe you’ll benefit from it.

One “micro optimization” in this example is in reducing index (read) and newindex (write) calls to Roblox instances. In this code example, almost all of the necessary read information is cached, this avoids redundantly reading from a Roblox instance, which is actually fairly slow, believe it or not. In the case of the update_entities function, we want to minimize, as much as possible, reading and writing on Roblox instances, since those tend to be slow. The only place where that’s necessary in our update step is in setting the Bone Transform which you’ll see a lot in the microprofiler if you implement a similar approach. Most of the information is cached in the entities and waypoints arrays.

Other “micro optimizations” that could be made are in how the Vector3 Lerp is handled. The new native code generation takes advantage of a lot of CPU optimizations around arithmetic and greatly speed up any arithmetic operations, however, this is currently excluding CFrame and Vector3 so moving the arithmetic operations out of Vector3 and into the individual components might yield faster results, I have not tested this though. Additionally, it is not clear whether or not CFrame will be supported by native code generation in the future, so such an “optimization” may be short lived, and it might actually be faster with CFrame in the future. Take a look at Luau Native Code Generation for more details on this.

4 Likes

already on native, the lag isn’t from the model being unoptimised in therm of triangles/polygons since i get the same performance with a literal cube, the entire lag is simply by moving the cframe of an object, not calculating the cframe/vector3 and no idea what “micro optimisation” i could actually do now i think (on my current version since the one on this post is a little outdated but like a little) kinda like already have the most optimised calculations and variables that i can do. Like i keep saying the only lag is legit coming from object.CFrame = cframe (updating/assigning the cframe to an object), BulkMoveTo() got the best performance tbh with the cframe and stuff. About the bones i have literally no idea how to use those since i never looked at this but those looks really interesting.

Edit: my bad didn’t read your message entirely and my calculation do take a lot of performance when i get to amount like 10,000, but your stuff seems really interesting i’m gonna look further into it, thanks.
worth mentioning tho that tbose are probably gonna be amount of enemies that are never gonna be here in actual gameplay, this is mostly just stress test.

One of the biggest optimizations mine has over yours is the use of bones and transforming the entire mesh with that. When you update a CFrame on a part, it has to update all of the data involved in physics, such as the BVH tree, the KD tree, and many other structures to improve physics performance. Since Bone transforms do not have physics this is not part of the update. By contrast, updating the CFrame of a part vs updating the Transform (or just CFrame, for that matter) of a bone is about a 15x difference in performance. There’s a LOT of work that is done when updating the CFrame of a part, even if you’re using BulkMoveTo, physics still applies. With Bones that is not the case.

See this simple example:

local Cube = workspace.Cube.Bone
local Cube2 = workspace.Cube2

game:GetService('RunService').Heartbeat:Connect(function()
	local cframe = CFrame.new(math.random(-100, 100), 0, math.random(-100, 100))
	
	Cube.Transform = cframe
	Cube2.CFrame = cframe
end)

You can see here that the CFrame update on a part is FAR more expensive, BulkMoveTo is better, sure but still not as fast as moving a bone.

Here’s the same example with an additional BulkMoveTo

local Cube = workspace.Cube.Bone
local Cube2 = workspace.Cube2
local Cube3 = workspace.Cube3

local bulk = {Cube3}

game:GetService('RunService').Heartbeat:Connect(function()
	local cframe = CFrame.new(math.random(-100, 100), 0, math.random(-100, 100))
	
	Cube.Transform = cframe
	Cube2.CFrame = cframe
	workspace:BulkMoveTo(bulk, {cframe}, Enum.BulkMoveMode.FireCFrameChanged)
end)

In every case, the bone takes 0.002ms to move, the direct CFrame setting takes roughly ~0.025-0.03ms, the BulkMoveTo is 0.012-0.015ms

This means the bone movement is ~12.5-15x faster than normal CFrame and 6-7.5x faster than BulkMoveTo in this case. As noted in my post, my system was easily able to handle 10k entities with bodies controlled via Bone Transform.

To be honest, I feel like you sort of glossed over my post. I’m sort of frustrated by that because I actually spent a lot of time making that example for you. There’s a lot of really good information in there that would really help you if you take the time to understand it. I appreciate that you at least acknowledged it in your edit though.

Definitely take some time to look into Bones, and take a look at how I’m handling entity updates, I keep data processing to a bare minimum, every if statement is necessary, every mathematical operation I do is necessary. By contrast, in your update statement you’re checking if .Parent is set. Take a moment to look at just how much time that actually takes for an if statement that could be completely avoided. You can totally decouple your entity objects from Roblox instances. Definitely don’t rely on any properties of an instance being set to something, as I mentioned, looking up property values on instances is slow, cache where possible, and avoid unnecessary lookups within large loops, like your core update loop.

Attached is a cube model that has a bone which you can use to control it.
cube_test.fbx (18.7 KB)

Additionally, you mentioned that you were “already on native”, I just noticed that you are in fact adding the tag. However, there’s actually further performance improvements available if you’re taking advantage of Luau’s typing. If you read the article about Luau Native Code Generation it talks a bit about how the compiler can further optimize function calls with cues its able to get from typing.

We also are starting to utilize type annotations during execution and expect that code that uses correct type annotations will perform better than code that doesn’t when native code generation is used.

3 Likes

My bad for kinda glossing it out and tbh i should really have a better look at this, this actually looks really interesting and sorry for my previous response.
The difference in time with bones is actually really huge compared to normal cframe assignment and bulkmoveto.
it’s true that i could remove some still remove some useless/unnecessary statement and other stuff and i actually never looked at how much time a statement was taking but it will probably be higher than what i thought it would have been.
I didn’t knew you could actually do further optimisations with native and i will for sure look at those too
Overall both of this one and previous comments seems really useful and really interesting, thanks!
I did download your cube and try to move the bones and stuff with Transfrom to see if it was really any different in the mechanic and it seems like it is “faking” the movement of an object, it is like moving it without actually moving it and raycast doesn’t seems to be detecting this sort of moved cube which means that i cannot hover enemies anymore now which is kinda an issue.

1 Like

I did download your cube and try to move the bones and stuff with Transfrom to see if it was really any different in the mechanic and it seems like it is “faking” the movement of an object, it is like moving it without actually moving it and raycast doesn’t seems to be detecting this sort of moved cube which means that i cannot hover enemies anymore now which is kinda an issue.

This is because raycasting relies on the physics engine, and bones don’t update physics information, which is kind of the advantage they have. You still know the position of every entity in real time so I’m not entirely sure why this is a big issue. In my testing with this example I was able to determine which ones were within tower range with no issue.

Here’s 18k entities being tracked in real time by 79 towers at 60FPS:

Sad that raycasting doesn’t work with this, but the issue isn’t from the towers not being able to detect if the enemy inside the range (i’m probably capable of doing that since like you said we still update the position in real time i could mostly just store the “position” (on the server) to know where it is, because right now this is for the client, after i can still keep the main calculus for the server without actually moving any enemies), it’s mostly about hovering the enemies with the player mouse so we can see their health.
but damn that much with 60 fps??? appart from this small issue with hovering this is probably gonna help me out a lot, thanks (again)!

Not a perfect solution to that problem, but could you just get the nearest one to where the mouse cursor hits? If you have a massive number of entities then that doesn’t work great, but neither does raycasting.

true, this would not work perfectly since the cast could be hitting something that is way further than your original target

Yeah, if you had a relatively extreme angle that could definitely happen. Such an angle seems unlikely in a game like this, but I don’t know your exact use case.

If you go in a game (in example: Tower Defense Simulator, Tower battles, most of the td games out here in roblox) you can point your mouse on the enemies and you will be able to see how much hp they currently have. Wait 1 sec i’ll record a video.

1 Like

You can see that when i point my cursor on some enemies i am able to see their hp
looking from certain angles could kinda make this not really work.
(I kinda suck at making ui and some of the objects here are kinda old probably gonna redo some stuff and don’t mind the base health being some weird red-green color i just forgot to fix something)

You could also create a “custom raycaster” that works on blocks. Essentially you’d need to get the UnitRay from the mouse, and check if that ray ever crosses through any of the character models’ “bounding box.” You should be able to get a point of intercept this way and with that you can get the nearest unit that your mouse’s UnitRay intersects.

A few issues I can see with that are performance, you would really not want to check against every unit every frame, so it would be ideal to employ some kind of KD-tree or Octree like data structure to limit which units would even likely be along that mouse UnitRay.

Provided that you are willing to use axis-aligned bounding boxes for your units then I think you could simply do a line-cuboid test like so:

-- Function to check if a line intersects a 3D rectangle (box or cuboid).
function does_line_intersect_3D_rectangle(line_start: Vector3, line_end: Vector3, rect_min: Vector3, rect_max: Vector3)
	local dx = (line_end.x - line_start.x)
	local dy = (line_end.y - line_start.y)
	local dz = (line_end.z - line_start.z)
	
	local t1 = (rect_min.x - line_start.x) / dx
	local t2 = (rect_max.x - line_start.x) / dx
	local t3 = (rect_min.y - line_start.y) / dy
	local t4 = (rect_max.y - line_start.y) / dy
	local t5 = (rect_min.z - line_start.z) / dz
	local t6 = (rect_max.z - line_start.z) / dz

	local t_min = math.max(math.max(math.min(t1, t2), math.min(t3, t4)), math.min(t5, t6))
	local t_max = math.min(math.min(math.max(t1, t2), math.max(t3, t4)), math.max(t5, t6))

	-- Check if the line segment is outside the box.
	if t_max < 0 or t_min > t_max then
		return false
	end

	return true
end

local Player = game:GetService('Players').LocalPlayer
local Camera = workspace.CurrentCamera
local Mouse = Player:GetMouse()

local rectangle = workspace:WaitForChild('Rectangle')
local half_rectangle_size = rectangle.Size / 2

local selection_distance = 1000

game:GetService('RunService').RenderStepped:Connect(function()
	local mouse_ray = Mouse.UnitRay
	local line_start = mouse_ray.Origin
	local line_end = line_start + (mouse_ray.Direction * selection_distance)
	
	local rectangle_position = rectangle.Position
	
	local rect_minus_size = rectangle_position - half_rectangle_size
	local rect_plus_size = rectangle_position + half_rectangle_size
	
	if (does_line_intersect_3D_rectangle(line_start, line_end, rect_minus_size:Min(rect_plus_size), rect_plus_size:Max(rect_minus_size))) then
		print('Intersects!')
	end
end)
1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.