Ray.new(originX, originY, originZ, directionX, directionY, directionZ) and ray:components()

Tomarty · April 3, 2018, 2:01am

It is currently impossible to create a new ray without creating two potentially unnecessary Vector3s that then need to be garbage collected.

Currently we are required to do this:

local ray = Ray.new(Vector3.new(originX, originY, originZ), Vector3.new(directionX, directionY, directionZ))

I’m suggesting an overload that lets us do this:

local ray = Ray.new(originX, originY, originZ, directionX, directionY, directionZ)

Imagine if when creating UDim2’s, you needed to do this every time (this is the same idea):

local udim2 = UDim2.new(UDim.new(scaleX, offsetX), UDim.new(scaleY, offsetY))

A method similar to cframe:components() for other userdata types would be very useful for reading the values that compose a userdata without creating unnecessary intermediate userdatas.

Currently we are required to do this:

-- Ray
local origin = ray.Origin -- creates Vector3
local direction = ray.Direction -- creates Vector3
local originX, originY, originZ, directionX, directionY, directionZ = origin.X, origin.Y, origin.Z, direction.X, direction.Y, direction.Z
-- UDim2
local udimX = udim2.X -- creates UDim
local udimY = udim2.Y -- creates UDim
local scaleX, offsetX, scaleY, offsetY = udimX.Scale, udimX.Offset, udimY.Scale, udimY.Offset
-- vector types
local x, y, z = vector3.X, vector3.Y, vector3.Z
local x, y = vector2.X, vector2.Y

I’m suggesting this:

-- Ray
local originX, originY, originZ, directionX, directionY, directionZ = ray:components()
-- UDim2
local scaleX, offsetX, scaleY, offsetY = udim2:components()
-- perhaps even vector types for consistency:
local x, y, z = vector3:components()
local x, y = vector2:components()

This feature request could extend to a few other userdata types, such as Region3 and Rect.
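For readers who want to try the call shape today, here is a minimal sketch of helper functions that return the same values the proposed methods would. The names rayComponents and udim2Components are hypothetical and not part of the Roblox API. They still index the intermediate Vector3 and UDim values, so they only emulate the ergonomics and give none of the performance benefit being requested. Plain tables stand in for the userdata so the sketch also runs outside Roblox.

```lua
-- Hypothetical helpers (not Roblox API): emulate the proposed :components()
-- methods with ordinary property reads.
local function rayComponents(ray)
	local origin, direction = ray.Origin, ray.Direction
	return origin.X, origin.Y, origin.Z, direction.X, direction.Y, direction.Z
end

local function udim2Components(udim2)
	local x, y = udim2.X, udim2.Y
	return x.Scale, x.Offset, y.Scale, y.Offset
end

-- Stand-in values: plain tables with the same field names as Ray and UDim2.
local fakeRay = {
	Origin = {X = 1, Y = 2, Z = 3},
	Direction = {X = 0, Y = -1, Z = 0},
}
local fakeUDim2 = {
	X = {Scale = 0.5, Offset = 10},
	Y = {Scale = 1, Offset = -20},
}

local originX, originY, originZ, directionX, directionY, directionZ = rayComponents(fakeRay)
local scaleX, offsetX, scaleY, offsetY = udim2Components(fakeUDim2)
```

Inside Roblox the same functions should accept real Ray and UDim2 values, since they only read the Origin, Direction, X, Y, Scale and Offset properties.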

EchoReaper · April 3, 2018, 6:01am

Spoke to an engineer and this wouldn’t have any performance gain. The garbage collector is actually really good at cleaning up vectors/etc, and micro-optimizing it like this would hurt performance, and more importantly, readability. If you’re having performance issues with rays, there’s room for improvement in the internal implementation that basically gives performance improvements to developers for free. You’d probably see that optimized before something like this.

Tomarty · April 3, 2018, 11:48pm

I don’t believe they impede readability. These APIs are pretty intuitive, especially considering that they’re consistent with cframe:components() and the existing fast overloads for UDim2.new and Rect.new.

These suggestions are also definitely not micro-optimizations; the new APIs would be much faster regardless of garbage collector performance.

I just ran some benchmarks on the live game using my profiler and I am amazed by the results:

Using cframe:components() just to get the position is over twice as fast as indexing X, Y, and Z on a Vector3 individually. All vector types should be given a :components() method.
Getting a ray’s components by creating extra Vector3s is extremely slow. Rays should also be given a :components() method.
Comparing UDim2.new(0,0,0,0) with UDim2.new(UDim.new(0,0),UDim.new(0,0)) is a realistic analog for how Ray.new(0,0,0,0,0,0) would perform against the current constructor, and I predict the Ray.new overload would be over twice as fast. Many games create hundreds or even thousands of rays per second, so this is very significant.

The differences may be measured in hundreds of nanoseconds, but they would be far larger on low-end devices, and developers like myself will always try to push the engine to its limits.

If you’re wondering how I’m getting these numbers, this is what my Test modules look like:

My profiler randomizes call ordering to keep the comparison fair; this is how the tests are run:
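The sketch below is not the profiler used for these numbers. It only illustrates the general idea of randomizing call order in a micro-benchmark. The names shuffle and benchmark are hypothetical. It assumes os.clock is available as the timer; inside Roblox a timer such as tick() could be substituted. The two sample cases use a plain table, so their timings say nothing about userdata performance.

```lua
-- Fisher-Yates shuffle, in place.
local function shuffle(list)
	for i = #list, 2, -1 do
		local j = math.random(i)
		list[i], list[j] = list[j], list[i]
	end
end

-- Runs every case `rounds` times in a freshly shuffled order each round and
-- returns the average seconds per call for each case name.
local function benchmark(cases, rounds, callsPerRound)
	local totals, order = {}, {}
	for name in pairs(cases) do
		totals[name] = 0
		order[#order + 1] = name
	end
	for _ = 1, rounds do
		shuffle(order)
		for _, name in ipairs(order) do
			local fn = cases[name]
			local start = os.clock()
			for _ = 1, callsPerRound do
				fn()
			end
			totals[name] = totals[name] + (os.clock() - start)
		end
	end
	local averages = {}
	for name, total in pairs(totals) do
		averages[name] = total / (rounds * callsPerRound)
	end
	return averages
end

-- Sample cases on a stand-in table (not a real Vector3).
local point = {X = 1, Y = 2, Z = 3}
local function components(p)
	return p.X, p.Y, p.Z
end

local results = benchmark({
	indexEach = function()
		return point.X + point.Y + point.Z
	end,
	viaHelper = function()
		local x, y, z = components(point)
		return x + y + z
	end,
}, 20, 1000)

for name, seconds in pairs(results) do
	print(name, seconds)
end
```

Shuffling per round spreads warm-up and scheduling effects across all cases instead of always penalizing whichever case happens to run first.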

Fractality_alt · April 4, 2018, 1:14am

Are you currently in a situation where userdata indexing is an actual bottleneck?

Tomarty · April 4, 2018, 3:10am

I’ve come across a few situations:

Heavy Raycasting

vector3:components() would make reading the position and normal faster.
Ray.new(originX, originY, originZ, directionX, directionY, directionZ) would make dividing rays into segments for things like projectile paths and bullet-penetration faster.

A few use-cases: Mini-map rendering, simulating many projectiles, customized physics for cars/characters/quadrupeds, simulating rainfall, light-weight invisicam and fast camera collisions.
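To make the ray-segmentation case concrete, here is a minimal sketch that splits a ray into equal segments using only numbers. segmentRay is a hypothetical helper, not an existing API. With the proposed overload each tuple could be passed straight to Ray.new; with the current API each segment needs two Vector3.new calls first.

```lua
-- Splits the ray (origin o, direction d) into `count` equal segments.
-- Each segment is returned as a flat tuple {ox, oy, oz, dx, dy, dz}.
local function segmentRay(ox, oy, oz, dx, dy, dz, count)
	local sx, sy, sz = dx / count, dy / count, dz / count
	local segments = {}
	for i = 0, count - 1 do
		segments[i + 1] = {
			ox + sx * i, oy + sy * i, oz + sz * i, -- segment origin
			sx, sy, sz, -- segment direction (one step of the full ray)
		}
	end
	return segments
end

-- A ray pointing 8 studs straight down from y = 10, cut into 4 pieces.
local segments = segmentRay(0, 10, 0, 0, -8, 0, 4)

-- With the current API, each segment would be built like this in Roblox:
-- local s = segments[1]
-- local ray = Ray.new(Vector3.new(s[1], s[2], s[3]), Vector3.new(s[4], s[5], s[6]))
```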

Storing large vectors

It’s surprisingly much more efficient to create userdatas than it is to create tables filled with numbers, although it is much slower to read values. The usefulness of this case scales directly with how fast userdatas can be indexed.

Some use-cases: Neural networks, storing game data.

This is unconventional, and cframe:components() already works (assuming you don’t need to replicate the vector).

Custom hit-detection

Similar to the previous case, except less unconventional.
How many collisions can be tested scales directly with how fast userdata indexing is.

Some use-cases: Safe/consistent building tools for houses, cuboid collision detection, custom raycasting.

Userdata hashing

Userdatas can be equal, but still reference different objects. For example:

local t = {}
local key1 = Vector2.new(1, 2)
local key2 = Vector2.new(1, 2)
t[key1] = true
print(t[key2])
print(t[key1])
> nil
> true

Some of the CoreScripts might still assume otherwise: BubbleChat PlayerScript 'TextSizeCache' memory leak

I don’t expect Roblox to store every userdata in an internal hash table, but sometimes I need this behavior, so I do it myself using weak tables.
Here’s how I might implement this for rays (ignoring hash collisions):

local hashLookup = setmetatable({}, {__mode = "v"})
local function hashGet(v)
	local origin = v.Origin
	local direction = v.Direction
	local hash =
		origin.X +
		origin.Y*25.039570006870 +
		origin.Z*2.4636903606487 +
		direction.X*25872.624154663 +
		direction.Y*404.89306915749 +
		direction.Z*120.16744135364
	
	local v2 = hashLookup[hash]
	if not v2 then
		hashLookup[hash] = v
		return v
	elseif v == v2 then
		return v2
	else
		-- hash collision: not handled here, so fall back to the uncached value
		return v
	end
end
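If collisions do need handling, one option is to keep a small weak list per hash and compare components within it. The following is a sketch under that assumption rather than a tested production implementation. The names hashOf, sameRay and intern are hypothetical, the multipliers are arbitrary, and plain tables stand in for Ray values so it runs outside Roblox.

```lua
local WEAK_VALUES = {__mode = "v"}
local buckets = {} -- hash -> weak list of rays that share that hash

local function hashOf(ray)
	local o, d = ray.Origin, ray.Direction
	return o.X + o.Y * 25.03957 + o.Z * 2.46369
		+ d.X * 25872.624 + d.Y * 404.893 + d.Z * 120.167
end

-- Component-wise comparison, so it works for stand-in tables as well.
local function sameRay(a, b)
	local ao, ad, bo, bd = a.Origin, a.Direction, b.Origin, b.Direction
	return ao.X == bo.X and ao.Y == bo.Y and ao.Z == bo.Z
		and ad.X == bd.X and ad.Y == bd.Y and ad.Z == bd.Z
end

-- Returns the canonical instance for `ray`, registering it if it is new.
local function intern(ray)
	local hash = hashOf(ray)
	local bucket = buckets[hash]
	if not bucket then
		bucket = setmetatable({}, WEAK_VALUES)
		buckets[hash] = bucket
	end
	-- pairs (not ipairs) because collected entries can leave holes.
	for _, candidate in pairs(bucket) do
		if sameRay(candidate, ray) then
			return candidate
		end
	end
	bucket[#bucket + 1] = ray
	return ray
end

-- Stand-in constructor for the example.
local function fakeRay(ox, oy, oz, dx, dy, dz)
	return {
		Origin = {X = ox, Y = oy, Z = oz},
		Direction = {X = dx, Y = dy, Z = dz},
	}
end

local a = intern(fakeRay(1, 2, 3, 0, -1, 0))
local b = intern(fakeRay(1, 2, 3, 0, -1, 0))
local c = intern(fakeRay(9, 9, 9, 1, 0, 0))
print(a == b, a == c)
```

Empty buckets are never removed from the outer table in this sketch, so a long-running game would want to prune them occasionally.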

A :GetHash() method would be better suited to this use-case, but :components() is more useful to everyone. Regardless of performance, :components() can be much easier to type, especially for Rays and UDim2s.

Even if in most cases these optimizations aren’t significant, I want to know I can create twice as many rays and read twice as many Vector3s, and thus simulate twice as many characters and twice as many bullets, before impacting performance.

Seemingly tiny optimizations become huge once they’re applied to a case whose usefulness scales with speed.

The bottleneck often surfaces when many scripts are doing many different things on low-end devices (especially on devices like the iPhone 4S).

The performance benefits are similar in magnitude to the __namecall optimization, where the goal is to minimize trips between Lua and the engine. A potential alternative would be to implement userdatas as Lua types internally, where library accesses like ‘Vector3.new’ would need to somehow be elevated to keyword-status to make use of the global constant table that Lua uses, but that would be quite the undertaking. Lua’s speed and flexibility are a reason to use Roblox over other game engines, but Lua is still much slower than compiled machine code.

It also doesn’t hurt that these optimizations are already consistent with the API, as they’re analogous to cframe:components() and UDim2.new(0,0,0,0).

Fractality_alt · April 5, 2018, 10:29am

Those things are a relatively minor part of what makes raycasts slow.

The cost of Ray.new is insignificant compared to the time spent in FindPartOnRay. For the latter, you have the fixed cost of bridging the arguments and return values to and from Lua, as well as the cost of the spatial query itself, which scales linearly with the number of parts near the ray.

I’m working on some changes to FindPartOnRay that reduce the fixed cost by ~4x. There are also other changes in the works that will help it scale better in large, detailed levels.

Generally, if a system is slow, the “Roblox way” is to just make it faster instead of dancing around the issue with API contortions.

Tomarty · May 26, 2018, 9:33am

This is great! Raycasting is a huge bottleneck for games.

I understand the hesitation, but I ran the benchmarks again on mobile and these features would have massive performance gains.

Here are the tests run on my relatively new Samsung Galaxy Note8 after over 6 million iterations:

Creating a few hundred rays is enough to start competing with rendering for cpu time.

It’s even worse on older devices.
Here are the raw results:

Methods like UDim2.new(scaleX, offsetX, scaleY, offsetY) are both performant and easy to read, and I think they should be applied to other types as well. cframe:components() performs insanely well for reading values in bulk, and I’d love to see that applied elsewhere too. The only alternative with comparable performance would be to bake these userdata types into Lua itself, which probably won’t happen any time soon.

I don’t believe these are API contortions; they are powerful features that will make low-level Lua code run a lot faster, which is essential when developing for mobile devices.