Release Notes for 435

These compound operators look really cool!

However, will we ever see JavaScript-style assignment chaining?

local x = y = ...

If I had to guess I’d say that it’s unlikely this will happen, but not impossible. It is likely easy to set up a little transpiler plugin to do this if you’d want to do it yourself:

-- Input:
local x = y = 5
a = b = c = 7

-- Output:
local y = 5
local x = y
c = 7
b = c
a = b
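As a rough illustration of the desugaring step such a plugin would perform, here is a minimal sketch in plain Lua. The helper name `expandChain` is hypothetical, and the splitting is deliberately naive: it only recognizes `=` surrounded by whitespace and knows nothing about strings or nested expressions. It mirrors the output shape of the example in this post, including making every name in a `local` chain local.

```lua
-- Hypothetical helper: expands one chained assignment into single assignments.
-- Naive by design: splits on whitespace-delimited "=" only.
local function expandChain(line)
	local isLocal = line:match("^%s*local%s+") ~= nil
	local body = line:gsub("^%s*local%s+", "")

	-- Collect every piece between the "=" signs, left to right.
	local pieces = {}
	for piece in (body .. " = "):gmatch("(.-)%s+=%s+") do
		pieces[#pieces + 1] = piece
	end

	-- Emit assignments right to left: the rightmost piece is the value,
	-- and each target becomes the source of the next assignment.
	local prefix = isLocal and "local " or ""
	local out = {}
	local source = pieces[#pieces]
	for i = #pieces - 1, 1, -1 do
		out[#out + 1] = prefix .. pieces[i] .. " = " .. source
		source = pieces[i]
	end
	return out
end
```

Calling `expandChain("a = b = c = 7")` returns the three lines `c = 7`, `b = c`, `a = b`, in that order. A real plugin would need a proper parser rather than pattern matching.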

In Lua, assignments are statements, not expressions. We intentionally preserved this for compound assignments, as we believe that this makes the syntax less error-prone (and harder to abuse).

Spot the bug:

let a = 2
if (a = 1) { console.log("oh no"); }

So - no.
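For contrast, the equivalent mistake in Lua never gets past the parser. The sketch below assumes an environment where `loadstring` (Lua 5.1, Luau) or a string-accepting `load` (Lua 5.2+) is available; the variable names are only illustrative.

```lua
-- loadstring exists in Lua 5.1 and Luau; load accepts strings in Lua 5.2+.
local compile = loadstring or load

-- A comparison in the condition compiles fine.
local okChunk = compile("local a = 2 if a == 1 then print('oh no') end")

-- An assignment in the condition is a syntax error, so compile returns
-- nil plus an error message instead of a runnable chunk.
local badChunk, syntaxError = compile("local a = 2 if a = 1 then print('oh no') end")
```

Because the second snippet is rejected at compile time, the `a = 1` typo cannot silently change program behavior the way it does in the JavaScript example.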

Will UICorner support altering the radius of individual corners? Judging by the DevHub, it only accepts a UDim, so I assume not. I can imagine it being extremely helpful if individual corners could be altered though - I do so in my UI via the use of 9-slices.

It looks like that’s a no at the moment, but I’d love to see something like an IndividualCorners property that would enable TopLeftCornerRadius, TopRightCornerRadius, BottomLeftCornerRadius, and BottomRightCornerRadius in place of CornerRadius, similar to how constraints and some other instances hide properties until you enable them, to avoid clutter.

It would have to be more than just a little bit useful though, as that would have performance implications for all GUIs using UICorner. The corners are implemented at the shader level, and any additional branching that shaders have to do has a real cost.

Would a function that could apply multiple properties at the same time help with performance? I’m creating my own terrain generator, and a huge amount of time is spent setting the Size and CFrame of wedge parts.

This is just setting the Size and CFrame; the CFrame and size vector are already created in the neon green label to the left of “triangle”.

Setting the size shouldn’t cost that much. Are you setting the size after parenting the parts to the world? If so, then change your code to set the size (and every other prop including CFrame) before parenting and the setting of size should be a lot more performant.

A terrain generator isn’t a case that such an API would help with, because you’re generating fresh instances, and assigning properties of instances which haven’t been parented to the world yet is about as cheap as it can get.

There will always be some cost, since putting all those parts into the world requires all the physics information to be set up for them.

So if I’m reusing wedge instances instead of creating/destroying, would it be better to parent to nil, set size & cframe then reparent to the world?

No, because then the engine will have to re-do all of the work it did parenting it to the world and setting up physics / rendering data for it.

You could still create a prototype wedge part, and :Clone() from it before ever parenting it to the world, there’s nothing stopping you from cloning from something that isn’t in the world yet.
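A minimal sketch of that advice: clone from an unparented prototype, set every property, and assign Parent last. The function name `spawnWedge` and its parameters are hypothetical. In Roblox, `prototype` would be something like an unparented WedgePart and `parent` would be `workspace`; the code only relies on `:Clone()` and property assignment, so it is not tied to any particular instance type.

```lua
-- Hypothetical helper. `prototype` is any object with a :Clone() method,
-- e.g. a WedgePart that was configured once and never parented.
local function spawnWedge(prototype, size, cframe, parent)
	local wedge = prototype:Clone()

	-- Set everything while the clone is still outside the world,
	-- where property assignment is about as cheap as it gets.
	wedge.Size = size
	wedge.CFrame = cframe

	-- Parent last, so the physics and rendering setup happens once.
	wedge.Parent = parent
	return wedge
end
```

The ordering is the whole point: assigning Parent first would mean each later property change is applied to a part that is already live in the world.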

This is great! Setting .CFrame is a pretty significant bottleneck in my game. Most of my parts are local and graphics-only so they don’t need any expensive updates. I plan to clear the 2 tables and reuse them every frame to reduce allocation overhead.

Would it be reasonable to request an optional offsetList parameter? Almost every use-case (including Lua Draggers Beta) is going to need an additional ToWorldSpace operation for each part anyways, and it would greatly reduce the number of CFrames that need to be allocated and garbage collected for each transformation.

This matters way less than you think as long as you allocate your tables in a single allocation using table.create rather than growing them from an empty table. You may even make things slower if you try to dynamically change the number of things in the array and have to loop over part of it to clear it.
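A small sketch of building both lists with a single up-front allocation each. `table.create` is a Luau addition; the fallback to an empty table is only an assumption made here so the snippet also runs on stock Lua, where it gives no preallocation benefit. The `entries` shape (`part` and `cframe` fields) and the function name are hypothetical.

```lua
-- table.create(n) preallocates the array part in Luau.
-- The fallback exists only so this sketch runs on stock Lua as well.
local createTable = table.create or function(_capacity)
	return {}
end

-- Hypothetical helper: `entries` is an array of {part = ..., cframe = ...}.
local function buildMoveLists(entries)
	local n = #entries
	local parts = createTable(n)
	local cframes = createTable(n)
	for i = 1, n do
		parts[i] = entries[i].part
		cframes[i] = entries[i].cframe
	end
	return parts, cframes
end
```

Since the final size is known before the loop starts, neither list has to grow while it is being filled.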

The CFrame garbage didn’t seem to be a bottleneck for me in my use case so I didn’t investigate something like that. I could reconsider this, you’re right that most use cases would benefit from it.

(And @tnavarts, I believe this is what you were saying)
I believe the most performant way to clean up arrays is actually to discard the array. Sure, it’ll cause more GC, and it’ll need a whole new table allocation. However, look at what has to happen in each case.

First, on GC you may get a ton of CFrame collections when your table is discarded, but this also happens if you clear it. An easy mitigation is to hold all of your CFrames in a “keep-alive” table so they aren’t collected, and to clean that table out slowly, but this isn’t really necessary, as GC is quite speedy in the majority of cases.

Second, clearing a table takes as many iterations as the table has entries, whereas an allocation can usually just multiply a couple of numbers together and pass the result to an allocation function. So, ignoring what goes on inside the allocation itself (since that can’t be controlled), that’s one operation versus hundreds or thousands of iterations to clear the table. It’s likely many times more efficient to simply discard the table and leave the rest to the garbage collector and the allocation functions.

Additionally, I prefer to avoid table.insert wherever that makes sense, as it’ll make an extra __len call every time, which isn’t exactly performant in some of my cases (particularly massive data processing in my compression algorithm benchmarks), where I could be doing anywhere from a hundred thousand to maybe even a million table.insert calls depending on input data.

Lastly, rather than clearing out your table and rebuilding, you could instead use two tables. One which maps keys to values and values to keys in the array, and one that is the array. This way if you need to update the CFrame of a specific object, simply lookup its array index and bam set the CFrame in the cframes table. If you want to delete the entry,

-- Note, again it'd probably be best to use table.create if you want to populate the table. If you need to populate it and it has old stuff you want to keep, table.move is the best option and you can just allocate a new table with the add length and the old length and copy the values from the old table to the newly allocated table.
local cframes = {}
local parts = {}
local indexMap = {}
local bulkMoveListLength = 0

-- In order to BulkMoveTo
workspace:BulkMoveTo(parts, cframes, Enum.BulkMoveMode.FireCFrameChanged)

-- In order to add an entry to the table
local part, cframe -- Assume these aren't nil
bulkMoveListLength += 1 -- Increment the length
local index = bulkMoveListLength -- Store it in a new variable for the index
-- Store the part and cframe in the lists
parts[index] = part
cframes[index] = cframe
-- Map the indexes to their values in the arrays
indexMap[part] = index
indexMap[cframe] = index -- Note: even if two CFrames compare equal (cf1 == cf2), they are still distinct userdatas, which is useful here. However, assigning the same CFrame object to two parts will not work, so either copy the CFrame in that case or simply leave CFrames out of the map.

-- In order to update a part's cframe
local part, cframe -- Again, assume non nil
local index = indexMap[part] -- This will be the old index
indexMap[cframe] = index -- See note above too
cframes[index] = cframe
-- You can do a hybrid of this and the above by checking "if indexMap[part]" (so if index here for example)
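The post does not say how an entry would be deleted, so the following is only one possible approach, not the author’s: a swap-with-last removal that keeps both arrays dense. It assumes the three tables and the length counter are bundled into a hypothetical `state` table, and it ignores the optional CFrame keys in `indexMap`.

```lua
-- Hypothetical helper. `state` bundles the tables used above:
-- { parts = {}, cframes = {}, indexMap = {}, length = 0 }
local function removeEntry(state, part)
	local index = state.indexMap[part]
	if not index then
		return false -- part was never queued
	end

	local last = state.length
	local lastPart = state.parts[last]

	-- Move the last entry into the hole so the arrays stay gap-free.
	state.parts[index] = lastPart
	state.cframes[index] = state.cframes[last]
	state.indexMap[lastPart] = index

	-- Drop the now-duplicated tail slot and the removed part's mapping.
	state.parts[last] = nil
	state.cframes[last] = nil
	state.indexMap[part] = nil
	state.length = last - 1
	return true
end
```

Removal is constant time, at the cost of not preserving order, which should not matter when the lists are only ever handed to a bulk move call.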

Where did you hear this? According to the Lua 5.1 source, table.insert never does a call to __len. In fact, it just checks the length of the table without invoking metamethods. If your table only has an array part, this behavior is O(1).

I was actually going to use a global table for all parts, call BulkMoveTo at the end of the frame, then clear the table. This approach seems best for updating unknown quantities of arbitrary parts within small/medium assemblies, as well as within my foliage system where hundreds of simple animated objects are stored in an intrusive-doubly-linked-list and updated/throttled with a fixed time budget.

Here’s what the system I’m ready to test looks like:

local GlobalBulkMoveToCount = {0}
local GlobalBulkMoveToPartList = {}
local GlobalBulkMoveToCFrameList = {}

game:GetService("RunService").RenderStepped:Connect(function()
	
	-- [Update characters and UI here]
	
	-- End of frame:
	local count = GlobalBulkMoveToCount[1]--$const
	if count >= 1 then
		GlobalBulkMoveToCount[1] = 0 -- Reset counter
		
		workspace:BulkMoveTo(GlobalBulkMoveToPartList, GlobalBulkMoveToCFrameList, Enum.BulkMoveMode.FireNoEvents)
		
		for i = 1, count do -- Clear tables
			GlobalBulkMoveToPartList[i] = nil
			GlobalBulkMoveToCFrameList[i] = nil
		end
	end
end)

Here’s how a part is added:

local count = GlobalBulkMoveToCount[1] + 1
GlobalBulkMoveToCount[1] = count
GlobalBulkMoveToPartList[count] = part
GlobalBulkMoveToCFrameList[count] = partCFrame

Here’s an overview of how my skeletal animation system works and how it would use it:

local GetComponents = CFrame.new().GetComponents

-- This creates a fast bone update function for its specific state. This is cheap compared to how many times it will be called on average.
local createFastBoneUpdateFunction = function(bone)
	local getBoneOffset = bone[2] -- This is a function that returns the bone's current offset cframe.
	local partList = bone[3] -- The list of parts connected to this bone and their corresponding offsets. Stored {part1, offset1, part2, offset2, ...}
	local staticFunctionList = bone[4] -- A list of functions connected to this bone, that only need to be updated when the bone's cframe changes. Primarily used by bones with a fixed offset.
	local mobileFunctionList = bone[5] -- A list of functions connected to this bone that need to be called even if the bone's cframe doesn't change. Primarily used by bones that are currently animating.
	
	-- There are a few hundred common cases with auto-generated source code. Here's a documented example:
	if #partList == 2 and #staticFunctionList == 1 and #mobileFunctionList == 1 then
		
		local part1, offset1, staticFunction1, mobileFunction1 = partList[1], partList[2], staticFunctionList[1], mobileFunctionList[1]
		
		return function(cframe, positionThreshold, rotationThreshold)
			-- 'cframe' is the parent bone's resulting CFrame, with the matrix transformed by CFrame.new(0,0,0, scaleFactor,0,0, 0,scaleFactor,0, 0,0,scaleFactor)
			
			cframe = cframe * getBoneOffset() -- Animate the cframe
			
			-- The thresholds are used to reduce the number of part updates, especially for characters that are far away or recently off-screen.
			--  positionThreshold = (basePositionThresholdInStuds * scaleFactor)^2
			--  rotationThreshold = math.cos(rotationThresholdInRadians) * (scaleFactor^2)
			
			-- Here we test to see if the cframe has changed enough to update.
			--  This is a huge optimization for characters that are standing still, but would benefit from an API like 'cframe:FuzzyEq(other, positionThreshold, matrixThreshold)'
			
			local px0, py0, pz0, xx0, yx0, zx0, xy0, yy0, zy0 = GetComponents(bone[1]) -- bone[1] is the bone's last updated cframe
			local px1, py1, pz1, xx1, yx1, zx1, xy1, yy1, zy1 = GetComponents(cframe)
			if (px1 - px0)^2 + (py1 - py0)^2 + (pz1 - pz0)^2 > positionThreshold or xx0*xx1 + yx0*yx1 + zx0*zx1 < rotationThreshold or xy0*xy1 + yy0*yy1 + zy0*zy1 < rotationThreshold then
				bone[1] = cframe -- Set the last updated cframe
				
				staticFunction1(cframe, positionThreshold, rotationThreshold) -- Update staticFunctionList 
				
				-- Update partList
				local count = GlobalBulkMoveToCount[1] + 1
				GlobalBulkMoveToCount[1] = count
				GlobalBulkMoveToPartList[count] = part1
				GlobalBulkMoveToCFrameList[count] = cframe * offset1
			end
			
			mobileFunction1(cframe, positionThreshold, rotationThreshold) -- Update mobileFunctionList
		end
	end
end
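The threshold test in the middle of that function can be isolated into a small pure function. This is a simplified sketch with plain `{x, y, z}` tables instead of CFrames, a single axis instead of the two rows the original compares, and no scale factor; the name `movedEnough` and its parameters are hypothetical.

```lua
-- Hypothetical helper working on plain {x, y, z} tables.
-- p0/p1: old and new positions; a0/a1: old and new unit axis vectors.
-- positionThresholdSq is a squared distance in studs;
-- rotationCos is math.cos of the allowed angle between the axes.
local function movedEnough(p0, p1, a0, a1, positionThresholdSq, rotationCos)
	local dx, dy, dz = p1[1] - p0[1], p1[2] - p0[2], p1[3] - p0[3]
	if dx * dx + dy * dy + dz * dz > positionThresholdSq then
		return true -- moved farther than the position threshold
	end

	-- The dot product of two unit vectors is the cosine of the angle
	-- between them, so a smaller dot means a larger rotation.
	local dot = a0[1] * a1[1] + a0[2] * a1[2] + a0[3] * a1[3]
	return dot < rotationCos
end
```

Comparing squared distances and cosines avoids a square root and an inverse cosine on every update, which is why the thresholds are precomputed in that form.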

I did a recent benchmark and found that foo[#foo + 1] = bar is still faster than table.insert(foo, bar), even when foo has a metatable:

table.insert(foo, bar) can take up to 70% longer than len += 1; foo[len] = bar, so incrementing the length yourself is still fastest.
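For reference, here are the three append styles being compared, side by side. This sketch only shows that they build the same array; it makes no timing claims, and the function names are hypothetical. The counter version uses `len = len + 1` so it also runs on stock Lua; in Luau it can be written `len += 1`.

```lua
-- Style 1: library call.
local function fillWithInsert(n, value)
	local t = {}
	for _ = 1, n do
		table.insert(t, value)
	end
	return t
end

-- Style 2: length operator on every append.
local function fillWithLengthOperator(n, value)
	local t = {}
	for _ = 1, n do
		t[#t + 1] = value
	end
	return t
end

-- Style 3: caller-maintained counter, no length lookup at all.
local function fillWithCounter(n, value)
	local t, len = {}, 0
	for _ = 1, n do
		len = len + 1
		t[len] = value
	end
	return t, len
end
```

The counter style requires carrying the length alongside the table, which is the trade-off for skipping the length lookup.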

EDIT: I accidentally left my distributed profiler open all day and now I have 3 billion iterations for each test spread across dozens of clients that followed me into the server:

This is what the tests look like if someone’s curious:

-- reallocate/4/table.insert
Test = function(count, spoof)
	local v = spoof() -- This is nil
	local v1 = spoof(true) -- This is true
	local tick0 = os.clock()
	for _ = 1, count do
		local t = {}
		if v then -- This never runs
			t = spoof(t, v)
		end
		table.insert(t, v1)
		table.insert(t, v1)
		table.insert(t, v1)
		table.insert(t, v1)
	end
	local tick1 = os.clock()
	return tick1 - tick0
end,
TestControl = function(count, spoof)
	local v = spoof() -- This is nil
	local tick0 = os.clock()
	for _ = 1, count do
		local t = {}
		if v then -- This never runs, but ensures the above expression isn't compiled-away because it has no side-effects
			t = spoof(t, v) 
		end
	end
	local tick1 = os.clock()
	return tick1 - tick0
end

FYI: I had to remove the FireNoEvents option in v437, it was not stable enough. It turns out that in some edge cases the rendering system does need CFrame changed to fire in order to not get graphics artifacts. Fortunately you should still be able to capture enough performance improvement to make it worth it even with FireCFrameChanged.

		for i = 1, count do -- Clear tables
			GlobalBulkMoveToPartList[i] = nil
			GlobalBulkMoveToCFrameList[i] = nil
		end

That’s the part I’m talking about. Try changing this to:

	GlobalBulkMoveToPartList = table.create(512)
	GlobalBulkMoveToCFrameList = table.create(512)

With whatever number comfortably fits your typical part load. I suspect that reallocating like that will actually be faster than looping through the whole table to clear it.

I haven’t heard anywhere that it invokes __len. I just know that it performs a length check, and I assumed it uses the same internal mechanism as the # operator (the docs state that the default value is equivalent to “#table”, which is why I assumed this in the first place), so it would use __len in that case. When I asked @zeuxcg earlier, he said tables don’t already store an array length counter internally. I had suggested that as a likely optimization for table.insert, because I assumed my efficiency issue was related to the length mechanism the function uses. (If I’m entirely wrong about the slight efficiency issue I’ve experienced with table.insert being related to the length mechanism, let me know.)

I also don’t believe that a table.insert call is O(1), because in my benchmarks table.insert added a (relatively) significant slowdown at a few hundred thousand calls or so. I just know that the length variable method I suggested strongly improved my benchmarks at high iteration counts, and so I try to avoid table.insert where I can at this point. It’s likely overkill when you aren’t doing it on a mass scale, but it’s also not painful to implement or unreadable, especially with += (thank you for this, by the way; it’s my favorite syntactic change so far).

Additionally, if you refer to @Tomarty’s benchmark above he states that table.insert is about 70% slower. Again, I can’t be 100% sure this has to do with the length mechanism, Zeuxcg will probably be the one to confirm or deny that, but, that’s my assumption.

Also @tnavarts when I saw FireNoEvents get removed I had the big sad because I wanted to be able to desync parts from the server without doing mega hacky stuff. (Mainly because I can keep my terrain translation server side and per player I can translate terrain and players around them to massively mitigate deadlands artifacting, as well as being able to prevent wallhacks through making players appear in other locations)

It seems that it’s partly right-ish. From looking at the source, it’s unclear whether table.insert would append to the array or hash part, but it looks like for non-preallocated tables it will prefer the hash part. It doesn’t call or search for __len, but it does count up how many items are present with the same binary search behavior as #. It could very well be part of the overhead you see, but at least it’s not linear in time. I reckon most of the overhead is from the call to table.insert itself, since Lua function calls are relatively expensive.

The part about tables not storing array length internally though doesn’t seem right. The 5.1 source code has a field called sizearray under the table struct which is used. When using # or getting the length of a table with only an array part, this value is read, which is why it’s O(1) behavior.

I still really like the idea of BulkMoveTo from a performance perspective, but I feel like the Position/Orientation/Rotation performance problem could be solved through internal design changes.

I remember it looking something like this:

void BasePart::setCFrame(CFrame value)
{
	if (value != this->getCFrame())
	{
		Math::Orthonormalize(value);
		
		updatePhysics(value);
		updateGraphics(value);
		
		propertyChanged(property_CFrame);
		
		// This is almost always unnecessary for demanding use-cases
		propertyChanged(property_Position);
		propertyChanged(property_Orientation);
		propertyChanged(property_Rotation);
	}
}

Would it be possible for these events to be non-existent, but then piggyback off of CFrameChanged when they are requested or connected to? The Position/Orientation/Rotation updates are already unnecessary in almost all cases.

I’d be very interested in the possibility of a BulkMoveMode.GraphicsOnly option that doesn’t orthonormalize the cframe or update physics. One would be able to cheaply animate the resizing of a part/mesh graphically by transforming its cframe by CFrame.new(0,0,0, scaleX,0,0, 0,scaleY,0, 0,0,scaleZ). This would be useful for efficient character and foliage graphics animations that don’t require physics or raycasting, as well as for efficient model resize previews (although it’s likely material scale wouldn’t update, it would still be great for character resizing). Cheap scale/shear graphical transformations are pretty standard in most engines.


The problem is that the global tables are referenced by various modules.

I like this implementation because it uses fewer globals/upvalues, but it does require 2 extra __index operations. I may need to profile against some of these to see if there’s any difference, but I suspect it’s insignificant relative to the BulkMoveTo operation.

local GlobalBulkMoveToState = {0, {}, {}}

game:GetService("RunService").RenderStepped:Connect(function()
	
	-- ...
	
	-- End of frame:
	local len = GlobalBulkMoveToState[1]
	if len >= 1 then
		GlobalBulkMoveToState[1] = 0 -- Reset counter
		workspace:BulkMoveTo(GlobalBulkMoveToState[2], GlobalBulkMoveToState[3], Enum.BulkMoveMode.FireCFrameChanged)
		
		len += 32 -- Add padding to reduce the chance of reallocation
		GlobalBulkMoveToState[2] = table.create(len)
		GlobalBulkMoveToState[3] = table.create(len)
	end
end)

Usage:

local len = GlobalBulkMoveToState[1] + 1
GlobalBulkMoveToState[1] = len
GlobalBulkMoveToState[2][len] = part
GlobalBulkMoveToState[3][len] = partCFrame
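The same pattern can be wrapped in a small object so that other modules never hold direct references to the lists, which is what makes swapping in fresh tables safe. This is a sketch under assumptions: `newMoveQueue`, `push`, and `flush` are hypothetical names, and the actual move is passed in as a callback. In Roblox the callback would wrap `workspace:BulkMoveTo(parts, cframes, Enum.BulkMoveMode.FireCFrameChanged)`. The `table.create` fallback exists only so the snippet also runs on stock Lua.

```lua
-- table.create is Luau-only; the fallback keeps the sketch runnable elsewhere.
local createTable = table.create or function(_capacity)
	return {}
end

-- Hypothetical queue object; `padding` is extra capacity added on reallocation.
local function newMoveQueue(padding)
	local queue = { count = 0, parts = {}, cframes = {} }

	function queue.push(part, cframe)
		local n = queue.count + 1
		queue.count = n
		queue.parts[n] = part
		queue.cframes[n] = cframe
	end

	-- moveFn(parts, cframes) performs the actual bulk move.
	function queue.flush(moveFn)
		local n = queue.count
		if n == 0 then
			return 0
		end
		moveFn(queue.parts, queue.cframes)

		-- Discard the old lists instead of clearing them entry by entry.
		queue.count = 0
		queue.parts = createTable(n + padding)
		queue.cframes = createTable(n + padding)
		return n
	end

	return queue
end
```

Because callers always go through `queue.push`, they pick up the replacement tables automatically after each flush, at the cost of one extra table lookup per call.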

Does it matter if the part has any relation to the WorldRoot? If so I may need to come up with a different solution for ViewportFrame characters.

This is great, but I’m hoping vertex deformation will eventually allow me to seriously cut back on the number of parts I’m using. I’m curious if updating bones could be optimized in a similar way.

Definitely take your time with this. There’s a lot of potential for great optimizations, but it’s also a slippery slope for API bloat long-term.
