Inline caching optimization unobserved?

Luau supposedly has inline caching, as stated here, but when I benchmark it, I don't see the optimization taking effect.

Results:

Code:

local Class = {}
Class.__index = Class

function Class.new()
	return setmetatable({ bool = true }, Class)
end

function Class:Method()
	self.bool = not self.bool
end

local new = Class.new
local count = 100000 -- assumed iteration count; the original post does not show its value

return {

	ParameterGenerator = function()
		return
	end,

	Functions = {
		["normal call"] = function(Profiler)
			for _ = 1, count do
				local c = Class.new()
			end
		end,
		["cached call"] = function(Profiler)
			for _ = 1, count do
				local c = new()
			end
		end,

	},
}

The results clearly show that the cached new function runs much faster than Class.new. Does that mean inline caching is doing absolutely nothing?


I went ahead and tested the global access chains optimization, which follows the same concept as inline caching (syntactically speaking, at least), and it works as advertised.

Results:

Code:

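-- localized references used by the "localized" case (not shown in the original post)
local create, insert, clone, clear = table.create, table.insert, table.clone, table.clear
local count = 100000 -- assumed iteration count

return {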

	ParameterGenerator = function()
		return
	end,

	Functions = {
		["inline"] = function(Profiler)
			Profiler.Begin("Create")
			local t = table.create(count / 2)
			Profiler.End()

			Profiler.Begin("Insert")
			for _ = 1, count do
				table.insert(t, true)
			end
			Profiler.End()

			Profiler.Begin("Clone")
			local clone = table.clone(t)
			Profiler.End()

			Profiler.Begin("Clear")
			table.clear(t)
			table.clear(clone)
			Profiler.End()
		end,
		["localized"] = function(Profiler)
			Profiler.Begin("Create")
			local t = create(count / 2)
			Profiler.End()

			Profiler.Begin("Insert")
			for _ = 1, count do
				insert(t, true)
			end
			Profiler.End()

			Profiler.Begin("Clone")
			local clone = clone(t)
			Profiler.End()

			Profiler.Begin("Clear")
			clear(t)
			clear(clone)
			Profiler.End()
		end,
	},
}

As that benchmark shows, localizing a global field to cache it is unnecessary, because that optimization already caches it for us.


So what is happening in the first benchmark? Is that not an example of what inline caching optimizes?

All benchmarks performed by Benchmarker Plugin

In the case of inline caching, it's only an optimisation that applies when you access a table with a constant string key (like Class.new) and certain conditions are met. The compiler generates a LOP_GETTABLEKS instruction, where C is the predicted node slot and the AUX data is the constant index of the string key. At run time, if the key is found in slot C, the value is read straight from that slot without a full hash lookup. If it isn't there and the table has no metatable, Luau does the regular lookup and, if the key exists, dynamically patches C so that subsequent accesses hit the expected slot. So inline caching is always in effect for accesses like Class.new; it makes the table access cheaper rather than removing it. "cached call" is faster because it doesn't pay the cost of accessing the table at all.
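To make that mechanism concrete, here is a toy model of a GETTABLEKS site with a cached slot, written in plain Lua 5.1 syntax (which Luau also accepts). Everything in it (newToyTable, slotOf, getTableKS and so on) is hypothetical: it is a simplification for illustration, not Luau's real table layout, hash function or VM code, and it ignores metatables entirely. It shows the key point: the cache turns a full lookup into a single slot check, but some lookup work still happens on every access.

```lua
-- Toy model of GETTABLEKS inline caching (hypothetical simplification,
-- NOT Luau's real data structures, hash function or VM code).

-- A "table" stores keys/values in a fixed number of slots (open addressing).
local function newToyTable(size)
	return { size = size, keys = {}, values = {} }
end

-- Hypothetical string hash mapping a key to a 1-based home slot.
local function slotOf(key, size)
	local h = 0
	for i = 1, #key do
		h = (h * 31 + string.byte(key, i)) % size
	end
	return h + 1
end

local function toySet(t, key, value)
	local slot = slotOf(key, t.size)
	-- linear probing until we find the key or an empty slot
	while t.keys[slot] ~= nil and t.keys[slot] ~= key do
		slot = slot % t.size + 1
	end
	t.keys[slot] = key
	t.values[slot] = value
end

-- Full (slow) lookup: probe from the key's home slot; returns the slot or nil.
local function toyFind(t, key)
	local slot = slotOf(key, t.size)
	for _ = 1, t.size do
		if t.keys[slot] == key then
			return slot
		elseif t.keys[slot] == nil then
			return nil
		end
		slot = slot % t.size + 1
	end
	return nil
end

-- One instruction site: a constant key plus a cached slot (playing the role of C).
-- Slot 0 means "no prediction yet", so the first access always misses.
local function newSite(key)
	return { key = key, cachedSlot = 0, hits = 0, misses = 0 }
end

local function getTableKS(site, t)
	local slot = site.cachedSlot
	-- fast path: the key sits in the predicted slot
	if t.keys[slot] == site.key and t.values[slot] ~= nil then
		site.hits = site.hits + 1
		return t.values[slot]
	end
	-- miss: do the full lookup, then patch the site so later accesses hit
	site.misses = site.misses + 1
	local found = toyFind(t, site.key)
	if found == nil then
		return nil
	end
	site.cachedSlot = found
	return t.values[found]
end

-- Usage: the first access misses and patches the site, every later one hits.
local toyClass = newToyTable(8)
toySet(toyClass, "Method", "method")
toySet(toyClass, "new", "constructor")

local site = newSite("new")
for _ = 1, 100 do
	assert(getTableKS(site, toyClass) == "constructor")
end
print(site.hits, site.misses) -- 99 hits, 1 miss
```

Even with a perfect hit rate, each iteration still performs a slot check, which is the work that "cached call" avoids entirely by keeping the function in a local.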

In the case of global access chains, it's an optimisation for globals with its own opcode and a constant to go alongside it. LOP_GETIMPORT has a fast path where the constant it uses is the already-resolved value being imported (e.g. the function itself). If safeenv is false, or the global being imported is not a built-in, the slow path is taken instead: the instruction's AUX data encodes up to 3 constant indices, one per name in the chain, and those are looked up at run time. When you do math.floor or similar, you will be using LOP_GETIMPORT's fast path, but when you do something like game.Players.LocalPlayer, you will be using the slow path. Either way, both paths are faster than not having global access chains and using LOP_GETGLOBAL.
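A similarly hedged toy model of the import mechanism, again in plain Lua 5.1 syntax: the chain is resolved once at "load time" when the environment is considered safe, after which every access is a plain constant load. Names such as newImportSite and getImport are hypothetical, and passing safeenv as a separate flag is a simplification of how the real VM tracks it.

```lua
-- Toy model of GETIMPORT (hypothetical simplification, NOT Luau's loader or VM code).

-- Walk a chain like { "math", "floor" } starting from the environment table.
local function resolveImport(env, path)
	local value = env
	for _, name in ipairs(path) do
		if type(value) ~= "table" then
			return nil
		end
		value = value[name]
	end
	return value
end

-- "Load time": if the environment is safe, resolve the chain once and keep the
-- result as the site's constant; the path is kept around for the slow path.
local function newImportSite(env, path, safeenv)
	local constant = nil
	if safeenv then
		constant = resolveImport(env, path)
	end
	return { path = path, constant = constant, slowPaths = 0 }
end

-- "Run time": the fast path is just a constant load; otherwise walk the chain again.
local function getImport(site, env, safeenv)
	if site.constant ~= nil and safeenv then
		return site.constant
	end
	site.slowPaths = site.slowPaths + 1
	return resolveImport(env, site.path)
end

-- Usage
local env = { math = { floor = math.floor } }

local fast = newImportSite(env, { "math", "floor" }, true)
local slow = newImportSite(env, { "math", "floor" }, false)
for _ = 1, 10 do
	assert(getImport(fast, env, true) == math.floor)
	assert(getImport(slow, env, false) == math.floor)
end
print(fast.slowPaths, slow.slowPaths) -- 0 and 10
```

Because the fast path already amounts to a constant load, localizing table.insert and friends leaves little to save, which is consistent with the second benchmark.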

A small note when doing benchmarks like this: µs is a microsecond! That's 0.001 ms, or 0.000001 seconds.

https://github.com/Roblox/luau/blob/master/VM/src/lvmexecute.cpp#L451-L478
https://github.com/Roblox/luau/blob/master/VM/src/lvmexecute.cpp#L415-L439
https://github.com/Roblox/luau/blob/master/VM/src/lvmexecute.cpp#L319-L348

Benchmarker uses os.clock for timings, which is only precise to about a single microsecond, so if your benchmarks report 0.X microseconds, your data is unreliable and you should wrap the code in a loop so that each measurement runs for much longer.
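A minimal sketch of that advice, in plain Lua 5.1 syntax: time a whole loop with a single pair of os.clock calls and divide by the iteration count, so the measured window is far larger than the clock's resolution. averageMicroseconds and ITERATIONS are hypothetical names, and Class is redefined here only so the snippet runs on its own.

```lua
-- Hypothetical helper: runs runLoop(iterations) once between two os.clock calls
-- and returns the average time per iteration in microseconds.
local function averageMicroseconds(runLoop, iterations)
	local start = os.clock()
	runLoop(iterations)
	local elapsedSeconds = os.clock() - start
	-- 1 second = 1,000,000 µs (and 1 ms = 1,000 µs)
	return elapsedSeconds * 1e6 / iterations
end

-- Same Class as in the first benchmark, redefined so this snippet is self-contained.
local Class = {}
Class.__index = Class

function Class.new()
	return setmetatable({ bool = true }, Class)
end

local new = Class.new
local ITERATIONS = 200000 -- hypothetical; large enough that each run lasts far longer than 1 µs

-- The loop lives inside the timed function, so no extra call is added per iteration.
local normal = averageMicroseconds(function(n)
	for _ = 1, n do
		local c = Class.new()
	end
end, ITERATIONS)

local cached = averageMicroseconds(function(n)
	for _ = 1, n do
		local c = new()
	end
end, ITERATIONS)

print(string.format("normal: %.4f µs/call, cached: %.4f µs/call", normal, cached))
```

Keeping the loop inside the timed function means the only per-iteration difference between the two runs is the Class.new table access itself.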
