Why does Random behave this way?

The time varies even more with only a single execution. I’d guess it’s probably down to differences in GC timings and allocations.

> for i=1,10 do local t=tick() for i=1,100000 do local t=v2.lerp end print(tick()-t) end
0.025724649429321
0.024367809295654
0.036292552947998
0.021488666534424
0.034527778625488
0.02149510383606
0.034675121307373
0.021718263626099
0.035181999206543
0.021867752075195

What if it has something to do with namecall?

Maybe before namecall existed, the functions were cached, either in a metatable via .__index or in each individual object (which is faster, but stores extra pointers to the functions per object, so it’s a bit more memory-expensive). But after namecall, perhaps they decided there was no need to store pointers to the functions on the Lua side at all: most people just use namecall, and those who do cache the functions will cache them themselves once (and only once). So there would be no need for the internal side of RbxLua to cache pointers to the functions in the metatables or objects, especially if those functions end up never being called without namecall.

Not at all the case. They aren’t being “cached”; the metatable is simply pointing to the same CClosure holding a C function.

It isn’t faster. Unless the pointers are absurdly far apart in memory (which I doubt is the case here, and doubt would cause an issue anyway), there is no point in this. The Instance functions had already been the same prior to __namecall, and __namecall depends on SELF anyway, so it’d have nothing to do with normal __index operations.

(In fact, the most they “cached” previously was putting proxy references to the functions in the global state, strictly for the same functions, and strictly for __index.)
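The point about the metatable simply pointing at one shared function can be shown in plain Lua. This is a hypothetical stand-in (a table instead of a real userdata, and a made-up lerp body), not the engine’s actual bridge code:

```lua
-- Sketch: every lookup of "lerp" through __index yields the same
-- shared function value -- nothing is "cached" per object, the
-- metatable just points at one closure.
local methods = {}
function methods.lerp(self, goal, alpha) -- one function, one copy total
	return self + (goal - self) * alpha
end

local mt = { __index = methods }
local v = setmetatable({}, mt)

print(rawequal(v.lerp, v.lerp)) --> true: same function object both times
```

For tables this lookup is cheap; the thread’s question is about userdata, where (for atomic classes) each __index lookup instead materializes a fresh closure.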

Quick note on the method resolution results you guys are seeing.

In our Instance bridge (that applies to all Instance-derived types, like Part etc.), we cache C closures we return as a result of __index lookup. This is a legacy optimization - in a way it’s been superseded by namecall. If you don’t do that, you get a closure allocation every time you query the element (since we push a C closure on the Lua stack) - so it was important for performance back in the day. With namecall, you don’t even go to __index, so it’s not that important - but it’s still in the code.
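The Instance-bridge behavior described above can be sketched in pure Lua. This is only an illustrative model (the real bridge does this in C and pushes C closures), with hypothetical names:

```lua
-- Sketch of the legacy optimization: the first __index lookup for a
-- method name creates the closure and caches it, so repeated lookups
-- return the same object instead of allocating a new closure each time.
local methodCache = {}

local mt = {}
function mt.__index(obj, key)
	local fn = methodCache[key]
	if fn == nil then
		-- the real bridge would push a C closure here
		fn = function(self, ...) end
		methodCache[key] = fn -- cached once, shared by all later lookups
	end
	return fn
end
```

Without the cache line, every `obj.Method` access would allocate, which is exactly the cost the atomic-class bridge still pays on __index.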

In our AtomicClass bridge (that applies to all non-Instance-derived types, like Vector3, Random, etc.), we never cached C closures in __index lookup - this is mostly because they didn’t go through a unified code flow so it was non-trivial for us to do this. When we added __namecall support we restructured the code to unify method resolution for atomic classes (that supports __namecall), but we didn’t go back and add __index caching because we figured __namecall is the dominant pattern anyway.

So that’s the inconsistency you’re seeing. Use __namecall :slight_smile: (as in, just call methods like v:lerp() and don’t try to cache them)
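Concretely, the recommended pattern versus the caching pattern looks like this (using the lowercase lerp member from the thread; this is Roblox-specific code and won’t run in plain Lua):

```lua
-- Roblox sketch: the two ways to call a method on an atomic class.
local a = Vector3.new(0, 0, 0)
local b = Vector3.new(10, 0, 0)

-- Recommended: goes through __namecall; __index is never consulted,
-- so no closure is materialized at all.
local mid = a:lerp(b, 0.5)

-- Caching: the .lerp fetch goes through __index, which (for atomic
-- classes) allocates a closure -- but only once if you cache it.
local lerp = a.lerp
local mid2 = lerp(a, b, 0.5)
```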


I’m pretty sure that, for me at least, namecall isn’t faster than caching the functions once (even for Instances). Do you plan on eventually removing the option to cache the functions entirely, or will the cached functions behave differently from the namecall ones?

also @Rerumu

Isn’t this caching then, because it’s storing a pointer to the “CClosure holding a C function”?

Also, storing values (pointers to CClosures holding C functions, in this case) inside a table, as opposed to in its metatable with .__index (you wouldn’t even need a metatable if you store them in the table itself, unless you need other metamethods), is faster, as can be seen here:

local tick = tick
local count = 1000000

do
	-- lookup goes through the metatable via __index
	local t = {}
	local mt = {}
	setmetatable(t, mt)
	mt.__index = mt
	mt.k = true

	local start = tick()
	for _ = 1, count do
		_ = t.k
	end
	print(tick() - start)
end

do
	-- lookup hits the field stored directly in the table
	local t = {}
	t.k = true

	local start = tick()
	for _ = 1, count do
		_ = t.k
	end
	print(tick() - start)
end

There’s almost a 2x speedup with per-table pointers.

That seems logical, because going off the model described in the previous post, if it weren’t this way, then every time you do a namecall (index + call) you’d be creating a new function.
But it seems illogical that it was never cached in the other non-Instance userdatas (I guess it’s because, as mentioned above, they “didn’t go through a unified code flow”?)

__namecall should be on par with calls via a cached function resolved via __index (it can be slightly slower since we do need to look up the method, and it can be slightly faster since stack setup is more optimal, and we might do somewhat more direct object calls for this in the future). We don’t really have plans to remove the function cache, but we might do it at some point - it shouldn’t affect performance for scripts that do cache the method (since the allocation will only happen once per large block of code anyway).
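If you do cache, the once-per-script pattern keeps the closure allocation to a single up-front cost, which is why it performs on par with namecall in a hot loop. A Roblox-specific sketch (lowercase lerp as used in the thread):

```lua
-- Cache once, at the top of the script: one __index lookup,
-- one closure allocation, total.
local lerp = Vector3.new().lerp

local a = Vector3.new(0, 0, 0)
local b = Vector3.new(1, 1, 1)

for i = 1, 100000 do
	-- plain function call; no further __index lookups in the loop
	local v = lerp(a, b, i / 100000)
end
```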


What?

But that’s not how it works; we’re talking about userdata, not other tables. The metamethod is always getting invoked in this case.
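The distinction can be demonstrated in Lua 5.1 (which Roblox is based on), using newproxy to get a bare userdata; this is a sketch, not engine code:

```lua
-- A table only consults __index when the key is *missing*; a bare
-- userdata has no fields at all, so __index runs on every access --
-- there is no "store the pointer in the object" option to benchmark.
local handler = function() return "via metamethod" end

local t = setmetatable({ k = true }, { __index = handler })
print(t.k)     --> true ("raw" hit; __index never runs)
print(t.other) --> via metamethod (key missing, so __index runs)

-- newproxy(true) is a Lua 5.1 extension returning a userdata with
-- a fresh metatable; it may not exist in other Lua versions.
if newproxy then
	local u = newproxy(true)
	getmetatable(u).__index = handler
	print(u.anything) --> via metamethod (every access goes here)
end
```

So the table benchmark above measures something userdata method resolution can never do.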
