Yeah, I suppose I worded that poorly; I’m glad you pointed that out. I meant to emphasize that it’s a little more descriptive about intent, e.g. the input is expected to be a table (which will have key-value pairs) vs any old iterable, which can definitely matter.
There’s definitely no objectively correct choice for tables because functionally they’re the same, and which one you use probably won’t really change how easy it is to scale up code or anything, but it might sometimes be preferable to use pairs because of its explicitness.
I am definitely happy all around with the feature so far.
I have a problem with the __iter metamethod’s implementation.
I have a sandboxing tool here which wraps around objects with metamethods. In order to do this though, I need to be able to essentially generate a copy of the original metamethod without touching that metamethod (I use this for values going in AND out of the sandbox, so some values I need to wrap are going to be unmanaged!)
With the __iter method, there does not exist any function which can return the generator, state, or index produced. pairs throws because the input is not a table, as I feel it should.
In the RFC, it is noted that in t is to in pairs(t) as t[index] is to rawget(t, index). This is true; however, t[index] can be invoked as an expression, whereas getmetatable(t).__iter(t) cannot be safely invoked as an expression. As a result I can’t access the generator, state, and index, so these values go unsandboxed, which gives users a way to define code in completely unmanaged space that can access unmanaged values, even from my own managed tables.
Additionally, I am even unable to do something smart like the following, wrapping a real iterator inside of a coroutine, and returning a function which advances the state by resuming it repeatedly. The generator can return a variable number of results but I can only capture a finite number of results.
This example, which mimics the structure I require, runs as expected, but only if the iterator uses two or fewer arguments. A vararg is not valid syntax. Very thankfully, iterators cannot yield, but if they could, this example would be invalid for that reason because it would cause the two iterators that end up processing to lose their synchronization.
local metatable = {}
local proxy = {}
-- Pseudo-code
local function getUnmanaged(value)
    return {
        abc = 123,
        cde = 234
    }
end
local function sandboxAllTheResults(...)
    return ...
end
metatable.__iter = function(sandboxed)
    local real = getUnmanaged(sandboxed)
    return coroutine.wrap(function(x)
        -- Instead of index, value, if a vararg (...) were placed here it would cause a syntax error
        for index, value in x do
            coroutine.yield(sandboxAllTheResults(index, value))
        end
    end), real
end
setmetatable(proxy, metatable)
-- cde 234
-- abc 123
for index, value in proxy do
    print(index, value)
end
So, there are two solutions that solve this:
Allow varargs in for loops
Provide a way to access the results of the __iter metamethod directly (Preferable to me since it allows me to manage the generator itself)
Why not? I’m a little confused at the description above, but short of tables with locked metatables (which you can’t introspect reliably, but neither can you introspect any other metamethod so I don’t see how you can wrap an object with a locked metatable in general), you should be able to return a proxy that forwards __iter. For example:
local function proxy(v)
    local function proxyiter()
        print("proxyiter")
        assert(type(v) == "userdata" or type(v) == "table")
        local mt = getmetatable(v)
        if mt and mt.__iter then
            return mt.__iter(v)
        else
            assert(type(v) == "table")
            return next, v
        end
    end
    return setmetatable({}, { __iter = proxyiter })
end
for k,v in proxy({1,2,3}) do
    print(k,v)
end
local mt = {}
function mt:__iter()
    local index = 0
    return function()
        if index >= self.count then
            return
        end
        index += 1
        return index
    end
end
for i in proxy(setmetatable({count = 3}, mt)) do
    print(i)
end
P.S. Maybe the confusion is that you aren’t sure how many results __iter can return, but it can return at most three, so if you want to wrap/proxy functions somehow you can instead do:
local gen, state, index = mt.__iter(v)
-- do some work on gen/state/index
return gen, state, index
The Lua iteration protocol, which __iter follows, only uses three values - generator, state, and index (which is fed into the generator repeatedly on every iteration and becomes the first loop variable). __iter doesn’t change that.
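To make the three-value protocol concrete, here is a minimal sketch of what a generic for loop does with the generator, state, and index. It borrows the triple from pairs purely for illustration; collect is a hypothetical helper name, and it keeps only the first two results of each generator call for brevity.

```lua
-- Hypothetical helper: drives a (gen, state, index) triple by hand,
-- mirroring what `for k, v in gen, state, index do` does.
local function collect(gen, state, index)
    local seen = {}
    while true do
        -- The generator is called with the state and the previous index.
        local k, v = gen(state, index)
        if k == nil then
            break -- a nil first result ends the loop
        end
        seen[k] = v
        index = k -- the first result becomes the next index
    end
    return seen
end

local source = { a = 1, b = 2, c = 3 }
-- For a plain table, pairs returns the triple: next, the table, nil.
local copy = collect(pairs(source))
print(copy.a, copy.b, copy.c)
```

Anything that wraps gen, state, and index before returning them only has to preserve this calling pattern.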
Thank you, this is exactly what it is, I never knew this somehow haha. I guess I’ve never actually tried to use more than three values from an iterator.
Basically, I just need to define some code that will invoke what every metamethod would normally do, and I need to cover every case, no matter the inputs/outputs. I don’t need any access to the metamethod at all, or even what it returns, as long as sandboxed code can’t access what it returns either.
For example, the functionality of each metamethod can be fully described like so:
__index - return target[index]
__call - return target(...)
__len - #target
__add - return target + subject
__newindex - target[index] = value (no return value)
etc, and apparently, since there are only three results, in this case:
__iter - for a, b, c in target do (with some coroutine magic)
What is nice about this is that even for something that isn’t a metamethod or behaves completely differently like __metatable or __mode, it generalizes:
__metatable - getmetatable(target)
__mode - nothing, you can’t access the value of __mode unless you have a reference to the metatable (just like any other metamethod)
All I need to do is guarantee I can invoke the above, and insert my own code before and after. This allows me to capture and modify the values entering the sandbox, and the values exiting, which essentially means I have complete control over everything the code running inside may/may not do.
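As a rough illustration of the "describe the operation, then hook both sides" idea, here is a minimal sketch covering only __index and __newindex. The names wrap, import, and export are hypothetical and are not the API of the actual sandboxing tool; the hooks just record that they ran.

```lua
-- Hypothetical hooks: a real sandbox would translate values here.
local log = {}

local function import(value) -- runs on values entering the sandbox
    log[#log + 1] = "in"
    return value
end

local function export(value) -- runs on values leaving the sandbox
    log[#log + 1] = "out"
    return value
end

-- Hypothetical wrapper: performs each operation on the real target,
-- with hooks applied to every input and output.
local function wrap(target)
    return setmetatable({}, {
        -- __index is described as `return target[index]`
        __index = function(_, index)
            return import(target[export(index)])
        end,
        -- __newindex is described as `target[index] = value`
        __newindex = function(_, index, value)
            target[export(index)] = export(value)
        end,
    })
end

local real = { abc = 123 }
local proxy = wrap(real)
local readBack = proxy.abc -- index goes out, result comes in
proxy.cde = 234            -- index and value both go out
print(readBack, real.cde, #log)
```

The proxy table itself stays empty, so every read and write keeps passing through the hooks.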
My usage of “safely” is very misleading and I didn’t really think it through haha. The thought process was return getmetatable(target).__iter(target) would describe the metamethod except when target had __metatable set on its metatable, and that is “unsafe” because I am not describing the metamethod in a way that I can manage the inputs and outputs. (So, I guess, it literally is an “unsafe” way to represent it in my sandbox, but that makes zero sense without any context whatsoever)
Thank you for the reply, I apologize for my confusion.
P.S.
Here is my current solution as implemented in my code, which I believe covers every case correctly now, if you’re curious about what I am actually even doing.
The rawequal check covers the fact that the methods being called (:Import()/:GetClean()) are capable of returning nil, and the result is what the value should be functionally equivalent to.
(Except for something functionally equivalent to pairs where the value becomes nil, but there wouldn’t really be a way I could solve this no matter what I do)
-- External -> Sandbox
self:CheckTermination()
self:TrackThread()
local real = self:GetClean(object)
return self:Import(coroutine.wrap(function(object)
    local real = self:GetClean(object)
    for index, value, extra in real do
        index = self:Import(index)
        if not rawequal(index, nil) then
            coroutine.yield(index, self:Import(value), self:Import(extra))
        end
    end
end)), self:Import(real)
-- Sandbox -> External
self:CheckTermination()
self:TrackThread()
local real = self:GetClean(object)
return coroutine.wrap(function(object)
    local real = self:GetClean(object)
    for index, value, extra in real do
        index = self:GetClean(index)
        if not rawequal(index, nil) then
            coroutine.yield(index, self:GetClean(value), self:GetClean(extra))
        end
    end
end), real
To cover cases that use getmetatable, rawset, rawget, etc, I just wrap functions too. I don’t even need to re-define them, it just works!
…except for code clarity. The reader does not have the entire type system in their head. They won’t always know the structure of the table, or the exact method of iteration that’s intended. Additionally, I don’t trust Luau to pick the right solution for every table. Omitting an explicit iterator function seems like it could have undesired effects at runtime. I don’t trust it.
So using pairs and ipairs explicitly has to do with intent and predictable behavior.
I wanted to address this because there are some misconceptions here - again, as said before, you’re of course free to continue using pairs/ipairs. But I think it’s important to clarify that, with respect to the intent you refer to, they are more significant as a stylistic choice - a possibly very reasonable one! - and not really something that has to do with predictability. The reason why I want to clarify this is that generalized iteration has somewhat different properties from what you imply:
Generalized iteration is not dependent on the type system, the results of type inference, etc. The type inference engine supports it for the sake of type checking, but if you were to completely disable the type checker and remove all type annotations, no program will change its iteration behavior as a result.
Generalized iteration cannot pick the wrong solution for a given table because it doesn’t pick. It uses a single, general iteration mechanism. This is important because nothing about it is less predictable or less trustworthy than pairs/ipairs - it doesn’t select one of pairs/ipairs based on what it feels like, it’s a single algorithm.
The algorithm generalized iteration uses is a perfect superset of iteration behavior specified by pairs/ipairs in Lua. It will traverse all key/value pairs (just like pairs), but unlike pairs (which in Lua doesn’t specify the iteration order), it guarantees the iteration order for elements with indices 1..#t (which, by definition of #t, will traverse all the elements traversed by ipairs in the same order).
The cases where the behavior is going to be different between generalized iteration and ipairs are when you have holes in the array portion of the table - ipairs stops at the first hole, generalized iteration continues - or if you have a mixed table. However, this is usually an odd side-effect of ipairs, and you’d rarely want to intentionally iterate over mixed tables or tables with holes with ipairs, which is what the statement “you likely don’t need to use pairs/ipairs anymore” is based on.
So, it’s fine to prefer pairs/ipairs because you want to signal the intent to the reader of the code, but it’s usually superficial because generalized iteration can’t suddenly decide it wants to iterate over your table in a way that you didn’t expect, and it will always traverse the same elements that ipairs traverses in the same order - it will simply also traverse all other elements after that.
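The difference with holes and mixed tables can be sketched with a small counting helper. This example deliberately uses only pairs and ipairs so that it also runs in plain Lua; per the description above, Luau’s for k, v in t do visits the same entries that pairs does here. The name count is a hypothetical helper.

```lua
-- Hypothetical helper: counts how many entries an iterator triple visits.
local function count(gen, state, index)
    local n = 0
    for _ in gen, state, index do
        n = n + 1
    end
    return n
end

local holes = { 1, 2, nil, 4 }           -- hole at index 3
local mixed = { 10, 20, 30, name = "x" } -- array part plus one string key

-- ipairs stops at the first hole; pairs keeps going past it.
local ipairsHoles = count(ipairs(holes))
local pairsHoles = count(pairs(holes))

-- ipairs never sees the string key; pairs does.
local ipairsMixed = count(ipairs(mixed))
local pairsMixed = count(pairs(mixed))

print(ipairsHoles, pairsHoles) -- 2 3
print(ipairsMixed, pairsMixed) -- 3 4
```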
My take on this is a lot simpler: if you’re iterating over a table that has both an array part and a hash part, 99% of the time that’s unintentional and you have a bug. In the case where you have a bug, using ipairs or pairs is not going to fix it; the bug will still be there. If anything, you’d rather the bug show itself as early as possible in the code so that you can fix it, and hiding it with ipairs is counterproductive in that regard.