All you need to know about Iterator Functions

starmaq · May 13, 2020, 4:44pm

In this article we’ll be covering an important chapter of the lua pil, iterator functions. And for those wanting to dash through this, here is a tl;dr version.
You might’ve encountered this term before, when mentioning either pairs or ipairs.

`I. Introduction`

First of all, I want you to change the way you view a generic for loop. A generic for loop is the for ... in x loop. Most of the time, you’d find yourself writing this.

for i, v in pairs(t) do

end

where t is the table you wanna loop through (or in otherwords, iterate through). Most beginners learn this and just think it’s a solid set of keywords including pairs that can’t be changed (e.g. calling this last snippet of code an in pairs loop), and same thing when they encounter ipairs. Would you be suprised if I tell you that pairs (and ipairs) is just a function, and that you can put any other function you made instead of it. Earlier I described the generic for loop as

for ... in x do

end

I think this is a brilliant way to think about it. ... and x are the only independant parts that can be changed. x is the function (and to not cause confusion, it’s not the iterator function here, we’ll talk about that later) that takes the table to iterate through as input. ... are the pieces of info that the function gives back (it’s not actually the function that gives those pieces of info, it’s the iterator function that does, again we’ll talk about it later), for example pairs gives back the current value’s index (commonly called the variable i) and the current value (commonly called the variable v), you can actually make a function that gives back more than 2 pieces of info, or even less!

Now let’s talk about how you would write a function like these ones that can be used in a generic for loop.
Something that you might’ve knew before is next, which is also another function. pairs actually returns next, next is the iterator function here.

Ok so wait, let’s wrap it up. The function x that we want to write needs to return a function, and not any function, an iterator function. An iterator function determines the next value to be returned from the inputted table given the previous index, you can see that’s actually how next works, it takes two inputs, the table being iterated and the current and the previous index, if I wanted the 3rd index, I do next({1,3,4,5}, 2) which is basically saying I was previously at the 2nd index get me the value from the next index. And that’s how the generic for loop works, it uses the iterator function returned from x, starting with 0 (it starts with 0 so at the first iteration the determined value is 1, I hope that makes sense) and keeps on incrementing that value until the iterator function returns nil, because technically if we had a table {1,2,3,4} and we said I was previously at 4, it will give the 5th value which doesn’t exist, in otherwords it’s nil, thus nil is returned and the loop stops. As you can see we just provide an iterator function and the generic for loop does the repeating job.
Also, if the returned function is called an iterator function, then what is x called? It’s actually called a factory, in lua, a factory is a function that returns a function, just like in our case. You’ll see me use this term a lot.

Let’s write some code to see what’s happening. I’m gonna implement something simple to loop through a table.

local function xpairs(t) --I called the factory xpairs
    local idx = 0
    local function iterator()
        idx = idx + 1
        return t[idx]
    end
    return iterator
end

You can see exactly what we talked about put into action. The factory is called xpairs and the iterator function it returns is called iterator. I keep a variable called idx, this is our state, it’s used to keep track of the previous index, you can see at the start idx is 0 and not 1, because iterator increments before returning the value, so at the first iteration the index is 0+1 thus 1. The way iterator is written is pretty straight forward, it increments the previous index by 1, and returns whatever value is at that index.

for v in xpairs({"hi",true,3,4}) do
    print(v) -- actually prints the elements of the table sequently
end

v is the current element returned by iterator, you can see here we only have one piece of info which is the current value, unlike next which returns the current index and its value. If I wanted to do that, I would replace the return line by return idx,t[idx] and additionally check if we hit the table’s length yet (e.g. idx == #t) because t[idx] would be nil if we’re out but idx is gonna be always an increasing number thus not only nil is returned and the loop keeps on going.

`II. Iterator examples`

Now that we know how to properly write iterators, we can write even more iterators!

Let’s start by trying to make a replica of pairs. Wait, haven’t we already done that with xpairs? Well there are some key features that pairs, or I should actually say next, and we don’t. For example, with next, if you had a table like this {1, 2, nil, 3}, it actually traverses through the whole table ignoring nil at index 3 and carrying on later to the 4th index (which is 3) and stops. With our iterator in xpairs, we would stop as soon as we return nil, when we get to index 3 it returns nil and stops looping even though there is more ahead. How would we do that? Well since #{1,2,nil,3} is actually 4 and not 3, we can simply check if the current value is nil and we still haven’t hit the length of the table yet (e.g. idx ~= #t), if so, we increment idx yet again, until there is no nil and we hit the table’s length.

local function xpairs(t) --I called the factory xpairs
    local idx = 0
    local function iterator()
        idx = idx + 1
        while t[idx] == nil and idx ~= #t do --as long as the value is nil and we're not done
          idx = idx + 1 --keep on adding one, this while loop will stop as soon as we hit a non-nil value 
        end
        return t[idx]
    end
    return iterator
end

And you’ll see that it works! What about keys? next can traverse the dictionarry part of a table, how would we do that? Well, unfortunately, we can’t. There isn’t really a way to get back the keys of a table in plain lua. How does next do it then? Let me remind you that any gloabl lua function (e.g. print, unpack, pairs…) is written in the C-side of lua, meaning it’s written in the C language, which means they might have stuff that we don’t have access to in lua.

What if I wanted to implement ipairs? Well funny enough the first xpairs we wrote does exactly what ipairs does! ipairs will iterate through only the array part of the table (meaning the values with numerical indices) and ignore the dictionarry part (keys), which is also what we’re not doing. Also, ipairs stops as soon as it hits a nil value, even if there is more ahead, which is also what we were doing. ipairs doesn’t return next, it returns a different iterator, which is obvious since as you can see it has a different behaviour. We don’t really know the name of the iterator that it returns.

Let’s try implementing more stuff!

Let’s make an iterator that iterates through every character in a string.

local function spairs(str) 
  local idx = 0 
  local function iterator()
    idx = idx + 1
    if idx <= #str then --as long as we didn't hit the string's length, if we did then the iterator will return nothing, in otherwords return nil and the loop stops
      return idx, string.sub(str, idx, idx) --this is a way to return a certain character of a string, I hope you understand it
    end
  end
  return iterator
end

for i, char in spairs("starmaq101") do
  print(i, char)
end

string.gmatch is a factory in fact, and it returns an iterator which iterates through every matching string with the pattern you gave. Info

for match in string.gmatch("hi hiya hiyo", "hi") do

end

How would we create one? Well very simple. Utilising the third argument of string.match, which is from where to start searching for matches, we can keep a state variable just like idx, we would check the first time for any matches with the given pattern, if one exists we would keep track of where that match ended, and return it, and keep on searching depending on the state variable.

local function gmatch(str, pattern) --you can pass more than a table
  local last = 1 --this is our state, from where we should search
  local function iterator()
    local match = string.match(str, pattern, last) --this is the current match
    if match then --if a match existed, then we'll return it
        last = select(2,string.find(str, match, last))+1 --this is how I determine the next position to search from, string.finding where the current match using the previous last as the third argument, string.find returns two things, the start position of the found match and the end position as the second reutnred value, I use `select` to select the returned value I want, which is the second, hence I do select(2, the two values string.find returns), and of course I add 1 to it so we start searching after it
        return match
    end
  end
  return iterator
end

for match in gmatch("ghi, hi", "%l+") do --%l+ is a string patterns, it's basically saying grab each set of letters, which are ghi and hi
  print(match)
end

And you can get creative with this however you want! What if I wanted to make a an iterator which only goes through string values of a table

local function OnlyStrings(t) 
  local idx = 0
  local function iterator()
    idx = idx + 1
    while type(t[idx]) ~= "string" do
      idx = idx + 1
    end
    return t[idx]
  end
  return iterator
end

for v in OnlyStrings({2,"hi",true,"hgf","kno"}) do
    print(v)
end

What about an interesting iterator

local GetProperties(obj)
  local properties = {"Name","Anchored", "Transparency"} --keep a list of the properties that you wanna include when looping
  local idx = 0 
  
  local function iterator()
    idx = idx + 1
    local property = properties[idx] --the current property
    return property, obj[property] --return the property's name and the property's value, obj[property] is basically the same as obj.property (e.g. obj.Transparency can be written obj["Tansparency"])
  end
  return iterator
end

for property, value in GetProperties(workspace.Part) do
  print(property, value)
end

Anyways, let’s continue!

`III. Stateless iterators`

This is going to be the most important section, so buckle up.

I’m gonna bring up something interesting, let’s try printing what pairs returns.

print(pairs({1,2,3,4}))

And the result is

function: 0x1016460 table: 0x101c350 nil

Wait… It returns three things? Did I lie to you this whole time by telling you that it returns an iterator function and just that?

Well, this whole time we were using stateful iterators. Stateful iterators are iterators that keep their state. What does that mean? It’s the idx variable we always used. With stateful iterators, you keep your state internally, you’re the one who takes care of it by creating it and incrementing it. With stateless iterators, you don’t keep your state, it’s given to you. Given by who? Well the generic for loop, but we’ll talk about that later. Let’s demonstrate what I mean.
I’m gonna define a simple factory and iterator function. I’m not gonna use them in a for loop, I’m just gonna use the iterator function returned manually to see its behaviour.

local function xpairs(t) 
    local idx = 0
    local function iterator()
        idx = idx + 1
        return t[idx]
    end
    return iterator
end

local iter = xpairs({1,2,3,4})
print(iter()) --1
print(iter()) --2
print(iter()) --3

As you see, this is a stateful iterator, it keeps its state internally. Whenever we call it we don’t need to pass an argument or anything, it just knows what value to return next because it’s the one keeping the state.

I think I made it clear earlier that pairs is -oh wait- next is a stateless iterator. Remember at the start, next needs to get passed the previous index, meaning the state, as a second argument. It’s statless! It doesn’t save the state, it’s given to it.

Ok so we just talked about all this theoretically, let’s write actual code. I’m gonna forget about pairs because it’s a bad example, I’m gonna write my own factory with a stateless iterator. As we’ve seen with pairs, a factory with a stateless iterator returns three things: the stateless iterator itself, the invariant which is basically the inputted table to iterate through (the table won’t be affected or changed which is why it’s called the invariant) and the control variable which is technically speaking our state, at first we need to return the initial starting control variable, basically the starting value for the state which in most cases we covered is 0. In our factory, we need to return these three things for the generic for loop to use. In return, each iteration the stateless iterator gets passed the invariant and previous control variable as arguments (exactly why it’s stateless) in order to determine the next value, increment the control variable, and, this is important, it has to return the incremented control variable which will be passed as the control argument in the next iteration and the current value as a second value, which means it is essential to return the current index/state before any other value. And just like stateful it will keep on looping until the iterator returns nil. Let’s re-write our xpairs so it returns a stateless iterators.

local function xpairs(t)
  local function iterator(invariant, control) --each iteration, the table and the previous control variable are passed as arguments
    control = control + 1 --we increment the control variable each time, at the first iteration the control variable will be 0, we increment it so it becomes 1 use it and return it, so in the next iteration we get passed 1 increment it again so it's 2 and so on.
    if control <= #invariant then --we have to add this now beacuse we are returning the index as I explained at the start
        return control, invariant[control] --we return the current index so it's passed again in the next iteration and the pieces of info which is the value in this case
    end
  end
  return iterator, t, 0 
end

for v in xpairs({1,2,3,4}) do
  print(v) -- actually prints the elements of the table sequently
end

Fascinating!
If we were to break down what’s happening, the factory xpairs first returns the invariant which is {1,2,3,4} and initial control variable which is 0, so {1,2,3,4} and 0 are passed to iterator, iterator increments control variable, returns and returns the current value, and the next iteration the incremented control variable returned from the previous iteration is passed again which is now 1, it’s incremented, returned, and you get the picture.

When I said the stateful version of xpairs is a replica of ipairs I kind of lied because ipairs returns a stateless iterator just like pairs. The above code is actually how ipairs is written. string.gmatch for example returns a stateful iterator. How do I know that? Well if you printed what it returns you can see it just returns a function and nothing else.

Sometimes you’d see people writing something fancy like this

for i, v in next, {1,2,3,4}, nil do
  
end

These are really just the values that pairs({1,2,3,4}) would’ve returned, but instead of calling pairs to get those we write them straight away, it’s just some way to pretend to be like an epic scripter and confuse beginners. Note that the , nil could’ve been omitted. pairs is really just written like this, next does the juicy stuff.

local function pairs(t)
    return next, t, nil
end

Also don’t you find it confusing that pairs returns nil as an initial control variable? That’s why I said it was a bad example. Well pairs is written in the C-side as I said so we can’t really know what’s happening. But logically, it does this so it can make looping through keys and numerical indices at the same time easy or something.

Also, what about stateful iterators? When a factory returns only a stateful iterator doesn’t that disrespect the fact that the in x do part takes three values? Well really, if it’s not returning anything for those two values, it’s kind of like returning nil, so basically nil and nil are being passed to the iterator function, and since we are not using them anyways it’s not a problem.

So, which is better to use, a stateful iterator or a stateless iterator? Well use which one you find fitting. It’s a choice really. Personally I find stateless iterators cooler due to the notion of being passed the previous index. The only disadvantage to stateless iterators, as I said earlier, you are forced to return an index (state) first along with a value, if you wanted to make an iterator that only returns the value and not the index well you have to use statefuls.

Anywho! Let’s move on and try to re-write some of the previous iterators we wrote earlier to transform them into stateless ones. The loop through a string one.

local function spairs(str)
  local function iterator(inva, ctrl)
    ctrl = ctrl + 1 
    if ctrl <= #inva then
      return ctrl, string.sub(inva,ctrl,ctrl)
    end
  end
  return iterator, str, 0 
end

for i, char in spairs("starmaq101") do
  print(i, char)
end

The OnlyStrings one

local function OnlyStrings(t) 
  local function iterator(inva, ctrl)
    ctrl = ctrl + 1
    while type(t[ctrl]) ~= "string" do
      ctrl = ctrl + 1
    end
    return ctrl, t[ctrl]
  end
  return iterator, t, 0
end

for i, v in OnlyStrings({2,"hi",true,"hgf","kno"}) do
    print(i, v)
end

For practice try re-writing some of these.

`IV. Iterators with complex state`

With stateless iterators, you can only pass two things as an argument to the iterator which are the invariant and the control variable. What if you wanted to pass more info? For example let’s say you have an iterator which will loop through a table and stop until it’s done iterating through all the elements or it iterated through 4 odd numbers that were inside of that table. You need to keep a state to keep track of how many odds there have been. Well we can do what we do with stateful iterators and store that state within the factory (an upvalue would be a better term).

local function LoopUntil4Odds(t) 
  local odds = 0 
  
  local function iterator(inva, ctrl)
    ctrl = ctrl + 1 
    
    if odds ==4 then
      return 
    end 
    
    if ctrl <= #inva then 
      if inva[ctrl]%2 ~= 0 then --if it's odd
        odds = odds + 1
      end
      return ctrl, inva[ctrl], odds --additionally I return how many odds there are each time
    end
  end
  return iterator, t, 0
end

But we don’t wanna do that, there is a mix of statefulness and statelessness in there, we want to be a fully stateless iterator.

Well, I hope you realise that the invariant and control variable arguments that are passed can be anything, they don’t need to be precisly a table and a number. Using that fact, can we pack the invariant and control variable into one thing, say a dictionarry, so it’s like {inva = t, ctrl = 0}, this dictionarry will be returned by the factory instead of the invariant, and nil is returned instead of the control variable. That dictionarry will be passed to the iterator as an argument instead of the invariant. Let’s see what I’m talking about.

local function xpairs(t)
  local function iterator(dict) --dict is what's in the invariant's place which is the dictionarry
    dict.ctrl = dict.ctrl + 1 
    if dict.ctrl <= #dict.inva then 
        return dict.ctrl, dict.inva[dict.ctrl]
    end
  end
  return iterator, {inva = t, ctrl = 0}, nil --the initial values are put into a dictionarry, note that the ", nil" can be omitted
end

As you can see, {inva = t, ctrl = 0} is gonna be passed as an invariant each time, and each iteration we increment the ctrl key by 1, and since the invariant doesn’t change there is no problem. This is actually a way to bypass the fact that you need to return an index with a stateless iterator.
Remember that the invariant and the control variable are both considered states, here instead of having two seperate states, we have one complex one.

We can definitely pack more info into that dictionarry. We can store more states into it, like the odds state. It’s initial value is gonna be 0 of course.

local function LoopUntil4Odds(t) 
  local function iterator(dict)
    dict.ctrl = dict.ctrl + 1 
    
    if dict.odds == 4 then
      return 
    end 
    
    if dict.ctrl <= #dict.inva then 
      if dict.inva[dict.ctrl]%2 ~= 0 then
        dict.odds = dict.odds + 1
      end
      return dict.ctrl, dict.inva[dict.ctrl], dict.odds
    end
  end
  return iterator, {inva = t, ctrl = 0, odds = 0}, nil
end

for i, v, odds in LoopUntil4Odds({1,2,3,4,5,6,7,8,9}) do 
  print(i, v, odds)
end

`V. True iterators`

Isn’t it weird that we call iterator functions iterators even though they aren’t the ones doing the iteration, the generic for loop does . The iterator function just determines what value to return next, the generic for loop uses it to do the iteration. That’s why the lua PIL prefers to give the iterator name to the generic for loop. What would we name the previously-named iterator function then? Well the lua PIL likes to call them generators, because they generate the next value, and this naming convention is present in many languages (e.g. python’s generators or generator functions).

Did you know that at a certain point in lua generic for loops didn’t exist yet? You had to use table.foreach (table.foreachi exists as well which is equivilant to ipairs) instead. Essentially, it takes two parameters, the table to iterate through, and a function to apply to each element. So if you had something arbitrary like this in the modern lua

for i, v in pairs(t) do --where t is some table
  if v > 10 then 
    print(tostring(v).." is greater than 10")
  else
    print(tostring(v).." is less than 10")
  end
end

This almost looks like that the body of the for loop is a function where each element is passed as an argument to that function each time. If I were to write this using table.foreach, I would do this

local function f(i, v) --i is the index of the element each time and v is the elemet from the table each time
  if v > 10 then 
    print(tostring(v).." is greater than 10")
  else
    print(tostring(v).." is less than 10")
  end
end

table.foreach(t, f) --where t is some table

table.foreach does the iteration on its own, it doesn’t need a for loop to do so. Cool right? For that reason table.foreach and table.foreachi are called true iterators, they are actual iterators that do the iteration on their own, unlike the iterators we have been talking about this whole time that are just plugged in a generic for loop.

That’s it! I hope you found this article informative (without doubt it is) and as usual, have a wonderful day!

TheCarbyneUniverse · May 13, 2020, 6:25pm

Excellent tutorial!

I never knew knew the depth of iterator function beyond the good ol’ for i, v in pairs() do thing.

Keep up the great work with these scripting tutorials!

iGiDgames · June 1, 2021, 6:46pm

This type of info is the stuff that differentiates intermediates from professionals. This is a great explanation of a wildly un-discussed topic, that many developers don’t know, including myself up until this point. If you look at my past topic history, you’ll see that I recently asked a question about THIS VERY THING, but the responses I got were lackluster and/or confusing. This tutorial was a god send. good work and thank you.

Syntheix · January 3, 2022, 9:14pm

I was reading through this a while back and I always wondered how idx is still able to be accessed even though we are only returning the iterator and not including idx. Is it due to the closures generated by functions in Lua?

starmaq · January 5, 2022, 11:47am

Yeah pretty much, idx becomes what’s called an upvalue, it is enclosed by the closure formed by returning iterator, it can still be accessed. This is called static scoping.

firedup13579 · July 13, 2022, 7:05pm

This was an amazing read. Can you give an example of an application that needed you to create an iterator function?

starmaq · July 14, 2022, 12:31pm

They’re mostly useful in two cases, if you want a special way to parse a table that follows a certain format.
For example you table looks like {{Name = "hi", Rarity = 5}, {Name = 8, Rarity = 6}}, say you want to only loop through the names of these items. You can make an iterator for that, although personally this is a trivial example and I would just loop through the items and index the Name. For more complicated tables this is a nice idea.

Another interesting use case is maybe extracting other things from a piece of info and looping through it. Say you want to generate all the permutations that a string of characters can have (e.g. an iterator permutations() that given “ABC” would make you loop through “ABC”, “ACB”, “CAB”, “CBA”, “BCA”, “BAC”), you can make an iterator for this. This is kind of a niche use case. Here is more info about this.

I’d like to add that perhaps all this is somewhat obsolete, roblox added a new somewhat obscure feature lately that is still hiding in the shadows from what I’ve seen, which is generalized iteration. Basically you can now define the default iteration behavior for your tables via a metamethod called __iter, so basically the generic for loop syntax becomes for info1, info2 in table_here do, without a pairs, or an ipairs, or any sort of fancy factory/iterator. You just define that table’s iteration behavior via a metamethod.

You want to use this if your table will probably have a single default iteration behavior that you’re always gonna use. You can define additional iterators for it, but for the default iteration behavior, you may want to change it with __iter, more more info.

I need to update the article informing about this perhaps.

firedup13579 · July 15, 2022, 9:10pm

Thank you so much! I’ve had this article bookmarked for a while because it seemed like an obscure, yet really useful topic.

starmaq · July 15, 2022, 9:33pm

Indeed, it’s really if you want to write fancy code, which for production code is mostly useless unless you want to make a rather robust codebase. It’s a good idea to make iterators for a public module of yours for example to facilitate some features for users.