Introduction
Lexer
In the last tutorial we created a lexer to convert strings into a table of tokens. Now we’re going to use these tokens to prepare to evaluate expressions.
What is a parser?
A parser analyzes the tokens produced by a lexer and typically generates a parse tree. In our mathematical evaluator we won't be creating a parse tree; instead our parser will use the Shunting-yard algorithm to convert from infix to postfix notation. It's worth noting that it's possible to generate an abstract syntax tree with this algorithm, but we'll only be using it for notation conversion. This will make evaluating the expression extremely easy.
What is infix and postfix notation?
Infix notation is when operators are placed between operands. Think of 3 + 6. The + operator is placed between the operands 3 and 6. This is harder for a computer to evaluate because it's ambiguous: the expression alone doesn't tell the computer anything about operator precedence or associativity. Postfix notation solves this.

Postfix notation places operators after their operands. For example, 3 + 6 becomes 3 6 +. This is far easier for a computer to evaluate and the ambiguity is eliminated. Here's a great graphic demonstrating the difference between the notations:
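To make the difference concrete, here's a quick example of our own with mixed precedence (not part of the graphic above):

	Infix:   3 + 6 * 2
	Postfix: 3 6 2 * +

Reading the postfix form left to right, 6 2 * is applied first (giving 12) and then 3 12 + (giving 15), so the multiplication naturally happens before the addition without consulting any precedence rules.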
Creating Stack and Queue Collection Classes
For the Shunting-yard algorithm to work we need to create two collection classes that Lua doesn't natively provide: a stack and a queue. These are built into many other programming and scripting languages, but Lua only gives us tables, so we'll build them ourselves.
Creating the Stack Module
Think of a stack like a Lua table, except you can only push items onto the top and remove them from the top (last in, first out). This is very simple to create but will be very useful when writing our parser.
--[[
	Stack

	Lua implementation of a stack.
]]
local Stack = {}
local StackClass = {}

--[[
	Create a new stack
]]
function Stack.new(initial: {any}?)
	local self = setmetatable({}, {
		__index = StackClass,
		__tostring = function(t)
			return string.format('Stack(%d)', #t._raw)
		end,
	})
	self._raw = initial or {}
	return self
end

--[[
	Push a value onto the stack
]]
function StackClass:push(value: any)
	table.insert(self._raw, value)
end

--[[
	Pop a value from the stack
]]
function StackClass:pop()
	return table.remove(self._raw)
end

--[[
	Get the stack
]]
function StackClass:get()
	return self._raw
end

--[[
	Get the top of the stack
]]
function StackClass:top()
	return self._raw[#self._raw]
end

--[[
	Check if a stack is empty
]]
function StackClass:isEmpty()
	return #self._raw == 0
end

return Stack
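Here's a quick sanity check you can run. The require path is just an example; point it at wherever you placed the module.

	local Stack = require(script.Stack) -- hypothetical path, adjust to your setup

	local s = Stack.new()
	s:push(3)
	s:push(6)
	print(s:top())     --> 6 (peek, doesn't remove)
	print(s:pop())     --> 6
	print(s:isEmpty()) --> false (3 is still on the stack)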
Creating the Queue Module
A queue is another type of collection. It works the other way around: items are added at the back and removed from the front (first in, first out). This is even simpler than a stack to create.
--[[
	Queue

	Lua implementation of a queue.
]]
local Queue = {}
local QueueClass = {}

--[[
	Create a new queue
]]
function Queue.new(initial: {any}?)
	local self = setmetatable({}, {
		__index = QueueClass,
		__tostring = function(t)
			return string.format('Queue(%d)', #t._raw)
		end,
	})
	self._raw = initial or {}
	return self
end

--[[
	Enqueue a value
]]
function QueueClass:enqueue(value: any)
	table.insert(self._raw, value)
end

--[[
	Dequeue a value
]]
function QueueClass:dequeue()
	return table.remove(self._raw, 1)
end

--[[
	Get the queue
]]
function QueueClass:get()
	return self._raw
end

return Queue
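And a quick check for the queue (again, the require path is just an example):

	local Queue = require(script.Queue) -- hypothetical path, adjust to your setup

	local q = Queue.new()
	q:enqueue('first')
	q:enqueue('second')
	print(q:dequeue()) --> first (removed from the front)
	print(q:dequeue()) --> second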
With these collection classes created we're now ready to start making our parser.
Building the Parser
Parser class
Similar to our lexer, the parser will use an OOP approach. We'll start by requiring the Stack and Queue modules we just wrote and declaring the Parser table.
-- The Stack and Queue modules we wrote above (adjust the paths to your setup)
local Stack = require(script.Stack)
local Queue = require(script.Queue)

local Parser = {}

function Parser.new(lexer)
	-- The passed lexer must already have tokens
	assert(lexer.tokens ~= nil and (#lexer.tokens ~= 0), 'Cannot parse an empty lexer')
	local self = setmetatable({}, {__index = Parser})
	self.lexer = lexer -- Save the lexer for later
	self.position = 0 -- Store position of current token
	self.queue = Queue.new() -- Create a new queue
	self.stack = Stack.new() -- Create a new stack
	return self
end
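Creating a parser from a tokenized lexer looks like this, using the Lexer.new() and :tokenize() calls from the last tutorial, which populate the .tokens table the assert above checks:

	local lexer = Lexer.new('3 + 6')
	lexer:tokenize()

	local parser = Parser.new(lexer) -- would error if :tokenize() hadn't been called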
Operators, precedence, and associativity
As defined before, operators are characters such as + and -. Each operator has a certain precedence (think PEMDAS like you learned in school) that defines its priority. Here's a graphic with a good example of how precedence works:

Associativity is a bit more complicated. Operators are either right-to-left (RTL) or left-to-right (LTR) associative, which defines how a chain of operators with the same precedence is grouped and evaluated. In our math evaluator every operator we define for now is LTR, but we'll build in support for RTL operators (such as ^ for exponents) as well.
Let’s define what tokens are operators, the precedence of the operators, and the associativity of the operators in our parser.
-- Operators and their token names
local OPERATORS = {
	'PLUS',
	'MINUS',
	'ASTERIK',
	'SLASH'
}

-- Higher numbers = more priority
local PRECEDENCE = {
	['PLUS'] = 0,
	['MINUS'] = 0,
	['ASTERIK'] = 1,
	['SLASH'] = 1
}

-- 0 = LTR
-- 1 = RTL
local ASSOCIATIVITY = {
	['PLUS'] = 0,
	['MINUS'] = 0,
	['ASTERIK'] = 0,
	['SLASH'] = 0
}
Notice how we’re now writing out each operator rather than using their symbol/character such as +
or -
. This is because our parser exclusively works with tokens and not characters like our lexer.
We’ll need utility functions to determine if a token is an operator.
local function isOperator(token)
	return table.find(OPERATORS, token[1]) and true or false
end
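For example (assuming, as the rest of the parser does, that each token is a two-element table of { tokenName, rawValue }):

	print(isOperator({'PLUS', '+'}))   --> true
	print(isOperator({'NUMBER', '3'})) --> false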
Parsing
We’re nearly ready to start parsing our tokens. We need a way to remove from the initial token onto the next until we reach the end of the table. We’ll write a :next()
method similar to that in our lexer.
function Parser:next()
	-- Don't go past the last token
	if (self.position == #self.lexer.tokens) then
		self.token = nil
		return
	end

	-- Increase position & set the new current token
	self.position += 1
	self.token = self.lexer.tokens[self.position]
end
Now we’re ready to write our :parse()
method where the magic happens. Let’s look at some pseudocode to better understand how the parser will work.
while there are tokens to be read:
	read a token
	if the token is:
	- a number:
		put it into the output queue
	- an operator o1:
		while (
			there is an operator o2 other than the left parenthesis at the top
			of the operator stack, and (o2 has greater precedence than o1
			or they have the same precedence and o1 is left-associative)
		):
			pop o2 from the operator stack into the output queue
		push o1 onto the operator stack
	- a left parenthesis (i.e. "("):
		push it onto the operator stack
	- a right parenthesis (i.e. ")"):
		while the operator at the top of the operator stack is not a left parenthesis:
			{assert the operator stack is not empty}
			/* If the stack runs out without finding a left parenthesis, then there are mismatched parentheses. */
			pop the operator from the operator stack into the output queue
		{assert there is a left parenthesis at the top of the operator stack}
		pop the left parenthesis from the operator stack and discard it
/* After the while loop, pop the remaining items from the operator stack into the output queue. */
while there are tokens on the operator stack:
	/* If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses. */
	{assert the operator on top of the stack is not a (left) parenthesis}
	pop the operator from the operator stack onto the output queue
So let’s get to turning this pseudocode into actual code.
function Parser:parse()
	self:next()

	while (self.token) do
		local token = self.token

		if (token[1] == 'NUMBER') then
			self.queue:enqueue(tonumber(token[2])) -- Enqueue the value of the token
		elseif (isOperator(token)) then
			while (not self.stack:isEmpty()
				and ASSOCIATIVITY[token[1]] -- Associativity exists for the operator
				and PRECEDENCE[token[1]] -- Precedence exists for the operator
				and ASSOCIATIVITY[self.stack:top()[1]] -- Top is an operator with associativity
				and PRECEDENCE[self.stack:top()[1]] -- Top is an operator with precedence
				-- Associativity of token is LTR and precedence is <= precedence of the top operator
				and ((ASSOCIATIVITY[token[1]] == 0 and PRECEDENCE[token[1]] <= PRECEDENCE[self.stack:top()[1]])
				-- or associativity of token is RTL and precedence is < precedence of the top operator
				or (ASSOCIATIVITY[token[1]] == 1 and PRECEDENCE[token[1]] < PRECEDENCE[self.stack:top()[1]]))) do
				-- Then we'll pop the operator from the stack and put it onto the output queue
				self.queue:enqueue(self.stack:pop())
			end

			-- Push the current token onto the operator stack
			self.stack:push(token)
		-- Push '(' onto the operator stack
		elseif (token[1] == 'LPAREN') then
			self.stack:push(token)
		elseif (token[1] == 'RPAREN') then
			if (not self.stack:isEmpty()) then
				-- Work backwards until we reach the left parenthesis
				while (self.stack:top()[1] ~= 'LPAREN') do
					self.queue:enqueue(self.stack:pop())
				end

				-- Pop the left parenthesis
				self.stack:pop()
			else
				error('Expected \'(\' to close \')\'')
			end
		end

		self:next() -- Move to the next token
	end

	-- Empty the operator stack and enqueue everything onto the output queue
	while (not self.stack:isEmpty()) do
		self.queue:enqueue(self.stack:pop())
	end

	return self.queue
end
And you now have your parser using the shunting-yard algorithm! Go through all the comments I left in the code to understand what’s really going on. There are a ton of really great graphics, guides, and videos to better understand the algorithm available online so don’t be afraid to research.
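To see the algorithm in action, here's a rough walk-through of our own for the tokens of 3 + 6.6 * 2 (the same expression we'll test with below):

	token 3    ->  queue: 3                    stack: (empty)
	token +    ->  queue: 3                    stack: PLUS
	token 6.6  ->  queue: 3, 6.6               stack: PLUS
	token *    ->  ASTERIK has higher precedence than PLUS, so nothing is popped
	               queue: 3, 6.6               stack: PLUS, ASTERIK
	token 2    ->  queue: 3, 6.6, 2            stack: PLUS, ASTERIK
	(end)      ->  drain the stack into the queue
	               queue: 3, 6.6, 2, ASTERIK, PLUS   (i.e. 3 6.6 2 * + in postfix)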
Testing
Here’s the code I’ll be using to test.
-- Require the Lexer from the last tutorial and the Parser we just wrote
-- (adjust the paths to wherever your modules live)
local Lexer = require(script.Lexer)
local Parser = require(script.Parser)

local lexer = Lexer.new('3 + 6.6 * 2')
lexer:tokenize()

local postfix = Parser.new(lexer):parse()

for i, v in pairs(postfix:get()) do -- remember the returned result is a queue
	if typeof(v) == 'table' then -- v is a token if it's a table
		for i2, v2 in pairs(v) do
			print(i, '|', v2)
		end
	else -- otherwise it's a number
		print(i, '|', v)
	end
end
If you did everything correctly your output should look like so:
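For reference, assuming the lexer from the last part stores each token as a { name, value } pair, the output should look roughly like this (entries 4 and 5 are operator tokens, so both of their fields get printed):

	1 | 3
	2 | 6.6
	3 | 2
	4 | ASTERIK
	4 | *
	5 | PLUS
	5 | +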
We’ve successfully parsed our tokens and converted from infix to postfix notation!
Conclusion
This tutorial was a bit long, but there was a lot to learn. Unfortunately it's very hard to teach this much information in text form, but I did my best. While our parser converts from infix to postfix, there are parsers that use other methods to generate parse trees for other use cases. For example, compilers for programming languages build complex abstract syntax trees. Feel free to read up more on that if you're interested. The lexer from the last tutorial could be paired with a parser built specifically for that if needed.
In the next part of the series we’ll create an interpreter to actually evaluate the postfix notation. I hope this was useful or interesting. Take care!