Introduction
What is a lexer?
To create a lexer, one must first understand what a lexer even is. In computer science, a lexer is the component that performs lexical analysis: converting a sequence of characters into objects called tokens. Tokens give the analyzed characters additional meaning.
In this three-part series of tutorials we’ll create a basic mathematical expression evaluator with a fully functioning lexer, parser, and interpreter.
What are tokens?
Let’s say I have the following expression:
3 + 7.2
If I were to pass this through a lexer, the following tokens would be generated:
{ 'NUMBER', 3 }
{ 'PLUS' }
{ 'NUMBER', 7.2 }
So each token has a string identifying its type and an optional second element representing a value. Operators such as +, -, *, and / don’t have values, but numbers such as 1.4, 92, and 36 do. Tokens are useful because they give meaning to these different characters. A computer has no way of knowing that + means addition, but through a lexer and tokens we can provide that meaning.
Building the Lexer
Lexer class
Let’s get started on creating the actual lexer now that we understand what it does and how it works. Create a module dedicated to your lexer. We’ll use Lua metatables to simulate OOP and create a Lexer “class.”
local Lexer = {}

function Lexer.new(text: string)
    local self = setmetatable({}, {__index = Lexer})
    -- Strip the string of any newline characters & create a position variable
    self.text = string.gsub(text, '\n', '')
    self.position = 0
    -- We'll define this method in a second
    self:next()
    return self
end
The position variable in our lexer represents the current character position as we tokenize the passed text. The :next() method will be used to advance the lexer to the next character in the string.
function Lexer:next()
    -- Prevent the lexer from going over the initial text's length
    if (self.position == string.len(self.text)) then
        self.character = nil
        return
    end
    -- Increase the position variable & set the current character to that position
    self.position += 1
    self.character = string.sub(self.text, self.position, self.position)
end
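To illustrate, here’s how the lexer steps through a short input. This is just a quick sanity check you could run inside the module itself:
local lexer = Lexer.new('3+7')
print(lexer.character) --> 3
lexer:next()
print(lexer.character) --> +
lexer:next()
print(lexer.character) --> 7
lexer:next()
print(lexer.character) --> nil (end of input)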
Great, we now have a way to traverse our string of text. Let’s start tokenizing it: we’ll loop through the string until we reach the end, generating a token for each character until we have a list of tokens.
Preparing to tokenize
Let’s create the :tokenize() method for our lexer.
function Lexer:tokenize()
    local tokens = {}
    -- Loop until the end of the string is reached
    while (self.character ~= nil) do
        -- We'll define this method in a second
        local result = self:resolve(self.character)
        -- If we get a valid result, insert it into our final token table
        if (result) then
            table.insert(tokens, result)
        end
    end
    -- Add the table to our lexer object
    self.tokens = tokens
    return self
end
With our loop created, we now need to start resolving each character. Note that :resolve() must always advance the lexer in every branch, otherwise the while loop above would never terminate. Before we get to it, though, we should define some utility functions.
Utility functions
Our first function will check whether a character is whitespace. We want to ignore whitespace and only focus on meaningful characters when tokenizing.
local WHITESPACES = {'', ' ', '\n', '\t'}

local function isWhitespace(char: string)
    return table.find(WHITESPACES, char) and true or false
end
Next we need a function to check if a character is a digit (0 - 9).
local DIGITS = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}

local function isDigit(char: string)
    return table.find(DIGITS, tostring(char)) and true or false
end
Next we’ll make a function to check if a character is part of a number stream, meaning it’s either a digit or a decimal point.
local function isNumberStream(char: string)
    return (char ~= nil and (char == '.' or isDigit(char)))
end
Then we’ll make a function to determine if a table of characters has a decimal point at the front or the end. If so, this function will prepend or append a 0. For example, .01 turns into 0.01 and 4. turns into 4.0.
local function resolveIndependentDecimal(array: {string})
    -- Leading decimal point: prepend a 0 (e.g. '.01' -> '0.01')
    if (array[1] == '.') then
        table.insert(array, 1, '0')
    end
    -- Trailing decimal point: append a 0 (e.g. '4.' -> '4.0')
    if (array[#array] == '.') then
        table.insert(array, '0')
    end
    return array
end
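A quick usage example of what this does (these calls would live in the same module, since the function is local):
print(table.concat(resolveIndependentDecimal({'.', '0', '1'}))) --> 0.01
print(table.concat(resolveIndependentDecimal({'4', '.'}))) --> 4.0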
Creating tokens
With our utility functions done, we’re nearly ready to convert characters into tokens. But we haven’t actually created any tokens yet. For the sake of simplicity, our tokens in this tutorial will be a table containing a string and an optional value, as shown in the introduction. We’ll still define a function to create this table, though, just so we know where we’re actually using tokens versus plain tables.
local function Token(name: string, value: number?)
    return { name, value }
end
Tokenizing
Finally we can convert characters into tokens!
function Lexer:resolve(char: string)
    if (isWhitespace(char)) then -- Ignore whitespace
        self:next()
    elseif (char == '.' or isDigit(char)) then -- Numbers
        return self:resolveNumber() -- We'll define this next
    elseif (char == '+') then -- Plus (for addition)
        self:next()
        return Token('PLUS')
    elseif (char == '-') then -- Minus (for subtraction)
        self:next()
        return Token('MINUS')
    elseif (char == '*') then -- Asterisk (for multiplication)
        self:next()
        return Token('ASTERISK')
    elseif (char == '/') then -- Slash (for division)
        self:next()
        return Token('SLASH')
    elseif (char == '(') then -- Left parenthesis
        self:next()
        return Token('LPAREN')
    elseif (char == ')') then -- Right parenthesis
        self:next()
        return Token('RPAREN')
    else
        error('Unknown character: ' .. char)
    end
end
You’ll notice that when a character is a digit or decimal point we call a nonexistent method, :resolveNumber(). Let’s write out this function now. It will be used to create a number from multiple characters.
function Lexer:resolveNumber()
    local array = { self.character } -- Represents a number being built
    -- Keep track of how many decimal points have been found
    -- (the first character may itself be one, e.g. '.8')
    local decimals = (self.character == '.') and 1 or 0
    -- Move on to the next character
    self:next()
    -- While there is a valid digit or decimal point
    while (isNumberStream(self.character)) do
        -- Increase the decimal count
        if (self.character == '.') then
            decimals += 1
            -- Numbers don't have more than 1 decimal point
            if (decimals > 1) then
                break
            end
        end
        -- Add the character to our array
        table.insert(array, self.character)
        -- Move on to the next character
        self:next()
    end
    -- If there's a decimal, check for a leading or trailing decimal point and fix it
    if (decimals > 0) then
        array = resolveIndependentDecimal(array)
    end
    -- Finally, return our completed number as a token
    return Token('NUMBER', tonumber(table.concat(array)))
end
Believe it or not, that’s our completed lexer! Let’s move on to testing it.
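One last thing: if you’re following along in a ModuleScript, remember to return the Lexer table at the bottom of the module so other scripts can require it:
return Lexer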
Testing
Here I have a script requiring our lexer module to test it out.
local Lexer = require(script.Lexer)

-- Create our lexer with the expression .8 + 2.7
local lexer = Lexer.new('.8 + 2.7')
-- Tokenize the string
lexer:tokenize()
-- Let's see what we get!
for i, v in pairs(lexer.tokens) do
    print(i, '|', v[1], '(', v[2], ')') -- Remember our tokens are just arrays
end
If you did everything correctly, your output should look something like this (print separates its arguments with whitespace):
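1 | NUMBER ( 0.8 )
2 | PLUS ( nil )
3 | NUMBER ( 2.7 )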
Whitespace was ignored, and the numbers were successfully built into tokens, as was the plus sign. If you try to tokenize an invalid character, you’ll receive an unknown character error.
Conclusion
This is a very basic lexer, and there’s much more functionality you could add. If you’re interested in learning about lexical analysis beyond math evaluation, there are plenty of resources online that cover the topic in depth.
In the next part of this series we’ll create a parser to convert our tokens into postfix notation for easy evaluation. That will be explained further then, but for now I hope you found this tutorial useful, or at least interesting.