If you’re interested in how this works, this section is for you. While I doubt most people have a use for something like this, it was still a fun project. The module resolves the expressions it’s given in three stages: a lexer, a parser, and an interpreter.
Lexer
The lexer goes character by character through the expression and generates tokens. For example, the + character generates a token of type PLUS, and the character sequence 82.81 generates a token of type NUMBER with the value 82.81. These tokens are then passed to the parser.
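To give a rough idea, a sketch of that loop might look like the following. This is illustrative only; the token names and structure here are assumptions, not the module’s actual API.

```lua
-- Illustrative sketch only: the real module's token names and structure may differ.
local OPERATORS = { ["+"] = "PLUS", ["-"] = "MINUS", ["*"] = "STAR", ["/"] = "SLASH", ["^"] = "CARET" }

local function tokenize(expression)
	local tokens = {}
	local position = 1
	while position <= #expression do
		local character = string.sub(expression, position, position)
		if OPERATORS[character] then
			tokens[#tokens + 1] = { type = OPERATORS[character] }
			position = position + 1
		elseif string.match(character, "%d") then
			-- Consume the whole number, including an optional decimal part.
			local number = string.match(expression, "%d+%.?%d*", position)
			tokens[#tokens + 1] = { type = "NUMBER", value = tonumber(number) }
			position = position + #number
		elseif character == " " then
			position = position + 1
		else
			error("Unexpected character: " .. character)
		end
	end
	return tokens
end
```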
Parser
The parser knows how to resolve these tokens. It essentially groups the tokens into nodes. There was a lot that went into making this parser. The original version used something called a parse tree, which is built by applying grammar rules to the tokens. The current version uses an abstract syntax tree (AST). An abstract syntax tree is similar to a parse tree, except it omits the nodes that only exist to satisfy the grammar, so it’s significantly more compact. It’s far cheaper to generate and is less of a headache to work with.
Let’s say you have the following expression: 2 + 3 * 8. We know this will evaluate to 26, so how does it look as an AST?
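Roughly speaking (the node names here are just for illustration, not the module’s real ones), the tree looks like this:

```lua
-- The root is the addition; its right child is the multiplication,
-- so 3 * 8 is evaluated before being added to 2.
local ast = {
	type = "Add",
	left = { type = "Number", value = 2 },
	right = {
		type = "Multiply",
		left = { type = "Number", value = 3 },
		right = { type = "Number", value = 8 },
	},
}
```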
Order of operations is maintained here since multiplication is placed lower in the tree than addition. The interpreter then interprets this tree to produce a result. If you would like to learn more, I recommend reading this blog post. It provided me with a lot of information when making this module.
Interpreter
The interpreter takes the tree and executes the logic. It starts at the root and then recursively traverses its way through. It’s honestly really simple and there isn’t much to it.
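As a minimal sketch (again with illustrative node names, continuing the example from the Parser section), the whole thing boils down to a recursive function like this:

```lua
-- Minimal sketch of recursive evaluation over an AST; node names are assumptions.
local function evaluate(node)
	if node.type == "Number" then
		return node.value
	elseif node.type == "Add" then
		return evaluate(node.left) + evaluate(node.right)
	elseif node.type == "Multiply" then
		return evaluate(node.left) * evaluate(node.right)
	else
		error("Unknown node type: " .. tostring(node.type))
	end
end

-- Using the ast table from the Parser section above:
print(evaluate(ast)) --> 26
```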
This module loops forever on some invalid strings, for instance h. The Lexer:tokenize function keeps calling Lexer:resolve until Lexer.character is nil, which means that if Lexer:resolve doesn’t do anything (as is the case when it doesn’t understand the character) it will loop forever. Errors in general are handled very inconsistently: passing . results in nil, but most invalid strings either error or produce weird answers (1..+2 results in 1).
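A sketch of one possible guard (the Lexer fields and methods here are assumptions based on the description above, not the module’s real internals): have tokenize check that resolve actually consumed something, and error if it didn’t.

```lua
-- Sketch: fail loudly if resolve doesn't advance past the current character,
-- instead of spinning forever. The fields here are illustrative.
function Lexer:tokenize()
	while self.character ~= nil do
		local before = self.position
		self:resolve()
		if self.position == before then
			error(string.format("Unexpected character %q at position %d", tostring(self.character), before))
		end
	end
	return self.tokens
end
```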
The precedence and associativity of ^ are wrong: -2^2 should be -4, not 4, and 2^2^3 should be 256, not 64.
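For reference, Lua itself treats ^ as right-associative and binding tighter than unary minus:

```lua
print(-2 ^ 2)    --> -4, parsed as -(2 ^ 2)
print(2 ^ 2 ^ 3) --> 256, parsed as 2 ^ (2 ^ 3)
```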
I also noticed several bad practices that make the code quite slow:
The utf8.len function is expensive; its cost grows linearly with the length of the string. Since the string is constant throughout the function, this call should be done only once. Furthermore, since string.sub is used and the other utf8 functions aren’t, the # operator should be used to get the length of the string. The purpose of utf8.len is to count the number of characters in a string while treating multibyte Unicode characters as a single character. The rest of the code doesn’t handle Unicode characters, so utf8.len is pointless here and far slower (# is constant time).
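A rough illustration of the difference, with a made-up loop:

```lua
local expression = "82.81+7"

-- Slow: utf8.len walks the whole string and is recomputed every iteration.
local position = 1
while position <= utf8.len(expression) do
	position = position + 1
end

-- Faster: the code only uses string.sub, so the byte length from # is enough,
-- and it can be computed once since the string never changes.
local length = #expression
position = 1
while position <= length do
	position = position + 1
end
```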
Temporary arrays that are just passed to table.find are very slow, because a new table has to be created and collected every time the call to table.find is reached. Each of these tables should be created once and passed to table.find on every call, so the creation and collection only happens once.
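For example (the operator list here is just illustrative):

```lua
-- Slow: a throwaway table is created (and later collected) on every call.
local function isOperatorSlow(character)
	return table.find({ "+", "-", "*", "/", "^" }, character) ~= nil
end

-- Faster: the table is created once and reused for every call.
local OPERATOR_LIST = { "+", "-", "*", "/", "^" }
local function isOperator(character)
	return table.find(OPERATOR_LIST, character) ~= nil
end
```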
Building strings piece by piece (usually with concatenation) is very slow: because strings are immutable, many temporary strings have to be created and collected to build the result. The table.concat function should be used whenever a new string needs to be constructed from many smaller strings, as it builds the result without creating all those temporaries.
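A small illustration:

```lua
-- Slow: every .. allocates a brand new string.
local result = ""
for index = 1, 5 do
	result = result .. tostring(index)
end

-- Faster: push the pieces into a table and join them once at the end.
local buffer = {}
for index = 1, 5 do
	table.insert(buffer, tostring(index))
end
result = table.concat(buffer)
print(result) --> 12345
```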
It’s been nearly a month, but today I decided to sit down and completely refactor the code. Thanks for everything you posted. I greatly optimized the original parser, but I decided to rewrite it using the shunting yard algorithm, which is better suited for a smaller parser like this. It’s a bit cheaper to convert from infix to postfix directly, and that works well enough for this use case. I did fix the issues you were seeing with the old parser and decided to leave it in; however, I’m still having a minor issue regarding precedence which I’ll probably fix in the future.
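For anyone curious, here’s a rough sketch of the idea (not my actual module code, and it skips parentheses and unary minus): operators wait on a stack and are popped to the output according to precedence and associativity, so 2 + 3 * 8 becomes the postfix sequence 2 3 8 * +, which a simple stack evaluator can then run.

```lua
-- Sketch of shunting yard on a pre-tokenized expression; the token format
-- and helper names here are illustrative, not the module's API.
local PRECEDENCE = { ["+"] = 1, ["-"] = 1, ["*"] = 2, ["/"] = 2, ["^"] = 3 }
local RIGHT_ASSOCIATIVE = { ["^"] = true }

local function toPostfix(tokens)
	local output, operators = {}, {}
	for _, token in ipairs(tokens) do
		if type(token) == "number" then
			table.insert(output, token)
		else
			-- Pop operators that bind at least as tightly (or strictly tighter
			-- for right-associative ones like ^) before pushing this one.
			while #operators > 0 do
				local top = operators[#operators]
				local keepPopping
				if RIGHT_ASSOCIATIVE[token] then
					keepPopping = PRECEDENCE[top] > PRECEDENCE[token]
				else
					keepPopping = PRECEDENCE[top] >= PRECEDENCE[token]
				end
				if not keepPopping then break end
				table.insert(output, table.remove(operators))
			end
			table.insert(operators, token)
		end
	end
	while #operators > 0 do
		table.insert(output, table.remove(operators))
	end
	return output
end

local function evaluatePostfix(postfix)
	local stack = {}
	for _, item in ipairs(postfix) do
		if type(item) == "number" then
			table.insert(stack, item)
		else
			local right = table.remove(stack)
			local left = table.remove(stack)
			if item == "+" then table.insert(stack, left + right)
			elseif item == "-" then table.insert(stack, left - right)
			elseif item == "*" then table.insert(stack, left * right)
			elseif item == "/" then table.insert(stack, left / right)
			elseif item == "^" then table.insert(stack, left ^ right)
			end
		end
	end
	return stack[1]
end

print(evaluatePostfix(toPostfix({ 2, "+", 3, "*", 8 }))) --> 26
print(evaluatePostfix(toPostfix({ 2, "^", 2, "^", 3 }))) --> 256
```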