Yo! Introducing P+, a parser generator derived from nearley.js!
Before you go any further: you can’t create a full programming language with this alone, but it makes up about 2/3 of the process, the next step being interpretation or execution of some sort. P+ is simply the parser generator; the plugin also provides a lexer.
A while ago I created this topic
And since then, I’ve been really hard at work researching and building a parser. It’s been just over a month now and I’m happy to say it’s finished. Or at least working…
Psst! I’ve also made a lexer, derived from moo.js
Use P+ to create your own parser with informative error messages and an optimized parsing algorithm.
Simply define a grammar and an input, and boom: your own top-down parser. For the convenience of all you lovely developers, I’ve also made a syntax-highlighted grammar editor so you can avoid the ugly regular syntax.
Module and Plugin for the impatient:
Module
Time Complexity
A little background: P+ uses the Earley algorithm, which means it can parse both ambiguous and unambiguous grammars. This means things such as right recursion:
prod -> %token prod
are possible! The only caveat is that it runs in O(n^3) time, a cubic time complexity, in the worst case. Without optimization, it’s good practice to shy away from right recursion in an Earley parser. Yet using Joop Leo’s right-recursion optimization, such grammars can run in O(n) time, a linear time complexity.
The Earley Algorithm
While I can’t give you a full breakdown of parsers, lexers, and the like, I’ll give a brief explanation of how it works.
I’ll be providing a tutorial on the algorithm soon, and I’ll link it here.
For now, refer to the resources provided here:
Parsing Generator using Earley Algorithm
P+
In this section I’ll explain a bit more about the module itself.
Requiring the module returns a dictionary containing Parser and Rule; for typical use you can disregard Rule.
We call the Parser function with a grammar
A grammar is a set of production rules that describes to P+ the syntax of your language. If you’d like some resources explaining this stuff, refer here: Parsing Generator using Earley Algorithm
Calling our Parser function returns a table whose most important method is :Parse(input)
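To make that concrete, here’s a minimal usage sketch. The require path, the grammar variable, and the input string are all illustrative assumptions, not part of P+ itself:

```lua
--Illustrative only: adjust the require path to wherever you placed P+.
local PPlus = require(script.Parent.PPlus)

local parser = PPlus.Parser(grammar) --grammar built as described below
local results = parser:Parse("1 + 2") --parse results (ambiguous grammars may yield several)
```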
“Alright, enough of this, explain the module already!” Alright, got it, got it.
First I’ll explain the grammar. A typical grammar looks like this (unless you’re using the plugin):
{
	{name = "", symbols = {
		...
	}, postProcess = function},
	...
	lexer = lexer or nil,
	startRule = "" or nil
}
For starters, our grammar is a mixed table composed of rules at the numbered indices and extra data at the string keys.
Each rule has a name, a compilation of symbols, and a postProcess.
The name is exactly what it sounds like: the name. The symbols table is composed of string or table entries.
Strings are non-terminal symbols; tables are terminal symbols.
A terminal symbol matches against our input stream, while a non-terminal refers to another rule. A typical symbols table looks like this:
{
	{literal = "!"}, --Checks the current token's value for exactly this value
	{token = "myToken"}, --Checks the current token's *type* for this value
	"yo" --Checks another rule for input validation
}
Next, our postProcess is a function; it receives the data of the completed rule, which is useful for generating semantic derivatives like an abstract syntax tree.
Notice: terminal symbols provide the token as their data, which is an object; refer to my lexer above to learn more about that.
A typical postProcess would look like this:
function(data)
	--Assume the production is: "term", {literal = "!"}
	return {
		word = data[1], --result of the "term" rule
		exclamation = data[2].value --the "!" token's value
	}
end
Symbols defined with a table return their tokens, meaning you will probably have to refer to the lexer documentation. Non-terminal symbols, those defined with a string, return the result of the completed rule’s postProcess; by default, a postProcess returns the data table shown previously.
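Tying the pieces together, here is a sketch of a complete grammar that matches a word followed by “!”. The token type "word" and the myLexer variable are assumptions about how you configured your lexer:

```lua
local grammar = {
	{name = "exclaimed", symbols = {
		{token = "word"},  --terminal: a token of type "word"
		{literal = "!"},   --terminal: a token whose value is exactly "!"
	}, postProcess = function(data)
		return {
			word = data[1].value,        --the "word" token's value
			exclamation = data[2].value, --the "!" token's value
		}
	end},

	lexer = myLexer,         --your lexer instance (see the lexer post above)
	startRule = "exclaimed", --the rule parsing starts from
}
```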
The internals of the parser will be explained later as I’ve said before.
The Plugin
The plugin has three sections: Parser, Lexer, and Loader.
The Parser section uses readable syntax to help write grammars. The syntax is:
prodName -> "string" %token terminal
where prodName refers to the name, the arrow marks the start of the symbols, "string" is a literal, %token checks token types, and a bare name like terminal refers to another rule defined elsewhere (a non-terminal).
Post processes are defined as {code here} after each expression.
Subexpressions are also supported; the “|” character performs the logic of “or”:
prodName -> "string" | %token
This matches either the string or a token of type “token”.
Use parentheses to wrap a subexpression into a single symbol:
prodName -> ("string" | %token) "xolt"
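Putting the plugin syntax together, a small grammar might look like this. The rule and token names are illustrative, and the {…} bodies are a sketch assuming they follow the postProcess convention described earlier:

```
greeting -> ("hello" | "hi") %name "!" {return data[2].value}
name     -> %name
```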
The Lexer section gives you a list of dropdowns to define tokens; refer to the lexer post to learn more. Basically, each token has a match, a name, and an error flag: match is the token’s regular expression, and error tells the lexer whether to error upon encountering this token.
Finally, the Loader section allows you to load generated parsers into the plugin for editing.
Writing this has shown me just how terrible I am at explaining, but I believe this is a great resource, so I wanted to share. For a far better explanation, you can use the nearley explanation: Here
Anyways, hope y’all find this useful, and if you find any bugs or things I could improve, please let me know (nicely). Also, there’s a place to try out the parser: it parses arithmetic expressions using the parser in the flesh. Find it in the Earley Algorithm section.
TODO:
Type/Value transforms in Lexer
Regular lua code syntax highlighting for postProcesses in plugin
Loading post processes in plugin
General code clean up
Memory Optimizations
Fixes: