I made a lexer similar to moo.js!

I made a lexer that takes in custom token rules and outputs a stream of tokens, similar to moo.js. My first take at this was a pretty good success, though it was very slow. (There was a bottleneck when matching results; I wasn't using Roblox's pattern references at the time.) It came in at about 3 milliseconds for a 3-line string and scaled roughly linearly with the token size. I was comparing it to boatbomber's lexer, which was far faster: about 0.1 milliseconds for that same string, and only reaching ~4 milliseconds on inputs where my lexer was taking 30-40 milliseconds. At that point I had had enough.

I went back to the drawing board, and after a few frustrating Roblox crashes (I had so much trouble with coroutines), I've made a faster lexer that takes custom tokens and supports type and value transforms, with speeds comparable to boat's lexer. Some example code is below: it runs both lexers on the same string in separate coroutines and prints how long each one took.
Edit: just realized one of my tokens is messed up :thinking:. As I said, I'll do more tests and make sure everything is optimal for both sides.


local tokens = {
	{token = "WS", match = "[ \\t]+"};
	{token = "Comment", match = "//.*\\n?"};
	{token = "String", match = "(['\"])[^\\n]*%1", type = function(inp)
		if inp == "\"\"" or inp == "''" then return "Empty-String" end
	end};
	{token = "INC-String", match = "(['\"])[^\n]+%1", shouldThrow = true};

	{token = "Iden", match = "[a-zA-Z_][a-zA-Z%d_]*", type = function(inp)
		-- Reclassify identifiers that are keywords or built-in globals
		local keywords = {"while", "var", "for", "if", "local"}
		local builtins = {"print", "string"}
		if table.find(keywords, inp) then return "Keywords" elseif table.find(builtins, inp) then return "Global" end
	end};
	{token = "Number", match = "%d+%.?%d*", type = function(inp)
		if string.find(inp, "%.") then
			return "Float"
		end
		return "Number"
	end};
	{token = "Operators", match = "[:;<>/~%*%(%)\\%-=,{}%.#^%+%%]"};
	{token = "newline", match = "\n"};
	{token = "exception", match = ".+",error = true}
}
local lexer = lex.compile(tokens)
local testLex = require(script.ModuleScript)
texbox:GetPropertyChangedSignal("Text"):Connect(function()
	-- My lexer
	lexer.reset(texbox.Text)
	-- Boat's lexer (navigator)
	local nav = testLex.navigator()
	nav:SetSource(texbox.Text)

	-- Time Boat's lexer
	coroutine.resume(coroutine.create(function()
		local start1 = os.clock()
		for token, src in nav.Next do
		end
		print(string.format("Boat's lexer took %.2f ms", (os.clock()-start1)*1000 ))
	end))

	-- Time my lexer
	coroutine.resume(coroutine.create(function()
		local start2 = os.clock()
		for token in lexer.next do
		end
		print(string.format("My lexer took %.2f ms", (os.clock()-start2)*1000 ))
	end))
end)

And after running it and pasting in a few simple code samples (very limited; I'll be doing way more and heavier benchmarking later)… BOOM!
image

I’m pretty proud of it, although I’m not too hyped yet. After further tests, I’ll either have to go back to the drawing board or I’ll probably release it for public use. Thanks for reading! I’ll have a test link in a few minutes, where you can just open the developer console and see the prints as you type.

Also: @boatbomber, thanks for the inspiration man, your work really inspires me as a programmer.
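
For readers who find the rule table above hard to picture, here is a minimal sketch of how a rule-driven lexer with type transforms could work. It is not the lexer from this post: `compile`, `reset`, and `next` only mirror the names used in the benchmark code, and everything inside them is an assumption made for illustration. It is plain Lua that should also run as Luau.

```lua
-- Hypothetical sketch, NOT the lexer from this post: just the general idea.
local function compile(rules)
	local source, pos = "", 1
	local lexer = {}

	-- Point the lexer at a new source string and rewind to the start.
	function lexer.reset(text)
		source, pos = text, 1
	end

	-- Return the next token, or nil at the end of the source,
	-- so it can drive a generic for loop: for tok in lexer.next do ... end
	function lexer.next()
		if pos > #source then
			return nil
		end
		for _, rule in ipairs(rules) do
			-- "^" anchors the pattern at pos, so only a match starting here counts.
			local first, last = string.find(source, "^" .. rule.match, pos)
			if first and last >= first then -- skip empty matches
				local value = string.sub(source, first, last)
				pos = last + 1
				-- Optional type transform; returning nil keeps the rule's own name.
				local kind = (rule.type and rule.type(value)) or rule.token
				return { type = kind, value = value }
			end
		end
		error(string.format("no rule matches at position %d", pos))
	end

	return lexer
end

-- Rules are tried in order, so earlier rules win.
local lexer = compile({
	{ token = "WS", match = "[ \t]+" },
	{ token = "Number", match = "%d+%.?%d*", type = function(text)
		if string.find(text, "%.") then return "Float" end
	end },
	{ token = "Iden", match = "[%a_][%w_]*" },
	{ token = "Operator", match = "[=%+%-%*/]" },
})

lexer.reset("x = 3.14")
for tok in lexer.next do
	print(tok.type, tok.value)
end
```

Anchoring each pattern at the current position avoids scanning ahead for a match that starts later in the string, which is one plausible place for a matching bottleneck like the one described earlier.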


Your lexer looks pretty cool!

However, your speed comparisons are unfair: you compare your stream to my navigator. My navigator has additional features and tables that slow it down. If you want a better comparison, just use my lexer's stream (lexer.scan) and not the nav object.

-- typed on my phone
texbox:GetPropertyChangedSignal("Text"):Connect(function()

    lexer.reset(texbox.Text)

    local start1 = os.clock()
    for token, src in testLex.scan(texbox.Text) do end
    print(string.format("Boat's lexer took %.2f ms", (os.clock()-start1)*1000 ))

    local start2 = os.clock()
    for token in lexer.next do end
    print(string.format("My lexer took %.2f ms", (os.clock()-start2)*1000 ))

end)

PS: It’s awesome to see people inspired by my work, and that inspires me in turn! Thank you!
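
Since a single timed pass can be noisy, one way to make a comparison like this steadier is to average over several runs. The helper below is only a sketch: `benchmark` is a hypothetical name, and the `gmatch` loop is a stand-in workload, because neither lexer is available outside the thread. In the setup above, the function bodies would instead loop over `testLex.scan(text)` and `lexer.next`.

```lua
-- Hypothetical helper: average the time `fn` takes over several runs.
local function benchmark(label, runs, fn)
	local start = os.clock()
	for _ = 1, runs do
		fn()
	end
	local avgMs = (os.clock() - start) * 1000 / runs
	print(string.format("%s: %.3f ms average over %d runs", label, avgMs, runs))
	return avgMs
end

-- Stand-in workload (an assumption): scan a repeated sample string for words.
local sample = string.rep("local x = 1 // comment\n", 100)
benchmark("gmatch words", 50, function()
	for _ in string.gmatch(sample, "%a+") do
	end
end)
```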


Wow, this looks really cool! And interesting to learn from. As always, glad to see that boatbomber is inspiring everyone to create modules/plugins (me included lol)

I find the whole token system confusing, but it’s probably because I know nothing about lexers.

Why? That’s already something!


Got it, I’ll be testing with that instead of the navigator, thanks!

Because my main goal is faster speeds than boat’s. If I get blown out of the water later I’d be much more bummed out, but I see what you mean. Thank you!
