I made a lexer that takes in custom token rules and outputs a stream of tokens, similar to moo.js. My first take at this was a pretty good success, though it was very slow; there was a bottleneck when matching results, since I wasn't using Roblox's string patterns at the time. It came in at about 3 milliseconds for a 3-line string and scaled just about linearly with the token count. I was comparing it to boatbomber's lexer, which was far faster: about 0.1 milliseconds for that same string, and only reaching ~4 milliseconds by the time I had had enough, while my lexer was hitting 30-40 milliseconds.

So I went back to the drawing board, and after a few frustrating Roblox crashes (I had so much trouble with coroutines), I've made a faster lexer that takes in custom tokens and supports type and value transforms, with speeds comparable to boat's lexer. Some example code is below: the token rules, plus a benchmark that runs both lexers on the same string in separate coroutines and prints how fast each one did.
Edit: I just realized one of my tokens is messed up. As I said, I'll do more tests and make sure everything is optimal for both sides.
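(For anyone curious about where the old version lost time: the speedup came from matching with string patterns instead of plain scanning. A rough sketch of that idea, anchoring each rule's pattern at the current position so nothing gets rescanned, with illustrative names only, not the actual module code:)

local function nextToken(rules, source, pos)
	-- Try each rule's pattern, anchored at pos so string.find never scans past it
	for _, rule in ipairs(rules) do
		local startPos, endPos = string.find(source, "^" .. rule.match, pos)
		if startPos then
			-- Return the token name, the matched text, and where to continue from
			return rule.token, string.sub(source, startPos, endPos), endPos + 1
		end
	end
end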
local tokens = {
	{token = "WS", match = "[ \t]+"};
	{token = "Comment", match = "//.*\n?"};
	{token = "String", match = "(['\"])[^\n]*%1", type = function(inp)
		-- Reclassify empty string literals
		if inp == "\"\"" or inp == "''" then return "Empty-String" end
	end};
	{token = "INC-String", match = "(['\"])[^\n]+%1", shouldThrow = true};
	{token = "Iden", match = "[a-zA-Z_][a-zA-Z%d_]*", type = function(inp)
		local keywords = {"while", "var", "for", "if", "local"}
		local builtins = {"print", "string"}
		if table.find(keywords, inp) then
			return "Keywords"
		elseif table.find(builtins, inp) then
			return "Global"
		end
	end};
{token = "Number", match = "%d(.?)%d*", type = function(inp)
if string.find(inp, "%.") then
return "Float"
end
return "Number"
end};
{token = "Operators", match = "[:;<>/~%*%(%)\\%-=,{}%.#^%+%%]"};
{token = "newline", match = "\n"};
{token = "exception", match = ".+",error = true}
}
local lexer = lex.compile(tokens)
local testLex = require(script.ModuleScript)
textBox:GetPropertyChangedSignal("Text"):Connect(function()
	-- My lexer
	lexer.reset(textBox.Text)

	-- Boat's lexer
	local nav = testLex.navigator()
	nav:SetSource(textBox.Text)

	coroutine.resume(coroutine.create(function()
		local start1 = os.clock()
		for token, src in nav.Next do
		end
		print(string.format("Boat's lexer took %.2f ms", (os.clock() - start1) * 1000))
	end))

	coroutine.resume(coroutine.create(function()
		local start2 = os.clock()
		for token in lexer.next do
		end
		print(string.format("My lexer took %.2f ms", (os.clock() - start2) * 1000))
	end))
end)
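The rules above only show type transforms; I mentioned value transforms too, which aren't shown here. Purely as an illustration (assuming a value callback that works the same way as the type callbacks above, this isn't copied from the module), a rule could convert its lexeme like this:

{token = "Number", match = "%d+%.?%d*", value = function(inp)
	-- hypothetical value transform: hand back an actual number instead of the raw text
	return tonumber(inp)
end};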
And after running it and pasting in a few simple code samples (very limited testing, I'll be doing way more and heavier benchmarking later)... BOOM!
I'm pretty proud of it, although I'm not too hyped yet; after the further tests, I'll either have to go back to the drawing board, or I'll probably release it for public use. Thanks for reading. I'll have a test link up in a few minutes, where you can just open the developer console and see the prints as you type.
Also: @boatbomber, thanks for the inspiration man, your work really inspires me as a programmer.