Parsing string to get indexes and set them

How would I go about writing a script that converts select 1 thru 5 and 15 at 100 into the indexes it names and the value to set them to?
What I want to do is set all of the values (1, 2, 3, 4, 5, 15) to 100.
I would like to make it as dynamic as possible, so that a command like select 10 and 5 and 100 thru 500 at 50 would also work.

I am currently trying to make a command system for a lighting console. That won't make much sense to you, but as an example, the command select 1 thru 10 at 100 will select lights with a channel from 1 thru 10 and set the intensity of each light to 100, or whatever value is specified.

I would shamelessly self-advertise my GLR parser again, but it doesn't have semantic analysis yet, so while it could parse this, you wouldn't be able to do anything with the parse.

With any parsing it is easiest to break the work into three steps: lexical, syntactic, and semantic analysis. Lexical analysis is commonly called tokenizing and can be performed using regular expressions. It combines the stream of characters into keywords, numbers, punctuation, and similar items. Once this is done, syntactic analysis can be performed according to a context-free grammar, which is where a more traditional parser is needed. It puts the tokens produced in the previous stage together to form if statements, loops, assignments, expressions, and so on. Semantic analysis takes the syntax and determines what everything means when put together. This generally needs more power than a context-free grammar can express, so it is done in ordinary code rather than by the parser itself, which is fine, since most programming languages (unlike markup languages such as HTML, XML, and CSS) are Turing complete. This is the most complex stage, but performing the previous steps makes it much, much easier: 5 on its own means nothing, but with context around it, 'return 5' or 'foo = 5' does.
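
To make those stages concrete, here is roughly what each stage produces for a command like select 1 thru 5 at 100 (the table shapes and the setIntensity function are only illustrative, not part of any real API):

-- Stage 1, lexical: the character stream becomes a token stream.
local tokens = { 'select', '1', 'thru', '5', 'at', '100' }

-- Stage 2, syntactic: the tokens are grouped into a tree.
local tree = { range = { start = 1, finish = 5 }, at = 100 }

-- Stage 3, semantic: the tree is interpreted as an action.
local function setIntensity(channel, value) -- stand-in for your console API
    print(('channel %d -> %d'):format(channel, value))
end

for channel = tree.range.start, tree.range.finish do
    setIntensity(channel, tree.at)
end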

From what I can tell, the grammar that defines your language is very simple and can be easily handled by any parser, almost no matter what method you choose. Here is a suitable grammar:

STATEMENT => select VALUES at NUMBER
VALUES => VALUE RELATION VALUES | VALUE
VALUE => NUMBER | RANGE
RANGE => NUMBER thru NUMBER
RELATION => and | or
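
For example, select 1 thru 5 and 15 at 100 derives from this grammar like so:

STATEMENT => select VALUES at NUMBER
          => select VALUE RELATION VALUES at NUMBER
          => select RANGE and VALUE at NUMBER
          => select NUMBER thru NUMBER and NUMBER at NUMBER
          => select 1 thru 5 and 15 at 100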

And here is what I would use to tokenize the input (untested):

local function tokenize(input)
    local tokens = {}
    local tokenTypes = {}

    -- Split the input into words: maximal runs of letters and digits.
    for token in input:gmatch '%w+' do
        local i = #tokens + 1
        tokens[i] = token
        -- A token made only of digits is a NUMBER; everything else is a KEYWORD.
        tokenTypes[i] = token:match '^%d+$' and 'NUMBER' or 'KEYWORD'
    end

    return tokens, tokenTypes
end
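
For instance, calling it on the original command should produce two parallel arrays like these:

local tokens, tokenTypes = tokenize 'select 1 thru 5 and 15 at 100'
-- tokens     = { 'select', '1', 'thru', '5', 'and', '15', 'at', '100' }
-- tokenTypes = { 'KEYWORD', 'NUMBER', 'KEYWORD', 'NUMBER', 'KEYWORD', 'NUMBER', 'KEYWORD', 'NUMBER' }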

I'd recommend making what is commonly called an LL(1) parser. As far as parsers go, this is one of the easiest to create. It is a recursive descent parser, and while not the most powerful, it will work for the grammar described above and, I believe, most grammars describing query-language-like languages. One wrinkle: NUMBER and RANGE both begin with a number, and an LL(1) parser only looks one token ahead, so in the code below RANGE is folded into VALUE as an optional 'thru' tail (this is called left-factoring).

Your LL(1) parser should probably end up being about 100-150 lines. I’d recommend reading a bit about them online (parsers are a well explored field with plenty of material). Here is a basic framework to get you going. You basically create a function for every left-hand side production in the grammar above:
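
For example, a hand-written function for the RANGE production might look like this (just a sketch, using the is/isA helpers defined in the full parser below):

-- RANGE => NUMBER thru NUMBER
local function parseRange()
    local start = isA 'NUMBER'       -- first number
    if not start then return end
    if not is 'thru' then return end -- the literal keyword
    local finish = isA 'NUMBER'      -- second number
    if not finish then return end
    return { name = 'RANGE', start = start, finish = finish }
end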

EDIT: turns out I had so much fun making the example framework, and the framework made it so easy to add the productions, that I just made it all… This parser should work for you (but note that it is untested; I'll leave that to you):

local tokenize = require(script.tokenize)

-- Token-stream accessors; assigned for each parse at the bottom of the module.
local getToken
local getType
local nextToken

-- Whether the most recent match attempt succeeded.
local success

-- Matches a specific literal token, like the keyword 'select'.
-- The token is only consumed on a match.
local function is(literal)
    local token = getToken()
    success = token == literal
    if success then
        nextToken()
        return token
    end
end

-- Matches any token of the given type ('NUMBER' or 'KEYWORD').
local function isA(tokenType)
    success = getType() == tokenType
    if success then
        local token = getToken()
        nextToken()
        return token
    end
end

-- Lets AND { ... } and OR { ... } below tag a table with the op type's metatable.
local setMTOnCall = {
    __call = function (mt, t)
        return setmetatable(t, mt)
    end
}

-- op types
-- OR: try each alternative in order until one succeeds; fail if none do.
local OR = {
    default = false;
    cont = function() return not success end; -- keep trying while nothing has matched
}
OR.__index = OR
setmetatable(OR, setMTOnCall)

-- AND: every step must succeed, unless the step carries an option:
-- 'STOP' ends the production successfully when the step fails (an optional tail),
-- 'CONT' skips over a failed step and keeps going.
local AND = {
    default = true;
    cont = function (option)
        if success then
            return true
        elseif not option then
            return false
        elseif option == 'STOP' then
            success = true
            return false
        elseif option == 'CONT' then
            return true
        else
            error 'Unrecognized AND operation'
        end
    end;
}
AND.__index = AND
setmetatable(AND, setMTOnCall)

local productions

local root
-- branch parses the named production. It creates a node, runs each action in
-- the production, and stores matched tokens/subtrees in the node under the
-- key given as the action's third element. It returns the node on success.
local function branch(name)
    local newRoot = {
        root = root;
        name = name;
    }
    root = newRoot

    local actions = productions[name]
    for _, action in ipairs(actions) do
        local v = action[1](action[2])
        if action[3] then
            root[action[3]] = v
        end

        if not actions.cont(action[4]) then
            root = root.root
            return success and newRoot or nil
        end
    end
    success = actions.default

    root = root.root
    return success and newRoot or nil
end

-- The productions table is assigned after branch is defined so that the
-- actions capture the function rather than nil.
productions = {
    STATEMENT = AND {
        {is, 'select'};
        {branch, 'VALUES', 'values'};
        {is, 'at'};
        {isA, 'NUMBER', 'at'};
    };
    VALUES = AND {
        {branch, 'VALUE', 'value'};
        {branch, 'RELATION', 'relation', 'STOP'};
        {branch, 'VALUES', 'next'};
    };
    -- VALUE and RANGE are left-factored into one production, since both
    -- alternatives begin with a NUMBER and this parser cannot backtrack.
    -- 'finish' is only set when the optional 'thru' tail is present.
    VALUE = AND {
        {isA, 'NUMBER', 'number'};
        {is, 'thru', nil, 'STOP'};
        {isA, 'NUMBER', 'finish'};
    };
    RELATION = OR {
        {is, 'or', 'op'};
        {is, 'and', 'op'};
    };
}

return function(input)
    local tokens, types = tokenize(input)
    local i = 1

    function getToken() return tokens[i] end
    function getType()  return types[i]  end
    function nextToken() i = i + 1 end

    root = { name = 'ROOT' }

    -- branch returns the STATEMENT node on success (or nil on failure).
    root.statement = branch 'STATEMENT'
    return success, root
end

When included as a module and called with the input string, it will spit out two values. The first is whether the input string is part of the language defined by the grammar. The second is the root of the parse tree, with the STATEMENT node stored under its statement key. It is your job to navigate through the parse tree to convert it into the actions that you want (semantic analysis). I'd recommend feeding it a simple test string and then printing out the resulting tree to get a feel for the trees it produces. The third element of each production action controls the key under which the parsed token (or subtree) is stored in the parse tree, if present.
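
For example, assuming the module is saved as a ModuleScript named parser next to this script (the name is hypothetical), walking the tree to pull out the indexes could look like this:

local parse = require(script.parser) -- hypothetical ModuleScript name

local ok, tree = parse 'select 1 thru 5 and 15 at 100'

-- Walk the VALUES chain and collect every selected index.
local function collect(values, indexes)
    local value = values.value
    if value.finish then -- a range like '1 thru 5'
        for i = tonumber(value.number), tonumber(value.finish) do
            indexes[i] = true
        end
    else -- a single number
        indexes[tonumber(value.number)] = true
    end
    if values.next then
        collect(values.next, indexes)
    end
    return indexes
end

if ok then
    local statement = tree.statement
    for index in pairs(collect(statement.values, {})) do
        print(index, statement.at) --> 1 through 5 and 15, each paired with 100
    end
end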

LMK if you run into problems debugging the parser (I have lots of practice making parsers in Lua, so it should be close to working).

Edit: fixed the option variable in AND.cont. It referred to action[4], which doesn't exist in that scope, so it is now passed in through the option parameter.


I’m not sure if the above solution worked for you (I see that the question still isn’t marked as resolved), so here is a solution that I know works! I used your grammar as a simple test for semantic actions in my GLR parser. I’ve uploaded the GLR parser and the files for your specific grammar here. The files for your grammar are located in grammars/example/*. I’ll go through the files I added with some explanations here.

/grammars/example/tokenize.lua
return function (input)
    local literals = {}
    local types = {}

    for token in input:gmatch '%w+' do
        local i = #literals + 1
        literals[i] = token
        types[i] = token:match '^%d+$' and 'number' or 'keyword'
    end

    local lastI = #literals + 1
    literals[lastI] = ''
    types[lastI] = 'eof'

    return {
        literals = literals;
        types = types;
    }
end

This file takes in an input string and returns the tokens that make it up, along with each token's type. For this grammar, tokens are simply whole words or numbers, separated by whitespace. The token type is a lowercase string, which makes it much easier later in the syntax file to specify which tokens are acceptable where. Note that we also add an 'end of file' token to the end of the stream. You could use any unique token type here that you want; we will be referring to it in the syntax file.
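
For example, given the original command, it should return something like this:

local tokenize = require 'grammars/example/tokenize'
local tokens = tokenize 'select 1 thru 5 at 100'
-- tokens.literals = { 'select', '1', 'thru', '5', 'at', '100', '' }
-- tokens.types    = { 'keyword', 'number', 'keyword', 'number', 'keyword', 'number', 'eof' }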

/grammars/example/syntax.lua
local syntax = require('Syntax').define()

START = STATEMENT * eof
STATEMENT = keyword.select * VALUES * keyword.at * number
VALUES = VALUE * keyword['and'] * VALUES + VALUE
VALUE = number + number * keyword.thru * number

return syntax

This file is almost an exact copy of the grammar I gave you. In order for a grammar to be used in an LR parser, it must be augmented. This means that we add a start symbol so the parser knows where to begin, and we append a unique final token. Because most LR parsers use 1 token of lookahead, each production, like STATEMENT, is recognized only after the token following it is seen. Appending an ending token allows STATEMENT to be recognized, and if we were to allow multiple statements, it would let the parser know when it has parsed the final statement. Note that the 'eof' token type was appended to the tokens in tokenize.lua.

'*' means that both productions are required, one after the other.
'+' means that one or the other production is required.
The normal Lua order of operations applies, and () can be used to change it.
You can even save an expression of productions (which I call an assembly) to a local variable and reuse it in multiple places. Lowercase variables match any token of that type, while uppercase variables are nonterminal productions that must be defined; they can be reused and can have semantic actions. Indexing a lowercase variable requires a specific token of that token type. The syntax is also dynamic enough that you can include a local function like this:

local function list(sep, prod)
    return prod * (sep * prod) '*'
end

which could be used to define lists. Calling a production or assembly with '*' means ‘zero or more’, and '?' means ‘zero or one’.
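
As a hypothetical example, VALUES from the syntax file above could have been written with that helper; the two forms describe the same language:

VALUES = list(keyword['and'], VALUE)
-- expands to: VALUE * (keyword['and'] * VALUE) '*'
-- i.e. one VALUE followed by zero or more 'and VALUE' pairs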

/grammars/example/semantics.lua
local syntax = require 'grammars/example/syntax'
syntax:extend()

function number(token, parentRecords)
	parentRecords[#parentRecords + 1] = tonumber(token)
end

function VALUE(records, parentRecords)
	local indexes = {}

	if records[2] then
		for i = records[1], records[2] do
			indexes[i] = true
		end
	else
		indexes[records[1]] = true
	end

	parentRecords.value = indexes
end

function VALUES(records, parentRecords)
	local result

	if records.values then
		result = records.values
		for i in next, records.value do
			result[i] = true
		end
	else
		result = records.value
	end

	parentRecords.values = result
end

function STATEMENT(records, parentRecords)
	parentRecords.at = records[1]
	parentRecords.values = records.values
end

return syntax

The semantic actions here are an example of syntax-directed translation. When a production is recognized, the parser checks if the production has a semantic action associated with it. If so, it calls the action and stores the results. Once all of the child productions that make up a parent production have been recognized, the results of the children's semantic actions (called semantic records) are given to the parent's semantic action, and so on all the way to the top production of the tree.

Note here that the first semantic action is defined for a token type. Since it represents a single token, and isn't made up of productions, it is called with the token it represents rather than the semantic records of its non-existent production children. :slight_smile: As the parse tree gets constructed, the values propagate up until the final production, STATEMENT. At that point, STATEMENT's parentRecords is the output the parser will give. STATEMENT is literally determining what the parser's output will be.
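
To make the record flow concrete, here is roughly the order in which the actions above fire for 'select 1 thru 5 and 15 at 100', and what each one writes into its parent's records ([1..5] is shorthand for keys 1 through 5):

-- number '1', number '5'   -> the first VALUE's records become {1, 5}
-- VALUE '1 thru 5'         -> the outer VALUES' records gain value = {[1..5] = true}
-- number '15'              -> the second VALUE's records become {15}
-- VALUE '15'               -> the inner VALUES' records gain value = {[15] = true}
-- VALUES '15'              -> the outer VALUES' records gain values = {[15] = true}
-- VALUES '1 thru 5 and 15' -> STATEMENT's records gain values = {[1..5] = true, [15] = true}
-- number '100'             -> STATEMENT's records gain [1] = 100
-- STATEMENT                -> final output: {values = {[1..5] = true, [15] = true}, at = 100}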

/main.lua
-- Required Modules
local syntax = require 'grammars/example/semantics'
local tokenize = require 'grammars/example/tokenize'
local createDFA = require 'generators/SLR(1)'
local parse = require 'parsers/GLR'

-- Perform parse
local DFA = createDFA(syntax)
local tokens = tokenize 'select 1 thru 5 and 15 at 100'
local results = parse(DFA, syntax, tokens)

-- Print results
local at = results.at
for index in next, results.values do
	print(index, at)
end

The GLR parser is highly modular. A GLR (generalized LR) parser can use many different types of LR parser generators. I've included the SLR(1) parser generator, which I give the syntax to. It generates a parse table built from a DFA (deterministic finite automaton). I also call the tokenizer with an input string. Finally, the parse table, syntax, and tokens are passed into the GLR parser, which returns the results of the last semantic action, STATEMENT in our case.
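
If everything is wired up correctly, the loop at the end should print each selected index paired with the intensity, something like this (the iteration order of next is unspecified):

-- 1    100
-- 2    100
-- 3    100
-- 4    100
-- 5    100
-- 15   100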

GLR parsers are renowned for being the easiest parsers to use because you don’t need to mutilate the grammar to get it into more restrictive forms. As you can see, the grammar could be easily extended in any way you want, very quickly.

This parser is my baby right now, and I'd like to make it as usable as possible. I hope that it can help you solve your problem. While I haven't gotten around to making a quick way to import it into Roblox, it should be pretty simple. After you have copied all the scripts into Roblox modules, you will need to make sure they are able to require each other. Outside Roblox, require takes in a string path to the Lua script, but in Roblox it takes in the ModuleScript instance. Once you have changed all those paths, it will work. Please let me know in a direct message if you have any problems putting it into your game; I'd love to help!