Need help knowing what this String Pattern is (gmatch)

Hi, I'm wondering what this string pattern means. It looks very complicated.

"[^%s]+"


It looks complicated, but it’s not so much when it’s broken down into more manageable pieces.

local quote = "Broken down into more manageable pieces."
for str in quote:gmatch("[^%s]+") do print(str) end

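Output, one word per line: Broken, down, into, more, manageable, pieces. (The final period is not whitespace, so it stays attached to "pieces".)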

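  • string.match finds the first match (occurrence) of a pattern and returns its captures, or the whole matched substring if the pattern has no captures.
  • string.gmatch returns an iterator function. Each call (for example, each pass of a for loop) returns the captures from the next match, so looping over it visits every match.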
local str = "a b c"
print(str:match("%a"))
for capture in str:gmatch("%a") do print(capture) end

string.match returns only “a”, while string.gmatch returns all three letters, each in its own iteration (a then b then c).
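To make the difference concrete, here is a small sketch that stores every gmatch result in a table and compares it with the single value from match. It also shows how parentheses (captures) change what each iteration returns. collectMatches is a hypothetical helper written only for this example, not a built-in function.

```lua
-- Sketch comparing string.match and string.gmatch (plain Lua, also valid Luau).
-- collectMatches is a hypothetical helper written for this example.
local function collectMatches(s, pattern)
    local results = {}
    -- gmatch returns an iterator; each call yields the next match
    for m in s:gmatch(pattern) do
        results[#results + 1] = m
    end
    return results
end

local str = "a b c"
local first = str:match("%a")          -- only the first match
local all = collectMatches(str, "%a")  -- every match, in order

print(first)                   --> a
print(table.concat(all, ","))  --> a,b,c

-- With parentheses (captures) in the pattern, each iteration returns the
-- captured parts instead of the whole match.
for key, value in ("x=1, y=2"):gmatch("(%a)=(%d)") do
    print(key, value)  -- prints x and 1, then y and 2
end
```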

Explanation of the pattern

  1. %s is the character class for whitespace characters (space, tab, newline and the like). The uppercase %S is its complement: it matches every character except whitespace.

  2. ^ is called a magic character. It has two meanings:

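  • Anchor: at the start of a pattern, ^ anchors the match to the beginning of the string. $ is its counterpart: at the end of a pattern, it anchors the match to the end of the string.
print(quote:match("^Broken")) --> Broken (the space after it is not part of the match)
print(quote:match("pieces%.$")) --> pieces. (%. matches a literal period; a bare . would match any character)
  • Negation: as the first character inside square brackets, ^ complements the set. Since %s matches all whitespace, [^%s] matches any character that is not whitespace (the same as %S). Outside brackets, ^%s would instead mean "whitespace at the start of the string".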
  3. + is a modifier that matches one or more repetitions of the preceding character class, taking as many as it can. For example, with the class %a (upper- or lowercase letters), %a+ matches a run of one or more letters and stops at the first character that is not a letter:
print(quote:match("%a+")) --> Broken (<-- stops at whitespace)
  4. [ ] Finally, square brackets define a set, which combines several character classes into a single class.
    Let's look at some examples with a different string that includes more punctuation.
local str = "One.! Two."
print(str:match("%a+")) --> One
print(str:match("%a%p+")) --> e.!
print(str:match("[%a%p]+")) --> One.!
print(str:match("%a+%p")) --> One.

In line 3 ("%a%p+") no set is used. Scanning from the start of the string, Luau looks for the first single letter that is followed by one or more punctuation characters, which is "e.!".

In line 4 ("[%a%p]+") we created a set, so the meaning changes: starting from the beginning of the string, match one or more characters that are each either a letter or a punctuation character. The match stops at the space, giving "One.!".

In line 5 ("%a+%p") a set is not necessary because + applies only to the single class %a. The pattern matches one or more letters followed by exactly one punctuation character, giving "One.".
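Since ^ is the part that usually causes the confusion, here is a small sketch that puts its two roles side by side, plus a set that combines negation with two classes. It reuses the quote from the first example; the variable names are made up for illustration.

```lua
-- Sketch: the two roles of ^ in a Lua/Luau pattern.
local quote = "Broken down into more manageable pieces."

-- 1) At the start of the pattern (outside brackets): anchor to the beginning.
local anchored = quote:match("^%a+")     -- "Broken"
local notAtStart = quote:match("^down")  -- nil, "down" is not at the start

-- $ at the end of the pattern anchors to the end; %. is a literal period.
local lastWord = quote:match("%a+%.$")   -- "pieces."

-- 2) As the first character inside [ ]: complement the set.
local firstNonLetters = quote:match("[^%a]+")  -- " " (first run of non-letters)

-- A set can combine classes: [^%s%p]+ matches runs that are
-- neither whitespace nor punctuation, so the period is dropped.
local words = {}
for w in quote:gmatch("[^%s%p]+") do
    words[#words + 1] = w
end
print(table.concat(words, "|"))  --> Broken|down|into|more|manageable|pieces
```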

The original quote can be broken into substrings another way:

local quote = "Broken down into more manageable pieces."
for str in quote:gmatch("%S+") do print(str) end

Interpretation: iterate over every match of one or more consecutive non-whitespace characters. %S+ does the same job as [^%s]+.
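To check that %s covers more than the space character, the sketch below splits a string containing a double space, a tab and a newline with both patterns. splitWords is a hypothetical helper name, and the results in the comments assume standard Lua/Luau pattern behaviour.

```lua
-- Sketch: %s matches any whitespace (spaces, tabs, newlines, ...), so
-- "[^%s]+" and "%S+" split text the same way, even with repeated whitespace.
-- splitWords is a hypothetical helper written for this example.
local function splitWords(s, pattern)
    local words = {}
    for w in s:gmatch(pattern) do
        words[#words + 1] = w
    end
    return words
end

local messy = "one  two\tthree\nfour"
local a = splitWords(messy, "[^%s]+")
local b = splitWords(messy, "%S+")

print(table.concat(a, ","))                           --> one,two,three,four
print(table.concat(a, ",") == table.concat(b, ","))  --> true
```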

The best way to really understand string patterns is to practice and experiment.



Thanks! This helped me so much!

