Recently I’ve been working with Lua string patterns (regex-like) to match some strings for a Markdown system I’m making. I’m currently using the following pattern:
%*(.*)%* --> should match text like *this* but not like **this**
However, this pattern gives me unintended output. I expected my script to detect only the word world, but it seems to be detecting almost the whole string instead (everything from the first * to the last *). Here’s my code for reference:
local text = "Hello **spectacular** *world*!"
local md = {
    ["*"] = {
        ["format"] = "<i>%s</i>",
        ["query"] = "%*(.*)%*"
    },
};
function format(text : string)
    for _, replacementData in pairs(md) do
        local starts, ends, plainTxt = string.find(text, replacementData.query);
        if (starts) then
            print("Detected:", text:sub(starts, ends))
        end
    end
end
format(text)
Try [^%*]%*(.-)%* instead. The * quantifier is greedy, while - is not. I also added [^%*] at the beginning so that the pattern isn’t thrown off by ** (although I suppose if you parse bold first you don’t need to worry about it).
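To make the greedy/lazy difference concrete, here is a small sketch in plain Lua (no Luau type annotations, so it should also run outside Roblox). The `sample` string is made up for illustration. The last part is a caveat worth knowing: on the original string, the suggested pattern finds the leading ** first and returns an empty capture, which is presumably why a whitespace check is still needed later.

```lua
-- Hypothetical sample with two italic spans.
local sample = "one *two* three *four*"

-- Greedy: `.*` runs to the end of the string, then backtracks to the LAST `*`.
local greedy = string.match(sample, "%*(.*)%*")
print(greedy) --> two* three *four

-- Lazy: `.-` grows one character at a time and stops at the FIRST closing `*`.
local lazy = string.match(sample, "%*(.-)%*")
print(lazy) --> two

-- Caveat: on the thread's string, the suggested pattern matches " **" first,
-- because `.-` is allowed to match nothing between the two asterisks.
local original = "Hello **spectacular** *world*!"
local s, e, inner = string.find(original, "[^%*]%*(.-)%*")
print(s, e, inner == "") --> 6   8   true
```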
Also, I don’t think the capture group is actually working; maybe string.match instead of string.find does the trick?
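For what it’s worth, string.find does return captures: they come after the two indices. The original script printed text:sub(starts, ends), which is the whole matched span including the asterisks, rather than the capture itself. A quick sketch of the difference (the `text` and `query` values here are just illustrative):

```lua
local text = "Hello *world*!"
local query = "%*(.-)%*"

-- string.find returns the start index, the end index, and then any captures.
local starts, ends, captured = string.find(text, query)
print(starts, ends, captured) --> 7   13   world

-- The matched span includes the asterisks; the capture does not.
local span = text:sub(starts, ends)
print(span) --> *world*

-- string.match skips the indices and returns only the captures.
local matched = string.match(text, query)
print(matched) --> world
```

So either function works; string.match is just more convenient when the positions aren’t needed.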
Here’s my working code for anyone else who stumbles into this issue:
local example = "*italic* this **works** like a __charm__! ~~strikeout~~ *unfinished md"
local md = {
    ["**"] = {
        ["format"] = "<stroke thickness=\".2\">%s</stroke>",
        ["query"] = "[^%*]%*%*(.-)%*%*"
    },
    ["*"] = {
        ["format"] = "<i>%s</i>",
        ["query"] = "[^%*]%*(.-)%*"
    },
    ["__"] = {
        ["format"] = "<u>%s</u>",
        ["query"] = "[^%*]__(.-)__"
    },
    ["~~"] = {
        ["format"] = "<s>%s</s>",
        ["query"] = "[^%*]~~(.-)~~"
    },
};
function format(text : string)
    for _, replacementData in pairs(md) do
        repeat
            -- The leading [^%*] in each query needs one character before the marker,
            -- so search a copy padded with a space. The padding shifts every index
            -- by one, which is why sub(1, starts - 1) and sub(ends) line up below.
            local starts, ends, plainTxt = string.find(" "..text, replacementData.query);
            -- Only replace when there is a match AND the captured text isn't just whitespace
            if ((starts and ends) and (not plainTxt:find("^%s*$"))) then
                text = (text:sub(1, starts - 1)..(string.format(replacementData.format, plainTxt))..(text:sub(ends)));
            end
        until ((not starts or not ends) or (plainTxt:find("^%s*$")));
    end
    return text
end
print(format(example));
This worked perfectly! However, I still wonder why that pattern specifically requires at least one character before the marker I want to match.
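A short sketch of what is going on, in plain Lua: [^%*] is a character class, so it has to consume one real character, and at the very start of the string there is nothing for it to consume. The second half shows one possible alternative, a frontier pattern (%f), which checks the boundary without consuming anything. This assumes the runtime supports %f (stock Lua does; I believe Luau does too), and `frontierQuery` is my own name, not something from the thread.

```lua
local query = "[^%*]%*(.-)%*"

-- No character precedes the first `*`, so [^%*] cannot match and find returns nil.
local bare = string.find("*italic*", query)
print(bare) --> nil

-- Padding with a space gives [^%*] something to consume.
local ps, pe, padded = string.find(" *italic*", query)
print(ps, pe, padded) --> 1   9   italic

-- Assumed alternative: %f[%*] matches the position where a run of `*` begins,
-- and %f[^%*] the position where it ends, so only single asterisks qualify.
local frontierQuery = "%f[%*]%*([^%*]+)%*%f[^%*]"
local fs, fe, word = string.find("*italic*", frontierQuery)
print(fs, fe, word) --> 1   8   italic

local ws, we, world = string.find("Hello **spectacular** *world*!", frontierQuery)
print(ws, we, world) --> 23   29   world
```

With the frontier version no padding is needed, and ** is skipped rather than matched with an empty capture.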
I assume that’s because of the [^%*] at the beginning. As I said, if you parse bold first (as you already did), that part isn’t actually needed, so just go ahead and remove it.
Edit: I forgot that md is a dictionary, so its iteration order isn’t guaranteed. Bold might not actually run first in the script you posted, so turn it into an array (or something similar) so that bold always comes before italics.
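Here is a rough sketch of that idea, not a drop-in replacement: the rules sit in an array and are applied with ipairs, so bold is always handled before italics, and the [^%*] prefix is dropped. I also swapped the repeat/find loop for string.gsub with a function replacement, which is my own choice rather than anything from the posted script. The `rules` name and its layout are hypothetical; the format strings are copied from the code above.

```lua
-- Array of rules: ipairs walks it in order, so bold runs before italics.
local rules = {
    { query = "%*%*(.-)%*%*", format = "<stroke thickness=\".2\">%s</stroke>" },
    { query = "%*(.-)%*",     format = "<i>%s</i>" },
    { query = "__(.-)__",     format = "<u>%s</u>" },
    { query = "~~(.-)~~",     format = "<s>%s</s>" },
}

local function format(text)
    for _, rule in ipairs(rules) do
        -- gsub replaces every match; assigning to one variable drops the match count.
        text = text:gsub(rule.query, function(inner)
            -- Returning nil tells gsub to leave this match untouched,
            -- which skips empty or whitespace-only spans.
            if inner:find("^%s*$") then
                return nil
            end
            return string.format(rule.format, inner)
        end)
    end
    return text
end

local example = "*italic* this **works** like a __charm__! ~~strikeout~~ *unfinished md"
print(format(example))
--> <i>italic</i> this <stroke thickness=".2">works</stroke> like a <u>charm</u>! <s>strikeout</s> *unfinished md
```

Because the bold pass has already consumed every ** pair, the italic pass only sees single asterisks, and the unclosed *unfinished md is left alone.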