Attempting to separate a string by newline, have two results but unsure of which to use

Edit: Lua does not use regex! Removed that from the thread to avoid confusion.

I’m no good with Lua patterns and hadn’t a clue where to even start (other than the String Patterns Reference, which explains a lot but not everything), so I did my research on the topic (see title) and got pleasant results. Just no journey towards them.

I found two sources, a StackOverflow thread and a GitHub Gist that gave me an answer to my question, so I now have the patterns [^\r\n]+ and ([^\n]*)\n? to use with gmatch. This will allow me to get the content per line of a multiline string.

Both of these captures achieve exactly what I want, but I don’t understand what these patterns are doing and therefore can’t make a decision on what to use. While hypothetically I shouldn’t care since they do the same thing and work, I make it a habit to care and know what my code is doing. I also feel as though it does in fact matter what pattern I use.

To try and help me better understand these patterns, I attempted to use @Halalaluyafail3’s String Pattern Analyzer Plugin, which converts a string pattern into English. It’s worked well, provided you actually understand character classes and such for complex patterns.

For reference, these are the explanations the plugin gave back:

Pattern: [^\r\n]+
Analysis: A character set which will match anything but one of the following 1+ times, as many times as possible, giving back when necessary (greedy)
The character ‘\r’
The character ‘\n’
Source: https://stackoverflow.com/questions/32847099/split-a-string-by-n-or-r-using-string-gmatch

Pattern: ([^\n]*)\n?
Analysis: Capture 1
A character set which will match anything but one of the following 0+ times, as many times as possible, giving back when necessary (greedy)
The character ‘\n’
Matches the character ‘\n’ 0-1 times, as many times as possible, giving back when necessary (greedy)
Source: https://gist.github.com/iwanbk/5479582
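
To make the analyzer output a bit more concrete, here is a small sketch that tries the individual pieces with string.match. The sample strings and variable names are made up for illustration; only string.match and the two patterns come from the thread.

```lua
-- "*" accepts zero characters, so the match succeeds immediately with an empty result.
local star = string.match("\ndef", "[^\n]*")
print(star == "")  --> true

-- "+" needs at least one character, so the search moves past the "\n" and finds "def".
local plus = string.match("\ndef", "[^\n]+")
print(plus)        --> def

-- The parentheses decide what is returned; the trailing "\n?" is consumed but not captured.
local captured = string.match("abc\ndef", "([^\n]*)\n?")
print(captured)    --> abc
```

So the practical difference between the two quantifiers is whether "nothing" counts as a match, which is what decides whether empty lines show up.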

Further searching has just pushed me into constant dead ends: understanding (non-)greedy captures, why \r\n is the specific expression for a new line rather than either of those characters individually, and what the difference between a newline and a carriage return is. No idea.

I would be grateful if someone could enlighten me on this topic. To summarise what I’m asking:

  • What is the difference between these patterns and why might it be relevant? What are these patterns doing that I should be aware of?

  • Are these good methods to use at all or should I opt to use something else? What are the benefits of the other methods over feeding these expressions through gmatch?

  • Are there any resources that dumb down regex that may be helpful for novices? I can easily go read from a regex documentation site or whatever, already considered it and have a few sources, but wondering if anything really sets a starting point for investigating.


I use \n most of the time to separate lines, but other methods could surely be better to use if you’re familiar with them.

Woah, I didn’t know you struggled that hard.
what I do is

local hotdogdescription = "this is delicious and"..
 "good"
print(hotdogdescription)

to be honest is my explanation even relevant

Roblox has a string.split(str, separator) function that splits a string on the separator.

yes, if it is related to yours

local twostrings = "cheese burger"
twostrings = twostrings:split(" ")
--twostrings = {"cheese", "burger"}

Whenever I’m trying to learn a brand new topic I always like doing a quick video search on YouTube to see if anyone made videos about my desired topic. If you have the time to watch a 37-minute video then maybe watch Corey Schafer’s Regular Expressions (Regex) Tutorial. I watched the first 15 minutes and think his video is a good starting point to understand the basics of Regex. You probably will get the same feeling as you did when first learning print("Hello World!") thinking that it’s too simple but I think it is well worth the patience of understanding basic regex.

Lua doesn’t use Regex, it uses Patterns

But looking at the API reference for Patterns they seem to share similar things to Regex.

[^\r\n]+ will not return empty lines.

([^\n]*)\n? will return empty lines.

So to answer your question: it depends on whether you want to return empty lines or not. If you do, though, I would just use the pattern: (.-)\n


Let me reply to some of the questions in this topic one at a time.

Difference between carriage return and newline?
Straight to the point answer is, \n creates a new line, while \r gets you back to the first character of a line.

If you did

print("hi \nbye")

The result would be

hi
bye

\n is almost like pressing Enter: it moves to a newly created line and puts whatever comes after the \n on that line. Note that there is a " " (space) between "hi" and \n, so the first line isn’t "hi", it’s actually "hi ".

If you did

print("hi \rj")

The result would be

ji

Hmm, interesting. Well, as I said, \r is like using the mouse and positioning the cursor at the start of the current line. Any characters after \r will replace the characters at the corresponding positions. In the example above, "j" replaces the first character "h", hence "ji". If you tried to print "hhhh \rbye" it would print "byeh". It’s like we took “bye”, went to the start of the line, and put it there, so it replaced the first three "h"s. So as you can see, \r, or carriage return, doesn’t create a new line on its own.

Then what on earth is \r\n?
Well, if you were to explain what this is doing exactly, it’s saying go to the start of the line, and create a new line there (from my understanding \n wouldn’t replace the first nor the second character; that detail is just abstracted away from us). Doesn’t that mean "hi \r\n" would just be


hi

A new line then “hi”, which is not the same as “hi” then a new line. After a little bit of searching it seems that different operating systems handle line endings differently. For example, classic Mac OS used \r to make new lines, and starting from Mac OS X it changed to \n. (I’m on a Windows machine so can’t really confirm.) Windows uses \r\n as its line ending, though a lone \n is usually accepted as well, and the reason it keeps \r\n is backwards compatibility (more info here). So I think what we’re being told here is that \r\n isn’t meant to be read as a \r instruction followed by a \n instruction; it’s a two-character sequence that is treated as a single line ending.
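
For what it's worth, a quick check in Lua suggests that \r and \n are two ordinary, distinct characters, and that "\r\n" is simply both of them in a row. The sketch below only uses string.byte, the length operator and string.gsub; the sample string and variable names are made up.

```lua
local crByte = string.byte("\r")   -- 13, carriage return
local lfByte = string.byte("\n")   -- 10, line feed (newline)
local crlfLength = #"\r\n"         -- 2, two separate characters

-- One common way to deal with mixed input is to turn every "\r\n" into "\n" first.
-- The extra parentheses drop gsub's second return value (the replacement count).
local normalized = (string.gsub("hi\r\nbye", "\r\n", "\n"))

print(crByte, lfByte, crlfLength, normalized == "hi\nbye")
```

After normalising like this, a pattern only has to care about \n.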


[^\r\n]+ will match 1+ characters that aren’t newlines or carriage returns, and then stop.
([^\n]*)\n? will match 0+ characters that aren’t newlines, and then optionally match a newline character.
Aside from the difference with the carriage return, the second pattern will match empty lines, while the first will ignore them. It’s also worth noting that the second pattern will get an empty string on the last match.
Here is an example of what I mean:

for i in string.gmatch("abc\ndef\n\nghi","[^\r\n]+") do
    print(#i,":",i)
end
--> 3 : abc
--> 3 : def
--> 3 : ghi
for i in string.gmatch("abc\ndef\n\nghi","([^\n]*)\n?") do
    print(#i,":",i)
end
--> 3 : abc
--> 3 : def
--> 0 : 
--> 3 : ghi
--> 0 : 
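
The carriage return difference mentioned above is easiest to see with Windows-style input. This is a minimal sketch with a made-up sample string; the table names are only for illustration.

```lua
local text = "abc\r\ndef\r\n\r\nghi"

-- First pattern: "\r" is excluded from the set, so it never ends up in a line.
local first = {}
for line in string.gmatch(text, "[^\r\n]+") do
    first[#first + 1] = line
end
-- first is {"abc", "def", "ghi"}: no stray "\r", but the empty line is gone.

-- Second pattern: only "\n" is excluded, so "\r" counts as part of the line.
local second = {}
for line in string.gmatch(text, "([^\n]*)\n?") do
    second[#second + 1] = line
end
-- second[1] is "abc\r" (4 characters): the "\r" stays inside the capture.

print(#first, #second[1])
```

So if the text might contain \r\n, the second pattern would need the \r stripped from each line afterwards.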

Lua doesn’t have regex, but the manual does a pretty good job of explaining Lua patterns, although it doesn’t mention frontier patterns, which are documented in newer versions of the manual.


@RealyConfus1on I understand, but that doesn’t answer my question. You have to use \n to separate a string into multiple lines: what I’m trying to do is not write out a string but fetch the content of a multiline string on a per-line basis.

@BasedKnowledge Same response as above, but just to inform you that what you’re doing is still regular concatenation. Whitespace in your code doesn’t make a newline, you have to explicitly specify that with \n. Your concatenated string will still be evaluated as a single-line string.

@Luka_Gaming07 Appreciate the response, this is also what I thought to try initially however string.split only splits by plaintext separators, not patterns, so this wouldn’t work in my case. I also feel that it’d be reinventing the wheel to try and make a custom split implementation that functions like find (e.g. a parameter to specify whether to treat the separator string as a pattern or plain string).

@xZylter Thanks for the resource. Although I later learned through these replies and some searching that Lua patterns and regex are not the same thing, this is future knowledge that can come in handy when I actually need to start working with regex (whenever that may be, I don’t know).

@Isocortex Thank you for the elaboration! It is actually more helpful to know that there is a difference and that while the patterns still both separate on a per-line basis, one returns empty lines and the other doesn’t. I’ll use that explanation as a basis for further research, since I’m still trying to analyse the whole pattern itself by each character class and whatever. On another note, I tried your suggested pattern and it also looks to be viable except for how it skips the last line in a multiline string.

local str = "foo\nbar\n\nqaz\nfrobnicate"
local str2 = str .. "\nbaz"

for s in string.gmatch(str, "(.-)\n") do
    print(s) -- frobnicate skipped
end

print("---")

for s in string.gmatch(str2, "(.-)\n") do
    print(s) -- baz skipped
end
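
One possible workaround for the skipped last line, sketched here as an assumption rather than anything suggested in the thread: (.-)\n only matches text that is followed by a newline, so appending one to the input lets the final line match too.

```lua
local str = "foo\nbar\n\nqaz\nfrobnicate"

local lines = {}
-- Append "\n" so the last line is also terminated and can be matched.
for s in string.gmatch(str .. "\n", "(.-)\n") do
    lines[#lines + 1] = s
end

print(#lines, lines[#lines])
-- lines now holds: "foo", "bar", "", "qaz", "frobnicate"
```

Note that if the string already ends with a newline, this adds one more empty entry at the end, the same way a plain split on "\n" would.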

@starmaq The explanation is much appreciated! This helped me better understand why the specific pattern is \r\n. Effectively, returning to the beginning of the string and then starting a new line. I’m not too sure about starting the new line but I’d assume it has to do with how gmatch matches. I’ll still have to do my own research after this, but thank you.

@Halalaluyafail3 Much like above, thank you for the clear explanation. I think this decisively answers what I want to use for my current case, but also helps for future purposes. Thanks for also clearing up that this is not regex and just its patterns. A question: you mentioned that the second pattern will get an empty string on the last match. Is there any way to void that within the pattern or should I just clear off the last entry (skip the iteration or remove it from an array if I put the results in one) if I use it?
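
One way to sidestep the trailing empty match altogether might be to not rely on an optional \n at all. Below is a rough sketch of a helper built on string.find with a plain-text search; splitLines is a hypothetical name, not something built into Lua or Roblox, and the sample input is made up.

```lua
-- Hypothetical helper: returns every line of text, keeping empty lines.
local function splitLines(text)
    local lines = {}
    local pos = 1
    while true do
        -- Fourth argument true = plain search, so "\n" is not treated as a pattern.
        local nl = string.find(text, "\n", pos, true)
        -- Take everything up to the newline, or to the end of the text if none is left.
        local line = string.sub(text, pos, nl and nl - 1 or -1)
        -- Drop one trailing "\r" so Windows-style "\r\n" endings are handled too.
        line = (string.gsub(line, "\r$", ""))
        lines[#lines + 1] = line
        if not nl then
            break
        end
        pos = nl + 1
    end
    return lines
end

local result = splitLines("abc\r\ndef\n\nghi")
print(#result)  --> 4
-- result holds: "abc", "def", "", "ghi"
```

With this approach the last entry is only empty when the text itself ends with a newline, so there is nothing to clear off afterwards for input like the example above.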
