I’m not sure what string manipulation thing I should use to get the first word of a string.
Example string: “DevForum is a cool place!”
In that case, I need the first word “DevForum”. How will I be able to do this?
Thank you!
You could use string.split():
local String = 'Hello, World!'
local SplitString = String:split(' ')
print(SplitString[1]) -- In this case it prints: "Hello,"
Alternatively, you can use string.match to capture characters until it hits whitespace:
local myString = "Hello, World"
print(myString:match("^(.+)%s")) --> prints "Hello,"
Could you tell me the advantage of using one method over the other?
Hi!
@story246 is right, this is most likely the best way to get the first word from a longer string. To add to that, you can use this code to remove all special characters and punctuation, keeping only the letters of the English alphabet.
EDIT
The code below has been edited and now shows what seems to be the most efficient version, a conclusion we reached over the course of this topic.
local phrase = "Hello, world"
--[[
Split the phrase into a table, and separate elements
wherever there is a white space:
table[1] = "Hello,"
table[2] = "world"
]]
local firstWord = string.split(phrase, " ")[1] -- isolate the first word
firstWord = string.gsub(firstWord, "%A", "") -- keep only alphabetic characters
print(firstWord)
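For anyone trying this outside Roblox, here is a minimal sketch of the same two steps using only string.match and string.gsub, which exist in both standard Lua and Luau (string.split is not part of standard Lua). The variable names token, cleaned and count are just for illustration. It also shows that string.gsub returns two values, the new string and the number of substitutions, which matters if you pass its result straight to print.

```lua
local phrase = "Hello, world"

-- First whitespace-delimited token (any leading spaces are skipped).
local token = string.match(phrase, "^%s*(%S+)")

-- string.gsub returns the new string AND the number of substitutions made.
local cleaned, count = string.gsub(token, "%A", "")

print(token)   --> Hello,
print(cleaned) --> Hello
print(count)   --> 1
```

Assigning to a single variable, as in the code above, silently drops the count, so firstWord ends up holding just the string.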
I’ll explain how they work a bit, @slothfulGuy.
The first method splits the string wherever there is a space, so newString[1] is always the content of the string before the first space. For example:
local myString = "Hello world, I am Joe Bloggs and play Roblox"
local newString = string.split(myString, " ")
print(newString[1]) -- "Hello"
print(newString[4]) -- "am"
print(newString[9]) -- "Roblox"
Whereas string.match (the second method) returns whatever the pattern captures, so you do not choose which part it prints or which part you use. Be aware that .+ is greedy: when the string contains more than one space, this pattern captures everything up to the last space, not the first. So, example:
local myString = "Hello world, I am Joe Bloggs and play Roblox"
print(myString:match("^(.+)%s")) --> prints "Hello world, I am Joe Bloggs and play"
Hope that helps!
Why do you think it’s the best way to get the first word from a longer string?
Why not just do firstWord:gsub("%A", '') to remove all but ASCII alphabet?
There’s no reason to call the same function (string.byte) multiple times, use for loops, and create a table for this when phrase:match("^%S+"):gsub("%A", '') works.
I’d do %S+ instead of .+ since your pattern will return all but the last word. Try printing string.match("three word string", "^(.+)%s"); it’ll be “three word”.
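To make the difference concrete, here is a small sketch comparing the greedy pattern with %S+ on that same string. The lazy variant with .- is an extra added for comparison and was not suggested in the thread; the variable names are illustrative.

```lua
local s = "three word string"

-- ".+" grabs as much as it can, then backtracks only far enough
-- to leave one whitespace character for "%s" (the LAST space).
local greedy = string.match(s, "^(.+)%s")

-- "%S+" can only consume non-space characters, so it stops at the first space.
local firstWord = string.match(s, "^(%S+)")

-- ".-" is the lazy counterpart of ".+": it stops at the FIRST space,
-- but returns nil when the string contains no whitespace at all.
local lazy = string.match(s, "^(.-)%s")

print(greedy)    --> three word
print(firstWord) --> three
print(lazy)      --> three
```

Note that %S+ also works on a one-word string, where the two patterns that require a trailing %s would return nil.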
@Blockzez you are right, string.gsub(firstWord, "%A", '') is indeed the most efficient way of isolating ASCII characters. Unfortunately, at the time of writing the above post, I forgot that this option existed.
I was interested in a quick comparison of both methods (my previous post is now corrected and no longer includes the loop, because, as expected, it is fast but not the most efficient).
In my quick test I ran both examples and measured their process time. The string I chose consisted of 50 randomly generated characters: lower and upper case letters, numbers, and all common punctuation marks. The same procedure was repeated 6 times for each method. After I got all the results, I removed the two most deviant ones and calculated the average:
string.gsub → 3.10E-06 s
loop → 3.42E-05 s
As you can see from the results, the gsub method is faster, but in absolute terms the difference is almost negligible, at least when it comes to relatively short strings. The gsub method is still recommended because it is simpler, faster, more practical, and requires less code. A small disadvantage of the loop method is that it doesn’t respond well to special characters, namely open brackets “(” and close brackets “)” in my case.
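The averaging procedure described above (six runs, drop the two most deviant results, average the rest) could be sketched roughly like this. This is not the code used for the numbers above: trimmedAverage, timeOnce and lettersWithGsub are made-up names, "most deviant" is interpreted here as farthest from the mean, and the input is a repeated sample string rather than the random 50-character one.

```lua
-- Average of `samples` after discarding the `drop` values farthest from the mean.
local function trimmedAverage(samples, drop)
    local sum = 0
    for _, v in ipairs(samples) do
        sum = sum + v
    end
    local mean = sum / #samples

    -- Copy the samples, then sort by distance from the mean (closest first).
    local sorted = {}
    for i, v in ipairs(samples) do
        sorted[i] = v
    end
    table.sort(sorted, function(a, b)
        return math.abs(a - mean) < math.abs(b - mean)
    end)

    -- Keep everything except the `drop` most deviant values.
    local keep = #sorted - drop
    local total = 0
    for i = 1, keep do
        total = total + sorted[i]
    end
    return total / keep
end

-- Time a single call of fn(arg) with os.clock (CPU time in seconds).
local function timeOnce(fn, arg)
    local start = os.clock()
    fn(arg)
    return os.clock() - start
end

-- The gsub method from this thread; parentheses drop gsub's second return value.
local function lettersWithGsub(str)
    return (string.gsub(str, "%A", ""))
end

local input = string.rep("Hello, World! 123 ", 1000) -- assumed sample input
local samples = {}
for run = 1, 6 do
    samples[run] = timeOnce(lettersWithGsub, input)
end
print(string.format("gsub: %.2e s", trimmedAverage(samples, 2)))
```

Timings from a sketch like this vary from machine to machine and run to run, so treat the printed figure as indicative only.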
As far as the first word isolation goes:
string.match(phrase, "^(.+)%s")
string.split(phrase, " ")[1]
The former option seems easier to read and write.
Final version @slothfulGuy:
local phrase = "Hello, world"
--[[
Split the phrase into a table, and separate elements
wherever there is a white space:
table[1] = "Hello,"
table[2] = "world"
]]
-- Isolate the first word
local firstWord = string.split(phrase, " ")[1]
-- or (same process time); this pattern already skips leading non-letters
firstWord = string.match(phrase, "^%A*(%a+)")
firstWord = string.gsub(firstWord, "%A", "") -- keep only alphabetic characters
print(firstWord)
What? The gsub method is an order of magnitude slower.
(Also, for the record, punctuation is still ascii. I think you meant alphabetic.) Regardless, if you’re looking just for the first alphabetic word, you can do string.match(phrase, "^%A*(%a+)"). You don’t even need to check for %s in either this or the %S+ pattern I used earlier since + capturing is greedy and neither would capture spaces.
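Here is a small sketch of that pattern wrapped in a helper, to show what it returns for a few inputs. The function name firstAlphabeticWord is made up for this example.

```lua
local function firstAlphabeticWord(phrase)
    -- "^%A*" skips any leading non-letters; "(%a+)" captures the first run of letters.
    return string.match(phrase, "^%A*(%a+)")
end

print(firstAlphabeticWord("Hello, world"))                   --> Hello
print(firstAlphabeticWord("  ...DevForum is a cool place!")) --> DevForum
print(firstAlphabeticWord("don't stop"))                     --> don
print(firstAlphabeticWord("1234"))                           --> nil
```

One thing to keep in mind: this is not identical to splitting on a space and then stripping non-letters. For "don't stop", the split-then-gsub approach gives "dont", while this pattern stops at the apostrophe and gives "don".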
Every character until the last space in the string.
@posatta thank you for correcting me on that! Yes, I did mean alphabetic, and I also corrected the code to use string.match(phrase, "^%A*(%a+)") as you proposed.
Thank you @Blockzez too for reminding me of string.gsub(str, "%A", '')!
As far as letter isolation goes, gsub is a standard built-in function: it is part of the string library, which is written as a C module and runs as privileged code. I ran the test again using a faster version of the loop, but I found no way to write a quicker function inside the native Roblox Studio environment. Most likely the LPEG library is faster.
Here are the benchmarks. Keep in mind that they are only indicative.
This time the test was done on a string 10 * 10^6 characters long. Again, from six results I removed the two most deviant ones. Average:
loop: 1.521 s
gsub: 0.269 s
gsub is approximately 5.7 times faster (1.521 / 0.269 ≈ 5.65).
Here is the loop code:
local n, bc = 0, nil
local letters = {}
for i, v in pairs(string.split(str, "")) do
    bc = string.byte(v)
    if (bc >= 65 and bc <= 90) or (bc >= 97 and bc <= 122) then
        n += 1
        letters[n] = v -- probably faster than table.insert()
    end
end
letters = table.concat(letters)
string.split(str, '') performed better than string.gmatch(str, '.').
I also tried storing all letters in a dictionary, although that turned out to be slightly slower.
What about concatenating strings instead of filling a table? No. Every concatenation creates a new string, so building the result piece by piece is worse for performance on long inputs and probably takes up much more memory. Strings are hashed, so this is not a better way.
Because the loop is slower, it also has lower limits. I tried extending the string up to 80 * 10^6 characters, but both Notepad and Roblox could hardly handle such large amounts. (For comparison, War and Peace by L. N. Tolstoy has around 589,000 words and likely well over 3 million characters.) The gsub function still managed to perform pretty well (with higher delays, of course) compared to the loop, which exhausted all allowed run time. To prevent that, we have to add a coroutine that makes sure the function waits every now and then.
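A rough sketch of that last idea, written in plain Lua so it can be tried anywhere: the loop runs inside a coroutine and yields after every chunkSize characters, and the caller decides when to resume it. The names collectLetters and chunkSize, the chunk size of 4096 and the sample input are all assumptions for illustration. In Roblox, the place marked in the resume loop is where a wait such as task.wait() would presumably go.

```lua
-- Collect ASCII letters from `str`, yielding after every `chunkSize` characters.
local function collectLetters(str, chunkSize)
    local letters, n = {}, 0
    for i = 1, #str do
        local bc = string.byte(str, i)
        if (bc >= 65 and bc <= 90) or (bc >= 97 and bc <= 122) then
            n = n + 1
            letters[n] = string.char(bc)
        end
        if i % chunkSize == 0 then
            coroutine.yield(i) -- hand control back and report progress
        end
    end
    return table.concat(letters)
end

local input = string.rep("Hello, World! 123 ", 1000) -- assumed sample input
local co = coroutine.create(collectLetters)

-- The first resume passes the arguments; later resumes just continue the loop.
local ok, value = coroutine.resume(co, input, 4096)
while coroutine.status(co) ~= "dead" do
    -- Assumption: in Roblox, a wait (e.g. task.wait()) would go here.
    ok, value = coroutine.resume(co)
end

-- Once the coroutine is dead, `value` holds the function's return value.
print(ok, #value) --> true    10000
```

Until the coroutine finishes, value holds the progress index passed to coroutine.yield; only the final resume returns the concatenated letters.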