Hey! I’ve found myself in a situation where I need an alternative to string.split(). Not only do I need multiple separators, I also want each separator to be able to have a different length and different symbols.
I’m not asking for a script. I’m pretty sure I could write one that just brute-forces string.find and collects each occurrence, but I was wondering if this has already been done in a cleaner, more efficient way.
EDIT: Another detail that drastically complicates this issue: I’d like the separator to be kept in the string, not deleted like string.split does.
local Splitter = {}

function Splitter.universalSplit(texto, ...)
    local separadores = {...}
    local resultados = {texto}
    for _, sep in ipairs(separadores) do
        local temp = {}
        for _, fragmento in ipairs(resultados) do
            local inicio = 1
            local s_inicio, s_fin = string.find(fragmento, sep, inicio, true)
            while s_inicio do
                local parte = string.sub(fragmento, inicio, s_inicio - 1)
                if parte ~= "" then
                    table.insert(temp, parte)
                end
                inicio = s_fin + 1
                s_inicio, s_fin = string.find(fragmento, sep, inicio, true)
            end
            local resto = string.sub(fragmento, inicio)
            if resto ~= "" then
                table.insert(temp, resto)
            end
        end
        resultados = temp
    end
    return resultados
end

return Splitter
Example of use:
local Splitter = require(game.ReplicatedStorage:WaitForChild("Splitter"))
local data = "Level1>>Player_Pro---Cash:500>>LOL"
local list = Splitter.universalSplit(data, ">>", "_", "---", ":")
for _, v in ipairs(list) do
    print(v)
end
If that’s what you’re looking for, then this is perfect for you.
local StringUtil = {}

--[[
    Splits a string using an unlimited number of custom separators
    of any length or symbol combination.
]]
function StringUtil.multiSplit(inputString: string, ...: string)
    local separators = {...}
    local results = {inputString}
    -- If no separators are provided, return the original string in a table
    if #separators == 0 then
        return results
    end
    for _, sep in ipairs(separators) do
        local tempResults = {}
        for _, fragment in ipairs(results) do
            local searchIndex = 1
            -- 'true' enables plain text search, so magic characters such as "." or "-"
            -- in a separator are matched literally instead of as Lua patterns
            local startPos, endPos = string.find(fragment, sep, searchIndex, true)
            while startPos do
                local segment = string.sub(fragment, searchIndex, startPos - 1)
                -- Only add non-empty segments to the list
                if segment ~= "" then
                    table.insert(tempResults, segment)
                end
                searchIndex = endPos + 1
                startPos, endPos = string.find(fragment, sep, searchIndex, true)
            end
            -- Add the remaining part of the string
            local remaining = string.sub(fragment, searchIndex)
            if remaining ~= "" then
                table.insert(tempResults, remaining)
            end
        end
        results = tempResults
    end
    return results
end

return StringUtil
My first thought is to substitute every delimiter (separator) with a single, universal delimiter, then you can split the string normally. You can generalize this process by keeping each delimiter in an array. When you want to split a string containing a combination of these delimiters, you can loop through the array and replace each delimiter with the universal delimiter using gsub before splitting the string.
It’s not the most optimized process because Lua strings are immutable and interned (search it up), so every gsub call builds a new copy of the string, but as long as your strings aren’t too large I think your memory usage should be fine.
Something like this:
local UNIVERSAL_DELIM = "@" -- this delimiter can also exist in the delims table for this specific task, but I made it unique for the example
local delims = {".", "?", ":/:"}

local function splitButWithMultipleDelims(str) -- you should probably change this function name to something more concise
    local newStr = str
    for _, delim in delims do
        newStr = string.gsub(newStr, delim, UNIVERSAL_DELIM)
    end
    return string.split(newStr, UNIVERSAL_DELIM)
end
Quite an interesting problem. I decided to take my own crack at it:
type Characters = {string}

local function get_characters(text: string): Characters
    return string.split(text, "")
end

local function get_delimiter_characters(delimiters: {string}): {Characters}
    local results = table.create(#delimiters)
    for index, delimiter in delimiters do
        results[index] = get_characters(delimiter)
    end
    return results :: any
end

local function split_multi_delimiter(text: string, delimiters: {string}): {string}
    local characters = get_characters(text)
    local delimiter_characters = get_delimiter_characters(delimiters)
    local match_index = 1
    local current_index = 1
    local results = {}

    -- True if the given delimiter starts at current_index
    local function is_terminal(delimiter: Characters): boolean
        for offset, symbol in delimiter do
            local is_match = symbol == characters[current_index + offset - 1]
            if not is_match then
                return false
            end
        end
        return true
    end

    -- Returns the length of the delimiter found at current_index, or nil if there is none
    local function match_delimiter(): number?
        for _, delimiter in delimiter_characters do
            if #delimiter > 0 and is_terminal(delimiter) then
                return #delimiter
            end
        end
        return nil
    end

    -- Records the text before current_index and skips past the whole delimiter
    local function record_slice(delimiter_length: number)
        local slice = string.sub(text, match_index, current_index - 1)
        match_index = current_index + delimiter_length
        table.insert(results, slice)
    end

    while current_index <= #text do
        local delimiter_length = match_delimiter()
        if delimiter_length then
            record_slice(delimiter_length)
            current_index += delimiter_length
        else
            current_index += 1
        end
    end
    record_slice(0)
    return results
end

print(split_multi_delimiter("Hello, world! My name is Ziffix.", {",", "!"}))
--[[
{
    "Hello",
    " world",
    " My name is Ziffix.",
}
]]
Getting to write this in a more C-like language would have been easier, and I could make more micro-optimizations
I mean, that goes for any delimiter, right? Of course the universal delimiter doesn’t have to be the @ symbol, just anything you know won’t naturally appear in the string. I believe it could be multiple characters as well to further prevent natural occurrence.
Instead of an arbitrary universal delimiter based on “probably won’t be an issue”, it could be, let’s say, the first real delimiter. I am a bit tired, so hopefully this is correct reasoning haha
So for example, if the input delimiters are A, B, and C, then choose A as the universal delimiter.
EDIT: Also, there is a problem with using gsub, since there is no way to make the pattern plain (unless you escape it manually). Your example would not work if . (period) is a delimiter, because in a pattern that matches any character.
I like that. Only introduces an extra string.find which is easy to implement.
You’re right. I knew I was forgetting something about Lua patterns. I believe %. should work. If a delimiter consists of multiple characters then I think in my function you can make it insert a % before each character so that the pattern matches the literal characters.
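For what it’s worth, putting a % before every character would go wrong for letters and digits, since %a, %d and friends are character classes in Lua patterns. Here is a rough sketch of the escaping, assuming only non-alphanumeric characters need it, combined with the “first delimiter as the universal one” idea from above. escapePattern and normalizeDelimiters are made-up names for this example, not built-ins:

```lua
-- Escapes every non-alphanumeric character so gsub matches the delimiter literally.
-- Letters and digits are left alone, because "%a", "%d", etc. are character classes.
local function escapePattern(delimiter)
    return (string.gsub(delimiter, "%W", "%%%0"))
end

-- Replaces every delimiter after the first with the first one.
-- Assumption: no delimiter contains another delimiter.
local function normalizeDelimiters(text, delimiters)
    local universal = delimiters[1]
    -- In a gsub replacement string only "%" is special, so double it.
    local replacement = string.gsub(universal, "%%", "%%%%")
    for index = 2, #delimiters do
        text = string.gsub(text, escapePattern(delimiters[index]), replacement)
    end
    return text
end

local delims = {".", "?", ":/:"}
print(normalizeDelimiters("a.b?c:/:d", delims))
```

In Roblox you could then hand the result to string.split(normalized, delims[1]). Like the gsub version above, this still throws the separators away.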
Here is my attempt at a general-purpose solution. I haven’t found a case that could break this yet. Let me know!
local function split(
    input: string,
    sep: { string }
)
    if #sep == 0 then
        return { input }
    elseif #sep == 1 then
        return string.split(input, sep[1])
    end
    local result: { string } = {}
    local len = #input
    local i = 1
    repeat
        local k = 0
        local w = 0
        local j = len
        for _, v in sep do
            local a, b = string.find(input, v, i, true)
            if a and a <= j then
                j = a
                w = b - a
                k = -1
            end
        end
        local sub = string.sub(input, i, j+k)
        table.insert(result, sub)
        i = j + w + 1
    until k == 0
    return result
end
This is what I ended up with after first implementing it using recursion and then converting it to a loop:
local function split(s: string, ...: string): {string}
    local parts = {s}
    for i = 1, select("#", ...) do
        local separator = select(i, ...)
        local newParts = {}
        for _, part in ipairs(parts) do
            local subparts = string.split(part, separator)
            for _, subpart in ipairs(subparts) do
                table.insert(newParts, subpart)
            end
        end
        parts = newParts
    end
    return parts
end
Please use more descriptive variable names in the future. Single-letter variables are harmful to readability in several ways, with the most damaging consequence being the inability to efficiently discern how the algorithm works.
I actually already used this method for another reason, funnily enough, but the issue I forgot to specify is that I need to preserve the separator as part of the array item (for context, I’m trying to extract every term of a polynomial given as a string).
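If the pieces really are polynomial terms, a pattern-based sketch might already be enough. This assumes the terms are separated only by + or -, with no spaces and no sign inside a term (so something like x^-2 would not work). polynomialTerms is a made-up name for the example:

```lua
-- Collects the terms of a polynomial string, keeping each term's sign.
-- Assumption: "+" and "-" only ever appear between terms or at the very start.
local function polynomialTerms(polynomial)
    local terms = {}
    -- "[+%-]?" optionally matches a leading sign,
    -- "[^+%-]+" matches the rest of the term up to the next sign.
    for term in string.gmatch(polynomial, "[+%-]?[^+%-]+") do
        table.insert(terms, term)
    end
    return terms
end

for _, term in ipairs(polynomialTerms("3x^2+2x-5")) do
    print(term)
end
```

string.gmatch returns each match whole, so the sign stays attached to its term without any extra bookkeeping.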
I love how concise this one is. The problem comes up once more that I’m looking to preserve the separator in the string (split automatically strips it). This detail wasn’t communicated in my post and I do apologize for that.
You’re right haha! I’m a little slow today.
I could see this function working great for some other use cases of mine, but I just edited the post because I forgot to mention that I’d like the separators to be kept, not deleted like the split method does. Any ideas how that could be implemented? I was thinking of keeping the previous find index and plugging it into the sub’s start index.
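A minimal sketch of that idea, assuming the separators are plain (non-pattern), non-empty strings and that each separator should stay attached to the start of the piece that follows it. splitKeepSeparators is a hypothetical name, and this is only one way to do it:

```lua
-- Splits `text` at every occurrence of any separator, keeping each separator
-- at the start of the piece that follows it.
local function splitKeepSeparators(text, separators)
    local pieces = {}
    local pieceStart = 1 -- where the current piece begins (possibly at a separator)
    local searchFrom = 1 -- where to look for the next separator

    while true do
        -- Find the leftmost separator at or after searchFrom;
        -- if two start at the same position, prefer the longer one.
        local bestStart, bestEnd
        for _, sep in ipairs(separators) do
            if sep ~= "" then
                local s, e = string.find(text, sep, searchFrom, true)
                if s and (not bestStart or s < bestStart
                    or (s == bestStart and e > bestEnd)) then
                    bestStart, bestEnd = s, e
                end
            end
        end
        if not bestStart then
            break
        end

        -- Close the current piece just before the separator.
        if bestStart > pieceStart then
            table.insert(pieces, string.sub(text, pieceStart, bestStart - 1))
        end
        -- The next piece starts AT the separator (the previous find index), so it is kept.
        pieceStart = bestStart
        searchFrom = bestEnd + 1
    end

    if pieceStart <= #text then
        table.insert(pieces, string.sub(text, pieceStart))
    end
    return pieces
end

for _, piece in ipairs(splitKeepSeparators("3x^2+2x-5", {"+", "-"})) do
    print(piece)
end
```

Searching from the end of the last separator while slicing from its start is what keeps the separator in the output. If you would rather have the separator at the end of the preceding piece, slice up to bestEnd instead and set pieceStart to bestEnd + 1.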