I am trying to very efficiently generate a list that has the count of each character in a string, essentially looking like this for the string “Hello world!”
local frequency = {}
for i, c in input:split("") do
frequency[c] = (frequency[c] or 0) + 1
end
(also very interesting that the world input is blue)
It works at about 2 million character per second, which is not good enough for me.
Another solution I have tried to speed up the process is:
local frequency = {}
for c = 0, 255 do
local char = string.char(c)
local count = #input:split(char)-1
if count > 0 then
frequency[char] = count
end
end
But that is much much slower, presumably because of how many times split is going through the string.
Another thing I have tried is instead of iterating through input:split() I iterate through 1, #len and use :sub, but it’s just a tiny bit slower on average.
True, but I don’t see how I would get much better using c or something. Since the string:split() function is c api and its only working at 2 million characters per second, that’s the bottleneck (my code inside the loop is able to work about about 7 million characters per second).
local Str = "Hello World!"
local Frequency = {}
for Character in string.gmatch(Str, ".") do
if Frequency[Character] then
Frequency[Character] += 1
else
Frequency[Character] = 1
end
end
print(Frequency)
I haven’t touched C much yet so I couldn’t say for certain, but I’d imagine that you could do this a lot faster by just iterating over the bytes of the string, basically the same as substring but without all of the messy Lua function calls. Just a loop and a sequential memory read.
Just for curiosity’s sake, since gmatch was only slightly slower, is there any benefit to using gsub instead? I wouldn’t think so but you might as well try it since we’re looking.
local Str = "Hello World!"
local Frequency = {}
Str:gsub(".", function(c) Frequency[c] = (Frequency[c] or 0) + 1 end)
My goodness, it’s running at 16 million characters per second (above my theoretical maximum of 7 million) I would of thought that since it’s calling the function very often it would be slower but apparently not.
Glad to have helped! Here’s another idea I found while digging around if you want to give it a go as well. It’s supposed to be faster than substring.
local input = "Hello World!"
local frequency = {}
for i = 1, #input do
local byte = string.byte(input, i)
frequency[byte] = (frequency[byte] or 0) + 1
end
local Cache = {["."] = {}}
for Charcter = 0, 255 do
Cache["."][Charcter+1] = string.char(Charcter)
end
function GetCharactersFromSet(CharSet: string)
local Characters = {}
for _, C in ipairs(Cache["."]) do
if string.match(C, CharSet) then
Characters[#Characters + 1] = C
end
end
Cache[CharSet] = Characters
return Characters
end
function Random(Length: number?, CharSet: string?)
local Length = ((Length and Length > 0 and Length) or 20)
local CharSet = ((CharSet and CharSet ~= "" and CharSet) or "[%w]")
local CharPattern = (Cache[CharSet] or GetCharactersFromSet(CharSet))
local MaxRange = #CharPattern
local Randomized = ("")
for _ = 1, Length do
Randomized ..= CharPattern[math.random(1, MaxRange)]
end
return Randomized
end
------------------------------------------------------------------------------------------------|
function Benchmark(func, times, ...)
local Start = os.clock()
for _ = 1, (times or 1) do
func(...)
end
local End = os.clock()
return End - Start
end
local Str = Random(10000):rep(1600)
function GMatch()
local Frequency = {}
for Character in string.gmatch(Str, ".") do
if Frequency[Character] then
Frequency[Character] += 1
else
Frequency[Character] = 1
end
end
end
function Split()
local Frequency = {}
for Character in string.split(Str, "") do
if Frequency[Character] then
Frequency[Character] += 1
else
Frequency[Character] = 1
end
end
end
function Sub()
local Frequency = {}
for i = 1, #Str do
local Char = string.sub(Str, i, i)
Frequency[Char] = (Frequency[Char] or 0) + 1
end
end
function GSub()
local Frequency = {}
Str:gsub(".", function(c) Frequency[c] = (Frequency[c] or 0) + 1 end)
end
function Byte()
local Frequency = {}
for i = 1, #Str do
local byte = string.byte(Str, i, i)
Frequency[byte] = (Frequency[byte] or 0) + 1
end
end
warn(string.format("Split - Elapsed time: %.4f seconds", Benchmark(Split)))
task.wait(1)
warn(string.format("GMatch - Elapsed time: %.4f seconds", Benchmark(GMatch)))
task.wait(1)
warn(string.format("Sub - Elapsed time: %.4f seconds", Benchmark(Sub)))
task.wait(1)
warn(string.format("GSub - Elapsed time: %.4f seconds", Benchmark(GSub)))
task.wait(1)
warn(string.format("Byte - Elapsed time: %.4f seconds", Benchmark(Byte)))
Very interesting, I tried a few things and I still get the same results as you. I am still going to use gsub for what I am doing though because it’s working faster.