StringBuilder Library - Much Faster Concatenation

Hello! When I found out about string concatenation’s absurdly poor performance in loops (I estimated concatenation is about 555 times slower) I decided to make this library which makes the process of permanently concatenating a lot of things at once much simpler.

You can find the source code here: StringBuilder - Replit (Only 26 lines of code!) This allows you to seamlessly and permanently concatenate strings within a loop without really changing your code too much. You should keep in mind that the table version has slightly better performance and it has a better memory footprint so you’re better off using that in your projects.

Here are some examples of how you can use this:

local builder = StringBuilder.new("example", "123") -- Creates a string builder with "example 123" as it's text

builder = builder.."456" -- Directly concatenates the given items just like a string. Note that this changes the builder object itself, and does not return a new object!

builder = builder + "abc" + "def" -- Concatenates the given items to the builder separated by spaces. This also changes the actual builder object like above.

builder = tostring(builder) -- Converts the builder to a string. This is also done by print, but this allows you to get its string value.
print(builder) -- Prints "example 123456 abc def"

builder = builder("Stuff!", builder) -- Returns a new StringBuilder generated with the given arguments. Note: This only accepts strings as arguments!

Additionally, this post describes the performance issues I’ve come across, and has a bit of discussion on the topic: https://devforum.roblox.com/t/psa-string-concatenation-is-really-slow-use-table-concat/475583

13 Likes

The library is nice but the way you implemented the object is sub optimal.

You should be defining the metamethods outside of the constructor and using the self parameter to access the instance.

I suggest using a table for this, I’m not sure why userdata is necessary.

3 Likes

It’s not really necessary to be fair, I just did it because it feels nicer to me to use userdatas rather than tables. I do agree that it’s not the best overall, however personally I just don’t have an issue with it myself. I’ve went ahead and made a version of this which does use tables under stringbuilder_table.lua. This should address the issues you brought up.

Seems unnecessary to redefine something to itself like in the examples?

Pushing values into the string builder should bet methods instead of using operators

builder = builder.."abc"
-- vs
builder:append"abc"

Here would be my implementation:

local StringBuilder do
	local StringBuilderMetatable = {
		__tostring = function(self)
			return table.concat(self)
		end,
		__index = {
			append = function(self,str)
				self[#self+1] = tostring(str)
			end,
			appendAll = function(self,...)
				local len = #self
				for i=1,select("#",...) do
					self[len+i] = tostring((select(i,...)))
				end
			end,
			pop = function(self,amt)
				amt = amt or 1
				local len = #self
				for i=len,len-amt+1,-1 do
					self[i] = nil
				end
			end
		},
		__call = function(self,_,key)
			key = key and key+1 or 1
			local value = self[key]
			if value ~= nil then
				return key,value
			end
		end
	}
	function StringBuilder(...)
		return setmetatable({...},StringBuilderMetatable)
	end
end

Example usage:

local builder = StringBuilder("a","b","c")
print(builder) --> abc
builder:append"d"
print(builder) --> abcd
builder:appendAll("e","f")
print(builder) --> abcdef
builder:pop()
print(builder) --> abcde
builder:pop(2)
print(builder) --> abc
for pos,str in builder do -- iterate over everything
	print(pos,str)
end
--> 1 a
--> 2 b
--> 3 c

Also, why not just push everything into an array and call table.concat on that instead?

2 Likes

Here are some benchmarks created out of boredom:

(with steps=100000, numconcats=100)

Lua 5.1:
raw concat      6.094
table concat    6.077
stringbuilder   9.259

LuaJIT 2.0.3:
raw concat      4.988
table concat    0.587
stringbuilder   5.658

Luau (Roblox Lua):
raw concat 3.7896602153778
table concat 3.5889356136322
stringbuilder 5.6604096889496

(with steps=10000, numconcats=1000)

Lua 5.1:
raw concat      13.565
table concat    6.29
stringbuilder   9.548

LuaJIT 2.0.3:
raw concat      11.221
table concat    0.623
stringbuilder   5.711

Luau (Roblox Lua):
raw concat 31.376859903336
table concat 3.8676650524139
stringbuilder 6.0108301639557

Code here

table.concat is definitely more efficient when it comes to larger numbers of concatenations, but your implementation suffers with smaller amounts likely due to the use of metatables. It’s convenient though.

1 Like

I was thinking about including some extra methods to do this and I didn’t have time to do that until now. I was actually in the process of doing something like this right now. I like your implementation as well! It simplifies some of the issues with my implementation. I’ll have another go at writing this and see if I can get something a bit better as it’s currently a little flawed it looks like.

Additionally, I definitely want to work on this this time around @bunny_hop. It’s unfortunate that metatables seem to cause issues as well.

I’m not sure what the benefit here is compared to the regular way of building strings:

local strings = {"example", "123"}

table.insert(strings, "456")
table.insert(strings, "abc")
table.insert(strings, "def")

local str = table.concat(strings)
print(str)

This feels like a solution looking for a problem to be honest; it looks slightly nicer but it’s not like the previous solution was particularly unwieldy or ugly in the first place. Plus, no metatables, method calls or other overhead!

I’ve located the issue! Turns out tostring creates a new string which is the exact issue concatenation has, and I was calling it on concatenation (who would have guessed that tostring creates a string!? :scream:) I’ve additionally created a new branch titled stringbuilder_fast.lua. This branch is about 30-50ms faster than stringbuilder_table.lua on repl.it which is about 30-50ms faster than stringbuilder.lua. I’ve additionally implemented benchmark code which is compatible with Roblox lua under benchmark.lua.

@Elttob
My goal is to make the standard way you’ve shown into a clean wrapper.
I want to make implementing something in existing code seamless as well as easier to read. For example, with code that uses concatenation I want you to be able to simply replace str = "someInitialValue" with str = StringBuilder.new("someInitialValue")

Also here’s some benchmarks from repl.it:

Running benchmarks...Iterations: 10000 Concatenations: 100
    StringBuilder Fast           414ms
    StringBuilder Table          475.218ms
    StringBuilder                565.927ms
    Standard table.concat        327.733ms
    Raw concatenation            189.321ms
    Total       1                972.346ms
Iterations: 1000 Concatenations: 1000
    StringBuilder Fast           379.175ms
    StringBuilder Table          404.869ms
    StringBuilder                423.601ms
    Standard table.concat        314.819ms
    Raw concatenation            2278.725ms
    Total                        3801.343ms
Iterations: 100 Concatenations: 10000
    StringBuilder Fast           503.421ms
    StringBuilder Table          562.407ms
    StringBuilder                580.023ms
    Standard table.concat        345.257ms
    Raw concatenation            9619.466ms
    Total                        11610.747ms
Done!

Since this is wrapper code it does add a bunch of extra time unfortunately. It looks to be about 50-100ms faster to use table.concat by itself.

local StringBuilder = {}

local pack = table.pack or function(...)
	return {...}
end

do -- Methods
	function StringBuilder:concat(value)
		self.length = self.length + 1
		self.data[self.length] = value
	end
	function StringBuilder:append(value)
		self.length = self.length + 2
		self.data[self.length - 1] = " "
		self.data[self.length] = value
	end
	function StringBuilder:pop()
		self.length = self.length - 1
		return table.remove(self.length + 1)
	end
	function StringBuilder:compile(seperator)
		return table.concat(self.data, seperator)
	end
	function StringBuilder:clear()
		self.data = {}
		self.length = 0
	end
	StringBuilder.__tostring = StringBuilder.compile
end

-- Constructor
function StringBuilder.new(...)
	local builder = pack(table.concat(pack(...), " "))

	builder.length = 0
	builder.data = {}

	for index, value in pairs(StringBuilder) do
		builder[index] = value
	end

	return setmetatable(builder, string)
end

return StringBuilder

Is there a reason to explicitly store the length instead of using the length operator?

local pack = table.pack or function(...)
	return {...}
end

Those functions aren’t equivalent, and the n field is never used (which is set with table.pack), so you might just want to create a table with {...} inline instead?

do...end block does nothing

return table.remove(self.length + 1)

???

local builder = pack(table.concat(pack(...), " "))

Since you’re using the data field to store the array of strings, why are the arguments store in the builder itself?

return setmetatable(builder, string)

What is the point of setting the metatable to string?

Also whats the issue with tostring creating a string? Calling tostring with a string just returns it, but it’s useful for something like appending a string builder to another string builder. The issue with concatenation isn’t just that it creates a new string, but that it creates a new string at each step in iteration instead of just creating one string. (table.concat creates a string too)

Since this is the fast version shouldn’t it do some basic optimizations like only getting the length once in the methods?

Here would be a string builder without metatables for methods:

local StringBuilder do
	local StringBuilderMetatable = {
		__tostring = function(self)
			return table.concat(self)
		end
	}
	local function append(self,str)
		self[#self+1] = tostring(str)
	end
	local function appendAll(self,...)
		local len = #self
		for i=1,select("#",...) do
			self[len+i] = tostring((select(i,...)))
		end
	end
	local function pop(self,amt)
		amt = amt or 1
		local len = #self
		for i=len,len-amt+1,-1 do
			self[i] = nil
		end
	end
	function StringBuilder(...)
		return setmetatable({
			append=append,
			appendAll=appendAll,
			pop=pop,
			tostr=table.concat,
			...
		},StringBuilderMetatable)
	end
end

Explicitly storing length is faster. The length operator actually doesn’t cache. Additionally I’m aware that table.pack and the function I provided are not equivalent. The do end block is for organization. The table.remove call removes the last item of the array (I subtract 1 from the length immediately before so I add 1 back).

The line with builder = pack is leftover from when I was using the actual array (I wanted to implement faster a clear method so I switched to the data key). I meant to set the metatable to {__index = string} so you could use string methods, however this is kind of pointless. I am passing numbers so tostring would convert the number to a string, and additionally it seems like tostring creates a new string regardless. Additionally, the fast one is using some optimizations, but it isn’t full micro optimization.

No table is passed, so it errors.

String are interned, so it doesn’t always create a new string, and this is guaranteed when you pass a string to tostring, it just returns it.

One more thing:

The way you implemented table.pack forces it to only use the first argument when constructing array part of the table

local pack = table.pack or function(...)
	return {..., n=select("#", ...)}
end

image
Did you mean to put the ... after the n field?

local pack = table.pack or function(...)
	return {n=select("#",...),...}
end