How to improve my Python string.strip function

So I wrote a strip function like Python's, which removes leading and trailing whitespace, as well as any characters given in the optional chars argument.

function string_ext.strip(s, chars) -- Strips leading and trailing whitespace, as well as characters specified by the chars string
	return (s:gsub("^%s+", ""):gsub("%s+$", ""):gsub(chars and string.format("[%s]", chars) or "", ""))
end

It works perfectly, but I don’t like how I have chained gsub calls. I feel like I could do it in one call, but I'm unsure what pattern to use.

EDIT: Warning: this code is incorrect, use @General_Scripter's fix instead!
print(string.gsub("  Hello World  ","^%s*(.*)%s*$", "%1"))

https://www.fhug.org.uk/wiki/wiki/doku.php?id=plugins:understanding_lua_patterns

As described on this page, you can make captures inside your pattern. I’m sure you can work out how to remove trailing provided chars in the same gsub call.

Note that if you chain them in order, removing the selected characters at the edges might expose whitespace that is then left behind. Also, the current implementation seems to remove all provided chars and not just trailing ones. Nice work though, this function makes a nice addition to anyone's library.

Good luck!
Nonaz_jr

For completeness, I’d like to point out that this pattern won’t actually remove trailing whitespace characters. The actual pattern needed for this is "^%s*(.-)%s*$" - very similar, but capturing as few characters as possible up to trailing whitespace or the end of the string.
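A quick way to see the difference between the two captures is to run both patterns on the same input. This is only an illustrative sketch; the variable names are made up for the example.

```lua
local input = "  Hello World  "

-- Greedy capture: (.*) swallows the trailing spaces, so %s*$ has nothing left to match.
local greedy = (input:gsub("^%s*(.*)%s*$", "%1"))

-- Lazy capture: (.-) takes as little as possible, leaving the trailing spaces to %s*$.
local lazy = (input:gsub("^%s*(.-)%s*$", "%1"))

print(("[%s]"):format(greedy)) -- [Hello World  ]
print(("[%s]"):format(lazy))   -- [Hello World]
```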

Furthermore, I’d like to point out that the OP’s code does not behave the same way as Python’s strip function. This is because it removes the characters (specified in the second parameter) from anywhere in the string, unlike the Python implementation, which only removes leading and trailing characters.
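To make that concrete, here is the chained version from the question wrapped in a local function (the name strip_chained is just for this example) and called on an input that has the character in the middle as well as at the edges:

```lua
-- The chained version from the question, reproduced as a local function.
local function strip_chained(s, chars)
	return (s:gsub("^%s+", ""):gsub("%s+$", ""):gsub(chars and string.format("[%s]", chars) or "", ""))
end

-- Every "x" is removed, including the one between the dashes.
print(strip_chained("xxa-x-bxx", "x")) -- a--b
```

Python's "xxa-x-bxx".strip("x") would give "a-x-b" instead, since only the edges are touched.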

To conclude, instead of using %s as the leading and trailing pattern to be removed, we can use a set that matches both whitespace and the given characters, built with string.format("[%%s%s]", chars or "").

This results in the function:

function string_ext.strip(s, chars)
	local removePattern = string.format("[%%s%s]", chars or "")
	return (string.gsub(s, "^"..removePattern.."*(.-)"..removePattern.."*$", "%1"))
end

Technically, this still differs from Python’s implementation, as string.gsub also returns the number of substitutions made as a second value. In our case, this will always be 1. But there’s no need to even capture this second value when you call the function anyway.
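One caveat that isn't covered above: chars is dropped into the set as-is, so characters that are special inside a Lua set (such as ], % or a - between two other characters) could change the pattern's meaning. A possible sketch, assuming you want every character in chars treated literally, is to escape them first. The local name strip and the escaping step are my own additions for illustration, not part of the function above.

```lua
local function strip(s, chars)
	-- Prefix every non-alphanumeric character with % so it is literal inside the set.
	local escaped = (chars or ""):gsub("%W", "%%%0")
	local set = "[%s" .. escaped .. "]"
	-- Parentheses keep only the resulting string, not the substitution count.
	return (s:gsub("^" .. set .. "*(.-)" .. set .. "*$", "%1"))
end

print(strip("[[hello]]", "[]"))       -- hello
print(strip("a-z test z-a", "a-z"))   -- test
```

Without the escaping, "a-z" would be read as the range a to z and the second call would strip every lowercase letter from the edges.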

For this you can just wrap the call in parentheses to force exactly 1 return value, but thanks

I never even knew you could do that, thanks! I’ll just edit my post to reflect that. I know I replied to a reasonably old thread, but I was implementing the same function and just wanted to post my solution for completeness. :slight_smile:

Oh wow, thanks for spotting my mistake there, even this much later :slight_smile: Your solution is correct.

But I don’t understand what’s meant by forcing 1 return value, or the purpose of the parentheses around the gsub call. Either way, isn’t a tuple returned (with the count still in the second return value)?

It’s explained better on this PIL page than I can myself, specifically:

By wrapping the gsub in parentheses, it will only return the first value (the result string).
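A small sketch that shows the effect, using select("#", ...) to count how many values come through (the variable names are only for this example):

```lua
-- Without parentheses, both of gsub's return values are passed along.
local bare_count = select("#", string.gsub("  hi  ", "^%s*(.-)%s*$", "%1"))

-- With parentheses, the call is truncated to its first value.
local wrapped_count = select("#", (string.gsub("  hi  ", "^%s*(.-)%s*$", "%1")))

print(bare_count)    -- 2
print(wrapped_count) -- 1
```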

Oooh yes I see, thanks a lot! Indeed it filters the extra return values this way. Cool!