Advanced String Manipulation Help

I’m trying to parse a very long string and divide it into a table of pages, each holding a fixed number of words. The script I’ve cobbled together so far mostly works, but a few letters come out duplicated. This is weird, because about 99% of the new strings match the original text exactly; only a few stray duplicated letters are scattered around where they shouldn’t be. If anyone can spot the cause of this, let me know.

String = "Apology By Plato Translated By Benjamin Jowett Socrates' Defense"
--This is how it enters the table later: "Apology By y Plato Translated By Benjamin Jowett Socrates' ' Defense"
local TotalWordCount = (#String:gsub("%S+", ""))
local PageWordCount = 250
local Pages = {}

for i = 1, TotalWordCount/PageWordCount do --Iterate for each page to be created
	
	local WordTable = {}
	
	for i = 1, PageWordCount do --Iterate for each word on a page
		local Word = String:gmatch("%S+")
		table.insert(WordTable, Word())
		String = String:sub(string.len(WordTable[i]) + 2)
	end
	
	NewString = table.concat(WordTable, " ")
	table.insert(Pages, NewString)
	
end
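
To see where the extra letters come from, here is a minimal reproduction of the inner loop in plain Lua. It assumes the text separates lines with a blank line, the way the actual string shown further down the thread does; pageFromQuestionLoop is a hypothetical name, and the nil guard is only there so the short sample ends cleanly. Each String:sub(#word + 2) cut assumes the word starts at position 1 and is followed by exactly one separator. After a blank line, two whitespace characters can sit in front of the next word; gmatch skips over them, but the cut does not account for them, so it stops one character short and the word’s last letter is read back as a new "word".

```lua
-- Mirrors the inner loop from the question on a short sample of the real
-- text, where lines are separated by a blank line ("\n\n").
local function pageFromQuestionLoop(text, wordsPerPage)
    local wordTable = {}
    for i = 1, wordsPerPage do
        local word = text:gmatch("%S+")()  -- first word in what's left
        if word == nil then break end      -- guard added for the short sample
        wordTable[i] = word
        -- Assumes the word starts at position 1 and is followed by exactly
        -- one separator, so #word + 2 is where the next word begins.
        text = text:sub(#word + 2)
    end
    return table.concat(wordTable, " ")
end

local sample = "Apology\n\nBy Plato\n\nTranslated By Benjamin Jowett\n\nSocrates' Defense"
print(pageFromQuestionLoop(sample, 250))
-- Apology By Plato Translated d By Benjamin Jowett Socrates' Defense
```

The doubled letter in this sample is "d" rather than the "y" from the post, because which word gets clipped depends on exactly how many whitespace characters pile up before it; the mechanism is the same.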

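Your TotalWordCount line counts whitespace characters, not words: the first value gsub returns is the string with every word removed, so its length is just the whitespace left over. Use the second value gsub returns instead, which is the number of matches it replaced.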

local _, TotalWordCount = String:gsub("%S+", "")
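
A quick check of both return values on a short, made-up sample that contains line breaks; the blank lines inflate the first count but not the second:

```lua
local text = "Apology\n\nBy Plato\n\nTranslated"

-- First return value of gsub: the text with every word deleted, i.e.
-- "\n\n \n\n". Its length counts whitespace characters, not words.
local whitespaceCount = #(text:gsub("%S+", ""))

-- Second return value of gsub: the number of replacements made, which is
-- the number of "%S+" matches, i.e. the number of words.
local _, wordCount = text:gsub("%S+", "")

print(whitespaceCount, wordCount)  -- 5   4
```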

Thanks, that will definitely give a more accurate word count, but it didn’t fix the main issue I’m having. I suspect it has something to do with how I’m trimming the string with String:sub on each pass.

Couldn’t you just use local split = String:split(" ")? It splits the string at every space and puts the pieces in a table.

That worked, but it freezes the game for over 10 seconds. My full string is literally book-length, and gmatch, by comparison, runs instantly with no freezing. So the duplication must be coming from the string pattern I’m using.

I think that by using split, I’m creating a table of 11,000 words 250 times for each of the roughly 50 pages it has to create, whereas gmatch lets me take one word off the string at a time.
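
If the split happens inside the per-word loop, the whole remaining text gets split again for every word. A minimal sketch of splitting only once and then slicing the word list into pages, in plain Lua: gmatch stands in for Roblox’s string.split here, and paginate is a hypothetical helper name. One difference worth knowing: split(" ") breaks only on spaces, so words joined by line breaks stay in one piece, while "%S+" breaks on any whitespace.

```lua
-- Hypothetical helper: collects every word once, then slices the list
-- into pages. The text is scanned a single time, instead of re-splitting
-- the remaining text for every word.
local function paginate(text, wordsPerPage)
    local words = {}
    for word in text:gmatch("%S+") do
        words[#words + 1] = word
    end

    local pages = {}
    for first = 1, #words, wordsPerPage do
        -- table.concat(list, sep, i, j) joins words[i..j]; clamping j keeps
        -- a final, shorter page instead of dropping it.
        local last = math.min(first + wordsPerPage - 1, #words)
        pages[#pages + 1] = table.concat(words, " ", first, last)
    end
    return pages
end

local pages = paginate("Apology\n\nBy Plato\n\nTranslated By Benjamin Jowett", 3)
for i, page in ipairs(pages) do
    print(i, page)
end
```

With 3 words per page this yields "Apology By Plato", "Translated By Benjamin", and a one-word final page, "Jowett". Because math.min clamps the last index, leftover words still form a page; a loop bounded by TotalWordCount/PageWordCount, as in the question, stops at the last full page and drops them.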

String = "Apology By Plato Translated By Benjamin Jowett Socrates' Defense"

local Word = string.gmatch(String, "%w+")
for w in Word do
    print(w)
end

This prints every word with no weird doubled characters. It also goes one word at a time, which means you can build your pages as you go and put a wait() after every 100 words or so to keep the game from freezing.
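
The one-word-at-a-time idea can be sketched in plain Lua with a coroutine that hands back one finished page per step, so the caller decides when the next page gets built. pagesOf is a hypothetical helper, and the pause itself is left as a comment because wait() only exists in Roblox. This sketch pauses once per page rather than every 100 words; the granularity is a free choice.

```lua
-- Hypothetical helper: returns an iterator that builds one page per call.
-- The work runs inside a coroutine, so nothing beyond the current page is
-- processed until the caller asks for the next one.
local function pagesOf(text, wordsPerPage)
    return coroutine.wrap(function()
        local buffer = {}
        for word in text:gmatch("%S+") do
            buffer[#buffer + 1] = word
            if #buffer == wordsPerPage then
                coroutine.yield(table.concat(buffer, " "))
                buffer = {}
            end
        end
        -- Emit the last, shorter page if the word count isn't a multiple.
        if #buffer > 0 then
            coroutine.yield(table.concat(buffer, " "))
        end
    end)
end

local pages = {}
for page in pagesOf("Apology By Plato Translated By Benjamin Jowett", 3) do
    pages[#pages + 1] = page
    -- In Roblox, a wait() here would give the game a chance to keep running.
end
print(#pages)  -- 3
```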

The issue with %w is that it doesn’t include punctuation. I’m trying to figure out character classes and sets right now, but essentially I need to combine %w and %p, and ideally I’d also preserve the \n or \r line breaks somehow. I think those line breaks are causing the issue, because the string actually looks like this:

String = [[Apology

By Plato

Translated By Benjamin Jowett

Socrates' Defense]]

Using the * modifier instead of the + modifier also seems to fix the issue, but I’d like to preserve the line breaks if possible.
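
A likely explanation for why * helps, shown on a made-up leftover string right after a blank line: + forces gmatch to skip ahead to the first non-whitespace character, while * can match zero characters, so the first match always starts at position 1, which is exactly what the sub(#word + 2) trim assumes.

```lua
-- After a word is trimmed off, the leftover text can start with more than
-- one whitespace character (here, the rest of a blank line).
local leftover = "\n\nBy Plato"

-- "+" needs at least one character, so gmatch skips ahead to "By", which
-- starts at position 3. The trim still assumes position 1 and cuts too little.
local plusWord = leftover:gmatch("%S+")()
local afterPlus = leftover:sub(#plusWord + 2)

-- "*" can match zero characters, so the first match is at position 1: an
-- empty string here. The trim then removes just one "\n", leaving "By" whole.
local starWord = leftover:gmatch("%S*")()
local afterStar = leftover:sub(#starWord + 2)

print(afterPlus)  -- y Plato
```

One side effect worth checking: with "%S*", every extra whitespace character produces an empty match, and the question’s loop would insert those empty strings into WordTable, where they count toward the 250 words per page.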

Here’s the char set that makes it work:

Word = String:gmatch("[%w%p\n]*")
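
To see what the bracketed set does, here is each pattern run over a short sample in a plain gmatch loop. collect is a hypothetical helper, and + is used here so that every match is a non-empty word. A set like [%w%p\n] matches any single character that belongs to any class inside it. One caveat, stated as an assumption to check against the real book text: %w and %p follow the C library’s character classes, so in the usual C locale non-ASCII characters such as curly quotes (’) match neither, and a word containing one would be split apart. If that turns out to be a problem, a negated set such as [^ ] (any character except a space) keeps everything that isn’t a space.

```lua
local text = "Apology\n\nBy Plato\n\nSocrates' Defense"

-- Hypothetical helper: gathers every match of `pattern` in `text`.
local function collect(pattern)
    local words = {}
    for word in text:gmatch(pattern) do
        words[#words + 1] = word
    end
    return words
end

-- %w alone stops at the apostrophe: "Socrates" instead of "Socrates'".
local alnum = collect("%w+")
-- [%w%p] accepts letters, digits, and punctuation.
local withPunct = collect("[%w%p]+")
-- Adding \n glues line breaks onto neighbouring words, so "Apology\n\nBy"
-- comes back as one token and the breaks survive.
local withBreaks = collect("[%w%p\n]+")

print(#alnum, #withPunct, #withBreaks)  -- 5  5  3
```

Joining withBreaks with table.concat(withBreaks, " ") gives back this sample exactly, which is why the line breaks survive into the pages.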