How to filter cursed text?

Hey there!

A game of mine requires text responses.
There are three filters: two manual ones and the Roblox filter, but none of them can catch text L̴̨̧͙̯̫̗̹̝͈͓͖̦͗̏̆̏̈́̄̇̄͜ͅI̶̛̗͚͇͇̗̭̳͔͇͋͐̾̽̿͌̃̆͘̚͜͝͝K̵̢͍̗̖̯̱̜̽̑́̽̃́͋̅͝Ë̷̛̲͌̆̎́͐͒͊̓̚͘͠ ̶̖̜̩̣͍͐͂̂̌̃̏͑́̆̉̿̿̈́Ṱ̵̨̧̣͔̓̍̍̃̌̈́̽̕H̸̡̧̻̪̙̠̹͈̞̙͊̑̏̽͗͆̆̒̆̌̈́̑̈́͘Ȋ̵̜̲͖͕͇͍͔̥̱͚͓̩́̄͑̂̃̑̇͊͊̋͠͝͝S̶̢̨͙͇̦͎͎̼̋̉̓̉̓̅̍̑̐͘̚͝ͅ.

Is there a way to filter cursed text? Thanks :slight_smile:

You could use string.byte to first convert the character to a byte number and then use string.char to convert it back to a string. Repeat this on all of the characters of the string and it should be cleaned up.

> print(string.char(string.byte("L̴̨̧͙̯̫̗̹̝͈͓͖̦͗̏̆̏̈́̄̇̄͜ͅ")))
-- outputs L

The thing is, how are you supposed to tell what “one” character is? “L̴̨̧͙̯̫̗̹̝͈͓͖̦͗̏̆̏̈́̄̇̄͜ͅ” is technically 47 characters if you check its length, so you can’t use that. I’d just stick with Roblox’s filter, since it’s Roblox’s problem if it misses something.
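
To make the length point concrete, here’s a small sketch. It assumes a Lua 5.3-style utf8 library (Luau has one too), and the string is a made-up example with only two combining marks, written as escapes so it stays readable. Neither count tells you how many letters a person would see:

```lua
-- "L" followed by two combining marks (U+0334 and U+0317).
local cursed = "L\u{0334}\u{0317}"

print(#cursed)          --> 5 (bytes: 1 for "L", 2 for each mark)
print(utf8.len(cursed)) --> 3 (code points: "L" plus two marks)
```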

I understood this was just to remove this type of text, so simplifying it like this should be enough. It doesn’t have to come out as the “correct” character, since it’s the problem of the person using them.

But how are you supposed to tell whether something’s "L̴̨̧͙̯̫̗̹̝͈͓͖̦͗̏̆̏̈́̄̇̄͜ͅ” or a genuine 47 letter string? Something like string.char(string.byte("LOL")) will also return “L”.

I wouldn’t. I’d just filter every character with the above method, going through the string one character at a time.

Yes, but “every character” will include the zalgo, which will get you right back where you started. The L is a separate character from the zalgo; the zalgo marks are just stacked on top of the regular letter.

You’ll want to go through each letter of the string and translate it to normal characters, then moderate the text you get from that.

local StringToModerate = "" -- string here
local newString = ""
-- Rebuild the string one byte at a time
for i = 1, string.len(StringToModerate) do
	newString = newString .. string.char(string.byte(string.sub(StringToModerate, i, i)))
end
moderate(newString) -- This function might, for example, send the text through ROBLOX's default moderation, or go through a white/blacklist of allowed words.

Ah now I see what you mean (had to write a test on studio). Let me see if I can work anything else out

You could just drop all the non-ascii characters and run the result through the chat filter.

I would really recommend against this, as you would cut out a massive amount of your international community. Roblox has been striving for localization, and I don’t think they’d be too happy with you censoring people for using another alphabet. Just stick with the default Roblox filter and report bugs in the filter if you find them.
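
For what it’s worth, here’s a rough sketch of what dropping non-ASCII characters does in practice. StripNonAscii is a name made up for this example, and it just removes every byte above 127. It cleans the zalgo, but a message written in another alphabet disappears completely:

```lua
-- Hypothetical helper: removes every byte outside the ASCII range (0-127).
local function StripNonAscii(text)
	-- gsub returns two values; the parentheses keep only the string.
	return (string.gsub(text, "[\128-\255]", ""))
end

print(StripNonAscii("L\u{0334}\u{0317}ol")) --> Lol
print(StripNonAscii("Привет") == "")        --> true (the whole message is gone)
```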

The only way to truly filter this out without removing other Unicode characters (literally any character that isn’t typically in English, including emoji!) is to filter out any character from these blocks:

| Block | Name |
|---|---|
| U+0300 - U+036F | Combining Diacritical Marks |
| U+1AB0 - U+1AFF | Combining Diacritical Marks Extended |
| U+1DC0 - U+1DFF | Combining Diacritical Marks Supplement |
| U+20D0 - U+20FF | Combining Diacritical Marks for Symbols |
| U+FE20 - U+FE2F | Combining Half Marks |

Here’s some code to do that. Call FilterString to filter out these characters.

local FILTERED_BLOCKS = {
	{ 0x0300, 0x036F },
	{ 0x1AB0, 0x1AFF },
	{ 0x1DC0, 0x1DFF },
	{ 0x20D0, 0x20FF },
	{ 0xFE20, 0xFE2F }
}

local function IsCodepointFiltered(codepoint)
	for _, block in pairs(FILTERED_BLOCKS) do
		if codepoint >= block[1] and codepoint <= block[2] then
			return true
		end
	end
	
	return false
end

local function FilterString(stringToFilter)
	local filteredString = ""

	for _, codepoint in utf8.codes(stringToFilter) do
		if not IsCodepointFiltered(codepoint) then
			filteredString = filteredString .. utf8.char(codepoint)
		end
	end
	
	return filteredString
end

print(FilterString("Ṯ͎̱̜̠̯̄̎̒̈̀ͫḛ̳̹̀ͭ̇̋̉s̪ͬ̔̓͗t̎̐̇ͩͥ̽̀͏͓̠̱̝̗͕͇")) --> Test
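
One caveat: text that writes its accents as separate combining marks would lose them with the code above. If that matters for your game, a possible variant is to keep a small number of marks per letter and drop only the excess. This is just a sketch under that assumption; LimitCombiningMarks and maxMarks are names invented here, and the right limit is a judgment call:

```lua
-- Same combining-mark blocks as in the table above.
local COMBINING_BLOCKS = {
	{ 0x0300, 0x036F },
	{ 0x1AB0, 0x1AFF },
	{ 0x1DC0, 0x1DFF },
	{ 0x20D0, 0x20FF },
	{ 0xFE20, 0xFE2F },
}

local function IsCombiningMark(codepoint)
	for _, block in ipairs(COMBINING_BLOCKS) do
		if codepoint >= block[1] and codepoint <= block[2] then
			return true
		end
	end
	return false
end

-- Hypothetical variant: keeps at most maxMarks combining marks after each
-- base character and drops the rest.
local function LimitCombiningMarks(text, maxMarks)
	local pieces = {}
	local run = 0 -- combining marks seen since the last base character
	for _, codepoint in utf8.codes(text) do
		if IsCombiningMark(codepoint) then
			run = run + 1
			if run <= maxMarks then
				table.insert(pieces, utf8.char(codepoint))
			end
		else
			run = 0
			table.insert(pieces, utf8.char(codepoint))
		end
	end
	return table.concat(pieces)
end

-- A single accent survives, a stack of marks is cut down to one.
print(LimitCombiningMarks("e\u{0301}", 1) == "e\u{0301}")                       --> true
print(LimitCombiningMarks("T\u{0334}\u{0317}\u{0301}est", 1) == "T\u{0334}est") --> true
print(LimitCombiningMarks("T\u{0334}\u{0317}est", 0))                           --> Test
```

With maxMarks set to 0 this behaves like FilterString.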

That code example does… literally nothing? The reason string.char(string.byte("L̴̨̧͙̯̫̗̹̝͈͓͖̦͗̏̆̏̈́̄̇̄͜ͅI̶̛̗͚͇͇̗̭̳͔͇͋͐̾̽̿͌̃̆͘̚͜͝͝K̵̢͍̗̖̯̱̜̽̑́̽̃́͋̅͝Ë̷̛̲͌̆̎́͐͒͊̓̚͘͠ ̶̖̜̩̣͍͐͂̂̌̃̏͑́̆̉̿̿̈́Ṱ̵̨̧̣͔̓̍̍̃̌̈́̽̕H̸̡̧̻̪̙̠̹͈̞̙͊̑̏̽͗͆̆̒̆̌̈́̑̈́͘Ȋ̵̜̲͖͕͇͍͔̥̱͚͓̩́̄͑̂̃̑̇͊͊̋͠͝͝S̶̢̨͙͇̦͎͎̼̋̉̓̉̓̅̍̑̐͘̚͝ͅ.")) returns “L” is because L is the first character, and string.byte will only return the first character in byte form. Each part of the zalgo is its own character, so converting it to and from byte form doesn’t change anything.
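
You can check this yourself with a short test. This is a minimal sketch using a made-up string with two combining marks, written as escapes. Rebuilding it byte by byte gives back exactly the same string:

```lua
local cursed = "L\u{0334}\u{0317}"

-- Rebuild the string byte by byte, as in the loop suggested earlier.
local rebuilt = ""
for i = 1, #cursed do
	rebuilt = rebuilt .. string.char(string.byte(cursed, i))
end

print(rebuilt == cursed)                --> true (nothing was removed)
print(string.char(string.byte(cursed))) --> L (only the first byte is converted)
```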

Reselim’s solution is the only actual solution here.

You didn’t understand what I meant. Drop out any non-ascii characters and pass the result through the filter. If it passes, send the original message.

But yeah, @Reselim's solution is much better, as mine would still ignore any non-English curse words.

That would probably be against the rules, since it wouldn’t censor something that uses non-ascii characters in an attempt to avoid the filter. Something like bαd ωord might not get caught by the filter if you just try to test “bd ord”.

Your solution works perfectly! Thanks so much!

Well, you could filter both and check whether either result differs from the text that went in.
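
Roughly, that idea might look like the sketch below. ExampleFilter is only a stand-in that censors the word "bad" so the code can run anywhere; in a real game you would pass in whatever filter you actually use. PassesBothChecks and StripNonAscii are names made up for this example:

```lua
-- Stand-in for a real chat filter (an assumption for this sketch): it only
-- censors the single word "bad".
local function ExampleFilter(text)
	return (string.gsub(text, "bad", "###"))
end

-- Hypothetical helper: removes every byte outside the ASCII range.
local function StripNonAscii(text)
	return (string.gsub(text, "[\128-\255]", ""))
end

-- Returns true only if the filter leaves both the original text and its
-- ASCII-only version unchanged.
local function PassesBothChecks(text, filter)
	local stripped = StripNonAscii(text)
	return filter(text) == text and filter(stripped) == stripped
end

print(PassesBothChecks("hello", ExampleFilter))       --> true
print(PassesBothChecks("b\u{0334}ad", ExampleFilter)) --> false (caught after stripping)
print(PassesBothChecks("bαd", ExampleFilter))         --> true (lookalike letter slips through)
```

The last line shows the limit mentioned above: a lookalike letter still gets past both checks.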

OK, so I don’t get flagged for spam or whatnot, I used this website. Can you see how it gets worse the higher the craziness goes? Also, it would be impossible to filter the text because there are so many levels of cursed text.

But to be honest I think this has a possible solution. Possible but hard. Find each symbol (aka alt + numberpad) from the cursed text then make the system recognize it as inappropriate text.

Unless Roblox suddenly turns on an option saying “No symbols other than symbols on your keyboard will be allowed”, but that might be a little hard because there are foreign languages, and those languages come with different keyboards.

https://www.google.com/search?q=foreign+keyboard&rlz=1C1CHBF_enUS893US893&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiDiuyKrNToAhWOZd8KHcR9BCMQ_AUoAXoECA0QAw&biw=1366&bih=657#imgrc=g2Wn91UFxnS24M Example of one I found on google

That is the purpose of the loop over every position in the string. There are other loopholes around the extra characters problem, but I’d have to look into it further.