Invisible Unicode characters can be used to bypass the character limit

Iifebit · December 27, 2020, 8:59am

Recently I have been experimenting with “transparent” Unicode characters such as U+3164 Hangul Filler, and I soon realized they could be abused on the forum to bypass the character limit. Each of these characters still counts toward the character count, even though nothing is visible. As you might already know, many people (I am not naming names) pad their posts with something like “30” or “charsssss” to get past the limit. Those are easy to recognize as bypasses. That is not the case when a post uses invisible characters instead, because the padding is essentially hidden from readers.

I am aware that posts bypassing the limit rarely use these characters, but in my time on the DevForum I am 99% certain that at least one post has bypassed the limit this way and avoided being flagged.

Now, onto my request. Blocking invisible Unicode characters within the relevant ranges on the forum would go a long way toward closing this loophole before large numbers of posts start using it.

After researching this further, I found some other characters that behave the same way as U+3164 Hangul Filler:

U+FFA0
U+1160
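Python's standard `unicodedata` module can confirm what these three characters are. This minimal sketch prints each one's U+ notation, official name, general category and whether Python treats it as whitespace. All three are classified as letters (`Lo`), not whitespace, so a check that only trims whitespace would leave them in place (whether Discourse's check works that way is an assumption).

```python
import unicodedata

# The invisible "filler" characters listed in this topic.
FILLERS = ["\u3164", "\uFFA0", "\u1160"]

for ch in FILLERS:
    # U+ notation, official Unicode name, general category, and whether str.isspace() sees it.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isspace())
```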

Reproduction:

Open any topic, click Reply and paste the following line into your post:

Unicode ㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤㅤ

Click Reply. The post should go through without the error saying it is below the character limit.
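A quick illustration of the arithmetic behind the reproduction, assuming the limit counts Unicode code points. For illustration the string below uses 22 fillers, enough to reach 30 code points; the exact count in the reproduction line and how Discourse actually measures length are assumptions. The `INVISIBLE` set is hypothetical.

```python
# "Unicode", a space, then 22 U+3164 HANGUL FILLER characters.
post = "Unicode " + "\u3164" * 22

# A naive length check counts every code point, including the invisible ones.
print(len(post))  # 30

# Hypothetical set of invisible characters to discard before counting.
INVISIBLE = {"\u3164", "\uFFA0", "\u1160"}
visible = "".join(ch for ch in post if ch not in INVISIBLE).strip()
print(len(visible))  # 7
```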

If this is a Discourse bug, let me know and I’ll report it there.

TwyPlasma · December 27, 2020, 9:04am

This does seem like a bug, but the topic will most likely get taken down for teaching people how to break the rule.

It is the same with <html> tags: they count toward the limit but don’t show up.

Iifebit · December 27, 2020, 9:05am

That is why this needs to be fixed. I am pretty sure some people have already heard of this character (especially if you play Among Us and have looked up how to have a blank username). It also has an HTML entity, which means your observation and the Hangul Filler may be connected.

HTML entity:

&#12644; (decimal) or &#x3164; (hexadecimal)
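An HTML entity for this character is just a numeric character reference to the same code point, so the entity and the raw character end up identical once rendered. A small sketch using Python's standard `html` module shows that both the decimal and hexadecimal forms decode to U+3164:

```python
import html

# Both numeric character references decode to U+3164 HANGUL FILLER.
for ref in ("&#12644;", "&#x3164;"):
    ch = html.unescape(ref)
    print(ref, "->", f"U+{ord(ch):04X}")
```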

wevetments · December 27, 2020, 9:33am

The minimum character count is enforced by Discourse itself.

Iifebit · December 27, 2020, 9:54am

So is this technically a Discourse bug?

Automationeer · December 27, 2020, 9:55am

Most people that spam don’t have the context, knowledge, or understanding to do something like this, and it’s rarely seen.

@wevetments This could be filed as a bug with Discourse, but the DevForum team could also fix it through the word ban list.
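If the ban list accepts regular expressions (an assumption; it may only take plain words), an entry could be a character class of the filler code points named in this topic. A sketch of such a pattern, tested in Python; `FILLER_PATTERN` and `contains_filler` are hypothetical names:

```python
import re

# Hypothetical ban-list pattern: any of the three Hangul filler characters from this topic.
FILLER_PATTERN = re.compile("[\u1160\u3164\uFFA0]")

def contains_filler(text: str) -> bool:
    """Return True if the text contains at least one of the listed filler characters."""
    return FILLER_PATTERN.search(text) is not None

print(contains_filler("Unicode " + "\u3164" * 22))        # True
print(contains_filler("A normal reply with real content."))  # False
```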

wevetments · December 27, 2020, 9:56am

It’d be nice to let the Discourse Meta team know about this as well, so the root issue gets fixed.

Autterfly · December 27, 2020, 9:57am

This isn’t exactly something I would classify as a bug. There are a million and one ways to bypass the character requirement, and probably dozens of invisible Unicode characters that achieve the same thing. Likewise, angle brackets can hide text, and anyone determined will just pad their word count manually.

The requirement message exists as a reminder and warning, not as a solution to the people who will disregard it.

Iifebit · December 27, 2020, 10:05am

Blocking the use of all Unicode characters would stop people from using them to bypass the limit. The limit is there for a reason: to ensure that discussions are meaningful and contain content worthy of the forum. I feel it shouldn’t just be a warning but an actual requirement, which brings me to the point of this topic: disabling these Unicode characters in posts would help keep topics meaningful on every Discourse-based forum.

Iifebit · December 27, 2020, 10:07am

It doesn’t matter that people who spam may not know about these characters; it is still an issue that needs to be solved to keep discussions purposeful. I am also fairly sure that at least ten people who spam have heard of or know about them. I am just worried the characters will be abused before the DF team has a chance to block them.

Autterfly · December 27, 2020, 10:14am

It might be possible to block ranges of Unicode, but it’s also cumbersome. If you limit the forum to just ASCII, you end up with problems communicating in languages other than English. Accented letters and non-Latin scripts all fall outside the ASCII range, so you would have to find and whitelist every relevant range.
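To illustrate the point, Python's `str.isascii()` (Python 3.7+) behaves like an ASCII-only filter. The sample strings are made up for illustration; everything except plain English text is rejected:

```python
samples = ["hello", "señor", "café", "안녕하세요"]

# True means every character is in the ASCII range (U+0000 to U+007F).
results = {s: s.isascii() for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {'allowed' if ok else 'rejected'} by an ASCII-only rule")
```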

Autterfly · December 27, 2020, 10:29am

I think you misunderstand. U+#### is just the notation for a character’s ID. For example, ñ (Spanish) is U+00F1.
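The U+ notation is simply the character's code point written in hexadecimal with at least four digits. A tiny sketch; `code_point` is a hypothetical helper name:

```python
def code_point(ch: str) -> str:
    """Format a single character in U+XXXX notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

print(code_point("ñ"))       # U+00F1
print(code_point("\u3164"))  # U+3164
```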

Iifebit · December 27, 2020, 10:37am

I still believe blocking a wide range of the relevant characters will help maintain good discussions, even if it is a hassle. One solution would be to block all characters in a certain range of IDs, since I am fairly sure the characters used by languages are in other ranges. Correct me if I am wrong.
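One catch with range-based blocking: the Hangul Filler does not live in a separate "invisible" range. U+3164 sits inside the Hangul Compatibility Jamo block (U+3130 to U+318F), next to ordinary Korean letters such as ㄱ (U+3131), so blocking the whole block would also block Korean text. This sketch compares blocking that range with blocking only the individual code points from this topic; both sets are illustrative, not a proposed final list.

```python
# Hangul Compatibility Jamo block: U+3130..U+318F (Python's range end is exclusive).
HANGUL_COMPAT_JAMO = range(0x3130, 0x3190)

# Alternative: block only the specific filler code points discussed in this topic.
BLOCKED_CODE_POINTS = {0x3164, 0xFFA0, 0x1160}

def blocked_by_range(ch: str) -> bool:
    return ord(ch) in HANGUL_COMPAT_JAMO

def blocked_by_list(ch: str) -> bool:
    return ord(ch) in BLOCKED_CODE_POINTS

for ch, label in [("\u3164", "Hangul Filler"), ("\u3131", "Korean letter kiyeok")]:
    print(label, "range:", blocked_by_range(ch), "list:", blocked_by_list(ch))
```

The range rule catches the filler but also a real Korean letter; the per-character list catches only the filler.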

wevetments · December 27, 2020, 10:42am

The effort to create the plugin or list of Unicode characters isn’t necessary. There aren’t many cases where users bypass the character limit, and posts that obviously bypass it are moderated quickly.

buildthomas · December 27, 2020, 3:11pm

It’s easier to use HTML comments to get past character limit if you need to.

<!-- this stuff counts but doesn't show -->

That one can’t really be fixed by blocking certain characters, and there are a few more ways to bypass the limit, so this should probably be brought up with Discourse (if you care enough; I don’t see people abuse this often).
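A rough sketch of how a counter could ignore HTML comments before measuring length. It uses a non-greedy regular expression, which is a simplification rather than a real HTML parser, and it is not how Discourse necessarily handles comments; `length_without_comments` is a hypothetical name.

```python
import re

# Matches an HTML comment, including one that spans several lines (DOTALL lets . match newlines).
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def length_without_comments(raw: str) -> int:
    """Length of the raw post after removing HTML comments and trimming whitespace."""
    return len(HTML_COMMENT.sub("", raw).strip())

raw = "ok <!-- this stuff counts but doesn't show -->"
print(len(raw))                      # raw length, comment included
print(length_without_comments(raw))  # 2
```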

sjr04 · December 27, 2020, 7:16pm

Keep in mind there is a formal rule against bypassing the limit, so if you see a post that does that, just flag it as spam.

GamersInternational · December 27, 2020, 10:27pm

Blocking all Unicode characters? That’s a bit much and would never be possible, because every post on the DevForum is made of Unicode characters. For example, U+00A3 maps to the £ symbol (if you want to check this, type 00A3 into WordPad and press Alt + X).

If you still don’t believe that every character you type is Unicode, press Windows key + R and run ‘charmap’. Each font contains a set of Unicode characters, and if you click on, for example, the letter ‘a’, you can see its Unicode code point (U+0061).
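The WordPad Alt + X trick is just a hex-to-character conversion, which can be reproduced in a couple of lines of Python; `from_hex` is a hypothetical helper name:

```python
def from_hex(code: str) -> str:
    """Convert a hexadecimal code point such as "00A3" into its character."""
    return chr(int(code, 16))

print(from_hex("00A3"))  # £
print(from_hex("0061"))  # a
```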

Iifebit · December 27, 2020, 10:50pm

I updated my topic, since most people didn’t see the discussion between Autterfly and me. Blocking ranges of Unicode characters would most likely solve this problem.

Iifebit · December 27, 2020, 10:51pm

I do know you can flag posts, but it is incredibly hard to tell whether they are bypassing the limit with invisible Unicode characters.

sjr04 · December 27, 2020, 10:53pm

You can view the raw contents of a post to check for empty spoilers, HTML tags or comments, etc. You can also just count the characters yourself, for example:

https://devforum.roblox.com/raw/946728/24?u=incapaz
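A sketch of automating that check. It builds a URL following the `/raw/<topic>/<post>` pattern in the link above (topic 946728, post 24), downloads the raw Markdown, and counts the filler characters and HTML comment openers. The helper names are hypothetical, and fetching assumes the raw page is publicly reachable without logging in or hitting rate limits.

```python
from urllib.request import urlopen

BASE_URL = "https://devforum.roblox.com"
FILLERS = ("\u3164", "\uFFA0", "\u1160")

def raw_url(topic_id: int, post_number: int) -> str:
    """Build a /raw/<topic>/<post> URL, following the link pattern above."""
    return f"{BASE_URL}/raw/{topic_id}/{post_number}"

def fetch_raw(topic_id: int, post_number: int) -> str:
    """Download a post's raw Markdown (requires network access)."""
    with urlopen(raw_url(topic_id, post_number)) as response:
        return response.read().decode("utf-8")

def suspicious_bits(raw: str) -> dict:
    """Count invisible filler characters and HTML comment openers in raw post text."""
    return {
        "filler_chars": sum(raw.count(ch) for ch in FILLERS),
        "html_comments": raw.count("<!--"),
    }
```

Usage: `suspicious_bits(fetch_raw(946728, 24))` returns the two counts for that post; any non-zero value is worth a closer look before flagging.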