It’d be nice to let the meta team know about this as well, so they can fix the root issue.
This isn’t exactly something I would classify as a bug. There are a million and one ways to bypass the character requirement, and probably dozens of invisible Unicode characters that achieve the same thing. Likewise, angle brackets can hide text, and someone determined will just fluff up their word count manually.
The requirement message exists as a reminder and warning, not as a solution to the people who will disregard it.
Blocking all Unicode characters would stop them from being used to bypass the limit. The limit is there for a reason: to ensure that discussions are meaningful and contain content worthy of being on the forum. I feel it shouldn’t just be a warning but an actual requirement, which brings me to the point of this topic: disabling Unicode characters in posts would help keep topics meaningful on every Discourse-based forum.
It doesn’t matter whether people who spam know about these Unicode characters; it is still an issue that needs to be solved to maintain purposeful discussions. I am also pretty sure at least ten people who spam have heard of them. I am just worried that the characters will be abused before the DF team has a chance to block them.
It might be possible to block ranges of Unicode, but it’s also cumbersome. If you limit the forum to just ASCII, you end up with problems when communicating in languages other than English. Accented letters and all other non-English letters fall outside the ASCII character range, and you would have to find and whitelist all the relevant ranges.
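To make the ASCII problem concrete, here is a minimal Python sketch. The function names and sample sentences are made up for illustration, and this is not how Discourse validates posts. It only shows which characters an ASCII-only rule would reject.

```python
def is_ascii_only(text: str) -> bool:
    """Return True when every character is inside the 7-bit ASCII range."""
    return all(ord(ch) < 0x80 for ch in text)


def rejected_characters(text: str) -> list[str]:
    """List the characters an ASCII-only rule would refuse."""
    return [ch for ch in text if ord(ch) >= 0x80]


# Hypothetical sample posts, chosen only for illustration.
samples = [
    "How do I fix this script?",
    "ma\u00f1ana",          # Spanish, contains n with tilde
    "Gr\u00fc\u00dfe",      # German, contains u with diaeresis and sharp s
]

for post in samples:
    print(post, is_ascii_only(post), rejected_characters(post))
```

Every rejected character here is an ordinary letter, which is why an ASCII-only rule would need an allowlist that keeps growing with each language the forum supports.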
I think you misunderstand.
U+#### is just the notation for a character’s ID. For example,
ñ (Spanish) is U+00F1.
I still believe blocking a wide range of the relevant Unicode characters will help maintain good discussions, even if it is a hassle. One solution would be to block every character in a certain range of IDs, as I am pretty sure the characters used by languages sit in a separate range of IDs. Correct me if I am wrong.
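To put some numbers on the range question, here is a minimal Python sketch. The `code_point` helper and the choice of sample characters are mine, not anything the forum uses. It prints the U+#### notation for ñ next to a few characters that render as nothing.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Format a single character in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# One visible letter followed by four characters that normally render as nothing.
samples = ["\u00f1", "\u00ad", "\u200b", "\u2060", "\ufeff"]

for ch in samples:
    # unicodedata.category returns the general category, e.g. "Ll" for a
    # lowercase letter or "Cf" for an invisible format character.
    print(code_point(ch), unicodedata.name(ch), unicodedata.category(ch))
```

The soft hyphen (U+00AD) sits in the same Latin-1 Supplement block (U+0080 to U+00FF) as ñ, while the others are spread across unrelated blocks. So invisible characters are not neatly split off into a range of their own, and blocking by range alone would either miss some or catch real letters.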
The effort to create the plugin/list of Unicode characters isn’t necessary. There aren’t many cases where users bypass the character limit, and posts that evidently bypass it are moderated promptly.
It’s easier to use HTML comments to get past the character limit if you need to.
<!-- this stuff counts but doesn't show -->
That one can’t really be fixed by denying certain characters and there are a few more ways to bypass it, so this should probably be brought up to Discourse (if you care enough, I don’t see people abuse this often).
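For anyone curious what a fix on the counting side might look like, here is a rough Python sketch that strips HTML comments before measuring a post. This is only an assumption about one possible approach, not how Discourse actually counts characters, and `visible_length` is a made-up name.

```python
import re

# Matches an HTML comment, including one that spans several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def visible_length(raw: str) -> int:
    """Count characters after removing HTML comments and outer whitespace."""
    without_comments = HTML_COMMENT.sub("", raw)
    return len(without_comments.strip())


raw_post = "ok <!-- this stuff counts but doesn't show -->"
print(len(raw_post))             # length of the raw text, comment included
print(visible_length(raw_post))  # length of what a reader actually sees
```

This only covers comments. Empty tags, spoilers, and invisible characters would each need their own handling, which is the point about there being more ways around the limit.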
Keep in mind there is a formal rule against bypassing the limit, so if you see a post that does that just flag it for spam.
Blocking all Unicode characters? That’s a bit too much and would never be possible, because every character in every post on the DevForum is a Unicode character. For example, U+00A3 maps to the £ symbol (if you want to check this, type 00A3 into WordPad and press Alt + X).
If you still don’t believe every character you type is Unicode, press Windows key + R and type ‘charmap’. Each font covers a different set of Unicode characters, and if you click on, for example, the letter ‘a’, you can see that it is a Unicode character too (U+0061).
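If you don’t have WordPad or charmap handy, the same lookup can be done in Python with the standard `unicodedata` module. This is a small sketch, and `describe` is just a helper name I made up.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Plain letters and symbols all have a code point and a name.
for ch in ["a", "A", "\u00a3"]:
    print(describe(ch))
```

Even the plain letter ‘a’ comes back with a code point, so a rule that blocks "all Unicode" would block every post.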
Updated my topic, as most people didn’t see the discussion between me and Autterfly. Blocking ranges of Unicode characters would most likely solve this problem.
I do know you can flag posts but it is incredibly hard to tell if they are bypassing it using invisible unicode characters.
You can get the raw contents of a post to see if there are any empty spoilers, HTML tags/comments, etc. You can just count the characters yourself too.
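If counting by hand gets tedious, a few lines of Python can do it once you have pasted the raw text. This is a sketch under one assumption: that the padding uses characters in Unicode’s "Cf" (format) category, which covers the zero-width space, word joiner, and soft hyphen. Both function names are made up for the example.

```python
import unicodedata


def hidden_code_points(raw: str) -> list[str]:
    """List the code points of invisible format characters in the text."""
    return [f"U+{ord(ch):04X}" for ch in raw if unicodedata.category(ch) == "Cf"]


def length_without_hidden(raw: str) -> int:
    """Count characters while ignoring invisible format characters."""
    return sum(1 for ch in raw if unicodedata.category(ch) != "Cf")


# A two-letter reply padded with thirty zero-width spaces.
padded = "ok" + "\u200b" * 30
print(len(padded))                      # raw length, padding included
print(length_without_hidden(padded))    # length of the visible text
print(set(hidden_code_points(padded)))  # which invisible characters were used
```

Treat a hit as a hint rather than proof. Some format characters have legitimate uses, such as the zero-width joiner inside emoji sequences, and padding characters from other categories would slip past this check.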
I still think that is a fair bit of work just to see those hidden characters. Plus, it is a waste of time to do that on every post that I think could be using invisible Unicode characters or HTML comments to bypass the limit.
It’s really not a big deal, although it can be a bit frustrating. Just flag these posts for spam. It’s within the rules. Not every post’s character count is <30 and the chances of you finding these kinds of posts are quite small actually.
It is still a problem that should be fixed now to prevent further issues in the future. While I know I shouldn’t worry about it being abused right now, it is better to close this bypass and reduce the number of ways to get around the limit. The limit is there for a reason, and reducing the number of ways around it should lead to fewer spam posts.
Not many people are informed about this, so it really is a non-problem: the number of people who actually do it is low. Low enough that flagging works just fine. It hasn’t been too big of an issue, and blocking characters could cause more headaches than benefits. For instance, what if someone has a malicious plugin, finds the source, and pastes it here to ask about it? Some malicious scripts use special characters to hide pieces of the source, so the forum wouldn’t let them post because the source contains forbidden characters.
You could blacklist them, but who cares if someone makes a short response? The people who go out of their way to do it are probably making the point that anything more is unnecessary. More is less.
I wouldn’t even bother checking for invisible padding characters to be honest. Usually if someone’s trying to pad out a short reply, it’s a low quality reply anyway. Alternatively, a great quality short reply wouldn’t warrant flagging.
Ultimately it comes down to whether you judge the content of the post to be substantial or spam, so checking for some arbitrary character count is a waste of time.