It might be possible to block ranges of unicode but it’s also cumbersome. If you limit the forum to just ASCII you end up with problems when communicating in languages other than English. Most accents and all non-English letters fall outside of the ASCII character range, and you would have to find and whitelist all the relevant ranges.
I think you misunderstand.
U+#### is just the notation for a character’s ID. For example,
ñ (Spanish) is
I still believe blocking a wide range of the relevant Unicode characters would help maintain good discussions, even if it is a hassle. One solution would be to block all characters in a certain range of IDs, as I am pretty sure the characters used by languages sit in a separate range of IDs. Correct me if I am wrong.
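To make the range idea concrete, here is a minimal Python sketch of a range-based check. The `BLOCKED_RANGES` list and the `blocked_characters` helper are hypothetical names, and the ranges are only an assumption about which invisible characters might be targeted; this is not how Discourse or the DevForum actually filters posts.

```python
# Hypothetical blocklist of inclusive code point ranges.
# Which ranges to block is an assumption made for illustration only.
BLOCKED_RANGES = [
    (0x200B, 0x200F),  # zero-width space, zero-width joiners, directional marks
    (0x2060, 0x2064),  # word joiner and invisible math operators
    (0xFEFF, 0xFEFF),  # zero-width no-break space (byte order mark)
]


def blocked_characters(text):
    """Return the U+#### IDs of characters in text that fall in a blocked range."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if any(low <= ord(ch) <= high for low, high in BLOCKED_RANGES)
    ]


print(blocked_characters("ok\u200b\u200b"))  # ['U+200B', 'U+200B']
print(blocked_characters("mañana"))          # []
```

Note that even these narrow ranges include characters with legitimate uses (the zero-width joiner, for example, appears in some emoji sequences and scripts), which is part of why a blocklist like this can be cumbersome to get right.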
The effort to create the plugin or list of Unicode characters isn’t necessary. There aren’t many cases where users bypass the character limit, and posts that evidently bypass it are moderated quickly.
It’s easier to use HTML comments to get past character limit if you need to.
<!-- this stuff counts but doesn't show -->
That one can’t really be fixed by denying certain characters and there are a few more ways to bypass it, so this should probably be brought up to Discourse (if you care enough, I don’t see people abuse this often).
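As a rough illustration of why the comment trick works, the sketch below compares the raw length of a post with its length after HTML comments are removed. The `visible_length` helper is hypothetical, and the assumption that the limit is applied to the raw text is taken from the behaviour described above, not from Discourse's source.

```python
import re

# Matches an HTML comment, including ones that span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def visible_length(raw):
    """Length of a raw post after dropping HTML comments and outer whitespace."""
    return len(HTML_COMMENT.sub("", raw).strip())


raw = "ok <!-- this stuff counts but doesn't show -->"
print(len(raw))             # raw length, which includes the comment
print(visible_length(raw))  # 2
```

A check like this only covers comments; empty tags and other markup would need their own handling, which is why denying characters alone cannot fix it.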
Keep in mind there is a formal rule against bypassing the limit, so if you see a post that does that just flag it for spam.
Blocking all Unicode characters? That’s a bit too much and would never be possible, because every post on the DevForum uses Unicode characters. For example, U+00A3 maps to the £ symbol (to check this, type 00A3 into WordPad and press Alt + X).
If you still don’t believe every character you type is Unicode, press Windows key + R and type ‘charmap’. Each font has its own set of Unicode characters, and if you click on, for example, the letter ‘a’, you can see that it is a Unicode character too.
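The same check can be done in a few lines of Python. This is only a small sketch; `code_point` is a hypothetical helper name, and it simply formats the built-in `ord` value in the U+#### notation.

```python
def code_point(ch):
    """Format a single character's Unicode ID in U+#### notation."""
    return f"U+{ord(ch):04X}"


for ch in "a£ñ":
    print(ch, code_point(ch))
# a U+0061
# £ U+00A3
# ñ U+00F1
```

Even the plain letter ‘a’ has an ID, so a rule that blocked "all Unicode" would block every post.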
Updated my topic as most people didn’t see the discussion between me and Autterfly. Blocking ranges of Unicode characters would most likely solve this problem.
I do know you can flag posts, but it is incredibly hard to tell if they are bypassing the limit using invisible Unicode characters.
You can get the raw contents of a post to see if there are any empty spoilers, HTML tags/comments, etc. You can just count the characters yourself too.
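If counting by hand is too tedious, the raw contents could be checked with something like the following sketch. It assumes you have already copied the raw text of the post into a string; `count_visible` is a hypothetical helper that skips characters in the Unicode format (`Cf`) and control (`Cc`) categories as well as whitespace.

```python
import unicodedata


def count_visible(raw):
    """Count characters that are not format (Cf), control (Cc) or whitespace."""
    return sum(
        1
        for ch in raw
        if unicodedata.category(ch) not in ("Cf", "Cc") and not ch.isspace()
    )


# A two-letter reply padded with 40 zero-width spaces (U+200B).
padded = "ok" + "\u200b" * 40
print(len(padded), count_visible(padded))  # 42 2
```

This will not catch every trick: some blank-looking characters are classified as ordinary letters, so a category check like this is a starting point and not a complete detector.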
I still think that is a bit of work just to see those hidden characters. Plus, it is a waste of time to do that on every post that I think could potentially be using invisible Unicode characters or HTML comments to bypass the limit.
It’s really not a big deal, although it can be a bit frustrating. Just flag these posts for spam. It’s within the rules. Not every post’s character count is <30 and the chances of you finding these kinds of posts are quite small actually.
It is still a problem that should be fixed now to prevent further issues in the future. While I know I shouldn’t worry about it being abused right now, it is better to close this bypass and reduce the number of ways to get around the limit. The limit is there for a reason, and reducing the number of ways around it would lead to fewer spam posts.
Not many people know about this, so it really is a non-problem: the number of people who actually do it is low. Low enough that flagging works just fine. It hasn’t been too big of an issue, and blocking characters could cause more headaches than benefits. For instance, what if someone has a malicious plugin, finds the source, and pastes it here to ask about it? Some malicious scripts use special characters to hide pieces of the source, so the forum wouldn’t let them post because the source contains forbidden characters.
You could blacklist them, but who cares if someone makes a short response? The people who go out of their way to do it are probably making the point that anything more is unnecessary. Less is more.
I wouldn’t even bother checking for invisible padding characters to be honest. Usually if someone’s trying to pad out a short reply, it’s a low quality reply anyway. Alternatively, a great quality short reply wouldn’t warrant flagging.
Ultimately it comes down to whether you judge the content of the post to be substantial or spam, so checking for some arbitrary character count is a waste of time.
This is still happening…
P.S. This says 2 years, but it’s past 1 year.
This bug has basically no impact, so it makes sense it’s not prioritized.
If you see anyone bypassing the character limit, just flag it.
I’ve seen some people get past this limit without anyone flagging them, including some users exploiting the limit WITHOUT Unicode.
I don’t get it, why is this a problem? Like I personally use the
element to be able to send posts, if a post is spam then it will be flagged with or without this feature being turned off.
This allows me to do something like:
Can you send me the error?<p><p><p><p>
If it is spam then why don’t they just do something like?