utf8.charpattern uses an invalid escape sequence and is incorrect

The page on the utf8 Lua library has a representation of utf8.charpattern. It looks like this:

There are three major problems with this that prevent it from being accurate:
1. It uses an escape format that isn’t supported in the version of Lua that Roblox uses. (Edit: Roblox now supports hex escapes.)
2. It lacks an asterisk at the end of the pattern
3. It is evidently missing a character (\x01), which makes it confusing and generally misleading

After running some automation to convert the utf8.charpattern value that’s actually in-game, I found that this string is the equivalent: [%z\1-\127\194-\244][\128-\191]*
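
For reference, a conversion like this can be sketched in a few lines. This is only a guess at the kind of automation involved, not the script that was actually used, and `escapeBytes` is a hypothetical helper name. It rewrites every byte outside printable ASCII as a decimal escape:

```lua
-- Hypothetical helper: render non-printable bytes as decimal escapes.
-- Note: the output is only unambiguous when no escaped byte is directly
-- followed by a digit character, which holds for the pattern discussed here.
local function escapeBytes(s)
    -- The extra parentheses drop gsub's second return value (the match count).
    return (string.gsub(s, "[^\32-\126]", function(c)
        return "\\" .. string.byte(c)
    end))
end

-- In Studio, one would call: print(escapeBytes(utf8.charpattern))
print(escapeBytes("[\194-\244][\128-\191]*")) -- [\194-\244][\128-\191]*
```

Decimal escapes are used for the output because they work regardless of whether hex escapes are supported.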

This should ideally replace the string on the wiki, even though it’s not great to look at, since it’s factually correct and the one on the devhub is not (seen below).

image


The documentation for utf8.charpattern is very wrong.
It is documented to be [%z-\x7F\xC2-\xF4][\x80-\xBF],
but is actually [%z\x01-\x7F\xC2-\xF4][\x80-\xBF]*.
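
To illustrate why the missing asterisk matters, here is a small sketch comparing the two forms. It uses decimal escapes so it does not depend on hex-escape support, and it leaves the NUL byte out of the first class to sidestep the %z versus \0 question. The `collect` helper and the sample string are made up for the example.

```lua
-- Pattern without the trailing '*' (as documented) vs. with it (as in-game).
-- Assumption: the NUL byte is omitted from the lead-byte class for simplicity.
local withoutStar = "[\1-\127\194-\244][\128-\191]"
local withStar = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: gather every match of a pattern into a table.
local function collect(s, pattern)
    local out = {}
    for ch in string.gmatch(s, pattern) do
        out[#out + 1] = ch
    end
    return out
end

-- "a" (1 byte), "é" (2 bytes), "€" (3 bytes)
local sample = "a\195\169\226\130\172"

local good = collect(sample, withStar)
local bad = collect(sample, withoutStar)

print(#good) -- 3: each match is one whole character
print(#bad)  -- 2: "a" is skipped and "€" is cut off after two bytes
```

Without the asterisk, the pattern requires exactly one continuation byte, so single-byte characters never match and longer sequences are truncated.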

utf8 library documentation


I’ve updated the pattern to match what was shown in this thread after verifying it in Studio. Thank you both for bringing this to our attention!

Recently, \0 was made to work in string patterns. It looks like utf8.charpattern was updated because of this change, and it’s now [\x00-\x7F\xC2-\xF4][\x80-\xBF]*
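
For context, the pattern is meant to match exactly one UTF-8 byte sequence, assuming the subject is valid UTF-8, so a typical use is splitting a string into characters. A minimal sketch follows. The hard-coded fallback is an assumption for environments without the utf8 library, and `chars` is a hypothetical helper name.

```lua
-- Use the built-in pattern when available; the fallback is an assumption
-- for environments without the utf8 library and omits the NUL byte.
local charpattern = (utf8 and utf8.charpattern) or "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: split a valid UTF-8 string into its characters.
local function chars(s)
    local out = {}
    for ch in string.gmatch(s, charpattern) do
        out[#out + 1] = ch
    end
    return out
end

local parts = chars("h\195\169llo") -- "héllo"
print(#parts) -- 5
```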
image
