Luau - String matching false positive

Have a look at this code.

local A = utf8.char(8729)
local B = utf8.char(9)
local C = utf8.char(8594)

print(A:match(`[{B}{C}]`))

These are obviously all different characters, so none of them should match. But when you run this, the match function returns a malformed UTF-8 string (which, as a character, is part of neither the input string nor the pattern): the single byte 226.

It looks like there's some kind of confusion about multi-byte code points that Luau isn't handling properly here, but I have no idea what that might be. Looking at the binary representation of these strings, nothing stands out as an obvious gotcha.


This is expected behavior when using character sets (`[...]`): each byte of a multi-byte character is treated as a separate set member and matched individually, rather than the character being matched as a whole.
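To make that concrete, here is a small sketch that prints the bytes behind the strings from the question. It only uses `string.byte` and `string.match`; the variable name `hit` is just for illustration, and the pattern is built with `..` instead of an interpolated string, which should produce the same pattern.

```luau
local A = utf8.char(8729) -- U+2219, encoded as the bytes 226 136 153
local B = utf8.char(9)    -- tab, the single byte 9
local C = utf8.char(8594) -- U+2192, encoded as the bytes 226 134 146

print(A:byte(1, -1)) -- 226 136 153
print(C:byte(1, -1)) -- 226 134 146

-- The set below means "any one of the bytes 9, 226, 134, 146",
-- not "tab or U+2192".
local hit = A:match("[" .. B .. C .. "]")

-- The first byte of A (226) is in the set, so that lone byte is returned.
print(#hit, hit:byte()) -- 1 226
```

Both three-byte characters start with 226 (0xE2), which is why the match succeeds and why the result is not valid UTF-8 on its own.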

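It matches the first byte (226) because both multi-byte characters, utf8.char(8729) and utf8.char(8594), start with that same byte. A lot of other strings will behave the same way, since the string library works on bytes and doesn't differentiate between single-byte and multi-byte characters.
If you want to make sure you only match whole characters, use utf8.charpattern, which matches exactly one UTF-8 byte sequence (assuming the subject is a valid UTF-8 string).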
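One possible way to apply that suggestion is sketched below: iterate over whole characters with `utf8.charpattern` and compare each one against a lookup table. `matchAnyChar` is a hypothetical helper written for this example, not something from the standard library.

```luau
local A = utf8.char(8729)
local B = utf8.char(9)
local C = utf8.char(8594)

-- Hypothetical helper: returns the first whole character of `text`
-- that appears in `chars`, or nil if there is none.
local function matchAnyChar(text: string, chars: { string }): string?
	local wanted = {}
	for _, ch in chars do
		wanted[ch] = true
	end

	-- utf8.charpattern yields one complete UTF-8 sequence per iteration,
	-- so multi-byte characters are never split into separate bytes.
	for ch in text:gmatch(utf8.charpattern) do
		if wanted[ch] then
			return ch
		end
	end

	return nil
end

print(matchAnyChar(A, { B, C }))               -- nil
print(matchAnyChar("x" .. C .. "y", { B, C })) -- →
```

This assumes the input is valid UTF-8. For a single known character, a plain search such as `text:find(C, 1, true)` may be simpler.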
