string.find returns a byte offset into the string, and the character á is encoded with multiple bytes. This is consistent with the other string APIs such as string.sub. The whole string library doesn't actually know anything about Unicode; it treats strings as arrays of bytes.
To be more specific, in UTF-8 the letter a is equivalent to string.char(0x61), while á is equivalent to string.char(0xc3, 0xa1).
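A minimal sketch of the byte-level view, assuming standard Lua 5.3+ or Luau (both accept the \u{...} escape, used here to spell á unambiguously):

```lua
-- "á" is U+00E1, stored in UTF-8 as the two bytes 0xC3 0xA1
local s = "\u{E1}b"

print(#s)                                  --> 3 (bytes, not characters)
print(string.byte(s, 1, -1))               --> 195 161 98
print(string.find(s, "b", 1, true))        --> 3 3 (byte offsets)
print(s == string.char(0xC3, 0xA1, 0x62))  --> true
```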
There is a utf8 library that provides functions designed with Unicode in mind.
For example, you can use utf8.len(), which returns the number of code points in the string instead of the number of bytes.
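A short sketch of the difference, plus a hypothetical helper, charIndex (not part of any library), that converts the byte offset from string.find into a character index by counting code points with utf8.len(s, i, j). It assumes the byte offset points at the start of a character, and it counts code points, not visual characters:

```lua
local s = "\u{E1}b"  -- "áb": 2 characters, 3 bytes

print(#s)           --> 3
print(utf8.len(s))  --> 2

-- Hypothetical helper: map a byte offset to a 1-based code point index
local function charIndex(str, byteIndex)
    -- count the code points that start before byteIndex, then add one
    return utf8.len(str, 1, byteIndex - 1) + 1
end

local bytePos = string.find(s, "b", 1, true)
print(bytePos, charIndex(s, bytePos))  --> 3 2
```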
utf8.graphemes breaks the string down at visual character (grapheme) boundaries. It tells you the byte index where each visual character begins and ends, so you can safely use string.sub and other byte-based operations on the result.
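utf8.graphemes is Roblox-specific, so as a rough, portable approximation here is a sketch using the standard utf8.codes iterator. The helper codepointRanges is hypothetical; it returns one byte range per code point, so unlike utf8.graphemes it will not keep a letter together with a following combining accent:

```lua
-- Hypothetical helper: first/last byte index of each code point in str
local function codepointRanges(str)
    local ranges = {}
    for first, code in utf8.codes(str) do
        -- re-encoding the code point gives its width in bytes
        local last = first + #utf8.char(code) - 1
        table.insert(ranges, { first, last })
    end
    return ranges
end

local s = "a\u{E1}b"
for _, r in ipairs(codepointRanges(s)) do
    -- each range can be passed straight to string.sub
    print(r[1], r[2], string.sub(s, r[1], r[2]))
end
--> 1 1 a
--> 2 3 á
--> 4 4 b
```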
utf8.nfcnormalize can make sure combining characters are represented in combined form, so that string.find("a") will not find à. (But string.find("b") would still find b̀, a b followed by a combining grave accent, because there is no combined form of b-with-grave-accent; that is not a standard character.)
Basically, there are multiple ways to form the same character: a basic “a” character followed by a combining accent character, or a single precomposed “a+accent” character. NFC (composed) and NFD (decomposed) normalization convert as much of the string as possible to combined and split form, respectively.
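The two representations can be compared directly. This sketch builds both forms by hand (no normalization call) so it runs in plain Lua 5.3+ or Luau; per the description above, utf8.nfcnormalize on Roblox would turn the decomposed string into the composed one:

```lua
local composed   = "\u{E1}"    -- á as a single code point (U+00E1), the NFC form
local decomposed = "a\u{301}"  -- "a" + U+0301 COMBINING ACUTE ACCENT, the NFD form

print(utf8.len(composed), utf8.len(decomposed))  --> 1 2
print(composed == decomposed)                    --> false (different bytes)

-- A plain byte search for "a" only matches inside the decomposed form
print(string.find(composed, "a", 1, true))       --> nil
print(string.find(decomposed, "a", 1, true))     --> 1 1
```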
Combined form is more useful if you want find operations and the like to work naturally, because accented letters no longer contain a separate basic “a” character.
Split form is more useful if you want to do something like strip out all of the accents to get plain ASCII text.
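A minimal sketch of accent stripping, assuming the input is already in split (NFD) form. The helper stripAccents is hypothetical and only removes code points in the Combining Diacritical Marks block (U+0300 to U+036F); letters with no decomposition, such as ø, stay non-ASCII:

```lua
-- Hypothetical helper: drops combining diacritical marks (U+0300..U+036F).
-- Assumes str is already in split (NFD) form.
local function stripAccents(str)
    local out = {}
    for _, code in utf8.codes(str) do
        if code < 0x0300 or code > 0x036F then
            table.insert(out, utf8.char(code))
        end
    end
    return table.concat(out)
end

print(stripAccents("a\u{301}b\u{300}c"))  --> abc
```

Passed a precomposed á (U+00E1), the function returns it unchanged, which is why the string needs to be converted to split form first.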
As for this bug, it is incomprehensible that it exists, since Roblox is multilingual and writing in languages that use accents is common.
I will have to spend some hours of my work getting around this bug.
Making the behavior of the string operations significantly different from standard Lua would cause a ton of issues for code portability. If you have suggestions for extending the utf8 library with additional functions, though, that is possible.