How to detect if a character/byte is UTF-8?

I created a bug report below stating that the string library is bugged for UTF-8:

As we know, Roblox takes AGES to fix a bug, so I’ll have to create my own functions to replace string.sub and string.find, even if it compromises performance.
But I don’t know how to detect if a byte/character is UTF-8 or not.

Example:

  > print(utf8.codepoint ('à'))
  224
  > print(utf8.codepoint ('a'))
  97

Edit

It seems that if the code point returned by utf8.codepoint is greater than 127, the character is non-ASCII and takes more than one byte in UTF-8…

for a = 1, 256 do 
    print(a, utf8.char(a)) 
end
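The same check can also be made on raw bytes, which is what a replacement for string.sub or string.find would be looking at. Below is a minimal sketch, assuming the string is valid UTF-8; `classifyByte` is a hypothetical helper name, not part of any library. In UTF-8, bytes below 0x80 are complete ASCII characters, bytes from 0x80 to 0xBF are continuation bytes, and anything from 0xC0 up starts a multi-byte character.

```lua
-- Hypothetical helper (not a library function): classify one byte of a UTF-8 string.
local function classifyByte(b)
	if b < 0x80 then
		return "ascii"        -- 0xxxxxxx: a complete one-byte character
	elseif b < 0xC0 then
		return "continuation" -- 10xxxxxx: second or later byte of a character
	else
		return "lead"         -- 11xxxxxx: first byte of a multi-byte character
	end
end

-- "\195\160" is the two-byte UTF-8 encoding of 'à' (U+00E0).
local s = "1\195\1603"
for i = 1, #s do
	print(i, classifyByte(string.byte(s, i)))
end
```

A character boundary is then any byte that is not a continuation byte, which is enough to avoid cutting a character in half.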

Can anyone help?

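Here’s how: utf8.graphemes gives you the first and last byte position of each grapheme, so anything that spans more than one byte is a non-ASCII (multi-byte) character: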

local a = '1à3ç5'
for first, last in utf8.graphemes(a) do
	local grapheme = string.sub(a, first, last)
	print(first, last, '=', grapheme, first ~= last and 'UTF-8' or '')
end

1 1 = 1
2 3 = à UTF-8
4 4 = 3
5 6 = ç UTF-8
7 7 = 5
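Since the goal was to replace string.sub, here is a rough sketch of a character-indexed version built on utf8.offset, which returns the byte position where the n-th character starts. `utf8sub` is a hypothetical name, and the sketch assumes valid UTF-8 and positive indices with i >= 1. It counts code points rather than graphemes, so a letter followed by a combining mark counts as two characters.

```lua
-- Hypothetical helper: like string.sub, but i and j count characters
-- (code points) instead of bytes. Assumes valid UTF-8 and 1 <= i <= j.
local function utf8sub(s, i, j)
	local startByte = utf8.offset(s, i)
	if not startByte then
		return "" -- i is past the end of the string
	end
	-- Byte where the character after j starts; nil if j runs past the end.
	local nextByte = utf8.offset(s, j + 1)
	return string.sub(s, startByte, nextByte and nextByte - 1 or #s)
end

-- "1à3ç5" written with byte escapes so it does not depend on file encoding.
local a = "1\195\1603\195\1675"
print(utf8sub(a, 2, 4))
```

With the string from the example above, characters 2 through 4 are à, 3 and ç, so the call returns those three characters (five bytes) without splitting any of them.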
