Getting the length of strings with emojis?

The Problem

Hello! First topic! Yay! I’m having trouble accurately getting the number of characters in a string. I have scoured the internet and the DevForum, but there are a zillion different articles and threads for all sorts of apps and programming languages, and I couldn’t find a real topic about it here.

What I’m trying to do is get the length of a string and check whether it is a single character long. I’m doing this for a system where a user can display one letter/symbol/emoji/character of their choice on a flag (filtered through Roblox’s text filter, just in case). However, the length operator counts bytes, not characters, and many symbols are made up of multiple bytes!

For example, if I print the length of the letter e:

print(#'e')

It will output 1 as it should since there is a single letter e. However, if I try printing the length of the zany emoji 🤪:

print(#'🤪')

It will output 4 even though there is clearly only one emoji there. I also can’t just check whether the length is less than 5 bytes, because then someone could get away with ‘Ayyo’ on their flag, since each of those letters is 1 byte. The user should only be able to put a single character, not 4 symbols!

To add to the problem, emojis like ❗ are 3 bytes long, and although I couldn’t find an example of one, there are likely also some emojis that are 2 bytes long (some characters definitely are, such as accented letters like é)! I’m kind of stumped on how to solve this one. I’ve figured out many complicated things before, yet this seems like such a simple problem without a simple solution.
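For context on why these byte counts differ: UTF-8 stores each code point in 1 to 4 bytes, and every byte after the first one in a sequence is a "continuation byte" with a value from 0x80 to 0xBF. A minimal sketch, assuming the input is valid UTF-8, is to count only the bytes that are not continuation bytes. `countCodePoints` is a hypothetical helper name, and note that it counts code points, which is not always the same as one visible symbol.

```lua
-- Hypothetical helper: counts UTF-8 code points by skipping continuation
-- bytes (10xxxxxx, i.e. 0x80 to 0xBF). Assumes valid UTF-8 input;
-- invalid byte sequences are not detected.
local function countCodePoints(str)
	local count = 0
	for i = 1, #str do
		local byte = string.byte(str, i)
		if byte < 0x80 or byte > 0xBF then
			-- Any byte that is not a continuation byte starts a new code point
			count = count + 1
		end
	end
	return count
end

print(#"e", countCodePoints("e"))               -- 1 byte, 1 code point
print(#"\u{E9}", countCodePoints("\u{E9}"))     -- é: 2 bytes, 1 code point
print(#"\u{2757}", countCodePoints("\u{2757}")) -- ❗: 3 bytes, 1 code point
print(#"🤪", countCodePoints("🤪"))             -- 4 bytes, 1 code point
print(#"Ayyo", countCodePoints("Ayyo"))         -- 4 bytes, 4 code points
```

This is only meant to show what "length in characters" means at the byte level; a built-in library function is the more practical choice.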

Things That Didn't Work (or Didn't Give What I Wanted)

Using string.len() instead of the length (#) operator

print(string.len('🤪')) -- 4

Notes: This outputs the same thing as print(#'🤪'); for strings, string.len() and the # operator both return the number of bytes.

Using string.split() and getting the length of the returned table

print(#string.split('🤪','')) -- 4

Notes: string.split() does not respect how the bytes make different characters, and simply splits them apart. Once it does this and tries to display them, it simply gives you the unknown character thingy that looks like <?> as those bytes aren’t meant to make anything by themselves it seems.
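A sketch of splitting by character instead of by byte, assuming valid UTF-8 input: the utf8 library provides utf8.charpattern, a pattern that matches exactly one UTF-8 byte sequence, so passing it to string.gmatch yields whole code points instead of lone bytes. `splitCodePoints` is a hypothetical helper name.

```lua
-- Hypothetical helper: splits a string into one piece per UTF-8 code point,
-- unlike string.split(str, ""), which produces one piece per byte.
-- Assumes valid UTF-8 input; invalid sequences are not detected.
local function splitCodePoints(str)
	local pieces = {}
	-- utf8.charpattern matches one lead byte plus its continuation bytes
	for piece in string.gmatch(str, utf8.charpattern) do
		table.insert(pieces, piece)
	end
	return pieces
end

local pieces = splitCodePoints("Hi🤪")
print(#pieces)                     -- 3
print(table.concat(pieces, " | ")) -- H | i | 🤪
```

Because each piece is a complete byte sequence, the emoji displays intact instead of as replacement characters.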

HttpService’s JSON functions

local Http = game:GetService("HttpService")
print(#Http:JSONDecode(Http:JSONEncode('🤪'))) -- 4
print(#Http:JSONDecode(Http:JSONEncode({'🤪'}))[1]) -- Also 4

Notes: The JSON round trip hands back exactly the same string, so it is still 4 bytes, which is probably how it’s meant to work anyway.

string.byte()

local str = '🤪'
print(#{string.byte(str,1,#str)}) -- 4
print(#{string.byte(str,1,9999)}) -- Also 4

Notes: string.byte() returns the individual bytes of the string, which really isn’t all that helpful, since what I want is the individual characters, not their bytes.
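If the goal is the characters rather than the bytes, the utf8 library has counterparts to string.byte: utf8.codepoint returns one number per code point, and utf8.codes iterates over a string one code point at a time. A sketch, assuming Luau's utf8 library behaves like the Lua 5.3 one it mirrors for these functions:

```lua
local str = "e🤪"

-- Same idea as #{string.byte(str, 1, #str)}, but counting code points
print(#{string.byte(str, 1, -1)})    -- 5 (bytes)
print(#{utf8.codepoint(str, 1, -1)}) -- 2 (code points)

-- utf8.codes yields the byte position where each code point starts,
-- plus its numeric value; utf8.char converts the number back to text
for position, codepoint in utf8.codes(str) do
	print(position, string.format("U+%04X", codepoint), utf8.char(codepoint))
end
-- Prints (tab-separated):
-- 1  U+0065   e
-- 2  U+1F92A  🤪
```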

Using double quotes

print(#"🤪") -- 4

Notes: There’s obviously no difference lol

Tell me your solutions!

If you have a good solution, please post it and also explain how it works (if you know) so that I can remember it and use it in different projects. I’d prefer the shortest solution possible, but if you can't shorten it, please send it anyway. Thanks! I hope I did well on my first topic.

Use utf8.len.

print(utf8.len("🤪"))

correctly prints 1, because utf8.len counts UTF-8 code points instead of bytes.
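A slightly more complete version of that check for the flag use case, as a sketch: `isSingleCodePoint` is a hypothetical helper name. It relies on utf8.len returning nil (plus the byte position of the first invalid byte) when the string is not valid UTF-8, so invalid input is rejected instead of erroring.

```lua
-- Hypothetical helper for the flag: true only if text is exactly one
-- UTF-8 code point. For invalid UTF-8, utf8.len returns nil, and
-- nil == 1 is false, so invalid input is rejected.
local function isSingleCodePoint(text)
	local length = utf8.len(text)
	return length == 1
end

print(isSingleCodePoint("e"))    -- true
print(isSingleCodePoint("🤪"))   -- true
print(isSingleCodePoint("Ayyo")) -- false
print(isSingleCodePoint(""))     -- false (utf8.len returns 0)
print(isSingleCodePoint("\xFF")) -- false (invalid UTF-8)
```

One caveat: utf8.len counts Unicode code points, and some emojis that look like a single symbol are built from several. A thumbs-up with a skin tone modifier (👍🏽) is 2 code points, a flag like 🇺🇸 is 2 regional indicator symbols, family emojis joined with zero-width joiners are 5 or more, and ❗ followed by the emoji variation selector (U+FE0F) is 2. If those should count as one character, Roblox's utf8 library also documents a utf8.graphemes iterator for grapheme clusters, which may be a better fit; check the Roblox documentation for its exact behavior before relying on it.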


Wow, that was amazingly fast! Thank you. For some reason I didn’t know that entire library of functions even existed! I’ll definitely use that. Have a great rest of your day!
