[1.0.0] UTF8String - Represent strings in utf8

Have you ever wanted to represent strings in UTF8 instead of ANSI? UTF8String might be for you!

Where to get it?

You can get it from here: UTF8String.rbxm (65.2 KiB)

Source: String representation in UTF8 for Roblox Lua. · GitHub

How to create one

Simple, just run UTF8String.new()

local UTF8String = require(/Path/to/UTF8String)
local u = UTF8String.new

or you can create UTF8String via u"" or u''

UTF8String constructors

UTF8String UTF8String.new(object value)
Creates a UTF8String

UTF8String UTF8String.fromCharCode(number charcode)
Creates a UTF8String from a character code

boolean UTF8String.isUTF8String(object value)
Returns true if the value is a UTF8String, otherwise returns false

Methods

UTF8String UTF8String:Substring(number start, number length)
Gets the part of the string that starts at the given position and has the given length.

print(u"Hello world!":Substring(7, 5)) --> world

UTF8String UTF8String:Slice(number startindex, number endindex)
Gets the part of the string between the given start index and end index.

print(u"Hello world!":Slice(7, 11)) --> world (assuming the end index is inclusive, as in string.sub)
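To illustrate how Substring (start plus length) and Slice (start plus end) relate, here is a minimal sketch built on the built-in utf8 library rather than on UTF8String itself. The names substring and slice are hypothetical helpers, the end index is assumed to be inclusive, and start is assumed to be at least 1; the module's real implementation may differ.

```lua
-- Hypothetical helpers; not the module's actual implementation.
-- Indices count code points (1-based), not bytes.
local function substring(s, start, length)
    local total = assert(utf8.len(s), "invalid UTF-8")
    if length <= 0 or start > total then
        return ""
    end
    -- Clamp the last requested code point to the end of the string.
    local last = math.min(start + length - 1, total)
    -- utf8.offset gives the byte position where a code point starts;
    -- the byte before the next code point is the end of the range.
    return string.sub(s, utf8.offset(s, start), utf8.offset(s, last + 1) - 1)
end

-- Assumption: endIndex is inclusive, so a slice is a substring of
-- (endIndex - startIndex + 1) code points.
local function slice(s, startIndex, endIndex)
    return substring(s, startIndex, endIndex - startIndex + 1)
end

print(substring("Hello wörld!", 7, 5)) --> wörld
print(slice("Hello wörld!", 7, 11))    --> wörld
```

Plain string.sub would cut "ö" in half here because it counts bytes, which is the kind of problem a UTF-8 aware string type avoids.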

UTF8String UTF8String:ToUpper()
UTF8String UTF8String:ToLower()
UTF8String UTF8String:ToTitle()
Converts the string to uppercase, lowercase, or title case.

UTF8String UTF8String:Format(object …)
Formats the given values according to the format string. Placeholder indices are zero-based.

local name = "John Smith"
print(u"Hello {0}!":Format(name)) --> Hello John Smith!
print(u"{0} has {1} lives left":Format('Joe', 2)) --> Joe has 2 lives left
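For readers curious how zero-based placeholders map onto Lua's one-based argument lists, this is a rough sketch on plain Lua strings. formatIndexed is a hypothetical name, and it only handles simple {n} placeholders; it is not how UTF8String:Format is implemented.

```lua
-- Hypothetical helper; handles only plain {n} placeholders.
local function formatIndexed(template, ...)
    local args = table.pack(...)
    -- The extra parentheses drop gsub's second return value (the match count).
    return (string.gsub(template, "{(%d+)}", function(index)
        -- Placeholder {0} refers to the first argument, so shift by one.
        local value = args[tonumber(index) + 1]
        if value == nil then
            return nil -- returning nil leaves the placeholder untouched
        end
        return tostring(value)
    end))
end

print(formatIndexed("Hello {0}!", "John Smith"))         --> Hello John Smith!
print(formatIndexed("{0} has {1} lives left", "Joe", 2)) --> Joe has 2 lives left
```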

UTF8String UTF8String:Replace(UTF8String oldvalue, UTF8String newvalue, number count)
Replaces the old string value with the new string value. The count argument is how many occurrences you want to replace, defaulting to all.

print(u'Hello World :)':Replace(u'World', u'Roblox')) --> Hello Roblox :)

table UTF8String:Split(UTF8String separator)
Splits the UTF8String on the given separator, which defaults to u''

print(u'Hello|World|Apple|Pears|Oranges':Split(u'|')) --> {"Hello", "World", "Apple", "Pears", "Oranges"}

UTF8String UTF8String:Reverse()
Reverses the UTF8String
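As a sketch of why a dedicated Reverse is useful, the snippet below reverses a plain Lua string by code point using the built-in utf8 library. utf8Reverse is a hypothetical name and this is not necessarily how the module does it; note that it reverses code points, not grapheme clusters, so combining marks would end up attached to the wrong base character.

```lua
-- Hypothetical helper: reverse a string code point by code point.
local function utf8Reverse(s)
    local chars = {}
    -- utf8.codes yields (bytePosition, codepoint) pairs.
    for _, codepoint in utf8.codes(s) do
        chars[#chars + 1] = utf8.char(codepoint)
    end
    -- Swap elements in place from both ends toward the middle.
    local n = #chars
    for i = 1, math.floor(n / 2) do
        chars[i], chars[n - i + 1] = chars[n - i + 1], chars[i]
    end
    return table.concat(chars)
end

print(utf8Reverse("héllo")) --> olléh
```

string.reverse on the same input would reverse the two bytes of "é" and produce invalid UTF-8.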

boolean UTF8String:IsAlpha()
Returns true if all characters are Latin letters (A-Z, case insensitive), otherwise returns false

boolean UTF8String:IsNumeric()
Returns true if all characters are Western Arabic Numerals (0-9), otherwise returns false

boolean UTF8String:IsAlphanumeric()
Returns true if all characters are either Latin letters (A-Z) or Western Arabic numerals (0-9), otherwise returns false

boolean UTF8String:IsIdentifier()
Returns true if the UTF8String is considered a valid identifier (only contains alphanumeric characters (A-Z, a-z and 0-9) or an underscore (_) and must not begin with a digit (0-9))

boolean UTF8String:IsSpace()
Returns true if all characters are whitespace (empty strings don’t count), otherwise returns false

number UTF8String:IndexOf(UTF8String value, number fromIndex)
Returns the position of the first occurrence of the specified value in a UTF8String. Returns -1 if that specified value cannot be found

number UTF8String:LastIndexOf(UTF8String value, number fromIndex)
Returns the position of the last occurrence of the specified value in a UTF8String. Returns -1 if that specified value cannot be found

boolean UTF8String:StartsWith(UTF8String value)
boolean UTF8String:EndsWith(UTF8String value)
Determines whether the string starts/ends with the specified value.

number UTF8String:CharCodeAt(number index)
Gets the character code at the index of the UTF8String

table UTF8String:ToCharArray()
Converts the UTF8String to character array
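For comparison, here is roughly what CharCodeAt and ToCharArray correspond to when working on plain strings with the built-in utf8 library. charCodeAt and toCharArray are hypothetical names, the index is assumed to be 1-based and at least 1, and the module's own behavior (for example on out-of-range indices) may differ.

```lua
-- Hypothetical helpers on plain Lua strings.
local function charCodeAt(s, index)
    -- Translate the code point index into a byte position.
    local bytePos = utf8.offset(s, index)
    if not bytePos or bytePos > #s then
        return nil -- index is past the end of the string
    end
    return utf8.codepoint(s, bytePos)
end

local function toCharArray(s)
    local chars = {}
    -- utf8.charpattern matches exactly one UTF-8 encoded character.
    for ch in string.gmatch(s, utf8.charpattern) do
        chars[#chars + 1] = ch
    end
    return chars
end

print(charCodeAt("héllo", 2))  --> 233
print(#toCharArray("héllo"))   --> 5
```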

UTF8String UTF8String:Trim(table chars)
UTF8String UTF8String:TrimStart(table chars)
UTF8String UTF8String:TrimEnd(table chars)
Removes all of the specified characters from the start and/or the end of the UTF8String

Lua string tostring(UTF8String)
Explicitly converts UTF8String to a Lua string

International module required
UTF8String UTF8String:LocaleFormat(object …)
Formats the given values according to the format string, taking the locale into account

print(u"I have {0,en-US,style=currency,currency=USD,currencyDisplay=name}":LocaleFormat(2000)) --> I have 2,000.00 US dollars
print(u"This is {0,en-US,style=unit,unit=meter,useGrouping=min2} to {1,en-US,style=unit,unit=meter,useGrouping=min2} wide and was made on {2,dateStyle=long}":LocaleFormat(9500, 10500, {year = 2012, month = 3, day = 4})) --> This is 9500 m to 10,500 m wide and was made on March 4, 2012

UTF8String UTF8String:ToLocaleUpper(Locale locale)
UTF8String UTF8String:ToLocaleLower(Locale locale)
Converts it to uppercase/lowercase depending on the locale.

print(u'istanbul':ToLocaleUpper('en')) --> ISTANBUL
print(u'istanbul':ToLocaleUpper('tr-TR')) --> İSTANBUL

iterator UTF8String:BreakIterator(Locale locale, table options)
(Undocumented)

Is the purpose of this library to produce some kind of nostalgia for Python/C# or what?? I too would like to know use cases for this.

As well as why formatting is 0-based as opposed to 1-based when this is Luau.

Except there is already a utf8 library built into Rōblox Lua. I would recommend having the module return a function that runs something along the lines of:

getfenv().utf8 = UTF8_LIBRARY

Then, in the calling code, write:

require(/Path/to/UTF8String)()

I don’t recommend this. I did this once and I regret it.

Using getfenv() in your code disables some of Luau’s optimizations and injecting variables from modules kinda defeats the whole purpose of modularity.

This may also break scripts that use the native utf8 library.

You don’t inject globals into a script for the same reason you don’t use _G or shared
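One alternative to injecting globals, sketched under assumptions: the module builds its own table that falls back to the built-in utf8 library and returns it, so the caller decides what to name it. The names utf8ext and isvalid are hypothetical and are not part of UTF8String or of Roblox's utf8 library.

```lua
-- Hypothetical extension table: anything it does not define is looked up
-- in the built-in utf8 library, and no global is modified.
local utf8ext = setmetatable({}, { __index = utf8 })

-- Example extension: utf8.len returns nil for invalid UTF-8.
function utf8ext.isvalid(s)
    return utf8.len(s) ~= nil
end

-- In a ModuleScript this table would be the return value (return utf8ext),
-- and the calling script would bind it to a local of its own choosing.
print(utf8ext.len("héllo"))     --> 5  (built-in function reached via __index)
print(utf8ext.isvalid("héllo")) --> true
print(utf8.isvalid)             --> nil  (the native library is untouched)
```

Scripts that rely on the native utf8 library keep working, and getfenv() is not needed.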

Yep, it’s based around Python’s string; I just prefer strings defaulting to UTF-8. There are no utf8.sub, utf8.upper, utf8.lower, or utf8.reverse functions, and I find the utf8 library lacking in general, so I made this. This is for people who deal with UTF-8 text a lot, like me.
I should’ve created extended functions for utf8 (like utf8.lower, utf8.upper, etc.) and kept using strings, oh well.

It had always been that way for my previous modules (e.g. CLDRTools), you just didn’t notice it. It’s just a habit I cannot seem to break.
