To put it simply, how can one make use of bytes in Lua?
More specifically:
Lua strings contain characters that are supposedly 1 byte each, meaning that a string of length 4 would use 4 bytes total for its characters. Thus, we could use string.char(0 -> 255) to cover bytes 00000000 through 11111111. That being said, I have found things stating that most CONTROL characters take 6 bytes, and that some other characters take 2 bytes. My question then becomes rather simple: is this true?
If this is false: great, please let me know… this is truly fantastic.
If this is true: How can I store/edit/manipulate bytes without using the string primitive data type? There is no byte primitive data type in Lua (that I could find), and there is no way to use 8-bit integers (-128 → 127), since 0b00000001 just converts to 1 and 0b10000000 just converts to 128 in int32 format.
The only other method I can think of is storing bytes in groups of 4 as an int32, with each byte of the int32 representing one byte of data. That being said, this really doesn’t allow us to store bytes anywhere (i.e. a datastore), as it’s not as if we can just combine integers into 1 huge integer.
What is the best method, if there even is one, to work with bytes in lua?
It’s actually quite simple: Lua strings are interpreted on a per-byte level, meaning you can use the full range of 0->255, and a Lua string will store that data literally and allow you to read it back as individual bytes.
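To make that concrete, here is a minimal sketch using only string.char and string.byte from the standard string library. The variable names are just for illustration.

```lua
-- Build a string that holds every byte value 0..255 exactly once.
local parts = {}
for value = 0, 255 do
    parts[#parts + 1] = string.char(value)
end
local data = table.concat(parts)

-- The length operator counts bytes, so this is 256 regardless of
-- how the string would look if it were displayed as text.
print(#data) --> 256

-- string.byte reads bytes back as numbers (positions are 1-based).
print(string.byte(data, 1), string.byte(data, 129), string.byte(data, 256)) --> 0 128 255

-- Embedded zero bytes are fine too; they do not terminate the string.
local withNull = "a" .. string.char(0) .. "b"
print(#withNull) --> 3
```

So inside Lua itself, one byte value is always one byte of string, control characters included.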
I think where the confusion comes from is that GUI elements such as TextLabels interpret those bytes as UTF-8 characters, which have a larger range and may in some cases span multiple bytes.
Edit: You also mentioned DataStores, which also require UTF-8 compliant bytes and may change some bytes to fit the standard. If you’re trying to store raw binary data in DataStores, I’d suggest Base64-encoding them before storing them.
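For reference, a rough sketch of what Base64 encoding and decoding could look like in plain Lua. This is not a Roblox API; base64Encode and base64Decode are hypothetical helper names, and it uses the standard alphabet with = padding. It sticks to math.floor and % so it does not depend on any bit library.

```lua
local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Reverse lookup: character -> 6-bit value.
local DECODE = {}
for i = 1, #ALPHABET do
    DECODE[string.sub(ALPHABET, i, i)] = i - 1
end

local function symbol(index)
    return string.sub(ALPHABET, index + 1, index + 1)
end

-- Hypothetical helper: turns arbitrary bytes into ASCII-only text.
local function base64Encode(data)
    local out = {}
    for i = 1, #data, 3 do
        -- b2 and b3 are nil when the input runs out in the last group.
        local b1, b2, b3 = string.byte(data, i, i + 2)
        -- Combine up to three bytes into one 24-bit number.
        local n = b1 * 65536 + (b2 or 0) * 256 + (b3 or 0)
        out[#out + 1] = symbol(math.floor(n / 262144) % 64)
            .. symbol(math.floor(n / 4096) % 64)
            .. (b2 and symbol(math.floor(n / 64) % 64) or "=")
            .. (b3 and symbol(n % 64) or "=")
    end
    return table.concat(out)
end

-- Hypothetical helper: inverse of base64Encode for well-formed input.
local function base64Decode(text)
    local out = {}
    for i = 1, #text, 4 do
        local c1 = string.sub(text, i, i)
        local c2 = string.sub(text, i + 1, i + 1)
        local c3 = string.sub(text, i + 2, i + 2)
        local c4 = string.sub(text, i + 3, i + 3)
        -- "=" is not in DECODE, so padding contributes 0.
        local n = DECODE[c1] * 262144 + DECODE[c2] * 4096
            + (DECODE[c3] or 0) * 64 + (DECODE[c4] or 0)
        local b1 = math.floor(n / 65536) % 256
        local b2 = math.floor(n / 256) % 256
        local b3 = n % 256
        if c3 == "=" then
            out[#out + 1] = string.char(b1)
        elseif c4 == "=" then
            out[#out + 1] = string.char(b1, b2)
        else
            out[#out + 1] = string.char(b1, b2, b3)
        end
    end
    return table.concat(out)
end

print(base64Encode("Man")) --> TWFu
print(base64Encode("Ma"))  --> TWE=
print(base64Decode("TWE=")) --> Ma
```

The decoder does no validation, so it assumes the input came from the encoder above. The output only ever contains characters in the 0-127 range, which is the point when the storage layer insists on UTF-8.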
Strings are stored internally (in C) as char arrays, i.e. arrays of bytes.
To add to @anon2793720’s answer, which is completely correct:
Datastores only accept byte ranges 0-127 currently in strings, but this may change soonish
You can use the new (undocumented in ROBLOX currently, but usable) methods string.pack, string.unpack, and string.packsize to convert c-like structures to byte-packed strings.
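A small sketch of how those three functions fit together, assuming they behave like the Lua 5.3 versions (that is what the format string below follows; plain Lua older than 5.3 does not have them). The field names id, delta and flags are made up for the example.

```lua
-- "<"  = little-endian
-- "I2" = unsigned 16-bit integer
-- "i4" = signed 32-bit integer
-- "B"  = unsigned byte
local FORMAT = "<I2i4B"

local packed = string.pack(FORMAT, 513, -2, 255)

-- 2 + 4 + 1 bytes, and packsize agrees without packing anything.
print(#packed, string.packsize(FORMAT)) --> 7 7

-- The raw bytes: 513 is 0x0201, stored low byte first; -2 is FE FF FF FF.
print(string.byte(packed, 1, -1)) --> 1 2 254 255 255 255 255

-- unpack returns the values plus the position of the next unread byte.
local id, delta, flags, nextPos = string.unpack(FORMAT, packed)
print(id, delta, flags, nextPos) --> 513 -2 255 8
```

The result is an ordinary Lua string, so it can be concatenated, sliced with string.sub, or run through an encoder like any other byte data.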
You can use the bit32 library to perform operations on numbers as if they were int32s.
(Edit) Also, the utf8 library helps with actually parsing strings if you want to, for example, print them “grapheme-by-grapheme” (a grapheme is a sequence of one or more code points, and therefore one or more bytes, which displays as a single character in Unicode)
In Datastores, I assume that it’s encoded in something similar to JSON. That means " becomes \", \ becomes \\, and \0 (the null character, not the string) becomes \u0000. In purely Lua strings, it should just be the number of bytes equals the number of characters in the string, at least as far as I’m aware.
This is accurate: characters 0 → 127 are stored as 1 byte in UTF-8, and 128 → 255 are stored as 2 bytes.
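You can check those sizes yourself with the utf8 library, assuming it is available (Lua 5.3+ and Roblox both ship one). utf8.char turns a code point into its UTF-8 bytes:

```lua
-- Code points 0..127 encode to a single byte.
print(#utf8.char(65))  --> 1
print(#utf8.char(127)) --> 1

-- Code points 128..255 need two bytes.
print(#utf8.char(128)) --> 2
print(#utf8.char(255)) --> 2

-- Code point 255 becomes the byte pair C3 BF.
print(string.byte(utf8.char(255), 1, -1)) --> 195 191

-- The raw byte 255 on its own is a valid Lua string of length 1,
-- but it is not valid UTF-8, so utf8.len reports a failure.
local rawByte = string.char(255)
local length = utf8.len(rawByte)
print(#rawByte, length) --> 1 nil
```

That last part is the mismatch: string.char(255) is one byte in Lua, but anything that insists on UTF-8 has to either reject it or re-encode it as two bytes.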
Well… doesn’t utf8 just have to ruin our days. I see why Roblox did it, but it would be nice to have the ability to store strings such that each character (0 → 255) is one byte when stored in datastores… I was initially worried about the conversion of CONTROL characters into 6 bytes of data, but this just makes things a whole lot tougher. That being said, using Base64-encoding is a good alternative, as it allows us to have 3 bytes of data for every 4 bytes, which should be ~60% more efficient than not using it at all, given what we have to work with.
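A back-of-the-envelope check of that ~60% figure. The per-byte costs below are assumptions taken from this thread, not measured DataStore behaviour: control characters 0-31 cost 6 bytes (\u00XX), the quote and backslash cost 2, the rest of 0-127 cost 1, and 128-255 cost 2. It also assumes every byte value is equally likely.

```lua
-- Assumed stored size of one raw byte, per the costs discussed above.
local function escapedSize(byte)
    if byte < 32 then
        return 6 -- control character written as \u00XX
    elseif byte == 34 or byte == 92 then
        return 2 -- quote and backslash are escaped as \" and \\
    elseif byte < 128 then
        return 1 -- plain ASCII
    else
        return 2 -- 128..255 become two UTF-8 bytes
    end
end

local total = 0
for byte = 0, 255 do
    total = total + escapedSize(byte)
end

local rawCost = total / 256 -- average stored bytes per data byte
local base64Cost = 4 / 3    -- Base64 stores 4 characters per 3 data bytes

print(total)                                          --> 546
print(string.format("%.2f", rawCost))                 --> 2.13
print(string.format("%.2f", base64Cost))              --> 1.33
print(string.format("%.2f", rawCost / base64Cost))    --> 1.60
```

Under those assumptions, raw storage averages about 2.13 stored bytes per data byte versus 1.33 for Base64, so the same budget holds roughly 60% more data. Data that is mostly printable ASCII would shift the balance back toward storing it raw.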
Well… that’s annoying.
Hopefully this works with datastores in the future, but for now it does not… which is sad. (at least, since I last checked this does not work with datastores, feel free to correct me!)