Table of Contents
- Overview
- Desiderata
- Explanation
- Hypothetical Implementation
- Results
- Appendix - Further Readings and Suggestions
Overview
This article will (hopefully) be of significant value to anybody who has some large data that gets requested relatively infrequently. A sandbox tycoon would be a likely candidate for something like this, hence we will be using that as our example.
I think many people working on these sorts of games take the most intuitive, yet naive approach of saving the raw table of items. Many of the characters used to delimit the keys, values, or table constructors are superflous and have no reason whatsoever to remain in the datastore string. Hopefully, if this article has been effective in providing any sort of insight, youâll never consider something like that again for high volume, rarely requested data.
The data in question must be encoded/decoded infrequently, that is, once per session. This is because the encoding and decoding processes we will go over are relatively expensive (any, especially lossless, compression algorithm has an inherent space-time complexity tradeoff), however we are only interested in reducing the âfootprintâ of our sandbox and for the most part speed is inconsequential.
Desiderata
- Terse
- Preferably a single value
- Should not exceed the 260k character limit
- Little change to pre-existing code/minimal boilerplate code (although, as you will see later on, we will still need some quantitative value for each of our items- an âidâ)
- Not overly expensive
- Simple
Explanation
Consider the following item:
{
Name = "Sofa",
Position = {x = 0, y = 0, z = 0},
Rot = 0,
Misc = {
Color = {r = 255, g = 255, b = 255},
Material = Enum.MaterialType.Fabric.Value
}
}
The common approach to saving something like this is to append it to some items
table where an aggregate of all your items are kept for the sandbox. This is a fine way to represent your sandbox and its constitutents in game, but it is an incredibly verbose representation of our data to save in a datastore.
So what should you do?
Of course the implementation, and perhaps even choice of algorithm will vary along with the needs of your project but in this article we will be iterating through a long string of (base95) characters where items are seperated by some arbitrary delimiter, almost acting like a stream of bytes.
We can implement a base95 number system. Each item will be expressed as some variable width substring, with a character reserved as our delimiter (thereby restricting our number system by 1 radix).
You could, however, decide to go with fixed width codes but I personally think the lack of flexibility becomes more of a detriment than the potential savings of not including a delimiting character (in fact, the possibility of having multiple extraneous characters per item is far more concerning).
Having variable width codes also mean we can have a different number of metadata properties for each item type. For example, you likely wouldnât need any metadata for a tv item, but you might need to have multiple properties for a floor tile (color, material, reference to an item above it).
In essence:
- Once the player leaves the game, start the encoding process.
- Iterate through every item and extract only the essential data (the ones included in the general template string + any metadata specific to that item type).
- Have some âbase template stringâ which will act as a general template for all items in the sandbox, for example:
ttxxyzzrr
1- Where tt denotes item type (the âidâ we will need to set to each class of item)
- And assuming you are using cartesian coordinates: 2
- xx, x component of our position
- y, y component
- zz, z component
- rr, rotation (first r can be sign, second can be some value from 0-180?)
- Any quantitative data values should be converted to a character via
string.char()
. - Use the base template string as a base to code your item values, appending any relevant metadata.
- Append your item string to a larger string (this is our blob of items), and suffix a delimiter if youâre doing variable width encoding.
- Once the player joins, start the deserialize process which is essentially just a reverse of this.
Note:
1 Two characters in (printable) ascii can represent 95^2 distinct values, that is, we can represent an item with type, position and rotation using a minimum of 9 characters and have 0-9024 possible values for each of the aformentioned properties! I donât think youâll ever reach 9025 item types, neither (probably) will your sandbox span 9024x9024 in area. Of course if you did manage to include that many item classes in your game, you can always add on an extra character t
and extend your range to ~850k!
2 You may find using a grid for your coordinate system to be a better (and potentially more concise) alternative to keep in mind.
Hypothetical Implementation
Here is some very general (almost pseudo) code that may or may not work, but hopefully you get the gist (pun intended). I have omitted the serialize function and save boilerplate because they are almost exactly the same, except that it operates on a table and converts the values to integers via the property_processor functions and encodes further into base94, where itâll be concatenated as a string.
itemData.lua
local itemData = {}
itemData.templateString = "ttsssxyzr"
itemData.delimiter = string.char(32)
itemData.baseProperties = {
{"tt", "id"},
{"sss", "serial"} -- Unique identifier for this item
{"xyz", "position"},
{"r", "rotation"}
}
itemData.items = {
Chair = {
id = 0,
-- Order matters here, the metadata will be parsed in the order
-- by which these table elements are in.
-- So in this case, material denotes the first "metadata character",
-- and color is represented by the 2nd to 5th metadata characters.
metadata = {{property = "material", len = 1}, {property = "color", len = 4}}
},
Sofa = {
id = 1,
metadata = {{property = "color", len = 4}}
},
Bed = {
id = 1,
metadata = {{property = "color", len = 4}}
},
Floor = {
id = 1,
metadata = {{property = "material", len = 1}, {property = "color", len = 4}, {property = "aboveItemSerial", len = 3}}
}
}
dataCompression.lua
local base_char = 33
local util = {
-- We can use modular arithmetic to convert an rgb value to an integer.
getIntFromRgb = function(color3)
return 256^2 * color3.r + 256 * color3.g + color3.b
end,
getRgbFromInt = function(rgbInt)
return Color3.new(rgbInt/(256^2), (rgbInt/256) % 256, rgbInt % 256)
end,
log = function(n, base)
if n <= 0 then
return 0
end
return math.log(n)/math.log(base)
end,
base94Encode = function(n) -- Space character reserved for delimiter, so technically base94.
local n_str = ""
local char_count = math.floor(log(n, 94) + 1)
local r = n
for pos = char_count, 1, -1 do
val = math.floor(r/(94^(pos-1)))
val_char = string.char(base_char + val)
n_str = n_str .. val_char
r = r - val*(94^(pos-1))
end
return n_str
end,
base94Decode = function(str)
local str_len = str.len()
local n = 0
for pos = 1, str_len do
local val = string.byte(string.sub(str, pos, pos)) - base_char
n = n + val*94^(str_len-pos)
end
return n
end
}
-- For brevity, we will only include functions that'll process a color value.
local property_processor = {
color = {
deserialize = function(str_val)
local colorInt = util.base94Decode(str_val)
return util.getRgbFromInt(colorInt)
end,
serialize = function(val)
local rgbInt = util.getIntFromRgb(val)
return util.base94Encode(rgbInt)
end
end
}
local function deserializeItem(item_str)
local item = {}
local current_char_pos = 1
for ord, property_data in ipairs(itemData.baseProperties) do
local str_code, property_name = unpack(property_data)
local property_str_data = string.sub(item_str, current_char_pos, str_code.len())
local val = property_processors[property_name](property_str_data)
item[property_name] = val
-- In fact, this makes having the property string codes completely pointless, but will remain as-is for its didactic value.
current_char_pos = current_char_pos + str_code.len()
end
end
signal.onGetData(function(data)
local sandbox_data = data.sandbox_string
local item_str = ""
for c = 1, sandbox_data.len() do
local char = string.sub(sandbox_data, c, c)
if char ~= itemData.delimiter then
item_str = last_item_str .. char
else
deserializeItem(last_item_str)
item_str = ""
end
end
end)
Remarks
This articleâs effectiveness is predicated on whether or not you have understood roughly how the above processes works (data compression a la string representing a stream of bytes), why they are substantially better than the conventional approach, and you are ready to implement this into your own game. With ingenuity and a little bit of focus, you can come up with a compression scheme to perfectly suit your own project.
Appendix - Further Readings
Run Length Encoding Using this can greatly reduce size, especially if you have duplicate items. Itâs pretty easy to implement too.
LZW
Huffman coding
Arithmetic Coding