Sera - Low-level schematized serialization library

While making this library, I've been running individual benchmarks, like creating new buffer objects vs. running the serialization code again. I'm afraid of making any wrong claims, but generally speaking, the buffer library seems faster than a handful of table lookups in most cases.

Your average game wouldn't even benefit much from a network library with serialization, so I thought: why not make a schema serdes library that doesn't try to be a solve-all library for everyone, but a high-speed serdes utility that still requires the developer to work with buffers.

This module is my swing at what I understand as “hella fast” in terms of Roblox Lua, but I haven't been working with buffers for that long, so I might be wrong :stuck_out_tongue:

Regardless, my goal for this project is to create a super performant serializer that does not bother with minuscule compression gains.

5 Likes

For anybody interested, I've created a few custom SeraTypes to use.
Read 'More Info' below for the important behavior details.

More Info

When using these types, it's important that you know the following:

  • Dictionaries are defined as key-value pairs
  • Arrays are defined as a contiguous list of items with no gaps

Dictionaries store the following:

  • Value count (table length)
  • Index/Key
  • Value

Arrays store the following:

  • Value count (table length)
  • Value

By default, all of the values are stored as 8-bit unsigned integers; this can be configured by editing the code.

Dictionary
local u8_Ser, u8_Des = module.Uint8.Ser, module.Uint8.Des

--Index of u8, value of u8
--Intended for small, variable-size dictionaries with keys.

--Has a base cost of 1 byte to store value count
--Each value + key takes 2 bytes
module.u8_u8_dict = table.freeze({
	Ser = function(b: buffer, offset: number, value: {number}): number
		local curr_offset
		do --Store the inner value count
			local value_count = 0
			for _ in value do
				value_count += 1
			end

			--Write the value count
			curr_offset = u8_Ser(b, offset, value_count)
		end

		--Store the inner values
		for i, v in value do
			curr_offset = u8_Ser(b, curr_offset, i)
			curr_offset = u8_Ser(b, curr_offset, v)
		end

		return curr_offset --Return the final offset after writing all values
	end,
	Des = function(b: buffer, offset: number): ({number}, number)
		local value_count, curr_offset = u8_Des(b, offset)

		local t = {}

		for count = 1, value_count do
			local i, v
			
			i, curr_offset = u8_Des(b, curr_offset)
			v, curr_offset = u8_Des(b, curr_offset)

			t[i] = v
		end

		return t, curr_offset --Return the final offset after reading all values
	end,
})
Array
local u8_Ser, u8_Des = module.Uint8.Ser, module.Uint8.Des

--Index of u8, value of u8
--Intended for small, variable-size arrays

--Has a base cost of 1 byte to store value count
--Each value takes 1 byte
module.u8_u8_array = table.freeze({
	Ser = function(b: buffer, offset: number, value: {number}): number
		local curr_offset
		do --Store the inner value count
			local value_count = 0
			for i in value do
				value_count += 1
				
				if value_count ~= i then
					error("Invalid array")
				end
			end

			--Write the value count
			curr_offset = u8_Ser(b, offset, value_count)
		end

		--Store the inner values
		for _, v in value do
			curr_offset = u8_Ser(b, curr_offset, v)
		end

		return curr_offset --Return the final offset after writing all values
	end,
	Des = function(b: buffer, offset: number): ({number}, number)
		local value_count, curr_offset = u8_Des(b, offset)

		local t = {}

		for count = 1, value_count do
			local v
			v, curr_offset = u8_Des(b, curr_offset)

			t[count] = v
		end

		return t, curr_offset --Return the final offset after reading all values
	end,
})
Array (Converted to f32 values)
local u8_Ser, u8_Des = module.Uint8.Ser, module.Uint8.Des
local f32_Ser, f32_Des = module.Float32.Ser, module.Float32.Des

--Index of u8, value of f32
--Intended for small, variable-size arrays

--Has a base cost of 1 byte to store value count
--Each value takes 4 bytes
module.u8_f32_array = table.freeze({
	Ser = function(b: buffer, offset: number, value: {number}): number
		local curr_offset
		do --Store the inner value count
			local value_count = 0
			for i in value do
				value_count += 1

				if value_count ~= i then
					error("Invalid array")
				end
			end

			--Write the value count
			curr_offset = u8_Ser(b, offset, value_count)
		end

		--Store the inner values
		for _, v in value do
			curr_offset = f32_Ser(b, curr_offset, v)
		end

		return curr_offset --Return the final offset after writing all values
	end,
	Des = function(b: buffer, offset: number): ({number}, number)
		local value_count, curr_offset = u8_Des(b, offset)

		local t = {}

		for count = 1, value_count do
			local v
			v, curr_offset = f32_Des(b, curr_offset)

			t[count] = v
		end

		return t, curr_offset --Return the final offset after reading all values
	end,
})
1 Like

I feel like you could get rid of needing to specify the size of each datatype by simply using null terminators (a 0-value u8 byte at the end of each dynamically-sized datatype), which is what I did for my own buffer serde, BufferConverter.
This completely removes the string and table size limit, at the cost of not being able to directly get the size of a datatype (which no one would probably be doing anyway :stuck_out_tongue:)
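
To illustrate the idea (a hedged sketch of the concept only, not BufferConverter's actual code), a null-terminated string type in the same Ser/Des shape as the types above could look like this:

--Null-terminated string: no length header, the value just ends with a 0 byte
--Note: the string itself must not contain any 0 bytes
local null_string = table.freeze({
	Ser = function(b: buffer, offset: number, value: string): number
		buffer.writestring(b, offset, value) --Write the raw string bytes
		buffer.writeu8(b, offset + #value, 0) --Write the terminator
		return offset + #value + 1
	end,
	Des = function(b: buffer, offset: number): (string, number)
		local curr_offset = offset
		while buffer.readu8(b, curr_offset) ~= 0 do --Scan forward until the terminator
			curr_offset += 1
		end
		return buffer.readstring(b, offset, curr_offset - offset), curr_offset + 1
	end,
})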

1 Like

Honestly, when it comes to buffers, it's fast whatever you do (unless you code really, really badly).

For example, my buffer serde BufferConverter takes 8 milliseconds max and 3 milliseconds mean to serialize a table with 500 members (1000 repeats), and for 100 members it takes 0.5 milliseconds mean and 2 milliseconds max.

The buffer library is really fast; you don't really have to worry about performance anytime soon. Just focus on compression, since that's what buffers excel at!

By the way, you do NOT need 48 bytes for a CFrame :sob:

Instead, you can write each rotation component as a single byte when serializing and convert back when reading (a sketch follows below).
This way it’s only 22 bytes!

The reason you can do this is because R00 to R22 are guaranteed to be between -1 and 1, so you can simply represent each one as a fraction of -127 to 127!
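
A rough sketch of the idea (my own reconstruction under assumptions, not necessarily the exact code from the original screenshots): position as three f32s (12 bytes) and each of the nine rotation components quantized from [-1, 1] to a signed byte (9 bytes). This particular layout comes to 21 bytes, so the 22-byte version presumably stored one extra byte somewhere, but the principle is the same.

local function write_cframe_21(b: buffer, offset: number, cf: CFrame): number
	local x, y, z,
		r00, r01, r02,
		r10, r11, r12,
		r20, r21, r22 = cf:GetComponents()

	--Position as full-precision f32s
	buffer.writef32(b, offset, x)
	buffer.writef32(b, offset + 4, y)
	buffer.writef32(b, offset + 8, z)

	--Each rotation component is in [-1, 1], so scale it to [-127, 127] and store as a signed byte
	local rotation = {r00, r01, r02, r10, r11, r12, r20, r21, r22}
	for i, component in rotation do
		buffer.writei8(b, offset + 11 + i, math.round(component * 127))
	end

	return offset + 21
end

local function read_cframe_21(b: buffer, offset: number): (CFrame, number)
	local x = buffer.readf32(b, offset)
	local y = buffer.readf32(b, offset + 4)
	local z = buffer.readf32(b, offset + 8)

	local rotation = table.create(9)
	for i = 1, 9 do
		rotation[i] = buffer.readi8(b, offset + 11 + i) / 127
	end

	--Note: the dequantized matrix is only approximately orthonormal
	return CFrame.new(x, y, z, table.unpack(rotation)), offset + 21
end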

Does Replica support Sera for serialization?

There's also Sera.LossyCFrame, which costs 28 bytes and has perfect coordinate precision and roughly 0.0005-degree precision for rotations.

I have a feeling your implementation might lead to rotation imprecision of up to a degree or maybe even more. At that point you could just convert to Euler angles and risk gimbal lock at 24 bytes.
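
For reference, a minimal sketch of that 24-byte Euler-angle layout (three f32s for position plus three f32 angles; the function names here are made up):

local function write_cframe_euler(b: buffer, offset: number, cf: CFrame): number
	local rx, ry, rz = cf:ToEulerAnglesXYZ()
	buffer.writef32(b, offset, cf.X)
	buffer.writef32(b, offset + 4, cf.Y)
	buffer.writef32(b, offset + 8, cf.Z)
	buffer.writef32(b, offset + 12, rx)
	buffer.writef32(b, offset + 16, ry)
	buffer.writef32(b, offset + 20, rz)
	return offset + 24
end

local function read_cframe_euler(b: buffer, offset: number): (CFrame, number)
	local x = buffer.readf32(b, offset)
	local y = buffer.readf32(b, offset + 4)
	local z = buffer.readf32(b, offset + 8)
	local rx = buffer.readf32(b, offset + 12)
	local ry = buffer.readf32(b, offset + 16)
	local rz = buffer.readf32(b, offset + 20)
	return CFrame.new(x, y, z) * CFrame.fromEulerAnglesXYZ(rx, ry, rz), offset + 24
end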

1 Like

Good point, I’ll add an option to use the rotation matrix or axis angle

I wrote my SerDes library entirely from my own 15 years of Lua expertise in what I understand as “fast code”, so I haven't been doing much comparison against other libraries. For fun, I've taken your module for comparison. Here are the results:

First of all, it's a bit of an apples-to-oranges comparison between schematized and non-schematized SerDes libraries, since they have different goals in mind, but we can try to see just how different they can be when they're trying to do the same thing:

By defining strict types and anticipating value order with a schema, the serialized result becomes pretty compact. Your library defaulting to non-f64 numbers destroyed the UserId field - Roblox numbers are f64 doubles, which keep exact integer precision up to 2^53, and UserIds have surpassed 2^32, which means the only native datatype in Roblox that can hold UserIds is f64, aka the generic Lua number type. You can take that into consideration when deciding whether a non-schema SerDes should default to f64 numbers or something else.
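
To illustrate the precision point (the UserId below is a made-up example above 2^32):

local userId = 7123456789 --hypothetical UserId larger than 2^32

local b = buffer.create(12)

--f64 keeps exact integer precision up to 2^53, so the UserId round-trips intact
buffer.writef64(b, 0, userId)
print(buffer.readf64(b, 0) == userId) --> true

--f32 only has a 24-bit mantissa, so a value this large gets rounded
buffer.writef32(b, 8, userId)
print(buffer.readf32(b, 8) == userId) --> false

--u32 cannot represent anything above 4,294,967,295 at all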

As for the speed:

There's over a 30x speed difference when serializing this type of table. I've created Sera for a project where I will need tons of serialization at runtime for replicating game state, so I'm planning to push Roblox to its limits lol.

2 Likes

Dang, this is definitely one hell of a wake-up call… that speed difference is really big; I assume it's because of more loops in mine. Though, for UserIds I would store them as strings instead of numbers.

Also, you can specify a number size type (idk what to call them) by doing Converter.Serialize(…, {numbersAs = "f32"}) (for example), though this applies to all numbers and would benefit from a schema system (like yours); I guess it really is comparing apples to oranges.

Also also, I would still like you to implement null terminators to completely eliminate the size limitation; I honestly am considering switching to this from my own module if you do implement it :slight_smile:

1 Like

I want to avoid operations where my module would have to look for a null terminator - my goal with Sera is to do as few Lua operations as possible and let the native code behind Lua do the heavy lifting. Using Sera.String32 would give you practically limitless string size with negligible serialized size impact.
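
The length-prefix approach looks roughly like this (a sketch of the idea only, not Sera's actual String32 source): reading becomes a single native readstring instead of a byte-by-byte scan for a terminator.

local u32_length_string = table.freeze({
	Ser = function(b: buffer, offset: number, value: string): number
		buffer.writeu32(b, offset, #value) --4-byte length header
		buffer.writestring(b, offset + 4, value) --Raw string bytes
		return offset + 4 + #value
	end,
	Des = function(b: buffer, offset: number): (string, number)
		local length = buffer.readu32(b, offset)
		return buffer.readstring(b, offset + 4, length), offset + 4 + length
	end,
})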

1 Like

Well, alright. You learn something new every day…

Also, 15 years?? That’s longer than I’ve been alive! :fearful:

Holy?? It is that fast? I was planning to use Squash, but I'm switching now. Thank you for this godly resource!

1 Like

Quite fast, but caching and/or splitting the workload speeds it up a hell of a lot more. I use mine for exporting and importing large game assets, so even then it still takes a bit of time.

Also, for a frame of reference on how much data/bandwidth this can save, I checked the size of the serialized and unserialized versions of a sample table, and the results were:
Unserialized (Roblox default): 83 bytes
Serialized: 27 bytes!!

If you want to know how I measured how much data they took up, check this post: Introducing UnreliableRemoteEvents - #110 by Luaction

1 Like

I was doing a few tests benchmarking Sera against Squash - although I don't feel like my tests were super high quality, I do believe Sera should be up to 1.7x faster or, at worst, just as fast as Squash. Schematized SerDes is easier to write than manually filling the buffer with Squash. I also think Squash might've been written better in some regards (I expected it to run faster than Sera), but I'm personally only interested in schematized SerDes.

1 Like

Benchmarks are meant to compare the actual processing time of pieces of code - putting the processing in new coroutines or threads during the benchmark defeats the purpose of a benchmark.

Putting code into coroutines can make throttling easier, but it doesn’t prevent overloading the processor since normally Lua runs on a single processor core and simply resumes a coroutine immediately after a previous one yields.
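
In other words, time the call itself on the same thread, e.g. (serialize, schema, and data here are placeholders for whatever is being measured):

local REPEATS = 1000

local start = os.clock()
for _ = 1, REPEATS do
	serialize(schema, data) --the code under test, run directly with no coroutines
end
local elapsed = os.clock() - start

print(`{elapsed / REPEATS * 1000} ms per call (mean of {REPEATS} repeats)`)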

It's not threads; what I mean is the data is split into blocks to be processed, similar to the QUIC protocol.

I use the IEEE 754 NaN encoding method to serialize custom binary data directly within the fraction bits of a NaN value. This allows me to store small pieces of binary information in a compact floating-point format, which I like since it's ideal for embedding metadata, flags, or other small data that doesn't need a full object structure. I've been focusing on making this as fast as possible, so I also use buffers to handle the raw binary data, which lets me manipulate the bytes directly without the annoying overhead.

It's a pretty good method since it lets me serialize data efficiently, using buffers for fast reading and writing of the manipulated NaN values. And by avoiding the more complex serialization formats, I can quickly encode and decode binary data stored in the NaN's fraction field.

The IEEE 754 standard, which is a method for representing floating-point numbers in binary, is just naturally faster than more complicated serialization methods tbh. It breaks the number into three parts: the sign bit, the exponent, and the fraction. It was designed to be optimized at the hardware level, particularly when using hardware-supported operations for floating-point values, but that's beyond Roblox.
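
A minimal sketch of the general mechanics being described (assuming a 32-bit payload and that nothing in between canonicalizes the NaN bits; this is not the poster's module, just the technique):

local scratch = buffer.create(8)

--Pack a 32-bit payload into the low fraction bits of a quiet NaN
local function box_u32(payload: number): number
	buffer.writeu32(scratch, 0, payload) --low 32 fraction bits hold the payload
	buffer.writeu32(scratch, 4, 0x7FF80000) --exponent all ones + quiet bit = NaN
	return buffer.readf64(scratch, 0) --reinterpret the 8 bytes as a double
end

--Recover the payload from a boxed NaN
local function unbox_u32(nan_value: number): number
	buffer.writef64(scratch, 0, nan_value)
	return buffer.readu32(scratch, 0)
end

local boxed = box_u32(123456)
print(boxed ~= boxed) --> true, it really is a NaN
print(unbox_u32(boxed)) --> 123456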

Can you share whatever personal module you used in this benchmark?