Make data access for Actors go faster

Hi!
Please consider giving us faster ways to access shared data for actors.

Specifically, I was surprised to see that frozen SharedTable access is still around the same speed as regular unfrozen SharedTable access.

My use case is that I have about 10000 conditional buffer copies to do, and reading that data from a frozen SharedTable is almost 4x slower than just doing it on the main thread. :frowning:

(Is that a bug or just a feature that hasn't been enabled/finished yet?)


Hi @MrChickenRocket,

If my understanding is correct, you are essentially performing ~10000 reads from the SharedTable. We have seen that SharedTable reads can be considerably slower than reading from a native Luau table. A lot of this has to do with the overhead of calling back into C++ code for the SharedTable type, compared to a native VM operation (i.e. executing a VM opcode to read from a Luau table). However, that doesn't necessarily mean we can't still make some further performance improvements to SharedTables.

Could you provide some more information about your use case? For example, what is the type of the key you are using to index the SharedTable? Do you have any code to demonstrate your exact issue, or any timings we could look at to see whether the performance is in line with our expectations?
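
Even a rough harness along these lines would give us something concrete to compare against (the key types and counts below are just placeholders, not a rigorous benchmark):

--Rough micro-benchmark: time N reads from a native Luau table and from a frozen
--SharedTable built from the same data. Sizes and key types here are placeholders.
local N = 10000

local plain = table.create(N)
for i = 1, N do
	plain[i] = i * 0.5
end

local shared = SharedTable.cloneAndFreeze(SharedTable.new(plain))

local t0 = os.clock()
local sum = 0
for i = 1, N do
	sum += plain[i]
end
local nativeTime = os.clock() - t0

t0 = os.clock()
sum = 0
for i = 1, N do
	sum += shared[i]
end
local sharedTime = os.clock() - t0

print(("native: %.3f ms, SharedTable: %.3f ms"):format(nativeTime * 1000, sharedTime * 1000))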

I've also faced this issue and had to come up with various workarounds for it. I'm working on a large-scale voxel engine where each chunk of the world contains 16x16x16 voxels.

I utilise various methods to ensure the rendering and storage of this data has a low impact. Greedy meshing for voxels is a big one, and it requires a lot of table reading.

I'd love to use SharedTables to store this information. It'd be a great source of world state shared across multiple Actors and unrelated core engine code. From my benchmarks, read/write times can be as high as 8x-10x those of a native Luau table, which makes them undesirable and unviable to use.

I instead opt to store my voxels in a native Luau table inside each Actor's main script and simply offer "methods" of manipulating those blocks through BindToMessage hooks. Whenever a chunk's voxel state needs to be read, it causes challenges, but I have other creative workarounds for those instances. Happy to share code privately with engineers if interested.
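
Roughly, the pattern looks like this sketch (the names and chunk layout are illustrative, not my actual engine code):

--Script parented under an Actor. Voxel data lives in a plain Luau table local to
--this VM, and other code touches it only through message hooks. Names are illustrative.
local actor = script:GetActor()

local voxels = table.create(16 * 16 * 16, 0) --block ids for one 16x16x16 chunk

actor:BindToMessage("SetVoxel", function(x: number, y: number, z: number, id: number)
	voxels[x + y * 16 + z * 256 + 1] = id
end)

actor:BindToMessage("GetVoxel", function(x: number, y: number, z: number, reply: BindableEvent)
	reply:Fire(voxels[x + y * 16 + z * 256 + 1])
end)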

I understand why all this is the case, and I have no better solutions to offer as alternatives. Just thought I'd chime in to say "yeah, I'm suffering from the same thing" :pray:

From my experience, you're better off using a SharedTable to import instructions and export the final product en masse in a compressed form. Trying to read/write SharedTables as part of the parallel operation itself is going to be bad. Heck, you can even just use traditional BindableEvents for that purpose and consolidate everything into a ModuleScript instead, skipping SharedTables altogether. When you want to edit a chunk, send the compressed chunk data (which should already be compressed in storage) to an actor by whatever means, then have it decompress, edit, greedy-mesh, and recompress all in parallel, then send the entire chunk back to a marshaller in serial.
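
As a sketch, the actor side of that round trip might look something like this (the topic and event names are placeholders, and the actual decompress/mesh/recompress work is elided):

--Inside the actor: receive a compressed chunk, do the heavy work in parallel,
--then switch to serial to hand one finished result back. Names are placeholders.
local actor = script:GetActor()
local chunkDone = script.Parent:WaitForChild("ChunkDone") :: BindableEvent

actor:BindToMessageParallel("EditChunk", function(compressed: string, edits)
	--decompress, apply the edits, greedy-mesh, and recompress here, all in parallel
	local repacked = compressed --placeholder for the real recompressed payload

	task.synchronize()
	chunkDone:Fire(repacked) --back in serial: return the whole chunk to the marshaller
end)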

Depending on the compression used, you can squish it down to 4 bytes for each cuboid (every "block" made from greedy meshing) in a 16^3 chunk by simply using a lookup table. In my example below, I didn't use greedy meshing, and the large chunk size made it so that each voxel is 3 characters instead.
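
For what it's worth, one possible 4-byte layout (this particular packing is just an assumption to illustrate the idea, not necessarily the exact scheme described above) is 4 bits per position axis, 4 bits per size axis, and an 8-bit index into a block-type lookup table:

--Hypothetical 4-byte cuboid encoding for a 16^3 chunk: x/y/z in 4 bits each,
--(size - 1) per axis in 4 bits each, and an 8-bit block-type lookup index.
local function packCuboid(buf: buffer, offset: number, x, y, z, sx, sy, sz, blockIndex)
	local packed = bit32.bor(
		x,
		bit32.lshift(y, 4),
		bit32.lshift(z, 8),
		bit32.lshift(sx - 1, 12),
		bit32.lshift(sy - 1, 16),
		bit32.lshift(sz - 1, 20),
		bit32.lshift(blockIndex, 24)
	)
	buffer.writeu32(buf, offset, packed)
	return offset + 4
end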

In this case I'm packing 100 buffer objects (x 100, one for each player).

Each character table is 12 string keys with numbers, plus one Vector3 field.
The final result is a Luau "buffer" object containing the contents of 100 pairs of player character data tables.

Packing a character view happens in one of two ways, either like so:

function CharacterData:SerializeToBitBufferFast(buf : buffer, offset: number)

	local contentWritePos = offset
	offset += 2 --2 bytes contents

	local contentBits = 0xFFFF --fast path: every field is written
	
	local serialized = self.serialized
	
	offset = WriteVector3(buf, offset, serialized.pos)
	offset = WriteFloat16(buf, offset, serialized.angle)
	offset = WriteFloat16(buf, offset, serialized.stepUp)
	offset = WriteFloat16(buf, offset, serialized.flatSpeed)
	offset = WriteFloat32(buf, offset, serialized.exclusiveAnimTime)
	offset = WriteByte(buf, offset, serialized.animCounter0)
	offset = WriteByte(buf, offset, serialized.animNum0)
	offset = WriteByte(buf, offset, serialized.animCounter1)
	offset = WriteByte(buf, offset, serialized.animNum1)
	offset = WriteByte(buf, offset, serialized.animCounter2)
	offset = WriteByte(buf, offset, serialized.animNum2)
	offset = WriteByte(buf, offset, serialized.animCounter3)
	offset = WriteByte(buf, offset, serialized.animNum3)	

	buffer.writeu16(buf, contentWritePos, contentBits) --backfill the contents bitmask
	return offset
end

or like so:

function CharacterData:SerializeToBitBuffer(previousData, buf : buffer, offset: number)
	
	if (previousData == nil) then
		return self:SerializeToBitBufferFast(buf, offset)
	end
	
	local contentWritePos = offset
	offset += 2 --2 bytes contents
	
	local contentBits = 0
	local bitIndex = 0

	--compare each field against previousData; write only changed fields and set a bit for each
	for keyIndex, key in CharacterData.keys do
		local value = self.serialized[key]
		local func = CharacterData.methods[CharacterData.packFunctions[key]]

		local valueA = previousData.serialized[key]
		local valueB = value

		if func.compare(valueA, valueB) == false then
			contentBits = bit32.bor(contentBits, bit32.lshift(1, bitIndex))
			offset = func.write(buf, offset, value)
		end
		bitIndex += 1
	end
		
	buffer.writeu16(buf, contentWritePos, contentBits) --backfill the contents bitmask
	return offset
end

In testing, I pay about 1ms to create this SharedTable from a regular table and freeze it.

And then the loop that does this runs about 4x slower than if I just let it run on the main thread.

To be doubly clear: this table is frozen and is only ever read from.
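
For anyone trying to reproduce, a stripped-down version of that setup and read loop would look roughly like this (placeholder field names; the real entries have 12 numeric fields plus a Vector3):

--Stripped-down repro: build a table of character entries, clone it into a frozen
--SharedTable, then time a read-only loop over it. Field names are placeholders.
local source = {}
for i = 1, 100 do
	source[i] = { pos = Vector3.new(i, 0, 0), angle = 0, flatSpeed = 16 }
end

local frozen = SharedTable.cloneAndFreeze(SharedTable.new(source), true)

local t0 = os.clock()
for i = 1, 100 do
	local entry = frozen[i]
	local _pos, _angle, _speed = entry.pos, entry.angle, entry.flatSpeed
end
print(("read loop: %.3f ms"):format((os.clock() - t0) * 1000))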