I’m hoping to better understand some of the best practices around Parallel Luau, primarily with respect to SharedTables.

One thing I’ve noticed while using SharedTables is that they take significantly longer to write to than regular tables, which is understandable given that there is likely some form of memory replication across the various VMs involved in parallel execution. In my experience, writing to any part of a SharedTable takes a minimum of about 0.003ms per write, and up to 0.01ms. That may not sound like much, but it adds up quickly when you’re trying to perform heavy processing across several threads. I’ve found that writing to a standard temporary table and then overwriting a larger chunk of the SharedTable in one assignment is a bit faster, though there’s some added overhead from “cloning” the data out of the SharedTable. For example:
```lua
local st = SharedTable.new()
st.test = {}
st.test.x = 1
st.test.y = 2
st.test.subTable = {
    x2 = 1,
    y2 = 1,
} -- This gets converted into a SharedTable automatically, from what I can tell.

local st_clone = {
    subTable = {},
}

while task.wait() do
    debug.profilebegin('shared_table_test')
    st.test.x += 1
    st.test.y += 1
    st.test.subTable.x2 += 1
    st.test.subTable.y2 += 1
    debug.profileend()

    debug.profilebegin('shared_table_overwrite_test')
    st_clone.x = st.test.x + 1
    st_clone.y = st.test.y + 1
    st_clone.subTable.x2 = st.test.subTable.x2 + 1
    st_clone.subTable.y2 = st.test.subTable.y2 + 1
    st.test = st_clone
    debug.profileend()
end
```
The MicroProfiler then shows the following:

[MicroProfiler capture: shared table direct write test]
[MicroProfiler capture: shared table clone and overwrite test]
The first capture shows that cloning the data and overwriting the SharedTable in a single assignment is faster than updating it directly. It also correlates with my finding that each write takes approximately 0.003ms: in the direct-write test we write to the SharedTable 4 times, so we’d expect ~0.012ms (0.013ms measured), while the clone-and-overwrite test comes in at 0.006ms because we only write to the SharedTable once, plus some overhead from cloning the data.

When increasing the number of writes and values to clone, this “optimization” does continue to yield faster results, but the “cloning” overhead doesn’t seem to scale all that well; it ends up being about a 25% improvement over direct writes in my experience. However, in cases where you need to write to the same value potentially many times (really, just more than once), this optimization shines. Writing to a standard table is likely tens of times faster than writing to a SharedTable, so in this example:
```lua
local st = SharedTable.new()
st.test = {}
st.test.x = 1
st.test.y = 2
st.test.subTable = {
    x2 = 1,
    y2 = 1,
} -- This gets converted into a SharedTable automatically, from what I can tell.

local st_clone = {
    subTable = {},
}

while task.wait() do
    debug.profilebegin('shared_table_test')
    st.test.x += 1
    st.test.y += 1
    st.test.x += 1
    st.test.y += 1
    st.test.x += 1
    st.test.y += 1
    st.test.x += 1
    st.test.y += 1
    st.test.subTable.x2 += 1
    st.test.subTable.y2 += 1
    st.test.subTable.x2 += 1
    st.test.subTable.y2 += 1
    st.test.subTable.x2 += 1
    st.test.subTable.y2 += 1
    st.test.subTable.x2 += 1
    st.test.subTable.y2 += 1
    debug.profileend()

    debug.profilebegin('shared_table_overwrite_test')
    st_clone.x = st.test.x + 1
    st_clone.y = st.test.y + 1
    st_clone.x += 1
    st_clone.y += 1
    st_clone.x += 1
    st_clone.y += 1
    st_clone.x += 1
    st_clone.y += 1
    st_clone.subTable.x2 = st.test.subTable.x2 + 1
    st_clone.subTable.y2 = st.test.subTable.y2 + 1
    st_clone.subTable.x2 += 1
    st_clone.subTable.y2 += 1
    st_clone.subTable.x2 += 1
    st_clone.subTable.y2 += 1
    st_clone.subTable.x2 += 1
    st_clone.subTable.y2 += 1
    st.test = st_clone
    debug.profileend()
end
```
We see far greater gains from cloning the SharedTable’s data and overwriting it at the end. See the MicroProfiler results:

[MicroProfiler capture: shared table direct write test]
[MicroProfiler capture: shared table clone and overwrite test]
This shows that cloning the table in this case yields nearly 325% faster results than writing to the SharedTable directly. It doesn’t quite agree with my previous finding that a write to a SharedTable takes approximately 0.003ms, though; I’m not entirely sure why that is, and perhaps someone might have some insight into it.
I’m sure these times vary quite a bit across different CPUs, but they’ve been extremely consistent in my testing on my own PC.
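If anyone wants to reproduce the per-write timing on their own machine, here’s a minimal sketch of the kind of micro-benchmark I mean, using `os.clock`. The iteration count and variable names are my own, and the absolute numbers will of course differ per CPU:

```lua
-- Rough micro-benchmark: average cost of a single SharedTable write
-- versus a regular table write. Run inside a Script in Roblox.
local st = SharedTable.new()
local plain = {}
local ITERATIONS = 100_000 -- arbitrary; large enough to smooth out noise

local startTime = os.clock()
for i = 1, ITERATIONS do
    st.value = i -- repeated SharedTable writes
end
local sharedElapsed = os.clock() - startTime

startTime = os.clock()
for i = 1, ITERATIONS do
    plain.value = i -- repeated plain-table writes for comparison
end
local plainElapsed = os.clock() - startTime

print(("SharedTable write: %.6f ms avg"):format(sharedElapsed / ITERATIONS * 1000))
print(("Plain table write: %.6f ms avg"):format(plainElapsed / ITERATIONS * 1000))
```

This only measures serial write cost, not contention between parallel threads, so take it as a lower bound.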
Perhaps the Actor Messaging API is faster? I really doubt it is, but I have yet to test this.
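For anyone who wants to test that comparison, here’s a rough, untested sketch of what the Actor-messaging approach might look like, where each Actor keeps its own plain table instead of sharing state through a SharedTable. The topic name and payload shape here are made up for illustration:

```lua
-- Inside a Script parented to an Actor instance.
local actor = script:GetActor()

-- Each Actor owns its own plain Luau table; no cross-VM writes needed.
local state = { x = 0, y = 0 }

-- Runs in parallel when a message arrives on this topic.
actor:BindToMessageParallel("Increment", function(dx, dy)
    state.x += dx
    state.y += dy
end)

-- Elsewhere, e.g. a coordinator script holding a reference to this Actor:
-- someActor:SendMessage("Increment", 1, 1)
```

The tradeoff is that messages are deferred and copied, so this likely wins only when writes vastly outnumber the messages needed to batch them.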
If any of you have more experience with Parallel Luau than I do (and I don’t have much) then please feel free to chime in and help drive this conversation!
I’m really curious to learn more about Parallel Luau as it is today in Roblox and how others are using it.