Ever wanted to speed up workloads with Parallel Luau, but had trouble fiddling with Actors? RoKernels may be the solution for you! RoKernels is a parallel computing framework inspired by OpenCL and CUDA, and allows developers to write parallel code in a clean and easy way. Interested? Consider trying out RoKernels for yourself right now at https://create.roblox.com/store/asset/133509020570754/RoKernels!
How to use RoKernels
To use RoKernels, write a parallel compute kernel in a ModuleScript, create a RoKernels device using the RoKernels module, and then run the kernel on the device. RoKernels handles the rest; all you have to do is wait until the compute workload is finished!
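To make that workflow a bit more concrete, here is a minimal sketch of a kernel and of what "running" it amounts to. The kernel signature (readBuffer, writeBuffer, i) is taken from the matrix example later in this post, and the zero-based work-item index is inferred from that example. The names squareKernel and runSerially are hypothetical, and runSerially is only a single-threaded stand-in for a RoKernels device, not the module's real API.

```lua
-- Hypothetical kernel: squares one element per invocation.
-- In a real project this function would be returned from a ModuleScript.
local function squareKernel(readBuffer, writeBuffer, i)
	i = i + 1 -- assumed: work-item indices start at 0, Luau arrays start at 1
	writeBuffer[i] = readBuffer[i] * readBuffer[i]
end

-- Single-threaded stand-in for a device: calls the kernel once per work item.
-- A real device would spread these calls across its Actors instead.
local function runSerially(kernel, readBuffer, writeBuffer, workItems)
	for i = 0, workItems - 1 do
		kernel(readBuffer, writeBuffer, i)
	end
end

local input = {1, 2, 3, 4}
local output = {}
runSerially(squareKernel, input, output, #input)
print(table.concat(output, ", ")) -- 1, 4, 9, 16
```

Because each invocation only writes to its own slot of writeBuffer, the calls do not depend on each other, which is what makes this kind of workload a good fit for parallel execution.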
Matrix Multiplication Use Case Example
Doing matrix multiplication in serial is slow. It is so slow, in fact, that calculating the product of two matrices of size 1000x1000 or greater will cause a script timeout. RoKernels to the rescue! Using RoKernels, I wrote a kernel that does matrix multiplication in parallel, with each kernel invocation computing one row of the result, and I ran it on a RoKernels device with 10 Actors. This is what the multiplication kernel looks like:
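return function(readBuffer: {any}, writeBuffer: {any}, i: number)
	-- Convert the zero-based work-item index into a one-based row index.
	i = i + 1
	local MATRIX_SIZE, A, B = unpack(readBuffer)
	local nuTable = {}
	-- Compute every column of row i (looping j up to i would only fill part of the row).
	for j = 1, MATRIX_SIZE do
		local acc = 0
		for k = 1, MATRIX_SIZE do
			acc = acc + A[i][k] * B[k][j]
		end
		nuTable[j] = acc
	end
	-- Store the finished row in the output buffer.
	writeBuffer[i] = nuTable
end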
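One way to sanity-check a kernel like this before handing it to a device is to call it serially on small matrices and compare the result with the plain triple loop. The sketch below does that. It assumes work-item indices run from 0 to MATRIX_SIZE - 1, uses a copy of the kernel in which j loops over all MATRIX_SIZE columns so that each call produces a complete row, and introduces a hypothetical makeMatrix helper for the test data. It does not use the RoKernels module itself.

```lua
local unpack = table.unpack or unpack -- available in both Luau and stock Lua

-- Copy of the row kernel; j runs over every column so each call yields a full row.
local function matMulKernel(readBuffer, writeBuffer, i)
	i = i + 1
	local MATRIX_SIZE, A, B = unpack(readBuffer)
	local row = {}
	for j = 1, MATRIX_SIZE do
		local acc = 0
		for k = 1, MATRIX_SIZE do
			acc = acc + A[i][k] * B[k][j]
		end
		row[j] = acc
	end
	writeBuffer[i] = row
end

-- Hypothetical helper: builds an n x n matrix from a generator function.
local function makeMatrix(n, gen)
	local m = {}
	for i = 1, n do
		m[i] = {}
		for j = 1, n do
			m[i][j] = gen(i, j)
		end
	end
	return m
end

local N = 4
local A = makeMatrix(N, function(i, j) return i + j end)
local B = makeMatrix(N, function(i, j) return i * j end)

-- Emulate the device serially: one kernel call per row, indices 0 .. N-1.
local C = {}
for i = 0, N - 1 do
	matMulKernel({N, A, B}, C, i)
end

-- Compare against the plain triple loop.
local matches = true
for i = 1, N do
	for j = 1, N do
		local acc = 0
		for k = 1, N do
			acc = acc + A[i][k] * B[k][j]
		end
		if C[i][j] ~= acc then
			matches = false
		end
	end
end
print(matches) -- true
```

Since every call writes a different row of the output and only reads from A and B, the rows can be computed in any order, or at the same time on different Actors.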
By using parallel computing to calculate matrix products, a matrix product of two 1000x1000 matrices can be calculated in three seconds, rather than causing a script timeout.
But it gets even better. I benchmarked how long it takes to calculate the product of two 500x500 matrices using serial compute vs. RoKernels parallel compute. Here is the code for each method:
Serial:
local start = os.clock()
for i = 1, MATRIX_SIZE do
	for j = 1, MATRIX_SIZE do
		local acc = 0
		for k = 1, MATRIX_SIZE do
			acc = acc + A[i][k] * B[k][j]
		end
		C[i][j] = acc
	end
end
print(os.clock() - start)
Parallel:
local start = os.clock()
Device:run(script.MatrixMultiplicationKernel, {MATRIX_SIZE, A, B}, C, MATRIX_SIZE)
print("Did matrix multiplication")
print(os.clock() - start)
Doing a 500x500 matrix product in serial takes about 1.8 seconds on average.
In comparison, doing a 500x500 matrix product in parallel with RoKernels only takes about 0.4 seconds on average. That is roughly a 4.5x speedup over doing the matrix product in serial, and it saves about 1.4 seconds per matrix multiplication!
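For anyone who wants to reproduce this kind of comparison, here is a small sketch of how the averages and the speedup could be computed. The helpers averageTime and compare are hypothetical names introduced for illustration; they are not part of RoKernels, and the numbers passed in at the end are simply the approximate averages quoted above.

```lua
-- Hypothetical helper: average time of fn over several runs, as reported by os.clock.
local function averageTime(fn, runs)
	local total = 0
	for _ = 1, runs do
		local start = os.clock()
		fn()
		total = total + (os.clock() - start)
	end
	return total / runs
end

-- Speedup factor and seconds saved, given two measured averages.
local function compare(serialSeconds, parallelSeconds)
	return serialSeconds / parallelSeconds, serialSeconds - parallelSeconds
end

local speedup, saved = compare(1.8, 0.4) -- approximate averages from this post
print(string.format("%.1fx faster, %.1f s saved", speedup, saved))
```

To time real workloads, wrap each method in a function and pass it to averageTime, for example the serial triple loop in one function and the Device:run call in another, then feed the two averages to compare.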
Conclusion
RoKernels is a powerful framework that allows developers to speed up their workloads several times over by harnessing the power of parallel computing in a quick and easy manner. If you want to unlock the true performance potential of your games, consider trying RoKernels out for yourself at https://create.roblox.com/store/asset/133509020570754/RoKernels.