(Was gonna post this last night but the devforum was messed up)
You’re only measuring creation of pretty small tables. I ran some more extensive tests on 10,000x10,000 tables with random elements. It was (kinda) interesting:
Lua test
code
local w, h = 10000, 10000
local startTimer, endTimer;
do
local start;
function startTimer()
start = os.clock();
end
function endTimer(name)
print(string.format("%s took\t%.1f ms", name, 1000*(os.clock() - start)));
end
end
-- init
wait(3)
startTimer()
local oneDim = table.create(w * h, nil)
endTimer("oneDim alloc")
wait()
startTimer()
for row = 1, h do
for col = 1, w do
oneDim[(row-1)*w + col] = math.random()*10
end
end
endTimer("oneDim init")
wait()
startTimer()
local twoDim = table.create(h, nil)
for row = 1, h do
twoDim[row] = table.create(w, nil)
end
endTimer("twoDim alloc")
wait()
startTimer()
for row = 1, h do
for col = 1, w do
twoDim[row][col] = math.random() * 10;
end
end
endTimer("twoDim init")
wait()
-- test
local x;
-- one dimensional
startTimer()
for row = 1, h do
for col = 1, w do
x = oneDim[(row-1)*w + col]
end
end
endTimer("oneDim access");
wait()
startTimer()
for i = 1, w*h do
x = oneDim[i]
end
endTimer("oneDim linear access");
wait()
startTimer()
for row = 1, h do
for col = 1, w do
oneDim[(row-1)*w + col] = x
end
end
endTimer("oneDim mutate");
wait()
-- 2 dimensional
startTimer()
for row = 1, h do
for col = 1, w do
x = twoDim[row][col]
end
end
endTimer("twoDim access")
wait()
startTimer()
for row = 1, h do
for col = 1, w do
twoDim[row][col] = x
end
end
endTimer("twoDim mutate")
Results, luau
|
1D (ms) |
2D (ms) |
alloc |
615 |
599 |
init |
5652 |
5294 |
access |
1386 |
1145 |
lin. access |
675 |
N/A |
mutate |
1425 |
1187 |
I ran the same thing with Lua for Windows, with the modifications that I replaced the table.create
s with just {}
, and made h = 1000
instead of 10000
due to memory issues.
Results, standard luac/lua:
Do not compare these results directly to luau’s – the size of the tables are different
|
1D (ms) |
2D (ms) |
alloc |
0* |
1* |
init |
1252 |
1124 |
access |
301 |
346 |
lin. access |
201 |
N/A |
mutate |
311 |
356 |
*not actually allocating anything
C test
For fun I ran the same thing in c (no optimizations):
code
// forgive the syntax colors
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
const long w = 10000;
const long h = 10000;
double start;
double getTime() {
return (double)clock() / CLOCKS_PER_SEC;
}
void startTimer() {
start = getTime();
}
void endTimer(const char* name) {
printf("%s took\t%.3f ms\n", name, 1000.0 * (getTime() - start));
fflush(stdout);
}
int main() {
startTimer();
double* oneDim = (double*)malloc(w * h * sizeof(double));
if (oneDim == NULL) {
return errno;
}
endTimer("oneDim alloc");
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
oneDim[row * w + col] = (double)(rand() % 10000) / 1000;
}
}
endTimer("oneDim init");
startTimer();
double** twoDim = (double**)malloc(h * sizeof(double*));
for (long row = 0; row < h; row++) {
twoDim[row] = (double*)malloc(w * sizeof(double));
}
endTimer("twoDim alloc");
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
twoDim[row][col] = (double)(rand() % 10000) / 1000;
}
}
endTimer("twoDim init");
double x;
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
x = oneDim[row * w + col];
}
}
endTimer("oneDim access");
startTimer();
for (size_t i = 0; i < w*h; i++) {
x = oneDim[i];
}
endTimer("oneDim linear access");
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
oneDim[row * w + col] = x;
}
}
endTimer("oneDim mutate");
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
x = twoDim[row][col];
}
}
endTimer("twoDim access");
startTimer();
for (long row = 0; row < h; row++) {
for (long col = 0; col < w; col++) {
twoDim[row][col] = x;
}
}
endTimer("twoDim mutate");
// cleanup
free(oneDim);
for (long row = 0; row < h; row++) {
free(twoDim[row]);
}
free(twoDim);
}
Results, C:
|
1D (ms) |
2D (ms) |
alloc |
1 |
35 |
init |
2689 |
2574 |
access |
187 |
216 |
lin. access |
222 |
N/A |
mutate |
197 |
234 |
Takeaways
IDK.This wasn’t scientific and these languages are wildly different and benchmarking properly is hard to do. Plus my code’s probably incorrect somewhere. But:
-
Luau is slower than C? (groundbreaking)
-
Allocation was slow for both. table.create
lies in its docs, and does actually initialize the elements even if the second element is nil. A quick test confirms this:
size |
table.create time |
100 |
0.0 ms |
1000 |
0.0 ms |
10000 |
0.0 ms |
100000 |
0.6 ms |
1000000 |
5.9 ms |
10000000 |
58.2 ms |
Would be nice to have straight allocation without initialization, but what can ya do. |
|
-
It’s basically a tie for speed. Use whichever one’s easier. If you can utilize the linear access capabilities of the 1D array, use that.
-
It doesn’t seem like cache locallity or anything actually matters here much.
-
The 2D array is (slightly) faster for a lot of things in lua. That could be that the extra cost of a multiplication and add to index into the 1D table correctly is more expensive than the extra indirection in the 2D. (again though, it’s pretty unnoticeable).
The 1D array is faster for most things in C. That makes more sense to me.
There may still be other benefits to using the 1D array – e.g. it is probably easier on the garbage collector to track…? I have no idea if that’s true, though. Ask whoever wrote the engine