Yeah, I’m not sure why they’d be different if you have it enabled as well! I’m running an i7 4790k.
No - we’re not ready for this yet. The next step is to enable the VM for some select games, but we need to ship some fixes to it before that can happen, so that’ll happen after the week of the 4th.
It appears that when scripts run in the command bar, this optimization is disabled… we’ll take a look. This is surprising; I don’t recall a reason for it, so it might be that we just missed this.
Alright, I got around to testing the new VM with Blox.
To check out the improvements I ran a falling sand test. You know, the one that usually kills your PC:
The sand is generated as a single block layer at the top of the world, and when updated by a block update tick, falls instantly to the bottom layer of the world. The server then sends a block update event to the client, which then queues 10 block render updates to update the rendered blocks to match the new block data. On the next render step, those render updates are executed and some instances are created and destroyed to represent the new blocks.
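The chain of events above could be sketched roughly like this. All names and the structure here are illustrative guesses on my part, not Blox’s actual implementation:

```lua
-- Illustrative sketch only; invented names, not actual Blox code.
local renderQueue = {}

-- Server: a block update tick lets sand fall one layer, then notifies clients.
local function onBlockUpdateTick(world, x, y, z)
	if world:get(x, y - 1, z) == "air" then
		world:set(x, y, z, "air")
		world:set(x, y - 1, z, "sand")
		blockUpdateEvent:FireAllClients(x, y, z) -- hypothetical RemoteEvent
	end
end

-- Client: each block update event queues 10 block render updates.
local function onBlockUpdate(x, y, z)
	for i = 1, 10 do
		table.insert(renderQueue, {x = x, y = y, z = z, step = i})
	end
end

-- Client, next render step: drain the queue, creating and destroying
-- instances so the rendered blocks match the new block data.
local function onRenderStep()
	for _, update in ipairs(renderQueue) do
		-- rebuild the instance(s) for this block here
	end
	renderQueue = {}
end
```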
Normally, block updates and block render updates are capped so as to not impact framerate, but for this test I disabled all caps. This means every queued update will be executed as fast as possible (and also your toaster will die instantly)
Keep in mind that Studio is simulating the server and client on the same machine, so this takes into account the entire chain of events.
I measured the total time taken for all the sand to fall.
In the old VM, this took 1:02.20. In the new VM, this took 1:12.08. I repeated the test several times to make sure that’s correct, and I’m actually surprised it took longer on the new VM.
Next up, I reinstated the block update and block render update caps, at 1 wait() every 20 updates. This is the cap used in all live production Blox servers right now, and strikes a nice compromise between visual inconsistencies and game performance. I then measured the new total time taken for all the sand to fall.
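The cap described here might look something like the following sketch; it’s a guess at the shape of the code, not the actual Blox implementation, and `applyUpdate` is a hypothetical helper:

```lua
-- Hedged sketch of the "1 wait() every 20 updates" cap described above.
local UPDATES_PER_WAIT = 20

local function drainQueue(queue)
	for index, update in ipairs(queue) do
		applyUpdate(update) -- hypothetical: performs one block/render update
		if index % UPDATES_PER_WAIT == 0 then
			wait() -- yields, letting rendering and physics catch up
		end
	end
end
```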
In the old VM, this took 4:01.31. In the new VM, this took 4:10.06.
That’s not great news for me (and a big disappointment since I really wanted some of those gains) but I’ll hold out hope for the future - it’s a beta so I can’t judge too harshly yet.
zeux-chan pls help
(on the bright side, world generation felt a tad faster, which is awesome )
The logical conclusion is that your code is not bottlenecked in the VM itself and instead spends the time in reflection code or something along these lines. If you share a self-contained example we can take a look.
I’m not sure what you mean by reflection code; can you provide an example? I’m just curious.
Also, I’m assuming this means my performance issues aren’t coming from my number-crunching code?
Right, that’s my best guess, but it would be valuable for us to look at this - I don’t think we’ve seen any code before that was even slightly slower.
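For anyone following along: “reflection code” here plausibly means calls that cross from Lua into the engine’s C++ side, such as Instance property reads, which a faster Lua VM doesn’t speed up. A hedged illustration of the difference:

```lua
-- Reflection-bound: each property access (.Position, .Name, etc.) crosses
-- the Lua/C++ boundary, so a faster VM barely moves the needle.
local total = 0
for _, part in ipairs(workspace:GetChildren()) do
	total = total + part.Position.Y
end

-- VM-bound: pure Lua arithmetic and table indexing, which is where the
-- new VM's interpreter improvements actually show up.
local heights = {}
for i = 1, 1000 do
	heights[i] = i * 0.5
end
local sum = 0
for i = 1, #heights do
	sum = sum + heights[i]
end
```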
You’re correct that it’s a virtual machine. Lua code fundamentally runs on the VM, which is why Roblox’s changes to it can have such an impact. Which is awesome!
This is really neat! I benchmarked my chassis with 100,000 iterations of the transmission on both the old VM and the new one.
Results (on average)
• The old VM took 1.09 seconds.
• The new VM took 0.395 seconds.
An over 2.75x speedup! This is great!
Good job, zeuxcg and team! Hope to see you at RDC 2019, that is if I’m invited…
Edit
I tested the same place on an old Windows XP machine, equipped with a Core 2 Duo E4500 and an ancient iGPU which doesn’t even support Roblox, meaning the CPU has to do the work of both the CPU and the GPU. Anyway, on this PC the old VM took about 2.246 seconds, again an average result. If it enjoys the same speedup with the new VM, we can estimate the test to take roughly 0.81 seconds, which is just as great! I couldn’t perform an actual Studio test on this machine because it runs Windows XP.
Another thing to note is that my phone, with a Qualcomm Snapdragon 636, took 2.412 seconds on average to perform the benchmark. Again assuming the same speedup, it would take roughly 0.87 seconds with the new VM. I was quite surprised to see an over-decade-old low-to-mid-range desktop CPU perform on par with a modern smartphone CPU.
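One way to project these numbers is to scale each old-VM time by the speedup measured on the desktop test (1.09 s → 0.395 s):

```lua
-- Projecting new-VM times by assuming the same speedup as the desktop test.
local speedup = 1.09 / 0.395   -- about 2.76x, from the chassis benchmark
print(2.246 / speedup)         -- Core 2 Duo E4500 estimate: roughly 0.81 s
print(2.412 / speedup)         -- Snapdragon 636 estimate: roughly 0.87 s
```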
This is great to see! The gains depend highly on the platform - for example for us on Mac the gains are a bit more substantial than on Windows in general. On Android on Qualcomm chips we’ve tested the gains tend to be a bit higher than on PC/Mac, whereas on iOS they tend to be a bit lower. Hard to predict a lot of complex interactions there between the Lua code itself, the VM code, the C compiler and the internals of the CPU.
Why not this?

local t0 = tick()
for i = 1, 1e6 do
	math.sqrt(0) math.sqrt(1) math.sqrt(2) math.sqrt(3) math.sqrt(4) math.sqrt(5) math.sqrt(6) math.sqrt(7) math.sqrt(8) math.sqrt(9)
end
print(tick() - t0)

local sqrt = math.sqrt
wait(2)
local t0 = tick()
for i = 1, 1e6 do
	sqrt(0) sqrt(1) sqrt(2) sqrt(3) sqrt(4) sqrt(5) sqrt(6) sqrt(7) sqrt(8) sqrt(9)
end
print(tick() - t0)
Type-checking and annotations are targeted for release next quarter (see the Roblox Platform High-Level Roadmap). Arseny’s project was only a hack-week concept, so the final product will likely differ from what you see there, too.
That makes sense; IntelliSense and the compiler depend on the Lua VM, but from what I’ve observed, the VM work also touches the debugger and the Studio graphics engine. They’re revamping one part and then the other, the same way many projects keep working after their dependencies update.
Apologies for wanting a clarification, especially since it seems to have already been said, but why exactly are both getfenv() and setfenv() recommended to avoid? I can certainly see that they’d cause a lot of issues, but I’d like to know the exact reasons. I often do a lot of framework and back-end stuff, so they’re handy for debugging, a little extra kick of module code, and figuring out which script is running the module. I don’t use them frequently enough to say there’s a call every 5 seconds, but I’d like to err on the side of caution.
maybe it’s because he didn’t want to risk observing the overhead of using a for loop
Could be, but it’s known that iteration is significantly less expensive than calling functions.
You guys have exceeded my expectations with a fast Lua VM, but the question I’m wondering about is whether you can lift some of the restrictive limitations of vanilla Lua while doing so. Phantom Forces has been dealing with an upvalue limit in its weapons implementation for over a year now (yes, the framework has gotten to that point). It would be interesting to see a way to manage these kinds of things dynamically rather than as a static array in a struct.
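For context, stock Lua 5.1 (which Roblox’s VM descends from) caps a single closure at 60 upvalues, and exceeding it is a compile-time error. A common workaround, sketched here with invented field names, is to fold shared state into one table so it counts as a single upvalue:

```lua
-- Sketch: many separate locals captured by a closure each consume an
-- upvalue slot; a single shared table consumes only one.
local state = {
	ammo = 30,
	recoil = 0,
	spread = 0,
	-- ...dozens more fields, all behind one upvalue
}

local function fire()
	state.ammo = state.ammo - 1
	state.recoil = state.recoil + 0.1
end
```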
They are planning on raising it:
I saw a measurable increase in speed in an ancient neural network project of mine. Now if only I could work out what to do with all the data to train them.