Micro-optimization, Lua assembly, and basement shenanigans

Hello, fellow citizens of this forum.
I am pleased to inform you all that after about 2 consecutive days of work, I have finally finished a (stable) releasable version of my Lua debugging tool (woo).
For those of you new to my rants about how perfect code should be, my previous release was a library built for the same purpose, though it was just a plain library and only handled proto debugging.

However, semicolon, I now bring a full disassembler and reassembler, using that same library, and with a neat (not necessary) interface. As it happens, this is a great tool to be used for micro-optimization, deep code analysis, and literally any other meme you want to make out of it.
So, I present to you, the project:
https://github.com/Rerumu/Rerudec

This year’s achievements:

  • Finishing ReluaCore
  • Finishing Rerudec
  • Under 20 liters of caffeine intake
  • At least 2 devforum posts advertising my work

If anyone makes anything cool with this, or makes any other discoveries, make sure to spam my Discord DMs about it or PM me here, thanks.
Now, if you’ll excuse me, I have 4 days’ worth of sleep to catch up on, and the fact that C# doesn’t support 8-bit ASCII by default did not help.

Images of the beauty:

Did I mention you can recompile the assembly-like code back into bytecode (hex view for simplicity)?

For now, I will likely go into hibernation. I hope all you scripters (and those learning) make use of this early holiday gift!

17 Likes

This is super cool!

Since the Lua VM isn’t really running at a fixed rate, do you know if certain ops generally take longer or shorter than others? Could your tool provide a Visual Studio-esque execution time readout? That would be nifty!

1 Like

As far as I know, it depends. But this, being mostly a static bytecode analyzer, can probably only go as far as telling you the opcodes and the registers they’re accessing.
Right off the top of my head, though, I remember dictionaries being slow to interact with, and I believe CALL was also one of the slowest opcodes. Thanks for the idea though; I might make it part of my next post.
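
To make "the opcodes and the registers they’re accessing" a bit more concrete, here is a minimal Python sketch of decoding a single instruction word. It assumes the stock Lua 5.1 layout (6-bit opcode, 8-bit A, 9-bit C, 9-bit B, with Bx covering the B and C fields). The names `decode` and `encode_abc` are made up for illustration; this is not Rerudec’s code, and a VM with a modified instruction format would need different masks.

```python
# Illustrative sketch only: assumes the stock Lua 5.1 instruction layout.
# `decode` and `encode_abc` are hypothetical helpers, not part of Rerudec.

OPCODES = [
    "MOVE", "LOADK", "LOADBOOL", "LOADNIL", "GETUPVAL", "GETGLOBAL",
    "GETTABLE", "SETGLOBAL", "SETUPVAL", "SETTABLE", "NEWTABLE", "SELF",
    "ADD", "SUB", "MUL", "DIV", "MOD", "POW", "UNM", "NOT", "LEN", "CONCAT",
    "JMP", "EQ", "LT", "LE", "TEST", "TESTSET", "CALL", "TAILCALL", "RETURN",
    "FORLOOP", "FORPREP", "TFORLOOP", "SETLIST", "CLOSE", "CLOSURE", "VARARG",
]

MAXARG_SBX = (1 << 17) - 1  # 131071, the bias used to store a signed Bx


def encode_abc(op, a, b, c):
    """Pack an ABC-format instruction into a 32-bit word."""
    return OPCODES.index(op) | (a << 6) | (c << 14) | (b << 23)


def decode(word):
    """Split a 32-bit instruction word into its opcode name and operand fields."""
    number = word & 0x3F
    name = OPCODES[number] if number < len(OPCODES) else f"UNKNOWN_{number}"
    bx = (word >> 14) & 0x3FFFF
    return {
        "op": name,
        "A": (word >> 6) & 0xFF,
        "C": (word >> 14) & 0x1FF,
        "B": (word >> 23) & 0x1FF,
        "Bx": bx,
        "sBx": bx - MAXARG_SBX,
    }


if __name__ == "__main__":
    # CALL with A=0, B=2, C=1: call the function in register 0 with one argument.
    print(decode(encode_abc("CALL", 0, 2, 1)))
```

Which of B/C or Bx/sBx is meaningful depends on the opcode, so a real disassembler would pick the fields per instruction format instead of returning all of them.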

2 Likes

CALL also invokes another routine, so if it’s a Lua routine, you can add up all the ops in that routine. If it’s a C routine, then of course the cost is just variable. You could disassemble the C code at that point.

Would be neat if you could inspect where a routine jumps to.
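
A static tool can work out jump destinations without running anything. Here is a rough Python sketch, assuming stock Lua 5.1 encoding, where JMP, FORLOOP and FORPREP carry a signed offset (sBx) relative to the instruction that follows them. `jump_target` and `encode_asbx` are hypothetical helper names, not Rerudec functions.

```python
# Illustrative sketch only: assumes stock Lua 5.1 opcode numbers and layout.
# `jump_target` and `encode_asbx` are hypothetical helpers, not part of Rerudec.

JMP, FORLOOP, FORPREP = 22, 31, 32
MAXARG_SBX = (1 << 17) - 1  # bias used to store the signed offset


def encode_asbx(op, a, sbx):
    """Pack an AsBx-format instruction into a 32-bit word."""
    return op | (a << 6) | ((sbx + MAXARG_SBX) << 14)


def jump_target(index, word):
    """Return the instruction index this word jumps to, or None if it has no offset."""
    if (word & 0x3F) not in (JMP, FORLOOP, FORPREP):
        return None
    sbx = ((word >> 14) & 0x3FFFF) - MAXARG_SBX
    # The program counter already points at the next instruction when the
    # offset is applied, hence the +1.
    return index + 1 + sbx


if __name__ == "__main__":
    # A JMP at index 5 with offset -3 lands on instruction 3.
    print(jump_target(5, encode_asbx(JMP, 0, -3)))
```

Conditional opcodes such as EQ, LT and TEST do not hold an offset themselves in Lua 5.1; they skip the following instruction, which is normally a JMP, so following the JMPs covers those branches too.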

1 Like

It’s more about copying arguments and pointers that makes CALL expensive on its own, as opposed to the actual code running which may vary in length. At the moment Rerudec shows all subfunctions of a function and what CALL is calling in the side comments.
As for disassembling C code, it doesn’t quite work that way. Lua CClosures are structs containing a pointer to a C/C++ function along with some other misc data. C/C++ never interacts with Lua directly; it goes through the Lua C API to edit the Lua stack and state. Oftentimes the sort of returning you see from functions such as wait() isn’t the C++ function returning those values itself, since a C++ function can only return 1 item at a time; it’s just pushing values onto the Lua stack.
That said, from the C/C++ end you don’t need to worry so much about speed, since their respective compilers do a hell of a good job at optimizing.
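
The "pushing values onto the Lua stack" convention can be mimicked with a toy model. In the real Lua C API, a C function receives the Lua state, pushes its results, and returns only an integer saying how many results it pushed. The Python below just imitates that shape: `ToyStack`, `toy_wait` and `call_c_function` are invented names and the pushed numbers are made up, so treat it as a sketch of the idea and not of the actual API.

```python
# Toy model only: imitates the calling convention of the Lua C API.
# None of these names exist in Lua or Roblox; the pushed values are made up.


class ToyStack:
    """A bare-bones stand-in for a Lua stack."""

    def __init__(self):
        self.slots = []

    def push(self, value):
        self.slots.append(value)

    def pop_many(self, count):
        """Remove and return the top `count` values, bottom-most first."""
        if count == 0:
            return []
        results = self.slots[-count:]
        del self.slots[-count:]
        return results


def toy_wait(stack):
    """Stand-in for a C function: push two results, return how many were pushed."""
    stack.push(0.03)  # made-up elapsed time
    stack.push(12.5)  # made-up timestamp
    return 2


def call_c_function(func, stack):
    """What the VM side does: run the function, then collect that many results."""
    result_count = func(stack)
    return stack.pop_many(result_count)


if __name__ == "__main__":
    print(call_c_function(toy_wait, ToyStack()))
```

The single integer return value is the only thing the C function hands back directly; everything the script sees comes off the stack.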

2 Likes

Yeah. I always assumed the Lua stack was pretty performant, but now that I think about it, of course the C/Lua binding is sorta expensive. I still think that some of the Lua C++ functions are going to spend more time executing the meat of the function than the wrapping. print() is actually really expensive.

It’d be interesting to profile.

On Roblox’s side, there’s a lot of automated argument parsing being done. Can’t really do anything about that though.

1 Like

I mean, yes, some things will take longer on the C++ end for sure even with the performance boost (LuaJIT when tbh), and it’s usually due to all the interactions it has to do with Lua, which are often slow. This is probably the reason we got __namecall as a metamethod, too. Prior to this, we still relied on the relative speed of SELF and CALL working together, and then the arguments being passed in a certain order. Now, instead of indexing and calling we simply invoke a metamethod.
Most of these optimizations only really show their power when you’re doing extensive, quick cross-language calls. But yeah, when I get back from my Christmas vacation I’ll get to work on a post about VM performance and optimizations, since you mentioned it.
As a side note, I’m pretty sure Roblox creates a new thread per event, and then accesses the stack of that thread to pass arguments, and considering we have events such as RenderStepped sometimes being connected… :eyes:

2 Likes