Lua performance profiling API

Edit: It’s live. debug.profilebegin(string label), debug.profileend()

Some tutorial videos:


Hey, so we’re working on a Lua performance profiling API.
You’ll be able to use this API to measure and visualize how long your Lua code takes, as well as figure out what is causing performance problems within your game.

This will be somewhat similar to Unity profiling, which is “annotation based” (you’ll need to add annotated sections to your code to see profiling data about it).

So we’ve got three possible ways this API could look:

Choice 1:

debug.profilebegin("test_section")
yourCode()
debug.profileend()

You will need to manually match your profilebegin and profileend calls, or else your data will be invalid. You can nest multiple profilebegins within each other if you have the matching profileends.
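
For example, nesting could look something like this (a sketch; stepPhysics and stepAnimation are just placeholders for your code):

debug.profilebegin("update") -- outer section

debug.profilebegin("physics") -- nested section
stepPhysics() -- placeholder for your code
debug.profileend() -- closes "physics"

debug.profilebegin("animation") -- another nested section
stepAnimation() -- placeholder for your code
debug.profileend() -- closes "animation"

debug.profileend() -- closes "update"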

Choice 2:

local profileToken = debug.profilebegin("test_section")
yourCode()
debug.profileend(profileToken)

You will need to manually match the token returned by profilebegin to your profileend call, or else your data will be invalid. You also won’t be able to close sections out of order; attempting to do so will throw an error. The token parameter on close could be optional, and omitting it would fall back to choice 1 behaviour.
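
For comparison, the same kind of nesting with tokens might look like this (a sketch; the labels and stepPhysics are placeholders):

local outer = debug.profilebegin("update")

local inner = debug.profilebegin("physics")
stepPhysics() -- placeholder for your code
debug.profileend(inner) -- the innermost open section must be closed first
debug.profileend(outer) -- closing "update" before "physics" would throw an error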

Choice 3:

debug.profilesection(function()
    yourCode()
end)

Any yielding code will fail inside this block.


A couple of caveats with how this would all work:

  1. Yielding or waiting functions would NOT work with this measurement (see the sketch below).
  2. This function would only be useful for measuring the performance of your non-yielding, non-waiting Lua code.
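
To illustrate, a section like this wouldn't give you useful timings, because the code yields inside the measured span (a sketch using the choice 1 syntax; doWork is a placeholder):

debug.profilebegin("bad_section")
wait(1) -- yields the thread; time spent yielded isn't measured meaningfully
doWork() -- placeholder for your code
debug.profileend()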

So, my question to you: Which one of these APIs would you prefer?

  • Choice 1
  • Choice 2
  • Choice 3
  • It doesn’t matter to me.
  • I don’t know what this would be used for.

0 voters

Edit:
Added a mockup of what this would look like.

13 Likes

Choice 1 is bad because I don’t get any control over it. I would never use this feature if that’s how it’s implemented.

Choice 3 is bad because it makes the function awkward to work with. I need to be able to paste and cut it wherever I need it, and it needs to be quick and easy. If I have to wrap my code in a function then it isn’t quick and easy, and it introduces the limitation that whatever I’m profiling has to be a specific piece of code in a single script.

Choice 2 is perfect because it allows me to implement profiling however and wherever I want. I love it.

4 Likes

I’m curious: what control won’t you have with choice 1? (You can open and close as many profilebegins as you like; you’re just asked to close every one that you open.)

In choice 2, you won’t be able to close them out of order.

I prefer choice 1 because it should be a stack, and that API reflects it best. However, I would support renaming it to “profile.push(name)” and “profile.pop()”. It’s not ideal to make new globals, but given the nature of this feature, widespread adoption could hinge on brevity.

Then again, I could just put this at the top of my scripts:

local profilePush, profilePop = debug.profilebegin, debug.profileend
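
and use it roughly like this (a sketch; computePath is a placeholder for whatever I'm profiling):

profilePush("pathfinding")
computePath() -- placeholder for the code being measured
profilePop()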

I think choice 1, but I’m not sure.

Could I start it up in one script and end it in another? Would it let me look at renderstepped events? I’m kind of a noob with this stuff and only learned last month how to kinda-sorta use the profiler.

Edit: I think me being unsure about everything might mean that I (personally) would like tons of documentation/ease of use.

start it up in one script and end it in another

Yes, for a LocalScript/Script calling a ModuleScript. Firing an event would trigger the listening script “after the current script’s completion”, so not exactly.
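
For example, a section opened around a ModuleScript call also measures the module's code, since it runs on the same thread (a sketch; MyModule and doWork are made-up names):

-- in a Script or LocalScript
local MyModule = require(script.MyModule)

debug.profilebegin("module_work")
MyModule.doWork() -- the ModuleScript's code runs synchronously inside this section
debug.profileend()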

renderstepped events

Yes, if you’ve got Lua code that binds to the render step.

documentation

We’ll be doing a bunch of documentation (wiki page and probably a video) on how this all works.

Excellent. I don’t know if it works like this but here’s my proposition:

Kinda like how we have these dots to watch/pause code, I think enabling the profiler in a similar manner would be awesome. Maybe a click+drag or a click+shift-click to select everything inside it.

Here’s my second idea (again, no idea if it works like this):

These arrows where we expand/collapse code folds could be right-clicked. It’ll turn the arrow purple or something, and everything inside it gets profiled.

I’d imagine it’s a lot easier to just point your mouse and go ‘please look at this’ than it would be to type out code for every individual thing.

1 Like

Is better integration with scripting an option? In Unity you really never have to use those annotations. They do use them internally in their C++ code where there’s no other option, but that’s another thing. It’s already broken down into a tree by function. That would really be preferable. Otherwise I vote for 1; be direct.

1 Like

I’m not really sure how this is going to work, but I’d definitely want to be able to close them out of order. I’d want to be able to test and view results in a fashion similar to this:

with functions/script execution time instead of file sizes. For instance, that list could read something like this:

  • Script (3s)
  • Main (3s)
    • AnotherFunction (1s)
    • Function2 (2s)

Edit:

I agree, if all of this could be automatically tracked and I could open up a tree of all the functions in the script, that’d be a lot better than the proposed choices. The provided choices are the equivalent of me having to add print(variableState) every other line in my scripts – it sucks to do stuff like this manually. The control is good for when I want to be able to analyze a specific piece of code like a part of a function, but I’d rather have it be usable in general than have extra power for a once-in-a-blue-moon use that auto-profiling couldn’t cover.

1 Like

What do you mean? Like the explorer in Studio, except instead of workspace you see

RunService → RenderStepped, Heartbeat, etc.

and workspace → FindFirstChild, Ray, Region3 stuff, etc.

If so, that’d be freaking amazing. Again, I’m not sure how this stuff works, but I’d love to be able to do:

LocalClientScript1 (whatever the script is) → workspace → FindFirstChild → Usage: 230 Cycles, 0.2%, Runtime: 500ms (some stats, idk)

Another example would be

ControlScript → RunService → RenderStepped → Usage: 200 Cycles, 1%, Runtime: 50ms

@ContextLost

We’re working on a more beginner-friendly “why is my game running slow” Studio tool, but that’s several months down the line. This is specifically for the Ctrl+F6 frame visualizer, which is for debugging long-running Lua code (and a whole lot less work to do).

@ScriptOn

Better Studio integration can definitely happen if this ships and becomes a consistently useful tool for developers 🙂

A few more specifics on what this would look like:

For something like this (this code would obviously slow down your game and cause frame rate problems):

[screenshot: sample code annotated with debug.profilebegin/debug.profileend]
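
The sample code is roughly like this (a sketch; the loop bodies are made up, but the labels match what's described below):

debug.profilebegin("alltests")

debug.profilebegin("multest")
local product = 1
for i = 1, 5e6 do
    product = product * 1.0000001 -- busy work to burn CPU time
end
debug.profileend()

debug.profilebegin("addtest")
local sum = 0
for i = 1, 5e6 do
    sum = sum + i
end
debug.profileend()

debug.profileend() -- closes "alltests"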

You’d be able to see this:

[screenshot: the frame visualizer showing the annotated sections]

You’d be able to tell that the code inside the “alltests” annotation takes up multiple rendering frames, and that “multest” and “addtest” take about the same time.

2 Likes

Would this be able to completely track how long Instance.new and stuff takes? I know there’s another area in there that you can really zoom into to see how long it takes.

Yeah, this would work for anything that isn’t yielding (*async, *wait, etc.).

I think you’re on to something here. To expand on that, what if this was just integrated with Studio’s script editor? It could parse these debug.profilebegin()/debug.profileend() calls, hide them in the displayed source code, and use them to render an overlay. Then during execution, perhaps you could hover over the overlay to see the average time for each section.

Just some ideas, I have no clue what the official plan on this is.

1 Like

Just to make sure I understand this right:

When no parameter is provided, choice 2 works exactly like choice 1? Choice 2 only adds functionality, it doesn’t remove any?

Which means, if I want an easy way to make sure I’m matching my profilebegin and profileend calls properly, only choice 2 lets me do that? Choice 1 will only error when there’s an extra profileend or one too few. With a lot of code running profiler tests, that could be a mess to debug. Choice 2 will error as soon as I provide the wrong argument, pointing me at the section where the erroneous profilebegin/profileend matching happened, right?

If all of that is true, then choice 2 provides exactly the same functionality as choice 1 while also making it easier to debug if one messes up profilebegin and profileend matching. Developers can program things just like choice 1 which is easy to use, while also having the option to be sure things are matching up if they need or want to use it.
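
In other words, something like this (a sketch; the labels are placeholders and the exact error behaviour is as proposed above):

local outer = debug.profilebegin("outer")
local inner = debug.profilebegin("inner")

debug.profileend(outer) -- choice 2 can error right here, pointing at the mismatch
debug.profileend(inner)

-- With choice 1, the same mistake only shows up later as a begin/end count
-- mismatch (or not at all, if the counts happen to balance), with no hint of where.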

That’s right, except we’re not 100% on making the parameter in choice 2 optional. Do you think you’d take the time to properly match the returned tokens, or would you just drop them if given the choice?

Edit: being concise!

I will drop tokens if the code is small, simple, and clear.

I will use tokens in complicated sections where errors or inaccuracies caused by begin/end mismatches would be hard to track.

Tokens are useful for preventing mismatches which cause inaccuracies and errors. This is not a big deal in small sections of code, but in complicated sections it is hard to manually track begins/ends, so tokens speed things up.

To avoid wasting time debugging, some developers might make a token system themselves if choice 1 is implemented.
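
Such a wrapper might look something like this (a sketch, assuming only the proposed choice 1 API; the module shape is my own invention):

-- ModuleScript: a minimal token layer over the choice 1 API
local Profiler = {}
local stack = {}

function Profiler.begin(label)
    debug.profilebegin(label)
    local token = { label = label }
    table.insert(stack, token)
    return token
end

function Profiler.finish(token)
    local top = table.remove(stack) -- pop the most recently opened section
    assert(top == token,
        ("profile section %q closed out of order"):format(token.label))
    debug.profileend()
end

return Profiler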

Old, long post

It would depend on what I was profiling. In most cases I’d likely drop them if given the choice. Edit: I’m not so sure now. I’m becoming a fan of being strict about using tokens in order to prevent errors and inaccuracies.

If I have a big, complicated section then I’d likely use them in order to make sure I am matching my profilebegin and profileend properly so that I’m getting accurate results and not making errors.

I would also use them if I got a “too many” or “too few” error. I’d stick the tokens in the places I was unsure about and see if they error. This is very important in big, complicated sections where a lot of profiling is going on. One might have to track profiling through module loading, calls into functions in other scripts/modules, and other paths that are hard to follow. Having to manually read through from the first profilebegin, possibly all the way to the last profileend, to find the mismatch could take a lot of work and time, which tokens solve. Just tacking on an extra profilebegin or profileend will give inaccurate results. With choice 1, some developers might even go about wrapping the API with token functionality in order to have their own way to check such things and avoid wasting time on debugging their profiling code.


Major edit: Also to note, one might mismatch begins and ends without noticing it. If they have the right total number of calls despite the mismatch, then they will get inaccurate profiler results without any error. Tokens solve this too.

Changed my vote to “doesn’t matter to me” after some thinking.

I’m probably not going to use this until it’s either automated or handled like breakpoints. I’m already not a fan of adding print(state) every other line, so I don’t see why I’d want to use a similar implementation for Lua profiling – the proposed choices are hardly different from taking the time at the beginning of a function and printing (current - start) at the end, with the only difference being that I get a GUI display instead of messages in the output.
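
i.e. roughly this manual pattern (a sketch; doWork is a placeholder):

local start = os.clock()
doWork() -- placeholder for the code being measured
print("doWork took", os.clock() - start, "seconds")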

I don’t want to have to perpetually type/remove the same stuff over and over again while I’m performance testing either, especially if it means losing track of start/end points for profiling and causing errors every time I check performance. I’m really not a fan of doing this through Lua at all.

I get what you’re saying.
I’d just add this: profiling script/Lua performance relative to frame time is functionality that isn’t available in Studio right now, and this is a limited tool for advanced users.

We’d like to move to better, more accessible tools. That’s all in the future.