Lua performance profiling API

ScriptOn · September 14, 2016, 12:22am

Excellent. I don’t know if it works like this, but here’s my proposal:

Kinda like we have these dots to watch/pause code, I think enabling the profiler in a similar manner would be awesome. Maybe a click+drag, or a click then shift+click, to select everything in between.

Here’s my second idea (again, no idea if it works like this):

The arrows we use to expand/collapse folds could be right-clicked. That would turn the arrow purple or something, and everything inside the fold would get profiled.

I’d imagine it’s a lot easier to just point your mouse and go ‘please look at this’ than it would be to type out code for every individual thing.

ContextLost · September 14, 2016, 12:24am

Is better integration with scripting an option? In Unity you never really have to use those annotations. They do use them internally in their C++ code, where there’s no other option, but that’s another matter. Their profiler already breaks everything down in a tree by function, which would really be preferable. Otherwise I vote for choice 1: be direct.

EchoReaper · September 14, 2016, 12:27am

I’m not really sure how this is going to work, but I’d definitely want to be able to close them out of order. I’d want to be able to test and view results in a tree view like a disk-usage viewer, but with function/script execution times instead of file sizes. For instance, that list could read something like this:

Script (3s)
- Main (3s)
  - AnotherFunction (1s)
  - Function2 (2s)
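A tree like this can be approximated in plain Lua today, at least for non-yielding code. A minimal sketch, assuming a hypothetical `Section` helper (not a Roblox or proposed API) built on `os.clock()`: sections nest on a stack, each closed section records its elapsed time under its parent, and `report` returns the tree as indented lines. The root stands in for "Script" and reports the time since the sketch started.

```lua
-- Hypothetical nested section timer (not a Roblox API). os.clock() returns
-- CPU time in seconds; closed sections store elapsed, open ones do not.
local Section = {}

local root = { name = "Script", children = {}, start = os.clock() }
local stack = { root }

-- Opens a named section as a child of the innermost open section.
function Section.begin(name)
  local node = { name = name, children = {}, start = os.clock() }
  local parent = stack[#stack]
  parent.children[#parent.children + 1] = node
  stack[#stack + 1] = node
end

-- Closes the innermost open section and records its elapsed time.
function Section.finish()
  if #stack == 1 then
    error("Section.finish: no open section", 2)
  end
  local node = table.remove(stack)
  node.elapsed = os.clock() - node.start
end

-- Returns the tree as indented lines; still-open sections report time so far.
function Section.report(node, depth, lines)
  node, depth, lines = node or root, depth or 0, lines or {}
  local elapsed = node.elapsed or (os.clock() - node.start)
  lines[#lines + 1] = string.rep("  ", depth) .. string.format("%s (%.3fs)", node.name, elapsed)
  for _, child in ipairs(node.children) do
    Section.report(child, depth + 1, lines)
  end
  return lines
end

-- Example mirroring the tree above, with made-up workloads.
local function busy(n)
  local x = 0
  for i = 1, n do x = x + i end
  return x
end

Section.begin("Main")
Section.begin("AnotherFunction")
busy(100000)
Section.finish()
Section.begin("Function2")
busy(200000)
Section.finish()
Section.finish()

print(table.concat(Section.report(), "\n"))
```

Everything here is explicit begin/finish calls, so it shares the drawback discussed in this thread: it is not automatic.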

Edit:

I agree: if all of this could be automatically tracked and I could open up a tree of all the functions in the script, that’d be a lot better than the proposed choices. The provided choices are the equivalent of having to add print(variableState) every other line in my scripts, and it sucks to do stuff like this manually. Manual control is good when I want to analyze a specific piece of code, like part of a function, but I’d rather have it be usable in general than have extra power for a once-in-a-blue-moon use that auto-profiling couldn’t cover.

ScriptOn · September 14, 2016, 12:28am

What do you mean? Like the Explorer in Studio, except instead of Workspace you see

RunService → RenderStepped, Heartbeat, etc.

and workspace → FindFirstChild, Ray, Region3 stuff, etc.?

If so, that’d be freaking amazing. Again, I’m not sure how this stuff works, but I’d love to be able to do:

LocalClientScript1 (whatever the script is) → workspace → FindFirstChild → Usage: 230 Cycles, 0.2%, Runtime: 500ms (some stats idk)

Another example would be

ControlScript → RunService → RenderStepped → Usage: 200 Cycles, 1%, Runtime: 50ms

spotco · September 14, 2016, 12:28am

@ContextLost

We’re working on a more beginner-friendly “why is my game running slow” Studio tool, but that’s several months down the line. This is specifically for the Ctrl+F6 frame visualizer, which is for debugging long-running Lua code (and a whole lot less work to do).

@ScriptOn

Better Studio integration can definitely happen if this ships and becomes a consistently useful tool for developers.

spotco · September 14, 2016, 12:34am

A few more specifics on what this would look like:

For example, take code with “alltests”, “multest”, and “addtest” annotations (code like this would obviously slow down your game and cause frame rate problems). In the visualizer, you’d be able to tell that the code inside the “alltests” annotation takes up multiple rendering frames, and that “multest” and “addtest” take about the same time.
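A minimal sketch of how code carrying the “alltests”, “multest”, and “addtest” annotations might be laid out, assuming the outer annotation wraps the two inner ones. The workloads are hypothetical stand-ins, not the code spotco used, and the stubs at the top exist only because `debug.profilebegin`/`debug.profileend` are not defined in plain Lua; they record call order instead of profiling.

```lua
-- Stubs for running outside Roblox: they only record the order of calls.
local calls = {}
if not debug.profilebegin then
  function debug.profilebegin(label) calls[#calls + 1] = "begin:" .. label end
  function debug.profileend() calls[#calls + 1] = "end" end
end

-- Hypothetical, non-yielding workloads standing in for the original tests.
local function multest(n)
  local x = 1
  for _ = 1, n do x = x * 1.0000001 end
  return x
end

local function addtest(n)
  local x = 0
  for i = 1, n do x = x + i end
  return x
end

debug.profilebegin("alltests")   -- outer annotation spans both tests
debug.profilebegin("multest")
multest(100000)
debug.profileend()               -- closes "multest"
debug.profilebegin("addtest")
addtest(100000)
debug.profileend()               -- closes "addtest"
debug.profileend()               -- closes "alltests"
```

Each `profileend` closes the most recently opened annotation, so the order of the calls alone determines the nesting.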

ScriptOn · September 14, 2016, 12:36am

Would this be able to completely track how long Instance.new and stuff takes? I know there’s another area in there that you can really zoom into to see how long it takes.

spotco · September 14, 2016, 12:37am

Yeah, this would work for anything that doesn’t yield (*async, *wait, etc.).

0xBAADF00D · September 14, 2016, 12:38am

I think you’re on to something here. To expand on that, what if this was just integrated with Studio’s script editor? It could parse these debug.profilebegin()/debug.profileend() calls, hide them in the displayed source code, and use them to render an overlay. Then during execution, perhaps you could hover over the overlay to see the average time for each section.
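The per-section averages mentioned here boil down to keeping a running count and total per section name. A minimal sketch in plain Lua; `SectionStats`, `record`, and `average` are hypothetical names, and the timing samples would come from whatever timing source is available.

```lua
-- Hypothetical per-section statistics keyed by section name.
local SectionStats = {}
local stats = {}

-- Adds one timed sample (in seconds) for the named section.
function SectionStats.record(name, seconds)
  local s = stats[name]
  if s == nil then
    s = { count = 0, total = 0 }
    stats[name] = s
  end
  s.count = s.count + 1
  s.total = s.total + seconds
end

-- Returns the mean time for the section, or nil if it has no samples yet.
function SectionStats.average(name)
  local s = stats[name]
  if s == nil then return nil end
  return s.total / s.count
end

-- Example: three samples for one section, e.g. one per frame.
SectionStats.record("render", 0.25)
SectionStats.record("render", 0.75)
SectionStats.record("render", 0.5)
print(SectionStats.average("render"))
```

Storing only a count and a total keeps memory constant per section no matter how many frames are sampled.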

Just some ideas, I have no clue what the official plan on this is.

Corecii · September 14, 2016, 12:42am

Just to make sure I understand this right:

When no parameter is provided, choice 2 works exactly like choice 1? Choice 2 only adds functionality, it doesn’t remove any?

So if I want to easily make sure I’m matching my profilebegin and profileend calls properly, only choice 2 lets me do that? Choice 1 will only error when there’s an extra profileend or a missing one. With a lot of code running profiler tests, that could be a mess to debug. Choice 2 will error as soon as I provide the wrong argument, narrowing the mismatched profilebegin/profileend pair down to one section, right?

If all of that is true, then choice 2 provides exactly the same functionality as choice 1 while also making it easier to debug mismatched profilebegin and profileend calls. Developers can write code just as with choice 1, which is easy to use, while also having the option to make sure things match up if they need or want to.

spotco · September 14, 2016, 12:45am

That’s right, except we’re not 100% sure about making the parameter in choice 2 optional. Do you think you’d take the time to properly match the returned tokens, or would you just drop them if given the choice?

Corecii · September 14, 2016, 12:54am

Edit: being concise!

I will drop tokens if the code is small, simple, and clear.

I will use tokens in complicated sections where errors or inaccuracies caused by begin/end mismatches would be hard to track.

Tokens are useful for preventing mismatches which cause inaccuracies and errors. This is not a big deal in small sections of code, but in complicated sections it is hard to manually track begins/ends, so tokens speed things up.

To avoid wasting time debugging, some developers might make a token system themselves if choice 1 is implemented.
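Such a home-made token system could look roughly like this. A minimal sketch, assuming choice 1 style `profilebegin()`/`profileend()` calls underneath; `Profiler.begin`/`Profiler.finish` and the table-as-token design are hypothetical, not the proposed API. Calling `finish()` without a token behaves like plain choice 1, mirroring the optional parameter discussed above.

```lua
-- Stand-ins so the sketch runs outside Roblox, where these calls do not exist.
local profilebegin = debug.profilebegin or function(label) end
local profileend = debug.profileend or function() end

-- Hypothetical token wrapper: begin returns a token, and finish(token) errors
-- unless that token belongs to the most recently opened, still-open section.
local Profiler = {}
local open = {} -- stack of tokens for sections that are still open

function Profiler.begin(label)
  local token = { label = label } -- each table is unique, so it works as a token
  open[#open + 1] = token
  profilebegin(label)
  return token
end

-- finish() without a token behaves like a plain choice 1 profileend.
function Profiler.finish(token)
  local top = open[#open]
  if top == nil then
    error("Profiler.finish: no open section", 2)
  end
  if token ~= nil and token ~= top then
    local given = type(token) == "table" and tostring(token.label) or tostring(token)
    error("Profiler.finish: expected to close '" .. tostring(top.label)
      .. "' but got '" .. given .. "'", 2)
  end
  open[#open] = nil
  profileend()
end

-- Usage: a correctly nested pair of sections.
local outer = Profiler.begin("outer")
local inner = Profiler.begin("inner")
Profiler.finish(inner)
Profiler.finish(outer)
```

A mismatched `finish` raises an error naming the section that was actually expected, which points straight at the erroneous pair instead of surfacing as a count error much later.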

Old, long post

It would depend on what I was profiling. In most cases I’d likely drop them if given the choice. Edit: I’m not so sure now. I’m becoming a fan of being strict about using tokens in order to prevent errors and inaccuracies.

If I have a big, complicated section then I’d likely use them in order to make sure I am matching my profilebegin and profileend properly so that I’m getting accurate results and not making errors.

I would also use them if I got a “too many” or “too few” error. I’d stick the tokens in the places I was unsure about and see if they error. This is very important in big, complicated sections where a lot of profiling is going on. One might have to track profiling through module loading, calls to functions in other scripts/modules, and other paths that are hard to follow. Manually counting from the first profilebegin possibly all the way to the last profileend to find the mismatch could take a lot of work and time, which tokens solve. Just tacking on an extra profilebegin or profileend to silence the error will give inaccurate results. With choice 1, some developers might even wrap the API with token functionality in order to have their own way to check such things and avoid wasting time debugging their profiling code.

Major edit: Also note that one might mismatch begins and ends without noticing it. If the counts still match despite the mismatch, they will get inaccurate profiler results. Tokens solve this too.

EchoReaper · September 14, 2016, 1:16am

Changed my vote to “doesn’t matter to me” after some thinking.

I’m probably not going to use this until it’s either automated or handled like breakpoints. I’m already not a fan of adding print(state) every other line, so I don’t see why I’d like a similar implementation for Lua profiling: the proposed choices are hardly different from reading the time at the beginning of a function and printing the current time minus the start time at the end. The only difference is that I get a GUI display instead of messages in the output.
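For comparison, the manual approach described here (read the clock at the start, print the difference at the end) looks roughly like this in plain Lua. `timed` and `sumTo` are hypothetical names; `os.clock()` is standard Lua and returns CPU time in seconds.

```lua
-- Manual timing: read the clock before the work and print the difference after.
local function timed(label, fn, ...)
  local start = os.clock()
  local result = fn(...)           -- only the first return value is kept
  local elapsed = os.clock() - start
  print(string.format("%s took %.3f ms", label, elapsed * 1000))
  return result, elapsed
end

-- Example workload.
local function sumTo(n)
  local total = 0
  for i = 1, n do
    total = total + i
  end
  return total
end

local total, seconds = timed("sumTo", sumTo, 1000000)
```

This only ever reports totals to the output; it cannot show where the time falls relative to rendering frames, which is the gap the frame visualizer annotations target.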

I don’t want to have to type and remove the same stuff over and over while I’m performance testing either, especially if it means losing track of start/end points and causing errors every time I check performance. I’m really not a fan of doing this through Lua at all.

spotco · September 14, 2016, 1:20am

I get what you’re saying.
I’d just add this: this (profiling script/Lua performance relative to frame time) is functionality that isn’t available in Studio right now, and it’s a limited tool for advanced users.

We’d like to move to better, more accessible tools. That’s all in the future.

Sharksie · September 14, 2016, 3:30am

I didn’t realize everything was on the same stack. Choice 1 would work but I’d want to name the scopes to make them easier to read, and at that point it’s just choice 2.

That’s fine. The only case where this would be a problem is when you want to yield before closing, which isn’t something that you should do anyway…

B_rcode · September 14, 2016, 4:24am

If it’s using stacks, then what’s the point of choice 2? (if I understood correctly)

spotco · September 14, 2016, 5:24am

Verification that the close is for the open you expect it to be (otherwise it will throw an error).

Sharksie · September 14, 2016, 5:57am

Will there be any options for extra data? I want to be able to associate a segment with specific data and read the data when I mouse over it. That would be really convenient for debugging. Without extra data I can only tell how long a set of code is taking, but with data I can tell how long a set of code is taking under specific circumstances.

einsteinK · September 14, 2016, 12:34pm

The first choice seems obvious, as it’s similar to Lua’s if-statements and other blocks.
The third choice restricts functionality a lot, while we could implement it ourselves with choice 1 or 2.

As a Lua-tionist, the first choice feels most natural, but choice 2 seems better.
Having to end the right token will solve 70% of the issues caused by devs not using it properly
(if they use the token argument for profileend(); otherwise they’re still (semi-)screwed).
It also allows implementing choice 1, although implementing choice 2 with choice 1 is also possible.

I was gonna slightly prefer choice 2 over choice 1, but then I (re)read that small note under choice 2:

(basically if choice 2 is implemented, you can act as if choice 1 is implemented)

Choice 2 for the win!

Questions related to details of the implementation

What if we do yield? Will it error, similar to yielding in a non-__call metamethod?
What if the code errors during a profiling?
Not that big of an issue if the profiling is started/stopped in the same thread.
If it isn’t, then it seems like a big issue…
What about multiple (sub)threads? (a bit related to the previous question)
Some simple stuff like calling pcall already creates another thread.
With the “can’t yield” limitation, I actually don’t expect many problems
(well, lots of problems if a profiling won’t get ended because of an error)

Also, for people that don’t like adding/removing print/debug statements a lot:
It wouldn’t be too difficult to write a plugin that can “convert” a script to a debug version.
The converted version could be parented to it, or switched with the original 'n stuff.
(with the right kind of converting, scripts could be converted as normal>debug>normal>…)
Converting would be replacing comments with certain instructions:

--@profile("Test")
while a() do b() end
-- Results in the loop being wrapped
debug.profilebegin("Test")
while a() do b() end debug.profileend()
-- Could be done for functions and all other "scope" statements

--@profile("Test")
func()
-- Results in the function being wrapped
debug.profilebegin("Test") func() debug.profileend()

We’d just need to adapt a fixed version of Stravant’s LuaToolset, which is currently used for minifying, etc.
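The single-line case of this converter can be prototyped with Lua’s own `string.gsub`. A minimal sketch, assuming the `--@profile("Name")` marker syntax from the post; `toDebugVersion` is a hypothetical name, and wrapping multi-line scopes such as functions or `do ... end` blocks would need a real parser like the LuaToolset mentioned above.

```lua
-- Hypothetical converter: wraps the single line that follows each
-- --@profile("Name") marker in debug.profilebegin/profileend calls.
-- Multi-line statements are NOT handled; that needs a real parser.
local function toDebugVersion(source)
  local converted = source:gsub('%-%-@profile%(("[^"\n]*")%)\n([^\n]*)',
    'debug.profilebegin(%1) %2 debug.profileend()')
  return converted
end

print(toDebugVersion('--@profile("Test")\nwhile a() do b() end'))
```

Because the marker is an ordinary comment, the unconverted script still runs normally, which fits the normal > debug > normal round trip described above.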

spotco · September 14, 2016, 5:41pm

@Sharksie
All you can pass into profilebegin is a string (annotations are grouped with all other profilebegins that use the same string). From the sound of it, you should be able to implement what you’re describing using strings you create dynamically.
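Following this suggestion, extra data can be folded into the label string itself. A minimal sketch, assuming a hypothetical `labelFor` helper and placeholder work; because annotations sharing a string are grouped together, each distinct label becomes its own group, so bucketing values (here, a part count) likely keeps the visualizer readable. The stubs only exist so the sketch runs in plain Lua.

```lua
-- Stand-ins for the Roblox calls so the sketch runs in plain Lua.
local labels = {}
local profilebegin = debug.profilebegin or function(label) labels[#labels + 1] = label end
local profileend = debug.profileend or function() end

-- Hypothetical helper: encodes circumstances into the label string, bucketing
-- the part count so only a few distinct labels (groups) are produced.
local function labelFor(task, partCount)
  local bucket = partCount < 100 and "<100" or partCount < 1000 and "<1000" or ">=1000"
  return string.format("%s [parts %s]", task, bucket)
end

local function updateParts(parts)
  profilebegin(labelFor("updateParts", #parts))
  for i = 1, #parts do
    parts[i] = parts[i] + 1 -- placeholder work
  end
  profileend()
end

updateParts({ 1, 2, 3 })
```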

@einsteinK

Yield, Wait or Error will immediately close all active annotations (if your code isn’t consistent, we need to make the annotation stack consistent at the very least).

So, it’s pretty easy to write something like:

debug.profilebegin("test1")
debug.profilebegin("test2")
wait()
debug.profileend()
debug.profileend()

(The annotation stack state after the wait is empty, so both of those profileends are invalid).
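The rule above can be modelled in a few lines. A minimal simulation, not the engine’s implementation: a hypothetical annotation stack whose `simulateYield` step (standing in for `wait()`) empties it, so the two `profileend` calls that follow have nothing to close and raise errors.

```lua
-- Hypothetical model of the annotation stack described above (not engine code).
local stack = {}

local function profilebegin(label)
  stack[#stack + 1] = label
end

local function profileend()
  if #stack == 0 then
    error("profileend: no active annotation", 2)
  end
  stack[#stack] = nil
end

-- Per the described rule, a yield, wait or error closes every active annotation.
local function simulateYield()
  for i = #stack, 1, -1 do stack[i] = nil end
end

profilebegin("test1")
profilebegin("test2")
simulateYield()                  -- stands in for wait()
local ok1 = pcall(profileend)    -- invalid: the stack is already empty
local ok2 = pcall(profileend)    -- invalid as well
print(ok1, ok2)
```

Under this rule, closing annotations before any call that may yield (and reopening them afterwards if needed) keeps begins and ends matched.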

pcall behaves as a normal method call.