PSA: Please don't rely on the format of `debug.traceback` results

Noble_Draconian · May 13, 2019, 11:15pm

I don’t rely on the format of the error at all. It’s just a hassle to debug your code when the error traceback only leads you to the pcall or thread spawn, and not to the line where the error actually occurred.

Anaminus · May 13, 2019, 11:53pm

I was about to make a request related to this. Any chance we could get traces that are more semantic? That way we don’t have to worry about the format at all.

zeuxcg · May 14, 2019, 12:15am

These two situations are somewhat different.

For pcall, we currently create a thread in the pcall implementation so that code can yield inside of it. This is a significant performance problem. There’s a desire to fix this by reimplementing pcall, but no concrete plan yet (i.e. we know we want to do it, but we don’t yet know how best to approach it).

For coroutines, they are semantically disjoint from the resuming thread in a way, so it would need to be handled differently. Can you explain the situation where you want a callstack to span several coroutines?

Maximum_ADHD · May 14, 2019, 12:54am

Sorry about that. I didn’t anticipate any changes being made to the structure of the stack trace.

Do you guys plan to make any changes to the debugger instances? They’re documented on the DevHub as well: https://developer.roblox.com/api-reference/class/ScriptDebugger

Agreed. It’d be nice to have a structured dictionary instead of having people try to parse the stack trace.

Noble_Draconian · May 14, 2019, 1:46am

With the way my code is structured, I tend to call functions inside special modules via coroutine.wrap(). Said Start functions also call module functions inside coroutine.wrap().

My code is structured in a “Service/Controller” (modified MVC) format; that is my framework loads, initializes, and automatically starts services when the server runs. Each service handles a different aspect of the game, e.g. DataService handles the loading/saving of player data.
On the client I have something similar, called “Controllers”. Controllers behave just like “Services” do, except they run on the client instead of the server.

A specific example would be the MarketController in my game. It handles sending requests to the market service (server side) and it also loads up the market/inventory UIs.

The code in question roughly looks like this (irrelevant code was removed):

local MarketController={}

local ShopUI;
local InventoryUI;

function MarketController:Init() --This is called when my framework loads the controller
    ShopUI=require(script.ShopUI)
    setmetatable(ShopUI,{__index=MarketController})
    InventoryUI=require(script.InventoryUI)
    setmetatable(InventoryUI,{__index=MarketController})

    ShopUI:Init()
    InventoryUI:Init()
end

function MarketController:Start() --This is called when the framework has loaded all controllers
    coroutine.wrap(ShopUI.Start)(ShopUI)
    coroutine.wrap(InventoryUI.Start)(InventoryUI)
end

return MarketController

In the ShopUI and InventoryUI modules (which handle UI state and interaction for their respective UIs), I can call various methods via self, such as self:SortItems() or self:PurchaseItem() (a method of MarketController that is exposed to the ShopUI module via __index).

If there is an error inside of the ShopUI or InventoryUI modules, the stack trace isn’t accurate/only traces to the coroutine.wrap(). This makes debugging embedded modules a hassle, as a lot of the systems in my game use this method.
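A minimal sketch of one possible workaround, assuming the coroutine is created explicitly instead of through coroutine.wrap(). The helper name startWithTrace is hypothetical. It relies on debug.traceback accepting a thread as its first argument, which is standard Lua 5.1 behaviour; the returned string is meant for display or logging only, not for parsing.

```lua
-- Hypothetical helper: start fn in its own coroutine and, if it errors,
-- return a traceback taken from that coroutine's stack.
local function startWithTrace(fn, ...)
	local co = coroutine.create(fn)
	local ok, err = coroutine.resume(co, ...)
	if ok then
		return true
	end
	-- Passing the thread asks for the failed coroutine's stack,
	-- not the stack of the code that resumed it.
	return false, debug.traceback(co, tostring(err))
end

-- Stand-in for something like ShopUI.Start that fails on startup.
local ok, trace = startWithTrace(function(self)
	error("ShopUI failed to start")
end, {})

print(ok)
print(trace)
```

This only covers errors raised before the first yield. Once the coroutine yields, whatever resumes it later is the one that sees the failure.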

IdiomicLanguage · May 14, 2019, 2:11am

Since we shouldn’t rely on the format of stack traces, could we receive an actual interface to detect errors and handle them game wide? I’ve personally used the LogService.MessageOut and ScriptContext.Error events in a live game to detect errors which would then be parsed and stored in a database and sent to me via text. It would be wonderful if there was a better method to be on the lookout for errors and record them.

zeuxcg · May 14, 2019, 3:18am

Assuming you’re asking for an API to give a structured callstack representation, I’m not sure we should make one. It’s easy to do, but it seems like a trap.

Callstack is an array of call frames, where each frame is currently identified by a script, a line number and a function name. However:

Callstack entries can arbitrarily disappear and reappear due to inlining and changes in inlining heuristics
Callstack entries can arbitrarily disappear due to tail calls (we don’t have guaranteed tail calls but may introduce restricted tail calls for optimization in the future)
Line numbers can arbitrarily change due to changes in code generation (for example, in a multiline function call which line to associate with the call itself is ambiguous)
Function names can arbitrarily change due to changes in compiler (for example, which name to assign to function Foo.Bar:Baz() is unclear)
Function names can arbitrarily disappear and reappear due to changes in naming heuristics when names aren’t specified (see Moo.baz example from the original post)

Effectively, we can make an API that produces a callstack, but every single bit of information returned by this API will be fragile. At which point you’re probably better off not having an API in the first place.

zeuxcg · May 14, 2019, 3:19am

ScriptContext.Error is still the recommended way to detect errors in a live game (and log them via a third-party analytics service). We do need better first-class support for this on the platform level, but that’s not directly tied to the format and mechanics of error generation.
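As a rough sketch of that approach: the snippet below connects to ScriptContext.Error and packages its arguments (message, stack trace, script) into a report table. The helper buildErrorReport and the report fields are assumptions made for illustration, and the analytics call is left as a comment because it depends on the service used. The trace is stored as an opaque string and never parsed.

```lua
-- Hypothetical helper: package the event arguments without inspecting the trace.
local function buildErrorReport(message, trace, scriptName)
	return {
		message = message,
		trace = trace, -- opaque: stored as-is for later display
		script = scriptName or "<unknown>",
		time = os.time(),
	}
end

-- `game` only exists inside Roblox; the guard keeps the sketch loadable elsewhere.
if game then
	local ScriptContext = game:GetService("ScriptContext")
	ScriptContext.Error:Connect(function(message, trace, source)
		local report = buildErrorReport(message, trace, source and source:GetFullName())
		-- Forward `report` to a third-party analytics service here.
		print("error captured:", report.message)
	end)
end
```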

Anaminus · May 14, 2019, 4:25am

It seems like all of those problems would also occur with the current string-based traceback.

As pointed out by @Maximum_ADHD, we already have a way to get a full featured callstack. It’s limited to the studio debugger, and rightfully so, being chunky and expensive to lug around as most debugging stuff is.

On the other hand, all I’m looking for is a table with some fields containing the same information already present in the traceback string:

Source (as a LuaSourceContainer if possible, just the string otherwise)
Line number
Variable name/type (if available)

If the information can be put into a string, then surely it can be put into a table.

I’m trying out an errors-are-values approach, where a function returns the error rather than throwing it. Usually, this doesn’t require anything more than a single stack frame, if that. The fewer the frames, the cheaper it is to create errors. If needed, the error can be wrapped in another error one level up, containing the next frame, and so on.

My theoretical error-creating function might look like this:

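function NewError(includeStackFrame, ...)
	local err = {
		message = table.pack(...), -- keep every message argument, including nils
	}
	if includeStackFrame then
		-- getStackFrame is hypothetical: it stands in for the structured API being requested
		err.frame = getStackFrame(2) -- frame of caller
	end
	return err
end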

Currently I use debug.traceback to get the full trace, parse out just the first frame, and attempt to locate the referenced script. All assuming the arbitrary script name isn’t trying to sabotage the parser.

zeuxcg · May 14, 2019, 5:26am

Correct, but importantly, with a string it’s more obvious that there’s no promise of stability. Exposing a “nicer” API doesn’t seem valuable if the API can’t be relied upon.

“Currently I use debug.traceback to get the full trace, parse out just the first frame, and attempt to locate the referenced script.”

Why parse the first frame out instead of keeping the entire trace around? FWIW debug.traceback is substantially faster in the new VM.
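For illustration, keeping the whole trace around might look roughly like this. It is a variation on the NewError function above, not an existing API: the trace field name is an assumption, and the trace is treated as an opaque string to print or log later.

```lua
-- Sketch: an error value that carries the full traceback as an opaque string.
local function NewError(includeTrace, ...)
	local parts = {}
	for i = 1, select("#", ...) do
		parts[i] = tostring((select(i, ...)))
	end
	local err = { message = table.concat(parts, " ") }
	if includeTrace then
		-- Level 2 starts the trace at the caller of NewError.
		err.trace = debug.traceback("", 2)
	end
	return err
end

local err = NewError(true, "value must be a number, got", "string")
print(err.message)
print(err.trace)
```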

IdiomicLanguage · May 14, 2019, 6:25am

This seems to be fundamentally backward to me. The structured / parsed data should be the original object and if needed to be displayed in a human-readable format then it can be easily stringified in any desired format.

The main advantage to having access to the structured (parsed) error data is that it can be manipulated without human intervention. Error counts and statistics for files and functions can be generated which would be helpful even if not always complete. The data may be useful for game-wide error handling as well.

Anaminus · May 14, 2019, 6:27am

If stack traces aren’t stable or reliable, then there’s really no point in having them at all. In fact, exposing them in any way whatsoever would do more harm than help.

Mainly because it adds noise. Since I have errors being returned rather than thrown, they must be handled all the way down the stack. Consider this example:

function Add(a, b)
	if type(a) ~= "number" or type(b) ~= "number" then
		return 0, WrapError(nil, "value must be a number")
	end
	return a + b, nil
end

function AddEach(...)
	local a = 0
	for i, b in ipairs({...}) do
		local c, err = Add(a, b)
		if err ~= nil then
			return 0, WrapError(err, "bad argument #" .. i)
		end
		a = c
	end
	return a
end

local total, err = AddEach(7, 17, "37", 47)
if err ~= nil then
	print("ERROR:", err)
	return
end
print("RESULT:", total)

The WrapError function wraps an error around another error. Code can then unwrap or inspect the error and decide what to do (usually it’s propagating the error). When the error is converted to a string, it can step through the chain of wrapped errors to construct a readable result:

bad argument #3: value must be a number
Stack trace:
	script:13: function AddEach
	script:3: function Add

Such errors could also be constructed to exclude the stack frame entirely. This might be useful for patterns like Promises, which otherwise produce bloated stack traces that are hard to read.
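For reference, a stripped-down sketch of what WrapError could look like. It covers only the message chain and leaves out stack frames entirely; the field names message and cause are chosen here for illustration.

```lua
local ErrorMeta = {}

-- Joins the messages of the whole chain, outermost first.
function ErrorMeta:__tostring()
	local parts = {}
	local current = self
	while current ~= nil do
		parts[#parts + 1] = current.message
		current = current.cause
	end
	return table.concat(parts, ": ")
end

-- cause is nil for the innermost error.
local function WrapError(cause, message)
	return setmetatable({ message = message, cause = cause }, ErrorMeta)
end

local inner = WrapError(nil, "value must be a number")
local outer = WrapError(inner, "bad argument #3")
print(tostring(outer)) --> bad argument #3: value must be a number
```

Because the chain is plain data, code can walk the cause fields to inspect or unwrap an error without going through the string form.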

zeuxcg · May 14, 2019, 6:33am

The point of stack traces is to convey information about the location and circumstances of the error to a human. They are not stable in that you can’t rely on them having specific properties that persist indefinitely - for example, given a hypothetical structured API, debug.getstackframe(1).function == "foo" may or may not return true depending on various factors mentioned above in the thread. You should use them to be able to capture stack information and later display it or log it.

For example, a very valid use of debug.traceback is to build generic constructs, like promises, that can record the error location and provide debug utility in other cases; see lib/init.lua in LPGhatguy/roblox-lua-promise on GitHub as an example. Note that in all uses of the backtrace, the result is saved to be fed to a formatting/printing function later.

Anaminus · May 14, 2019, 7:10am

I see. Having a “source” field that refers directly to the script would indeed be unstable. I’m still uncertain about the usefulness of inaccurate frames, though. Wouldn’t the first thing I’d do when I see a trace be to open up the referred script and go to the referred line?

Anyway, my purpose for parsing is not to make decisions based on the content of a frame, but to decide how to display the content. At the very least, it would be useful to have finer control over which frames are produced. The level argument of debug.traceback lets you chop off the top, but it would be nice to also chop off the bottom, or just have frames in a list.

sncplay42 · May 14, 2019, 7:13am

Off topic a bit here, but is this a change from vanilla behaviour where return f(some, args) is a tail call?

Or am I forgetting an old change and Roblox Lua already doesn’t do that?

zeuxcg · May 14, 2019, 2:43pm

Roblox Lua hasn’t done that since 2013 or so; I don’t quite remember when we removed it, but we did. The primary motivation was to make stack traces easier to understand and debugging easier to use. The secondary motivation was a new feature that would have required stack tracing, which tail calls would have prevented; that feature never happened, though.

In general tail calls aren’t super interesting to us, and calls are really fast in the new VM, but it’s possible that at some point in the future we’ll decide that we actually want to emit tail calls in some cases for performance in optimized builds.

DataBrain · May 14, 2019, 2:55pm

This is cool and all… but I kind of need to read the traceback for my unit tester.

I’m using string patterns on the traceback obtained from an xpcall, since I don’t know of any other way to get the error. If the format changes, I’ll just have to change the unit tester’s code. I don’t know how else I would do it.
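A rough sketch of the xpcall part, in case it helps anyone else. The name runTest and the result fields are made up for this example. The traceback is captured inside the message handler and stored as-is for display, so at least this part does not depend on the format.

```lua
-- Hypothetical test runner step: run fn and capture a traceback on failure.
local function runTest(name, fn)
	local ok, report = xpcall(fn, function(err)
		-- The handler runs before the stack unwinds, so the trace still
		-- includes the call that failed.
		return debug.traceback(tostring(err), 2)
	end)
	if ok then
		return { name = name, passed = true }
	end
	return { name = name, passed = false, report = report }
end

local result = runTest("adds numbers", function()
	assert(1 + 1 == 3, "expected 1 + 1 to equal 3")
end)

print(result.name, result.passed)
print(result.report)
```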

zeuxcg · May 14, 2019, 5:19pm

If you need to do it, you need to do it. Just be aware that the format can change and don’t rely on this as part of code that’s critical for your game to function.

SteadyOn · May 14, 2019, 6:15pm

I wasn’t even aware of xpcall’s existence. Seems really useful and I’ll definitely start using it. Thanks!
