I finished an upgrade to my old minification code, which had been in need of one for a while: it hadn’t even been uploaded to the official site’s plugins page.
Release notes:
[ul]
[li][b]No more odd bits of brokenness[/b] - Apart from a few leftover bugs that I haven’t ironed out yet, there shouldn’t be any odd bits of brokenness anymore. My old minifier didn’t take a fully comprehensive approach to minification, so it mishandled some really odd edge cases. The new plugin should handle any situation correctly, because it fully understands all of Lua’s scoping rules, down to oddities like what happens when you use the same variable name more than once in a variable / argument list.[/li]
[li][b]Almost optimal variable naming[/b] - The old minifier pretty much just looped over the variables in the script and named them one after another, incrementing the name to use each time. The new minifier uses a non-trivial algorithm to find a reasonably minimal set of variable names, making heavy use of local variable shadowing wherever possible so that as many variables as possible get one-character names. It does quite well: when minifying itself, a 3000-line script with 600+ variables, it manages to use only one-character variable names. Aside: does anyone know of any papers on how to go about finding the truly optimal variable naming? My naming is very good in most cases, but I can construct some contrived “bad” cases where it produces a very inefficient set of names compared to the optimal one. I’m interested in whether finding the optimal naming is actually a solved problem, and if so, whether the algorithm is reasonable to use in practice.[/li]
[li][b]Additional simplification passes[/b] - The new code base is set up so that I can add simplification passes that actually modify the AST, rather than just shuffling whitespace around and changing variable names. For instance, a pass could convert “if a then foo() end” into “_=a and foo()”.[/li]
[li][b]Beautifier[/b] - In addition to the minifier, the new plugin includes a somewhat functional beautifier. It tries to format the code in the standard Lua style, adding correct indentation, while respecting existing whitespace / comments and leaving them as they were. It also has a nice option for reverse engineering minified code: it renames variables to unique, annotated names that you can easily find & replace in your text editor of choice as you inspect the un-minified script.[/li]
[/ul]
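To illustrate what an AST-modifying pass looks like (as opposed to one that only touches whitespace and names), here is a minimal Python sketch of the “if a then foo() end” → “_=a and foo()” rewrite. It is not the plugin’s code. The tuple node shapes and the `simplify` / `render` helpers are hypothetical, chosen only to keep the example small.

```python
# Hypothetical tuple AST, NOT the plugin's real node format:
#   ("name", "a")                    variable reference
#   ("call", "foo")                  call foo()
#   ("or", lhs, rhs), ("and", lhs, rhs)
#   ("if", cond, [body statements])  an if with no else branch
#   ("assign", target, expr)         target=expr


def simplify(stmt):
    """Rewrite `if <cond> then <single call> end` into `_=<cond> and <call>`."""
    if stmt[0] == "if" and len(stmt[2]) == 1 and stmt[2][0][0] == "call":
        cond, call = stmt[1], stmt[2][0]
        return ("assign", "_", ("and", cond, call))
    return stmt  # any other statement is left untouched


def render(node):
    """Turn a node back into minified-style Lua source text."""
    kind = node[0]
    if kind == "name":
        return node[1]
    if kind == "call":
        return node[1] + "()"
    if kind in ("and", "or"):
        parts = []
        for child in node[1:]:
            text = render(child)
            # `and` binds tighter than `or` in Lua, so an `or` operand
            # of an `and` must be parenthesized to keep its meaning
            if kind == "and" and child[0] == "or":
                text = "(" + text + ")"
            parts.append(text)
        return f" {kind} ".join(parts)
    if kind == "assign":
        return f"{node[1]}={render(node[2])}"
    if kind == "if":
        body = " ".join(render(s) for s in node[2])
        return f"if {render(node[1])} then {body} end"
    raise ValueError(f"unknown node kind: {kind}")


stmt = ("if", ("name", "a"), [("call", "foo")])
print(render(stmt))            # if a then foo() end
print(render(simplify(stmt)))  # _=a and foo()
```

Note that the rewritten form assigns to the global `_`, so a pass like this is only safe when nothing else in the script relies on `_`.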
Here is an overview of the variable renaming algorithm for anyone interested (you can find the code in the “MinifyVariables_2” function in the LuaSyntaxToolset module under the plugin):
First, the minifier runs through the AST, finding all variable references and variable declarations, as well as scoping information, in order to build up a tree of all of the scopes / variables / variable references.
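The first step can be sketched on a toy representation instead of a real Lua AST. In the Python sketch below, which is hypothetical and not the plugin’s code, a block is a list whose items are `("local", [names])` declarations, `("ref", name)` references, or nested lists for inner scopes. Positions are simply a running counter of visited items. Each reference resolves to the innermost visible declaration, and undeclared names are treated as globals and never renamed. The sketch assumes that when a name repeats in a single `local` list, the later occurrence is the one that subsequent references see.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    name: str                                  # original name in the source
    depth: int                                 # how many blocks deep it is declared
    decl: int                                  # position where it comes into scope
    refs: list = field(default_factory=list)   # positions of its references


def collect(block):
    """Resolve every reference in a toy block tree to its declaration."""
    variables, globals_ = [], set()
    pos = 0

    def walk(items, depth, visible):
        nonlocal pos
        visible = dict(visible)  # declarations made in this block don't leak out
        for item in items:
            pos += 1
            if isinstance(item, list):          # nested scope, e.g. do ... end
                walk(item, depth + 1, visible)
            elif item[0] == "local":
                for name in item[1]:
                    var = Var(name, depth, pos)
                    variables.append(var)
                    visible[name] = var         # repeated name: the last one wins
            elif item[0] == "ref":
                var = visible.get(item[1])
                if var is None:
                    globals_.add(item[1])       # undeclared: a global, keep as-is
                else:
                    var.refs.append(pos)

    walk(block, 0, {})
    return variables, globals_


# Toy version of:  local a = 1  print(a)  do local a = 2  print(a) end  print(a)
program = [("local", ["a"]), ("ref", "print"), ("ref", "a"),
           [("local", ["a"]), ("ref", "print"), ("ref", "a")],
           ("ref", "print"), ("ref", "a")]
variables, globals_ = collect(program)
```

Here the outer `a` ends up with references at positions 3 and 9, the inner `a` with a single reference at 7, and `print` is recorded as a global.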
Next, the minifier gives each variable a “used names” set, which is initially empty. These are names that the variable cannot take, because they are used elsewhere in a way that would collide with that variable’s usage.
Next, the minifier sorts the variables with the least-used ones first. That is, the variables with the fewest references get the first pick of the shortest names. But wait, isn’t that counterintuitive? Shouldn’t the variables with the most usages have priority for short names? That’s how it seems at first, but think about it: variables with few usages tend to have very short lifetimes, which means there’s a good chance that few other variables overlap with them. And if few other variables overlap with them, there’s a good chance we can reuse the same short names for those other variables. I did some testing and I’m convinced that renaming the less-used variables first works better for all “practical” scripts. It is obviously possible to construct cases where this approach performs badly, but as far as I can see they bear no resemblance to any “real” code.
TL;DR: By renaming infrequently used variables first, you tend not to actually “use up” many short variable names, since you can later “shadow over” / “shadow under” those infrequently used variables with the same short names.
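The ordering itself is just a sort on reference count. Here is a minimal Python sketch. `Var`, `rename_order`, and the tie-break by declaration position are illustrative assumptions rather than the plugin’s code.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    decl: int       # declaration position, used only as a tie-breaker
    ref_count: int  # how many times the variable is referenced


def rename_order(variables):
    """Least-referenced variables first, so they get first pick of short names.

    Breaking ties by declaration order is an assumption of this sketch; it
    only keeps the output deterministic.
    """
    return sorted(variables, key=lambda v: (v.ref_count, v.decl))


vars_ = [Var("total", 1, 12), Var("i", 2, 3), Var("tmp", 3, 1), Var("j", 4, 3)]
print([v.name for v in rename_order(vars_)])  # ['tmp', 'i', 'j', 'total']
```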
Next, going through the variables in that sorted order, the minifier gives each one the shortest available name, that is, the shortest name that is not already in that variable’s “used names” set.
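Picking “the shortest available name” needs an ordered supply of valid Lua identifiers. The Python sketch below shows one way to produce it. The character order (lowercase, then uppercase, then underscore, with digits allowed only after the first character) and the helper names are assumptions, not taken from the plugin. The sketch skips reserved words such as `do`, `if`, `in` and `or`, which would otherwise be among the first two-character names.

```python
import itertools
import string

# Reserved words of Lua 5.1, plus `goto` (reserved from Lua 5.2 on);
# these must never be handed out as variable names.
LUA_KEYWORDS = {
    "and", "break", "do", "else", "elseif", "end", "false", "for",
    "function", "goto", "if", "in", "local", "nil", "not", "or",
    "repeat", "return", "then", "true", "until", "while",
}
FIRST_CHARS = string.ascii_lowercase + string.ascii_uppercase + "_"
REST_CHARS = FIRST_CHARS + string.digits  # digits can't start an identifier


def candidate_names():
    """Yield valid Lua identifiers: all 1-char names, then 2-char, and so on."""
    for length in itertools.count(1):
        for first in FIRST_CHARS:
            for rest in itertools.product(REST_CHARS, repeat=length - 1):
                name = first + "".join(rest)
                if name not in LUA_KEYWORDS:
                    yield name


def shortest_unused(used_names):
    """Return the first (shortest) candidate not in the variable's used-names set."""
    return next(n for n in candidate_names() if n not in used_names)


print(shortest_unused(set()))             # a
print(shortest_unused({"a", "b"}))        # c
print(shortest_unused(set(FIRST_CHARS)))  # aa
```

A real renamer would presumably also put any global names referenced during a variable’s lifetime into its used-names set, so that a local can never shadow something like `print`.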
Once a variable has been renamed, the minifier goes through every other variable in the script that has not been renamed yet and updates that variable’s “used names” set, adding the new name if the two variables collide. Where exactly does a “collision” occur? Note that “depth” in this context means how many scopes deep a variable is nested; think of it as “how tabbed in” the variable is. With that in mind, here are the exact circumstances (taken from a comment in the plugin source code). A collision occurs with other variables:
[ol]
[li]At the same depth that overlap in usage-lifetime with this one. E.g. in “local a = 5 print(a) local b = 3 print(b)”, a and b do not overlap in usage-lifetime, since by the time “b” comes into scope, “a” is no longer needed, so they can safely be renamed to the same thing.[/li]
[li]At a deeper level that have a reference to this variable within their lifetimes (giving them this name would shadow this variable at that reference).[/li]
[li]At a shallower level that are referenced during this variable’s lifetime (this variable would shadow them at that reference).[/li]
[/ol]
Make sense? Probably not at first. Take some time to really think about what these cases mean and you should be able to understand why those are the collision cases.
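The three collision rules translate fairly directly into a predicate. Here is a minimal Python sketch of it, together with the step that updates the “used names” of every not-yet-renamed variable. The `Var` fields, the integer source positions, and the definition of usage-lifetime as “from the declaration to the last reference” are assumptions of the sketch, not the plugin’s actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Var:
    depth: int                              # how many scopes deep it is declared
    decl: int                               # position where it comes into scope
    refs: list                              # positions of its references
    used: set = field(default_factory=set)  # names it may not take
    new_name: Optional[str] = None          # set once the variable is renamed

    def lifetime(self):
        # usage-lifetime: from the declaration to the last reference
        return self.decl, max(self.refs, default=self.decl)


def referenced_within(var, other):
    """True if some reference to `var` falls inside `other`'s lifetime."""
    start, end = other.lifetime()
    return any(start <= r <= end for r in var.refs)


def collides(this, other):
    if other.depth == this.depth:
        # rule 1: same depth and the usage-lifetimes overlap
        (a0, a1), (b0, b1) = this.lifetime(), other.lifetime()
        return a0 <= b1 and b0 <= a1
    if other.depth > this.depth:
        # rule 2: `other` is deeper and would shadow `this` at one of its references
        return referenced_within(this, other)
    # rule 3: `other` is shallower and `this` would shadow it at one of its references
    return referenced_within(other, this)


def propagate(this, variables):
    """After `this` is renamed, forbid its new name wherever it would collide."""
    for other in variables:
        if other is not this and other.new_name is None and collides(this, other):
            other.used.add(this.new_name)


# local a = 5  print(a)  local b = 3  print(b)
a, b = Var(0, 1, [2]), Var(0, 3, [4])
a.new_name = "a"
propagate(a, [a, b])
print(b.used)  # set()  -> b is free to reuse the name "a"
```

Because rules 2 and 3 describe the same situation seen from opposite sides, `collides` comes out symmetric, so it doesn’t matter which of the two variables gets renamed first.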