
JavaScriptMinifier: Save bytes by normalising escaped unicode sequences
Closed, Declined (Public)

Description

brion at r98281:

"Note that identifiers using escapes don't get normalized to their UTF-8 form; this might be a nice thing to do as it saves a couple bytes, but currently there's no change made to output."

Please normalize escape sequences when minifying JavaScript (not just when validating). Thanks.


Version: 1.20.x
Severity: enhancement

Details

Reference
bz31286

Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 21 2014, 11:50 PM)
bzimport set Reference to bz31286.
bzimport added a subscriber: Unknown Object (MLST).

Bumping priority down since it's a fairly rare case, but it'd be nice to fix!

JSMin+ (JSMinPlus) was patched in r98281 to accept Unicode chars & escapes in identifiers for validation. A further tweak to have it retain the decoded form for the escapes would result in slightly smaller output from JSMin+.

However, we actually do our minification using the faster (but not quite as compressy) home-brewed JavaScriptMinifier -- so if we want that on MediaWiki we'll need to poke that side. :)

For minification purposes, this may actually make a bigger impact on string literals than identifiers: for instance WikiEditor's jquery.wikiEditor.toolbar.config.js contains a butt-ton of single Unicode characters as \uXXXX escapes -- every one is 6 bytes of source but would be 1-3 bytes as UTF-8 if decoded.

That potentially saves a couple hundred bytes here. Not huge, but they add up.
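As a rough illustration of that arithmetic: a `\uXXXX` escape is always 6 ASCII bytes and names a single UTF-16 code unit, which takes 1 to 3 bytes once written out as UTF-8. The sketch below is not JavaScriptMinifier code. `savingsForEscape` and `totalSavings` are hypothetical helpers, and surrogate halves are skipped because they only make sense as pairs.

```javascript
// Sketch only: counts bytes, does not rewrite anything.
const encoder = new TextEncoder();

// Bytes saved by writing one \uXXXX escape as a raw UTF-8 character.
function savingsForEscape(escape) {
  const codeUnit = parseInt(escape.slice(2), 16);
  // Surrogate halves cannot be encoded alone, so count no savings for them.
  if (codeUnit >= 0xd800 && codeUnit <= 0xdfff) {
    return 0;
  }
  const decodedBytes = encoder.encode(String.fromCharCode(codeUnit)).length;
  return escape.length - decodedBytes; // escape.length is always 6
}

// Sum the savings over every \uXXXX escape in a piece of source text.
// Matching any backslash escape as a unit keeps an escaped backslash
// followed by the plain text "u00e9" from being miscounted.
function totalSavings(source) {
  let saved = 0;
  for (const match of source.matchAll(/\\(?:u[0-9a-fA-F]{4}|[\s\S])/g)) {
    if (/^\\u[0-9a-fA-F]{4}$/.test(match[0])) {
      saved += savingsForEscape(match[0]);
    }
  }
  return saved;
}

console.log(savingsForEscape("\\u00e9")); // 4 (é is 2 bytes in UTF-8)
console.log(totalSavings('"\\u00e9t\\u00e9 \\u2026"')); // 11
```

Running something like `totalSavings` over a real file would give the actual figure for that file; the "couple hundred bytes" estimate here is the commenter's, not a measured result.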

This would be nice to have in the minifier, indeed.

On the other hand, since UTF-8/Unicode characters are legal in JavaScript regardless (hence they may be decoded by a minifier), one might as well just put them decoded in the source file in the first place!

For third party/upstream resources this may not be an option, but at the very least it would be an upstream bug. There is simply no need to escape them in JavaScript. And if using UTF-8 in a file is a problem, then one likely has bigger problems to deal with (in 2012 one certainly should be able to deal with that).

The reason they are encoded in jquery.wikiEditor.* may be that PHP's json_encode() enforced that until PHP 5.4 (since 5.4 it is possible to disable that redundant encoding with the JSON_UNESCAPED_UNICODE flag).

From recent experience with jQuery, decoding things like this can lead to unexpected bugs.
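The comment above does not say which bugs were hit, so the following is only one plausible class of problem: some escapes exist precisely because the raw character would change how the literal parses. A minimal sketch, assuming the minifier already knows it is inside a quoted string literal and which quote delimits it. `isSafeToDecode` and `decodeStringBody` are hypothetical names, and the exclusion list is illustrative, not exhaustive.

```javascript
// Decide whether a code unit can be written raw inside a string literal
// delimited by `quote` (either ' or ").
function isSafeToDecode(codeUnit, quote) {
  if (codeUnit < 0x20 || codeUnit === 0x7f) return false; // control chars, incl. newlines
  if (codeUnit === 0x5c) return false; // a raw backslash would start a new escape
  if (codeUnit === quote.charCodeAt(0)) return false; // would terminate the literal
  if (codeUnit === 0x2028 || codeUnit === 0x2029) return false; // line terminators before ES2019
  if (codeUnit >= 0xd800 && codeUnit <= 0xdfff) return false; // surrogate halves
  return true;
}

// Decode \uXXXX escapes in the body of a string literal, leaving every
// other escape (and every unsafe \uXXXX) exactly as written.
function decodeStringBody(body, quote) {
  return body.replace(/\\(?:u([0-9a-fA-F]{4})|[\s\S])/g, (escape, hex) => {
    if (hex === undefined) return escape; // \n, \\, \", and so on
    const codeUnit = parseInt(hex, 16);
    return isSafeToDecode(codeUnit, quote) ? String.fromCharCode(codeUnit) : escape;
  });
}

console.log(decodeStringBody("caf\\u00e9", '"')); // café
console.log(decodeStringBody("say \\u0022hi\\u0022", '"')); // unchanged: say \u0022hi\u0022
```

Even with such a guard, the output is only correct if the file is then served and read as UTF-8, which is another place a decoding pass can introduce bugs that the escaped form avoids.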

Marking bugs that suggest altering the token stream in JavaScriptMinifier as wontfix.

If we decide to go that way, it is probably best to use an existing library such as UglifyJS, which is much more mature at this sort of thing.

We aren't using that yet because of the performance penalty involved in iterating over the token stream and reformatting it (since we do all of this on-demand on production servers).

If and when we do that, these bugs will be redundant anyway, as most if not all JavaScript reformatting applications have these features already.