#+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier
#+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier
#+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript
#+AUTHOR: Mihai Bazon
#+EMAIL: mihai.bazon@gmail.com
* UglifyJS --- a JavaScript parser/compressor/beautifier
This package implements a general-purpose JavaScript
parser/compressor/beautifier toolkit. It is developed on [[http://nodejs.org/][NodeJS]], but it
should work on any JavaScript platform supporting the CommonJS module system
(and if your platform of choice doesn't support CommonJS, you can easily
implement it, or discard the =exports.*= lines from UglifyJS sources).
The tokenizer/parser generates an abstract syntax tree from JS code. You
can then traverse the AST to learn more about the code, or do various
manipulations on it. This part is implemented in [[../lib/parse-js.js][parse-js.js]] and it's a
port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library from [[http://marijn.haverbeke.nl/][Marijn
Haverbeke]].
(See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of
UglifyJS.)
The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and
manipulates the AST generated by the parser to provide the following:
- ability to re-generate JavaScript code from the AST, optionally
  indented. You can use this if you want to “beautify” a program that has
  been compressed, so that you can inspect the source. But you can also run
  our code generator to print out an AST without any whitespace, so you
  achieve compression as well.
- shorten variable names (usually to single characters). Our mangler will
  analyze the code and generate proper variable names, depending on scope
  and usage, and is smart enough to deal with globals defined elsewhere, or
  with =eval()= calls or =with{}= statements. In short, if =eval()= or
  =with{}= are used in some scope, then all variables in that scope and any
  variables in the parent scopes will remain unmangled, and any references
  to such variables remain unmangled as well.
- various small optimizations that may lead to faster code but certainly
  lead to smaller code. Where possible, we do the following:
  - foo["bar"] ==> foo.bar
  - remove block brackets ={}=
  - join consecutive var declarations:
    var a = 10; var b = 20; ==> var a=10,b=20;
  - resolve simple constant expressions: 1 + 2 * 3 ==> 7. We only do the
    replacement if the result occupies fewer bytes; for example 1/3 would
    translate to 0.333333333333, so in this case we don't replace it.
  - consecutive statements in blocks are merged into a sequence; in many
    cases, this leaves blocks with a single statement, so then we can remove
    the block brackets.
  - various optimizations for IF statements:
    - if (foo) bar(); else baz(); ==> foo?bar():baz();
    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
    - if (foo) bar(); ==> foo&&bar();
    - if (!foo) bar(); ==> foo||bar();
    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}
  - remove some unreachable code and warn about it (code that follows a
    =return=, =throw=, =break= or =continue= statement, except
    function/variable declarations).
  - act as a limited version of a pre-processor (cf. the pre-processor of
    C/C++) to allow you to safely replace selected global symbols with
    specified values. When combined with the optimisations above this can
    make UglifyJS operate slightly more like a compilation process, in
    that when certain symbols are replaced by constant values, entire code
    blocks may be optimised away as unreachable.
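The IF-statement rewrites in the list above can be checked by hand. The
sketch below is not UglifyJS output: it pairs a few of the original forms
with hand-written equivalents in the style of the list and confirms that
both behave the same. All names in it (=bar=, =baz=, =before1=, =trace=,
=sameBehaviour=, =foldIfShorter=) are made up for illustration, and
=foldIfShorter= is only an assumed model of the "replace only if shorter"
rule for constant expressions.
#+BEGIN_SRC js
// Hypothetical helpers: each call records its name so side effects can be compared.
var calls = [];
function bar() { calls.push("bar"); return "bar"; }
function baz() { calls.push("baz"); return "baz"; }

// if (foo) bar(); else baz();  ==>  foo?bar():baz();
function before1(foo) { if (foo) bar(); else baz(); }
function after1(foo) { foo ? bar() : baz(); }

// if (!foo) bar();  ==>  foo||bar();
function before2(foo) { if (!foo) bar(); }
function after2(foo) { foo || bar(); }

// if (foo) return bar(); else return baz();  ==>  return foo?bar():baz();
function before3(foo) { if (foo) return bar(); else return baz(); }
function after3(foo) { return foo ? bar() : baz(); }

// Run fn with the given argument and report [return value, calls made].
function trace(fn, foo) {
    calls = [];
    var result = fn(foo);
    return JSON.stringify([result, calls]);
}

// True when both forms agree for a truthy and a falsy input.
function sameBehaviour(before, after) {
    return trace(before, true) === trace(after, true) &&
           trace(before, false) === trace(after, false);
}

// Assumed model of the folding rule: keep the folded value only when it
// is shorter than the source text of the expression.
function foldIfShorter(source, value) {
    var folded = String(value);
    return folded.length < source.length ? folded : source;
}

console.log(sameBehaviour(before1, after1)); // true
console.log(sameBehaviour(before2, after2)); // true
console.log(sameBehaviour(before3, after3)); // true
console.log(foldIfShorter("1+2*3", 1 + 2 * 3)); // 7
console.log(foldIfShorter("1/3", 1 / 3));       // 1/3
#+END_SRC
The ternary and =||= forms only stand in for the =if= statement because
the result of the expression is thrown away (or returned as a whole), which
is the situation the list describes.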
** Unsafe transformations
The following transformations can in theory break code, although they're
probably safe in most practical cases. To enable them you need to pass the
=--unsafe= flag.
*** Calls involving the global Array constructor
The following transformations occur:
#+BEGIN_SRC js
new Array(1, 2, 3, 4) => [1,2,3,4]
Array(a, b, c) => [a,b,c]
new Array(5) => Array(5)
new Array(a) => Array(a)
#+END_SRC
These are all safe if the Array name isn't redefined. JavaScript does allow
one to globally redefine Array (and pretty much everything, in fact) but I
personally don't see why anyone would do that.
UglifyJS does handle the case where Array is redefined locally, or even
globally but with a =function= or =var= declaration. Therefore, in the
following cases UglifyJS *doesn't touch* calls or instantiations of Array:
#+BEGIN_SRC js
// case 1. globally declared variable
var Array;
new Array(1, 2, 3);
Array(a, b);
// or (can be declared later)
new Array(1, 2, 3);
var Array;
// or (can be a function)
new Array(1, 2, 3);
function Array() { ... }
// case 2. declared in a function
(function(){
    a = new Array(1, 2, 3);
    b = Array(5, 6);
    var Array;
})();
// or
(function(Array){
    return Array(5, 6, 7);
})();
// or
(function(){
    return new Array(1, 2, 3, 4);
    function Array() { ... }
})();
// etc.
#+END_SRC
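To see why these rewrites preserve behaviour while =Array= is the built-in
constructor, and why a locally rebound =Array= must be left alone, you can
compare the forms directly. The sketch below uses only standard JavaScript;
the names =sameArray= and =shadowed= are made up for illustration.
#+BEGIN_SRC js
// Hypothetical helper: compare two arrays by length and by element.
function sameArray(x, y) {
    if (x.length !== y.length) return false;
    for (var i = 0; i < x.length; i++) {
        if (x[i] !== y[i]) return false;
    }
    return true;
}

var a = "a", b = "b", c = "c";

// new Array(1, 2, 3, 4)  =>  [1,2,3,4]
console.log(sameArray(new Array(1, 2, 3, 4), [1, 2, 3, 4])); // true
// Array(a, b, c)  =>  [a,b,c]
console.log(sameArray(Array(a, b, c), [a, b, c])); // true
// new Array(5)  =>  Array(5): both create an empty array of length 5
console.log(new Array(5).length, Array(5).length); // 5 5
// ...which differs from the literal [5], so only `new` is dropped here
console.log([5].length); // 1

// When Array is rebound locally, the call no longer builds an array,
// so rewriting it to a literal would change the result.
var shadowed = (function (Array) {
    return Array(5, 6, 7);
})(function () { return "not an array"; });
console.log(shadowed); // not an array
#+END_SRC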
*** =obj.toString()= ==> =obj+""=
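The two forms agree for ordinary values but can differ in contrived
situations. The sketch below uses only standard JavaScript and made-up
object names (=odd=, =plain=, =nothing=) to show two such cases.
#+BEGIN_SRC js
// An object whose valueOf and toString disagree.
var odd = {
    toString: function () { return "from toString"; },
    valueOf: function () { return 42; }
};
console.log(odd.toString()); // from toString
console.log(odd + "");       // 42 (concatenation consults valueOf first)

// For ordinary values both forms agree.
var plain = [1, 2, 3];
console.log(plain.toString() === plain + ""); // true

// null has no toString method, but it concatenates without error.
var nothing = null;
var threw = false;
try { nothing.toString(); } catch (e) { threw = true; }
console.log(threw, nothing + ""); // true null
#+END_SRC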
** Install (NPM)
UglifyJS is now available through NPM --- =npm install uglify-js= should do
the job.
** Install latest code from GitHub
#+BEGIN_SRC sh
## clone the repository
mkdir -p /where/you/wanna/put/it
cd /where/you/wanna/put/it
git clone git://github.com/mishoo/UglifyJS.git
## make the module available to Node
mkdir -p ~/.node_libraries/
cd ~/.node_libraries/
ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js
## and if you want the CLI script too:
mkdir -p ~/bin
cd ~/bin
ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs
# (then add ~/bin to your $PATH if it's not there already)
#+END_SRC
** Usage
There is a command-line tool that exposes the functionality of this library
for your shell-scripting needs:
#+BEGIN_SRC sh
uglifyjs [ options... ] [ filename ]
#+END_SRC
=filename= should be the last argument and should name the file from which
to read the JavaScript code. If you don't specify it, it will read code
from STDIN.
Supported options:
- =-b= or =--beautify= --- output indented code; when passed, additional
  options control the beautifier:
  - =-i N= or =--indent N= --- indentation level (number of spaces)
  - =-q= or =--quote-keys= --- quote keys in literal objects (by default,
    only keys that cannot be identifier names will be quoted).
- =-c= or =--consolidate-primitive-values= --- consolidates null, Boolean,
  and String values. Known as aliasing in the Closure Compiler. Worsens the
  data compression ratio of gzip.
- =--ascii= --- pass this argument to encode non-ASCII characters as
  =\uXXXX= sequences. By default UglifyJS won't bother to do it and will
  output Unicode characters instead. (The output is always encoded in UTF-8,
  but if you pass this option you'll only get ASCII.)
- =-nm= or =--no-mangle= --- don't mangle names.
- =-nmf= or =--no-mangle-functions= -- in case you want to mangle variable
  names, but not touch function names.
- =-ns= or =--no-squeeze= --- don't call =ast_squeeze()= (which does various
  optimizations that result in smaller, less readable code).
- =-mt= or =--mangle-toplevel= --- mangle names in the toplevel scope too
  (by default we don't do this).
- =--no-seqs= --- when =ast_squeeze()= is called (thus, unless you pass
  =--no-squeeze=) it will reduce consecutive statements in blocks into a
  sequence. For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();". On various occasions, this allows us to discard the
  block brackets (since the block becomes a single statement). This is ON
  by default because it seems safe and saves a few hundred bytes on some
  libs that I tested it on, but pass =--no-seqs= to disable it.
- =--no-dead-code= --- by default, UglifyJS will remove code that is
  obviously unreachable (code that follows a =return=, =throw=, =break= or
  =continue= statement and is not a function/variable declaration). Pass
  this option to disable this optimization.
- =-nc= or =--no-copyright= --- by default, =uglifyjs= will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.). If you pass this option, they will be discarded.
- =-o filename= or =--output filename= --- put the result in =filename=. If
  this isn't given, the result goes to standard output (or see the next
  option).
- =--overwrite= --- if the code is read from a file (not from STDIN) and you
  pass =--overwrite= then the output will be written in the same file.
- =--ast= --- pass this if you want to get the Abstract Syntax Tree instead
  of JavaScript as output. Useful for debugging or learning more about the
  internals.
- =-v= or =--verbose= --- output some notes on STDERR (for now just how long
  each operation takes).
- =-d SYMBOL[=VALUE]= or =--define SYMBOL[=VALUE]= --- will replace
  all instances of the specified symbol where used as an identifier
  (except where the symbol has been properly declared by a var declaration,
  used as a function parameter, or similar) with the specified value. This
  argument may be specified multiple times to define multiple
  symbols. If no value is specified, the symbol will be replaced with
  the value =true=; otherwise you can specify a numeric value (such as
  =1024=), a quoted string value (such as ="object"= or
  ='https://github.com'=), or the name of another symbol or keyword
  (such as =null= or =document=).
  This allows you, for example, to assign meaningful names to key
  constant values but discard the symbolic names in the uglified
  version for brevity/efficiency, or, when used with care, allows
  UglifyJS to operate as a form of *conditional compilation*
  whereby defining appropriate values may, by dint of the constant
  folding and dead code removal features above, remove entire
  superfluous code blocks (e.g. completely remove instrumentation or
  trace code for production use).
  Where string values are being defined, the handling of quotes is
  likely to be subject to the specifics of your command shell
  environment, so you may need to experiment with quoting styles
  depending on your platform, or you may find the option
  =--define-from-module= more suitable for use.
- =--define-from-module SOMEMODULE= --- will load the named module (as
  per the NodeJS =require()= function) and iterate over all the exported
  properties of the module, defining each one as a symbol (as if by the
  =--define= option) named after the property (i.e. without the module
  name prefix) and given the value of the property. This is a much easier
  way to handle and document groups of symbols to be defined than a large
  number of =--define= options.
- =--unsafe= --- enable other additional optimizations that are known to be
  unsafe in some contrived situations, but could still be generally useful.
  For now only these:
  - foo.toString() ==> foo+""
  - new Array(x,...) ==> [x,...]
  - new Array(x) ==> Array(x)
- =--max-line-len= (default 32K characters) --- add a newline after around
  32K characters. I've seen both FF and Chrome croak when all the code was
  on a single line of around 670K. Pass =--max-line-len 0= to disable this
  safety feature.
- =--reserved-names= --- some libraries rely on certain names to be used, as
  pointed out in issues #92 and #81, so this option allows you to exclude such
  names from the mangler. For example, to keep names =require= and =$super=
  intact you'd specify =--reserved-names "require,$super"=.
- =--inline-script= -- when you want to include the output literally in an
  HTML =<script>= tag, pass this option to escape occurrences of =</script=
  in the output.
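To make the conditional compilation idea behind the =--define= option
described above more concrete, here is a hand-written sketch. It is not
actual UglifyJS output, and the exact result of the tool may differ. It
assumes a free global symbol called =DEBUG= (a made-up name, like
=areaSource=, =areaDefined= and =trace=) and shows the kind of code that
could remain once =DEBUG= is replaced by =false= and the dead branch is
dropped.
#+BEGIN_SRC js
// Stand-in for the replaced symbol so that this sketch runs on its own.
// With `--define DEBUG=false` you would not declare DEBUG in your source.
globalThis.DEBUG = false;

var trace = [];

// Source as written: instrumentation guarded by the DEBUG symbol.
function areaSource(w, h) {
    if (DEBUG) {
        trace.push("area(" + w + ", " + h + ")");
    }
    return w * h;
}

// Hand-written approximation of what is left after substitution:
// `if (false) { ... }` can never run, so only the return survives.
function areaDefined(w, h) {
    return w * h;
}

console.log(areaSource(3, 4), areaDefined(3, 4)); // 12 12
console.log(trace.length); // 0
#+END_SRC
Both functions compute the same result; the second one simply no longer
carries the tracing code.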
#+BEGIN_SRC js
function f(a, b, c) {
    var i, boo, w = 10, q = 20;
    for (i = 1; i < 10; ++i) {
        boo = foo(a);
    }
    for (i = 0; i < 1; ++i) {
        boo = bar(c);
    }
    function foo() { ... }
    function bar() { ... }
}
#+END_SRC
- =pro.ast_mangle(ast, options)= -- generates a new AST containing mangled
  (compressed) variable and function names. It supports the following
  options:
  - =toplevel= -- mangle toplevel names (by default we don't touch them).
  - =except= -- an array of names to exclude from compression.
  - =defines= -- an object with properties named after symbols to
    replace (see the =--define= option for the script) and the values
    representing the AST replacement value.
- =pro.ast_squeeze(ast, options)= -- employs further optimizations designed
  to reduce the size of the code that =gen_code= would generate from the
  AST. Returns a new AST. =options= can be a hash; the supported options
  are:
  - =make_seqs= (default true) which will cause consecutive statements in a
    block to be merged using the "sequence" (comma) operator
  - =dead_code= (default true) which will remove unreachable code.
- =pro.gen_code(ast, options)= -- generates JS code from the AST. By
  default it's minified, but using the =options= argument you can get nicely
  formatted output. =options= is, well, optional :-) and if you pass it, it
  must be an object and supports the following properties (below you can see
  the default values):
  - =beautify: false= -- pass =true= if you want indented output
  - =indent_start: 0= (only applies when =beautify= is =true=) -- initial
    indentation in spaces
  - =indent_level: 4= (only applies when =beautify= is =true=) --
    indentation level, in spaces (pass an even number)
  - =quote_keys: false= -- if you pass =true= it will quote all keys in
    literal objects
  - =space_colon: false= (only applies when =beautify= is =true=) -- whether
    to put a space before the colon in object literals
  - =ascii_only: false= -- pass =true= if you want to encode non-ASCII
    characters as =\uXXXX=.
  - =inline_script: false= -- pass =true= to escape occurrences of
    =</script= in strings.
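Putting the three =pro= functions together, a pipeline might look like the
sketch below. Two things in it are assumptions that this section does not
state: that the package exposes the parser as =require("uglify-js").parser=
with a =parse()= function, and the processor as
=require("uglify-js").uglify=. The =require= call is guarded, so the sketch
still runs (returning its input unchanged) when a compatible package is not
installed. The wrapper name =minify= is made up.
#+BEGIN_SRC js
// Load the library if present; fall back gracefully otherwise.
var jsp = null, pro = null;
try {
    var uglify = require("uglify-js");
    jsp = uglify.parser;   // assumed export name
    pro = uglify.uglify;   // assumed export name
} catch (e) {
    // package not available in this environment
}

// Hypothetical wrapper: parse, mangle, squeeze, then generate code.
function minify(source, beautify) {
    if (!jsp || !pro) return source;   // nothing to do without the library
    var ast = jsp.parse(source);       // assumed parser entry point
    ast = pro.ast_mangle(ast, { toplevel: false });
    ast = pro.ast_squeeze(ast, { make_seqs: true, dead_code: true });
    return pro.gen_code(ast, { beautify: !!beautify, indent_level: 4 });
}

console.log(minify("function add(first, second) { return first + second; }"));
#+END_SRC
Each step returns a new AST, so the steps can be skipped or reordered
independently; for example, leaving out =ast_mangle= corresponds to the
=--no-mangle= command-line option.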
** License
#+BEGIN_EXAMPLE
    Based on parse-js (http://marijn.haverbeke.nl/parse-js/).
    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
        * Redistributions of source code must retain the above
          copyright notice, this list of conditions and the following
          disclaimer.
        * Redistributions in binary form must reproduce the above
          copyright notice, this list of conditions and the following
          disclaimer in the documentation and/or other materials
          provided with the distribution.
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
    EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
    LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
    OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
    PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
    TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
    THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
    SUCH DAMAGE.
#+END_EXAMPLE