Canonicalize JavaScript Libraries

Configuration

The 'Canonicalize JavaScript Libraries' filter is enabled by specifying:

Apache:
ModPagespeedEnableFilters canonicalize_javascript_libraries
Nginx:
pagespeed EnableFilters canonicalize_javascript_libraries;

in the configuration file.

Description

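This filter identifies popular JavaScript libraries that can be replaced with ones hosted for free by a JavaScript library hosting service, by default the Google Hosted Libraries. Fetching a library from a shared canonical URL saves bandwidth on your own servers and improves the odds of a browser cache hit, because the same copy is used by many sites.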

In Apache the default set of libraries can be found in the pagespeed_libraries.conf file, which is loaded along with pagespeed.conf when Apache starts up. It contains signatures for all the Google Hosted Libraries. In Nginx you need to convert pagespeed_libraries.conf from Apache format to Nginx format:

$ scripts/pagespeed_libraries_generator.sh > ~/pagespeed_libraries.conf
$ sudo mv ~/pagespeed_libraries.conf /path/to/nginx/configuration_files/

You also need to include it in your Nginx configuration by reference:

include pagespeed_libraries.conf;

Don't edit pagespeed_libraries.conf. Local edits will prevent you from updating it cleanly when you update PageSpeed. Instead, add any additional libraries to your main configuration file:

Apache:
ModPagespeedLibrary 43 1o978_K0_LNE5_ystNklf \
   //www.modpagespeed.com/rewrite_javascript.js
Nginx:
pagespeed Library 43 1o978_K0_LNE5_ystNklf
   //www.modpagespeed.com/rewrite_javascript.js;

The general format of these entries is:

Apache:
ModPagespeedLibrary bytes MD5 canonical_url
Nginx:
pagespeed Library bytes MD5 canonical_url;

Here bytes is the size in bytes of the library after minification by PageSpeed, and MD5 is the MD5 hash of the library after minification. Measuring the minified form controls for differences in whitespace that may occur when the same script is obtained from different sources. The canonical_url is the hosting service URL used to replace occurrences of the script.

Note that the canonical URL in the above example is protocol-relative: the library will be fetched using the same protocol (http or https) as the containing page. Because older browsers don't handle protocol-relative URLs reliably, PageSpeed resolves a protocol-relative library URL to an absolute URL based on the protocol of the containing page.

Do not use http canonical URLs in configurations that may serve content over https, or the rewritten pages will expose your site to attack and trigger a mixed-content warning in the browser. Similarly, avoid using https URLs unless you know that the resulting library will eventually be fetched from a secure page, as SSL negotiation adds overhead to the initial request.
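The resolution of a protocol-relative canonical URL against the containing page can be illustrated with a short Python sketch. This is not PageSpeed's implementation; it only uses the standard library's urljoin to show the expected result, and the function name resolve_canonical_url and the www.example.com page URLs are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve_canonical_url(page_url: str, canonical_url: str) -> str:
    """Resolve a (possibly protocol-relative) canonical URL against a page URL.

    Hypothetical helper for illustration only, not part of PageSpeed.
    """
    # urljoin treats "//host/path" as a network-path reference and copies the
    # scheme from the base URL; URLs that are already absolute are unchanged.
    return urljoin(page_url, canonical_url)


library = "//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"

# The same configured URL yields http on an http page and https on an https page.
print(resolve_canonical_url("http://www.example.com/index.html", library))
print(resolve_canonical_url("https://www.example.com/index.html", library))
```

An absolute http or https canonical URL passes through unchanged, which is why a hard-coded http URL on an https page leads to the mixed-content problem described above.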

Additional library configuration metadata can be generated with the help of the pagespeed_js_minify utility installed along with PageSpeed. To use this utility, you will need a local copy of the JavaScript code that you wish to replace. If this is stored in library.js, you can generate bytes and MD5 as follows:

Apache:
$ pagespeed_js_minify --print_size_and_hash library.js
Nginx:
$ cd /path/to/psol/lib/Release/linux/ia32/
$ pagespeed_js_minify --print_size_and_hash library.js

If you're using the new JavaScript minifier, add the --use_experimental_minifier argument to pagespeed_js_minify. If you're using the old minifier, add --nouse_experimental_minifier. (As of 1.10.33.0, --use_experimental_minifier is the default; previously, --nouse_experimental_minifier was.) The default pagespeed_libraries.conf includes hashes for both the old and new minifiers.
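Once you have the size and hash for a library, the configuration entry follows the general format given earlier. The Python sketch below assembles the directive for either server. The function name format_library_directive is hypothetical, and the sample values are the ones from the example entry in this document; the sketch does not run or parse pagespeed_js_minify itself.

```python
def format_library_directive(server: str, size: int, md5: str, canonical_url: str) -> str:
    """Build a library directive in the 'bytes MD5 canonical_url' format.

    Hypothetical helper for illustration; 'server' must be "apache" or "nginx".
    """
    if server == "apache":
        return f"ModPagespeedLibrary {size} {md5} {canonical_url}"
    if server == "nginx":
        # Nginx directives carry the "pagespeed" prefix and end with a semicolon.
        return f"pagespeed Library {size} {md5} {canonical_url};"
    raise ValueError(f"unknown server type: {server!r}")


# Values taken from the example entry shown earlier in this document.
print(format_library_directive(
    "apache", 43, "1o978_K0_LNE5_ystNklf",
    "//www.modpagespeed.com/rewrite_javascript.js"))
print(format_library_directive(
    "nginx", 43, "1o978_K0_LNE5_ystNklf",
    "//www.modpagespeed.com/rewrite_javascript.js"))
```

The resulting line belongs in your main configuration file, not in pagespeed_libraries.conf.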

This filter is based on the best practices of optimizing browser caching and reducing payload size.

Operation

In order to identify a library and canonicalize its URL, PageSpeed must of course be able to fetch the JavaScript code from the URL on the original page. Because library canonicalization identifies libraries solely by their size and hash signature, it is not necessary to authorize PageSpeed to fetch content from the domain hosting the canonical content itself. This means that it is safe to use this filter behind a reverse proxy or in other situations where network access by PageSpeed is deliberately restricted. Browsers visiting the site will fetch the content from the canonical URL, but PageSpeed itself does not need to do so.
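The reason no fetch from the canonical host is needed can be seen in a minimal Python sketch of a signature lookup. The table and function below are hypothetical stand-ins, not PageSpeed's internal data structures; the single entry reuses the example signature from the configuration section. The size and hash are assumed to have been computed already from the script fetched from the original page.

```python
from typing import Dict, Optional, Tuple

# (size in bytes after minification, hash after minification)
Signature = Tuple[int, str]

# Hypothetical in-memory table, analogous to what the Library directives declare.
KNOWN_LIBRARIES: Dict[Signature, str] = {
    (43, "1o978_K0_LNE5_ystNklf"): "//www.modpagespeed.com/rewrite_javascript.js",
}


def canonical_url_for(size: int, md5: str) -> Optional[str]:
    """Return the canonical URL for a signature, or None if it is not configured.

    The lookup is purely local: nothing is requested from the canonical host.
    """
    return KNOWN_LIBRARIES.get((size, md5))


print(canonical_url_for(43, "1o978_K0_LNE5_ystNklf"))  # configured signature
print(canonical_url_for(44, "1o978_K0_LNE5_ystNklf"))  # size differs, no match
```

Both the size and the hash must match for a replacement to happen, so a script that differs by even one byte after minification is left alone.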

Examples

You can see the filter in action at www.modpagespeed.com on this example.

If the HTML document looks like this:

<html>
  <head>
    <script src="jquery_1_8.js">
    </script>
    <script src="a.js">
    </script>
    <script src="b.js">
    </script>
  </head>
  <body>
  ...
  </body>
</html>

Then, assuming jquery_1_8.js is an unminified copy of the jQuery library and a.js and b.js contain site-specific code that uses jQuery, the page would be rewritten as follows:

<html>
  <head>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
    </script>
    <script src="a.js">
    </script>
    <script src="b.js">
    </script>
  </head>
  <body>
  ...
  </body>
</html>

The library URL has been replaced by a reference to the canonical minified version hosted on ajax.googleapis.com. Note that canonical libraries do not participate in most other JavaScript optimizations. For example, if Combine JavaScript is also enabled, the above page will be rewritten as follows:

<html>
  <head>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
    </script>
    <script src="http://www.example.com/a.js+b.js.pagespeed.jc.zYiUaxFS8I.js">
    </script>
  </head>
  <body>
  ...
  </body>
</html>

The canonical library is not combined with the other two JavaScript files, since this would lose the bandwidth and caching benefits of fetching it from the canonical URL.

If defer_javascript is enabled and the library is not tagged with data-pagespeed-no-defer, the canonicalized library is deferred.

Requirements

Only complete, unmodified libraries referenced by <script> tags in the HTML will be rewritten. Libraries that are loaded by other means (for example by injecting a loader script) or that have been modified will not be canonicalized.
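To see why only statically referenced scripts qualify, consider the rough Python sketch below, which collects the src attributes of <script> tags present in the HTML. It uses the standard library's html.parser rather than PageSpeed's own parser, and the names ScriptSrcCollector and script_sources are made up for this illustration. A script that is injected at run time by a loader never appears as a <script src> tag in the markup, so it is not a candidate.

```python
from html.parser import HTMLParser
from typing import List


class ScriptSrcCollector(HTMLParser):
    """Collect src attributes of <script> tags found in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Inline scripts have no src attribute and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_sources(html: str) -> List[str]:
    """Return the script URLs referenced directly by the markup."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


page = """
<html><head>
<script src="jquery_1_8.js"></script>
<script>var s = document.createElement('script'); s.src = 'loader.js';
document.head.appendChild(s);</script>
</head><body></body></html>
"""
# Only jquery_1_8.js is visible in the markup; loader.js is injected at run time.
print(script_sources(page))
```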

Risks

You must ensure that you abide by the terms of service of the providers of the canonical content before enabling canonicalization. The terms of service for the default configuration can be found at https://developers.google.com/speed/libraries/terms.

The canonical URL refers to a third-party domain; this can cause additional DNS lookup latency the first time a library is loaded. This is mitigated by the fact that the canonical copy of the data is shared among multiple sites.

The initial request for a canonical URL will contain a Referer: header with the URL of the referring page. This permits the host of the canonical content to see a subset of traffic to your site (the first load of a page on your site that contains an identified library by a browser that does not already have that library in its cache). The provider should describe how this data is used in its terms of service. The terms of service for the default configuration can be found at https://developers.google.com/speed/libraries/terms. Again, this risk is mitigated by the fact that canonical libraries are shared among multiple sites; a popular library is likely to already reside in the browser cache.

Sites serving content on both http and https URLs must use protocol-relative canonical URLs as shown above. Fetching a library insecurely from a secure page exposes a site to attack. Fetching a library securely from an ordinary page can increase load time due to SSL overhead.

It is theoretically possible to craft a JavaScript file whose minified size and hash exactly match those of a canonical library, but whose code behaves differently. In such a case the crafted file will be replaced with the canonical (widely used) library, which will break the page that references the crafted JavaScript.