Configuring Downstream Caches

Standard Configuration

Note: This feature is currently experimental. Options and configuration described here are subject to change in future releases. Please subscribe to the announcements mailing list to keep yourself informed of updates to this feature.

By default, PageSpeed serves HTML files with Cache-Control: no-cache, max-age=0 so that changes to the HTML and its resources are sent fresh on each request. The HTML can be cached, however, if you put a cache in front of PageSpeed and configure PageSpeed to send that cache purge requests.

For example, if you're running a cache on port 80 that reverse proxies to your site on port 8080, then you'd need to tell PageSpeed to send its PURGE requests to port 80:

Apache:
ModPagespeedDownstreamCachePurgeLocationPrefix http://localhost:80
Nginx:
pagespeed DownstreamCachePurgeLocationPrefix http://localhost:80;

You also need to give PageSpeed a key so it can allow the cache to request rebeaconing without allowing external entities to do so:

Apache:
ModPagespeedDownstreamCacheRebeaconingKey "<your-secret-key>"
Nginx:
pagespeed DownstreamCacheRebeaconingKey "<your-secret-key>";

These are the only changes you need to make to the PageSpeed configuration file, but before you restart you also need to make some changes to your cache configuration. These vary by cache; below are configurations for Varnish 3.x, Varnish 4.x, and Nginx's proxy_cache:

Varnish 3.x:
# std.random() below comes from the std vmod; make sure your VCL has
# "import std;" near the top.
acl purge {
  # If PageSpeed isn't running on the same server as your cache, list the IP(s)
  # of the PageSpeed machine(s) here.
  "127.0.0.1";
}
sub vcl_recv {
  # Tell PageSpeed not to use optimizations specific to this request.
  set req.http.PS-CapabilityList = "fully general optimizations only";

  # Don't allow external entities to force beaconing.
  remove req.http.PS-ShouldBeacon;

  # Authenticate the purge request by IP.
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return (lookup);
  }
}

# Mark HTML as uncacheable for anything downstream of this cache.  Caches that
# we can't send purge requests to must not cache our HTML.
sub vcl_fetch {
  if (beresp.http.Content-Type ~ "text/html") {
    remove beresp.http.Cache-Control;
    set beresp.http.Cache-Control = "no-cache, max-age=0";
  }
  return (deliver);
}

sub vcl_hit {
  # Make purging happen in response to a PURGE request.  This happens
  # automatically in Varnish 4.x so we don't need it there.
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }

  # 5% of the time ignore that we got a cache hit and send the request to the
  # backend anyway for instrumentation.
  if (std.random(0, 100) < 5) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

sub vcl_miss {
  # Make purging happen in response to a PURGE request.  This happens
  # automatically in Varnish 4.x so we don't need it there.
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }

  # Instrument 25% of cache misses.
  if (std.random(0, 100) < 25) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

Varnish 4.x:
# std.random() below comes from the std vmod; make sure your VCL has
# "import std;" near the top.
acl purge {
  # If PageSpeed isn't running on the same server as your cache, list the IP(s)
  # of the PageSpeed machine(s) here.
  "127.0.0.1";
}

sub vcl_recv {
  # Tell PageSpeed not to use optimizations specific to this request.
  set req.http.PS-CapabilityList = "fully general optimizations only";

  # Don't allow external entities to force beaconing.
  unset req.http.PS-ShouldBeacon;

  # Authenticate the purge request by IP.
  if (req.method == "PURGE") {
    if (!client.ip ~ purge) {
      return (synth(405,"Not allowed."));
    }
    return (purge);
  }
}

# Mark HTML as uncacheable for anything downstream of this cache.  Caches that
# we can't send purge requests to must not cache our HTML.
sub vcl_backend_response {
  if (beresp.http.Content-Type ~ "text/html") {
    unset beresp.http.Cache-Control;
    set beresp.http.Cache-Control = "no-cache, max-age=0";
  }
  return (deliver);
}

sub vcl_hit {
  # 5% of the time ignore that we got a cache hit and send the request to the
  # backend anyway for instrumentation.
  if (std.random(0, 100) < 5) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

sub vcl_miss {
  # Instrument 25% of cache misses.
  if (std.random(0, 100) < 25) {
    set req.http.PS-ShouldBeacon = "<your-secret-key>";
    return (pass);
  }
}

Nginx proxy_cache:
http {
  # Define a mapping used to mark HTML as uncacheable.
  map $upstream_http_content_type $new_cache_control_header_val {
    default $upstream_http_cache_control;
    "~*text/html" "no-cache, max-age=0";
  }

  server {
    # PageSpeed's beacon dependent filters need the cache to let some requests
    # through to the backend.  This code below depends on the ngx_set_misc
    # module and randomly passes 5% of traffic to the backend for rebeaconing.
    set $should_beacon_header_val "";
    set_random $rand 0 100;
    if ($rand ~* "^[0-4]$") {
      set $should_beacon_header_val "<your-secret-key>";
      set $bypass_cache 1;
    }

    location / {
      # existing proxy_pass
      # existing proxy_cache
      # existing proxy_cache_key

      # What servers should we accept PURGE requests from?  If PageSpeed isn't
      # running on the same server as your cache, list the IP(s) of the
      # PageSpeed machine(s) here.
      #
      # This requires rebuilding with the ngx_cache_purge module:
      #   https://github.com/FRiCKLE/ngx_cache_purge
      proxy_cache_purge PURGE from 127.0.0.1;

      # Mark HTML as uncacheable for anything downstream of this cache.
      # Caches that we can't send purge requests to must not cache our HTML.
      # Uses the map defined above.
      proxy_hide_header Cache-Control;
      add_header Cache-Control $new_cache_control_header_val;

      # Tell PageSpeed not to use optimizations specific to this request.
      proxy_set_header PS-CapabilityList "fully general optimizations only";

      # See discussion of rebeaconing above.
      proxy_cache_bypass $bypass_cache;
      proxy_hide_header PS-ShouldBeacon;
      proxy_set_header PS-ShouldBeacon $should_beacon_header_val;
    }
  }
}

When running with downstream caching, all resources referenced from the HTML are cache-extended as usual, so resources that should only be cached for a short time can be served stale. If you have such resources, either Disallow them, so that PageSpeed doesn't inline or cache-extend them, or decrease the cache lifetime of your HTML.

Additional Options

The configuration above should be a good fit for most sites, but PageSpeed's downstream caching is highly configurable with many options that allow you to tweak it for your particular setup.

Beaconing

Several filters such as inline_images, inline_preview_images, lazyload_images and prioritize_critical_css depend extensively on client beacons to determine critical images and CSS. When such filters are enabled, pages periodically have beaconing JavaScript inserted as part of the rewriting process. The standard configuration passes through 5% of cache hits to the backend with a PS-ShouldBeacon header set, so that these filters can continue to receive the beacons they need.

If you have a high traffic site, 5% is probably a larger share than you need for PageSpeed to receive sufficient beacons. In that case you can decrease the percentage of traffic to pass through. For example, here's how you'd decrease it to 2%:

Varnish 3.x or 4.x:
-  if (std.random(0, 100) < 5) {
+  if (std.random(0, 100) < 2) {
Nginx proxy_cache:
-  if ($rand ~* "^[0-4]$") {
+  if ($rand ~* "^[01]$") {
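As a rough check on the Nginx patterns above, the sketch below counts how many of the possible random values a pattern matches. It assumes that set_random draws uniformly from the integers 0 to 100 inclusive (101 values), which is an assumption about ngx_set_misc rather than something stated here; the helper name bypass_share is hypothetical.

```python
import re


def bypass_share(pattern: str, low: int = 0, high: int = 100) -> float:
    """Return the fraction of integers in [low, high] whose text matches pattern.

    Assumption: the random value is uniform over the inclusive range and is
    compared to the regex as a decimal string, as in the Nginx config above.
    """
    values = range(low, high + 1)
    hits = sum(1 for value in values if re.search(pattern, str(value)))
    return hits / len(values)


five_percent = bypass_share(r"^[0-4]$")  # matches 0, 1, 2, 3, 4
two_percent = bypass_share(r"^[01]$")    # matches 0, 1
print(f"{five_percent:.4f} {two_percent:.4f}")
```

Under that assumption the two patterns pass roughly 5% and 2% of requests respectively, since the range holds 101 values rather than exactly 100.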

Alternatively, you may be willing to give up the benefit of the beaconing-dependent filters in exchange for never intentionally bypassing the cache. If so, you should turn off beaconing and beacon-dependent filters in PageSpeed:

Apache:
ModPagespeedCriticalImagesBeaconEnabled false
ModPagespeedDisableFilters prioritize_critical_css
Nginx:
pagespeed CriticalImagesBeaconEnabled false;
pagespeed DisableFilters prioritize_critical_css;

Additionally you should remove the proxy config that handles beaconing:

Varnish 3.x:
-  remove req.http.PS-ShouldBeacon;
...
-  if (std.random(0, 100) < 5) {
-    set req.http.PS-ShouldBeacon = "<your-secret-key>";
-    return (pass);
-  }
...
-  if (std.random(0, 100) < 25) {
-    set req.http.PS-ShouldBeacon = "<your-secret-key>";
-    return (pass);
-  }
Varnish 4.x:
-  unset req.http.PS-ShouldBeacon;
...
-  sub vcl_hit {
-    if (std.random(0, 100) < 5) {
-      set req.http.PS-ShouldBeacon = "<your-secret-key>";
-      return (pass);
-    }
-  }
-  sub vcl_miss {
-    if (std.random(0, 100) < 25) {
-      set req.http.PS-ShouldBeacon = "<your-secret-key>";
-      return (pass);
-    }
-  }
Nginx proxy_cache:
-  set $should_beacon_header_val "";
-  set_random $rand 0 100;
-  if ($rand ~* "^[0-4]$") {
-    set $should_beacon_header_val "<your-secret-key>";
-    set $bypass_cache 1;
-  }
...
-  proxy_cache_bypass $bypass_cache;
-  proxy_hide_header PS-ShouldBeacon;
-  proxy_set_header PS-ShouldBeacon $should_beacon_header_val;

PageSpeed Resources

Because PageSpeed already caches its optimized resources, you may want to exclude them from caching by the downstream cache. If so, you can set:

Varnish 3.x and 4.x:
+  if (req.url ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
+    return (pass);
+  }
Nginx proxy_cache:
+  if ($uri ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
+    set $bypass_cache "1";
+  }

If you have enabled URL signing, change the 10 in the regexp to 20 to account for the additional characters in the hash.
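To see what the pattern does and does not match, here is a small sketch that runs the same regular expression in Python. The sample URLs are made up for illustration, and the helper name pagespeed_resource_pattern is hypothetical; only the regular expression itself comes from the configuration above.

```python
import re


def pagespeed_resource_pattern(hash_length: int = 10) -> re.Pattern:
    """Build the resource-URL regex, with the hash length as a parameter.

    Use 10 for the default configuration and 20 when URL signing is enabled.
    """
    return re.compile(
        r"\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{%d}\.[^.]+" % hash_length
    )


# Hypothetical sample URLs; the hashes are 10 and 20 characters long.
unsigned_url = "/styles/yellow.css.pagespeed.ce.lzJ8VcVi1l.css"
signed_url = "/styles/yellow.css.pagespeed.ce.lzJ8VcVi1lAbCdEfGhIj.css"
plain_url = "/styles/yellow.css"

default_pattern = pagespeed_resource_pattern()
signed_pattern = pagespeed_resource_pattern(20)

print(bool(default_pattern.search(unsigned_url)))  # optimized resource
print(bool(default_pattern.search(plain_url)))     # ordinary resource
print(bool(signed_pattern.search(signed_url)))     # signed resource
```

Note that the 10-character pattern does not match the signed sample URL, which is why the length in the regexp has to change when signing is on.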

PS-CapabilityList

Typically PageSpeed will produce different HTML for different browsers. For example, when responding to a request that has Accept: image/webp, PageSpeed knows the requesting browser supports WebP and so it can send these images, while if the Accept header doesn't mention WebP then it will send JPEG or PNG. To suppress this behavior, the standard configuration above sets a header:

  PS-CapabilityList: fully general optimizations only

This header can also be used to tell PageSpeed to make specific optimizations. There are six capabilities PageSpeed can take advantage of that aren't supported in all browsers, and it gives them each a code:

Capability              Code
Inline Images           ii
Lazyload Images         ll
WebP Images             jw
Lossless WebP Images    ws
Animated WebP Images    wa
Defer Javascript        dj

For example, you could make whether the Accept header includes image/webp part of your cache key, and then, for the traffic that claims WebP support, send:

  PS-CapabilityList: jw:

Every page would go through to your origin twice and be cached twice, once processed with WebP support and once without.

You can combine multiple capabilities together with a comma. For example, if you decided to make a cache fragment for Chrome 30+, which supports all of these, for that fragment you would send:

  PS-CapabilityList: ll,ii,dj,jw,ws:

For Firefox 4+, which supports all of these but WebP, you would send:

  PS-CapabilityList: ll,ii,dj:
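The header values in these examples can be assembled mechanically. The following is a minimal sketch, not PageSpeed's own code: the dictionary keys and the function name capability_list are hypothetical, the code order simply copies the order used in the examples above (whether order matters is not stated here), and falling back to the standard configuration's value for an empty set is an assumption.

```python
# Codes from the capability table; the key names are hypothetical labels.
CAPABILITY_CODES = {
    "inline_images": "ii",
    "lazyload_images": "ll",
    "webp": "jw",
    "webp_lossless": "ws",
    "webp_animated": "wa",
    "defer_javascript": "dj",
}

# Assumption: emit codes in the same order as the examples in this section.
HEADER_ORDER = ["ll", "ii", "dj", "jw", "ws", "wa"]

# Value sent by the standard configuration to suppress per-browser output.
GENERAL_ONLY = "fully general optimizations only"


def capability_list(capabilities) -> str:
    """Return a PS-CapabilityList value for the given capability names."""
    codes = {CAPABILITY_CODES[name] for name in capabilities}
    if not codes:
        # Assumption: no capabilities means "general optimizations only".
        return GENERAL_ONLY
    ordered = [code for code in HEADER_ORDER if code in codes]
    # The examples end the list with a colon, so do the same here.
    return ",".join(ordered) + ":"


webp_only = capability_list(["webp"])
chrome_fragment = capability_list(
    ["lazyload_images", "inline_images", "defer_javascript", "webp", "webp_lossless"]
)
firefox_fragment = capability_list(
    ["lazyload_images", "inline_images", "defer_javascript"]
)
print(webp_only, chrome_fragment, firefox_fragment)
```

In a real deployment this logic would live in the cache configuration, which sets the header per cache fragment before passing the request to PageSpeed.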

To use this header properly, however, you have to know which capabilities are supported by which browsers in the version of PageSpeed you're using and craft regular expressions to match exactly those ones. This is very difficult to do in general because it involves duplicating the code in user_agent_matcher.cc as regexes, but a simple division is:

Purging with GET

If you're integrating PageSpeed with a cache that doesn't support PURGE requests but does support purging in response to a prefixed GET request, PageSpeed can support that. You would configure your cache to treat a GET to /purge/foo/bar as a request to purge /foo/bar and configure PageSpeed as:

Apache:
ModPagespeedDownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge
ModPagespeedDownstreamCachePurgeMethod GET
Nginx:
pagespeed DownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge;
pagespeed DownstreamCachePurgeMethod GET;
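The relationship between the prefix, the method, and the URL being purged can be sketched as follows. This assumes the purge URL is simply the configured prefix followed by the path, as the /purge/foo/bar example suggests; how PageSpeed handles query strings or the Host header is not covered here. The function name purge_request and the host cache.example are hypothetical.

```python
def purge_request(prefix: str, path: str, method: str = "PURGE"):
    """Return the (method, url) pair for purging path from the cache.

    Assumption: the URL is the purge location prefix followed by the path.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    return method, prefix.rstrip("/") + path


# Standard configuration: PURGE sent to the cache on port 80.
standard = purge_request("http://localhost:80", "/foo/bar")

# GET-based purging with a /purge prefix on a hypothetical cache host.
via_get = purge_request("http://cache.example:8080/purge", "/foo/bar", "GET")

print(standard)
print(via_get)
```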

Purge Threshold

Whenever PageSpeed serves an HTML response that is not fully optimized, it continues rewriting in the background. When it finishes, if the HTML it served was less than 95% optimized, it sends a purge request to the downstream cache. The next request to come in will bypass the cache and come back to PageSpeed, which can then serve the now more highly optimized page. If you want to change the point at which PageSpeed considers a page done and stops purging it, you can set a different value for this threshold. For example, to lower it to 80%, so that PageSpeed is satisfied with a page that is only 80% optimized, you would set:

Apache:
ModPagespeedDownstreamCacheRewrittenPercentageThreshold 80
Nginx:
pagespeed DownstreamCacheRewrittenPercentageThreshold 80;
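The effect of the threshold can be summarized in a few lines. This is only an illustration of the rule described above, with a hypothetical function name; how PageSpeed measures the optimized percentage is not specified here.

```python
def should_purge(rewritten_percentage: float, threshold: float = 95.0) -> bool:
    """Return True if a served page was optimized less than the threshold."""
    return rewritten_percentage < threshold


# A page served 90% optimized is purged under the default threshold,
# but is considered good enough once the threshold is lowered to 80.
with_default = should_purge(90)
with_lowered = should_purge(90, threshold=80)
print(with_default, with_lowered)
```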

Script Variables

Note: Nginx-only

Note: New feature as of 1.10.33.0

In ngx_pagespeed, the DownstreamCachePurgeLocationPrefix, DownstreamCachePurgeMethod, and DownstreamCacheRewrittenPercentageThreshold directives support script variables, so it's possible to set them on a per-request basis. Turn this on with:

http {
  pagespeed ProcessScriptVariables on;
  ...
}

You can then use script variables in the arguments to these commands:
  pagespeed DownstreamCachePurgeLocationPrefix "$purge_location";
  pagespeed DownstreamCachePurgeMethod "$cache_purge_method";
  pagespeed DownstreamCacheRewrittenPercentageThreshold "$rewrite_threshold";

For more details on script variables, including how to handle dollar signs, see Script Variable Support.

Implementation Details

To support downstream caching PageSpeed sends a purge request to the caching layer whenever it identifies an opportunity for more rewriting to be done on content that was just served. Such opportunities could arise because of, say, the resources now becoming available in the PageSpeed cache or an image compression operation completing. The cache purge forces the next request for the HTML file to come all the way to the backend PageSpeed server and obtain better rewritten content, which is then stored in the cache. This interaction between the PageSpeed server and the downstream caching layer is depicted in the diagram given below.

In the interaction depicted above, note that the partially optimized HTML will be served from the cache until the PageSpeed server sends a purge request. We recommend setting up PageSpeed and the downstream caching layer servers on a one-to-one basis so that purges can be sent to the correct downstream server.
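The purge cycle can also be followed in a toy model. The sketch below is an in-memory simulation with hypothetical class names, not PageSpeed or cache code: the origin serves a partially optimized page first, and once its background rewriting finishes it purges the cache so that the next request fetches the better version.

```python
class ToyCache:
    """A downstream cache that stores whatever the origin returns."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        # On a miss, go to the origin and keep the response.
        if path not in self.store:
            self.store[path] = self.origin.fetch(path, self)
        return self.store[path]

    def purge(self, path):
        self.store.pop(path, None)


class ToyOrigin:
    """Serves a partial rewrite first, then the finished one."""

    def __init__(self):
        self.finished = set()
        self.pending_purges = []

    def fetch(self, path, cache):
        if path in self.finished:
            return "fully optimized"
        # Remember which cache to purge once rewriting completes.
        self.pending_purges.append((cache, path))
        return "partially optimized"

    def finish_background_rewrites(self):
        for cache, path in self.pending_purges:
            self.finished.add(path)
            cache.purge(path)
        self.pending_purges.clear()


origin = ToyOrigin()
cache = ToyCache(origin)

first = cache.get("/index.html")    # miss: origin serves a partial rewrite
second = cache.get("/index.html")   # hit: the partial page is still cached
origin.finish_background_rewrites()  # rewriting done, purge is sent
third = cache.get("/index.html")    # miss again: better page is cached
print(first, second, third, sep=" | ")
```

Because the origin records which cache fetched the page, the purge reaches the right cache, which is the reason for the one-to-one recommendation above.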