Page Speed Optimization Libraries  1.13.35.1
url_to_filename_encoder.h File Reference
#include <cstddef>
#include "pagespeed/kernel/base/string.h"
#include "pagespeed/kernel/base/string_util.h"

Classes

class  net_instaweb::UrlToFilenameEncoder
 Helper class for converting a URL into a filename.
 

Namespaces

 net_instaweb
 Unit-test framework for wget fetcher.
 

Detailed Description

jmarantz@google.com (Joshua Marantz)

URL filename encoder goals:

  1. Allow URLs with arbitrary path-segment length, while keeping each generated filename (path component) to at most 128 characters.
  2. Provide somewhat human-readable filenames, for easier debugging.
  3. Provide reverse-mapping from filenames back to URLs.
  4. Be able to distinguish http://x from http://x/ from http://x/index.html. Those can all be different URLs.
  5. Be able to represent http://a/b/c and http://a/b/c/d, a pattern seen with Facebook Connect.

We need an escape character to represent characters that are legal in URL paths but not in filenames, such as '?'.

We can pick any legal character as an escape, as long as we escape it too. But since we want filenames that humans can correlate with URLs, we should pick one that rarely appears in URLs. Candidates include ~ ` ! # $ % ^ & ( ) - = _ + { } [ ] , and ., but we would prefer to avoid characters that are treated specially by tools such as shells or build tools. It turns out that ',' is neither frequent in URLs nor special to those tools, so we use it.
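The escape rule can be illustrated with a small character-level sketch. EscapeForFilename is a hypothetical helper, not part of the library's API, and the set of characters left unescaped (ASCII letters, digits, and _ . = + -) is an assumption chosen to match the examples on this page. Because ',' falls outside that set, the escape character escapes itself, which is what keeps the mapping reversible. Only bytes are handled here; path separators and the per-segment rules are not.

```cpp
#include <string>

// Hypothetical helper (not part of UrlToFilenameEncoder's documented API):
// escapes every byte outside an assumed safe set as ',' plus two uppercase
// hex digits. ',' itself is outside the set, so it is escaped too.
std::string EscapeForFilename(const std::string& in) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char ch : in) {
    bool safe = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
                (ch >= '0' && ch <= '9') || ch == '_' || ch == '.' ||
                ch == '=' || ch == '+' || ch == '-';
    if (safe) {
      out.push_back(static_cast<char>(ch));
    } else {
      out.push_back(',');
      out.push_back(kHex[ch >> 4]);    // high nibble
      out.push_back(kHex[ch & 0x0F]);  // low nibble
    }
  }
  return out;
}
```

Under these assumptions, u?foo=bar becomes u,3Ffoo=bar and a literal , becomes ,2C, so every ',' in this function's output starts an escape and can be decoded unambiguously.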

The escaping algorithm is:

  1. Escape every character that is unfriendly to filenames as ,XX, where XX is its two-digit hex code.
  2. Add a ',' at the end. (We do not allow ',' at the end of any directory name, so this ensures that, e.g., /a and /a/b can coexist in the filesystem.)
  3. Go through the path segment by segment (where a segment is one directory or leaf in the path):
     3a) If the segment is empty, escape the second slash. That is, if the URL was www.foo.com//a, then we escape the second / to get www.foo.com/,2Fa,
     3b) If the segment is "." or "..", prepend ',' (so that we have a non-empty, non-reserved filename).
     3c) If the segment is over 128 characters, break it up into smaller segments by inserting ,-/ (Windows and other operating systems limit the lengths of paths and filenames, which would otherwise restrict us).
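The escaping algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's UrlToFilenameEncoder implementation: the function and constant names (EncodePath, FlushSegment, AppendEscapedByte, kMaxComponent) are hypothetical, the safe-character set (ASCII letters, digits, and _ . = + -) is assumed, and the split point (126 characters plus ",-", so each component stays within 128) is chosen to reproduce the long-name example that follows. EncodePath takes only the URL path, starting with '/'; how a host such as www.foo.com is combined with it is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed constants (hypothetical names): ',' is the escape character, '-'
// marks an artificial split, and each path component is capped at 128 chars.
constexpr char kEscape = ',';
constexpr char kSplitMarker = '-';
constexpr std::size_t kMaxComponent = 128;

// Appends |ch| unchanged if it is in the assumed safe set, otherwise as ,XX
// with uppercase hex digits.
void AppendEscapedByte(unsigned char ch, std::string* out) {
  static const char kHex[] = "0123456789ABCDEF";
  bool safe = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
              (ch >= '0' && ch <= '9') || ch == '_' || ch == '.' ||
              ch == '=' || ch == '+' || ch == '-';
  if (safe) {
    out->push_back(static_cast<char>(ch));
  } else {
    out->push_back(kEscape);
    out->push_back(kHex[ch >> 4]);
    out->push_back(kHex[ch & 0x0F]);
  }
}

// Moves a non-empty escaped segment into |out|: "." and ".." get a ','
// prefix, and components longer than kMaxComponent are split with ",-/".
void FlushSegment(std::string* segment, std::string* out) {
  assert(!segment->empty());
  if (*segment == "." || *segment == "..") {
    out->push_back(kEscape);
    out->append(*segment);
  } else {
    while (segment->size() > kMaxComponent) {
      std::size_t cut = kMaxComponent - 2;  // leave room for ",-"
      // Hex digits are never ',', so a ',' in either of the last two kept
      // positions means a ,XX escape would straddle the cut; back off.
      if ((*segment)[cut - 1] == kEscape) {
        cut -= 1;
      } else if ((*segment)[cut - 2] == kEscape) {
        cut -= 2;
      }
      out->append(*segment, 0, cut);
      out->push_back(kEscape);
      out->push_back(kSplitMarker);
      out->push_back('/');
      segment->erase(0, cut);
    }
    out->append(*segment);
  }
  segment->clear();
}

// Encodes a URL path (assumed to start with '/') as a relative filename.
std::string EncodePath(const std::string& path) {
  std::string out;
  std::string segment;  // escaped bytes of the current segment
  std::size_t i = 0;
  if (!path.empty() && path[0] == '/') {
    out.push_back('/');  // the leading slash is kept as-is
    i = 1;
  }
  for (; i < path.size(); ++i) {
    unsigned char ch = static_cast<unsigned char>(path[i]);
    if (ch == '/' && !segment.empty()) {
      FlushSegment(&segment, &out);
      out.push_back('/');
    } else {
      // A '/' that would open an empty segment lands here and becomes ,2F.
      AppendEscapedByte(ch, &segment);
    }
  }
  segment.push_back(kEscape);  // terminating ','
  FlushSegment(&segment, &out);
  return out;
}
```

Appending the terminating ',' before the final flush means the last component is never a bare "." or "..", and no directory name ever ends in ',', which is what lets a file and a directory derived from the same URL prefix coexist.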

For example:

  URL                 File
  /                   /,
  /index.html         /index.html,
  /.                  /.,
  /a/b                /a/b,
  /a/b/               /a/b/,
  /a/b/c              /a/b/c,            (no prefix problem: /a/b/c/d can still use c as a directory)
  /u?foo=bar          /u,3Ffoo=bar,
  //                  /,2F,
  /./                 /,./,
  /../                /,../,
  /,                  /,2C,
  /,./                /,2C./,
  /very...longname/   /very...long,-/name/,   (if very...long is about 126 characters long)
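Goal 3, the reverse mapping from filenames back to URLs, can be sketched by undoing each rule. DecodeFilename and HexValue are hypothetical names, not the library's decoding API, and the sketch assumes input produced by the rules above with uppercase hex digits. Because ',' in a URL is always escaped, every ',' in an encoded filename begins one of four unambiguous forms: a ,XX escape, a ,-/ split, a ',' before "." or "..", or the final terminator.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: value of an uppercase hex digit, or -1.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical reverse mapping from an encoded filename back to the URL path.
// Returns std::nullopt for input the encoding rules could not have produced.
std::optional<std::string> DecodeFilename(const std::string& f) {
  std::string url;
  std::size_t i = 0;
  while (i < f.size()) {
    if (f[i] != ',') {
      url.push_back(f[i]);  // ordinary character, copied through
      ++i;
      continue;
    }
    if (i + 1 == f.size()) {
      return url;  // the terminating ',' added at the end of every filename
    }
    char next = f[i + 1];
    if (next == '-' && i + 2 < f.size() && f[i + 2] == '/') {
      i += 3;  // ",-/" split a long segment; the URL had no slash here
    } else if (next == '.') {
      i += 1;  // ',' prefixed to "." or ".."; keep the dots themselves
    } else if (i + 2 < f.size() && HexValue(next) >= 0 &&
               HexValue(f[i + 2]) >= 0) {
      url.push_back(static_cast<char>(HexValue(next) * 16 + HexValue(f[i + 2])));
      i += 3;  // ,XX hex escape
    } else {
      return std::nullopt;  // a ',' followed by anything else is malformed
    }
  }
  return std::nullopt;  // missing the terminating ','
}
```

For instance, DecodeFilename("/u,3Ffoo=bar,") yields /u?foo=bar, and DecodeFilename("/,./,") yields /./, matching the corresponding rows of the table.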