Page Speed Optimization Libraries  1.13.35.1
net_instaweb::HtmlKeywords Class Reference

#include "html_keywords.h"

Static Public Member Functions

static void Init ()
 
static void ShutDown ()
 
static const StringPiece * KeywordToString (HtmlName::Keyword keyword)
 Returns an HTML keyword as a string, or NULL if not a keyword.
 
static StringPiece Escape (const StringPiece &unescaped, GoogleString *buf)
 
static StringPiece Unescape (const StringPiece &escaped, GoogleString *buf, bool *decoding_error)
 
static bool IsAutoClose (HtmlName::Keyword k1, HtmlName::Keyword k2)
 
static bool IsContained (HtmlName::Keyword k1, HtmlName::Keyword k2)
 
static bool IsOptionallyClosedTag (HtmlName::Keyword keyword)
 
static bool WritePre (StringPiece text, StringPiece style, Writer *writer, MessageHandler *handler)
 

Detailed Description

Helper class for HtmlParser to recognize HTML keywords, handle escaping and unescaping, and assist the lexer in understanding how to interpret unbalanced tags.

Member Function Documentation

static StringPiece net_instaweb::HtmlKeywords::Escape ( const StringPiece &  unescaped,
GoogleString *  buf 
)
inlinestatic

Takes raw text and escapes it so that it is safe to use in an HTML attribute, e.g. a&b –> a&amp;b
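The escaping described above can be illustrated with a minimal, self-contained sketch. SketchEscape and SketchUnescape are hypothetical stand-ins written for this example, not the PageSpeed implementation: they use std::string instead of StringPiece and GoogleString, and they recognize only a handful of entities, whereas the library presumably covers a much larger table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for HtmlKeywords::Escape. Handles only four
// characters; the real function is assumed to cover many more.
std::string SketchEscape(const std::string& unescaped) {
  std::string out;
  for (char c : unescaped) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default: out += c;
    }
  }
  return out;
}

struct Entity {
  const char* text;
  char ch;
};

// Hypothetical stand-in for HtmlKeywords::Unescape. Unknown or malformed
// sequences (such as a bare '&') are copied through unchanged.
std::string SketchUnescape(const std::string& escaped) {
  static const Entity kEntities[] = {{"&amp;", '&'},  {"&lt;", '<'},
                                     {"&gt;", '>'},   {"&quot;", '"'},
                                     {"&#38;", '&'}};
  std::string out;
  std::size_t i = 0;
  while (i < escaped.size()) {
    bool matched = false;
    for (const Entity& e : kEntities) {
      const std::string entity(e.text);
      if (escaped.compare(i, entity.size(), entity) == 0) {
        out += e.ch;
        i += entity.size();
        matched = true;
        break;
      }
    }
    if (!matched) out += escaped[i++];
  }
  return out;
}
```

In this sketch, unescaping an escaped string returns the original text, while escaping an unescaped string need not return the original markup: both "&amp;" and "&#38;" decode to "&", but "&" always encodes to "&amp;".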

static void net_instaweb::HtmlKeywords::Init ( )
static

Initialize a singleton instance of this class. This call is inherently thread unsafe, but only the first time it is called. If multi-threaded programs call this function before spawning threads then there will be no races.
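The initialization order described above can be sketched as follows. KeywordTable is a hypothetical class that merely mimics the Init()/ShutDown() contract; it is not HtmlKeywords, and its internals are assumptions made for the example. The point is only the ordering: initialize on one thread, spawn workers, join them, then shut down.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in with the same Init()/ShutDown() contract.
class KeywordTable {
 public:
  // Unsynchronized check-then-create: racy if first called concurrently.
  static void Init() {
    if (singleton_ == nullptr) singleton_ = new KeywordTable;
  }
  // Unsynchronized teardown: must not overlap with any reader.
  static void ShutDown() {
    delete singleton_;
    singleton_ = nullptr;
  }
  static bool IsReady() { return singleton_ != nullptr; }

 private:
  static KeywordTable* singleton_;
};
KeywordTable* KeywordTable::singleton_ = nullptr;

// Initializes on the calling thread, then lets workers read the singleton.
// Returns how many workers observed an initialized table.
int CountReadyWorkers(int num_workers) {
  KeywordTable::Init();  // No other threads exist yet, so no race.
  std::vector<int> ready(num_workers, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    // Each worker writes only its own slot of the vector.
    workers.emplace_back(
        [&ready, i] { ready[i] = KeywordTable::IsReady() ? 1 : 0; });
  }
  for (std::thread& t : workers) t.join();
  KeywordTable::ShutDown();  // All workers joined; no readers remain.
  int count = 0;
  for (int r : ready) count += r;
  return count;
}
```

Calling CountReadyWorkers(4) should report that all four workers saw the initialized table, because thread creation happens after Init() has returned.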

static bool net_instaweb::HtmlKeywords::IsAutoClose ( HtmlName::Keyword  k1,
HtmlName::Keyword  k2 
)
inlinestatic

Note that Escape and Unescape are not guaranteed to be inverses of one another. For example, Unescape("&#38;")=="&", but Escape("&")=="&amp;". However, Unescape(Escape(s)) == s.

Another case to be wary of is when the argument to Unescape is not properly escaped: the string is then returned unmodified. For example, Unescape("a&b")=="a&b", but re-escaping that result gives "a&amp;b". Hence, the careful maintainer of an HTML parsing and rewriting system needs to retain the original escaped text parsed from HTML files, and pass that to browsers.

Determines whether an open tag of type k1 should be automatically closed if a StartElement for tag k2 is encountered. E.g. <tr><tbody> should be transformed to <tr></tr><tbody>.
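As a rough illustration of how a lexer might consult an IsAutoClose-style predicate when it sees a start tag, consider the sketch below. SketchIsAutoClose and EmitStartTags are hypothetical names, tags are plain strings rather than HtmlName::Keyword values, and the two table entries are illustrative assumptions, not the library's actual table.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HtmlKeywords::IsAutoClose, keyed on tag names.
// The pairs listed here are illustrative only.
bool SketchIsAutoClose(const std::string& open_tag,
                       const std::string& new_tag) {
  static const std::set<std::pair<std::string, std::string>> kAutoClose = {
      {"tr", "tbody"}, {"li", "li"}};
  return kAutoClose.count({open_tag, new_tag}) > 0;
}

// Serializes a sequence of start tags, closing the innermost open element
// whenever the incoming start tag auto-closes it.
std::string EmitStartTags(const std::vector<std::string>& start_tags) {
  std::vector<std::string> open;  // Stack of currently open elements.
  std::string out;
  for (const std::string& tag : start_tags) {
    while (!open.empty() && SketchIsAutoClose(open.back(), tag)) {
      out += "</" + open.back() + ">";
      open.pop_back();
    }
    out += "<" + tag + ">";
    open.push_back(tag);
  }
  return out;
}
```

With this table, a tr start tag followed by a tbody start tag is serialized with the tr closed before the tbody opens.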

static bool net_instaweb::HtmlKeywords::IsContained ( HtmlName::Keyword  k1,
HtmlName::Keyword  k2 
)
inlinestatic

Determines whether an open tag of type k1 should be automatically closed if an EndElement for tag k2 is encountered. E.g. <tbody></table> should be transformed into <tbody></tbody></table>.
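The end-tag case can be sketched in the same spirit. SketchIsContained and EmitEndTag are hypothetical names, and the containment pairs are assumptions chosen for the example rather than the library's real table.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HtmlKeywords::IsContained, keyed on tag names.
// The pairs listed here are illustrative only.
bool SketchIsContained(const std::string& open_tag,
                       const std::string& end_tag) {
  static const std::set<std::pair<std::string, std::string>> kContained = {
      {"tbody", "table"}, {"tr", "table"}, {"td", "table"}};
  return kContained.count({open_tag, end_tag}) > 0;
}

// Handles one end tag: closes every innermost open element that is contained
// in end_tag, then closes end_tag itself if it is now on top of the stack.
std::string EmitEndTag(std::vector<std::string>* open,
                       const std::string& end_tag) {
  std::string out;
  while (!open->empty() && open->back() != end_tag &&
         SketchIsContained(open->back(), end_tag)) {
    out += "</" + open->back() + ">";
    open->pop_back();
  }
  if (!open->empty() && open->back() == end_tag) {
    out += "</" + end_tag + ">";
    open->pop_back();
  }
  return out;
}
```

Given an open stack of table and tbody, an end tag for table first closes the tbody and then the table.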

static bool net_instaweb::HtmlKeywords::IsOptionallyClosedTag ( HtmlName::Keyword  keyword)
inlinestatic

Determines whether the specified HTML keyword is closed automatically by the parser if the close-tag is omitted. E.g. <head> must be closed, but formatting elements such as <p> do not need to be closed. Also note the distinction with tags which are implicitly closed in HTML, such as <img> and <br>.

static void net_instaweb::HtmlKeywords::ShutDown ( )
static

Tear down the singleton instance of this class, freeing any allocated memory. This call is inherently thread unsafe.

static StringPiece net_instaweb::HtmlKeywords::Unescape ( const StringPiece &  escaped,
GoogleString *  buf,
bool *  decoding_error 
)
inlinestatic

Take escaped text and unescape it so its value can be interpreted, e.g. "http://myhost.com/p?v&amp;w" –> "http://myhost.com/p?v&w"

*decoding_error is set to true if the escaped string could not be safely transformed into a simple stream of bytes.

Todo:
TODO(jmarantz): Support a variant where we unescape to UTF-8.

static bool net_instaweb::HtmlKeywords::WritePre ( StringPiece  text,
StringPiece  style,
Writer *  writer,
MessageHandler *  handler 
)
static

Wraps text in a pre-tag using the specified style arguments and sends it to writer, returning false if the writer failed. E.g. style could be "color:red;". If style is empty, the output is simply a pre-tag without attributes.
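The behavior described above might look roughly like the sketch below. StringSink, EscapeForHtml, and SketchWritePre are hypothetical names; the real Writer and MessageHandler interfaces are not reproduced, and whether the library escapes the text or the style value in exactly this way is an assumption of the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal writer that appends to an in-memory buffer.
class StringSink {
 public:
  bool Write(const std::string& s) {
    buffer_ += s;
    return true;  // A real writer could fail here.
  }
  const std::string& buffer() const { return buffer_; }

 private:
  std::string buffer_;
};

// Escapes the characters that would otherwise break out of the markup.
std::string EscapeForHtml(const std::string& text) {
  std::string out;
  for (char c : text) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default: out += c;
    }
  }
  return out;
}

// Writes text wrapped in a pre-tag, adding a style attribute only when
// style is non-empty. Returns false as soon as any write fails.
bool SketchWritePre(const std::string& text, const std::string& style,
                    StringSink* sink) {
  bool ok = sink->Write("<pre");
  if (ok && !style.empty()) {
    ok = sink->Write(" style=\"" + EscapeForHtml(style) + "\"");
  }
  ok = ok && sink->Write(">");
  ok = ok && sink->Write(EscapeForHtml(text));
  ok = ok && sink->Write("</pre>");
  return ok;
}
```

Passing an empty style produces a bare pre-tag, while a style such as "color:red;" is emitted as a style attribute on the opening tag.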


The documentation for this class was generated from the following file: