Page Speed Optimization Libraries  1.13.35.1
net_instaweb::HtmlLexer Class Reference

#include "html_lexer.h"

Public Member Functions

 HtmlLexer (HtmlParse *html_parse)
 
void StartParse (const StringPiece &id, const ContentType &content_type)
 Initialize a new parse session, id is only used for error messages.
 
void Parse (const char *text, int size)
 
void FinishParse ()
 Completes parse, reporting any leftover text as a final HtmlCharacterEvent.
 
bool IsImplicitlyClosedTag (HtmlName::Keyword keyword) const
 Determines whether a tag is implicitly closed in HTML, so that no explicit close tag is expected for it.
 
bool TagAllowsBriefTermination (HtmlName::Keyword keyword) const
 Determines whether a tag can be terminated briefly (e.g. <tag/>).
 
bool IsOptionallyClosedTag (HtmlName::Keyword keyword) const
 Determines whether it's OK to leave a tag unclosed.
 
void DebugPrintStack ()
 Print element stack to stdout (for debugging).
 
HtmlElement * Parent () const
 
const DocType & doctype () const
 
void set_size_limit (int64 x)
 Sets the limit on the maximum number of bytes that should be parsed.
 
bool size_limit_exceeded () const
 

Static Public Member Functions

static bool IsLiteralTag (HtmlName::Keyword keyword)
 
static bool IsSometimesLiteralTag (HtmlName::Keyword keyword)
 

Detailed Description

Constructs a re-entrant HTML lexer. This lexer minimally parses tags, attributes, and comments. It is intended to parse the Wild West of the Web. It's designed to be tolerant of syntactic transgressions, merely passing through unparseable chunks as Characters.
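
The pass-through behavior described above can be illustrated with a small sketch. This is not the HtmlLexer implementation: `ToyToken` and `ToyLex` are hypothetical names, and the rule used here (a `<` counts as a tag start only when it is followed by a letter or `/` and a closing `>` exists) is a simplifying assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration; not part of the PageSpeed API.
struct ToyToken {
  enum Kind { kTag, kCharacters } kind;
  std::string text;
};

// Splits the input into tags and character runs. Anything that does not
// look like a complete "<...>" is passed through unchanged as characters.
std::vector<ToyToken> ToyLex(const std::string& in) {
  std::vector<ToyToken> out;
  std::string chars;  // Character data collected since the last tag.
  auto flush = [&]() {
    if (!chars.empty()) {
      out.push_back({ToyToken::kCharacters, chars});
      chars.clear();
    }
  };
  std::size_t i = 0;
  while (i < in.size()) {
    const bool tag_start =
        in[i] == '<' && i + 1 < in.size() &&
        (std::isalpha(static_cast<unsigned char>(in[i + 1])) ||
         in[i + 1] == '/');
    if (tag_start) {
      const std::size_t close = in.find('>', i);
      if (close != std::string::npos) {
        flush();
        out.push_back({ToyToken::kTag, in.substr(i, close - i + 1)});
        i = close + 1;
        continue;
      }
    }
    chars += in[i++];  // Unparseable or plain text: keep it verbatim.
  }
  flush();
  return out;
}
```

With the input `a < b <p>x</p> <`, the stray `<` characters end up inside character tokens rather than causing an error, which is the tolerant behavior the description refers to.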

Todo:
TODO(jmarantz): refactor this with html_parse, so that this class owns the symbol table and the event queue, and no longer needs to mutually depend on HtmlParse. That will make it easier to unit-test.

Member Function Documentation

const DocType& net_instaweb::HtmlLexer::doctype ( ) const
inline

Return the current assumed doctype of the document (based on the content type and any HTML directives encountered so far).

static bool net_instaweb::HtmlLexer::IsLiteralTag ( HtmlName::Keyword  keyword)
static

Determines whether a tag should be interpreted as a 'literal' tag. That is, a tag whose contents are not parsed until a corresponding matching end tag is encountered.
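
As a rough illustration of what "not parsed until a matching end tag" means, the sketch below scans literal content for its close tag and ignores any markup-like text in between. `FindLiteralClose` is a hypothetical helper, not a PageSpeed function, and the tag name is passed in by the caller because this page does not list which keywords IsLiteralTag accepts.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the offset of "</name" inside body, matched
// case-insensitively, or std::string::npos if no close tag is present.
std::size_t FindLiteralClose(const std::string& body,
                             const std::string& name) {
  const std::string needle = "</" + name;
  for (std::size_t i = 0; i + needle.size() <= body.size(); ++i) {
    std::size_t j = 0;
    while (j < needle.size() &&
           std::tolower(static_cast<unsigned char>(body[i + j])) ==
               std::tolower(static_cast<unsigned char>(needle[j]))) {
      ++j;
    }
    if (j == needle.size()) return i;  // Everything before i is literal.
  }
  return std::string::npos;
}
```

Everything before the returned offset would be treated as raw text, so a `<b>` inside a script body never becomes an element.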

static bool net_instaweb::HtmlLexer::IsSometimesLiteralTag ( HtmlName::Keyword  keyword)
static

Determines whether a tag is interpreted as a 'literal' tag in some user agents. Since some user agents will interpret the contents of these tags, our lexer never treats them as literal tags.

HtmlElement* net_instaweb::HtmlLexer::Parent ( ) const

Returns the current lowest-level parent element in the element stack, or NULL if the stack is empty.
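
A minimal sketch of the contract described here, assuming the element stack is a simple vector of pointers. `ToyElement` and `ToyElementStack` are hypothetical stand-ins; the real HtmlElement and the lexer's internal stack are not shown on this page.

```cpp
#include <string>
#include <vector>

// Hypothetical element type for illustration only.
struct ToyElement {
  std::string name;
};

class ToyElementStack {
 public:
  void Open(ToyElement* element) { stack_.push_back(element); }
  void Close() {
    if (!stack_.empty()) stack_.pop_back();
  }
  // Mirrors the documented behavior: innermost open element, or null
  // when no element is open.
  ToyElement* Parent() const {
    return stack_.empty() ? nullptr : stack_.back();
  }

 private:
  std::vector<ToyElement*> stack_;  // Not owned.
};
```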

void net_instaweb::HtmlLexer::Parse ( const char * text, int size )

Parse a chunk of text, adding events to the parser by calling html_parse_->AddEvent(...).
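
The sketch below shows one way chunked input could be handled so that a tag split across two calls is still emitted as a single event. It is a toy under stated assumptions, not the HtmlLexer code: `ToyChunkLexer` is a hypothetical class, events are plain strings collected in a vector instead of being passed to html_parse_->AddEvent(...), and character data is reported once per chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the lexer; not the PageSpeed class.
class ToyChunkLexer {
 public:
  // Consumes one chunk. An incomplete tag stays buffered until its
  // closing '>' arrives in a later call.
  void Parse(const char* text, int size) {
    assert(size >= 0);
    pending_.append(text, static_cast<std::size_t>(size));
    std::size_t pos = 0;
    while (pos < pending_.size()) {
      std::size_t lt = pending_.find('<', pos);
      if (lt == std::string::npos) lt = pending_.size();
      if (lt > pos) {
        events_.push_back("chars:" + pending_.substr(pos, lt - pos));
      }
      pos = lt;
      if (pos == pending_.size()) break;
      const std::size_t gt = pending_.find('>', pos);
      if (gt == std::string::npos) break;  // Incomplete tag: keep buffered.
      events_.push_back("tag:" + pending_.substr(pos, gt - pos + 1));
      pos = gt + 1;
    }
    pending_.erase(0, pos);
  }

  // Reports any leftover text as a final character event.
  void Finish() {
    if (!pending_.empty()) events_.push_back("chars:" + pending_);
    pending_.clear();
  }

  const std::vector<std::string>& events() const { return events_; }

 private:
  std::string pending_;              // Unconsumed tail of earlier chunks.
  std::vector<std::string> events_;  // Stand-in for the event queue.
};
```

Feeding `<di`, `v>hi</d`, and `iv>` in three calls yields the same tag events as feeding `<div>hi</div>` at once, which is the property a caller streaming network data relies on.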

bool net_instaweb::HtmlLexer::size_limit_exceeded ( ) const
inline

Indicates whether we have exceeded the limit on the maximum number of bytes that we should parse.
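
A minimal sketch of how such a byte budget could be tracked across Parse calls. `ToyByteBudget` and its `Consume` method are hypothetical; treating a negative limit as "no limit" is an assumption of this sketch, and what the lexer does with input beyond the limit is not stated on this page.

```cpp
#include <cstdint>

// Hypothetical byte counter for illustration; not the PageSpeed class.
class ToyByteBudget {
 public:
  void set_size_limit(std::int64_t limit) { limit_ = limit; }

  // Records that `size` more bytes were handed to the parser.
  void Consume(int size) {
    seen_ += size;
    if (limit_ >= 0 && seen_ > limit_) exceeded_ = true;
  }

  bool size_limit_exceeded() const { return exceeded_; }

 private:
  std::int64_t limit_ = -1;  // Assumption: negative means unlimited.
  std::int64_t seen_ = 0;
  bool exceeded_ = false;
};
```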


The documentation for this class was generated from the following file: