#include "html_lexer.h"

Public Member Functions
	HtmlLexer (HtmlParse *html_parse)

void	StartParse (const StringPiece &id, const ContentType &content_type)
	Initializes a new parse session. The id is only used in error messages.

void	Parse (const char *text, int size)

void	FinishParse ()
	Completes parse, reporting any leftover text as a final HtmlCharacterEvent.

bool	IsImplicitlyClosedTag (HtmlName::Keyword keyword) const
	Determines whether a tag is implicitly closed in HTML (e.g. <meta>), meaning no matching close tag is expected.

bool	TagAllowsBriefTermination (HtmlName::Keyword keyword) const
	Determines whether a tag can be terminated briefly (e.g. <tag/>).

bool	IsOptionallyClosedTag (HtmlName::Keyword keyword) const
	Determines whether it's OK to leave a tag unclosed.

void	DebugPrintStack ()
	Print element stack to stdout (for debugging).

HtmlElement *	Parent () const

const DocType &	doctype () const

void	set_size_limit (int64 x)
	Sets the limit on the maximum number of bytes that should be parsed.

bool	size_limit_exceeded () const

Static Public Member Functions
static bool	IsLiteralTag (HtmlName::Keyword keyword)

static bool	IsSometimesLiteralTag (HtmlName::Keyword keyword)

Detailed Description

Constructs a re-entrant HTML lexer. This lexer minimally parses tags, attributes, and comments. It is intended to parse the Wild West of the Web. It's designed to be tolerant of syntactic transgressions, merely passing through unparseable chunks as Characters.

Todo: TODO(jmarantz): refactor this with html_parse, so that this class owns the symbol table and the event queue, and no longer needs to mutually depend on HtmlParse. That will make it easier to unit-test.

Member Function Documentation

const DocType& net_instaweb::HtmlLexer::doctype ( ) const

inline

Returns the current assumed doctype of the document (based on the content type and any HTML directives encountered so far).

static bool net_instaweb::HtmlLexer::IsLiteralTag ( HtmlName::Keyword keyword )

static

Determines whether a tag should be interpreted as a 'literal' tag. That is, a tag whose contents are not parsed until a corresponding matching end tag is encountered.
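The idea of a literal tag can be illustrated with a small standalone sketch. It does not use the real lexer: FindLiteralEnd is a hypothetical helper, and the case-insensitive search for "</name" is an assumption about how a matching end tag might be located, not a description of HtmlLexer's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of net_instaweb::HtmlLexer.
// Returns the index of the first "</name" at or after `from`, compared
// case-insensitively, or std::string::npos if there is none. Everything
// between `from` and that index would be passed through unparsed.
std::string::size_type FindLiteralEnd(const std::string& html,
                                      std::string::size_type from,
                                      const std::string& name) {
  assert(!name.empty());
  const std::string needle = "</" + name;
  for (std::string::size_type i = from; i + needle.size() <= html.size(); ++i) {
    std::string::size_type j = 0;
    while (j < needle.size() &&
           std::tolower(static_cast<unsigned char>(html[i + j])) ==
               std::tolower(static_cast<unsigned char>(needle[j]))) {
      ++j;
    }
    if (j == needle.size()) return i;  // Found the matching end tag.
  }
  return std::string::npos;  // Literal section is unterminated.
}
```

In this sketch, markup-like text inside the literal section (such as the "<p>" in a script string) is skipped over rather than tokenized, because only the end tag for the named element is searched for.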

static bool net_instaweb::HtmlLexer::IsSometimesLiteralTag ( HtmlName::Keyword keyword )

static

Determines whether a tag is interpreted as a 'literal' tag in some user agents. Since some user agents will interpret the contents of these tags, our lexer never treats them as literal tags.

HtmlElement* net_instaweb::HtmlLexer::Parent ( ) const

Returns the current lowest-level parent element in the element stack, or NULL if the stack is empty.
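The stack behaviour described above can be sketched in isolation. ToyElement and ToyElementStack are hypothetical stand-ins, not the real HtmlElement or the lexer's internal stack; the sketch only shows the "innermost open element, or null when empty" contract.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for HtmlElement; only carries a tag name.
struct ToyElement {
  std::string name;
};

// Hypothetical stand-in for the lexer's element stack.
class ToyElementStack {
 public:
  // Called when an open tag is seen. The stack does not own the element.
  void Open(ToyElement* element) {
    assert(element != nullptr);
    stack_.push_back(element);
  }

  // Called when the innermost element is closed. No-op on an empty stack.
  void Close() {
    if (!stack_.empty()) stack_.pop_back();
  }

  // Innermost open element, or nullptr if nothing is open.
  ToyElement* Parent() const {
    return stack_.empty() ? nullptr : stack_.back();
  }

 private:
  std::vector<ToyElement*> stack_;
};
```

A caller following this contract has to check the result for null before dereferencing it, since text can appear before any element has been opened.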

void net_instaweb::HtmlLexer::Parse ( const char * text, int size )

Parse a chunk of text, adding events to the parser by calling html_parse_->AddEvent(...).
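The StartParse / Parse / FinishParse lifecycle can be illustrated with a toy model. ToyChunkLexer below is a hypothetical stand-in, not the real class: it records events as strings instead of calling html_parse_->AddEvent(...), it recognizes only "<...>" as a tag, and its buffering strategy is an assumption made for the sketch. What it does mirror from the documentation above is that text arrives in chunks and that leftover text is reported as characters when the parse finishes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT net_instaweb::HtmlLexer.
class ToyChunkLexer {
 public:
  // Begins a session. The id is kept only for use in error messages.
  void StartParse(const std::string& id) {
    id_ = id;
    pending_.clear();
    events_.clear();
    in_session_ = true;
  }

  // Accepts one chunk. A tag split across chunks stays buffered.
  void Parse(const char* text, int size) {
    assert(in_session_ && size >= 0);
    pending_.append(text, static_cast<std::string::size_type>(size));
    Drain();
  }

  // Ends the session, reporting any leftover text as characters.
  void FinishParse() {
    assert(in_session_);
    if (!pending_.empty()) {
      events_.push_back("chars:" + pending_);
      pending_.clear();
    }
    in_session_ = false;
  }

  const std::string& id() const { return id_; }
  const std::vector<std::string>& events() const { return events_; }

 private:
  // Emits every complete token currently in the buffer.
  void Drain() {
    for (;;) {
      const std::string::size_type open = pending_.find('<');
      if (open == std::string::npos) {
        // No tag start in the buffer: it is all plain text.
        if (!pending_.empty()) events_.push_back("chars:" + pending_);
        pending_.clear();
        return;
      }
      if (open > 0) {
        events_.push_back("chars:" + pending_.substr(0, open));
        pending_.erase(0, open);
      }
      const std::string::size_type close = pending_.find('>');
      if (close == std::string::npos) return;  // Wait for the next chunk.
      events_.push_back("tag:" + pending_.substr(0, close + 1));
      pending_.erase(0, close + 1);
    }
  }

  std::string id_;
  std::string pending_;
  std::vector<std::string> events_;
  bool in_session_ = false;
};
```

Feeding this toy the chunks "<p>hi<" and "b>x" produces a tag, characters, a tag, and characters, because the split "<b>" is held back until its ">" arrives. Feeding it "a<b" and then finishing reports the dangling "<b" as characters rather than dropping it.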

bool net_instaweb::HtmlLexer::size_limit_exceeded ( ) const

inline

Indicates whether we have exceeded the limit on the maximum number of bytes that we should parse.
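A minimal sketch of how such a byte limit could be tracked across chunks follows. ByteBudget and Consume are hypothetical names, and two behaviours here are assumptions rather than documented facts: a non-positive limit is treated as "no limit", and the exceeded flag stays set once tripped. What the real lexer does with input after the limit is exceeded is not covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lexer's size accounting.
class ByteBudget {
 public:
  // Assumption: a limit <= 0 means "unlimited".
  void set_size_limit(std::int64_t limit) { size_limit_ = limit; }

  // Records one chunk of `size` bytes, as each Parse() call would.
  void Consume(int size) {
    assert(size >= 0);
    bytes_seen_ += size;
    if (size_limit_ > 0 && bytes_seen_ > size_limit_) {
      exceeded_ = true;  // Assumption: the flag is sticky.
    }
  }

  bool size_limit_exceeded() const { return exceeded_; }

 private:
  std::int64_t size_limit_ = -1;
  std::int64_t bytes_seen_ = 0;
  bool exceeded_ = false;
};
```

Because the count is cumulative, a limit of 10 bytes is not exceeded by a first 6-byte chunk but is exceeded once a second 6-byte chunk arrives.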

The documentation for this class was generated from the following file:

pagespeed/kernel/html/html_lexer.h
