The HTML syntax {#syntax}
==========================

This section only describes the rules for resources labeled with an [=HTML MIME type=]. Rules for XML resources are discussed in the section below entitled "[=The XML syntax=]".

## Writing HTML documents ## {#writing-html-documents}

*This section only applies to documents, authoring tools, and markup generators. In particular, it does not apply to conformance checkers; conformance checkers must use the requirements given in the next section ("parsing HTML documents").*

Documents must consist of the following parts, in the given order:

1. Optionally, a single U+FEFF BYTE ORDER MARK (BOM) character.
2. Any number of [=comments=] and [=space characters=].
3. A [=DOCTYPE=].
4. Any number of [=comments=] and [=space characters=].
5. The [=document element=], in the form of an <{html}> element.
6. Any number of [=comments=] and [=space characters=].

The various types of content mentioned above are described in the next few sections.

In addition, there are some restrictions on how [=character encoding declarations=] are to be serialized, as discussed in the section on that topic.

Space characters before the <{html}> element, and space characters at the start of the <{html}> element and before the <{head}> element, will be dropped when the document is parsed; space characters *after* the <{html}> element will be parsed as if they were at the end of the <{body}> element. Thus, space characters around the [=document element=] do not round-trip. It is suggested that newlines be inserted after the DOCTYPE, after any comments that are before the [=document element=], after the <{html}> element's start tag (if it is not [=omitted=]), and after any comments that are inside the <{html}> element but before the <{head}> element.

Many strings in the HTML syntax (e.g., the names of elements and their attributes) are case-insensitive, but only for [=uppercase ASCII letters=] and [=lowercase ASCII letters=]. For convenience, in this section this is just referred to as "case-insensitive".

### The DOCTYPE ### {#the-doctype}

A DOCTYPE is a required preamble.

DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.

A DOCTYPE must consist of the following components, in this order:

1. A string that is an [=ASCII case-insensitive=] match for the string "`<!DOCTYPE`".
2. One or more [=space characters=].
3. A string that is an [=ASCII case-insensitive=] match for the string "`html`".
4. Optionally, a [=DOCTYPE legacy string=].
5. Zero or more [=space characters=].
6. A U+003E GREATER-THAN SIGN character (>).

In other words, <!DOCTYPE html>, case-insensitively.

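For the purposes of HTML generators that cannot output HTML markup with the short DOCTYPE "<!DOCTYPE html>", a DOCTYPE legacy string may be inserted into the DOCTYPE (in the position defined above). This string must consist of:

1. One or more [=space characters=].
2. A string that is an [=ASCII case-insensitive=] match for the string "`SYSTEM`".
3. One or more [=space characters=].
4. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the *quote mark*).
5. The literal string "`about:legacy-compat`".
6. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e., the same character as in the earlier step labeled *quote mark*).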

In other words, <!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM 'about:legacy-compat'>, case-insensitively except for the part in single or double quotes.
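The DOCTYPE components listed above can be checked mechanically. The following is a minimal Python sketch, not part of this specification: the names `SPACE`, `DOCTYPE_RE`, and `is_conforming_doctype` are hypothetical, and the pattern assumes the legacy string takes the quoted `about:legacy-compat` form shown in the preceding paragraph.

```python
import re

# Hypothetical helper names; this mirrors the DOCTYPE components described above.
SPACE = r"[\t\n\f\r ]"  # the HTML space characters: tab, LF, FF, CR, space

DOCTYPE_RE = re.compile(
    rf"<!DOCTYPE{SPACE}+html"
    # Optional legacy string: SYSTEM, then the same quote mark on both sides.
    # (?-i:...) keeps the quoted part case-sensitive.
    rf"(?:{SPACE}+SYSTEM{SPACE}+(?P<quote>[\"'])(?-i:about:legacy-compat)(?P=quote))?"
    rf"{SPACE}*>",
    re.IGNORECASE,
)


def is_conforming_doctype(markup: str) -> bool:
    """Return True if `markup` is exactly one DOCTYPE in the syntax above."""
    return DOCTYPE_RE.fullmatch(markup) is not None
```

For instance, `is_conforming_doctype("<!doctype HTML >")` is accepted, while a legacy string with mismatched quote marks or an uppercase `ABOUT:LEGACY-COMPAT` is rejected.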

The [=DOCTYPE legacy string=] should not be used unless the document is generated from a system that cannot output the shorter string.

### Elements ### {#writing-html-documents-elements}

There are six different kinds of elements: [=void elements=], the <{template}> elements, [=raw text elements=], [=escapable raw text elements=], [=foreign elements=], and [=normal elements=].

: Void elements
:: <{area}>, <{base}>, <{br}>, <{col}>, <{embed}>, <{hr}>, <{img}>, <{input}>, <{link}>, <{meta}>, <{param}>, <{source}>, <{track}>, <{wbr}>
: The <{template}> elements
:: <{template}>
: Raw text elements
:: <{script}>, <{style}>
: Escapable raw text elements
:: <{textarea}>, <{title}>
: Foreign elements
:: Elements from the [=MathML namespace=] and the [=SVG namespace=].
: Normal elements
:: All other allowed [=HTML elements=] are normal elements.

Tags are used to delimit the start and end of elements in the markup. [=Raw text=], [=escapable raw text=], and [=normal elements=] have a [=start tag=] to indicate where they begin, and an [=end tag=] to indicate where they end. The start and end tags of certain [=normal elements=] can be omitted, as described in the section on [=omitted|optional tags=]. Those that cannot be omitted must not be omitted. [=Void elements=] only have a start tag; end tags must not be specified for [=void elements=]. [=Foreign elements=] must either have a start tag and an end tag, or a start tag that is marked as self-closing, in which case they must not have an end tag.

The contents of the element must be placed between just after the start tag (which [=omitted|might be implied, in certain cases=]) and just before the end tag (which again, [=omitted|might be implied, in certain cases=]). The exact allowed contents of each individual element depend on the [=content model=] of that element, as described earlier in this specification. Elements must not contain content that their content model disallows. In addition to the restrictions placed on the contents by those content models, however, the six kinds of elements have additional *syntactic* requirements.

[=Void elements=] can't have any contents (since there's no end tag, no content can be put between the start tag and the end tag).
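The classification of elements into kinds can be expressed as a simple lookup. This is an illustrative Python sketch only: `element_kind` and its `foreign` flag are hypothetical names, and whether an element is foreign is assumed to be known to the caller (it depends on the element's namespace, which a tag name alone cannot tell).

```python
# Element names taken from the kinds listed above; everything else is "normal".
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})
ESCAPABLE_RAW_TEXT_ELEMENTS = frozenset({"textarea", "title"})


def element_kind(tag_name: str, foreign: bool = False) -> str:
    """Classify an element by tag name (assumed to be ASCII).

    `foreign` is an assumption of this sketch: the caller says whether the
    element is in the MathML or SVG namespace.
    """
    if foreign:
        return "foreign"
    name = tag_name.lower()  # tag names are case-insensitive in the HTML syntax
    if name in VOID_ELEMENTS:
        return "void"
    if name == "template":
        return "template"
    if name in RAW_TEXT_ELEMENTS:
        return "raw text"
    if name in ESCAPABLE_RAW_TEXT_ELEMENTS:
        return "escapable raw text"
    return "normal"
```

A serializer could use the result to decide, for example, that a `"void"` element gets no end tag.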

The <{template}> element can have template contents, but such template contents are not children of the <{template}> element itself. Instead, they are stored in a {{DocumentFragment}} associated with a different {{Document}} (one without a [=browsing context=]) so as to avoid the <{template}> contents interfering with the main {{Document}}. The markup for the template contents of a <{template}> element is placed just after the <{template}> element's start tag and just before the <{template}> element's end tag (as with other elements), and may consist of any [=text=], [=character references=], [=kind of element|elements=], and [=comments=], but the text must not contain the character U+003C LESS-THAN SIGN (<) or an [=ambiguous ampersand=].

[=Raw text elements=] can have [=text=], though it has [[#restrictions-on-the-contents-of-raw-text-and-escapable-raw-text-elements|restrictions]] described below.

[=Escapable raw text elements=] can have [=text=] and [=character references=], but the text must not contain an [=ambiguous ampersand=]. There are also [[#restrictions-on-the-contents-of-raw-text-and-escapable-raw-text-elements|further restrictions]] described below.

[=Foreign elements=] whose start tag is marked as self-closing can't have any contents (since, again, there's no end tag, so no content can be put between the start tag and the end tag). [=Foreign elements=] whose start tag is *not* marked as self-closing can have [=text=], [=character references=], [=CDATA sections=], other [=kind of element|elements=], and [=comments=], but the text must not contain the character U+003C LESS-THAN SIGN (<) or an [=ambiguous ampersand=].

The HTML syntax does not support namespace declarations, even in [=foreign elements=].

For instance, consider the following HTML fragment:

      <p>
        <svg>
          <metadata>
            <!-- this is an error -->
            <cdr:license xmlns:cdr="https://www.example.com/cdr/metadata" name="MIT"/>
          </metadata>
        </svg>
      </p>

The innermost element, cdr:license, is actually in the [=SVG namespace=], as the "xmlns:cdr" attribute has no effect (unlike in XML). In fact, as the comment in the fragment above says, the fragment is actually non-conforming. This is because the SVG specification does not define any elements called "cdr:license" in the [=SVG namespace=].

[=Normal elements=] can have [=text=], [=character references=], other [=kind of element|elements=], and [=comments=], but the text must not contain the character U+003C LESS-THAN SIGN (<) or an [=ambiguous ampersand=]. Some [=normal elements=] also have [[#restrictions-on-content-models|yet more restrictions]] on what content they are allowed to hold, beyond the restrictions imposed by the content model and those described in this paragraph. Those restrictions are described below.

Tags contain a tag name, giving the element's name. HTML elements all have names that only use [=alphanumeric ASCII characters=]. In the HTML syntax, tag names, even those for [=foreign elements=], may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.

#### Start tags #### {#start-tags}

Start tags must have the following format:

1. The first character of a start tag must be a U+003C LESS-THAN SIGN character (<).
2. The next few characters of a start tag must be the element's [=tag name=].
3. If there are to be any attributes in the next step, there must first be one or more [=space characters=].
4. Then, the start tag may have a number of attributes, the [=attribute|syntax for which=] is described below. Attributes must be separated from each other by one or more [=space characters=].
5. After the attributes, or after the [=tag name=] if there are no attributes, there may be one or more [=space characters=]. (Some attributes are required to be followed by a space. See [[#elements-attributes]] below.)
6. Then, if the element is one of the [=void elements=], or if the element is a [=foreign element=], then there may be a single U+002F SOLIDUS character (/). This character has no effect on [=void elements=], but on [=foreign elements=] it marks the start tag as self-closing.
7. Finally, start tags must be closed by a U+003E GREATER-THAN SIGN character (>).

#### End tags #### {#end-tags}

End tags must have the following format:

1. The first character of an end tag must be a U+003C LESS-THAN SIGN character (<).
2. The second character of an end tag must be a U+002F SOLIDUS character (/).
3. The next few characters of an end tag must be the element's [=tag name=].
4. After the tag name, there may be one or more [=space characters=].
5. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).

#### Attributes #### {#elements-attributes}

Attributes for an element are expressed inside the element's start tag.

Attributes have a name and a value. Attribute names must consist of one or more characters other than the [=space characters=], U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the [=control characters=], and any characters that are not defined by Unicode. In the HTML syntax, attribute names, even those for [=foreign elements=], may be written with any mix of lower- and uppercase letters that are an [=ASCII case-insensitive=] match for the attribute's name.

Attribute values are a mixture of [=text=] and [=character references=], except with the additional restriction that the text cannot contain an [=ambiguous ampersand=].

Attributes can be specified in four different ways:

: Empty attribute syntax
:: Just the [=attribute name=]. The value is implicitly the empty string.

In the following example, the <{input/disabled}> attribute is given with the empty attribute syntax: <input disabled>

If an attribute using the empty attribute syntax is to be followed by another attribute, then there must be a [=space character=] separating the two.

: Unquoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by the [=attribute value=], which, in addition to the requirements given above for attribute values, must not contain any literal [=space characters=], any U+0022 QUOTATION MARK characters ("), U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN characters (=), U+003C LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN characters (>), or U+0060 GRAVE ACCENT characters (`), and must not be the empty string.

In the following example, the <{input/value}> attribute is given with the unquoted attribute value syntax: <input value=yes>

If an attribute using the unquoted attribute syntax is to be followed by another attribute or by the optional U+002F SOLIDUS character (/) allowed in step 6 of the [=start tag=] syntax above, then there must be a [=space character=] separating the two.

: Single-quoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by a single U+0027 APOSTROPHE character ('), followed by the [=attribute value=], which, in addition to the requirements given above for attribute values, must not contain any literal U+0027 APOSTROPHE characters ('), and finally followed by a second single U+0027 APOSTROPHE character (').

In the following example, the <{input/type}> attribute is given with the single-quoted attribute value syntax: <input type='checkbox'>

If an attribute using the single-quoted attribute syntax is to be followed by another attribute, then there must be a [=space character=] separating the two.

: Double-quoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by a single U+0022 QUOTATION MARK character ("), followed by the [=attribute value=], which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single U+0022 QUOTATION MARK character (").

In the following example, the <{input/name}> attribute is given with the double-quoted attribute value syntax: <input name="be good">

If an attribute using the double-quoted attribute syntax is to be followed by another attribute, then there must be a [=space character=] separating the two.

There must never be two or more attributes on the same start tag whose names are an [=ASCII case-insensitive=] match for each other.

---

When a [=foreign element=] has one of the namespaced attributes given by the local name and namespace of the first and second cells of a row from the following table, it must be written using the name given by the third cell from the same row.

| Local name | Namespace | Attribute name |
|---|---|---|
| `actuate` | [=XLink namespace=] | <{xlink/actuate\|xlink:actuate}> |
| `arcrole` | [=XLink namespace=] | <{xlink/arcrole\|xlink:arcrole}> |
| `href` | [=XLink namespace=] | <{xlink/href\|xlink:href}> |
| `role` | [=XLink namespace=] | <{xlink/role\|xlink:role}> |
| `show` | [=XLink namespace=] | <{xlink/show\|xlink:show}> |
| `title` | [=XLink namespace=] | <{xlink/title\|xlink:title}> |
| `type` | [=XLink namespace=] | <{xlink/type\|xlink:type}> |
| `lang` | [=XML namespace=] | <{xml/lang\|xml:lang}> |
| `space` | [=XML namespace=] | <{xml/space\|xml:space}> |
| `xmlns` | [=XMLNS namespace=] | <{xmlns/xmlns}> |
| `xlink` | [=XMLNS namespace=] | <{xlink/xlink\|xmlns:xlink}> |

No other namespaced attribute can be expressed in [[#syntax|the HTML syntax]].

Whether the attributes in the table above are conforming or not is defined by other specifications (e.g., the SVG and MathML specifications); this section only describes the syntax rules if the attributes are serialized using the HTML syntax.
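Returning to the four attribute syntaxes described earlier in this section, a markup generator has to pick one of them for each attribute. The following Python sketch shows one possible policy; it is an assumption of this example, not a requirement of this specification. The function name `serialize_attribute` is hypothetical, every `&` is conservatively escaped as `&amp;` so that no [=ambiguous ampersand=] can arise, and attribute-name validation is left out.

```python
import re

# Characters that may not appear literally in an unquoted attribute value:
# space characters, quotation mark, apostrophe, equals sign, less-than sign,
# greater-than sign, and grave accent.
_UNQUOTED_FORBIDDEN = re.compile(r"[\t\n\f\r \"'=<>`]")


def serialize_attribute(name: str, value: str) -> str:
    """Serialize one attribute using the tersest syntax the value allows."""
    if value == "":
        return name  # empty attribute syntax: the value is implicitly ""
    escaped = value.replace("&", "&amp;")  # conservative: escape every ampersand
    if not _UNQUOTED_FORBIDDEN.search(escaped):
        return f"{name}={escaped}"  # unquoted attribute value syntax
    if '"' not in escaped:
        return f'{name}="{escaped}"'  # double-quoted attribute value syntax
    if "'" not in escaped:
        return f"{name}='{escaped}'"  # single-quoted attribute value syntax
    # Both quote characters occur: double-quote and escape the quotation marks.
    return '{}="{}"'.format(name, escaped.replace('"', "&quot;"))
```

The caller is still responsible for separating consecutive attributes (and an unquoted value from a trailing solidus) with a space character.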

#### Optional tags #### {#optional-tags}

Certain tags can be omitted.

Omitting an element's [=start tag=] in the situations described below does not mean the element is not present; it is implied, but it is still there. For example, an HTML document always has a root <{html}> element, even if the string <html> doesn't appear anywhere in the markup.

An <{html}> element's [=start tag=] may be omitted if the first thing inside the <{html}> element is not a [=comment=].

For example, in the following case it's ok to remove the "<html>" tag:

      <!DOCTYPE html>
      <html>
        <head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

Doing so would make the document look like this:

      <!DOCTYPE html>

        <head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

This has the exact same DOM. In particular, note that white space around the [=document element=] is ignored by the parser. The following example would also have the exact same DOM:

      <!DOCTYPE html><head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

However, in the following example, removing the start tag moves the comment to before the <{html}> element:

      <!DOCTYPE html>
      <html>
        <!-- where is this comment in the DOM? -->
        <head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

With the tag removed, the document actually turns into the same as this:

      <!DOCTYPE html>
      <!-- where is this comment in the DOM? -->
      <html>
        <head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

This is why the tag can only be removed if it is not followed by a comment: removing the tag when there is a comment there changes the document's resulting parse tree. Of course, if the position of the comment does not matter, then the tag can be omitted, as if the comment had been moved to before the start tag in the first place.

An <{html}> element's [=end tag=] may be omitted if the <{html}> element is not immediately followed by a [=comment=].

A <{head}> element's [=start tag=] may be omitted if the element is empty, or if the first thing inside the <{head}> element is an element.

A <{head}> element's [=end tag=] may be omitted if the <{head}> element is not immediately followed by a [=space character=] or a [=comment=].

A <{body}> element's [=start tag=] may be omitted if the element is empty, or if the first thing inside the <{body}> element is not a [=space character=] or a [=comment=], except if the first thing inside the <{body}> element is a <{meta}>, <{link}>, <{script}>, <{style}>, or <{template}> element.

A <{body}> element's [=end tag=] may be omitted if the <{body}> element is not immediately followed by a [=comment=].

Note that in the example above, the <{head}> element start and end tags, and the <{body}> element start tag, can't be omitted, because they are surrounded by white space:

      <!DOCTYPE html>
      <html>
        <head>
          <title>Hello</title>
        </head>
        <body>
          <p>Welcome to this example.</p>
        </body>
      </html>

(The <{body}> and <{html}> element end tags could be omitted without trouble; any spaces after those get parsed into the <{body}> element anyway.)

Usually, however, white space isn't an issue. If we first remove the white space we don't care about:

      <!DOCTYPE html><html><head><title>Hello</title></head><body><p>Welcome to this example.</p></body></html>

Then we can omit a number of tags without affecting the DOM:

      <!DOCTYPE html><title>Hello</title><p>Welcome to this example.</p>

At that point, we can also add some white space back:

      <!DOCTYPE html>
      <title>Hello</title>
      <p>Welcome to this example.</p>

This would be equivalent to this document, with the omitted tags shown in their parser-implied positions; the only white space text node that results from this is the newline at the end of the <{head}> element:

      <!DOCTYPE html>
      <html><head><title>Hello</title>
      </head><body><p>Welcome to this example.</p></body></html>

An <{li}> element's [=end tag=] may be omitted if the <{li}> element is immediately followed by another <{li}> element or if there is no more content in the parent element.

A <{dt}> element's [=end tag=] may be omitted if the <{dt}> element is immediately followed by another <{dt}> element or a <{dd}> element.

A <{dd}> element's [=end tag=] may be omitted if the <{dd}> element is immediately followed by another <{dd}> element or a <{dt}> element, or if there is no more content in the parent element.

A <{p}> element's [=end tag=] may be omitted if the <{p}> element is immediately followed by an <{address}>, <{article}>, <{aside}>, <{blockquote}>, <{details}>, <{div}>, <{dl}>, <{fieldset}>, <{figcaption}>, <{figure}>, <{footer}>, <{form}>, <{h1}>, <{h2}>, <{h3}>, <{h4}>, <{h5}>, <{h6}>, <{header}>, <{hr}>, <{main}>, <{nav}>, <{ol}>, <{p}>, <{pre}>, <{section}>, <{table}>, or <{ul}> element, or if there is no more content in the parent element and the parent element is an [=HTML element=] that is not an <{a}>, <{audio}>, <{del}>, <{ins}>, <{map}>, <{noscript}>, or <{video}> element, or an [=autonomous custom element=].

We can thus simplify the earlier example further:

      <!DOCTYPE html><title>Hello</title><p>Welcome to this example.
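The rule for omitting a <{p}> element's end tag can be sketched as a predicate. This Python example is illustrative only: the function name and its parameters are hypothetical, the parent is assumed to be an HTML element, and the caller is assumed to pass lowercase tag names, with `None` meaning "no more content in the parent element". Anything else that follows, such as text, should be passed as a value that is not in the set (for instance `"#text"`).

```python
# Elements that, when they immediately follow a p element, allow its end tag
# to be omitted (taken from the rule above).
P_END_TAG_FOLLOWERS = frozenset({
    "address", "article", "aside", "blockquote", "details", "div", "dl",
    "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3",
    "h4", "h5", "h6", "header", "hr", "main", "nav", "ol", "p", "pre",
    "section", "table", "ul",
})
# Parents inside which a trailing p element must keep its end tag.
P_BLOCKING_PARENTS = frozenset({
    "a", "audio", "del", "ins", "map", "noscript", "video",
})


def can_omit_p_end_tag(next_sibling, parent, parent_is_custom=False):
    """Decide whether </p> may be omitted.

    next_sibling: lowercase tag name of what immediately follows the p
        element, or None if there is no more content in the parent.
    parent: lowercase tag name of the parent (assumed to be an HTML element).
    parent_is_custom: True if the parent is an autonomous custom element.
    """
    if next_sibling is not None:
        return next_sibling in P_END_TAG_FOLLOWERS
    return parent not in P_BLOCKING_PARENTS and not parent_is_custom
```

In the simplified example above, the <{p}> element is the last thing in the <{body}> element, so `can_omit_p_end_tag(None, "body")` allows the end tag to be dropped.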

An <{rb}> element's end tag may be omitted if the <{rb}> element is immediately followed by an <{rb}>, <{rt}>, <{rtc}> or <{rp}> element, or if there is no more content in the parent element.

An <{rp}> element's end tag may be omitted if the <{rp}> element is immediately followed by an <{rb}>, <{rt}>, <{rtc}> or <{rp}> element, or if there is no more content in the parent element.

An <{rt}> element's end tag may be omitted if the <{rt}> element is immediately followed by an <{rb}>, <{rt}>, <{rtc}> or <{rp}> element, or if there is no more content in the parent element.

An <{rtc}> element's end tag may be omitted if the <{rtc}> element is immediately followed by an <{rb}> or <{rtc}> element, or if there is no more content in the parent element.

An <{optgroup}> element's [=end tag=] may be omitted if the <{optgroup}> element is immediately followed by another <{optgroup}> element, or if there is no more content in the parent element.

An <{option}> element's [=end tag=] may be omitted if the <{option}> element is immediately followed by another <{option}> element, or if it is immediately followed by an <{optgroup}> element, or if there is no more content in the parent element.

A <{colgroup}> element's [=start tag=] may be omitted if the first thing inside the <{colgroup}> element is a <{col}> element, and if the element is not immediately preceded by another <{colgroup}> element whose [=end tag=] has been omitted. (It can't be omitted if the element is empty.)

A <{colgroup}> element's [=end tag=] may be omitted if the <{colgroup}> element is not immediately followed by a [=space character=] or a [=comment=].

A <{caption}> element's [=end tag=] may be omitted if the <{caption}> element is not immediately followed by a [=space character=] or a [=comment=].

A <{thead}> element's [=end tag=] may be omitted if the <{thead}> element is immediately followed by a <{tbody}> or <{tfoot}> element.

A <{tbody}> element's [=start tag=] may be omitted if the first thing inside the <{tbody}> element is a <{tr}> element, and if the element is not immediately preceded by a <{tbody}>, <{thead}>, or <{tfoot}> element whose [=end tag=] has been omitted. (It can't be omitted if the element is empty.)

A <{tbody}> element's [=end tag=] may be omitted if the <{tbody}> element is immediately followed by a <{tbody}> or <{tfoot}> element, or if there is no more content in the parent element.

A <{tfoot}> element's [=end tag=] may be omitted if there is no more content in the parent element.

A <{tr}> element's [=end tag=] may be omitted if the <{tr}> element is immediately followed by another <{tr}> element, or if there is no more content in the parent element.

A <{td}> element's [=end tag=] may be omitted if the <{td}> element is immediately followed by a <{td}> or <{th}> element, or if there is no more content in the parent element.

A <{th}> element's [=end tag=] may be omitted if the <{th}> element is immediately followed by a <{td}> or <{th}> element, or if there is no more content in the parent element.

The ability to omit all these table-related tags makes table markup much terser.

Take this example:

      <table>
        <caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)</caption>
        <colgroup>
          <col>
          <col>
          <col>
        </colgroup>
        <thead>
          <tr>
            <th>Function</th>
            <th>Control Unit</th>
            <th>Central Station</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Headlights</td>
            <td>✔</td>
            <td>✔</td>
          </tr>
          <tr>
            <td>Interior Lights</td>
            <td>✔</td>
            <td>✔</td>
          </tr>
          <tr>
            <td>Electric locomotive operating sounds</td>
            <td>✔</td>
            <td>✔</td>
          </tr>
          <tr>
            <td>Engineer's cab lighting</td>
            <td></td>
            <td>✔</td>
          </tr>
          <tr>
            <td>Station Announcements - Swiss</td>
            <td></td>
            <td>✔</td>
          </tr>
        </tbody>
      </table>

The exact same table could be marked up as follows:

      <table>
        <caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
        <colgroup><col><col><col>
        <thead>
          <tr>
            <th>Function
            <th>Control Unit
            <th>Central Station
        <tbody>
          <tr>
            <td>Headlights
            <td>✔
            <td>✔
          <tr>
            <td>Interior Lights
            <td>✔
            <td>✔
          <tr>
            <td>Electric locomotive operating sounds
            <td>✔
            <td>✔
          <tr>
            <td>Engineer's cab lighting
            <td>
            <td>✔
          <tr>
            <td>Station Announcements - Swiss
            <td>
            <td>✔
      </table>

Since the cells take up much less room this way, this can be made even terser by having each row on one line:

      <table>
        <caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
        <colgroup><col><col><col>
        <thead>
          <tr> <th>Function <th>Control Unit <th>Central Station
        <tbody>
          <tr> <td>Headlights <td>✔ <td>✔
          <tr> <td>Interior Lights <td>✔ <td>✔
          <tr> <td>Electric locomotive operating sounds <td>✔ <td>✔
          <tr> <td>Engineer's cab lighting <td> <td>✔
          <tr> <td>Station Announcements - Swiss <td> <td>✔
      </table>

The only difference between these tables, at the DOM level, is the precise position of the white space. This does not affect how the table is parsed, and the meaning is unchanged.
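The end-tag rules for table sections, rows, and cells all have the same shape ("immediately followed by X, or no more content in the parent"), so they lend themselves to a table-driven check. The Python sketch below is illustrative: the names are hypothetical, `None` stands for "no more content in the parent element", and <{caption}> and <{colgroup}> are deliberately left out because their rules depend on following space characters and comments instead.

```python
# For each element, the set of things that may immediately follow it when its
# end tag is omitted. None means "no more content in the parent element".
OMISSIBLE_END_TAG_FOLLOWERS = {
    "thead": {"tbody", "tfoot"},
    "tbody": {"tbody", "tfoot", None},
    "tfoot": {None},
    "tr": {"tr", None},
    "td": {"td", "th", None},
    "th": {"td", "th", None},
}


def can_omit_table_end_tag(name, next_sibling):
    """Return True if the end tag of element `name` may be omitted.

    name: lowercase tag name of the element whose end tag is in question.
    next_sibling: lowercase tag name of what immediately follows, or None.
    Elements not covered by this sketch are reported as not omissible.
    """
    return next_sibling in OMISSIBLE_END_TAG_FOLLOWERS.get(name, set())
```

For example, in the terse markup above every <{td}> is followed by another <{td}> or ends its row, so `can_omit_table_end_tag("td", "td")` and `can_omit_table_end_tag("td", None)` both hold.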

However, a [=start tag=] must never be omitted if it has any attributes.

Returning to the earlier example with all the white space removed and then all the optional tags removed:

      <!DOCTYPE html><title>Hello</title><p>Welcome to this example.

If the <{body}> element in this example had to have a <{global/class}> attribute and the <{html}> element had to have a <{global/lang}> attribute, the markup would have to become:

      <!DOCTYPE html><html lang="en"><title>Hello</title><body class="demo"><p>Welcome to this example.

This section assumes that the document is conforming, in particular, that there are no [=content model=] violations. Omitting tags in the fashion described in this section in a document that does not conform to the [=content models=] described in this specification is likely to result in unexpected DOM differences (this is, in part, what the content models are designed to avoid).

#### Restrictions on content models #### {#restrictions-on-content-models}

For historical reasons, certain elements have extra restrictions beyond even the restrictions given by their content model.

A <{table}> element must not contain <{tr}> elements, even though these elements are technically allowed inside <{table}> elements according to the content models described in this specification. (If a <{tr}> element is put inside a <{table}> in the markup, it will in fact imply a <{tbody}> start tag before it.)

A single [[#newlines|newline]] may be placed immediately after the [=start tag=] of <{pre}> and <{textarea}> elements. This does not affect the processing of the element. The otherwise optional [[#newlines|newline]] *must* be included if the element's contents themselves start with a [[#newlines|newline]] (because otherwise the leading newline in the contents would be treated like the optional newline, and ignored).

The following two <{pre}> blocks are equivalent:

      <pre>Hello</pre>

      <pre>
      Hello</pre>
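A serializer can apply the newline rule for <{pre}> and <{textarea}> with a one-line check. This is a minimal Python sketch under stated assumptions: `serialize_pre` is a hypothetical name, newlines in `contents` are assumed to be normalized to U+000A LINE FEED, and `contents` is assumed to be already escaped for use as element content.

```python
def serialize_pre(contents: str, tag: str = "pre") -> str:
    """Wrap already-escaped `contents` in a pre (or textarea) element.

    If the contents start with a newline, an extra newline is emitted after
    the start tag; otherwise the parser would treat the leading newline of
    the contents as the optional one and drop it.
    """
    prefix = "\n" if contents.startswith("\n") else ""
    return f"<{tag}>{prefix}{contents}</{tag}>"
```

With this policy the optional newline is only emitted when it is required, so `serialize_pre("Hello")` yields the first of the two equivalent forms above.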

#### Restrictions on the contents of raw text and escapable raw text elements #### {#restrictions-on-the-contents-of-raw-text-and-escapable-raw-text-elements}

The text in [=raw text=] and [=escapable raw text elements=] must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).

### Text ### {#writing-text}

Text is allowed inside elements, attribute values, and comments. Extra constraints are placed on what is and what is not allowed in text based on where the text is to be put, as described in the other sections.

#### Newlines #### {#newlines}

Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.

Where [=character references=] are allowed, a character reference of a U+000A LINE FEED (LF) character (but not a U+000D CARRIAGE RETURN (CR) character) also represents a [[#newlines|newline]].

### Character references ### {#character-references}

In certain cases described in other sections, [=text=] may be mixed with character references. These can be used to escape characters that couldn't otherwise legally be included in [=text=].

Character references must start with a U+0026 AMPERSAND character (&). Following this, there are three possible kinds of character references:

: Named character references
:: The ampersand must be followed by one of the names given in the [[#named-character-references]] section, using the same case. The name must be one that is terminated by a U+003B SEMICOLON character (;).
: Decimal numeric character reference
:: The ampersand must be followed by a U+0023 NUMBER SIGN character (#), followed by one or more [=ASCII digits=], representing a base-ten integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a U+003B SEMICOLON character (;).
: Hexadecimal numeric character reference
:: The ampersand must be followed by a U+0023 NUMBER SIGN character (#), which must be followed by either a U+0078 LATIN SMALL LETTER X character (x) or a U+0058 LATIN CAPITAL LETTER X character (X), which must then be followed by one or more [=ASCII hex digits=], representing a hexadecimal integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a U+003B SEMICOLON character (;).

The numeric character reference forms described above are allowed to reference any Unicode code point except:

* permanently undefined Unicode characters (noncharacters)
* surrogates (U+D800–U+DFFF)
* control codes (U+0000–U+001F and U+007F–U+009F), with the exception of the following numeric character references, which are allowed:
    * `&#x9;` or `&#9;` - representing U+0009 CHARACTER TABULATION (␉)
    * `&#xA;` or `&#10;` - representing U+000A LINE FEED (␊)
    * `&#xC;` or `&#12;` - representing U+000C FORM FEED (␌)

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more [=alphanumeric ASCII characters=], followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the [[#named-character-references]] section.

### CDATA sections ### {#cdata-sections}

CDATA sections must consist of the following components, in this order:

1. The string "<![CDATA[".
2. Optionally, [=text=], with the additional restriction that the text must not contain the string "]]>".
3. The string "]]>".

CDATA sections can only be used in foreign content (MathML or SVG). In this example, a CDATA section is used to escape the contents of a MathML <{ms}> element:

      <p>You can add a string to a number, but this stringifies the number:</p>
      <math>
        <ms><![CDATA[x<y]]></ms>
        <mo>+</mo>
        <mn>3</mn>
        <mo>=</mo>
        <ms><![CDATA[x<y3]]></ms>
      </math>
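Returning to the numeric character references described earlier in this section, the restrictions on which code points they may reference can be checked as follows. This Python sketch is illustrative only: the function names are hypothetical, noncharacters are taken to be U+FDD0 to U+FDEF plus every code point whose low 16 bits are FFFE or FFFF, and only the three control codes listed as exceptions (tab, line feed, form feed) are accepted.

```python
import re

# Decimal (&#...;) or hexadecimal (&#x...; / &#X...;) reference, ASCII digits only.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
_ALLOWED_CONTROLS = frozenset({0x09, 0x0A, 0x0C})


def is_allowed_code_point(cp: int) -> bool:
    """Apply the code point restrictions for numeric character references."""
    if cp > 0x10FFFF:
        return False  # not a Unicode code point at all
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogates
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return False  # noncharacters
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return cp in _ALLOWED_CONTROLS  # control codes, with three exceptions
    return True


def is_allowed_numeric_reference(ref: str) -> bool:
    """Check both the syntax and the referenced code point of `ref`."""
    match = _NUMERIC_REF.fullmatch(ref)
    if match is None:
        return False
    hex_digits, dec_digits = match.groups()
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits, 10)
    return is_allowed_code_point(cp)
```

A reference without the terminating semicolon, such as `&#65`, is rejected by the syntax check before any code point is examined.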

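### Comments ### {#comments}

Comments must have the following format:

1. The string "`<!--`".
2. Optionally, [=text=], with the additional restriction that the text must not start with the string "`>`", nor start with the string "`->`", nor contain the strings "`<!--`", "`-->`", or "`--!>`", nor end with the string "`<!-`".
3. The string "`-->`".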

The [=text=] is allowed to end with the string "`<!`", as in `<!--My favorite operators are > and <!-->`.
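The restrictions on comment text can be checked before a comment is written out. The Python sketch below is illustrative and assumes the restrictions are as follows: the text must not start with `>` or `->`, must not contain `<!--`, `-->`, or `--!>`, and must not end with `<!-`. The function names are hypothetical.

```python
def is_valid_comment_text(text: str) -> bool:
    """Return True if `text` may appear between "<!--" and "-->"."""
    if text.startswith(">") or text.startswith("->"):
        return False
    if any(forbidden in text for forbidden in ("<!--", "-->", "--!>")):
        return False
    return not text.endswith("<!-")


def make_comment(text: str) -> str:
    """Serialize a comment, refusing text that would not round-trip."""
    if not is_valid_comment_text(text):
        raise ValueError("text is not allowed inside an HTML comment")
    return "<!--" + text + "-->"
```

Text ending in `<!` passes the check, which matches the allowance described above, whereas text ending in `<!-` does not.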

## Parsing HTML documents ## {#parsing-html-documents}

*This section only applies to user agents, data mining tools, and conformance checkers.*

The rules for parsing XML documents into DOM trees are covered by the next section, entitled "[[#xhtml]]".

User agents must use the parsing rules described in this section to generate the DOM trees from [[#text-html|text/html]] resources. Together, these rules define what is referred to as the HTML parser.

While the HTML syntax described in this specification bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules. Some earlier versions of HTML (in particular from HTML 2.0 to HTML 4.01) were based on SGML and used SGML parsing rules. However, few (if any) web browsers ever implemented true SGML parsing for HTML documents; the only user agents to strictly handle HTML as an SGML application have historically been validators. The resulting confusion — with validators claiming documents to have one representation while widely deployed Web browsers interoperably implemented a different representation — has wasted decades of productivity. This version of HTML thus returns to a non-SGML basis. Authors interested in using SGML tools in their authoring pipeline are encouraged to use XML tools and the XML serialization of HTML.

This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined (that's the processing rules described throughout this specification), but user agents, while parsing an HTML document, may [=abort the parser=] at the first [=parse error=] that they encounter for which they do not wish to apply the rules described in this specification. Conformance checkers must report at least one [=parse error=] condition to the user if one or more [=parse error=] conditions exist in the document and must not report [=parse error=] conditions if none exist in the document. Conformance checkers may report more than one [=parse error=] condition if more than one [=parse error=] condition exists in the document.

[=Parse errors=] are only errors with the *syntax* of HTML. In addition to checking for [=parse errors=], conformance checkers will also verify that the document obeys all the other conformance requirements described in this specification.

For the purposes of conformance checkers, if a resource is determined to be in [[#syntax|the HTML syntax]], then it is an [=HTML document=].

As stated in the terminology section, references to [=element types=] that do not explicitly specify a namespace always refer to elements in the [=HTML namespace=]. Where possible, references to such elements are hyperlinked to their definition.

### Overview of the parsing model ### {#overview-of-the-parsing-model}

The input to the HTML parsing process consists of a stream of [=Unicode code points=], which is passed through a [[#tokenization|tokenization]] stage followed by a tree construction stage. The output is a {{Document}} object.

Implementations that [=do not support scripting=] do not have to actually create a DOM {{Document}} object, but the DOM tree in such cases is still used as the model for the rest of the specification.

In the common case, the data handled by the tokenization stage comes from the network, but [=it can also come from script=] running in the user agent, e.g., using the {{Document/write()|document.write()}} API. There is only one set of states for the tokenizer stage and the tree construction stage, but the tree construction stage is reentrant, meaning that while the tree construction stage is handling one token, the tokenizer might be resumed, causing further tokens to be emitted and processed before the first token's processing is complete.

In the following example, the tree construction stage will be called upon to handle a "`p`" start tag token while handling the "`script`" end tag token: ... <script> document.write('<p>'); </script> ...

To handle these cases, parsers have a script nesting level, which must be initially set to zero, and a parser pause flag, which must be initially set to false.

### The input byte stream ### {#the-input-byte-stream}

The stream of Unicode code points that comprises the input to the tokenization stage will be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). The bytes encode the actual characters according to a particular *character encoding*, which the user agent uses to decode the bytes into characters.

For XML documents, the algorithm user agents are required to use to determine the character encoding is given by the XML specification. This section does not apply to XML documents. [[!XML]]

Usually, the [=encoding sniffing algorithm=] defined below is used to determine the character encoding. Given a character encoding, the bytes in the [=input byte stream=] must be converted to characters for the tokenizer's [=input stream=], by passing the [=input byte stream=] and character encoding to decode.

A leading Byte Order Mark (BOM) causes the character encoding argument to be ignored and will itself be skipped.
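As a non-normative illustration, the precedence of a leading BOM over the supplied encoding might be sketched as follows. This is a minimal sketch, not the decode algorithm itself: it assumes that only the UTF-8, UTF-16BE, and UTF-16LE byte order marks are recognized, and it uses Python's built-in codecs as stand-ins for the decoders of the Encoding specification, which they only approximate. The name `decode_input_byte_stream` and its `fallback_encoding` parameter are hypothetical.

```python
import codecs

# Byte order marks checked at the start of the stream (assumed set).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def decode_input_byte_stream(data: bytes, fallback_encoding: str) -> str:
    """Decode `data`, letting a leading BOM override `fallback_encoding`."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # The BOM selects the encoding and is itself skipped.
            return data[len(bom):].decode(encoding, errors="replace")
    # No BOM: use the encoding argument as given.
    return data.decode(fallback_encoding, errors="replace")
```

Invalid byte sequences are replaced rather than raised here; how a real decoder reports such errors is governed by the Encoding specification, not by this sketch.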

Bytes or sequences of bytes in the original byte stream that did not conform to the Encoding specification (e.g., invalid UTF-8 byte sequences in a UTF-8 input byte stream) are errors that conformance checkers are expected to report. [[!ENCODING]]

The decoder algorithms describe how to handle invalid input; for security reasons, it is imperative that those rules be followed precisely. Differences in how invalid byte sequences are handled can result in, amongst other problems, script injection vulnerabilities ("XSS").

When the HTML parser is decoding an input byte stream, it uses a character encoding and a confidence. The confidence is either *tentative*, *certain*, or *irrelevant*. The encoding used, and whether the confidence in that encoding is *tentative* or *certain*, is [=used during the parsing=] to determine whether to [=change the encoding=]. If no encoding is necessary, e.g., because the parser is operating on a Unicode stream and doesn't have to use a character encoding at all, then the [=confidence=] is *irrelevant*.

Some algorithms feed the parser by directly adding characters to the [=input stream=] rather than adding bytes to the [=input byte stream=].

#### Parsing with a known character encoding #### {#parsing-with-a-known-character-encoding} When the HTML parser is to operate on an input byte stream that has a known definite encoding, then the character encoding is that encoding and the [=confidence=] is *certain*. #### Determining the character encoding #### {#determining-the-character-encoding} In some cases, it might be impractical to unambiguously determine the encoding before parsing the document. Because of this, this specification provides for a two-pass mechanism with an optional pre-scan. Implementations are allowed, as described below, to apply a simplified parsing algorithm to whatever bytes they have available before beginning to parse the document. Then, the real parser is started, using a tentative encoding derived from this pre-parse and other out-of-band metadata. If, while the document is being loaded, the user agent discovers a character encoding declaration that conflicts with this information, then the parser can get reinvoked to perform a parse of the document with the real encoding. User agents must use the following algorithm, called the encoding sniffing algorithm, to determine the character encoding to use when decoding a document in the first pass. This algorithm takes as input any out-of-band metadata available to the user agent (e.g., the [=Content-Type metadata=] of the document) and all the bytes available so far, and returns a character encoding and a [=confidence=] that is either *tentative* or *certain*. 1. If the user has explicitly instructed the user agent to override the document's character encoding with a specific encoding, optionally return that encoding with the [=confidence=] *certain* and abort these steps.

Typically, user agents remember such user requests across sessions, and in some cases apply them to documents in <{iframe}>s as well.

2. The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 1024 bytes, whichever comes first. In general, preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.

The authoring conformance requirements for character encoding declarations limit them to only appearing in [=the first 1024 bytes=]. User agents are therefore encouraged to use the prescan algorithm below (as invoked by these steps) on the first 1024 bytes, but not to stall beyond that.

3. If the transport layer specifies a character encoding, and it is supported, return that encoding with the [=confidence=] *certain*, and abort these steps. 4. Optionally [=prescan a byte stream to determine its encoding|prescan the byte stream to determine its encoding=]. The |end condition| is that the user agent decides that scanning further bytes would not be efficient. User agents are encouraged to only prescan [=the first 1024 bytes=]. User agents may decide that scanning *any* bytes is not efficient, in which case these substeps are entirely skipped. The aforementioned algorithm either aborts unsuccessfully or returns a character encoding. If it returns a character encoding, then this algorithm must be aborted, returning the same encoding, with [=confidence=] *tentative*. 5. If the [=HTML parser=] for which this algorithm is being run is associated with a {{Document}} that is itself in a [=nested browsing context=], run these substeps: 1. Let |new document| be the {{Document}} with which the [=HTML parser=] is associated. 2. Let |parent document| be the {{Document}} through which |new document| is nested (the [=active document=] of the [=parent browsing context=] of |new document|). 3. If |parent document|'s [=concept/origin=] is not the [=same origin=] as |new document|'s [=concept/origin=], then abort these substeps. 4. If |parent document|'s [=character encoding=] is not an [=ASCII-compatible encoding=], then abort these substeps. 5. Return |parent document|'s [=character encoding=], with the [=confidence=] *tentative*, and abort the [=encoding sniffing algorithm=]'s steps. 6. Otherwise, if the user agent has information on the likely encoding for this page, e.g., based on the encoding of the page when it was last visited, then return that encoding, with the [=confidence=] *tentative*, and abort these steps. 7. The user agent may attempt to autodetect the character encoding from applying frequency analysis or other algorithms to the data stream. 
Such algorithms may use information about the resource other than the resource's contents, including the address of the resource. If autodetection succeeds in determining a character encoding, and that encoding is a supported encoding, then return that encoding, with the [=confidence=] *tentative*, and abort these steps. [[UNIVCHARDET]]

User agents are generally discouraged from attempting to autodetect encodings for resources obtained over the network, since doing so involves inherently non-interoperable heuristics. Attempting to detect encodings based on an HTML document's preamble is especially tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin with a lot of markup rather than with text content.

The UTF-8 encoding has a highly detectable bit pattern. Files from the local file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents with byte sequences that do not match it are very likely not. When a user agent can examine the whole file, rather than just the preamble, detecting for UTF-8 specifically can be especially effective. [[PPUTF8]] [[UTF8DET]]

8. Otherwise, return an implementation-defined or user-specified default character encoding, with the [=confidence=] *tentative*. In controlled environments or in environments where the encoding of documents can be prescribed (for example, for user agents intended for dedicated use in new networks), the comprehensive `UTF-8` encoding is suggested. In other environments, the default encoding is typically dependent on the user's locale (an approximation of the languages, and thus often encodings, of the pages that the user is likely to frequent). The following table gives suggested defaults based on the user's locale, for compatibility with legacy content. Locales are identified by BCP 47 language tags. [[!BCP47]] [[!ENCODING]]

Locale	Language	Suggested default encoding
ar	Arabic	[=windows-1256=]
ba	Bashkir	[=windows-1251=]
be	Belarusian	[=windows-1251=]
bg	Bulgarian	[=windows-1251=]
cs	Czech	[=windows-1250=]
el	Greek	[=ISO-8859-7=]
et	Estonian	[=windows-1257=]
fa	Persian	[=windows-1256=]
he	Hebrew	[=windows-1255=]
hr	Croatian	[=windows-1250=]
hu	Hungarian	[=ISO-8859-2=]
ja	Japanese	[=Shift_JIS=]
kk	Kazakh	[=windows-1251=]
ko	Korean	[=EUC-KR=]
ku	Kurdish	[=windows-1254=]
ky	Kyrgyz	[=windows-1251=]
lt	Lithuanian	[=windows-1257=]
lv	Latvian	[=windows-1257=]
mk	Macedonian	[=windows-1251=]
pl	Polish	[=ISO-8859-2=]
ru	Russian	[=windows-1251=]
sah	Yakut	[=windows-1251=]
sk	Slovak	[=windows-1250=]
sl	Slovenian	[=ISO-8859-2=]
sr	Serbian	[=windows-1251=]
tg	Tajik	[=windows-1251=]
th	Thai	[=windows-874=]
tr	Turkish	[=windows-1254=]
tt	Tatar	[=windows-1251=]
uk	Ukrainian	[=windows-1251=]
vi	Vietnamese	[=windows-1258=]
zh-CN	Chinese (People's Republic of China)	[=gb18030=]
zh-TW	Chinese (Taiwan)	[=Big5=]
All other locales		[=windows-1252=]
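A non-normative sketch of how a user agent might consult this table is given below. The matching strategy (lowercasing the tag and then dropping trailing subtags until a row matches) is an assumption made for illustration; this section does not prescribe how locale tags are matched against the rows. The names `SUGGESTED_DEFAULTS` and `suggested_default_encoding` are hypothetical.

```python
# Rows of the table above, keyed by lowercased BCP 47 language tag.
SUGGESTED_DEFAULTS = {
    "ar": "windows-1256", "ba": "windows-1251", "be": "windows-1251",
    "bg": "windows-1251", "cs": "windows-1250", "el": "ISO-8859-7",
    "et": "windows-1257", "fa": "windows-1256", "he": "windows-1255",
    "hr": "windows-1250", "hu": "ISO-8859-2", "ja": "Shift_JIS",
    "kk": "windows-1251", "ko": "EUC-KR", "ku": "windows-1254",
    "ky": "windows-1251", "lt": "windows-1257", "lv": "windows-1257",
    "mk": "windows-1251", "pl": "ISO-8859-2", "ru": "windows-1251",
    "sah": "windows-1251", "sk": "windows-1250", "sl": "ISO-8859-2",
    "sr": "windows-1251", "tg": "windows-1251", "th": "windows-874",
    "tr": "windows-1254", "tt": "windows-1251", "uk": "windows-1251",
    "vi": "windows-1258", "zh-cn": "gb18030", "zh-tw": "Big5",
}


def suggested_default_encoding(locale_tag: str) -> str:
    """Look up the suggested default encoding for a locale tag."""
    parts = locale_tag.strip().lower().replace("_", "-").split("-")
    # Assumed matching rule: try the full tag, then shorter prefixes.
    while parts:
        candidate = "-".join(parts)
        if candidate in SUGGESTED_DEFAULTS:
            return SUGGESTED_DEFAULTS[candidate]
        parts.pop()
    # "All other locales" row.
    return "windows-1252"
```

With this matching rule, a bare `zh` tag falls through to the last row, because the table lists only the two region-specific Chinese locales.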

The contents of this table are derived from the intersection of Windows, Chrome, and Firefox defaults. The [=document's character encoding=] must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to use for the input byte stream. --- When an algorithm requires a user agent to prescan a byte stream to determine its encoding, given some defined |end condition|, then it must run the following steps. These steps either abort unsuccessfully or return a character encoding. If at any point during these steps (including during instances of the [=get an attribute=] algorithm invoked by this one) the user agent either runs out of bytes (meaning the |position| pointer created in the first step below goes beyond the end of the byte stream obtained so far) or reaches its |end condition|, then abort the [=prescan a byte stream to determine its encoding=] algorithm unsuccessfully. 1. Let |position| be a pointer to a byte in the input byte stream, initially pointing at the first byte. 2. |Loop|: If |position| points to:

: A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--') :: Advance the |position| pointer so that it points at the first 0x3E byte which is preceded by two 0x2D bytes (i.e., at the end of an ASCII '-->' sequence) and comes after the 0x3C byte that was found. (The two 0x2D bytes can be the same as those in the '<!--' sequence.) : A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash) :: 1. Advance the |position| pointer so that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or 0x2F byte (the one in the sequence of characters matched above). 2. Let |attribute list| be an empty list of strings. 3. Let |got pragma| be false. 4. Let |need pragma| be null. 5. Let |charset| be the null value (which, for the purposes of this algorithm, is distinct from an unrecognized encoding or the empty string). 6. |Attributes|: [=Get an attribute=] and its value. If no attribute was sniffed, then jump to the |Processing| step below. 7. If the attribute's name is already in |attribute list|, then return to the step labeled |Attributes|. 8. Add the attribute's name to |attribute list|. 9. Run the appropriate step from the following list, if one applies:

: If the attribute's name is "<{meta/http-equiv}>" :: If the attribute's value is "`content-type`", then set |got pragma| to true. : If the attribute's name is "<{meta/content}>" :: Apply the algorithm for extracting a character encoding from a `meta` element, giving the attribute's value as the string to parse. If a character encoding is returned, and if |charset| is still set to null, let |charset| be the encoding returned, and set |need pragma| to true. : If the attribute's name is "<{meta/charset}>" :: Let |charset| be the result of [=getting an encoding=] from the attribute's value, and set |need pragma| to false. 10. Return to the step labeled |Attributes|. 11. |Processing|: If |need pragma| is null, then jump to the step below labeled |Next byte|. 12. If |need pragma| is true but |got pragma| is false, then jump to the step below labeled |Next byte|. 13. If |charset| is failure, then jump to the step below labeled |Next byte|. 14. If |charset| is a [=UTF-16 encoding=], then set |charset| to [=UTF-8=]. 15. If |charset| is [=x-user-defined=], then set |charset| to [=windows-1252=]. 16. Abort the [=prescan a byte stream to determine its encoding=] algorithm, returning the encoding given by |charset|. : A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter) :: 1. Advance the |position| pointer so that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII >) byte. 2. Repeatedly [=get an attribute=] until no further attributes can be found, then jump to the step below labeled |Next byte|. 
: A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!') : A sequence of bytes starting with: 0x3C 0x2F (ASCII '</') : A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?') :: Advance the |position| pointer so that it points at the first 0x3E byte (ASCII >) that comes after the 0x3C byte that was found. : Any other byte :: Do nothing with that byte. 3. |Next byte|: Move |position| so it points at the next byte in the input byte stream, and return to the step above labeled |Loop|. When the [=prescan a byte stream to determine its encoding=] algorithm says to get an attribute, it means doing this: 1. If the byte at |position| is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x2F (ASCII /) then advance |position| to the next byte and redo this step. 2. If the byte at |position| is 0x3E (ASCII >), then abort the [=get an attribute=] algorithm. There isn't one. 3. Otherwise, the byte at |position| is the start of the attribute name. Let |attribute name| and |attribute value| be the empty string. 4. Process the byte at |position| as follows:

: If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII >) :: Abort the [=get an attribute=] algorithm. The attribute's name is the value of |attribute name| and its value is the value of |attribute value|. : If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z) :: Append the Unicode character with code point |b|+0x20 to |attribute value| (where |b| is the value of the byte at |position|). : Anything else :: Append the Unicode character with the same code point as the value of the byte at |position| to |attribute value|. 12. Advance |position| to the next byte and return to the previous step. For the sake of interoperability, user agents should not use a pre-scan algorithm that returns different results than the one described above. (But, if you do, please at least let us know, so that we can improve this algorithm and benefit everyone...) #### Character encodings #### {#character-encodings} User agents must support the encodings defined in the WHATWG Encoding specification, including, but not limited to, UTF-8, ISO-8859-2, ISO-8859-8, windows-1250, windows-1251, windows-1252, windows-1254, windows-1256, windows-1257, gb18030, Big5, ISO-2022-JP, Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE, and x-user-defined. User agents must not support other encodings.

The above prohibits supporting, for example, CESU-8, UTF-7, BOCU-1, SCSU, EBCDIC, and UTF-32. This specification does not make any attempt to support prohibited encodings in its algorithms; support and use of prohibited encodings would thus lead to unexpected behavior. [[CESU8]] [[RFC2152]] [[BOCU1]] [[SCSU]]

#### Changing the encoding while parsing #### {#changing-the-encoding-while-parsing} When the parser requires the user agent to change the encoding, it must run the following steps. This might happen if the [=encoding sniffing algorithm=] described above failed to find a character encoding, or if it found a character encoding that was not the actual encoding of the file. 1. If the encoding that is already being used to interpret the input stream is a [=UTF-16 encoding=], then set the [=confidence=] to *certain* and abort these steps. The new encoding is ignored; if it was anything but the same encoding, then it would be clearly incorrect. 2. If the new encoding is a [=UTF-16 encoding=], then change it to [=UTF-8=]. 3. If the new encoding is the [=x-user-defined=] encoding, then change it to [=windows-1252=]. [[!ENCODING]] 4. If the new encoding is identical or equivalent to the encoding that is already being used to interpret the input stream, then set the [=confidence=] to *certain* and abort these steps. This happens when the encoding information found in the file matches what the [=encoding sniffing algorithm=] determined to be the encoding, and in the second pass through the parser if the first pass found that the [=encoding sniffing algorithm=] described in the earlier section failed to find the right encoding. 5. If all the bytes up to the last byte converted by the current decoder have the same Unicode interpretations in both the current encoding and the new encoding, and if the user agent supports changing the converter on the fly, then the user agent may change to the new converter for the encoding on the fly. Set the [=document's character encoding=] and the encoding used to convert the input stream to the new encoding, set the [=confidence=] to *certain*, and abort these steps. 6. 
Otherwise, [=navigate=] to the document again, with [=replacement enabled=], and using the same [=source browsing context=], but this time skip the [=encoding sniffing algorithm=] and instead just set the encoding to the new encoding and the [=confidence=] to *certain*. Whenever possible, this should be done without actually contacting the network layer (the bytes should be re-parsed from memory), even if, e.g., the document is marked as not being cacheable. If this is not possible and contacting the network layer would involve repeating a request that uses a method other than `GET`, then instead set the [=confidence=] to *certain* and ignore the new encoding. The resource will be misinterpreted. User agents may notify the user of the situation, to aid in application development.

This algorithm is only invoked when a new encoding is found declared on a <{meta}> element.
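The decision made by the steps for changing the encoding can be summarized in a non-normative sketch. It assumes that encodings are compared by lowercased name and that the caller already knows the three boolean facts passed in; the `Action` enumeration, the function name, and all parameter names are hypothetical. In every outcome the [=confidence=] becomes *certain*, so the sketch returns only the action and the encoding in effect afterwards.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    KEEP = "keep the current encoding"
    SWITCH_ON_THE_FLY = "switch decoder without re-parsing"
    RENAVIGATE = "re-parse the document with the new encoding"


_UTF16 = {"utf-16be", "utf-16le"}


def change_the_encoding(
    current: str,
    new: str,
    *,
    prefix_decodes_identically: bool,
    can_switch_on_the_fly: bool,
    reparse_needs_non_get_request: bool = False,
) -> Tuple[Action, str]:
    current, new = current.lower(), new.lower()
    if current in _UTF16:                 # step 1: new encoding is ignored
        return Action.KEEP, current
    if new in _UTF16:                     # step 2
        new = "utf-8"
    if new == "x-user-defined":           # step 3
        new = "windows-1252"
    if new == current:                    # step 4
        return Action.KEEP, current
    if prefix_decodes_identically and can_switch_on_the_fly:  # step 5
        return Action.SWITCH_ON_THE_FLY, new
    if reparse_needs_non_get_request:     # step 6, fallback case
        return Action.KEEP, current
    return Action.RENAVIGATE, new         # step 6
```

Step 5 is optional for user agents; the sketch takes it whenever both of its conditions hold, which is only one possible policy.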

#### Preprocessing the input stream #### {#preprocessing-the-input-stream}

The input stream consists of the characters pushed into it as the [=input byte stream=] is decoded or from the various APIs that directly manipulate the input stream.

Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are [=parse errors=]. These are all [=control characters=] or permanently undefined Unicode characters (noncharacters).

Any [=character=] that is not a [=Unicode character=], i.e., any isolated surrogate, is a [=parse error=]. (These can only find their way into the input stream via script APIs such as {{Document/write()|document.write()}}.)

U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) characters are treated specially. Any LF character that immediately follows a CR character must be ignored, and all CR characters must then be converted to LF characters. Thus, newlines in HTML DOMs are represented by LF characters, and there are never any CR characters in the input to the [[#tokenization|tokenization]] stage.

The next input character is the first character in the [=input stream=] that has not yet been consumed or explicitly ignored by the requirements in this section. Initially, the [=next input character=] is the first character in the input. The current input character is the last character to have been *consumed*.

The insertion point is the position (just before a character or just before the end of the input stream) where content inserted using {{Document/write()|document.write()}} is actually inserted. The insertion point is relative to the position of the character immediately after it; it is not an absolute offset into the input stream. Initially, the insertion point is undefined.

The "EOF" character in the tables below is a conceptual character representing the end of the [=input stream=]. If the parser is a [=script-created parser=], then the end of the [=input stream=] is reached when an explicit "EOF" character (inserted by the {{Document/close()|document.close()}} method) is consumed. Otherwise, the "EOF" character is not a real character in the stream, but rather the lack of any further characters.

The handling of U+0000 NULL characters varies based on where the characters are found. In general, they are ignored, but for security reasons they are sometimes replaced with U+FFFD REPLACEMENT CHARACTER. This can happen in tokenization or tree construction.
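The preprocessing rules in this section can be illustrated with a short non-normative sketch. It assumes the input is already available as a Python string (which, unlike a byte stream, may contain isolated surrogates) and it leaves U+0000 untouched, since NULL handling is deferred to later stages. The function names are hypothetical.

```python
from typing import List, Tuple


def is_preprocessing_parse_error(cp: int) -> bool:
    """Return True if code point `cp` is a parse error in the input stream."""
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F:
        return True
    if 0x007F <= cp <= 0x009F or cp == 0x000B:
        return True
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    if 0xD800 <= cp <= 0xDFFF:  # isolated surrogate
        return True
    # U+FFFE and U+FFFF, and their counterparts at the end of every plane.
    return (cp & 0xFFFE) == 0xFFFE


def preprocess_input_stream(text: str) -> Tuple[str, List[int]]:
    """Return the newline-normalized text and the offsets of parse errors.

    Offsets refer to positions in the original, unnormalized `text`.
    """
    errors = [i for i, ch in enumerate(text)
              if is_preprocessing_parse_error(ord(ch))]
    # Ignore any LF that follows a CR, then convert remaining CRs to LF.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized, errors
```

Flagged characters are reported but not removed: the rules above make them parse errors without saying they are dropped from the stream.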

### Parse state ### {#parse-state} #### The insertion mode #### {#the-insertion-mode} The insertion mode is a state variable that controls the primary operation of the tree construction stage. Initially, the [=insertion mode=] is "[=initial=]". It can change to "[=before html=]", "[=before head=]", "[=in head=]", "[=in head noscript=]", "[=after head=]", "[=in body=]", "[=in text|text=]", "[=in table=]", "[=in table text=]", "[=in caption=]", "[=in column group=]", "[=in table body=]", "[=in row=]", "[=in cell=]", "[=in select=]", "[=in select in table=]", "[=in template=]", "[=after body=]", "[=in frameset=]", "[=after frameset=]", "[=after after body=]", and "[=after after frameset=]" during the course of the parsing, as described in the tree construction stage. The insertion mode affects how tokens are processed and whether CDATA sections are supported. Several of these modes, namely "[=in head=]", "[=in body=]", "[=in table=]", and "[=in select=]", are special, in that the other modes defer to them at various times. When the algorithm below says that the user agent is to do something "using the rules for the |m| insertion mode", where |m| is one of these modes, the user agent must use the rules described under the |m| [=insertion mode=]'s section, but must leave the [=insertion mode=] unchanged unless the rules in |m| themselves switch the [=insertion mode=] to a new value. When the insertion mode is switched to "[=in text|text=]" or "[=in table text=]", the original insertion mode is also set. This is the insertion mode to which the tree construction stage will return. Similarly, to parse nested <{template}> elements, a stack of template insertion modes is used. It is initially empty. The current template insertion mode is the insertion mode that was most recently added to the [=stack of template insertion modes=]. 
The algorithms in the sections below will *push* insertion modes onto this stack, meaning that the specified insertion mode is to be added to the stack, and *pop* insertion modes from the stack, which means that the most recently added insertion mode must be removed from the stack. --- When the steps below require the UA to reset the insertion mode appropriately, it means the UA must follow these steps: 1. Let |last| be false. 2. Let |node| be the last node in the [=stack of open elements=]. 3. |Loop|: If |node| is the first node in the stack of open elements, then set |last| to true, and, if the parser was originally created as part of the [=HTML fragment parsing algorithm=] ([=fragment case=]), set |node| to the [=context=] element passed to that algorithm. 4. If |node| is a <{select}> element, run these substeps: 1. If |last| is true, jump to the step below labeled |Done|. 2. Let |ancestor| be |node|. 3. |Loop|: If |ancestor| is the first node in the [=stack of open elements=], jump to the step below labeled |Done|. 4. Let |ancestor| be the node before |ancestor| in the [=stack of open elements=]. 5. If |ancestor| is a <{template}> node, jump to the step below labeled |Done|. 6. If |ancestor| is a <{table}> node, switch the [=insertion mode=] to "[=in select in table=]" and abort these steps. 7. Jump back to the step labeled |Loop|. 8. |Done|: Switch the [=insertion mode=] to "[=in select=]" and abort these steps. 5. If |node| is a <{td}> or <{th}> element and |last| is false, then switch the [=insertion mode=] to "[=in cell=]" and abort these steps. 6. If |node| is a <{tr}> element, then switch the [=insertion mode=] to "[=in row=]" and abort these steps. 7. If |node| is a <{tbody}>, <{thead}>, or <{tfoot}> element, then switch the [=insertion mode=] to "[=in table body=]" and abort these steps. 8. If |node| is a <{caption}> element, then switch the [=insertion mode=] to "[=in caption=]" and abort these steps. 9. 
If |node| is a <{colgroup}> element, then switch the [=insertion mode=] to "[=in column group=]" and abort these steps. 10. If |node| is a <{table}> element, then switch the [=insertion mode=] to "[=in table=]" and abort these steps. 11. If |node| is a <{template}> element, then switch the [=insertion mode=] to the [=current template insertion mode=] and abort these steps. 12. If |node| is a <{head}> element and |last| is false, then switch the [=insertion mode=] to "[=in head=]" and abort these steps. 13. If |node| is a <{body}> element, then switch the [=insertion mode=] to "[=in body=]" and abort these steps. 14. If |node| is a <{frameset}> element, then switch the [=insertion mode=] to "[=in frameset=]" and abort these steps. ([=fragment case=]) 15. If |node| is an <{html}> element, run these substeps: 1. If the `head` element pointer is null, switch the [=insertion mode=] to "[=before head=]" and abort these steps. ([=fragment case=]) 2. Otherwise, the `head` element pointer is not null, switch the [=insertion mode=] to "[=after head=]" and abort these steps. 16. If |last| is true, then switch the [=insertion mode=] to "[=in body=]" and abort these steps. ([=fragment case=]) 17. Let |node| now be the node before |node| in the [=stack of open elements=]. 18. Return to the step labeled |Loop|. #### The stack of open elements #### {#the-stack-of-open-elements} Initially, the stack of open elements is empty. The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack (notwithstanding when the stack is manipulated in a random access fashion as part of [=adoption agency algorithm|the handling for misnested tags=]).
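The steps to reset the insertion mode appropriately walk the stack of open elements from the bottommost node upwards. The following non-normative sketch models the stack as a non-empty list of HTML element names, with index 0 as the topmost node and the last item as the most recently added one. The function name and its parameters (`context` for the fragment case, `head_pointer_is_null`, and `current_template_mode`) are hypothetical simplifications of the parser state.

```python
from typing import List, Optional


def reset_insertion_mode(
    stack: List[str],
    *,
    context: Optional[str] = None,
    head_pointer_is_null: bool = True,
    current_template_mode: Optional[str] = None,
) -> Optional[str]:
    index = len(stack) - 1            # start at the bottommost node
    while True:
        node = stack[index]
        last = index == 0
        if last and context is not None:
            node = context            # fragment case
        if node == "select":
            if not last:
                # Look upwards for a table, stopping at a template.
                for ancestor in reversed(stack[:index]):
                    if ancestor == "template":
                        break
                    if ancestor == "table":
                        return "in select in table"
            return "in select"
        if node in ("td", "th") and not last:
            return "in cell"
        if node == "tr":
            return "in row"
        if node in ("tbody", "thead", "tfoot"):
            return "in table body"
        if node == "caption":
            return "in caption"
        if node == "colgroup":
            return "in column group"
        if node == "table":
            return "in table"
        if node == "template":
            return current_template_mode
        if node == "head" and not last:
            return "in head"
        if node == "body":
            return "in body"
        if node == "frameset":
            return "in frameset"
        if node == "html":
            return "before head" if head_pointer_is_null else "after head"
        if last:
            return "in body"
        index -= 1                    # move to the node before this one
```

For instance, a stack ending in a `td` yields "in cell", while a `select` nested somewhere below a `table` yields "in select in table".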

The "[=before html=]" [=insertion mode=] creates the <{html}> [=document element=], which is then added to the stack.

In the [=fragment case=], the [=stack of open elements=] is initialized to contain an <{html}> element that is created as part of [=HTML fragment parsing algorithm|that algorithm=]. (The [=fragment case=] skips the "[=before html=]" [=insertion mode=].)

The <{html}> node, however it is created, is the topmost node of the stack. It only gets popped off the stack when the parser [=stop parsing|finishes=].

The current node is the bottommost node in this [=stack of open elements=].

The adjusted current node is the [=context=] element if the parser was created by the [=HTML fragment parsing algorithm=] and the [=stack of open elements=] has only one element in it ([=fragment case=]); otherwise, the [=adjusted current node=] is the [=current node=].

Elements in the [=stack of open elements=] fall into the following categories:

: Special
:: The following elements have varying levels of special parsing rules: HTML's <{address}>, <{applet}>, <{area}>, <{article}>, <{aside}>, <{base}>, <{basefont}>, <{bgsound}>, <{blockquote}>, <{body}>, <{br}>, <{button}>, <{caption}>, <{center}>, <{col}>, <{colgroup}>, <{dd}>, <{details}>, <{dir}>, <{div}>, <{dl}>, <{dt}>, <{embed}>, <{fieldset}>, <{figcaption}>, <{figure}>, <{footer}>, <{form}>, <{frame}>, <{frameset}>, <{h1}>, <{h2}>, <{h3}>, <{h4}>, <{h5}>, <{h6}>, <{head}>, <{header}>, <{hr}>, <{html}>, <{iframe}>, <{img}>, <{input}>, <{li}>, <{link}>, <{listing}>, <{main}>, <{marquee}>, <{meta}>, <{nav}>, <{noembed}>, <{noframes}>, <{noscript}>, <{object}>, <{ol}>, <{p}>, <{param}>, <{plaintext}>, <{pre}>, <{script}>, <{section}>, <{select}>, <{source}>, <{style}>, <{summary}>, <{table}>, <{tbody}>, <{td}>, <{template}>, <{textarea}>, <{tfoot}>, <{th}>, <{thead}>, <{title}>, <{tr}>, <{track}>, <{ul}>, <{wbr}>, <{xmp}>; MathML <{mi}>, MathML <{mo}>, MathML <{mn}>, MathML <{ms}>, MathML <{mtext}>, and MathML <{annotation-xml}>; and SVG <{foreignObject}>, SVG <{desc}>, and SVG title.

An `image` start tag token is handled by the tree builder, but it is not in this list because it is not an element; it gets turned into an <{img}> element.

: Formatting
:: The following HTML elements are those that end up in the [=list of active formatting elements=]: <{a}>, <{b}>, <{big}>, <{code}>, <{em}>, <{font}>, <{i}>, <{nobr}>, <{s}>, <{small}>, <{strike}>, <{strong}>, <{tt}>, and <{u}>.
: Ordinary
:: All other elements found while parsing an HTML document.

Typically, the [=special=] elements have the start and end tag tokens handled specifically, while [=ordinary=] elements' tokens fall into "any other start tag" and "any other end tag" clauses, and some parts of the tree builder check if a particular element in the [=stack of open elements=] is in the [=special=] category. However, some elements (e.g., the <{option}> element) have their start or end tag tokens handled specifically, but are still not in the [=special=] category, so that they get the [=ordinary=] handling elsewhere.
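The stack orientation and the definitions of the current node and adjusted current node given above can be illustrated with a short, non-normative sketch. It assumes the stack is held in a Python list whose first item is the topmost node; the `Element` and `Parser` classes and their field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str
    namespace: str = "html"


@dataclass
class Parser:
    # Index 0 is the topmost node (the first one added); the end of the
    # list is the bottommost node (the most recently added one).
    open_elements: List[Element] = field(default_factory=list)
    # Only set when the parser was created by the fragment parsing algorithm.
    context: Optional[Element] = None

    def current_node(self) -> Element:
        # The current node is the bottommost node of the stack.
        return self.open_elements[-1]

    def adjusted_current_node(self) -> Element:
        # Fragment case with a single element on the stack: use the context.
        if self.context is not None and len(self.open_elements) == 1:
            return self.context
        return self.current_node()


# A document parser that has opened <html>, <body>, and <p>.
doc_parser = Parser()
for tag in ("html", "body", "p"):
    doc_parser.open_elements.append(Element(tag))

# A fragment parser with a <td> context element; only <html> is on the stack.
frag_parser = Parser(context=Element("td"))
frag_parser.open_elements.append(Element("html"))
```

In this sketch `doc_parser.current_node()` is the `p` element, while `frag_parser.adjusted_current_node()` is the `td` context element even though its current node is `html`.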

The [=stack of open elements=] is said to have an element |target node| in a specific scope consisting of a list of element types |list| when the following algorithm terminates in a match state:

1. Initialize |node| to be the [=current node=] (the bottommost node of the stack).
2. If |node| is |target node|, terminate in a match state.
3. Otherwise, if |node| is one of the element types in |list|, terminate in a failure state.
4. Otherwise, set |node| to the previous entry in the [=stack of open elements=] and return to step 2. (This will never fail, since the loop will always terminate in the previous step if the top of the stack, an <{html}> element, is reached.)

The [=stack of open elements=] is said to have a particular element in scope when it [=has that element in the specific scope=] consisting of the following element types:

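* <{applet}> in the [=HTML namespace=]
* <{caption}> in the [=HTML namespace=]
* <{html}> in the [=HTML namespace=]
* <{table}> in the [=HTML namespace=]
* <{td}> in the [=HTML namespace=]
* <{th}> in the [=HTML namespace=]
* <{marquee}> in the [=HTML namespace=]
* <{object}> in the [=HTML namespace=]
* <{template}> in the [=HTML namespace=]
* <{mi}> in the [=MathML namespace=]
* <{mo}> in the [=MathML namespace=]
* <{mn}> in the [=MathML namespace=]
* <{ms}> in the [=MathML namespace=]
* <{mtext}> in the [=MathML namespace=]
* <{annotation-xml}> in the [=MathML namespace=]
* <{foreignObject}> in the [=SVG namespace=]
* <{desc}> in the [=SVG namespace=]
* title in the [=SVG namespace=]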

The [=stack of open elements=] is said to have a particular element in list item scope when it [=has that element in the specific scope=] consisting of the following element types:

* All the element types listed above for the [=in scope|has an element in scope=] algorithm.
* <{ol}> in the [=HTML namespace=]
* <{ul}> in the [=HTML namespace=]

The [=stack of open elements=] is said to have a particular element in button scope when it [=has that element in the specific scope=] consisting of the following element types:

* All the element types listed above for the [=in scope|has an element in scope=] algorithm.
* <{button}> in the [=HTML namespace=]

The [=stack of open elements=] is said to have a particular element in table scope when it [=has that element in the specific scope=] consisting of the following element types:

* <{html}> in the [=HTML namespace=]
* <{table}> in the [=HTML namespace=]
* <{template}> in the [=HTML namespace=]

The [=stack of open elements=] is said to have a particular element in select scope when it [=has that element in the specific scope=] consisting of all element types *except* the following:

* <{optgroup}> in the [=HTML namespace=]
* <{option}> in the [=HTML namespace=]

Nothing happens if at any time any of the elements in the [=stack of open elements=] are moved to a new location in, or removed from, the {{Document}} tree. In particular, the stack is not changed in this situation. This can cause, amongst other strange effects, content to be appended to nodes that are no longer in the DOM.

In some cases (namely, when [=adoption agency algorithm|closing misnested formatting elements=]), the stack is manipulated in a random-access fashion.
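The scope checks described above can be sketched as follows. This is a non-normative illustration under several assumptions: elements are modelled as `(namespace, local name)` tuples, the target is matched by element type rather than by node identity, and every function and constant name is hypothetical. The base scope list is written out as the HTML Standard gives it.

```python
from typing import Callable, FrozenSet, List, Tuple

HTML, MATHML, SVG = "html", "mathml", "svg"
ElementType = Tuple[str, str]  # (namespace, local name)

# Element types that bound the plain "in scope" check.
IN_SCOPE: FrozenSet[ElementType] = frozenset(
    [(HTML, n) for n in ("applet", "caption", "html", "table", "td", "th",
                         "marquee", "object", "template")]
    + [(MATHML, n) for n in ("mi", "mo", "mn", "ms", "mtext", "annotation-xml")]
    + [(SVG, n) for n in ("foreignObject", "desc", "title")]
)
LIST_ITEM_SCOPE = IN_SCOPE | {(HTML, "ol"), (HTML, "ul")}
BUTTON_SCOPE = IN_SCOPE | {(HTML, "button")}
TABLE_SCOPE = frozenset({(HTML, "html"), (HTML, "table"), (HTML, "template")})
# Select scope is defined by exclusion: everything except these is a boundary.
SELECT_SCOPE_EXCEPTIONS = frozenset({(HTML, "optgroup"), (HTML, "option")})


def has_in_specific_scope(stack: List[ElementType], target: ElementType,
                          is_boundary: Callable[[ElementType], bool]) -> bool:
    # Walk from the current node (end of the list) up towards the <html> node.
    for node in reversed(stack):
        if node == target:
            return True   # match state
        if is_boundary(node):
            return False  # failure state
    return False  # only reachable if the stack lacks an <html> element


def has_in_scope(stack, target):
    return has_in_specific_scope(stack, target, lambda n: n in IN_SCOPE)


def has_in_list_item_scope(stack, target):
    return has_in_specific_scope(stack, target, lambda n: n in LIST_ITEM_SCOPE)


def has_in_button_scope(stack, target):
    return has_in_specific_scope(stack, target, lambda n: n in BUTTON_SCOPE)


def has_in_table_scope(stack, target):
    return has_in_specific_scope(stack, target, lambda n: n in TABLE_SCOPE)


def has_in_select_scope(stack, target):
    return has_in_specific_scope(
        stack, target, lambda n: n not in SELECT_SCOPE_EXCEPTIONS)


# Stack for <html><body><p><button><span>, topmost node first.
button_stack = [(HTML, t) for t in ("html", "body", "p", "button", "span")]
# Stack for <html><table><tbody><tr><td><p>.
table_stack = [(HTML, t) for t in ("html", "table", "tbody", "tr", "td", "p")]
```

With `button_stack`, the `p` element is in scope but not in button scope, because the walk reaches the `button` boundary first. With `table_stack`, `tr` is in table scope but not in plain scope, because `td` is a boundary only for the latter.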

#### The list of active formatting elements #### {#the-list-of-active-formatting-elements}

Initially, the list of active formatting elements is empty. It is used to handle mis-nested [=formatting|formatting element tags=].

The list contains elements in the [=formatting=] category, and [=markers=]. The markers are inserted when entering <{applet}>, <{object}>, <{marquee}>, <{template}>, <{td}>, <{th}>, and <{caption}> elements, and are used to prevent formatting from "leaking" into <{applet}>, <{object}>, <{marquee}>, <{template}>, <{td}>, <{th}>, and <{caption}> elements.

In addition, each element in the [=list of active formatting elements=] is associated with the token for which it was created, so that further elements can be created for that token if necessary.

When the steps below require the UA to push onto the list of active formatting elements an element |element|, the UA must perform the following steps:

1. If there are already three elements in the [=list of active formatting elements=] after the last [=marker=], if any, or anywhere in the list if there are no [=markers=], that have the same tag name, namespace, and attributes as |element|, then remove the earliest such element from the [=list of active formatting elements=]. For these purposes, the attributes must be compared as they were when the elements were created by the parser; two elements have the same attributes if all their parsed attributes can be paired such that the two attributes in each pair have identical names, namespaces, and values (the order of the attributes does not matter).

This is the Noah's Ark clause. But with three per family instead of two.

2. Add |element| to the [=list of active formatting elements=].

When the steps below require the UA to reconstruct the active formatting elements, the UA must perform the following steps:

1. If there are no entries in the [=list of active formatting elements=], then there is nothing to reconstruct; stop this algorithm.
2. If the last (most recently added) entry in the [=list of active formatting elements=] is a [=marker=], or if it is an element that is in the [=stack of open elements=], then there is nothing to reconstruct; stop this algorithm.
3. Let |entry| be the last (most recently added) element in the [=list of active formatting elements=].
4. |Rewind|: If there are no entries before |entry| in the [=list of active formatting elements=], then jump to the step labeled |Create|.
5. Let |entry| be the entry one earlier than |entry| in the [=list of active formatting elements=].
6. If |entry| is neither a [=marker=] nor an element that is also in the [=stack of open elements=], go to the step labeled |Rewind|.
7. |Advance|: Let |entry| be the element one later than |entry| in the [=list of active formatting elements=].
8. |Create|: [=Insert an HTML element=] for the token for which the element |entry| was created, to obtain |new element|.
9. Replace the entry for |entry| in the list with an entry for |new element|.
10. If the entry for |new element| in the [=list of active formatting elements=] is not the last entry in the list, return to the step labeled |Advance|.

This has the effect of reopening all the formatting elements that were opened in the current body, cell, or caption (whichever is youngest) that haven't been explicitly closed.

The way this specification is written, the [=list of active formatting elements=] always consists of elements in chronological order with the least recently added element first and the most recently added element last (except while steps 7 to 10 of the above algorithm are being executed, of course).
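The push and reconstruct algorithms above can be sketched as follows. This is a non-normative illustration and makes simplifying assumptions: attribute namespaces are ignored when comparing elements, "insert an HTML element" is reduced to creating an element and pushing it onto the stack, and the names `TreeBuilder`, `StartTag`, `Element`, and `MARKER` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

MARKER = object()  # sentinel standing in for a marker entry


@dataclass(frozen=True)
class StartTag:
    name: str
    attrs: Tuple[Tuple[str, str], ...] = ()  # (name, value) pairs as parsed


@dataclass(eq=False)  # elements compare by identity, like DOM nodes
class Element:
    token: StartTag
    namespace: str = "html"

    def signature(self):
        # Attribute order does not matter, hence the frozenset.
        return (self.token.name, self.namespace, frozenset(self.token.attrs))


class TreeBuilder:
    def __init__(self) -> None:
        self.open_elements: List[Element] = []
        self.active_formatting: list = []

    def insert_html_element(self, token: StartTag) -> Element:
        # Stand-in for "insert an HTML element"; DOM insertion is omitted.
        element = Element(token)
        self.open_elements.append(element)
        return element

    def is_open(self, entry) -> bool:
        return any(entry is node for node in self.open_elements)

    def push_active_formatting(self, element: Element) -> None:
        afe = self.active_formatting
        start = 0  # index just after the last marker, or 0 if there is none
        for i in range(len(afe) - 1, -1, -1):
            if afe[i] is MARKER:
                start = i + 1
                break
        same = [i for i in range(start, len(afe))
                if afe[i].signature() == element.signature()]
        if len(same) >= 3:  # the Noah's Ark clause
            del afe[same[0]]
        afe.append(element)

    def reconstruct_active_formatting(self) -> None:
        afe = self.active_formatting
        if not afe or afe[-1] is MARKER or self.is_open(afe[-1]):
            return  # nothing to reconstruct
        i = len(afe) - 1
        # Rewind: step back while the earlier entry is neither marker nor open.
        while i > 0 and not (afe[i - 1] is MARKER or self.is_open(afe[i - 1])):
            i -= 1
        # Advance and Create: recreate every entry from i to the end.
        for j in range(i, len(afe)):
            afe[j] = self.insert_html_element(afe[j].token)


tb = TreeBuilder()
b_el = tb.insert_html_element(StartTag("b"))
tb.push_active_formatting(b_el)
i_el = tb.insert_html_element(StartTag("i"))
tb.push_active_formatting(i_el)
tb.open_elements.clear()  # both elements are closed without leaving the list
tb.reconstruct_active_formatting()

ark = TreeBuilder()
four_bs = [Element(StartTag("b")) for _ in range(4)]
for el in four_bs:
    ark.push_active_formatting(el)
```

After the calls above, `tb` has fresh `b` and `i` elements on its stack, replacing the original entries in the list, and `ark` keeps only the three most recent of its four identical `b` elements.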

When the steps below require the UA to clear the list of active formatting elements up to the last marker, the UA must perform the following steps:

1. Let |entry| be the last (most recently added) entry in the [=list of active formatting elements=].
2. Remove |entry| from the [=list of active formatting elements=].
3. If |entry| was a [=marker=], then stop the algorithm at this point. The list has been cleared up to the last [=marker=].
4. Go to step 1.

#### The element pointers #### {#the-element-pointers}

Initially, the `head` element pointer and the `form` element pointer are both null.

Once a <{head}> element has been parsed (whether implicitly or explicitly) the `head` element pointer gets set to point to this node.

The `form` element pointer points to the last <{form}> element that was opened and whose end tag has not yet been seen. It is used to make form controls associate with forms in the face of dramatically bad markup, for historical reasons. It is ignored inside <{template}> elements.

#### Other parsing state flags #### {#other-parsing-state-flags}

The scripting flag is set to "enabled" if [=scripting was enabled=] for the {{Document}} with which the parser is associated when the parser was created, and "disabled" otherwise.

The [=scripting flag=] can be enabled even when the parser was originally created for the [=HTML fragment parsing algorithm=], even though <{script}> elements don't execute in that case.

The frameset-ok flag is set to "ok" when the parser is created. It is set to "not ok" after certain tokens are seen.

### Tokenization ### {#tokenization}

Implementations must act as if they used the following state machine to tokenize HTML. The state machine must start in the [=data state=]. Most states consume a single character, which may have various side-effects, and then either switch the state machine to a new state to [=reconsume=] the [=current input character=], switch it to a new state to consume the next character, or stay in the same state to consume the next character. Some states have more complicated behavior and can consume several characters before switching to another state. In some cases, the tokenizer state is also changed by the tree construction stage.

When a state says to reconsume a matched character in a specified state, that means to switch to that state, but when it attempts to consume the [=next input character=], provide it with the [=current input character=] instead.

The exact behavior of certain states depends on the [=insertion mode=] and the [=stack of open elements=]. Certain states also use a |temporary buffer| to track progress, and the [=character reference state=] uses a return state to return to the state it was invoked from.

The output of the tokenization step is a series of zero or more of the following tokens: DOCTYPE, start tag, end tag, comment, character, end-of-file. DOCTYPE tokens have a name, a public identifier, a system identifier, and a force-quirks flag. When a DOCTYPE token is created, its name, public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string), and the [=force-quirks flag=] must be set to *off* (its other state is *on*). Start and end tag tokens have a tag name, a self-closing flag, and a list of attributes, each of which has a name and a value. When a start or end tag token is created, its [=self-closing flag=] must be unset (its other state is that it be set), and its attributes list must be empty. Comment and character tokens have data.

When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage, and can insert additional characters into the stream. (For example, the <{script}> element can result in scripts executing and using the [=dynamic markup insertion=] APIs to insert characters into the stream being tokenized.)

Creating a token and emitting it are distinct actions. It is possible for a token to be created but implicitly abandoned (never emitted), e.g., if the file ends unexpectedly while processing the characters that are being parsed into a start tag token.
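The token types and their initial values, together with the distinction between creating and emitting a token, can be sketched as follows. This is a non-normative illustration: the class and field names are hypothetical, `None` is assumed to stand for "missing", and the tree construction stage is reduced to a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DoctypeToken:
    # None means "missing", which is distinct from the empty string.
    name: Optional[str] = None
    public_identifier: Optional[str] = None
    system_identifier: Optional[str] = None
    force_quirks: bool = False  # False is "off", True is "on"


@dataclass
class TagToken:
    kind: str  # "start" or "end"
    tag_name: str = ""
    self_closing: bool = False  # unset on creation
    attributes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class CommentToken:
    data: str = ""


@dataclass
class CharacterToken:
    data: str


@dataclass
class EOFToken:
    pass


class Tokenizer:
    def __init__(self, handle_token: Callable[[object], None]) -> None:
        # handle_token stands in for the tree construction stage.
        self.handle_token = handle_token
        self.current_tag: Optional[TagToken] = None

    def create_start_tag(self) -> TagToken:
        # Creating a token does not emit it.
        self.current_tag = TagToken("start")
        return self.current_tag

    def emit(self, token: object) -> None:
        # An emitted token is handled immediately by tree construction.
        self.handle_token(token)


# Input "<di" followed by end of file: the start tag token is created
# and partly filled in, but only the end-of-file token is ever emitted.
emitted: list = []
tokenizer = Tokenizer(emitted.append)
tag = tokenizer.create_start_tag()
tag.tag_name += "di"
tokenizer.emit(EOFToken())
```

Here the partially built `di` start tag token is abandoned, so the tree construction callback only ever sees the end-of-file token.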

When a start tag token is emitted with its [=self-closing flag=] set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a [=parse error=]. When an end tag token is emitted with attributes, that is a [=parse error=]. When an end tag token is emitted with its [=self-closing flag=] set, that is a [=parse error=]. An appropriate end tag token is an end tag token whose tag name matches the tag name of the last start tag to have been emitted from this tokenizer, if any. If no start tag has been emitted from this tokenizer, then no end tag token is appropriate. Before each step of the tokenizer, the user agent must first check the [=parser pause flag=]. If it is true, then the tokenizer must abort the processing of any nested invocations of the tokenizer, yielding control back to the caller. The tokenizer state machine consists of the states defined in the following subsections. #### Data state #### {#data-state} Consume the [=next input character=]:

: U+0026 AMPERSAND (&) :: Set the [=return state=] to the [=data state=]. Switch to the [=character reference state=]. : U+003C LESS-THAN SIGN (<) :: Switch to the [=tag open state=]. : U+0000 NULL :: [=Parse error=]. Emit the [=current input character=] as a character token. : EOF :: Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### RCDATA state #### {#RCDATA-state} Consume the [=next input character=]:

: U+0026 AMPERSAND (&) :: Set the [=return state=] to the [=RCDATA state=]. Switch to the [=character reference state=]. : U+003C LESS-THAN SIGN (<) :: Switch to the [=RCDATA less-than sign state=]. : U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### RAWTEXT state #### {#rawtext-state} Consume the [=next input character=]:

: U+003C LESS-THAN SIGN (<) :: Switch to the [=RAWTEXT less-than sign state=]. : U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### Script data state #### {#script-data-state} Consume the [=next input character=]:

: U+003C LESS-THAN SIGN (<) :: Switch to the [=script data less-than sign state=]. : U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### PLAINTEXT state #### {#plaintext-state} Consume the [=next input character=]:

: U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### Tag open state #### {#tag-open-state} Consume the [=next input character=]:

: U+0021 EXCLAMATION MARK (!) :: Switch to the [=markup declaration open state=]. : U+002F SOLIDUS (/) :: Switch to the [=end tag open state=]. : [=ASCII letter=] :: Create a new start tag token, set its tag name to the empty string. [=Reconsume=] in the [=tag name state=]. : U+003F QUESTION MARK (?) :: [=Parse error=]. Create a comment token whose data is the empty string. [=Reconsume=] in the [=bogus comment state=]. : Anything else :: [=Parse error=]. Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=data state=]. #### End tag open state #### {#end-tag-open-state} Consume the [=next input character=]:

: [=ASCII letter=] :: Create a new end tag token, set its tag name to the empty string. [=Reconsume=] in the [=tag name state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Switch to the [=data state=]. : EOF :: [=Parse error=]. Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token and an end-of-file token. : Anything else :: [=Parse error=]. Create a comment token whose data is the empty string. [=Reconsume=] in the [=bogus comment state=]. #### Tag name state #### {#tag-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before attribute name state=]. : U+002F SOLIDUS (/) :: Switch to the [=self-closing start tag state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current tag token. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current tag token's tag name. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current tag token's tag name. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current tag token's tag name. #### RCDATA less-than sign state #### {#RCDATA-less-than-sign-state} Consume the [=next input character=]:

: U+002F SOLIDUS (/) :: Set the |temporary buffer| to the empty string. Switch to the [=RCDATA end tag open state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=RCDATA state=]. #### RCDATA end tag open state #### {#RCDATA-end-tag-open-state} Consume the [=next input character=]:

: [=ASCII letter=] :: Create a new end tag token, set its tag name to the empty string. [=Reconsume=] in the [=RCDATA end tag name state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS character token. [=Reconsume=] in the [=RCDATA state=]. #### RCDATA end tag name state #### {#RCDATA-end-tag-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=before attribute name state=]. Otherwise, treat it as per the "anything else" entry below. : U+002F SOLIDUS (/) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=self-closing start tag state=]. Otherwise, treat it as per the "anything else" entry below. : U+003E GREATER-THAN SIGN (>) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=data state=] and emit the current tag token. Otherwise, treat it as per the "anything else" entry below. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : Anything else :: Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and a character token for each of the characters in the |temporary buffer| (in the order they were added to the buffer). [=Reconsume=] in the [=RCDATA state=]. #### RAWTEXT less-than sign state #### {#rawtext-less-than-sign-state} Consume the [=next input character=]:

: U+002F SOLIDUS (/) :: Set the |temporary buffer| to the empty string. Switch to the [=RAWTEXT end tag open state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=RAWTEXT state=]. #### RAWTEXT end tag open state #### {#rawtext-end-tag-open-state} Consume the [=next input character=]:

: [=ASCII letter=] :: Create a new end tag token, set its tag name to the empty string. [=Reconsume=] in the [=RAWTEXT end tag name state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS character token. [=Reconsume=] in the [=RAWTEXT state=]. #### RAWTEXT end tag name state #### {#rawtext-end-tag-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=before attribute name state=]. Otherwise, treat it as per the "anything else" entry below. : U+002F SOLIDUS (/) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=self-closing start tag state=]. Otherwise, treat it as per the "anything else" entry below. : U+003E GREATER-THAN SIGN (>) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=data state=] and emit the current tag token. Otherwise, treat it as per the "anything else" entry below. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : Anything else :: Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and a character token for each of the characters in the |temporary buffer| (in the order they were added to the buffer). [=Reconsume=] in the [=RAWTEXT state=]. #### Script data less-than sign state #### {#script-data-less-than-sign-state} Consume the [=next input character=]:

: U+002F SOLIDUS (/) :: Set the |temporary buffer| to the empty string. Switch to the [=script data end tag open state=]. : U+0021 EXCLAMATION MARK (!) :: Switch to the [=script data escape start state=]. Emit a U+003C LESS-THAN SIGN character token and a U+0021 EXCLAMATION MARK character token. : Anything else :: Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=script data state=]. #### Script data end tag open state #### {#script-data-end-tag-open-state} Consume the [=next input character=]:

: [=ASCII letter=] :: Create a new end tag token, set its tag name to the empty string. [=Reconsume=] in the [=script data end tag name state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS character token. [=Reconsume=] in the [=script data state=]. #### Script data end tag name state #### {#script-data-end-tag-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=before attribute name state=]. Otherwise, treat it as per the "anything else" entry below. : U+002F SOLIDUS (/) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=self-closing start tag state=]. Otherwise, treat it as per the "anything else" entry below. : U+003E GREATER-THAN SIGN (>) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=data state=] and emit the current tag token. Otherwise, treat it as per the "anything else" entry below. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : Anything else :: Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and a character token for each of the characters in the |temporary buffer| (in the order they were added to the buffer). [=Reconsume=] in the [=script data state=]. #### Script data escape start state #### {#script-data-escape-start-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data escape start dash state=]. Emit a U+002D HYPHEN-MINUS character token. : Anything else :: [=Reconsume=] in the [=script data state=]. #### Script data escape start dash state #### {#script-data-escape-start-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data escaped dash dash state=]. Emit a U+002D HYPHEN-MINUS character token. : Anything else :: [=Reconsume=] in the [=script data state=]. #### Script data escaped state #### {#script-data-escaped-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data escaped dash state=]. Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data escaped less-than sign state=]. : U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### Script data escaped dash state #### {#script-data-escaped-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data escaped dash dash state=]. Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data escaped less-than sign state=]. : U+0000 NULL :: [=Parse error=]. Switch to the [=script data escaped state=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Switch to the [=script data escaped state=]. Emit the [=current input character=] as a character token. #### Script data escaped dash dash state #### {#script-data-escaped-dash-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data escaped less-than sign state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=script data state=]. Emit a U+003E GREATER-THAN SIGN character token. : U+0000 NULL :: [=Parse error=]. Switch to the [=script data escaped state=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Switch to the [=script data escaped state=]. Emit the [=current input character=] as a character token. #### Script data escaped less-than sign state #### {#script-data-escaped-less-than-sign-state} Consume the [=next input character=]:

: U+002F SOLIDUS (/) :: Set the |temporary buffer| to the empty string. Switch to the [=script data escaped end tag open state=]. : [=ASCII letter=] :: Set the |temporary buffer| to the empty string. Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=script data double escape start state=]. : Anything else :: Emit a U+003C LESS-THAN SIGN character token. [=Reconsume=] in the [=script data escaped state=]. #### Script data escaped end tag open state #### {#script-data-escaped-end-tag-open-state} Consume the [=next input character=]:

: [=ASCII letter=] :: Create a new end tag token, set its tag name to the empty string. [=Reconsume=] in the [=script data escaped end tag name state=]. (Don't emit the token yet; further details will be filled in before it is emitted.) : Anything else :: Emit a U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS character token. [=Reconsume=] in the [=script data escaped state=]. #### Script data escaped end tag name state #### {#script-data-escaped-end-tag-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=before attribute name state=]. Otherwise, treat it as per the "anything else" entry below. : U+002F SOLIDUS (/) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=self-closing start tag state=]. Otherwise, treat it as per the "anything else" entry below. : U+003E GREATER-THAN SIGN (>) :: If the current end tag token is an [=appropriate end tag token=], then switch to the [=data state=] and emit the current tag token. Otherwise, treat it as per the "anything else" entry below. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the current tag token's tag name. Append the [=current input character=] to the |temporary buffer|. : Anything else :: Emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and a character token for each of the characters in the |temporary buffer| (in the order they were added to the buffer). [=Reconsume=] in the [=script data escaped state=]. #### Script data double escape start state #### {#script-data-double-escape-start-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE : U+002F SOLIDUS (/) : U+003E GREATER-THAN SIGN (>) :: If the |temporary buffer| is the string "`script`", then switch to the [=script data double escaped state=]. Otherwise, switch to the [=script data escaped state=]. Emit the [=current input character=] as a character token. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the |temporary buffer|. Emit the [=current input character=] as a character token. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the |temporary buffer|. Emit the [=current input character=] as a character token. : Anything else :: [=Reconsume=] in the [=script data escaped state=]. #### Script data double escaped state #### {#script-data-double-escaped-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data double escaped dash state=]. Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data double escaped less-than sign state=]. Emit a U+003C LESS-THAN SIGN character token. : U+0000 NULL :: [=Parse error=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token. #### Script data double escaped dash state #### {#script-data-double-escaped-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=script data double escaped dash dash state=]. Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data double escaped less-than sign state=]. Emit a U+003C LESS-THAN SIGN character token. : U+0000 NULL :: [=Parse error=]. Switch to the [=script data double escaped state=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Switch to the [=script data double escaped state=]. Emit the [=current input character=] as a character token. #### Script data double escaped dash dash state #### {#script-data-double-escaped-dash-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Emit a U+002D HYPHEN-MINUS character token. : U+003C LESS-THAN SIGN (<) :: Switch to the [=script data double escaped less-than sign state=]. Emit a U+003C LESS-THAN SIGN character token. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=script data state=]. Emit a U+003E GREATER-THAN SIGN character token. : U+0000 NULL :: [=Parse error=]. Switch to the [=script data double escaped state=]. Emit a U+FFFD REPLACEMENT CHARACTER character token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Switch to the [=script data double escaped state=]. Emit the [=current input character=] as a character token. #### Script data double escaped less-than sign state #### {#script-data-double-escaped-less-than-sign-state} Consume the [=next input character=]:

: U+002F SOLIDUS (/) :: Set the |temporary buffer| to the empty string. Switch to the [=script data double escape end state=]. Emit a U+002F SOLIDUS character token. : Anything else :: [=Reconsume=] in the [=script data double escaped state=]. #### Script data double escape end state #### {#script-data-double-escape-end-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE : U+002F SOLIDUS (/) : U+003E GREATER-THAN SIGN (>) :: If the |temporary buffer| is the string "`script`", then switch to the [=script data escaped state=]. Otherwise, switch to the [=script data double escaped state=]. Emit the [=current input character=] as a character token. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the |temporary buffer|. Emit the [=current input character=] as a character token. : [=Lowercase ASCII letter=] :: Append the [=current input character=] to the |temporary buffer|. Emit the [=current input character=] as a character token. : Anything else :: [=Reconsume=] in the [=script data double escaped state=]. #### Before attribute name state #### {#before-attribute-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+002F SOLIDUS (/) : U+003E GREATER-THAN SIGN (>) : EOF :: [=Reconsume=] in the [=after attribute name state=]. : U+003D EQUALS SIGN (=) :: [=Parse error=]. Start a new attribute in the current tag token. Set that attribute's name to the [=current input character=], and its value to the empty string. Switch to the [=attribute name state=]. : Anything else :: Start a new attribute in the current tag token. Set that attribute's name and value to the empty string. [=Reconsume=] in the [=attribute name state=]. #### Attribute name state #### {#attribute-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE : U+002F SOLIDUS (/) : U+003E GREATER-THAN SIGN (>) : EOF :: [=Reconsume=] in the [=after attribute name state=]. : U+003D EQUALS SIGN (=) :: Switch to the [=before attribute value state=]. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current attribute's name. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current attribute's name. : U+0022 QUOTATION MARK (") : U+0027 APOSTROPHE (') : U+003C LESS-THAN SIGN (<) :: [=Parse error=]. Treat it as per the "anything else" entry below. : Anything else :: Append the [=current input character=] to the current attribute's name. When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a [=parse error=] and the new attribute must be removed from the token.

If an attribute is so removed from a token, it, and the value that gets associated with it, if any, are never subsequently used by the parser, and are therefore effectively discarded. Removing the attribute in this way does not change its status as the "current attribute" for the purposes of the tokenizer, however.
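
The duplicate-attribute rule above can be illustrated with a minimal Python sketch. The `Attribute` and `TagToken` classes and the `leave_attribute_name_state` function are hypothetical names introduced only for this illustration; they are not defined by this specification, and a real tokenizer would likely structure its tokens differently.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str = ""
    value: str = ""


@dataclass
class TagToken:
    name: str
    attributes: list = field(default_factory=list)


def leave_attribute_name_state(token, current):
    """Run the duplicate check when the attribute name state is left.

    Returns True if a parse error occurred, that is, if `current`
    duplicated the name of an earlier attribute and was removed.
    """
    duplicate = any(
        other is not current and other.name == current.name
        for other in token.attributes
    )
    if duplicate:
        # Remove only the new attribute; the earlier one wins.
        token.attributes = [a for a in token.attributes if a is not current]
    return duplicate


token = TagToken("a")
first = Attribute("href", "one")
token.attributes.append(first)
leave_attribute_name_state(token, first)

second = Attribute("href")
token.attributes.append(second)
leave_attribute_name_state(token, second)

# `second` is still the tokenizer's "current attribute", so later states
# may keep appending to its value, but nothing on the token refers to it.
second.value += "two"
```

In this sketch the removed attribute object stays alive as the current attribute, which mirrors the note above: its value is still accumulated but is never seen by the rest of the parser.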

#### After attribute name state #### {#after-attribute-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+002F SOLIDUS (/) :: Switch to the [=self-closing start tag state=]. : U+003D EQUALS SIGN (=) :: Switch to the [=before attribute value state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current tag token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Start a new attribute in the current tag token. Set that attribute's name and value to the empty string. [=Reconsume=] in the [=attribute name state=]. #### Before attribute value state #### {#before-attribute-value-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+0022 QUOTATION MARK (") :: Switch to the [=attribute value (double-quoted) state=]. : U+0027 APOSTROPHE (') :: Switch to the [=attribute value (single-quoted) state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Treat it as per the "anything else" entry below. : Anything else :: [=Reconsume=] in the [=attribute value (unquoted) state=]. #### Attribute value (double-quoted) state #### {#attribute-value-double-quoted-state} Consume the [=next input character=]:

: U+0022 QUOTATION MARK (") :: Switch to the [=after attribute value (quoted) state=]. : U+0026 AMPERSAND (&) :: Set the [=return state=] to the [=attribute value (double-quoted) state=]. Switch to the [=character reference state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current attribute's value. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current attribute's value. #### Attribute value (single-quoted) state #### {#attribute-value-single-quoted-state} Consume the [=next input character=]:

: U+0027 APOSTROPHE (') :: Switch to the [=after attribute value (quoted) state=]. : U+0026 AMPERSAND (&) :: Set the [=return state=] to the [=attribute value (single-quoted) state=]. Switch to the [=character reference state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current attribute's value. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current attribute's value. #### Attribute value (unquoted) state #### {#attribute-value-unquoted-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before attribute name state=]. : U+0026 AMPERSAND (&) :: Set the [=return state=] to the [=attribute value (unquoted) state=]. Switch to the [=character reference state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current tag token. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current attribute's value. : U+0022 QUOTATION MARK (") : U+0027 APOSTROPHE (') : U+003C LESS-THAN SIGN (<) : U+003D EQUALS SIGN (=) : U+0060 GRAVE ACCENT (`) :: [=Parse error=]. Treat it as per the "anything else" entry below. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current attribute's value. #### After attribute value (quoted) state #### {#after-attribute-value-quoted-state} Consume the [=next input character=]:

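: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before attribute name state=]. : U+002F SOLIDUS (/) :: Switch to the [=self-closing start tag state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current tag token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=before attribute name state=]. #### Self-closing start tag state #### {#self-closing-start-tag-state} Consume the [=next input character=]:

: U+003E GREATER-THAN SIGN (>) :: Set the [=self-closing flag=] of the current tag token. Switch to the [=data state=]. Emit the current tag token. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=before attribute name state=]. #### Bogus comment state #### {#bogus-comment-state} Consume the [=next input character=]: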

: U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the comment token. : EOF :: Emit the comment token. Emit an end-of-file token. : U+0000 NULL :: Append a U+FFFD REPLACEMENT CHARACTER character to the comment token's data. : Anything else :: Append the [=current input character=] to the comment token's data. #### Markup declaration open state #### {#markup-declaration-open-state} If the next two characters are both U+002D HYPHEN-MINUS characters (-), consume those two characters, create a comment token whose data is the empty string, and switch to the [=comment start state=]. Otherwise, if the next seven characters are an [=ASCII case-insensitive=] match for the word "DOCTYPE", then consume those characters and switch to the [=DOCTYPE state=]. Otherwise, if there is an [=adjusted current node=] and it is not an element in the [=HTML namespace=] and the next seven characters are a [=case-sensitive=] match for the string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after), then consume those characters and switch to the [=CDATA section state=]. Otherwise, this is a [=parse error=]. Create a comment token whose data is the empty string. Switch to the [=bogus comment state=] (don't consume anything in the current state). #### Comment start state #### {#comment-start-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=comment start dash state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Switch to the [=data state=]. Emit the comment token. : Anything else :: [=Reconsume=] in the [=comment state=]. #### Comment start dash state #### {#comment-start-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=comment end state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Switch to the [=data state=]. Emit the comment token. : EOF :: [=Parse error=]. Emit the comment token. Emit an end-of-file token. : Anything else :: Append a U+002D HYPHEN-MINUS character (-) to the comment token's data. [=Reconsume=] in the [=comment state=]. #### Comment state #### {#comment-state} Consume the [=next input character=]:

: U+003C LESS-THAN SIGN (<) :: Append the [=current input character=] to the comment token's data. Switch to the [=comment less-than sign state=]. : U+002D HYPHEN-MINUS (-) :: Switch to the [=comment end dash state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the comment token's data. : EOF :: [=Parse error=]. Emit the comment token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the comment token's data. #### Comment less-than sign state #### {#comment-less-than-sign-state} Consume the [=next input character=]:

: U+0021 EXCLAMATION MARK (!) :: Append the [=current input character=] to the comment token's data. Switch to the [=comment less-than sign bang state=]. : U+003C LESS-THAN SIGN (<) :: Append the [=current input character=] to the comment token's data. : Anything else :: [=Reconsume=] in the [=comment state=]. #### Comment less-than sign bang state #### {#comment-less-than-sign-bang-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=comment less-than sign bang dash state=]. : Anything else :: [=Reconsume=] in the [=comment state=]. #### Comment less-than sign bang dash state #### {#comment-less-than-sign-bang-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=comment less-than sign bang dash dash state=]. : Anything else :: [=Reconsume=] in the [=comment end dash state=]. #### Comment less-than sign bang dash dash state #### {#comment-less-than-sign-bang-dash-dash-state} Consume the [=next input character=]:

: U+003E GREATER-THAN SIGN (>) : EOF :: [=Reconsume=] in the [=comment end state=]. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=comment end state=]. #### Comment end dash state #### {#comment-end-dash-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Switch to the [=comment end state=]. : EOF :: [=Parse error=]. Emit the comment token. Emit an end-of-file token. : Anything else :: Append a U+002D HYPHEN-MINUS character (-) to the comment token's data. [=Reconsume=] in the [=comment state=]. #### Comment end state #### {#comment-end-state} Consume the [=next input character=]:

: U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the comment token. : U+0021 EXCLAMATION MARK (!) :: Switch to the [=comment end bang state=]. : U+002D HYPHEN-MINUS (-) :: Append a U+002D HYPHEN-MINUS character (-) to the comment token's data. : EOF :: [=Parse error=]. Emit the comment token. Emit an end-of-file token. : Anything else :: Append two U+002D HYPHEN-MINUS characters (-) to the comment token's data. [=Reconsume=] in the [=comment state=]. #### Comment end bang state #### {#comment-end-bang-state} Consume the [=next input character=]:

: U+002D HYPHEN-MINUS (-) :: Append two U+002D HYPHEN-MINUS characters (-) and a U+0021 EXCLAMATION MARK character (!) to the comment token's data. Switch to the [=comment end dash state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Switch to the [=data state=]. Emit the comment token. : EOF :: [=Parse error=]. Emit the comment token. Emit an end-of-file token. : Anything else :: Append two U+002D HYPHEN-MINUS characters (-) and a U+0021 EXCLAMATION MARK character (!) to the comment token's data. [=Reconsume=] in the [=comment state=]. #### DOCTYPE state #### {#doctype-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before DOCTYPE name state=]. : EOF :: [=Parse error=]. Create a new DOCTYPE token. Set its [=force-quirks flag=] to *on*. Emit the token. Emit an end-of-file token. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=before DOCTYPE name state=]. #### Before DOCTYPE name state #### {#before-doctype-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : [=Uppercase ASCII letter=] :: Create a new DOCTYPE token. Set the token's name to the lowercase version of the [=current input character=] (add 0x0020 to the character's code point). Switch to the [=DOCTYPE name state=]. : U+0000 NULL :: [=Parse error=]. Create a new DOCTYPE token. Set the token's name to a U+FFFD REPLACEMENT CHARACTER character. Switch to the [=DOCTYPE name state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Create a new DOCTYPE token. Set its [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit the token. : EOF :: [=Parse error=]. Create a new DOCTYPE token. Set its [=force-quirks flag=] to *on*. Emit the token. Emit an end-of-file token. : Anything else :: Create a new DOCTYPE token. Set the token's name to the [=current input character=]. Switch to the [=DOCTYPE name state=]. #### DOCTYPE name state #### {#doctype-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=after DOCTYPE name state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current DOCTYPE token. : [=Uppercase ASCII letter=] :: Append the lowercase version of the [=current input character=] (add 0x0020 to the character's code point) to the current DOCTYPE token's name. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current DOCTYPE token's name. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current DOCTYPE token's name. #### After DOCTYPE name state #### {#after-doctype-name-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: If the six characters starting from the [=current input character=] are an [=ASCII case-insensitive=] match for the word "PUBLIC", then consume those characters and switch to the [=after DOCTYPE public keyword state=]. Otherwise, if the six characters starting from the [=current input character=] are an [=ASCII case-insensitive=] match for the word "SYSTEM", then consume those characters and switch to the [=after DOCTYPE system keyword state=]. Otherwise, this is a [=parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### After DOCTYPE public keyword state #### {#after-doctype-public-keyword-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before DOCTYPE public identifier state=]. : U+0022 QUOTATION MARK (") :: [=Parse error=]. Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the [=DOCTYPE public identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: [=Parse error=]. Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the [=DOCTYPE public identifier (single-quoted) state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### Before DOCTYPE public identifier state #### {#before-doctype-public-identifier-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+0022 QUOTATION MARK (") :: Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the [=DOCTYPE public identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the [=DOCTYPE public identifier (single-quoted) state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### DOCTYPE public identifier (double-quoted) state #### {#doctype-public-identifier-double-quoted-state} Consume the [=next input character=]:

: U+0022 QUOTATION MARK (") :: Switch to the [=after DOCTYPE public identifier state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current DOCTYPE token's public identifier. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current DOCTYPE token's public identifier. #### DOCTYPE public identifier (single-quoted) state #### {#doctype-public-identifier-single-quoted-state} Consume the [=next input character=]:

: U+0027 APOSTROPHE (') :: Switch to the [=after DOCTYPE public identifier state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current DOCTYPE token's public identifier. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current DOCTYPE token's public identifier. #### After DOCTYPE public identifier state #### {#after-doctype-public-identifier-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=between DOCTYPE public and system identifiers state=]. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current DOCTYPE token. : U+0022 QUOTATION MARK (") :: [=Parse error=]. Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: [=Parse error=]. Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (single-quoted) state=]. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### Between DOCTYPE public and system identifiers state #### {#between-doctype-public-and-system-identifiers-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current DOCTYPE token. : U+0022 QUOTATION MARK (") :: Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (single-quoted) state=]. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### After DOCTYPE system keyword state #### {#after-doctype-system-keyword-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Switch to the [=before DOCTYPE system identifier state=]. : U+0022 QUOTATION MARK (") :: [=Parse error=]. Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: [=Parse error=]. Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (single-quoted) state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### Before DOCTYPE system identifier state #### {#before-doctype-system-identifier-state} Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+0022 QUOTATION MARK (") :: Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (double-quoted) state=]. : U+0027 APOSTROPHE (') :: Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the [=DOCTYPE system identifier (single-quoted) state=]. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=bogus DOCTYPE state=]. #### DOCTYPE system identifier (double-quoted) state #### {#doctype-system-identifier-double-quoted-state} Consume the [=next input character=]:

: U+0022 QUOTATION MARK (") :: Switch to the [=after DOCTYPE system identifier state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current DOCTYPE token's system identifier. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current DOCTYPE token's system identifier. #### DOCTYPE system identifier (single-quoted) state #### {#doctype-system-identifier-single-quoted-state} Consume the [=next input character=]:

: U+0027 APOSTROPHE (') :: Switch to the [=after DOCTYPE system identifier state=]. : U+0000 NULL :: [=Parse error=]. Append a U+FFFD REPLACEMENT CHARACTER character to the current DOCTYPE token's system identifier. : U+003E GREATER-THAN SIGN (>) :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Switch to the [=data state=]. Emit that DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: Append the [=current input character=] to the current DOCTYPE token's system identifier. #### After DOCTYPE system identifier state #### {#after-doctype-system-identifier-state} Consume the [=next input character=]:

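: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE :: Ignore the character. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the current DOCTYPE token. : EOF :: [=Parse error=]. Set the DOCTYPE token's [=force-quirks flag=] to *on*. Emit that DOCTYPE token. Emit an end-of-file token. : Anything else :: [=Parse error=]. Switch to the [=bogus DOCTYPE state=]. (This does *not* set the DOCTYPE token's [=force-quirks flag=] to *on*.) #### Bogus DOCTYPE state #### {#bogus-doctype-state} Consume the [=next input character=]:

: U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. Emit the DOCTYPE token. : EOF :: Emit the DOCTYPE token. Emit an end-of-file token. : Anything else :: Ignore the character. #### CDATA section state #### {#CDATA-section-state} Consume the [=next input character=]: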

: U+005D RIGHT SQUARE BRACKET (]) :: Switch to the [=CDATA section bracket state=]. : EOF :: [=Parse error=]. Emit an end-of-file token. : Anything else :: Emit the [=current input character=] as a character token.

U+0000 NULL characters are handled in the tree construction stage, as part of the [[#the-rules-for-parsing-tokens-in-foreign-content|in foreign content]] insertion mode, which is the only place where CDATA sections can appear.
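
The CDATA section state above, together with the CDATA section bracket and CDATA section end states that follow it, forms a small three-state machine. The Python sketch below is one illustrative way to model it; the function name `tokenize_cdata`, the string state labels, and the tuple it returns are assumptions made for this example and are not part of this specification. Character tokens are collected into a string for brevity.

```python
def tokenize_cdata(text):
    """Scan `text`, which starts just after "<![CDATA[".

    Returns (emitted_characters, characters_consumed, parse_error).
    """
    out = []
    state = "cdata"
    i = 0
    while True:
        ch = text[i] if i < len(text) else None  # None stands for EOF
        if state == "cdata":
            if ch == "]":
                state = "bracket"
            elif ch is None:
                # EOF inside a CDATA section is a parse error.
                return "".join(out), i, True
            else:
                out.append(ch)
        elif state == "bracket":
            if ch == "]":
                state = "end"
            else:
                out.append("]")
                state = "cdata"
                continue  # reconsume: do not advance past `ch`
        else:  # state == "end"
            if ch == "]":
                out.append("]")
            elif ch == ">":
                # "]]>" closes the section; back to the data state.
                return "".join(out), i + 1, False
            else:
                out.append("]]")
                state = "cdata"
                continue  # reconsume: do not advance past `ch`
        i += 1
```

The `continue` statements stand in for "reconsume": the same character is examined again in the new state without advancing the input position.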

#### CDATA section bracket state #### {#CDATA-section-bracket-state} Consume the [=next input character=]:

: U+005D RIGHT SQUARE BRACKET (]) :: Switch to the [=CDATA section end state=]. : Anything else :: Emit a U+005D RIGHT SQUARE BRACKET character token. [=Reconsume=] in the [=CDATA section state=]. #### CDATA section end state #### {#CDATA-section-end-state} Consume the [=next input character=]:

: U+005D RIGHT SQUARE BRACKET (]) :: Emit a U+005D RIGHT SQUARE BRACKET character token. : U+003E GREATER-THAN SIGN (>) :: Switch to the [=data state=]. : Anything else :: Emit two U+005D RIGHT SQUARE BRACKET character tokens. [=Reconsume=] in the [=CDATA section state=]. #### Character reference state #### {#character-reference-state} Set the [=temporary buffer=] to the empty string. Append a U+0026 AMPERSAND (&) character to the temporary buffer. Consume the [=next input character=]:

: U+0009 CHARACTER TABULATION (tab) : U+000A LINE FEED (LF) : U+000C FORM FEED (FF) : U+0020 SPACE : U+003C LESS-THAN SIGN (<) : U+0026 AMPERSAND (&) : EOF :: [=Reconsume=] in the [=character reference end state=]. : U+0023 NUMBER SIGN (#) :: Append the [=current input character=] to the [=temporary buffer=]. Switch to the [=numeric character reference state=]. : Anything else :: Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the [[#named-character-references]] table (in a [=case-sensitive=] manner). Append each character to the [=temporary buffer=] when it's consumed. If no match can be made and the [=temporary buffer=] consists of a U+0026 AMPERSAND character (&) followed by a sequence of one or more [=alphanumeric ASCII characters=] and a U+003B SEMICOLON character (;), then this is a [=parse error=]. If no match can be made, switch to the [=character reference end state=]. If the character reference was consumed as part of an attribute ([=return state=] is either [=attribute value (double-quoted) state=], [=attribute value (single-quoted) state=] or [=attribute value (unquoted) state=]), and the last character matched is not a U+003B SEMICOLON character (;), and the [=next input character=] is either a U+003D EQUALS SIGN character (=) or an [=alphanumeric ASCII character=], then, for historical reasons, switch to the [=character reference end state=]. If the last character matched is not a U+003B SEMICOLON character (;), this is a [=parse error=]. Set the [=temporary buffer=] to the empty string. Append one or two characters corresponding to the character reference name (as given by the second column of the [[#named-character-references]] table) to the [=temporary buffer=]. Switch to the [=character reference end state=].

If the markup contains (not in an attribute) the string `I'm &notit; I tell you`, the character reference is parsed as "not", as in, `I'm ¬it; I tell you` (and this is a parse error). But if the markup was `I'm &notin; I tell you`, the character reference would be parsed as "notin;", resulting in `I'm ∉ I tell you` (and no parse error). However, if the markup contains the string `I'm &notit; I tell you` in an attribute, there is no parse error and the string `I'm &notit; I tell you` is the result of parsing, because the matched name "not" lacks a semicolon and is followed by an alphanumeric character.
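
The three cases in the example above can be reproduced with a short Python sketch of the longest-match rule and the historical attribute exception. It is a simplification under stated assumptions: Python's standard `html.entities.html5` mapping stands in for the named character references table, the function names `match_named_reference` and `decode` are hypothetical, numeric references are left untouched, and the parse error for an unmatched `&name;` sequence is not reported.

```python
from html.entities import html5  # keys such as "not", "not;", "notin;"

MAX_NAME = max(len(name) for name in html5)


def match_named_reference(text, pos, in_attribute):
    """`text[pos]` is "&". Return (replacement, next_pos, parse_error)."""
    best = None
    limit = min(len(text), pos + 1 + MAX_NAME)
    for end in range(pos + 2, limit + 1):
        name = text[pos + 1:end]
        if name in html5:
            best = name  # later hits are longer, so the longest match wins
    if best is None:
        return "&", pos + 1, False  # no match: the ampersand stays literal
    end = pos + 1 + len(best)
    if not best.endswith(";"):
        nxt = text[end:end + 1]
        if in_attribute and (nxt == "=" or (nxt.isascii() and nxt.isalnum())):
            # Historical exception: leave the text undecoded, no parse error.
            return "&", pos + 1, False
        return html5[best], end, True  # missing semicolon is a parse error
    return html5[best], end, False


def decode(text, in_attribute=False):
    """Return (decoded_text, number_of_parse_errors)."""
    out, errors, i = [], 0, 0
    while i < len(text):
        if text[i] == "&":
            chars, i, error = match_named_reference(text, i, in_attribute)
            out.append(chars)
            errors += error
        else:
            out.append(text[i])
            i += 1
    return "".join(out), errors
```

Calling `decode("I'm &notit; I tell you")` decodes "not" and counts one parse error, while the same call with `in_attribute=True` returns the input unchanged.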

#### Numeric character reference state #### {#numeric-character-reference-state} Set the character reference code to zero (0). Consume the [=next input character=]:

: U+0078 LATIN SMALL LETTER X : U+0058 LATIN CAPITAL LETTER X :: Append the [=current input character=] to the [=temporary buffer=]. Switch to the [=hexadecimal character reference start state=]. : Anything else :: [=Reconsume=] in the [=decimal character reference start state=]. #### Hexadecimal character reference start state #### {#hexadecimal-character-reference-start-state} Consume the [=next input character=]:

: [=ASCII hex digit=] :: [=Reconsume=] in the [=hexadecimal character reference state=]. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=character reference end state=]. #### Decimal character reference start state #### {#decimal-character-reference-start-state} Consume the [=next input character=]:

: [=ASCII digit=] :: [=Reconsume=] in the [=decimal character reference state=]. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=character reference end state=]. #### Hexadecimal character reference state #### {#hexadecimal-character-reference-state} Consume the [=next input character=]:

: [=ASCII digit=] :: Multiply the [=character reference code=] by 16. Add a numeric version of the [=current input character=] (subtract 0x0030 from the character's code point) to the [=character reference code=]. : [=Uppercase ASCII hex digit=] :: Multiply the [=character reference code=] by 16. Add a numeric version of the [=current input character=] as a hexadecimal digit (subtract 0x0037 from the character's code point) to the [=character reference code=]. : [=Lowercase ASCII hex digit=] :: Multiply the [=character reference code=] by 16. Add a numeric version of the [=current input character=] as a hexadecimal digit (subtract 0x0057 from the character's code point) to the [=character reference code=]. : U+003B SEMICOLON character (;) :: Switch to the [=numeric character reference end state=]. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=numeric character reference end state=]. #### Decimal character reference state #### {#decimal-character-reference-state} Consume the [=next input character=]:

: [=ASCII digit=] :: Multiply the [=character reference code=] by 10. Add a numeric version of the [=current input character=] (subtract 0x0030 from the character's code point) to the [=character reference code=]. : U+003B SEMICOLON character (;) :: Switch to the [=numeric character reference end state=]. : Anything else :: [=Parse error=]. [=Reconsume=] in the [=numeric character reference end state=]. #### Numeric character reference end state #### {#numeric-character-reference-end-state} Check the [=character reference code=]. If that number is one of the numbers in the first column of the following table, then this is a [=parse error=]. Find the row with that number in the first column, and set the [=character reference code=] to the number in the second column of that row.

Number	Code point	Unicode character
0x00	U+FFFD	REPLACEMENT CHARACTER
0x80	U+20AC	EURO SIGN (€)
0x82	U+201A	SINGLE LOW-9 QUOTATION MARK (‚)
0x83	U+0192	LATIN SMALL LETTER F WITH HOOK (ƒ)
0x84	U+201E	DOUBLE LOW-9 QUOTATION MARK („)
0x85	U+2026	HORIZONTAL ELLIPSIS (…)
0x86	U+2020	DAGGER (†)
0x87	U+2021	DOUBLE DAGGER (‡)
0x88	U+02C6	MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ)
0x89	U+2030	PER MILLE SIGN (‰)
0x8A	U+0160	LATIN CAPITAL LETTER S WITH CARON (Š)
0x8B	U+2039	SINGLE LEFT-POINTING ANGLE QUOTATION MARK (‹)
0x8C	U+0152	LATIN CAPITAL LIGATURE OE (Œ)
0x8E	U+017D	LATIN CAPITAL LETTER Z WITH CARON (Ž)
0x91	U+2018	LEFT SINGLE QUOTATION MARK (‘)
0x92	U+2019	RIGHT SINGLE QUOTATION MARK (’)
0x93	U+201C	LEFT DOUBLE QUOTATION MARK (“)
0x94	U+201D	RIGHT DOUBLE QUOTATION MARK (”)
0x95	U+2022	BULLET (•)
0x96	U+2013	EN DASH (–)
0x97	U+2014	EM DASH (—)
0x98	U+02DC	SMALL TILDE (˜)
0x99	U+2122	TRADE MARK SIGN (™)
0x9A	U+0161	LATIN SMALL LETTER S WITH CARON (š)
0x9B	U+203A	SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (›)
0x9C	U+0153	LATIN SMALL LIGATURE OE (œ)
0x9E	U+017E	LATIN SMALL LETTER Z WITH CARON (ž)
0x9F	U+0178	LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ)
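
As a rough illustration of the two accumulation states and the table lookup above, the following Python sketch folds digits into a character reference code and then applies the substitutions. The names `digit_value`, `accumulate`, `resolve`, and `REPLACEMENTS` are hypothetical, only a few table rows are reproduced, and the sketch also applies the surrogate and out-of-range substitution that the text states next. It is not a full tokenizer.

```python
from typing import Tuple

# Hypothetical subset of the replacement table; the full table has more rows.
REPLACEMENTS = {
    0x00: 0xFFFD,
    0x80: 0x20AC,
    0x82: 0x201A,
    0x85: 0x2026,
    0x91: 0x2018,
    0x92: 0x2019,
    0x99: 0x2122,
    0x9F: 0x0178,
}


def digit_value(ch: str) -> int:
    """Numeric version of one digit, using the offsets given in the states."""
    cp = ord(ch)
    if 0x30 <= cp <= 0x39:  # ASCII digit
        return cp - 0x0030
    if 0x41 <= cp <= 0x46:  # uppercase ASCII hex digit
        return cp - 0x0037
    if 0x61 <= cp <= 0x66:  # lowercase ASCII hex digit
        return cp - 0x0057
    raise ValueError(f"not a digit: {ch!r}")


def accumulate(digits: str, base: int) -> int:
    """Multiply by the base and add each digit, as the two states do."""
    code = 0
    for ch in digits:
        value = digit_value(ch)
        if value >= base:
            raise ValueError(f"{ch!r} is not valid in base {base}")
        code = code * base + value
    return code


def resolve(code: int) -> Tuple[int, bool]:
    """Return (final code, whether a parse error was flagged)."""
    if code in REPLACEMENTS:
        return REPLACEMENTS[code], True
    if 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
        return 0xFFFD, True
    return code, False
```

For example, under these assumptions `resolve(accumulate("80", 16))` yields the code point for the euro sign together with a parse error flag.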

If the number is in the range 0xD800 to 0xDFFF or is greater than 0x10FFFF, then this is a [=parse error=]. Set the [=character reference code=] to 0xFFFD. If the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a [=parse error=]. Set the [=temporary buffer=] to the empty string. Append the Unicode character with code point equal to the [=character reference code=] to the [=temporary buffer=]. Switch to the [=character reference end state=]. #### Character reference end state #### {#character-reference-end-state} Consume the [=next input character=]. Check the [=return state=]:

: [=attribute value (double-quoted) state=] : [=attribute value (single-quoted) state=] : [=attribute value (unquoted) state=] :: Append each character in the [=temporary buffer=] (in the order they were added to the buffer) to the current attribute's value. : Anything else :: For each of the characters in the [=temporary buffer=] (in the order they were added to the buffer), emit the character as a character token. [=Reconsume=] in the [=return state=]. ### Tree construction ### {#tree-construction} The input to the tree construction stage is a sequence of tokens from the [[#tokenization|tokenization]] stage. The tree construction stage is associated with a DOM {{Document}} object when a parser is created. The "output" of this stage consists of dynamically modifying or extending that document's DOM tree. This specification does not define when an interactive user agent has to render the {{Document}} so that it is available to the user, or when it has to begin accepting user input. --- As each token is emitted from the tokenizer, the user agent must follow the appropriate steps from the following list, known as the tree construction dispatcher:

: If the [=stack of open elements=] is empty : If the [=adjusted current node=] is an element in the [=HTML namespace=] : If the [=adjusted current node=] is a [=MathML text integration point=] and the token is a start tag whose tag name is neither "mglyph" nor "malignmark" : If the [=adjusted current node=] is a [=MathML text integration point=] and the token is a character token : If the [=adjusted current node=] is a MathML <{annotation-xml}> element and the token is a start tag whose tag name is "svg" : If the [=adjusted current node=] is an [=HTML integration point=] and the token is a start tag : If the [=adjusted current node=] is an [=HTML integration point=] and the token is a character token : If the token is an end-of-file token :: Process the token according to the rules given in the section corresponding to the current [=insertion mode=] in HTML content. : Otherwise :: Process the token according to the rules given in the section for parsing tokens [=in foreign content=]. The next token is the token that is about to be processed by the [=tree construction dispatcher=] (even if the token is subsequently just ignored). A node is a MathML text integration point if it is one of the following elements:

* A MathML <{mi}> element * A MathML <{mo}> element * A MathML <{mn}> element * A MathML <{ms}> element * A MathML <{mtext}> element A node is an HTML integration point if it is one of the following elements:

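* A MathML <{annotation-xml}> element whose start tag token had an attribute with the name "encoding" whose value was an [=ASCII case-insensitive=] match for the string "[[#text-html|text/html]]"
* A MathML <{annotation-xml}> element whose start tag token had an attribute with the name "encoding" whose value was an [=ASCII case-insensitive=] match for the string "`application/xhtml+xml`"
* An SVG `foreignObject` element
* An SVG `desc` element
* An SVG `title` element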

If the node in question is the [=context=] element passed to the [=HTML fragment parsing algorithm=], then the start tag token for that element is the "fake" token created by that [=HTML fragment parsing algorithm=].
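
The dispatcher conditions listed earlier in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: `Node`, `Token`, and `use_html_rules` are hypothetical names, and whether a node is an HTML integration point is taken as a precomputed flag rather than derived from the start tag token.

```python
from dataclasses import dataclass
from typing import Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

MATHML_TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}


@dataclass
class Node:
    namespace: str
    local_name: str
    # Assumption: the caller has already decided this from the start tag token.
    html_integration_point: bool = False


@dataclass
class Token:
    kind: str  # "start_tag", "end_tag", "character", "comment", "doctype", "eof"
    tag_name: str = ""


def is_mathml_text_integration_point(node: Node) -> bool:
    return (node.namespace == MATHML_NS
            and node.local_name in MATHML_TEXT_INTEGRATION_POINTS)


def use_html_rules(adjusted_current_node: Optional[Node], token: Token) -> bool:
    """True: use the current insertion mode. False: use the foreign content rules."""
    node = adjusted_current_node
    if node is None:  # the stack of open elements is empty
        return True
    if node.namespace == HTML_NS:
        return True
    if is_mathml_text_integration_point(node):
        if token.kind == "start_tag" and token.tag_name not in ("mglyph", "malignmark"):
            return True
        if token.kind == "character":
            return True
    if (node.namespace == MATHML_NS and node.local_name == "annotation-xml"
            and token.kind == "start_tag" and token.tag_name == "svg"):
        return True
    if node.html_integration_point and token.kind in ("start_tag", "character"):
        return True
    return token.kind == "eof"
```

Returning a boolean keeps the sketch small; a real implementation would presumably call into the insertion mode or foreign content handlers directly.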

---

Not all of the tag names mentioned below are conformant tag names in this specification; many are included to handle legacy content. They still form part of the algorithm that implementations are required to implement to claim conformance.

The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, {{Text}} nodes, etc. While implementors are encouraged to avoid arbitrary limits, it is recognized that [=practical concerns=] will likely force user agents to impose nesting depth constraints.

#### Creating and inserting nodes #### {#creating-and-inserting-nodes} While the parser is processing a token, it can enable or disable foster parenting. This affects the following algorithm. The appropriate place for inserting a node, optionally using a particular |override target|, is the position in an element returned by running the following steps: 1. If there was an |override target| specified, then let |target| be the |override target|. Otherwise, let |target| be the [=current node=]. 2. Determine the |adjusted insertion location| using the first matching steps from the following list:

: If [=foster parenting=] is enabled and |target| is a <{table}>, <{tbody}>, <{tfoot}>, <{thead}>, or <{tr}> element ::

Foster parenting happens when content is misnested in tables.

Run these substeps: 1. Let |last template| be the last <{template}> element in the [=stack of open elements=], if any. 2. Let |last table| be the last <{table}> element in the [=stack of open elements=], if any. 3. If there is a |last template| and either there is no |last table|, or there is one, but |last template| is lower (more recently added) than |last table| in the [=stack of open elements=], then: let |adjusted insertion location| be inside |last template|'s [=template contents=], after its last child (if any), and abort these substeps. 4. If there is no |last table|, then let |adjusted insertion location| be inside the first element in the [=stack of open elements=] (the <{html}> element), after its last child (if any), and abort these substeps. ([=fragment case=]) 5. If |last table| has a parent node, then let |adjusted insertion location| be inside |last table|'s parent node, immediately before |last table|, and abort these substeps. 6. Let |previous element| be the element immediately above |last table| in the [=stack of open elements=]. 7. Let |adjusted insertion location| be inside |previous element|, after its last child (if any).

These steps are involved in part because it's possible for elements, the <{table}> element in this case in particular, to have been moved by a script around in the DOM, or indeed removed from the DOM entirely, after the element was inserted by the parser.

: Otherwise :: Let |adjusted insertion location| be inside |target|, after its last child (if any). 3. If the |adjusted insertion location| is inside a <{template}> element, let it instead be inside the <{template}> element's [=template contents=], after its last child (if any). 4. Return the |adjusted insertion location|. --- When the steps below require the UA to create an element for a token in a particular |given namespace| and with a particular |intended parent|, the UA must run the following steps: 1. Let |document| be |intended parent|'s [=node document=]. 2. Let |local name| be the tag name of the token. 3. Let |is| be the value of the "`is`" attribute in the given token, if such an attribute exists, or null otherwise. 4. Let |definition| be the result of [=looking up a custom element definition=] given |document|, |given namespace|, |local name|, and |is|. 5. If |definition| is non-null and the parser was not originally created for the [=HTML fragment parsing algorithm=], then let |will execute script| be true. Otherwise, let it be false. 6. If |will execute script| is true, then: 1. Increment |document|'s [=throw-on-dynamic-markup-insertion counter=]. 2. If the [=JavaScript execution context stack=] is empty, then [=perform a microtask checkpoint=]. 3. Push a new [=element queue=] onto the [=custom element reactions stack=]. 7. Let |element| be the result of [=creating an element=] given |document|, |local name|, |given namespace|, null, and |is|. If |will execute script| is true, set the synchronous custom elements flag; otherwise, leave it unset.

This will cause [=custom element constructors=] to run, if |will execute script| is true. However, since we incremented the [=throw-on-dynamic-markup-insertion counter=], this cannot cause {{Document/write()|new characters to be inserted into the tokenizer}}, or {{Document/open()|the document to be blown away}}.

If this step throws an exception, then report the exception, and let |element| instead be a new element that implements {{HTMLUnknownElement}}, with no attributes, namespace set to |given namespace|, namespace prefix set to null, custom element state set to "failed", custom element definition set to null, and node document set to |document|.

8. [=Append=] each attribute in the given token to |element|.

This can [=enqueue a custom element callback reaction=] for the `attributeChangedCallback`, which might run immediately (in the next step).

Even though the `is` attribute governs the [=create an element|creation=] of a [=customized built-in element=], it is not present during the execution of the relevant [=custom element constructor=]; it is appended in this step, along with all other attributes.

9. If |will execute script| is true, then:
    1. Let |queue| be the result of popping the [=current element queue=] from the [=custom element reactions stack=]. (This will be the same [=element queue=] as was pushed above.)
    2. [=Invoke custom element reactions=] in |queue|.
    3. Decrement |document|'s [=throw-on-dynamic-markup-insertion counter=].
10. If |element| has an <{xmlns/xmlns}> attribute *in the [=XMLNS namespace=]* whose value is not exactly the same as the element's namespace, that is a [=parse error=]. Similarly, if |element| has an <{xlink/xlink|xmlns:xlink}> attribute in the [=XMLNS namespace=] whose value is not the [=XLink namespace=], that is a [=parse error=].
11. If |element| is a [=resettable element=], invoke its [=reset algorithm=]. (This initializes the element's [=forms/value=] and [=forms/checkedness=] based on the element's attributes.)
12. If |element| is a [=form-associated element=], and the `form` element pointer is not null, and there is no <{template}> element on the [=stack of open elements=], and |element| is either not [=listed elements|listed=] or doesn't have a <{formelements/form}> attribute, and the |intended parent| is in the same [=tree=] as the element pointed to by the `form` element pointer, [=associate=] |element| with the <{form}> element pointed to by the `form` element pointer, and suppress the running of the [=reset the form owner=] algorithm when the parser subsequently attempts to insert the element.
13. Return |element|.

---

When the steps below require the user agent to insert a foreign element for a token in a given namespace, the user agent must run these steps:

1. Let the |adjusted insertion location| be the [=appropriate place for inserting a node=].
2. Let |element| be the result of [=create an element for the token|creating an element for the token=] in the given namespace, with the intended parent being the element in which the |adjusted insertion location| finds itself.
3. If it is possible to insert |element| at the |adjusted insertion location|, then:
    1. Push a new [=element queue=] onto the [=custom element reactions stack=].
    2. Insert |element| at the |adjusted insertion location|.
    3. Pop the [=element queue=] from the [=custom element reactions stack=], and [=invoke custom element reactions=] in that queue.

If the |adjusted insertion location| cannot accept more elements, e.g., because it's a {{Document}} that already has an element child, then |element| is dropped on the floor.

4. Push |element| onto the [=stack of open elements=] so that it is the new [=current node=]. 5. Return |element|. When the steps below require the user agent to insert an HTML element for a token, the user agent must [=insert a foreign element=] for the token, in the [=HTML namespace=]. --- When the steps below require the user agent to adjust MathML attributes for a token, then, if the token has an attribute named `definitionurl`, change its name to `definitionURL` (note the case difference). When the steps below require the user agent to adjust SVG attributes for a token, then, for each attribute on the token whose attribute name is one of the ones in the first column of the following table, change the attribute's name to the name given in the corresponding cell in the second column. (This fixes the case of SVG attributes that are not all lowercase.)

Attribute name on token	Attribute name on element
`attributename`	`attributeName`
`attributetype`	`attributeType`
`basefrequency`	`baseFrequency`
`baseprofile`	`baseProfile`
`calcmode`	`calcMode`
`clippathunits`	`clipPathUnits`
`diffuseconstant`	`diffuseConstant`
`edgemode`	`edgeMode`
`filterunits`	`filterUnits`
`glyphref`	`glyphRef`
`gradienttransform`	`gradientTransform`
`gradientunits`	`gradientUnits`
`kernelmatrix`	`kernelMatrix`
`kernelunitlength`	`kernelUnitLength`
`keypoints`	`keyPoints`
`keysplines`	`keySplines`
`keytimes`	`keyTimes`
`lengthadjust`	`lengthAdjust`
`limitingconeangle`	`limitingConeAngle`
`markerheight`	`markerHeight`
`markerunits`	`markerUnits`
`markerwidth`	`markerWidth`
`maskcontentunits`	`maskContentUnits`
`maskunits`	`maskUnits`
`numoctaves`	`numOctaves`
`pathlength`	`pathLength`
`patterncontentunits`	`patternContentUnits`
`patterntransform`	`patternTransform`
`patternunits`	`patternUnits`
`pointsatx`	`pointsAtX`
`pointsaty`	`pointsAtY`
`pointsatz`	`pointsAtZ`
`preservealpha`	`preserveAlpha`
`preserveaspectratio`	`preserveAspectRatio`
`primitiveunits`	`primitiveUnits`
`refx`	`refX`
`refy`	`refY`
`repeatcount`	`repeatCount`
`repeatdur`	`repeatDur`
`requiredextensions`	`requiredExtensions`
`requiredfeatures`	`requiredFeatures`
`specularconstant`	`specularConstant`
`specularexponent`	`specularExponent`
`spreadmethod`	`spreadMethod`
`startoffset`	`startOffset`
`stddeviation`	`stdDeviation`
`stitchtiles`	`stitchTiles`
`surfacescale`	`surfaceScale`
`systemlanguage`	`systemLanguage`
`tablevalues`	`tableValues`
`targetx`	`targetX`
`targety`	`targetY`
`textlength`	`textLength`
`viewbox`	`viewBox`
`viewtarget`	`viewTarget`
`xchannelselector`	`xChannelSelector`
`ychannelselector`	`yChannelSelector`
`zoomandpan`	`zoomAndPan`
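
The SVG attribute adjustment amounts to a table lookup on attribute names. A minimal Python sketch follows, assuming the token's attributes are held in a plain dictionary keyed by name (a simplification) and reproducing only a handful of table rows; `SVG_ATTRIBUTE_CASE` and `adjust_svg_attributes` are hypothetical names.

```python
from typing import Dict

# Hypothetical subset of the table above; a real implementation would list every row.
SVG_ATTRIBUTE_CASE = {
    "attributename": "attributeName",
    "clippathunits": "clipPathUnits",
    "preserveaspectratio": "preserveAspectRatio",
    "refx": "refX",
    "viewbox": "viewBox",
}


def adjust_svg_attributes(attributes: Dict[str, str]) -> Dict[str, str]:
    """Rename attributes found in the table; leave all other names untouched."""
    return {SVG_ATTRIBUTE_CASE.get(name, name): value
            for name, value in attributes.items()}
```

Names that are not in the table, such as `fill`, pass through unchanged.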

When the steps below require the user agent to adjust foreign attributes for a token, then, if any of the attributes on the token match the strings given in the first column of the following table, let the attribute be a namespaced attribute, with the prefix being the string given in the corresponding cell in the second column, the local name being the string given in the corresponding cell in the third column, and the namespace being the namespace given in the corresponding cell in the fourth column. (This fixes the use of namespaced attributes, in particular <{global/lang}> attributes in the [=XML namespace=].)

Attribute name	Prefix	Local name	Namespace
<{xlink/actuate|xlink:actuate}>	`xlink`	`actuate`	[=XLink namespace=]
<{xlink/arcrole|xlink:arcrole}>	`xlink`	`arcrole`	[=XLink namespace=]
<{xlink/href|xlink:href}>	`xlink`	`href`	[=XLink namespace=]
<{xlink/role|xlink:role}>	`xlink`	`role`	[=XLink namespace=]
<{xlink/show|xlink:show}>	`xlink`	`show`	[=XLink namespace=]
<{xlink/title|xlink:title}>	`xlink`	`title`	[=XLink namespace=]
<{xlink/type|xlink:type}>	`xlink`	`type`	[=XLink namespace=]
<{xml/lang|xml:lang}>	`xml`	`lang`	[=XML namespace=]
<{xml/space|xml:space}>	`xml`	`space`	[=XML namespace=]
<{xmlns/xmlns}>	(none)	`xmlns`	[=XMLNS namespace=]
<{xlink/xlink|xmlns:xlink}>	`xmlns`	`xlink`	[=XMLNS namespace=]
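
The foreign attribute adjustment can be pictured the same way, as a lookup that turns a plain attribute name into a prefix, local name, and namespace. In this Python sketch `QualifiedName`, `FOREIGN_ATTRIBUTES`, and `adjust_foreign_attribute` are hypothetical names, and representing an unadjusted attribute with a `None` prefix and namespace is an assumption made for illustration.

```python
from typing import NamedTuple, Optional

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


class QualifiedName(NamedTuple):
    prefix: Optional[str]
    local_name: str
    namespace: Optional[str]


FOREIGN_ATTRIBUTES = {
    "xlink:actuate": QualifiedName("xlink", "actuate", XLINK_NS),
    "xlink:arcrole": QualifiedName("xlink", "arcrole", XLINK_NS),
    "xlink:href": QualifiedName("xlink", "href", XLINK_NS),
    "xlink:role": QualifiedName("xlink", "role", XLINK_NS),
    "xlink:show": QualifiedName("xlink", "show", XLINK_NS),
    "xlink:title": QualifiedName("xlink", "title", XLINK_NS),
    "xlink:type": QualifiedName("xlink", "type", XLINK_NS),
    "xml:lang": QualifiedName("xml", "lang", XML_NS),
    "xml:space": QualifiedName("xml", "space", XML_NS),
    "xmlns": QualifiedName(None, "xmlns", XMLNS_NS),
    "xmlns:xlink": QualifiedName("xmlns", "xlink", XMLNS_NS),
}


def adjust_foreign_attribute(name: str) -> QualifiedName:
    """Look the name up in the table; unlisted names stay un-namespaced."""
    return FOREIGN_ATTRIBUTES.get(name, QualifiedName(None, name, None))
```

Note that a name such as `xlink:foo` is not in the table, so in this sketch it stays a single un-namespaced name that happens to contain a colon.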

--- When the steps below require the user agent to insert a character while processing a token, the user agent must run the following steps: 1. Let |data| be the characters passed to the algorithm, or, if no characters were explicitly specified, the character of the character token being processed. 2. Let the |adjusted insertion location| be the [=appropriate place for inserting a node=]. 3. If the |adjusted insertion location| is in a {{Document}} node, then abort these steps.

The DOM will not let {{Document}} nodes have {{Text}} node children, so the children are ignored.

4. If there is a {{Text}} node immediately before the |adjusted insertion location|, then append |data| to that {{Text}} node's data. Otherwise, create a new {{Text}} node whose data is |data| and whose [=node document=] is the same as that of the element in which the |adjusted insertion location| finds itself, and insert the newly created node at the |adjusted insertion location|.

Here are some sample inputs to the parser and the corresponding number of {{Text}} nodes that they result in, assuming a user agent that executes scripts.

Input	Number of {{Text}} nodes
	One {{Text}} node in the document, containing "AB".
	Three {{Text}} nodes; "A" before the script, the script's contents, and "BC" after the script (the parser appends to the {{Text}} node created by the script).
	Two adjacent {{Text}} nodes in the document, containing "A" and "BC".
	One {{Text}} node before the table, containing "ABCD". (This is caused by [=foster parenting=].)
	One {{Text}} node before the table, containing "A B C" (A-space-B-space-C). (This is caused by [=foster parenting=].)
	One {{Text}} node before the table, containing "A BC" (A-space-B-C), and one {{Text}} node inside the table (as a child of a <{tbody}>) with a single space character. (Space characters separated from non-space characters by non-character tokens are not affected by [=foster parenting=], even if those other tokens then get ignored.)
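
The coalescing behavior of the character insertion steps can be illustrated with a small Python model. It is a sketch under assumptions, not a DOM implementation: `Text`, `Element`, and `insert_character` are hypothetical names, a `Document` is stood in for by an `is_document` flag, and the adjusted insertion location is modeled as a parent plus a child index.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    data: str


@dataclass
class Element:
    name: str
    children: List[Union["Element", Text]] = field(default_factory=list)
    is_document: bool = False  # hypothetical stand-in for a Document node


def insert_character(parent: Element, index: int, data: str) -> None:
    """Insert data so that it ends up just before parent.children[index]."""
    if parent.is_document:
        # Documents cannot have Text children, so the characters are dropped.
        return
    if index > 0 and isinstance(parent.children[index - 1], Text):
        # A Text node sits immediately before the location: append to it.
        parent.children[index - 1].data += data
    else:
        parent.children.insert(index, Text(data))
```

Inserting "A" and then "B" at the end of an empty element leaves a single `Text` node with data "AB", and inserting just before a following `table` child (as foster parenting would) extends that same node rather than creating a new one.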

---

When the steps below require the user agent to insert a comment while processing a comment token, optionally with an explicit insertion position |position|, the user agent must run the following steps:

1. Let |data| be the data given in the comment token being processed.
2. If |position| was specified, then let the |adjusted insertion location| be |position|. Otherwise, let |adjusted insertion location| be the [=appropriate place for inserting a node=].
3. Create a {{Comment}} node whose {{CharacterData/data}} attribute is set to |data| and whose [=node document=] is the same as that of the node in which the |adjusted insertion location| finds itself.
4. Insert the newly created node at the |adjusted insertion location|.

---

DOM mutation events must not fire for changes caused by the UA parsing the document. This includes the parsing of any content inserted using {{Document/write()|document.write()}} and {{Document/writeln()|document.writeln()}} calls. [[!UIEVENTS]] However, [=mutation observers=] *do* fire, as required by the DOM specification.

#### Parsing elements that contain only text #### {#parsing-elements-that-contain-only-text}

The generic raw text element parsing algorithm and the generic RCDATA element parsing algorithm consist of the following steps. These algorithms are always invoked in response to a start tag token.

1. [=Insert an HTML element=] for the token.
2. If the algorithm that was invoked is the [=generic raw text element parsing algorithm=], switch the tokenizer to the [=RAWTEXT state=]; otherwise the algorithm invoked was the [=generic RCDATA element parsing algorithm=], switch the tokenizer to the [=RCDATA state=].
3. Let the [=original insertion mode=] be the current [=insertion mode=].
4. Then, switch the [=insertion mode=] to "[=in text|text=]".

#### Closing elements that have implied end tags #### {#closing-elements-that-have-implied-end-tags} When the steps below require the UA to generate implied end tags, then, while the [=current node=] is a <{dd}> element, a <{dt}> element, an <{li}> element, an <{optgroup}> element, an <{option}> element, a <{p}> element, an <{rb}> element, an <{rp}> element, an <{rt}> element, or an <{rtc}> element, the UA must pop the [=current node=] off the [=stack of open elements=]. If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list. When the steps below require the UA to generate all implied end tags thoroughly, then, while the [=current node=] is a <{caption}> element, a <{colgroup}> element, a <{dd}> element, a <{dt}> element, an <{li}> element, an <{optgroup}> element, an <{option}> element, a <{p}> element, an <{rb}> element, an <{rp}> element, an <{rt}> element, an <{rtc}> element, a <{tbody}> element, a <{td}> element, a <{tfoot}> element, a <{th}> element, a <{thead}> element, or a <{tr}> element, the UA must pop the [=current node=] off the [=stack of open elements=]. #### The rules for parsing tokens in HTML content #### {#the-rules-for-parsing-tokens-in-html-content} ##### The "initial" insertion mode ##### {#the-initial-insertion-mode} When the user agent is to apply the rules for the "[=initial=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE :: Ignore the token. : A comment token :: [=Insert a comment=] as the last child of the {{Document}} object. : A DOCTYPE token :: If the DOCTYPE token's name is not a [=case-sensitive=] match for the string "`html`", or the token's public identifier is not missing, or the token's system identifier is neither missing nor a [=case-sensitive=] match for the string "about:legacy-compat", then there is a [=parse error=]. Append a {{DocumentType}} node to the {{Document}} node, with the {{DocumentType/name}} attribute set to the name given in the DOCTYPE token, or the empty string if the name was missing; the {{DocumentType/publicId}} attribute set to the public identifier given in the DOCTYPE token, or the empty string if the public identifier was missing; the {{DocumentType/systemId}} attribute set to the system identifier given in the DOCTYPE token, or the empty string if the system identifier was missing; and the other attributes specific to {{DocumentType}} objects set to null and empty lists as appropriate. Associate the {{DocumentType}} node with the {{Document}} object so that it is returned as the value of the {{Document/doctype}} attribute of the {{Document}} object. Then, if the document is *not* an `iframe` `srcdoc` document, and the DOCTYPE token matches one of the conditions in the following list, then set the {{Document}} to [=quirks mode=]:

* The [=force-quirks flag=] is set to *on*. * The name is set to anything other than "`html`" (compared [=case-sensitively=]). * The public identifier is set to: "`-//W3O//DTD W3 HTML Strict 3.0//EN//`" * The public identifier is set to: "`-/W3C/DTD HTML 4.0 Transitional/EN`" * The public identifier is set to: "`HTML`" * The system identifier is set to: "`http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd`" * The public identifier starts with: "`+//Silmaril//dtd html Pro v0r11 19970101//`" * The public identifier starts with: "`-//AS//DTD HTML 3.0 asWedit + extensions//`" * The public identifier starts with: "`-//AdvaSoft Ltd//DTD HTML 3.0 asWedit + extensions//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0 Level 1//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0 Level 2//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0 Strict Level 1//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0 Strict Level 2//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0 Strict//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.0//`" * The public identifier starts with: "`-//IETF//DTD HTML 2.1E//`" * The public identifier starts with: "`-//IETF//DTD HTML 3.0//`" * The public identifier starts with: "`-//IETF//DTD HTML 3.2 Final//`" * The public identifier starts with: "`-//IETF//DTD HTML 3.2//`" * The public identifier starts with: "`-//IETF//DTD HTML 3//`" * The public identifier starts with: "`-//IETF//DTD HTML Level 0//`" * The public identifier starts with: "`-//IETF//DTD HTML Level 1//`" * The public identifier starts with: "`-//IETF//DTD HTML Level 2//`" * The public identifier starts with: "`-//IETF//DTD HTML Level 3//`" * The public identifier starts with: "`-//IETF//DTD HTML Strict Level 0//`" * The public identifier starts with: "`-//IETF//DTD HTML Strict Level 1//`" * The public identifier starts with: "`-//IETF//DTD HTML Strict Level 2//`" * The public identifier starts with: 
"`-//IETF//DTD HTML Strict Level 3//`" * The public identifier starts with: "`-//IETF//DTD HTML Strict//`" * The public identifier starts with: "`-//IETF//DTD HTML//`" * The public identifier starts with: "`-//Metrius//DTD Metrius Presentational//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 2.0 HTML Strict//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 2.0 HTML//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 2.0 Tables//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 3.0 HTML Strict//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 3.0 HTML//`" * The public identifier starts with: "`-//Microsoft//DTD Internet Explorer 3.0 Tables//`" * The public identifier starts with: "`-//Netscape Comm. Corp.//DTD HTML//`" * The public identifier starts with: "`-//Netscape Comm. Corp.//DTD Strict HTML//`" * The public identifier starts with: "`-//O'Reilly and Associates//DTD HTML 2.0//`" * The public identifier starts with: "`-//O'Reilly and Associates//DTD HTML Extended 1.0//`" * The public identifier starts with: "`-//O'Reilly and Associates//DTD HTML Extended Relaxed 1.0//`" * The public identifier starts with: "`-//SQ//DTD HTML 2.0 HoTMetaL + extensions//`" * The public identifier starts with: "`-//SoftQuad Software//DTD HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//`" * The public identifier starts with: "`-//SoftQuad//DTD HoTMetaL PRO 4.0::19971010::extensions to HTML 4.0//`" * The public identifier starts with: "`-//Spyglass//DTD HTML 2.0 Extended//`" * The public identifier starts with: "`-//Sun Microsystems Corp.//DTD HotJava HTML//`" * The public identifier starts with: "`-//Sun Microsystems Corp.//DTD HotJava Strict HTML//`" * The public identifier starts with: "`-//W3C//DTD HTML 3 1995-03-24//`" * The public identifier starts with: "`-//W3C//DTD HTML 3.2 Draft//`" * The public identifier starts with: 
"`-//W3C//DTD HTML 3.2 Final//`" * The public identifier starts with: "`-//W3C//DTD HTML 3.2//`" * The public identifier starts with: "`-//W3C//DTD HTML 3.2S Draft//`" * The public identifier starts with: "`-//W3C//DTD HTML 4.0 Frameset//`" * The public identifier starts with: "`-//W3C//DTD HTML 4.0 Transitional//`" * The public identifier starts with: "`-//W3C//DTD HTML Experimental 19960712//`" * The public identifier starts with: "`-//W3C//DTD HTML Experimental 970421//`" * The public identifier starts with: "`-//W3C//DTD W3 HTML//`" * The public identifier starts with: "`-//W3O//DTD W3 HTML 3.0//`" * The public identifier starts with: "`-//WebTechs//DTD Mozilla HTML 2.0//`" * The public identifier starts with: "`-//WebTechs//DTD Mozilla HTML//`" * The system identifier is missing and the public identifier starts with: "`-//W3C//DTD HTML 4.01 Frameset//`" * The system identifier is missing and the public identifier starts with: "`-//W3C//DTD HTML 4.01 Transitional//`" Otherwise, if the document is *not* an `iframe` `srcdoc` document, and the DOCTYPE token matches one of the conditions in the following list, then set the {{Document}} to [=limited-quirks mode=]:

* The public identifier starts with: "`-//W3C//DTD XHTML 1.0 Frameset//`" * The public identifier starts with: "`-//W3C//DTD XHTML 1.0 Transitional//`" * The system identifier is not missing and the public identifier starts with: "`-//W3C//DTD HTML 4.01 Frameset//`" * The system identifier is not missing and the public identifier starts with: "`-//W3C//DTD HTML 4.01 Transitional//`" The system identifier and public identifier strings must be compared to the values given in the lists above in an [=ASCII case-insensitive=] manner. A system identifier whose value is the empty string is not considered missing for the purposes of the conditions above. Then, switch the [=insertion mode=] to "[=before html=]". : Anything else :: If the document is *not* an `iframe` `srcdoc` document, then this is a [=parse error=]; set the {{Document}} to [=quirks mode=]. In any case, switch the [=insertion mode=] to "[=before html=]", then reprocess the token. ##### The "before html" insertion mode ##### {#the-before-html-insertion-mode} When the user agent is to apply the rules for the "[=before html=]" [=insertion mode=], the user agent must handle the token as follows:

: A DOCTYPE token :: [=Parse error=]. : A comment token :: [=Insert a comment=] as the last child of the {{Document}} object. : A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE :: Ignore the token. : A start tag whose tag name is "html" :: [=Create an element for the token=] in the [=HTML namespace=], with the {{Document}} as the intended parent. Append it to the {{Document}} object. Put this element in the [=stack of open elements=]. Switch the [=insertion mode=] to "[=before head=]". : An end tag whose tag name is one of: "head", "body", "html", "br" :: Act as described in the "anything else" entry below. : Any other end tag :: [=Parse error=]. : Anything else :: Create an <{html}> element whose [=node document=] is the {{Document}} object. Append it to the {{Document}} object. Put this element in the [=stack of open elements=]. Switch the [=insertion mode=] to "[=before head=]", then reprocess the token. The [=document element=] can end up being removed from the {{Document}} object, e.g., by scripts; nothing in particular happens in such cases, content continues being appended to the nodes as described in the next section. ##### The "before head" insertion mode ##### {#the-before-head-insertion-mode} When the user agent is to apply the rules for the "[=before head=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
:: Ignore the token.
: A comment token
:: [=Insert a comment=].
: A DOCTYPE token
:: [=Parse error=].
: A start tag whose tag name is "html"
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].
: A start tag whose tag name is "head"
:: [=Insert an HTML element=] for the token. Set the `head` element pointer to the newly created <{head}> element. Switch the [=insertion mode=] to "[=in head=]".
: An end tag whose tag name is one of: "head", "body", "html", "br"
:: Act as described in the "anything else" entry below.
: Any other end tag
:: [=Parse error=].
: Anything else
:: [=Insert an HTML element=] for a "head" start tag token with no attributes. Set the `head` element pointer to the newly created <{head}> element. Switch the [=insertion mode=] to "[=in head=]". Reprocess the current token.

##### The "in head" insertion mode ##### {#the-in-head-insertion-mode}

When the user agent is to apply the rules for the "[=in head=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE :: [=Insert the character=]. : A comment token :: [=Insert a comment=]. : A DOCTYPE token :: [=Parse error=]. : A start tag whose tag name is "html" :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : A start tag whose tag name is one of: "base", "basefont", "bgsound", "link" :: [=Insert an HTML element=] for the token. Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. : A start tag whose tag name is "meta" :: [=Insert an HTML element=] for the token. Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. If the element has a <{meta/charset}> attribute, and [=getting an encoding=] from its value results in an [=encoding=], and the [=confidence=] is currently *tentative*, then [=change the encoding=] to the resulting encoding. Otherwise, if the element has an <{meta/http-equiv}> attribute whose value is an [=ASCII case-insensitive=] match for the string "`Content-Type`", and the element has a <{meta/content}> attribute, and applying the algorithm for extracting a character encoding from a `meta` element to that attribute's value returns an [=encoding=], and the [=confidence=] is currently *tentative*, then [=change the encoding=] to the extracted encoding. : A start tag whose tag name is "title" :: Follow the [=generic RCDATA element parsing algorithm=]. : A start tag whose tag name is "noscript", if the [=scripting flag=] is enabled : A start tag whose tag name is one of: "noframes", "style" :: Follow the [=generic raw text element parsing algorithm=]. : A start tag whose tag name is "noscript", if the [=scripting flag=] is disabled :: [=Insert an HTML element=] for the token. 
Switch the [=insertion mode=] to "[=in head noscript=]". : A start tag whose tag name is "script" :: Run these steps: 1. Let the |adjusted insertion location| be the [=appropriate place for inserting a node=]. 2. [=Create an element for the token=] in the [=HTML namespace=], with the intended parent being the element in which the |adjusted insertion location| finds itself. 3. Mark the element as being "[=parser-inserted=]" and unset the element's "[=non-blocking=]" flag.

This ensures that, if the script is external, any {{Document/write()|document.write()}} calls in the script will execute in-line, instead of blowing the document away, as would happen in most other cases. It also prevents the script from executing until the end tag is seen.

4. If the parser was originally created for the [=HTML fragment parsing algorithm=], then mark the <{script}> element as "[=already started=]". ([=fragment case=]) 5. Insert the newly created element at the |adjusted insertion location|. 6. Push the element onto the [=stack of open elements=] so that it is the new [=current node=]. 7. Switch the tokenizer to the [=script data state=]. 8. Let the [=original insertion mode=] be the current [=insertion mode=]. 9. Switch the [=insertion mode=] to "[=in text|text=]". : An end tag whose tag name is "head" :: Pop the [=current node=] (which will be the <{head}> element) off the [=stack of open elements=]. Switch the [=insertion mode=] to "[=after head=]". : An end tag whose tag name is one of: "body", "html", "br" :: Act as described in the "anything else" entry below. : A start tag whose tag name is "template" :: [=Insert an HTML element=] for the token. Insert a [=marker=] at the end of the [=list of active formatting elements=]. Set the [=frameset-ok flag=] to "not ok". Switch the [=insertion mode=] to "[=in template=]". Push "[=in template=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. : An end tag whose tag name is "template" :: If there is no <{template}> element on the [=stack of open elements=], then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate all implied end tags thoroughly=]. 2. If the [=current node=] is not a <{template}> element, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until a <{template}> element has been popped from the stack. 4. [=Clear the list of active formatting elements up to the last marker=]. 5. Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. 6. [=Reset the insertion mode appropriately=]. : A start tag whose tag name is "head" : Any other end tag :: [=Parse error=]. 
: Anything else
:: Pop the [=current node=] (which will be the <{head}> element) off the [=stack of open elements=]. Switch the [=insertion mode=] to "[=after head=]". Reprocess the token.

##### The "in head noscript" insertion mode ##### {#the-in-head-noscript-insertion-mode}

When the user agent is to apply the rules for the "[=in head noscript=]" [=insertion mode=], the user agent must handle the token as follows:

: A DOCTYPE token
:: [=Parse error=].
: A start tag whose tag name is "html"
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].
: An end tag whose tag name is "noscript"
:: Pop the [=current node=] (which will be a <{noscript}> element) from the [=stack of open elements=]; the new [=current node=] will be a <{head}> element. Switch the [=insertion mode=] to "[=in head=]".
: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
: A comment token
: A start tag whose tag name is one of: "basefont", "bgsound", "link", "meta", "noframes", "style"
:: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=].
: An end tag whose tag name is "br"
:: Act as described in the "anything else" entry below.
: A start tag whose tag name is one of: "head", "noscript"
: Any other end tag
:: [=Parse error=].
: Anything else
:: [=Parse error=]. Pop the [=current node=] (which will be a <{noscript}> element) from the [=stack of open elements=]; the new [=current node=] will be a <{head}> element. Switch the [=insertion mode=] to "[=in head=]". Reprocess the token.

##### The "after head" insertion mode ##### {#the-after-head-insertion-mode}

When the user agent is to apply the rules for the "[=after head=]" [=insertion mode=], the user agent must handle the token as follows:

The `head` element pointer cannot be null at this point.

: An end tag whose tag name is "template"
:: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=].
: An end tag whose tag name is one of: "body", "html", "br"
:: Act as described in the "anything else" entry below.
: A start tag whose tag name is "head"
: Any other end tag
:: [=Parse error=].
: Anything else
:: [=Insert an HTML element=] for a "body" start tag token with no attributes. Switch the [=insertion mode=] to "[=in body=]". Reprocess the current token.

##### The "in body" insertion mode ##### {#the-in-body-insertion-mode}

When the user agent is to apply the rules for the "[=in body=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is U+0000 NULL :: [=Parse error=]. : A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE :: [=Reconstruct the active formatting elements=], if any. [=Insert the token's character=]. : Any other character token :: [=Reconstruct the active formatting elements=], if any. [=Insert the token's character=]. Set the [=frameset-ok flag=] to "not ok". : A comment token :: [=Insert a comment=]. : A DOCTYPE token :: [=Parse error=]. : A start tag whose tag name is "html" :: [=Parse error=]. If there is a <{template}> element on the [=stack of open elements=], then ignore the token. Otherwise, for each attribute on the token, check to see if the attribute is already present on the top element of the [=stack of open elements=]. If it is not, add the attribute and its corresponding value to that element. : A start tag whose tag name is one of: "base", "basefont", "bgsound", "link", "meta", "noframes", "script", "style", "template", "title" : An end tag whose tag name is "template" :: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=]. : A start tag whose tag name is "body" :: [=Parse error=]. If the second element on the [=stack of open elements=] is not a <{body}> element, if the [=stack of open elements=] has only one node on it, or if there is a <{template}> element on the [=stack of open elements=], then ignore the token. ([=fragment case=]) Otherwise, set the [=frameset-ok flag=] to "not ok"; then, for each attribute on the token, check to see if the attribute is already present on the <{body}> element (the second element) on the [=stack of open elements=], and if it is not, add the attribute and its corresponding value to that element. : A start tag whose tag name is "frameset" :: [=Parse error=]. 
If the [=stack of open elements=] has only one node on it, or if the second element on the [=stack of open elements=] is not a <{body}> element, then ignore the token. ([=fragment case=]) If the [=frameset-ok flag=] is set to "not ok", ignore the token. Otherwise, run the following steps: 1. Remove the second element on the [=stack of open elements=] from its parent node, if it has one. 2. Pop all the nodes from the bottom of the [=stack of open elements=], from the [=current node=] up to, but not including, the root <{html}> element. 3. [=Insert an HTML element=] for the token. 4. Switch the [=insertion mode=] to "[=in frameset=]". : An end-of-file token :: If the [=stack of template insertion modes=] is not empty, then process the token [=using the rules for=] the "[=in template=]" [=insertion mode=]. Otherwise, follow these steps: 1. If there is a node in the [=stack of open elements=] that is not either a <{dd}> element, a <{dt}> element, an <{li}> element, an <{optgroup}> element, an <{option}> element, a <{p}> element, an <{rb}> element, an <{rp}> element, an <{rt}> element, an <{rtc}> element, a <{tbody}> element, a <{td}> element, a <{tfoot}> element, a <{th}> element, a <{thead}> element, a <{tr}> element, the <{body}> element, or the <{html}> element, then this is a [=parse error=]. 2. [=Stop parsing=]. : An end tag whose tag name is "body" :: If the [=stack of open elements=] does not have a `body` element in scope, this is a [=parse error=]; ignore the token. Otherwise, if there is a node in the [=stack of open elements=] that is not either a <{dd}> element, a <{dt}> element, an <{li}> element, an <{optgroup}> element, an <{option}> element, a <{p}> element, an <{rb}> element, an <{rp}> element, an <{rt}> element, an <{rtc}> element, a <{tbody}> element, a <{td}> element, a <{tfoot}> element, a <{th}> element, a <{thead}> element, a <{tr}> element, the <{body}> element, or the <{html}> element, then this is a [=parse error=]. 
Switch the [=insertion mode=] to "[=after body=]". : An end tag whose tag name is "html" :: If the [=stack of open elements=] does not have a `body` element in scope, this is a [=parse error=]; ignore the token. Otherwise, if there is a node in the [=stack of open elements=] that is not either a <{dd}> element, a <{dt}> element, an <{li}> element, an <{optgroup}> element, an <{option}> element, a <{p}> element, an <{rb}> element, an <{rp}> element, an <{rt}> element, an <{rtc}> element, a <{tbody}> element, a <{td}> element, a <{tfoot}> element, a <{th}> element, a <{thead}> element, a <{tr}> element, the <{body}> element, or the <{html}> element, then this is a [=parse error=]. Switch the [=insertion mode=] to "[=after body=]". Reprocess the token. : A start tag whose tag name is one of: "address", "article", "aside", "blockquote", "center", "details", "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure", "footer", "header", "main", "nav", "ol", "p", "section", "summary", "ul" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token. : A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. If the [=current node=] is an [=HTML element=] whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a [=parse error=]; pop the [=current node=] off the [=stack of open elements=]. [=Insert an HTML element=] for the token. : A start tag whose tag name is one of: "pre", "listing" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token. If the [=next token=] is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of <{pre}> blocks are ignored as an authoring convenience.) 
Set the [=frameset-ok flag=] to "not ok". : A start tag whose tag name is "form" :: If the `form` element pointer is not null, and there is no <{template}> element on the [=stack of open elements=], then this is a [=parse error=]; ignore the token. Otherwise: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token, and, if there is no <{template}> element on the [=stack of open elements=], set the `form` element pointer to point to the element created. : A start tag whose tag name is "li" :: Run these steps: 1. Set the [=frameset-ok flag=] to "not ok". 2. Initialize |node| to be the [=current node=] (the bottommost node of the stack). 3. |Loop|: If |node| is an <{li}> element, then run these substeps: 1. [=Generate implied end tags=], except for <{li}> elements. 2. If the [=current node=] is not an <{li}> element, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an <{li}> element has been popped from the stack. 4. Jump to the step labeled |Done| below. 4. If |node| is in the [=special=] category, but is not an <{address}>, <{div}>, or <{p}> element, then jump to the step labeled |Done| below. 5. Otherwise, set |node| to the previous entry in the [=stack of open elements=] and return to the step labeled |Loop|. 6. |Done|: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. 7. Finally, [=insert an HTML element=] for the token. : A start tag whose tag name is one of: "dd", "dt" :: Run these steps: 1. Set the [=frameset-ok flag=] to "not ok". 2. Initialize |node| to be the [=current node=] (the bottommost node of the stack). 3. |Loop|: If |node| is a <{dd}> element, then run these substeps: 1. [=Generate implied end tags=], except for <{dd}> elements. 2. If the [=current node=] is not a <{dd}> element, then this is a [=parse error=]. 3. 
Pop elements from the [=stack of open elements=] until a <{dd}> element has been popped from the stack. 4. Jump to the step labeled |Done| below. 4. If |node| is a <{dt}> element, then run these substeps: 1. [=Generate implied end tags=], except for <{dt}> elements. 2. If the [=current node=] is not a <{dt}> element, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until a <{dt}> element has been popped from the stack. 4. Jump to the step labeled |Done| below. 5. If |node| is in the [=special=] category, but is not an <{address}>, <{div}>, or <{p}> element, then jump to the step labeled |Done| below. 6. Otherwise, set |node| to the previous entry in the [=stack of open elements=] and return to the step labeled |Loop|. 7. |Done|: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. 8. Finally, [=insert an HTML element=] for the token. : A start tag whose tag name is "plaintext" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token. Switch the tokenizer to the [[#plaintext-state]].

Once a start tag with the tag name "plaintext" has been seen, that will be the last token ever seen other than character tokens (and the end-of-file token), because there is no way to switch out of the [[#plaintext-state]].
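
The one-way nature of this switch can be illustrated with a toy model. The following is a minimal sketch only: `toy_tokenize` and its `TAG` pattern are hypothetical names invented for this illustration, and the function is a drastic simplification that recognizes only start tags, end tags, and character data. It is not the tokenizer defined by this specification.

```python
import re

# Hypothetical, heavily simplified tag pattern (an assumption for this
# sketch): it ignores attributes, comments, DOCTYPEs, and character references.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def toy_tokenize(source):
    """Return a list of (kind, value) tokens for a toy subset of HTML."""
    tokens = []
    pos = 0
    plaintext = False  # models having switched to the PLAINTEXT state
    while pos < len(source):
        # Once in the modeled PLAINTEXT state, tags are never recognized again.
        match = None if plaintext else TAG.match(source, pos)
        if match is None:
            tokens.append(("char", source[pos]))
            pos += 1
            continue
        kind = "end" if match.group(1) else "start"
        name = match.group(2).lower()
        tokens.append((kind, name))
        pos = match.end()
        if kind == "start" and name == "plaintext":
            plaintext = True  # there is no transition back out
    return tokens


tokens = toy_tokenize("<p>a<plaintext></p>")
```

In this model, the `</p>` that follows `<plaintext>` is emitted as four character tokens rather than as an end tag token.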

: A start tag whose tag name is "button" :: 1. If the [=stack of open elements=] has a `button` element in scope, then run these substeps: 1. [=Parse error=]. 2. [=Generate implied end tags=]. 3. Pop elements from the [=stack of open elements=] until a <{button}> element has been popped from the stack. 2. [=Reconstruct the active formatting elements=], if any. 3. [=Insert an HTML element=] for the token. 4. Set the [=frameset-ok flag=] to "not ok". : An end tag whose tag name is one of: "address", "article", "aside", "blockquote", "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure", "footer", "header", "listing", "main", "nav", "ol", "pre", "section", "summary", "ul" :: If the [=stack of open elements=] does not [=in scope|have an element in scope=] that is an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate implied end tags=]. 2. If the [=current node=] is not an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an [=HTML element=] with the same tag name as the token has been popped from the stack. : An end tag whose tag name is "form" :: If there is no <{template}> element on the [=stack of open elements=], then run these substeps: 1. Let |node| be the element that the `form` element pointer is set to, or null if it is not set to an element. 2. Set the `form` element pointer to null. 3. If |node| is null or if the [=stack of open elements=] does not have |node| in scope, then this is a [=parse error=]; abort these steps and ignore the token. 4. [=Generate implied end tags=]. 5. If the [=current node=] is not |node|, then this is a [=parse error=]. 6. Remove |node| from the [=stack of open elements=]. If there *is* a <{template}> element on the [=stack of open elements=], then run these substeps instead: 1. 
If the [=stack of open elements=] does not have a `form` element in scope, then this is a [=parse error=]; abort these steps and ignore the token. 2. [=Generate implied end tags=]. 3. If the [=current node=] is not a <{form}> element, then this is a [=parse error=]. 4. Pop elements from the [=stack of open elements=] until a <{form}> element has been popped from the stack. : An end tag whose tag name is "p" :: If the [=stack of open elements=] does not have a `p` element in button scope, then this is a [=parse error=]; [=insert an HTML element=] for a "p" start tag token with no attributes. Close a `p` element. : An end tag whose tag name is "li" :: If the [=stack of open elements=] does not have an `li` element in list item scope, then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate implied end tags=], except for <{li}> elements. 2. If the [=current node=] is not an <{li}> element, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an <{li}> element has been popped from the stack. : An end tag whose tag name is one of: "dd", "dt" :: If the [=stack of open elements=] does not [=in scope|have an element in scope=] that is an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate implied end tags=], except for [=HTML elements=] with the same tag name as the token. 2. If the [=current node=] is not an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an [=HTML element=] with the same tag name as the token has been popped from the stack. 
: An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6" :: If the [=stack of open elements=] does not [=in scope|have an element in scope=] that is an [=HTML element=] and whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate implied end tags=]. 2. If the [=current node=] is not an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an [=HTML element=] whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6" has been popped from the stack. : An end tag whose tag name is "sarcasm" :: Take a deep breath, then act as described in the "any other end tag" entry below. : A start tag whose tag name is "a" :: If the [=list of active formatting elements=] contains an <{a}> element between the end of the list and the last [=marker=] on the list (or the start of the list if there is no [=marker=] on the list), then this is a [=parse error=]; run the [=adoption agency algorithm=] for the token, then remove that element from the [=list of active formatting elements=] and the [=stack of open elements=] if the [=adoption agency algorithm=] didn't already remove it (it might not have if the element is not [=in table scope=]).

In the non-conforming stream:

`<a href="a">a<table><a href="b">b</table>x`

The first <{a}> element would be closed upon seeing the second one, and the "x" character would be inside a link to "b", not to "a". This is despite the fact that the outer <{a}> element is not in table scope (meaning that a regular `</a>` end tag at the start of the table wouldn't close the outer <{a}> element). The result is that the two <{a}> elements are indirectly nested inside each other; non-conforming markup will often result in non-conforming DOMs when parsed.
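
The check that triggers this behavior searches the [=list of active formatting elements=] backwards and stops at the last [=marker=]. The following is a minimal sketch of that search, assuming the list is modeled as a Python list of dictionaries; `MARKER` and `find_after_last_marker` are hypothetical names used only for this illustration.

```python
# Hypothetical stand-in for a marker entry in the list of active
# formatting elements (an assumption of this sketch).
MARKER = object()


def find_after_last_marker(active_formatting, tag_name):
    """Return the last entry named tag_name that sits after the last
    marker (or anywhere in the list if there is no marker), else None."""
    for entry in reversed(active_formatting):
        if entry is MARKER:
            return None  # the search never looks past the last marker
        if entry["tag"] == tag_name:
            return entry
    return None


outer_a = {"tag": "a", "href": "a"}

# Modeling the stream above: no marker has been inserted by the time the
# second "a" start tag arrives, so the outer element is found.
found = find_after_last_marker([outer_a], "a")

# If a marker had been inserted after the outer element, the same search
# would find nothing.
shielded = find_after_last_marker([outer_a, MARKER], "a")
```

Here `found` is the outer element, which is why the second "a" start tag is a parse error, while `shielded` is `None`.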

[=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. [=Push onto the list of active formatting elements=] that element. : A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i", "s", "small", "strike", "strong", "tt", "u" :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. [=Push onto the list of active formatting elements=] that element. : A start tag whose tag name is "nobr" :: [=Reconstruct the active formatting elements=], if any. If the [=stack of open elements=] has a `nobr` element in scope, then this is a [=parse error=]; run the [=adoption agency algorithm=] for the token, then once again [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. [=Push onto the list of active formatting elements=] that element. : An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i", "nobr", "s", "small", "strike", "strong", "tt", "u" :: Run the [=adoption agency algorithm=] for the token. : A start tag whose tag name is one of: "applet", "marquee", "object" :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. Insert a [=marker=] at the end of the [=list of active formatting elements=]. Set the [=frameset-ok flag=] to "not ok". : An end tag token whose tag name is one of: "applet", "marquee", "object" :: If the [=stack of open elements=] does not [=in scope|have an element in scope=] that is an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]; ignore the token. Otherwise, run these steps: 1. [=Generate implied end tags=]. 2. If the [=current node=] is not an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]. 3. Pop elements from the [=stack of open elements=] until an [=HTML element=] with the same tag name as the token has been popped from the stack. 4. 
[=Clear the list of active formatting elements up to the last marker=]. : A start tag whose tag name is "table" :: If the {{Document}} is *not* set to [=quirks mode=], and the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token. Set the [=frameset-ok flag=] to "not ok". Switch the [=insertion mode=] to "[=in table=]". : An end tag whose tag name is "br" :: [=Parse error=]. Drop the attributes from the token, and act as described in the next entry; i.e., act as if this was a "br" start tag token with no attributes, rather than the end tag token that it actually is. : A start tag whose tag name is one of: "area", "br", "embed", "img", "wbr" :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. Set the [=frameset-ok flag=] to "not ok". : A start tag whose tag name is "input" :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. If the token does not have an attribute with the name "type", or if it does, but that attribute's value is not an [=ASCII case-insensitive=] match for the string "`hidden`", then: set the [=frameset-ok flag=] to "not ok". : A start tag whose tag name is one of: "param", "source", "track" :: [=Insert an HTML element=] for the token. Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. : A start tag whose tag name is "hr" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Insert an HTML element=] for the token. 
Immediately pop the [=current node=] off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. Set the [=frameset-ok flag=] to "not ok". : A start tag whose tag name is "image" :: [=Parse error=]. Change the token's tag name to "img" and reprocess it. (Don't ask.) : A start tag whose tag name is "textarea" :: Run these steps: 1. [=Insert an HTML element=] for the token. 2. If the [=next token=] is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of <{textarea}> elements are ignored as an authoring convenience.) 3. Switch the tokenizer to the [=RCDATA state=]. 4. Let the [=original insertion mode=] be the current [=insertion mode=]. 5. Set the [=frameset-ok flag=] to "not ok". 6. Switch the [=insertion mode=] to "[=in text|text=]". : A start tag whose tag name is "xmp" :: If the [=stack of open elements=] has a `p` element in button scope, then close a `p` element. [=Reconstruct the active formatting elements=], if any. Set the [=frameset-ok flag=] to "not ok". Follow the [=generic raw text element parsing algorithm=]. : A start tag whose tag name is "iframe" :: Set the [=frameset-ok flag=] to "not ok". Follow the [=generic raw text element parsing algorithm=]. : A start tag whose tag name is "noembed" : A start tag whose tag name is "noscript", if the [=scripting flag=] is enabled :: Follow the [=generic raw text element parsing algorithm=]. : A start tag whose tag name is "select" :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. Set the [=frameset-ok flag=] to "not ok". If the [=insertion mode=] is one of "[=in table=]", "[=in caption=]", "[=in table body=]", "[=in row=]", or "[=in cell=]", then switch the [=insertion mode=] to "[=in select in table=]". Otherwise, switch the [=insertion mode=] to "[=in select=]". 
: A start tag whose tag name is one of: "optgroup", "option" :: If the [=current node=] is an <{option}> element, then pop the [=current node=] off the [=stack of open elements=]. [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token. : A start tag whose tag name is: "rb" :: If the [=stack of open elements=] has a `ruby` element in scope, then [=generate implied end tags=]. If the [=current node=] is not now a <{ruby}> element nor a child of a <{ruby}> element, this is a [=parse error=]. [=Insert an HTML element=] for the token. : A start tag whose tag name is one of: "rp", "rt" :: If the [=stack of open elements=] has a `ruby` element in scope, then [=generate implied end tags=], except for <{rtc}> elements. If the [=current node=] is not now a <{rtc}> element or a <{ruby}> element, this is a [=parse error=]. [=Insert an HTML element=] for the token. : A start tag whose tag name is: "rtc" :: If the [=stack of open elements=] has a `ruby` element in scope, then [=generate implied end tags=]. If the [=current node=] is not now a <{ruby}> element, this is a [=parse error=]. [=Insert an HTML element=] for the token. : A start tag whose tag name is "math" :: [=Reconstruct the active formatting elements=], if any. [=Adjust MathML attributes=] for the token. (This fixes the case of MathML attributes that are not all lowercase.) [=Adjust foreign attributes=] for the token. (This fixes the use of namespaced attributes, in particular XLink.) [=Insert a foreign element=] for the token, in the [=MathML namespace=]. If the token has its [=self-closing flag=] set, pop the [=current node=] off the [=stack of open elements=] and [=acknowledged|acknowledge the token's *self-closing flag*=]. : A start tag whose tag name is "svg" :: [=Reconstruct the active formatting elements=], if any. [=Adjust SVG attributes=] for the token. (This fixes the case of SVG attributes that are not all lowercase.) [=Adjust foreign attributes=] for the token. 
(This fixes the use of namespaced attributes, in particular XLink in SVG.) [=Insert a foreign element=] for the token, in the [=SVG namespace=]. If the token has its [=self-closing flag=] set, pop the [=current node=] off the [=stack of open elements=] and [=acknowledged|acknowledge the token's *self-closing flag*=]. : A start tag whose tag name is one of: "caption", "col", "colgroup", "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr" :: [=Parse error=]. : Any other start tag :: [=Reconstruct the active formatting elements=], if any. [=Insert an HTML element=] for the token.

This element will be an [=ordinary=] element.

: Any other end tag
:: Run these steps:

    1. Initialize |node| to be the [=current node=] (the bottommost node of the stack).
    2. |Loop|: If |node| is an [=HTML element=] with the same tag name as the token, then:
        1. [=Generate implied end tags=], except for [=HTML elements=] with the same tag name as the token.
        2. If |node| is not the [=current node=], then this is a [=parse error=].
        3. Pop all the nodes from the [=current node=] up to |node|, including |node|, then stop these steps.
    3. Otherwise, if |node| is in the [=special=] category, then this is a [=parse error=]; ignore the token, and abort these steps.
    4. Set |node| to the previous entry in the [=stack of open elements=].
    5. Return to the step labeled |Loop|.

When the steps above say the UA is to close a <{p}> element, it means that the UA must run the following steps:

1. [=Generate implied end tags=], except for <{p}> elements.
2. If the [=current node=] is not a <{p}> element, then this is a [=parse error=].
3. Pop elements from the [=stack of open elements=] until a <{p}> element has been popped from the stack.

The adoption agency algorithm, which takes as its only argument a token |token| for which the algorithm is being run, consists of the following steps:

1. Let |subject| be |token|'s tag name.
2. If the [=current node=] is an [=HTML element=] whose tag name is |subject|, and the [=current node=] is not in the [=list of active formatting elements=], then pop the [=current node=] off the [=stack of open elements=], and abort these steps.
3. Let |outer loop counter| be zero.
4. |Outer loop|: If |outer loop counter| is greater than or equal to eight, then abort these steps.
5. Increment |outer loop counter| by one.
6. Let |formatting element| be the last element in the [=list of active formatting elements=] that:

    * is between the end of the list and the last [=marker=] in the list, if any, or the start of the list otherwise, and
    * has the tag name |subject|.

    If there is no such element, then abort these steps and instead act as described in the "any other end tag" entry above.
7. If |formatting element| is not in the [=stack of open elements=], then this is a [=parse error=]; remove the element from the list, and abort these steps.
8. If |formatting element| is in the [=stack of open elements=], but the element is not [=in scope=], then this is a [=parse error=]; abort these steps.
9. If |formatting element| is not the [=current node=], this is a [=parse error=]. (But do not abort these steps.)
10. Let |furthest block| be the topmost node in the [=stack of open elements=] that is lower in the stack than |formatting element|, and is an element in the [=special=] category. There might not be one.
11. If there is no |furthest block|, then the UA must first pop all the nodes from the bottom of the [=stack of open elements=], from the [=current node=] up to and including |formatting element|, then remove |formatting element| from the [=list of active formatting elements=], and finally abort these steps.
12. Let |common ancestor| be the element immediately above |formatting element| in the [=stack of open elements=].
13. Let a bookmark note the position of |formatting element| in the [=list of active formatting elements=] relative to the elements on either side of it in the list.
14. Let |node| and |last node| be |furthest block|. Follow these steps:
    1. Let |inner loop counter| be zero.
    2. |Inner loop|: Increment |inner loop counter| by one.
    3. Let |node| be the element immediately above |node| in the [=stack of open elements=], or if |node| is no longer in the [=stack of open elements=] (e.g., because it got removed by this algorithm), the element that was immediately above |node| in the [=stack of open elements=] before |node| was removed.
    4. If |node| is |formatting element|, then go to the next step in the overall algorithm.
    5. If |inner loop counter| is greater than three and |node| is in the [=list of active formatting elements=], then remove |node| from the [=list of active formatting elements=].
    6. If |node| is not in the [=list of active formatting elements=], then remove |node| from the [=stack of open elements=] and then go back to the step labeled |Inner loop|.
    7. [=Create an element for the token=] for which the element |node| was created, in the [=HTML namespace=], with |common ancestor| as the intended parent; replace the entry for |node| in the [=list of active formatting elements=] with an entry for the new element, replace the entry for |node| in the [=stack of open elements=] with an entry for the new element, and let |node| be the new element.
    8. If |last node| is |furthest block|, then move the aforementioned bookmark to be immediately after the new |node| in the [=list of active formatting elements=].
    9. Insert |last node| into |node|, first removing it from its previous parent node if any.
    10. Let |last node| be |node|.
    11. Return to the step labeled |Inner loop|.
15. Insert whatever |last node| ended up being in the previous step at the [=appropriate place for inserting a node=], but using |common ancestor| as the |override target|.
16. [=Create an element for the token=] for which |formatting element| was created, in the [=HTML namespace=], with |furthest block| as the intended parent.
17. Take all of the child nodes of |furthest block| and append them to the element created in the last step.
18. Append that new element to |furthest block|.
19. Remove |formatting element| from the [=list of active formatting elements=], and insert the new element into the [=list of active formatting elements=] at the position of the aforementioned bookmark.
20. Remove |formatting element| from the [=stack of open elements=], and insert the new element into the [=stack of open elements=] immediately below the position of |furthest block| in that stack.
21. Jump back to the step labeled |Outer loop|.

This algorithm's name, the "adoption agency algorithm", comes from the way it causes elements to change parents.
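The "any other end tag" steps, which the adoption agency algorithm falls back to when no suitable formatting element is found, can be sketched as follows. This is a minimal illustration in Python, not a conforming implementation. The stack of open elements is modeled as a list of tag-name strings whose last item is the current node. The function names and the `SPECIAL` and `IMPLIED_END_TAGS` sets are assumptions made for the example, and `SPECIAL` is only an abbreviated subset of the special category.

```python
# Abbreviated, illustrative subset of the "special" category (assumption:
# the full list is defined elsewhere in the specification).
SPECIAL = {"html", "body", "div", "p", "ul", "li", "table", "td"}

# Elements that "generate implied end tags" may pop (assumed list).
IMPLIED_END_TAGS = {"dd", "dt", "li", "optgroup", "option", "p",
                    "rb", "rp", "rt", "rtc"}


def generate_implied_end_tags(stack, except_for=None):
    """Pop implied-end-tag elements off the top of the stack."""
    while stack and stack[-1] in IMPLIED_END_TAGS and stack[-1] != except_for:
        stack.pop()


def any_other_end_tag(stack, tag_name):
    """Apply the "any other end tag" steps; return the number of parse errors.

    `stack` is mutated in place; its last item is the current node.
    """
    parse_errors = 0
    index = len(stack) - 1                  # step 1: start at the current node
    while index >= 0:
        node = stack[index]
        if node == tag_name:                # step 2: matching element found
            generate_implied_end_tags(stack, except_for=tag_name)
            if index != len(stack) - 1:     # node is not the current node
                parse_errors += 1
            del stack[index:]               # pop up to and including node
            return parse_errors
        if node in SPECIAL:                 # step 3: blocked by a special element
            return parse_errors + 1         # parse error; the token is ignored
        index -= 1                          # step 4: move to the previous entry
    return parse_errors


stack = ["html", "body", "div", "span", "b"]
errors = any_other_end_tag(stack, "span")
print(stack, errors)
```

In this usage example the `span` end tag closes the still-open `b` along with the `span`, which counts as one parse error. An end tag with no matching open element below the nearest special element would leave the stack unchanged.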

##### The "text" insertion mode ##### {#sec-the-text-insertion-mode}

When the user agent is to apply the rules for the "[=in text|text=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token :: [=Insert the token's character=].

This can never be a U+0000 NULL character; the tokenizer converts those to U+FFFD REPLACEMENT CHARACTER characters.

: An end-of-file token :: [=Parse error=]. If the [=current node=] is a <{script}> element, mark the <{script}> element as "[=already started=]". Pop the [=current node=] off the [=stack of open elements=]. Switch the [=insertion mode=] to the [=original insertion mode=] and reprocess the token. : An end tag whose tag name is "script" :: If the [=JavaScript execution context stack=] is empty, [=perform a microtask checkpoint=]. Let |script| be the [=current node=] (which will be a <{script}> element). Pop the [=current node=] off the [=stack of open elements=]. Switch the [=insertion mode=] to the [=original insertion mode=]. Let the |old insertion point| have the same value as the current [=insertion point=]. Let the [=insertion point=] be just before the [=next input character=]. Increment the parser's [=script nesting level=] by one. Prepare the |script|. This might cause some script to execute, which might cause {{Document/write()|new characters to be inserted into the tokenizer}}, and might cause the tokenizer to output more tokens, resulting in a [=reentrant|reentrant invocation of the parser=]. Decrement the parser's [=script nesting level=] by one. If the parser's [=script nesting level=] is zero, then set the [=parser pause flag=] to false. Let the [=insertion point=] have the value of the |old insertion point|. (In other words, restore the [=insertion point=] to its previous value. This value might be the "undefined" value.) At this stage, if there is a [=pending parsing-blocking script=], then:

: If the [=script nesting level=] is not zero: :: Set the [=parser pause flag=] to true, and abort the processing of any nested invocations of the tokenizer, yielding control back to the caller. (Tokenization will resume when the caller returns to the "outer" tree construction stage.)

The tree construction stage of this particular parser is [=reentrant|being called reentrantly=], say from a call to {{Document/write()|document.write()}}.

: Otherwise: :: Run these steps: 1. Let |the script| be the [=pending parsing-blocking script=]. There is no longer a [=pending parsing-blocking script=]. 2. Block the [[#tokenization|tokenizer]] for this instance of the [=HTML parser=], such that the [=event loop=] will not run [=tasks=] that invoke the [[#tokenization|tokenizer]]. 3. If the parser's {{Document}} [=has a style sheet that is blocking scripts=] or |the script|'s "[=ready to be parser-executed=]" flag is not set: [=spin the event loop=] until the parser's {{Document}} [=has no style sheet that is blocking scripts=] and |the script|'s "[=ready to be parser-executed=]" flag is set. 4. If this [=abort the parser|parser has been aborted=] in the meantime, abort these steps.

This could happen if, e.g., while the [=spin the event loop=] algorithm is running, the [=browsing context=] gets closed, or the {{Document/open()|document.open()}} method gets invoked on the {{Document}}.

5. Unblock the [[#tokenization|tokenizer]] for this instance of the [=HTML parser=], such that [=tasks=] that invoke the [[#tokenization|tokenizer]] can again be run.
6. Let the [=insertion point=] be just before the [=next input character=].
7. Increment the parser's [=script nesting level=] by one (it should be zero before this step, so this sets it to one).
8. [=Execute=] |the script|.
9. Decrement the parser's [=script nesting level=] by one. If the parser's [=script nesting level=] is zero (which it always should be at this point), then set the [=parser pause flag=] to false.
10. Let the [=insertion point=] be undefined again.
11. If there is once again a [=pending parsing-blocking script=], then repeat these steps from step 1.

: Any other end tag
:: Pop the [=current node=] off the [=stack of open elements=].

    Switch the [=insertion mode=] to the [=original insertion mode=].

##### The "in table" insertion mode ##### {#the-in-table-insertion-mode}

When the user agent is to apply the rules for the "[=in table=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token, if the [=current node=] is <{table}>, <{tbody}>, <{tfoot}>, <{thead}>, or <{tr}> element :: Let the |pending table character tokens| be an empty list of tokens. Let the [=original insertion mode=] be the current [=insertion mode=]. Switch the [=insertion mode=] to "[=in table text=]" and reprocess the token. : A comment token :: [=Insert a comment=]. : A DOCTYPE token :: [=Parse error=]. : A start tag whose tag name is "caption" :: [=Clear the stack back to a table context=]. (See below.) Insert a [=marker=] at the end of the [=list of active formatting elements=]. [=Insert an HTML element=] for the token, then switch the [=insertion mode=] to "[=in caption=]". : A start tag whose tag name is "colgroup" :: [=Clear the stack back to a table context=]. (See below.) [=Insert an HTML element=] for the token, then switch the [=insertion mode=] to "[=in column group=]". : A start tag whose tag name is "col" :: [=Clear the stack back to a table context=]. (See below.) [=Insert an HTML element=] for a "colgroup" start tag token with no attributes, then switch the [=insertion mode=] to "[=in column group=]". Reprocess the current token. : A start tag whose tag name is one of: "tbody", "tfoot", "thead" :: [=Clear the stack back to a table context=]. (See below.) [=Insert an HTML element=] for the token, then switch the [=insertion mode=] to "[=in table body=]". : A start tag whose tag name is one of: "td", "th", "tr" :: [=Clear the stack back to a table context=]. (See below.) [=Insert an HTML element=] for a "tbody" start tag token with no attributes, then switch the [=insertion mode=] to "[=in table body=]". Reprocess the current token. : A start tag whose tag name is "table" :: [=Parse error=]. If the [=stack of open elements=] does not have a `table` element in table scope, ignore the token. Otherwise: Pop elements from this stack until a <{table}> element has been popped from the stack. [=Reset the insertion mode appropriately=]. 
Reprocess the token. : An end tag whose tag name is "table" :: If the [=stack of open elements=] does not have a `table` element in table scope, this is a [=parse error=]; ignore the token. Otherwise: Pop elements from this stack until a <{table}> element has been popped from the stack. [=Reset the insertion mode appropriately=]. : An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr" :: [=Parse error=]. : A start tag whose tag name is one of: "style", "script", "template" : An end tag whose tag name is "template" :: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=]. : A start tag whose tag name is "input" :: If the token does not have an attribute with the name "type", or if it does, but that attribute's value is not an [=ASCII case-insensitive=] match for the string "`hidden`", then: act as described in the "anything else" entry below. Otherwise: [=Parse error=]. [=Insert an HTML element=] for the token. Pop that <{input}> element off the [=stack of open elements=]. [=acknowledged|Acknowledge the token's *self-closing flag*=], if it is set. : A start tag whose tag name is "form" :: [=Parse error=]. If there is a <{template}> element on the [=stack of open elements=], or if the `form` element pointer is not null, ignore the token. Otherwise: [=Insert an HTML element=] for the token, and set the `form` element pointer to point to the element created. Pop that <{form}> element off the [=stack of open elements=]. : An end-of-file token :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : Anything else :: [=Parse error=]. Enable [=foster parenting=], process the token [=using the rules for=] the "[=in body=]" [=insertion mode=], and then disable [=foster parenting=]. 
When the steps above require the UA to clear the stack back to a table context, it means that the UA must, while the [=current node=] is not a <{table}>, <{template}>, or <{html}> element, pop elements from the [=stack of open elements=].

This is the same list of elements as used in the [=in table scope|has an element in table scope=] steps.

The [=current node=] being an <{html}> element after this process is a [=fragment case=].
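The relationship between clearing the stack back to a table context and the table scope check can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the stack of open elements is modeled as a list of tag-name strings whose last item is the current node, the function names are hypothetical, and the scope check is assumed to walk the stack from the current node downward, succeeding at the target and failing at the first boundary element.

```python
# Elements that stop both operations: the table context elements.
TABLE_CONTEXT = {"table", "template", "html"}


def clear_stack_back_to_table_context(stack):
    """Pop elements until the current node is table, template, or html."""
    while stack and stack[-1] not in TABLE_CONTEXT:
        stack.pop()


def has_element_in_table_scope(stack, target):
    """Return True if `target` is open and no table-scope boundary hides it.

    Assumed behavior: the target is checked before the boundary list, so a
    `table` element can itself be found in table scope.
    """
    for name in reversed(stack):
        if name == target:
            return True
        if name in TABLE_CONTEXT:
            return False
    return False


stack = ["html", "body", "table", "tbody", "tr", "td", "b"]
print(has_element_in_table_scope(stack, "table"))
clear_stack_back_to_table_context(stack)
print(stack)
```

Because both operations share one boundary list, a `table` start or end tag that passes the scope check can always be followed by popping back to that same `table` element.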

##### The "in table text" insertion mode ##### {#the-in-table-text-insertion-mode}

When the user agent is to apply the rules for the "[=in table text=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is U+0000 NULL
:: [=Parse error=].

: Any other character token
:: Append the character token to the |pending table character tokens| list.

: Anything else
:: If any of the tokens in the |pending table character tokens| list are character tokens that are not [=space characters=], then this is a [=parse error=]: reprocess the character tokens in the |pending table character tokens| list using the rules given in the "anything else" entry in the "[=in table=]" insertion mode.

    Otherwise, [=insert the characters=] given by the |pending table character tokens| list.

    Switch the [=insertion mode=] to the [=original insertion mode=] and reprocess the token.

##### The "in caption" insertion mode ##### {#the-in-caption-insertion-mode}

When the user agent is to apply the rules for the "[=in caption=]" [=insertion mode=], the user agent must handle the token as follows:

: An end tag whose tag name is "caption" :: If the [=stack of open elements=] does not have a `caption` element in table scope, this is a [=parse error=]; ignore the token. ([=fragment case=]) Otherwise: [=Generate implied end tags=]. Now, if the [=current node=] is not a <{caption}> element, then this is a [=parse error=]. Pop elements from this stack until a <{caption}> element has been popped from the stack. [=Clear the list of active formatting elements up to the last marker=]. Switch the [=insertion mode=] to "[=in table=]". : A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr" : An end tag whose tag name is "table" :: If the [=stack of open elements=] does not have a `caption` element in table scope, this is a [=parse error=]; ignore the token. ([=fragment case=]) Otherwise: [=Generate implied end tags=]. Now, if the [=current node=] is not a <{caption}> element, then this is a [=parse error=]. Pop elements from this stack until a <{caption}> element has been popped from the stack. [=Clear the list of active formatting elements up to the last marker=]. Switch the [=insertion mode=] to "[=in table=]". Reprocess the token. : An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr" :: [=Parse error=]. : Anything else :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. ##### The "in column group" insertion mode ##### {#the-in-column-group-insertion-mode} When the user agent is to apply the rules for the "[=in column group=]" [=insertion mode=], the user agent must handle the token as follows:

: A start tag whose tag name is "tr" :: [=Clear the stack back to a table body context=]. (See below.) [=Insert an HTML element=] for the token, then switch the [=insertion mode=] to "[=in row=]". : A start tag whose tag name is one of: "th", "td" :: [=Parse error=]. [=Clear the stack back to a table body context=]. (See below.) [=Insert an HTML element=] for a "tr" start tag token with no attributes, then switch the [=insertion mode=] to "[=in row=]". Reprocess the current token. : An end tag whose tag name is one of: "tbody", "tfoot", "thead" :: If the [=stack of open elements=] does not [=in table scope|have an element in table scope=] that is an [=HTML element=] with the same tag name as the token, this is a [=parse error=]; ignore the token. Otherwise: [=Clear the stack back to a table body context=]. (See below.) Pop the [=current node=] from the [=stack of open elements=]. Switch the [=insertion mode=] to "[=in table=]". : A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead" : An end tag whose tag name is "table" :: If the [=stack of open elements=] does not have a `tbody`, `thead`, or `tfoot` element in table scope, this is a [=parse error=]; ignore the token. Otherwise: [=Clear the stack back to a table body context=]. (See below.) Pop the [=current node=] from the [=stack of open elements=]. Switch the [=insertion mode=] to "[=in table=]". Reprocess the token. : An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr" :: [=Parse error=]. : Anything else :: Process the token [=using the rules for=] the "[=in table=]" [=insertion mode=]. When the steps above require the UA to clear the stack back to a table body context, it means that the UA must, while the [=current node=] is not a <{tbody}>, <{tfoot}>, <{thead}>, <{template}>, or <{html}> element, pop elements from the [=stack of open elements=].

The [=current node=] being an <{html}> element after this process is a [=fragment case=].
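The way the "tr", "th", and "td" start tag entries above use the table body context can be sketched as follows. This is a minimal illustration in Python, not a conforming tree builder: `MiniTreeBuilder` and its method names are hypothetical, the stack of open elements is a list of tag-name strings whose last item is the current node, and appending `"tr"` stands in for inserting an HTML element.

```python
# Elements at which "clear the stack back to a table body context" stops.
TABLE_BODY_CONTEXT = {"tbody", "tfoot", "thead", "template", "html"}


class MiniTreeBuilder:
    """Hypothetical, heavily simplified tree builder state."""

    def __init__(self, stack, insertion_mode):
        self.stack = list(stack)          # last item is the current node
        self.insertion_mode = insertion_mode
        self.parse_errors = 0

    def clear_stack_back_to_table_body_context(self):
        while self.stack and self.stack[-1] not in TABLE_BODY_CONTEXT:
            self.stack.pop()

    def handle_row_or_cell_start_tag(self, tag_name):
        """Handle a "tr", "th", or "td" start tag.

        Returns True when the caller must reprocess the same token in the
        new insertion mode, False when the token has been consumed.
        """
        if tag_name not in ("tr", "th", "td"):
            raise ValueError("this sketch only covers tr, th, and td")
        implied = tag_name != "tr"
        if implied:
            self.parse_errors += 1        # a cell without a row is a parse error
        self.clear_stack_back_to_table_body_context()
        self.stack.append("tr")           # stands in for "insert an HTML element"
        self.insertion_mode = "in row"
        return implied


builder = MiniTreeBuilder(["html", "body", "table", "tbody", "b"], "in table body")
reprocess = builder.handle_row_or_cell_start_tag("td")
print(builder.stack, builder.insertion_mode, reprocess)
```

Returning a flag for reprocessing keeps the sketch free of a full token dispatch loop; a real tree builder would instead hand the same token to the handler for the new insertion mode.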

##### The "in row" insertion mode ##### {#the-in-row-insertion-mode}

When the user agent is to apply the rules for the "[=in row=]" [=insertion mode=], the user agent must handle the token as follows:

: A start tag whose tag name is one of: "th", "td" :: [=Clear the stack back to a table row context=]. (See below.) [=Insert an HTML element=] for the token, then switch the [=insertion mode=] to "[=in cell=]". Insert a [=marker=] at the end of the [=list of active formatting elements=]. : An end tag whose tag name is "tr" :: If the [=stack of open elements=] does not have a `tr` element in table scope, this is a [=parse error=]; ignore the token. Otherwise: [=Clear the stack back to a table row context=]. (See below.) Pop the [=current node=] (which will be a <{tr}> element) from the [=stack of open elements=]. Switch the [=insertion mode=] to "[=in table body=]". : A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr" : An end tag whose tag name is "table" :: If the [=stack of open elements=] does not have a `tr` element in table scope, this is a [=parse error=]; ignore the token. Otherwise: [=Clear the stack back to a table row context=]. (See below.) Pop the [=current node=] (which will be a <{tr}> element) from the [=stack of open elements=]. Switch the [=insertion mode=] to "[=in table body=]". Reprocess the token. : An end tag whose tag name is one of: "tbody", "tfoot", "thead" :: If the [=stack of open elements=] does not [=in table scope|have an element in table scope=] that is an [=HTML element=] with the same tag name as the token, this is a [=parse error=]; ignore the token. If the [=stack of open elements=] does not have a `tr` element in table scope, ignore the token. Otherwise: [=Clear the stack back to a table row context=]. (See below.) Pop the [=current node=] (which will be a <{tr}> element) from the [=stack of open elements=]. Switch the [=insertion mode=] to "[=in table body=]". Reprocess the token. : An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th" :: [=Parse error=]. 
: Anything else :: Process the token [=using the rules for=] the "[=in table=]" [=insertion mode=]. When the steps above require the UA to clear the stack back to a table row context, it means that the UA must, while the [=current node=] is not a <{tr}>, <{template}>, or <{html}> element, pop elements from the [=stack of open elements=].

The [=current node=] being an <{html}> element after this process is a [=fragment case=].

##### The "in cell" insertion mode ##### {#the-in-cell-insertion-mode}

When the user agent is to apply the rules for the "[=in cell=]" [=insertion mode=], the user agent must handle the token as follows:

: An end tag whose tag name is one of: "td", "th"
:: If the [=stack of open elements=] does not [=in table scope|have an element in table scope=] that is an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]; ignore the token.

    Otherwise:

    [=Generate implied end tags=].

    Now, if the [=current node=] is not an [=HTML element=] with the same tag name as the token, then this is a [=parse error=].

    Pop elements from the [=stack of open elements=] until an [=HTML element=] with the same tag name as the token has been popped from the stack.

    [=Clear the list of active formatting elements up to the last marker=].

    Switch the [=insertion mode=] to "[=in row=]".

: A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
:: If the [=stack of open elements=] does *not* have a `td` or `th` element in table scope, then this is a [=parse error=]; ignore the token. ([=fragment case=])

    Otherwise, [=close the cell=] (see below) and reprocess the token.

: An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
:: [=Parse error=].

: An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
:: If the [=stack of open elements=] does not [=in table scope|have an element in table scope=] that is an [=HTML element=] with the same tag name as that of the token, then this is a [=parse error=]; ignore the token.

    Otherwise, [=close the cell=] (see below) and reprocess the token.

: Anything else
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].

Where the steps above say to close the cell, they mean to run the following algorithm:

1. [=Generate implied end tags=].
2. If the [=current node=] is not now a <{td}> element or a <{th}> element, then this is a [=parse error=].
3. Pop elements from the [=stack of open elements=] until a <{td}> element or a <{th}> element has been popped from the stack.
4. [=Clear the list of active formatting elements up to the last marker=].
5. Switch the [=insertion mode=] to "[=in row=]".

The [=stack of open elements=] cannot have both a <{td}> and a <{th}> element [=in table scope=] at the same time, nor can it have neither when the [=close the cell=] algorithm is invoked.
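The five steps for closing the cell can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the stack of open elements and the list of active formatting elements are plain lists of tag-name strings, `MARKER` is a hypothetical sentinel standing in for a marker entry, the `IMPLIED_END_TAGS` set is an assumed list, and the function returns the new insertion mode instead of mutating parser state.

```python
# Sentinel standing in for a marker in the list of active formatting elements.
MARKER = object()

# Elements that "generate implied end tags" may pop (assumed list).
IMPLIED_END_TAGS = {"dd", "dt", "li", "optgroup", "option", "p",
                    "rb", "rp", "rt", "rtc"}

CELLS = ("td", "th")


def close_the_cell(stack, active_formatting):
    """Run the close-the-cell steps; return (new insertion mode, parse errors).

    Both lists are mutated in place; the last item of `stack` is the
    current node.
    """
    parse_errors = 0
    # Step 1: generate implied end tags.
    while stack and stack[-1] in IMPLIED_END_TAGS:
        stack.pop()
    # Step 2: anything other than a cell on top is a parse error.
    if not stack or stack[-1] not in CELLS:
        parse_errors += 1
    # Step 3: pop until a td or th element has been popped.
    while stack:
        if stack.pop() in CELLS:
            break
    # Step 4: clear the active formatting list up to the last marker.
    while active_formatting:
        if active_formatting.pop() is MARKER:
            break
    # Step 5: the caller switches to this insertion mode.
    return "in row", parse_errors


stack = ["html", "body", "table", "tbody", "tr", "td", "b", "p"]
formatting = ["i", MARKER, "b"]
mode, errors = close_the_cell(stack, formatting)
print(stack, formatting, mode, errors)
```

In this usage example the open `p` is closed by an implied end tag, the leftover `b` triggers the parse error in step 2, and the formatting entry added inside the cell is discarded together with the cell's marker.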

##### The "in select" insertion mode ##### {#the-in-select-insertion-mode}

When the user agent is to apply the rules for the "[=in select=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is U+0000 NULL :: [=Parse error=]. : Any other character token :: [=Insert the token's character=]. : A comment token :: [=Insert a comment=]. : A DOCTYPE token :: [=Parse error=]. : A start tag whose tag name is "html" :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : A start tag whose tag name is "option" :: If the [=current node=] is an <{option}> element, pop that node from the [=stack of open elements=]. [=Insert an HTML element=] for the token. : A start tag whose tag name is "optgroup" :: If the [=current node=] is an <{option}> element, pop that node from the [=stack of open elements=]. If the [=current node=] is an <{optgroup}> element, pop that node from the [=stack of open elements=]. [=Insert an HTML element=] for the token. : An end tag whose tag name is "optgroup" :: First, if the [=current node=] is an <{option}> element, and the node immediately before it in the [=stack of open elements=] is an <{optgroup}> element, then pop the [=current node=] from the [=stack of open elements=]. If the [=current node=] is an <{optgroup}> element, then pop that node from the [=stack of open elements=]. Otherwise, this is a [=parse error=]; ignore the token. : An end tag whose tag name is "option" :: If the [=current node=] is an <{option}> element, then pop that node from the [=stack of open elements=]. Otherwise, this is a [=parse error=]; ignore the token. : An end tag whose tag name is "select" :: If the [=stack of open elements=] does not have a `select` element in select scope, this is a [=parse error=]; ignore the token. ([=fragment case=]) Otherwise: Pop elements from the [=stack of open elements=] until a <{select}> element has been popped from the stack. [=Reset the insertion mode appropriately=]. : A start tag whose tag name is "select" :: [=Parse error=]. If the [=stack of open elements=] does not have a `select` element in select scope, ignore the token. 
([=fragment case=]) Otherwise: Pop elements from the [=stack of open elements=] until a <{select}> element has been popped from the stack. [=Reset the insertion mode appropriately=]. Note: It just gets treated like an end tag. : A start tag whose tag name is one of: "input", "textarea" :: [=Parse error=]. If the [=stack of open elements=] does not have a `select` element in select scope, ignore the token. ([=fragment case=]) Otherwise: Pop elements from the [=stack of open elements=] until a <{select}> element has been popped from the stack. [=Reset the insertion mode appropriately=]. Reprocess the token. : A start tag whose tag name is one of: "script", "template" : An end tag whose tag name is "template" :: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=]. : An end-of-file token :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : Anything else :: [=Parse error=]. ##### The "in select in table" insertion mode ##### {#the-in-select-in-table-insertion-mode} When the user agent is to apply the rules for the "[=in select in table=]" [=insertion mode=], the user agent must handle the token as follows:

: A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
:: [=Parse error=].

    Pop elements from the [=stack of open elements=] until a <{select}> element has been popped from the stack.

    [=Reset the insertion mode appropriately=].

    Reprocess the token.

: An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
:: [=Parse error=].

    If the [=stack of open elements=] does not [=in table scope|have an element in table scope=] that is an [=HTML element=] with the same tag name as that of the token, then ignore the token.

    Otherwise:

    Pop elements from the [=stack of open elements=] until a <{select}> element has been popped from the stack.

    [=Reset the insertion mode appropriately=].

    Reprocess the token.

: Anything else
:: Process the token [=using the rules for=] the "[=in select=]" [=insertion mode=].

##### The "in template" insertion mode ##### {#the-in-template-insertion-mode}

When the user agent is to apply the rules for the "[=in template=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token : A comment token : A DOCTYPE token :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : A start tag whose tag name is one of: "base", "basefont", "bgsound", "link", "meta", "noframes", "script", "style", "template", "title" : An end tag whose tag name is "template" :: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=]. : A start tag whose tag name is one of: "caption", "colgroup", "tbody", "tfoot", "thead" :: Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. Push "[=in table=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. Switch the [=insertion mode=] to "[=in table=]", and reprocess the token. : A start tag whose tag name is "col" :: Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. Push "[=in column group=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. Switch the [=insertion mode=] to "[=in column group=]", and reprocess the token. : A start tag whose tag name is "tr" :: Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. Push "[=in table body=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. Switch the [=insertion mode=] to "[=in table body=]", and reprocess the token. : A start tag whose tag name is one of: "td", "th" :: Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. Push "[=in row=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. Switch the [=insertion mode=] to "[=in row=]", and reprocess the token. : Any other start tag :: Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. 
Push "[=in body=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=]. Switch the [=insertion mode=] to "[=in body=]", and reprocess the token. : Any other end tag :: [=Parse error=]. : An end-of-file token :: If there is no <{template}> element on the [=stack of open elements=], then [=stop parsing=]. ([=fragment case=]) Otherwise, this is a [=parse error=]. Pop elements from the [=stack of open elements=] until a <{template}> element has been popped from the stack. [=Clear the list of active formatting elements up to the last marker=]. Pop the [=current template insertion mode=] off the [=stack of template insertion modes=]. [=Reset the insertion mode appropriately=]. Reprocess the token. ##### The "after body" insertion mode ##### {#the-after-body-insertion-mode} When the user agent is to apply the rules for the "[=after body=]" [=insertion mode=], the user agent must handle the token as follows:

: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].

: A comment token
:: [=Insert a comment=] as the last child of the first element in the [=stack of open elements=] (the <{html}> element).

: A DOCTYPE token
:: [=Parse error=].

: A start tag whose tag name is "html"
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].

: An end tag whose tag name is "html"
:: If the parser was originally created as part of the [=HTML fragment parsing algorithm=], this is a [=parse error=]; ignore the token. ([=fragment case=])

    Otherwise, switch the [=insertion mode=] to "[=after after body=]".

: An end-of-file token
:: [=Stop parsing=].

: Anything else
:: [=Parse error=]. Switch the [=insertion mode=] to "[=in body=]" and reprocess the token.

##### The "in frameset" insertion mode ##### {#the-in-frameset-insertion-mode}

When the user agent is to apply the rules for the "[=in frameset=]" [=insertion mode=], the user agent must handle the token as follows:

The [=current node=] can only be the root <{html}> element in the [=fragment case=].

[=Stop parsing=].

: Anything else
:: [=Parse error=].

##### The "after frameset" insertion mode ##### {#the-after-frameset-insertion-mode}

When the user agent is to apply the rules for the "[=after frameset=]" [=insertion mode=], the user agent must handle the token as follows:

: A comment token
:: [=Insert a comment=] as the last child of the {{Document}} object.

: A DOCTYPE token
: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
: A start tag whose tag name is "html"
:: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=].

: An end-of-file token
:: [=Stop parsing=].

: Anything else
:: [=Parse error=]. Switch the [=insertion mode=] to "[=in body=]" and reprocess the token.

##### The "after after frameset" insertion mode ##### {#the-after-after-frameset-insertion-mode}

When the user agent is to apply the rules for the "[=after after frameset=]" [=insertion mode=], the user agent must handle the token as follows:

: A comment token :: [=Insert a comment=] as the last child of the {{Document}} object. : A DOCTYPE token : A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE : A start tag whose tag name is "html" :: Process the token [=using the rules for=] the "[=in body=]" [=insertion mode=]. : An end-of-file token :: [=Stop parsing=]. : A start tag whose tag name is "noframes" :: Process the token [=using the rules for=] the "[=in head=]" [=insertion mode=]. : Anything else :: [=Parse error=].

#### The rules for parsing tokens in foreign content #### {#the-rules-for-parsing-tokens-in-foreign-content}

When the user agent is to apply the rules for parsing tokens in foreign content, the user agent must handle the token as follows:

: A character token that is U+0000 NULL :: [=Parse error=]. [=insert a character|Insert a U+FFFD REPLACEMENT CHARACTER character=]. : A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE :: [=Insert the token's character=]. : Any other character token :: [=Insert the token's character=]. Set the [=frameset-ok flag=] to "not ok". : A comment token :: [=Insert a comment=]. : A DOCTYPE token :: [=Parse error=]. : A start tag whose tag name is one of: "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img", "li", "listing", "meta", "nobr", "ol", "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var" : A start tag whose tag name is "font", if the token has any attributes named "color", "face", or "size" :: [=Parse error=]. If the parser was originally created for the [=HTML fragment parsing algorithm=], then act as described in the "any other start tag" entry below. ([=fragment case=]) Otherwise: Pop an element from the [=stack of open elements=], and then keep popping more elements from the [=stack of open elements=] until the [=current node=] is a [=MathML text integration point=], an [=HTML integration point=], or an element in the [=HTML namespace=]. Then, reprocess the token. : Any other start tag :: If the [=adjusted current node=] is an element in the [=MathML namespace=], [=adjust MathML attributes=] for the token. (This fixes the case of MathML attributes that are not all lowercase.) If the [=adjusted current node=] is an element in the [=SVG namespace=], and the token's tag name is one of the ones in the first column of the following table, change the tag name to the name given in the corresponding cell in the second column. (This fixes the case of SVG elements that are not all lowercase.)

Tag name	Element name
`altglyph`	`altGlyph`
`altglyphdef`	`altGlyphDef`
`altglyphitem`	`altGlyphItem`
`animatecolor`	`animateColor`
`animatemotion`	`animateMotion`
`animatetransform`	`animateTransform`
`clippath`	`clipPath`
`feblend`	`feBlend`
`fecolormatrix`	`feColorMatrix`
`fecomponenttransfer`	`feComponentTransfer`
`fecomposite`	`feComposite`
`feconvolvematrix`	`feConvolveMatrix`
`fediffuselighting`	`feDiffuseLighting`
`fedisplacementmap`	`feDisplacementMap`
`fedistantlight`	`feDistantLight`
`fedropshadow`	`feDropShadow`
`feflood`	`feFlood`
`fefunca`	`feFuncA`
`fefuncb`	`feFuncB`
`fefuncg`	`feFuncG`
`fefuncr`	`feFuncR`
`fegaussianblur`	`feGaussianBlur`
`feimage`	`feImage`
`femerge`	`feMerge`
`femergenode`	`feMergeNode`
`femorphology`	`feMorphology`
`feoffset`	`feOffset`
`fepointlight`	`fePointLight`
`fespecularlighting`	`feSpecularLighting`
`fespotlight`	`feSpotLight`
`fetile`	`feTile`
`feturbulence`	`feTurbulence`
`foreignobject`	`foreignObject`
`glyphref`	`glyphRef`
`lineargradient`	`linearGradient`
`radialgradient`	`radialGradient`
`textpath`	`textPath`

If the [=adjusted current node=] is an element in the [=SVG namespace=], [=adjust SVG attributes=] for the token. (This fixes the case of SVG attributes that are not all lowercase.) [=Adjust foreign attributes=] for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.) [=Insert a foreign element=] for the token, in the same namespace as the [=adjusted current node=]. If the token has its [=self-closing flag=] set, then run the appropriate steps from the following list:

: If the token's tag name is "script", and the new [=current node=] is in the [=SVG namespace=] :: [=acknowledged|Acknowledge the token's *self-closing flag*=], and then act as described in the steps for a "script" end tag below. : Otherwise :: Pop the [=current node=] off the [=stack of open elements=] and [=acknowledged|acknowledge the token's *self-closing flag*=]. : An end tag whose tag name is "script", if the [=current node=] is an SVG script element :: Pop the [=current node=] off the [=stack of open elements=]. Let the |old insertion point| have the same value as the current [=insertion point=]. Let the [=insertion point=] be just before the [=next input character=]. Increment the parser's [=script nesting level=] by one. Set the [=parser pause flag=] to true. Process the SVG `script` element according to the SVG rules, if the user agent supports SVG. [[!SVG11]]

Even if this causes {{Document/write()|new characters to be inserted into the tokenizer}}, the parser will not be executed reentrantly, since the [=parser pause flag=] is true.

Decrement the parser's [=script nesting level=] by one. If the parser's [=script nesting level=] is zero, then set the [=parser pause flag=] to false. Let the [=insertion point=] have the value of the |old insertion point|. (In other words, restore the [=insertion point=] to its previous value. This value might be the "undefined" value.)

: Any other end tag :: Run these steps:

1. Initialize |node| to be the [=current node=] (the bottommost node of the stack).
2. If |node|'s tag name, converted to [=ASCII lowercase=], is not the same as the tag name of the token, then this is a [=parse error=].
3. |Loop|: If |node| is the topmost element in the [=stack of open elements=], abort these steps. ([=fragment case=])
4. If |node|'s tag name, converted to [=ASCII lowercase=], is the same as the tag name of the token, pop elements from the [=stack of open elements=] until |node| has been popped from the stack, and then abort these steps.
5. Set |node| to the previous entry in the [=stack of open elements=].
6. If |node| is not an element in the [=HTML namespace=], return to the step labeled |Loop|.
7. Otherwise, process the token according to the rules given in the section corresponding to the current [=insertion mode=] in HTML content.

### The end ### {#the-end}

Once the user agent stops parsing the document, the user agent must run the following steps:

1. Set the [=current document readiness=] to "`interactive`" and the [=insertion point=] to undefined.
2. Pop *all* the nodes off the [=stack of open elements=].
3. If the [=list of scripts that will execute when the document has finished parsing=] is not empty, run these substeps:
    1. [=Spin the event loop=] until the first <{script}> in the [=list of scripts that will execute when the document has finished parsing=] has its "[=ready to be parser-executed=]" flag set *and* the parser's {{Document}} [=has no style sheet that is blocking scripts=].
    2. [=Execute=] the first <{script}> in the [=list of scripts that will execute when the document has finished parsing=].
    3. Remove the first <{script}> element from the [=list of scripts that will execute when the document has finished parsing=] (i.e., shift out the first entry in the list).
    4. If the [=list of scripts that will execute when the document has finished parsing=] is still not empty, repeat these substeps again from substep 1.
4. [=Queue a task=] to run the following substeps:
    1. [=fire an event=] named `DOMContentLoaded` at the {{Document}} object, with its {{Event/bubbles}} attribute initialized to true.
    2. Enable the [=client message queue=] of the {{ServiceWorkerContainer}} object whose associated [=service worker client=] is the {{Document}} object's [=relevant settings object=].
5. [=Spin the event loop=] until the [=set of scripts that will execute as soon as possible=] and the [=list of scripts that will execute in order as soon as possible=] are empty.
6. [=Spin the event loop=] until there is nothing that delays the load event in the {{Document}}.
7. [=Queue a task=] to run the following substeps:
    1. Set the [=current document readiness=] to "`complete`".
    2. *Load event*: If the {{Document}} has a [=browsing context=], then [=fire an event=] named `load` at the {{Document}} object's {{Window}} object, with |legacy target override flag| set.
8. If the {{Document}} has a [=browsing context=], then [=queue a task=] to run the following substeps:
    1. If the {{Document}}'s [=page showing=] flag is true, then abort this task (i.e., don't fire the event below).
    2. Set the {{Document}}'s [=page showing=] flag to true.
    3. [=Fire an event=] named `pageshow` at the {{Document}} object's {{Window}} object using {{PageTransitionEvent}}, with the {{PageTransitionEvent/persisted}} attribute initialized to false, and |legacy target override flag| set.
9. If the {{Document}}'s [=print when loaded=] flag is set, then run the [=printing steps=].
10. The {{Document}} is now ready for post-load tasks.
11. [=Queue a task=] to mark the {{Document}} as completely loaded.

When the user agent is to abort a parser, it must run the following steps:

1. Throw away any pending content in the [=input stream=], and discard any future content that would have been added to it.
2. Set the [=current document readiness=] to "`interactive`".
3. Pop *all* the nodes off the [=stack of open elements=].
4. Set the [=current document readiness=] to "`complete`".

Except where otherwise specified, the [=task source=] for the [=tasks=] mentioned in this section is the [=DOM manipulation task source=].

### Coercing an HTML DOM into an infoset ### {#coercing-an-html-dom-into-an-infoset}

When an application uses an [=HTML parser=] in conjunction with an XML pipeline, it is possible that the constructed DOM is not compatible with the XML tool chain in certain subtle ways. For example, an XML toolchain might not be able to represent attributes with the name <{xmlns/xmlns}>, since they conflict with the Namespaces in XML syntax. There is also some data that the [=HTML parser=] generates that isn't included in the DOM itself. This section specifies some rules for handling these issues.

If the XML API being used doesn't support DOCTYPEs, the tool may drop DOCTYPEs altogether.

If the XML API doesn't support attributes in no namespace that are named "<{xmlns/xmlns}>", attributes whose names start with "`xmlns:`", or attributes in the [=XMLNS namespace=], then the tool may drop such attributes.

The tool may annotate the output with any namespace declarations required for proper operation.

If the XML API being used restricts the allowable characters in the local names of elements and attributes, then the tool may map all element and attribute local names that the API wouldn't support to a set of names that *are* allowed, by replacing any character that isn't supported with the uppercase letter U and the six digits of the character's Unicode code point when expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in increasing numeric order.

For example, the element name `foo<bar`, which can be output by the [=HTML parser=], though it is neither a legal HTML element name nor a well-formed XML element name, would be converted into `fooU00003Cbar`, which *is* a well-formed XML element name (though it's still not legal in HTML by any means).

As another example, consider the attribute <{xlink/href|xlink:href}>. Used on a MathML element, it becomes, after being [=adjust foreign attributes|adjusted=], an attribute with a prefix "`xlink`" and a local name "`href`". However, used on an HTML element, it becomes an attribute with no prefix and the local name "<{xlink/href|xlink:href}>", which is not a valid NCName, and thus might not be accepted by an XML API. It could thus get converted, becoming "`xlinkU00003Ahref`".

The resulting names from this conversion conveniently can't clash with any attribute generated by the [=HTML parser=], since those are all either lowercase or those listed in the [=adjust foreign attributes=] algorithm's table.
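
The name mapping described above can be sketched as follows. This is a minimal illustration, not a normative implementation: the function names `coerce_local_name` and `default_is_supported` are hypothetical, and the default predicate (ASCII letters, digits, `_`, `-` and `.`) is only an assumed stand-in for whatever characters a particular XML API actually accepts.

```python
def default_is_supported(ch: str) -> bool:
    # Assumption: a deliberately narrow, ASCII-only notion of "supported".
    # A real tool would ask its XML API (or use the NCName production).
    return ch.isascii() and (ch.isalnum() or ch in "_-.")


def coerce_local_name(name: str, is_supported=default_is_supported) -> str:
    # Replace each unsupported character with "U" followed by the six
    # uppercase hexadecimal digits of its Unicode code point.
    return "".join(
        ch if is_supported(ch) else "U%06X" % ord(ch)
        for ch in name
    )


print(coerce_local_name("foo<bar"))     # fooU00003Cbar
print(coerce_local_name("xlink:href"))  # xlinkU00003Ahref
```

Passing the predicate as a parameter keeps the sketch independent of any one XML API's character restrictions.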

If the XML API restricts comments from having two consecutive U+002D HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE character between any such offending characters.

If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS character (-), the tool may insert a single U+0020 SPACE character at the end of such comments.

If the XML API restricts allowed characters in character data, attribute values, or comments, the tool may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE character, and any other literal non-XML character with a U+FFFD REPLACEMENT CHARACTER.

If the tool has no way to convey out-of-band information, then the tool may drop the following information:

* Whether the document is set to *no-quirks mode*, *limited-quirks mode*, or *quirks mode*
* The association between form controls and forms that aren't their nearest <{form}> element ancestor (use of the `form` element pointer in the parser)
* The [=template contents=] of any <{template}> elements.
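
As an illustration of the comment and character-data rules above, here is a hedged sketch. The function names `is_xml_char` and `coerce_comment` are hypothetical, and the sketch assumes that "non-XML character" means a character outside the XML 1.0 `Char` production; a tool targeting a different XML version or API would need a different check.

```python
def is_xml_char(ch: str) -> bool:
    # Assumption: the XML 1.0 Char production defines what is allowed.
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def coerce_comment(data: str) -> str:
    # FORM FEED becomes a space; any other non-XML character becomes U+FFFD.
    data = data.replace("\f", " ")
    data = "".join(ch if is_xml_char(ch) else "\ufffd" for ch in data)
    # Separate every pair of consecutive hyphens with a single space.
    while "--" in data:
        data = data.replace("--", "- -")
    # A comment must not end in a hyphen either.
    if data.endswith("-"):
        data += " "
    return data


print(repr(coerce_comment("a--b")))  # 'a- -b'
```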

The mutations allowed by this section apply *after* the [=HTML parser=]'s rules have been applied. For example, a `<a::>` start tag will be closed by a `</a::>` end tag, and never by a `</aU00003AU00003A>` end tag, even if the user agent is using the rules above to then generate an actual element in the DOM with the name `aU00003AU00003A` for that start tag.

### An introduction to error handling and strange cases in the parser ### {#an-introduction-to-error-handling-and-strange-cases-in-the-parser}

*This section is non-normative.*

This section examines some erroneous markup and discusses how the [=HTML parser=] handles these cases.

#### Misnested tags: <b><i></b></i> #### {#misnested-tags-b-i-b-i}

*This section is non-normative.*

The most-often discussed example of erroneous markup is as follows:

    <p>1<b>2<i>3</b>4</i>5</p>

The parsing of this markup is straightforward up to the "3". At this point, the DOM looks like this:

<{html}>
- <{head}>
- <{body}>
  - <{p}>
    - {{Text|#text}}: *1*
    - <{b}>
      - {{Text|#text}}: *2*
      - <{i}>
        - {{Text|#text}}: *3*

Here, the [=stack of open elements=] has five elements on it: <{html}>, <{body}>, <{p}>, <{b}>, and <{i}>. The [=list of active formatting elements=] just has two: <{b}> and <{i}>. The [=insertion mode=] is "[=in body=]".

Upon receiving the end tag token with the tag name "b", the "[=adoption agency algorithm=]" is invoked. This is a simple case, in that the |formatting element| is the <{b}> element, and there is no |furthest block|. Thus, the [=stack of open elements=] ends up with just three elements: <{html}>, <{body}>, and <{p}>, while the [=list of active formatting elements=] has just one: <{i}>. The DOM tree is unmodified at this point.

The next token, a character ("4"), triggers the [=reconstruct the active formatting elements|reconstruction of the active formatting elements=], in this case just the <{i}> element. A new <{i}> element is thus created for the "4" {{Text}} node. After the end tag token for the "i" is also received, and the "5" {{Text}} node is inserted, the DOM looks as follows:

<{html}>
- <{head}>
- <{body}>
  - <{p}>
    - {{Text|#text}}: *1*
    - <{b}>
      - {{Text|#text}}: *2*
      - <{i}>
        - {{Text|#text}}: *3*
    - <{i}>
      - {{Text|#text}}: *4*
    - {{Text|#text}}: *5*
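
The walk-through above can be replayed with a toy tree builder. This is a heavily simplified sketch, not the parser algorithm itself: it only models the [=stack of open elements=], the [=list of active formatting elements=], and the "no |furthest block|" branch of the adoption agency algorithm. The names `Node`, `build`, `serialize` and `TOKENS` are hypothetical, and markers, scoping rules and parse errors are all ignored.

```python
FORMATTING = {"b", "i"}  # tiny subset of the real formatting elements


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Nodes, or plain strings for text


def serialize(node):
    inner = "".join(c if isinstance(c, str) else serialize(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


def build(tokens):
    body = Node("body")
    stack = [body]  # stack of open elements
    active = []     # list of active formatting elements (no markers)

    def insert(name):
        el = Node(name)
        stack[-1].children.append(el)
        stack.append(el)
        return el

    def reconstruct():
        # Re-open every active formatting element that is no longer open.
        for i, el in enumerate(active):
            if el not in stack:
                active[i] = insert(el.name)

    for kind, value in tokens:
        if kind == "text":
            reconstruct()
            stack[-1].children.append(value)
        elif kind == "start":
            reconstruct()
            el = insert(value)
            if value in FORMATTING:
                active.append(el)
        elif kind == "end" and value in FORMATTING:
            # Simple case only: no furthest block, so just pop and forget.
            fmt = next((e for e in reversed(active) if e.name == value), None)
            if fmt is not None and fmt in stack:
                del stack[stack.index(fmt):]
                active.remove(fmt)
        elif kind == "end":
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == value:
                    del stack[i:]
                    break
    return body


# Tokens for <p>1<b>2<i>3</b>4</i>5</p>
TOKENS = [
    ("start", "p"), ("text", "1"), ("start", "b"), ("text", "2"),
    ("start", "i"), ("text", "3"), ("end", "b"), ("text", "4"),
    ("end", "i"), ("text", "5"), ("end", "p"),
]
print(serialize(build(TOKENS)))
```

The printed markup has the same shape as the final tree above: the first <{i}> stays inside <{b}>, and a second <{i}> is created for the "4".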

#### Misnested tags: <b><p></b></p> #### {#misnested-tags-b-p-b-p}

*This section is non-normative.*

A case similar to the previous one is the following:

    <b>1<p>2</b>3</p>

Up to the "2" the parsing here is straightforward:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
    - {{Text|#text}}: *1*
    - <{p}>
      - {{Text|#text}}: *2*

The interesting part is when the end tag token with the tag name "b" is parsed. Before that token is seen, the [=stack of open elements=] has four elements on it: <{html}>, <{body}>, <{b}>, and <{p}>. The [=list of active formatting elements=] just has the one: <{b}>. The [=insertion mode=] is "[=in body=]". Upon receiving the end tag token with the tag name "b", the "[=adoption agency algorithm=]" is invoked, as in the previous example. However, in this case, there *is* a |furthest block|, namely the <{p}> element. Thus, this time the adoption agency algorithm isn't skipped over. The |common ancestor| is the <{body}> element. A conceptual "bookmark" marks the position of the <{b}> in the [=list of active formatting elements=], but since that list has only one element in it, the bookmark won't have much effect. As the algorithm progresses, |node| ends up set to the formatting element (<{b}>), and |last node| ends up set to the |furthest block| (<{p}>). The |last node| gets appended (moved) to the |common ancestor|, so that the DOM looks like:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
    - {{Text|#text}}: *1*
  - <{p}>
    - {{Text|#text}}: *2*

A new <{b}> element is created, and the children of the <{p}> element are moved to it:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
    - {{Text|#text}}: *1*
  - <{p}>

<{b}>
- {{Text|#text}}: *2*

Finally, the new <{b}> element is appended to the <{p}> element, so that the DOM looks like:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
    - {{Text|#text}}: *1*
  - <{p}>
    - <{b}>
      - {{Text|#text}}: *2*

The <{b}> element is removed from the [=list of active formatting elements=] and the [=stack of open elements=], so that when the "3" is parsed, it is appended to the <{p}> element:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
    - {{Text|#text}}: *1*
  - <{p}>
    - <{b}>
      - {{Text|#text}}: *2*
    - {{Text|#text}}: *3*
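
The DOM moves in this example can be replayed on a toy tree. This sketch assumes a hypothetical `Node` class with move-on-append semantics (like DOM `appendChild`). It hard-codes the three moves described above and does not implement the adoption agency algorithm's bookkeeping.

```python
class Node:
    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.parent = None
        self.children = []

    def append(self, child):
        # Like DOM appendChild: a node that already has a parent is moved.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


def serialize(node):
    if node.name == "#text":
        return node.text
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


# Tree just before the </b> token: body > b > ("1", p > "2")
body, b, p = Node("body"), Node("b"), Node("p")
body.append(b)
b.append(Node("#text", "1"))
b.append(p)
p.append(Node("#text", "2"))
before = serialize(body)

# 1. The last node (p, the furthest block) moves to the common ancestor.
body.append(p)
# 2. A new b is created and the children of p are moved into it.
new_b = Node("b")
for child in list(p.children):
    new_b.append(child)
# 3. The new b is appended to p.
p.append(new_b)
# The "3" text token is then inserted into p, the current node.
p.append(Node("#text", "3"))
after = serialize(body)

print(before)
print(after)
```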

#### Unexpected markup in tables #### {#unexpected-markup-in-tables}

*This section is non-normative.*

Error handling in tables is, for historical reasons, especially strange. For example, consider the following markup:

    <table><b><tr><td>aaa</td></tr>bbb</table>ccc

The <{b}> element start tag is not allowed directly inside a table like that, and the parser handles this case by placing the element *before* the table. (This is called [=foster parenting=].) This can be seen by examining the DOM tree as it stands just after the <{table}> element's start tag has been seen:

<{html}>
- <{head}>
- <{body}>
  - <{table}>

...and then immediately after the <{b}> element start tag has been seen:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{table}>

At this point, the [=stack of open elements=] has on it the elements <{html}>, <{body}>, <{table}>, and <{b}> (in that order, despite the resulting DOM tree); the [=list of active formatting elements=] just has the <{b}> element in it; and the [=insertion mode=] is "[=in table=]". The <{tr}> start tag causes the <{b}> element to be popped off the stack and a <{tbody}> start tag to be implied; the <{tbody}> and <{tr}> elements are then handled in a rather straight-forward manner, taking the parser through the "[=in table body=]" and "[=in row=]" insertion modes, after which the DOM looks as follows:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{table}>
    - <{tbody}>
      - <{tr}>

Here, the [=stack of open elements=] has on it the elements <{html}>, <{body}>, <{table}>, <{tbody}>, and <{tr}>; the [=list of active formatting elements=] still has the <{b}> element in it; and the [=insertion mode=] is "[=in row=]". The <{td}> element start tag token, after putting a <{td}> element on the tree, puts a [=marker=] on the [=list of active formatting elements=] (it also switches to the "[=in cell=]" [=insertion mode=]).

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{table}>
    - <{tbody}>
      - <{tr}>
        - <{td}>

The [=marker=] means that when the "aaa" character tokens are seen, no <{b}> element is created to hold the resulting {{Text}} node:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{table}>
    - <{tbody}>
      - <{tr}>
        - <{td}>
          - {{Text|#text}}: *aaa*

The end tags are handled in a straight-forward manner; after handling them, the [=stack of open elements=] has on it the elements <{html}>, <{body}>, <{table}>, and <{tbody}>; the [=list of active formatting elements=] still has the <{b}> element in it (the [=marker=] having been removed by the "td" end tag token); and the [=insertion mode=] is "[=in table body=]". Thus it is that the "bbb" character tokens are found. These trigger the "[=in table text=]" insertion mode to be used (with the [=original insertion mode=] set to "[=in table body=]"). The character tokens are collected, and when the next token (the <{table}> element end tag) is seen, they are processed as a group. Since they are not all spaces, they are handled as per the "anything else" rules in the "[=in table=]" insertion mode, which defer to the "[=in body=]" insertion mode but with [=foster parenting=]. When [=reconstruct the active formatting elements|the active formatting elements are reconstructed=], a <{b}> element is created and [=foster parenting|foster parented=], and then the "bbb" {{Text}} node is appended to it:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{b}>
    - {{Text|#text}}: *bbb*
  - <{table}>
    - <{tbody}>
      - <{tr}>
        - <{td}>
          - {{Text|#text}}: *aaa*

The [=stack of open elements=] has on it the elements <{html}>, <{body}>, <{table}>, <{tbody}>, and the new <{b}> (again, note that this doesn't match the resulting tree!); the [=list of active formatting elements=] has the new <{b}> element in it; and the [=insertion mode=] is still "[=in table body=]". Had the character tokens been only [=space characters=] instead of "bbb", then those [=space characters=] would just be appended to the <{tbody}> element. Finally, the <{table}> is closed by a "table" end tag. This pops all the nodes from the [=stack of open elements=] up to and including the <{table}> element, but it doesn't affect the [=list of active formatting elements=], so the "ccc" character tokens after the table result in yet another <{b}> element being created, this time after the table:

<{html}>
- <{head}>
- <{body}>
  - <{b}>
  - <{b}>
    - {{Text|#text}}: *bbb*
  - <{table}>
    - <{tbody}>
      - <{tr}>
        - <{td}>
          - {{Text|#text}}: *aaa*
  - <{b}>
    - {{Text|#text}}: *ccc*

#### Scripts that modify the page as it is being parsed #### {#scripts-that-modify-the-page-as-it-is-being-parsed}

*This section is non-normative.*

Consider the following markup, which for this example we will assume is the document with [=url/URL=] `https://example.com/inner`, being rendered as the content of an <{iframe}> in another document with the [=url/URL=] `https://example.com/outer`:

    <div id="a">
     <script>
      var div = document.getElementById("a");
      parent.document.body.appendChild(div);
     </script>
     <script>
      alert(document.URL);
     </script>
    </div>
    <script>
     alert(document.URL);
    </script>

Up to the first "script" end tag, before the script is parsed, the result is relatively straightforward:

<{html}>
- <{head}>
- <{body}>
  - <{div}> <{global/id}>="a"
    - {{Text|#text}}:
    - <{script}>
      - {{Text|#text}}: var div = document.getElementById("a"); ⏎ parent.document.body.appendChild(div);

After the script is parsed, though, the <{div}> element and its child <{script}> element are gone:

<{html}>
- <{head}>
- <{body}>

They are, at this point, in the {{Document}} of the aforementioned outer [=browsing context=]. However, the [=stack of open elements=] *still contains the <{div}> element*.

Thus, when the second <{script}> element is parsed, it is inserted *into the outer {{Document}} object*.

Scripts parsed into a different {{Document}} than the one the parser was created for do not execute, so the first alert does not show.

Once the <{div}> element's end tag is parsed, the <{div}> element is popped off the stack, and so the next <{script}> element is in the inner {{Document}}:

<{html}>
- <{head}>
- <{body}>
  - <{script}>
    - {{Text|#text}}: alert(document.URL);

This script does execute, resulting in an alert that says "https://example.com/inner".

#### The execution of scripts that are moving across multiple documents #### {#the-execution-of-scripts-that-are-moving-across-multiple-documents}

*This section is non-normative.*

Elaborating on the example in the previous section, consider the case where the second <{script}> element is an external script (i.e., one with a <{script/src}> attribute). Since the element was not in the parser's {{Document}} when it was created, that external script is not even downloaded.

In a case where a <{script}> element with a <{script/src}> attribute is parsed normally into its parser's {{Document}}, but while the external script is being downloaded, the element is moved to another document, the script continues to download, but does not execute.

In general, moving <{script}> elements between {{Document}}s is considered a bad practice.

#### Unclosed formatting elements #### {#unclosed-formatting-elements}

*This section is non-normative.*

The following markup shows how nested formatting elements (such as <{b}>) get collected and continue to be applied even as the elements they are contained in are closed, but that excessive duplicates are thrown away.

    <!DOCTYPE html>
    <p><b class="x"><b class="x"><b><b class="x"><b class="x"><b>X
    <p>X
    <p><b><b class="x"><b>X
    <p></b></b></b></b></b></b>X

The resulting DOM tree is as follows:

DOCTYPE: `html`
<{html}>
- <{head}>
- <{body}>
  - <{p}>
    - <{b}> <{global/class}>="x"
      - <{b}> <{global/class}>="x"
        - <{b}>
          - <{b}> <{global/class}>="x"
            - <{b}> <{global/class}>="x"
              - <{b}>
                - {{Text|#text}}: X⏎
  - <{p}>
    - <{b}> <{global/class}>="x"
      - <{b}>
        - <{b}> <{global/class}>="x"
          - <{b}> <{global/class}>="x"
            - <{b}>
              - {{Text|#text}}: X⏎
  - <{p}>
    - <{b}> <{global/class}>="x"
      - <{b}>
        - <{b}> <{global/class}>="x"
          - <{b}> <{global/class}>="x"
            - <{b}>
              - <{b}>
                - <{b}> <{global/class}>="x"
                  - <{b}>
                    - {{Text|#text}}: X⏎
  - <{p}>
    - {{Text|#text}}: X⏎

Note how the second <{p}> element in the markup has no explicit <{b}> elements, but in the resulting DOM, up to three of each kind of formatting element (in this case three <{b}> elements with the class attribute, and two unadorned <{b}> elements) get reconstructed before the element's "X". Also note how this means that in the final paragraph only six <{b}> end tags are needed to completely clear the [=list of active formatting elements=], even though nine <{b}> start tags have been seen up to this point.
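
The "up to three of each kind" behaviour can be sketched as a small list operation. This is an illustrative simplification: elements are modelled as hypothetical `(tag name, attributes)` tuples, the function name `push_formatting` is an assumption, and markers in the [=list of active formatting elements=] are ignored because this example contains none.

```python
def push_formatting(active, element):
    # If three equal entries (same tag name and attributes) are already
    # present, drop the earliest one before adding the new entry.
    same = [e for e in active if e == element]
    if len(same) >= 3:
        active.remove(same[0])  # list.remove drops the first equal entry
    active.append(element)


B = ("b", ())                     # unadorned <b>
BX = ("b", (("class", "x"),))     # <b class="x">

# Start tags of the first paragraph.
active = []
for el in [BX, BX, B, BX, BX, B]:
    push_formatting(active, el)
after_first_p = list(active)

# Start tags of the third paragraph (the second one adds none).
for el in [B, BX, B]:
    push_formatting(active, el)

print(len(after_first_p), len(active))
```

After the first paragraph the list holds five entries, which is why five <{b}> elements are reconstructed in the second paragraph. After all nine start tags it holds six, which is why six end tags are enough to clear it.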

## Serializing HTML fragments ## {#serializing-html-fragments}

The following steps form the HTML fragment serialization algorithm. The algorithm takes as input a DOM {{Element}}, {{Document}}, or {{DocumentFragment}} referred to as |the node|, and returns a string.

This algorithm serializes the *children* of the node being serialized, not the node itself.

1. Let |s| be a string, and initialize it to the empty string. 2. If |the node| is a <{template}> element, then let |the node| instead be the <{template}> element's [=template contents=] (a {{DocumentFragment}} node). 3. For each child node of |the node|, in [=tree order=], run the following steps: 1. Let |current node| be the child node being processed. 2. Append the appropriate string from the following list to |s|:

: If |current node| is an {{Element}} :: If |current node| is an element in the [=HTML namespace=], the [=MathML namespace=], or the [=SVG namespace=], then let |tagname| be |current node|'s local name. Otherwise, let |tagname| be |current node|'s qualified name. Append a U+003C LESS-THAN SIGN character (<), followed by |tagname|.

For [=HTML elements=] created by the [=HTML parser=] or {{Document/createElement()}}, |tagname| will be lowercase.

For each attribute that the element has, append a U+0020 SPACE character, the [=attribute's serialized name|attribute's serialized name as described below=], a U+003D EQUALS SIGN character (=), a U+0022 QUOTATION MARK character ("), the attribute's value, [=escaping a string|escaped as described below=] in *attribute mode*, and a second U+0022 QUOTATION MARK character ("). An attribute's serialized name for the purposes of the previous paragraph must be determined as follows:

: If the attribute has no namespace :: The attribute's serialized name is the attribute's local name.

For attributes on [=HTML elements=] set by the [=HTML parser=] or by {{Element/setAttribute()|Element.setAttribute()}}, the local name will be lowercase.

: If the attribute is in the [=XML namespace=] :: The attribute's serialized name is the string "`xml:`" followed by the attribute's local name. : If the attribute is in the [=XMLNS namespace=] and the attribute's local name is <{xmlns/xmlns}> :: The attribute's serialized name is the string "<{xmlns/xmlns}>". : If the attribute is in the [=XMLNS namespace=] and the attribute's local name is not <{xmlns/xmlns}> :: The attribute's serialized name is the string "`xmlns:`" followed by the attribute's local name. : If the attribute is in the [=XLink namespace=] :: The attribute's serialized name is the string "`xlink:`" followed by the attribute's local name. : If the attribute is in some other namespace :: The attribute's serialized name is the attribute's qualified name. While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order. Append a U+003E GREATER-THAN SIGN character (>). If |current node| is an <{area}>, <{base}>, <{basefont}>, <{bgsound}>, <{br}>, <{col}>, <{embed}>, <{frame}>, <{hr}>, <{img}>, <{input}>, <{link}>, <{meta}>, <{param}>, <{source}>, <{track}> or <{wbr}> element, then continue on to the next child node at this point. Append the value of running the [=HTML fragment serialization algorithm=] on the |current node| element (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN character (<), a U+002F SOLIDUS character (/), |tagname| again, and finally a U+003E GREATER-THAN SIGN character (>). 
: If |current node| is a {{Text}} node :: If the parent of |current node| is a <{style}>, <{script}>, <{xmp}>, <{iframe}>, <{noembed}>, <{noframes}>, or <{plaintext}> element, or if the parent of |current node| is a <{noscript}> element and [=scripting is enabled=] for the node, then append the value of |current node|'s {{CharacterData/data}} IDL attribute literally. Otherwise, append the value of |current node|'s {{CharacterData/data}} IDL attribute, [=escaping a string|escaped as described below=]. : If |current node| is a {{Comment}} :: Append the literal string "`<!--`" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of |current node|'s {{CharacterData/data}} IDL attribute, followed by the literal string "`-->`" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN). : If |current node| is a {{ProcessingInstruction}} :: Append the literal string "`<?`" (U+003C LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value of |current node|'s {{ProcessingInstruction/target}} IDL attribute, followed by a single U+0020 SPACE character, followed by the value of |current node|'s {{CharacterData/data}} IDL attribute, followed by a single U+003E GREATER-THAN SIGN character (>). : If |current node| is a {{DocumentType}} :: Append the literal string "`<!DOCTYPE`" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE), followed by the value of |current node|'s {{DocumentType/name}} IDL attribute, followed by the literal string "`>`" (U+003E GREATER-THAN SIGN).

4. The result of the algorithm is the string |s|.
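
The core of the steps above can be sketched on a toy node model. This is a partial illustration under stated assumptions, not the full algorithm: the classes `Element`, `Text` and `Comment` and the function names are hypothetical, and namespaces, <{template}>, <{noscript}>, processing instructions and doctypes are left out. The `escape` helper assumes the usual escaping rules (`&` and U+00A0 always; `"` in attribute mode; `<` and `>` otherwise).

```python
from dataclasses import dataclass, field

VOID = {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
        "hr", "img", "input", "link", "meta", "param", "source", "track", "wbr"}
RAW_TEXT_PARENTS = {"style", "script", "xmp", "iframe", "noembed",
                    "noframes", "plaintext"}


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def escape(s, attribute_mode=False):
    s = s.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return s.replace('"', "&quot;")
    return s.replace("<", "&lt;").replace(">", "&gt;")


def serialize_children(node):
    # Serializes the children of `node`, not `node` itself.
    out = []
    for child in node.children:
        if isinstance(child, Element):
            out.append("<" + child.name)
            for name, value in child.attrs.items():
                out.append(f' {name}="{escape(value, True)}"')
            out.append(">")
            if child.name in VOID:
                continue  # void elements get no content and no end tag
            out.append(serialize_children(child))
            out.append(f"</{child.name}>")
        elif isinstance(child, Text):
            literal = node.name in RAW_TEXT_PARENTS
            out.append(child.data if literal else escape(child.data))
        elif isinstance(child, Comment):
            out.append("<!--" + child.data + "-->")
    return "".join(out)


div = Element("div", children=[
    Element("p", {"title": 'say "hi"'},
            [Text("1 < 2 & 3"), Element("br"), Comment("note")]),
])
print(serialize_children(div))
```

Because text inside <{style}> and similar elements is emitted literally, a text node containing `</style>` comes out unescaped. That is the round-trip hazard this section goes on to describe.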

It is possible that the output of this algorithm, if parsed with an [=HTML parser=], will not return the original tree structure. Tree structures that do not survive a serialize-and-reparse round trip can also be produced by the [=HTML parser=] itself, although such cases are typically non-conforming.

For instance, if a <{textarea}> element to which a {{Comment}} node has been appended is serialized and the output is then reparsed, the comment will end up being displayed in the text field. Similarly, if, as a result of DOM manipulation, an element contains a comment that contains the literal string "`-->`", then when the result of serializing the element is parsed, the comment will be truncated at that point and the rest of the comment will be interpreted as markup. Further examples include a <{script}> element that contains a {{Text}} node with the text string "`</script>`", or a <{p}> element that contains a <{ul}> element (as the <{ul}> element's [=start tag=] would imply the end tag for the <{p}>).

This can enable cross-site scripting attacks. An example of this would be a page that lets the user enter some font family names that are then inserted into a CSS <{style}> block via the DOM and which then uses the {{Element/innerHTML}} IDL attribute to get the HTML serialization of that <{style}> element: if the user enters "`</style><script>attack</script>`" as a font family name, {{Element/innerHTML}} will return markup that, if parsed in a different context, would contain a <{script}> node, even though no <{script}> node existed in the original DOM.
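The root cause of these cases is that comment data, and text inside elements such as <{style}> and <{script}>, is appended literally. A minimal Python sketch of those two serialization branches makes the problem visible in the output string; the function names are hypothetical, and the sketch covers only these two branches rather than the full algorithm.

```python
# Parents whose Text children are appended without escaping
# (the <noscript> case, which depends on scripting, is left out here).
RAW_TEXT_PARENTS = frozenset({
    "style", "script", "xmp", "iframe", "noembed", "noframes", "plaintext",
})


def serialize_comment(data):
    """Comment branch: the data is wrapped, never escaped."""
    return "<!--" + data + "-->"


def serialize_raw_text_element(tagname, data):
    """An attribute-less element with one Text child, appended literally."""
    if tagname not in RAW_TEXT_PARENTS:
        raise ValueError("text in this element would be escaped instead")
    return "<" + tagname + ">" + data + "</" + tagname + ">"


comment_markup = serialize_comment("a-->b")
style_markup = serialize_raw_text_element(
    "style", "</style><script>attack</script>"
)
```

Here `comment_markup` is `<!--a-->b-->`, so a parser would end the comment after `a`, and `style_markup` begins with `<style></style><script>`, so the <{style}> element ends before the user-supplied text and a <{script}> start tag follows it.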

For example, consider the following markup:

      <form id="outer"><div></form><form id="inner"><input>

This will be parsed into:

<{html}>
- <{head}>
- <{body}>
  - <{form}> <{global/id}>="outer"
    - <{div}>
      - <{form}> <{global/id}>="inner"
        - <{input}>

The <{input}> element will be associated with the inner <{form}> element. Now, if this tree structure is serialized and reparsed, the `<form id="inner">` start tag will be ignored, and so the <{input}> element will be associated with the outer <{form}> element instead.

      <html>
        <head></head>
        <body>
          <form id="outer">
            <div>
              <form id="inner">
                <input>
              </form>
            </div>
          </form>
        </body>
      </html>

<{html}>
- <{head}>
- <{body}>
  - <{form}> <{global/id}>="outer"
    - <{div}>
      - <{input}>

As another example, consider the following markup:

      <a><table><a>

This will be parsed into:

<{html}>
- <{head}>
- <{body}>
  - <{a}>
    - <{a}>
    - <{table}>

That is, the <{a}> elements are nested, because the second <{a}> element is foster parented. After a serialize-reparse round trip, the <{a}> elements and the <{table}> element would all be siblings, because the second `<a>` start tag implicitly closes the first <{a}> element.

      <html><head></head><body><a><a></a><table></table></a></body></html>

<{html}>
- <{head}>
- <{body}>
  - <{a}>
  - <{a}>
  - <{table}>

For historical reasons, this algorithm does not round-trip an initial U+000A LINE FEED (LF) character in pre, textarea, or listing elements, even though (in the first two cases) the markup being round-tripped can be conforming. The HTML parser will drop such a character during parsing, but this algorithm does not serialize an extra U+000A LINE FEED (LF) character.

For example, consider the following markup:

      <pre>

      Hello.</pre>

When this document is first parsed, the <{pre}> element's child text content starts with a single newline character. After a serialize-reparse round trip, the <{pre}> element's child text content is simply "`Hello.`".
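The escaping steps described in the next paragraph are short enough to sketch directly. The following Python function is a non-normative illustration; the name `escape_string` and the `attribute_mode` keyword are hypothetical stand-ins for the algorithm and its *attribute mode*.

```python
def escape_string(s, attribute_mode=False):
    """Escape a string for serialization, following the four steps in order."""
    # Step 1: ampersands first, so that the references produced by the
    # later steps are not escaped a second time.
    s = s.replace("&", "&amp;")
    # Step 2: U+00A0 NO-BREAK SPACE.
    s = s.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        # Step 3: only the double quote is special inside attribute values.
        s = s.replace('"', "&quot;")
    else:
        # Step 4: angle brackets are escaped outside attribute values.
        s = s.replace("<", "&lt;").replace(">", "&gt;")
    return s


text_result = escape_string('a < b & "c"')
attr_result = escape_string('a < b & "c"', attribute_mode=True)
```

The two modes are deliberately asymmetric: `text_result` is `a &lt; b &amp; "c"`, with the quotes untouched, while `attr_result` is `a < b &amp; &quot;c&quot;`, with the less-than sign untouched.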

Escaping a string (for the purposes of the algorithm above) consists of running the following steps:

1. Replace any occurrence of the "`&`" character by the string "`&amp;`".
2. Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "`&nbsp;`".
3. If the algorithm was invoked in the *attribute mode*, replace any occurrences of the "`"`" character by the string "`&quot;`".
4. If the algorithm was *not* invoked in the *attribute mode*, replace any occurrences of the "`<`" character by the string "`&lt;`", and any occurrences of the "`>`" character by the string "`&gt;`".

## Parsing HTML fragments ## {#parsing-html-fragments}

The following steps form the HTML fragment parsing algorithm. The algorithm takes as input an {{Element}} node, referred to as the |context| element, which gives the context for the parser, as well as |input|, a string to parse, and returns a list of zero or more nodes.

Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm. The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a [=fragment case=] to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.

1. Create a new {{Document}} node, and mark it as being an [=HTML document=].
2. If the [=node document=] of the |context| element is in [=quirks mode=], then let the {{Document}} be in [=quirks mode=]. Otherwise, if the [=node document=] of the |context| element is in [=limited-quirks mode=], then let the {{Document}} be in [=limited-quirks mode=]. Otherwise, leave the {{Document}} in [=no-quirks mode=].
3. Create a new [=HTML parser=], and associate it with the just created {{Document}} node.
4. Set the state of the [=HTML parser=]'s [[#tokenization|tokenization]] stage as follows, switching on the |context| element:

: <{title}>
: <{textarea}>
:: Switch the tokenizer to the [=RCDATA state=].
: <{style}>
: <{xmp}>
: <{iframe}>
: <{noembed}>
: <{noframes}>
:: Switch the tokenizer to the [=RAWTEXT state=].
: <{script}>
:: Switch the tokenizer to the [=script data state=].
: <{noscript}>
:: If the [=scripting flag=] is enabled, switch the tokenizer to the [=RAWTEXT state=]. Otherwise, leave the tokenizer in the [=data state=].
: <{plaintext}>
:: Switch the tokenizer to the [[#plaintext-state]].
: Any other element
:: Leave the tokenizer in the [=data state=].

For performance reasons, an implementation that does not report errors and that uses the actual state machine described in this specification directly could use the PLAINTEXT state instead of the RAWTEXT and script data states where those are mentioned in the list above. Except for rules regarding parse errors, they are equivalent, since there is no [=appropriate end tag token=] in the fragment case, yet they involve far fewer state transitions.
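The switch in step 4 amounts to a lookup on the local name of the |context| element plus one check of the scripting flag. A minimal Python sketch follows; the function name and the use of plain strings as state names are assumptions made for illustration, and the optional PLAINTEXT substitution described in the note above is not applied.

```python
RCDATA_CONTEXTS = frozenset({"title", "textarea"})
RAWTEXT_CONTEXTS = frozenset({"style", "xmp", "iframe", "noembed", "noframes"})


def initial_tokenizer_state(context_local_name, scripting_enabled):
    """Pick the tokenizer state for a fragment parse with this context."""
    if context_local_name in RCDATA_CONTEXTS:
        return "RCDATA"
    if context_local_name in RAWTEXT_CONTEXTS:
        return "RAWTEXT"
    if context_local_name == "script":
        return "script data"
    if context_local_name == "noscript":
        # Only treated as raw text when scripting is enabled.
        return "RAWTEXT" if scripting_enabled else "data"
    if context_local_name == "plaintext":
        return "PLAINTEXT"
    # Any other element leaves the tokenizer in its default state.
    return "data"
```

For example, a fragment parsed with a <{textarea}> context starts in the RCDATA state, so a `<b>` in |input| is treated as text rather than as a start tag.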

5. Let |root| be a new <{html}> element with no attributes.
6. Append the element |root| to the {{Document}} node created above.
7. Set up the parser's [=stack of open elements=] so that it contains just the single element |root|.
8. If the |context| element is a <{template}> element, push "[=in template=]" onto the [=stack of template insertion modes=] so that it is the new [=current template insertion mode=].
9. Create a start tag token whose name is the local name of |context| and whose attributes are the attributes of |context|. Let this start tag token be the start tag token of the |context| node, e.g., for the purposes of determining if it is an [=HTML integration point=].
10. [=reset the insertion mode appropriately|Reset the parser's insertion mode appropriately=].

The parser will reference the |context| element as part of that algorithm.

11. Set the parser's `form` element pointer to the nearest node to the |context| element that is a <{form}> element (going straight up the ancestor chain, and including the element itself, if it is a <{form}> element), if any. (If there is no such <{form}> element, the `form` element pointer keeps its initial value, null.)
12. Place the |input| into the [=input stream=] for the [=HTML parser=] just created. The encoding [=confidence=] is *irrelevant*.
13. Start the parser and let it run until it has consumed all the characters just inserted into the input stream.
14. Return the child nodes of |root|, in [=tree order=].

## Named character references ## {#named-character-references}

This table lists the character reference names that are supported by HTML, and the code points to which they refer. It is referenced by the previous sections.

path: includes/entities.include

This data is also available as a JSON file. *The glyphs displayed above are non-normative. Refer to the Unicode specifications for formal definitions of the characters listed above.*

The character reference names originate from the XML Entity Definitions for Characters specification, though only the above is considered normative. [[[XML-ENTITY-NAMES]]]
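For a quick programmatic look at this kind of mapping, Python's standard library ships its own copy of the HTML5 named character references as `html.entities.html5`. The sketch below uses that dictionary, not the normative table above, and the helper name `code_points` is hypothetical. Keys in that dictionary include the trailing semicolon.

```python
from html.entities import html5


def code_points(name):
    """List the code points a reference name expands to, as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in html5[name]]


single = code_points("nbsp;")
double = code_points("NotEqualTilde;")
```

Most names map to a single code point (`single` is `["U+00A0"]`), but some map to two (`double` is `["U+2242", "U+0338"]`), which is why the table speaks of "code points" in the plural.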