The HTML syntax {#syntax}
==========================
This section only describes the rules for resources labeled with an
[=HTML MIME type=]. Rules for XML resources are discussed in the section below entitled
"[=The XML syntax=]".
## Writing HTML documents ## {#writing-html-documents}
*This section only applies to documents, authoring tools, and markup generators. In particular, it
does not apply to conformance checkers; conformance checkers must use the requirements given in
the next section ("parsing HTML documents").*
Documents must consist of the following parts, in the given order:
1. Optionally, a single U+FEFF BYTE ORDER MARK (BOM) character.
2. Any number of [=comments=] and [=space characters=].
3. A [=DOCTYPE=].
4. Any number of [=comments=] and [=space characters=].
5. The [=document element=], in the form of an <{html}> element.
6. Any number of [=comments=] and [=space characters=].
The various types of content mentioned above are described in the next few sections.
In addition, there are some restrictions on how [=character encoding declarations=] are to be
serialized, as discussed in the section on that topic.
Space characters before the <{html}> element, and space characters at the start of the <{html}>
element and before the <{head}> element, will be dropped when the document is parsed; space
characters *after* the <{html}> element will be parsed as if they were at the end of the
<{body}> element. Thus, space characters around the [=document element=] do not round-trip.
It is suggested that newlines be inserted after the DOCTYPE, after any comments that are before
the [=document element=], after the <{html}> element's start tag (if it is not [=omitted=]), and
after any comments that are inside the <{html}> element but before the <{head}> element.
Many strings in the HTML syntax (e.g., the names of elements and their attributes) are
case-insensitive, but only for [=uppercase ASCII letters=] and [=lowercase ASCII letters=]. For
convenience, in this section this is just referred to as "case-insensitive".
### The DOCTYPE ### {#the-doctype}
A DOCTYPE is a required preamble.
DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a
different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a
document ensures that the browser makes a best-effort attempt at following the relevant
specifications.
A DOCTYPE must consist of the following components, in this order:
1. A string that is an [=ASCII case-insensitive=] match for the string "`<!DOCTYPE`".
2. One or more [=space characters=].
3. A string that is an [=ASCII case-insensitive=] match for the string "`html`".
4. Optionally, a [=DOCTYPE legacy string=].
5. Zero or more [=space characters=].
6. A U+003E GREATER-THAN SIGN character (>).
In other words, `<!DOCTYPE html>`, case-insensitively.
For the purposes of HTML generators that cannot output HTML markup with the short DOCTYPE
"`<!DOCTYPE html>`", a DOCTYPE legacy string may be inserted
into the DOCTYPE (in the position defined above). This string must consist of:
1. One or more [=space characters=].
2. A string that is an [=ASCII case-insensitive=] match for the string "SYSTEM".
3. One or more [=space characters=].
4. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the |quote mark|).
5. The literal string "about:legacy-compat".
6. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e., the same character as
in the earlier step labeled |quote mark|).
In other words, `<!DOCTYPE html SYSTEM "about:legacy-compat">` or
`<!DOCTYPE html SYSTEM 'about:legacy-compat'>`, case-insensitively except for the
part in single or double quotes.
The [=DOCTYPE legacy string=] should not be used unless the document is generated from a system
that cannot output the shorter string.
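The DOCTYPE rules above can be sketched as a regular expression. This is a minimal illustration, not a conformance checker: the names `DOCTYPE_RE` and `is_conforming_doctype` are hypothetical, and the sketch assumes the [=space characters=] are U+0009, U+000A, U+000C, U+000D, and U+0020.

```python
import re

# Assumption: the "space characters" are tab, LF, FF, CR, and space.
SPACE = r"[ \t\n\f\r]"

# The quote mark is captured so that the closing quote must be the same character.
QUOTE = "(?P<quote>[\"'])"

# Optional DOCTYPE legacy string: only "SYSTEM" is matched case-insensitively;
# the quoted "about:legacy-compat" part is matched literally.
LEGACY = SPACE + r"+(?i:SYSTEM)" + SPACE + "+" + QUOTE + r"about:legacy-compat(?P=quote)"

DOCTYPE_RE = re.compile(
    r"(?i:<!DOCTYPE)" + SPACE + "+" + r"(?i:html)" + "(?:" + LEGACY + ")?" + SPACE + "*>"
)


def is_conforming_doctype(text: str) -> bool:
    """Return True if text is exactly one DOCTYPE in the form described above."""
    return DOCTYPE_RE.fullmatch(text) is not None
```

For instance, `is_conforming_doctype("<!doctype HTML>")` holds, while a DOCTYPE whose legacy string opens with a quotation mark and closes with an apostrophe is rejected.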
### Elements ### {#writing-html-documents-elements}
There are six different kinds of elements:
[=void elements=], the <{template}> element, [=raw text elements=], [=escapable raw text elements=], [=foreign elements=],
and [=normal elements=].
: Void elements
:: <{area}>, <{base}>, <{br}>, <{col}>, <{embed}>, <{hr}>, <{img}>, <{input}>, <{link}>, <{meta}>,
<{param}>, <{source}>, <{track}>, <{wbr}>
: The <{template}> element
:: <{template}>
: Raw text elements
:: <{script}>, <{style}>
: Escapable raw text elements
:: <{textarea}>, <{title}>
: Foreign elements
:: Elements from the [=MathML namespace=] and the [=SVG namespace=].
: Normal elements
:: All other allowed [=HTML elements=] are normal elements.
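The classification above can be sketched as a small lookup. The function name `element_kind` and the returned labels are hypothetical; the namespace URIs are the standard ones for HTML, MathML, and SVG. The sketch does not check whether a name is actually an allowed HTML element.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})
ESCAPABLE_RAW_TEXT_ELEMENTS = frozenset({"textarea", "title"})


def element_kind(local_name: str, namespace: str = HTML_NS) -> str:
    """Classify an element into one of the six kinds listed above."""
    # Foreign elements are identified by namespace, not by name.
    if namespace in (MATHML_NS, SVG_NS):
        return "foreign"
    name = local_name.lower()  # tag names are case-insensitive
    if name in VOID_ELEMENTS:
        return "void"
    if name == "template":
        return "template"
    if name in RAW_TEXT_ELEMENTS:
        return "raw text"
    if name in ESCAPABLE_RAW_TEXT_ELEMENTS:
        return "escapable raw text"
    # Assumption: any other name is treated as a normal element.
    return "normal"
```

Note that the same local name can fall into different kinds depending on namespace: `title` is an escapable raw text element in HTML but a foreign element in SVG.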
Tags are used to delimit the start and end of elements in the markup. [=Raw text=],
[=escapable raw text=], and [=normal elements=] have a [=start tag=] to indicate where they begin,
and an [=end tag=] to indicate where they end. The start and end tags of certain
[=normal elements=] can be omitted, as described in the section on [=omitted|optional tags=].
Those that cannot be omitted must not be omitted. [=Void elements=] only have a start tag; end
tags must not be specified for [=void elements=]. [=Foreign elements=] must either have a start
tag and an end tag, or a start tag that is marked as self-closing, in which case they must not
have an end tag.
The contents of the element must be placed between just after the start tag (which
[=omitted|might be implied, in certain cases=]) and just before the end tag (which again,
[=omitted|might be implied, in certain cases=]). The exact allowed contents of each individual
element depend on the [=content model=] of that element, as described earlier in this
specification. Elements must not contain content that their content model disallows. In addition
to the restrictions placed on the contents by those content models, however, the six types of
elements have additional *syntactic* requirements.
[=Void elements=] can't have any contents (since there's no end tag, no content can be put between
the start tag and the end tag).
The <{template}> element can have template contents, but such template contents are not children
of the <{template}> element itself. Instead, they are stored in a {{DocumentFragment}} associated
with a different {{Document}}, one without a [=browsing context=], so as to avoid the
<{template}> contents interfering with the main {{Document}}. The markup for the template
contents of a <{template}> element is placed just after the <{template}> element's start tag and
just before the <{template}> element's end tag (as with other elements), and may consist of any
[=text=], [=character references=], [=kind of element|elements=], and [=comments=], but the text
must not contain the character U+003C LESS-THAN SIGN (<) or an [=ambiguous ampersand=].
[=Raw text elements=] can have [=text=], though it has
[[#restrictions-on-the-contents-of-raw-text-and-escapable-raw-text-elements|restrictions]]
described below.
[=Escapable raw text elements=] can have [=text=] and [=character references=], but the text must
not contain an [=ambiguous ampersand=]. There are also
[[#restrictions-on-the-contents-of-raw-text-and-escapable-raw-text-elements|further restrictions]]
described below.
[=Foreign elements=] whose start tag is marked as self-closing can't have any contents (since,
again, there's no end tag, so no content can be put between the start tag and the end tag).
[=Foreign elements=] whose start tag is *not* marked as self-closing can have [=text=],
[=character references=], [=CDATA sections=], other [=kind of element|elements=], and
[=comments=], but the text must not contain the character U+003C LESS-THAN SIGN (<) or an
[=ambiguous ampersand=].
The HTML syntax does not support namespace declarations, even in [=foreign elements=].
For instance, consider an HTML fragment in which an element written as `cdr:license`, carrying an
`xmlns:cdr` attribute, appears inside SVG content.
That element is actually in the [=SVG namespace=], as the "`xmlns:cdr`" attribute has no effect
(unlike in XML). Such a fragment is in fact non-conforming, because the SVG
specification does not define any elements called "`cdr:license`" in the
[=SVG namespace=].
[=Normal elements=] can have [=text=], [=character references=], other
[=kind of element|elements=], and [=comments=], but the text must not contain the character U+003C
LESS-THAN SIGN (<) or an [=ambiguous ampersand=]. Some [=normal elements=] also have
[[#restrictions-on-content-models|yet more restrictions]] on what content they are allowed to
hold, beyond the restrictions imposed by the content model and those described in this paragraph.
Those restrictions are described below.
Tags contain a tag name, giving the element's name. HTML elements all have names that
only use [=alphanumeric ASCII characters=]. In the HTML syntax, tag names, even those for
[=foreign elements=], may be written with any mix of lower- and uppercase letters that, when
converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.
#### Start tags #### {#start-tags}
Start tags must have the following format:
1. The first character of a start tag must be a U+003C LESS-THAN SIGN character (<).
2. The next few characters of a start tag must be the element's [=tag name=].
3. If there are to be any attributes in the next step, there must first be one or more
[=space characters=].
4. Then, the start tag may have a number of attributes, the [=attribute|syntax for which=] is
described below. Attributes must be separated from each other by one or more
[=space characters=].
5. After the attributes, or after the [=tag name=] if there are no attributes, there may be one or
more [=space characters=]. (Some attributes are required to be followed by a space. See
[[#elements-attributes]] below.)
6. Then, if the element is one of the [=void elements=], or if the element is a
[=foreign element=], then there may be a single U+002F SOLIDUS character (/). This character
has no effect on [=void elements=], but on [=foreign elements=] it marks the start tag as
self-closing.
7. Finally, start tags must be closed by a U+003E GREATER-THAN SIGN character (>).
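The seven steps above can be sketched as a small serializer. This is an illustration under stated assumptions, not a complete implementation: `build_start_tag` is a hypothetical name, each entry of `attributes` is assumed to be already serialized in one of the attribute syntaxes described below, and the caller says whether the element is foreign.

```python
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})


def build_start_tag(tag_name, attributes=(), *, foreign=False, self_closing=False):
    """Assemble a start tag from a tag name and pre-serialized attributes."""
    is_void = not foreign and tag_name.lower() in VOID_ELEMENTS
    # Step 6: the solidus is only allowed on void and foreign elements.
    if self_closing and not (is_void or foreign):
        raise ValueError("only void and foreign elements may use '/'")
    parts = ["<", tag_name]            # steps 1 and 2
    for attribute in attributes:       # steps 3 and 4: one space before each attribute
        parts.append(" " + attribute)
    if self_closing:
        # Step 5 and 6: always emit a space first, so that an unquoted
        # attribute value is never directly followed by the solidus.
        parts.append(" /")
    parts.append(">")                  # step 7
    return "".join(parts)
```

Always emitting a space before the solidus is a conservative choice; the space is only required after an unquoted attribute value.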
#### End tags #### {#end-tags}
End tags must have the following format:
1. The first character of an end tag must be a U+003C LESS-THAN SIGN character (<).
2. The second character of an end tag must be a U+002F SOLIDUS character (/).
3. The next few characters of an end tag must be the element's [=tag name=].
4. After the tag name, there may be one or more [=space characters=].
5. Finally, end tags must be closed by a U+003E GREATER-THAN SIGN character (>).
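The end tag format is simple enough to sketch as a single regular expression. The helper name `parse_end_tag` is hypothetical, and the sketch only covers HTML elements, whose names use [=alphanumeric ASCII characters=]; names of foreign elements that contain other characters are out of scope here.

```python
import re

# "</", then the tag name, then optional space characters, then ">".
# Assumption: the space characters are tab, LF, FF, CR, and space.
END_TAG_RE = re.compile(r"</([0-9A-Za-z]+)[ \t\n\f\r]*>")


def parse_end_tag(markup: str):
    """Return the lowercased tag name if markup is a well-formed end tag, else None."""
    match = END_TAG_RE.fullmatch(markup)
    return match.group(1).lower() if match else None
```

Because no space is permitted between `<`, `/`, and the tag name, inputs such as `< /p>` and `</ p>` are rejected.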
#### Attributes #### {#elements-attributes}
Attributes for an element are expressed inside the element's start tag.
Attributes have a name and a value. Attribute names must consist of one or more
characters other than the [=space characters=], U+0000 NULL, U+0022 QUOTATION MARK ("),
U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS
SIGN (=) characters, the [=control characters=], and any characters that are not defined by
Unicode. In the HTML syntax, attribute names, even those for [=foreign elements=], may be written
with any mix of lower- and uppercase letters that is an [=ASCII case-insensitive=] match for the
attribute's name.
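The attribute name restrictions can be sketched with the standard `unicodedata` module. The function name `is_valid_attribute_name` is hypothetical. As an approximation, "characters not defined by Unicode" is taken to mean general category `Cn` and [=control characters=] to mean category `Cc`; the result for unassigned code points therefore depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

# Space characters, U+0000 NULL, quotation mark, apostrophe, ">", "/", and "=".
FORBIDDEN_IN_NAME = frozenset(" \t\n\f\r\x00\"'>/=")


def is_valid_attribute_name(name: str) -> bool:
    """Check a name against the character restrictions described above."""
    if not name:
        return False  # names must consist of one or more characters
    for ch in name:
        if ch in FORBIDDEN_IN_NAME:
            return False
        # Assumption: Cc approximates "control characters" and
        # Cn approximates "not defined by Unicode".
        if unicodedata.category(ch) in ("Cc", "Cn"):
            return False
    return True
```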
Attribute values are a mixture of [=text=] and [=character references=], except with
the additional restriction that the text cannot contain an [=ambiguous ampersand=].
Attributes can be specified in four different ways:
: Empty attribute syntax
:: Just the [=attribute name=]. The value is implicitly the empty string.
For example, in `<input disabled>` the <{input/disabled}> attribute is given with the empty
attribute syntax.
If an attribute using the empty attribute syntax is to be followed by another attribute, then
there must be a [=space character=] separating the two.
: Unquoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single
U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by the
[=attribute value=], which, in addition to the requirements given above for attribute values,
must not contain any literal [=space characters=], any U+0022 QUOTATION MARK characters
("), U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN characters (=), U+003C
LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN characters (>), or U+0060 GRAVE
ACCENT characters (`), and must not be the empty string.
For example, in `<input value=yes>` the <{input/value}> attribute is given with the unquoted
attribute value syntax.
If an attribute using the unquoted attribute syntax is to be followed by another attribute or
by the optional U+002F SOLIDUS character (/) allowed in step 6 of the [=start tag=] syntax
above, then there must be a [=space character=] separating the two.
: Single-quoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single
U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by a
single U+0027 APOSTROPHE character ('), followed by the [=attribute value=], which, in
addition to the requirements given above for attribute values, must not contain any literal
U+0027 APOSTROPHE characters ('), and finally followed by a second single U+0027 APOSTROPHE
character (').
For example, in `<input type='checkbox'>` the <{input/type}> attribute is given with the
single-quoted attribute value syntax.
If an attribute using the single-quoted attribute syntax is to be followed by another
attribute, then there must be a [=space character=] separating the two.
: Double-quoted attribute value syntax
:: The [=attribute name=], followed by zero or more [=space characters=], followed by a single
U+003D EQUALS SIGN character, followed by zero or more [=space characters=], followed by a
single U+0022 QUOTATION MARK character ("), followed by the [=attribute value=], which, in
addition to the requirements given above for attribute values, must not contain any literal
U+0022 QUOTATION MARK characters ("), and finally followed by a second single U+0022 QUOTATION
MARK character (").
For example, in `<input name="be evil">` the <{input/name}> attribute is given with the
double-quoted attribute value syntax.
If an attribute using the double-quoted attribute syntax is to be followed by another
attribute, then there must be a [=space character=] separating the two.
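One way a markup generator might choose among the four syntaxes is sketched below. The function `serialize_attribute` and its selection order are assumptions made for illustration, not something this section prescribes. The sketch escapes every `&`, which is stricter than needed, since only an [=ambiguous ampersand=] is disallowed.

```python
# Characters that may not appear in an unquoted attribute value.
UNQUOTED_FORBIDDEN = frozenset(" \t\n\f\r\"'=<>`")


def serialize_attribute(name: str, value: str) -> str:
    """Serialize one attribute using the shortest syntax that fits its value."""
    # Conservative: escape all ampersands so no ambiguous ampersand can remain.
    value = value.replace("&", "&amp;")
    if value == "":
        return name                                  # empty attribute syntax
    if not any(ch in UNQUOTED_FORBIDDEN for ch in value):
        return f"{name}={value}"                     # unquoted syntax
    if '"' not in value:
        return f'{name}="{value}"'                   # double-quoted syntax
    if "'" not in value:
        return f"{name}='{value}'"                   # single-quoted syntax
    # Both quote characters occur: double-quote and escape the quotation marks.
    return '{}="{}"'.format(name, value.replace('"', "&quot;"))
```

A caller joining several serialized attributes still has to separate them with [=space characters=], as required above.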
There must never be two or more attributes on the same start tag whose names are an
[=ASCII case-insensitive=] match for each other.
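The duplicate check can be sketched as follows. The helper names `ascii_lower` and `find_duplicate_attributes` are hypothetical. The point of the sketch is that the comparison folds only the ASCII letters A to Z, which differs from Python's `str.lower()`; the latter also maps characters such as U+212A KELVIN SIGN to `k`.

```python
import string

# Translation table that lowercases only the ASCII letters A-Z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    """Lowercase ASCII letters only, leaving every other character untouched."""
    return text.translate(_ASCII_LOWER)


def find_duplicate_attributes(names):
    """Return the names that ASCII case-insensitively repeat an earlier name."""
    seen = set()
    duplicates = []
    for name in names:
        key = ascii_lower(name)
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
    return duplicates
```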
---
When a [=foreign element=] has one of the namespaced attributes given by the local name and
namespace of the first and second cells of a row from the following table, it must be written
using the name given by the third cell from the same row.
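<table>
  <thead>
    <tr><th>Local name</th><th>Namespace</th><th>Attribute name</th></tr>
  </thead>
  <tbody>
    <tr><td>actuate</td><td>[=XLink namespace=]</td><td><{xlink/actuate|xlink:actuate}></td></tr>
    <tr><td>arcrole</td><td>[=XLink namespace=]</td><td><{xlink/arcrole|xlink:arcrole}></td></tr>
    <tr><td>href</td><td>[=XLink namespace=]</td><td><{xlink/href|xlink:href}></td></tr>
    <tr><td>role</td><td>[=XLink namespace=]</td><td><{xlink/role|xlink:role}></td></tr>
    <tr><td>show</td><td>[=XLink namespace=]</td><td><{xlink/show|xlink:show}></td></tr>
    <tr><td>title</td><td>[=XLink namespace=]</td><td><{xlink/title|xlink:title}></td></tr>
    <tr><td>type</td><td>[=XLink namespace=]</td><td><{xlink/type|xlink:type}></td></tr>
    <tr><td>lang</td><td>[=XML namespace=]</td><td><{xml/lang|xml:lang}></td></tr>
    <tr><td>space</td><td>[=XML namespace=]</td><td><{xml/space|xml:space}></td></tr>
    <tr><td>xmlns</td><td>[=XMLNS namespace=]</td><td><{xmlns/xmlns}></td></tr>
    <tr><td>xlink</td><td>[=XMLNS namespace=]</td><td><{xlink/xlink|xmlns:xlink}></td></tr>
  </tbody>
</table>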
No other namespaced attribute can be expressed in [[#syntax|the HTML syntax]].
Whether the attributes in the table above are conforming or not is defined by
other specifications (e.g., the SVG and MathML specifications); this section only describes the
syntax rules if the attributes are serialized using the HTML syntax.
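A serializer could encode the table above as a lookup keyed by local name and namespace. The names `NAMESPACED_ATTRIBUTES` and `html_attribute_name` are hypothetical; the namespace URIs are the standard XLink, XML, and XMLNS ones. Returning `None` reflects the rule that no other namespaced attribute can be expressed in the HTML syntax.

```python
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# (local name, namespace) -> attribute name as written in the HTML syntax.
NAMESPACED_ATTRIBUTES = {
    ("actuate", XLINK_NS): "xlink:actuate",
    ("arcrole", XLINK_NS): "xlink:arcrole",
    ("href", XLINK_NS): "xlink:href",
    ("role", XLINK_NS): "xlink:role",
    ("show", XLINK_NS): "xlink:show",
    ("title", XLINK_NS): "xlink:title",
    ("type", XLINK_NS): "xlink:type",
    ("lang", XML_NS): "xml:lang",
    ("space", XML_NS): "xml:space",
    ("xmlns", XMLNS_NS): "xmlns",
    ("xlink", XMLNS_NS): "xmlns:xlink",
}


def html_attribute_name(local_name: str, namespace: str):
    """Return the serialized attribute name, or None if it cannot be expressed."""
    return NAMESPACED_ATTRIBUTES.get((local_name, namespace))
```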
#### Optional tags #### {#optional-tags}
Certain tags can be omitted.
Omitting an element's [=start tag=] in the situations described below does not
mean the element is not present; it is implied, but it is still there. For example, an HTML
document always has a root <{html}> element, even if the string `<html>` doesn't
appear anywhere in the markup.
An <{html}> element's [=start tag=] may be omitted if the first thing inside the <{html}> element
is not a [=comment=].
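For example, it's ok to remove the "`<html>`" tag from a document whose <{html}> element begins
with a <{head}> element (holding the <{title}> "Hello") and continues with a <{body}> element
holding the paragraph "Welcome to this example.".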
Doing so would make the document look like this:
Hello