This section only describes the rules for XML resources. Rules for
[[#text-html|text/html]] resources are discussed in the section above
entitled "The HTML syntax".
XML documents may contain a DOCTYPE if desired, but this is not required
to conform to this specification. This specification does not define a public or system
identifier, nor provide a formal DTD.
According to the XML specification, XML processors are not guaranteed to process
the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XHTML documents
is unsafe if they are defined in an external file (except for &lt;,
&gt;, &amp;, &quot;,
and &apos;).
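The sketch below illustrates the point with Python's expat-based `xml.etree.ElementTree`, used here only as a convenient XML processor that reads no DTD when none is supplied; the helper name `parses` is hypothetical. The five predefined entities and numeric character references are accepted, while an HTML-style named entity such as `&nbsp;` is rejected.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if expat (via ElementTree) accepts the document as well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# The five predefined entities never need a DTD.
print(ET.fromstring("<p>&lt;&gt;&amp;&quot;&apos;</p>").text)  # <>&"'

# A numeric character reference is a safe way to write U+00A0 NO-BREAK SPACE.
print(parses("<p>a&#160;b</p>"))  # True

# &nbsp; is only declared in a DTD, so without one this is a well-formedness error.
print(parses("<p>a&nbsp;b</p>"))  # False
```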
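An XML parser, for the purposes of this specification, is a construct that follows the rules given in the XML specification to map a string of bytes or characters into a Document object.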
To create DOM nodes representing elements, an XML parser must use the create an element for a token algorithm,
or some equivalent that operates on appropriate XML data structures,
to ensure the proper element interfaces are created and that custom elements are set up correctly.
An XML parser is either associated with a Document object when it is
created, or creates one implicitly.
This Document must then be populated with DOM nodes that represent the tree
structure of the input passed to the parser, as defined by the XML specification, the Namespaces
in XML specification, and the DOM specification. DOM mutation events must not fire for the
operations that the XML parser performs on the Document's tree, but the
user agent must act as if elements and attributes were individually appended and set respectively
so as to trigger rules in this specification regarding what happens when an element is inserted
into a document or has its attributes set, and the DOM specification's requirements regarding
mutation observers mean that mutation observers are fired (unlike mutation events). [[!XML]] [[!XML-NAMES]] [[!DOM]] [[!UIEVENTS]]
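As a rough illustration of a namespace-aware tree built from XML input, Python's `xml.dom.minidom` (whose `parseString` performs namespace processing by default) gives each element the namespace in scope from its `xmlns` declarations. It only approximates an XML parser in the sense used here: it creates no HTML element interfaces, runs no custom element setup, and queues no mutation records.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

doc = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg></body>'
    '</html>'
)

root = doc.documentElement
circle = doc.getElementsByTagNameNS(SVG_NS, "circle")[0]

print(root.namespaceURI)                       # http://www.w3.org/1999/xhtml
print(circle.namespaceURI)                     # http://www.w3.org/2000/svg
print(circle.parentNode.parentNode.localName)  # body
```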
Between the time an element's start tag is parsed and the time either the element's end tag is
parsed or the parser detects a well-formedness error, the user agent must act as if the element
was in a stack of open elements.
This is used, e.g., by the <{object}> element to avoid instantiating plugins before the <{param}> element children have been parsed.
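One way to picture this requirement is a SAX handler that keeps its own stack of open elements: when a `param` start tag is reported, the enclosing `object` is still open, and the `object` end tag arrives only after all of its `param` children. The class name and log format are hypothetical, and the sketch tracks element names only; it does not model plugins or well-formedness errors.

```python
import xml.sax


class ObjectParamTracker(xml.sax.ContentHandler):
    """Hypothetical handler that mirrors a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # names whose start tag is parsed but whose end tag is not
        self.log = []

    def startElement(self, name, attrs):
        if name == "param" and "object" in self.open_elements:
            # The enclosing <object> is still open, so this <param> belongs to it.
            self.log.append(("param", attrs.get("name")))
        self.open_elements.append(name)

    def endElement(self, name):
        self.open_elements.pop()
        if name == "object":
            # Every <param> child has been seen; a UA could now act on the <object>.
            self.log.append(("object complete", None))


tracker = ObjectParamTracker()
xml.sax.parseString(b'<object><param name="a"/><param name="b"/></object>', tracker)
print(tracker.log)
# [('param', 'a'), ('param', 'b'), ('object complete', None)]
```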
This specification provides the following additional information that user agents should use when retrieving an external entity: the public identifiers given in the following list all correspond to the URL given by this link. (This URL is a DTD containing the entity declarations for the names listed in the [[#named-character-references]] section.) [[!XML]]

- `-//W3C//DTD XHTML 1.0 Transitional//EN`
- `-//W3C//DTD XHTML 1.1//EN`
- `-//W3C//DTD XHTML 1.0 Strict//EN`
- `-//W3C//DTD XHTML 1.0 Frameset//EN`
- `-//W3C//DTD XHTML Basic 1.0//EN`
- `-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN`
- `-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN`
- `-//W3C//DTD MathML 2.0//EN`
- `-//WAPFORUM//DTD XHTML Mobile 1.0//EN`

This is not strictly a violation of the XML specification, but it does contradict the spirit of the XML specification's requirements. This is motivated by a desire for user agents to all handle entities in an interoperable fashion without requiring any network access for handling external subsets. [[!XML]]
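A minimal sketch of how a processor might honor this mapping without network access, using Python's `xml.parsers.expat`: when the DOCTYPE's public identifier is in the list above, the external subset is parsed from a local string instead of being fetched. `LOCAL_ENTITY_DTD` is a hypothetical, heavily abridged stand-in (two entities) for the real DTD, which declares every named character reference; `parse_text_without_network` is likewise illustrative.

```python
from xml.parsers import expat

# Public identifiers from the list above.
BUILTIN_DTD_PUBLIC_IDS = frozenset({
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN",
    "-//W3C//DTD MathML 2.0//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
})

# Hypothetical local stand-in for the linked DTD: only two entities shown.
LOCAL_ENTITY_DTD = '<!ENTITY nbsp "&#160;"><!ENTITY copy "&#169;">'


def parse_text_without_network(document: str) -> str:
    """Parse `document` and return its concatenated character data."""
    parser = expat.ParserCreate()
    # Ask expat to report the external DTD subset to ExternalEntityRefHandler.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    chunks = []
    parser.CharacterDataHandler = chunks.append

    def on_external_entity(context, base, system_id, public_id):
        if public_id in BUILTIN_DTD_PUBLIC_IDS:
            # Parse the local DTD in place of whatever system_id points at.
            subparser = parser.ExternalEntityParserCreate(context)
            subparser.Parse(LOCAL_ENTITY_DTD, True)
        # Returning 1 reports the reference as handled; nothing is fetched.
        return 1

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return "".join(chunks)


doc = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
       '<p>a&nbsp;b &copy;</p>')
print(repr(parse_text_without_network(doc)))
```

The system identifier in the DOCTYPE is never dereferenced; the public identifier alone selects the local DTD.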
XML parsers can be invoked with XML scripting support enabled or disabled. Except where otherwise specified, XML parsers are invoked with XML scripting support enabled.

When an XML parser with XML scripting support enabled creates a <{script}> element, it must be marked as being "[=parser-inserted=]" and its "[=non-blocking=]" flag must be unset. If the parser was originally created for the XML fragment parsing algorithm, then the element must also be marked as "[=already started=]". When the element's end tag is subsequently parsed, the user agent must perform a microtask checkpoint, and then prepare the <{script}> element. If this causes there to be a pending parsing-blocking script, then the user agent must run the following steps:

1. Block this instance of the XML parser, such that the event loop will not run tasks that invoke it.
2. Spin the event loop until the parser's Document has no style sheet that is blocking scripts and the pending parsing-blocking script's "[=ready to be parser-executed=]" flag is set.
3. Unblock this instance of the XML parser, such that tasks that invoke it can again be run.
4. Execute the pending parsing-blocking script.
5. There is no longer a pending parsing-blocking script.

Since the {{Document/write()|document.write()}} API is not available for XML documents, much of the complexity in the HTML parser is not needed in the XML parser.
When the XML parser has XML scripting support disabled, none of this happens.
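A toy model of the blocking behavior, assuming a synchronous SAX parser: because expat invokes handlers synchronously, running the script inside the `script` end-tag handler means no later markup is turned into events until the script returns. `ScriptingHandler` and its `run_script` callback are hypothetical; the microtask checkpoint, style-sheet blocking, and the "parser-inserted" and "non-blocking" flags are not modeled.

```python
import xml.sax


class ScriptingHandler(xml.sax.ContentHandler):
    """Toy model: run each <script> when its end tag is parsed."""

    def __init__(self, run_script, scripting_enabled=True):
        super().__init__()
        self.run_script = run_script  # hypothetical stand-in for executing a script
        self.scripting_enabled = scripting_enabled
        self.events = []              # ordered log of parser activity
        self._script_text = None      # text collected for the open <script>, if any

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "script":
            self._script_text = []

    def characters(self, content):
        if self._script_text is not None:
            self._script_text.append(content)

    def endElement(self, name):
        if name == "script":
            source = "".join(self._script_text)
            self._script_text = None
            if self.scripting_enabled:
                # Parsing cannot continue until this call returns.
                self.events.append(("run", self.run_script(source)))
        self.events.append(("end", name))


handler = ScriptingHandler(run_script=str.upper)
xml.sax.parseString(b"<body><script>hi</script><p/></body>", handler)
print(handler.events)
# [('start', 'body'), ('start', 'script'), ('run', 'HI'), ('end', 'script'),
#  ('start', 'p'), ('end', 'p'), ('end', 'body')]
```

With `scripting_enabled=False`, the same document produces the element events but no `run` entry.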
When an XML parser would append a node to a <{template}> element, it must instead append it to the <{template}> element's template contents (a DocumentFragment node).
This is a willful violation of the XML specification; unfortunately, XML is not
formally extensible in the manner that is needed for template processing.
[[!XML]]
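Python's `xml.dom.minidom` has no notion of template contents, so the sketch below approximates the resulting tree after the fact: it parses normally, then moves each XHTML `template` element's children into a fresh `DocumentFragment` recorded in a hypothetical side table. A real XML parser redirects each append as it happens; this post-processing only illustrates where the nodes end up.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_template_contents(doc):
    """Move each XHTML <template>'s children into a DocumentFragment.

    Returns a dict mapping each template element to its fragment; this side
    table is hypothetical, since minidom has no template-contents property.
    """
    contents = {}
    for template in doc.getElementsByTagNameNS(XHTML_NS, "template"):
        fragment = doc.createDocumentFragment()
        while template.firstChild is not None:
            # appendChild first detaches the node from the template element.
            fragment.appendChild(template.firstChild)
        contents[template] = fragment
    return contents


doc = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<template><p>hi</p></template>'
    '</body></html>'
)
contents = extract_template_contents(doc)
template = next(iter(contents))
print(len(template.childNodes))                  # 0
print(contents[template].firstChild.localName)   # p
```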
When an XML parser creates a Node object, its node document
must be set to the node document of
the node into which the newly created node is to be inserted.
Certain algorithms in this specification spoon-feed the
parser characters one string at a time. In such cases, the XML parser must act
as it would have if faced with a single string consisting of the concatenation of all those
characters.
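Incremental XML parsers typically satisfy this requirement naturally. For example, with Python's `xml.etree.ElementTree.XMLParser`, feeding a document in arbitrary pieces (even splitting a tag or an entity reference) yields the same tree as parsing the concatenated string at once; `parse_in_pieces` is an illustrative helper, not part of any API.

```python
import xml.etree.ElementTree as ET


def parse_in_pieces(pieces):
    """Feed strings to one parser in order and return the root element."""
    parser = ET.XMLParser()
    for piece in pieces:
        parser.feed(piece)
    return parser.close()


source = '<p class="x">caf\u00e9 &amp; <b>bar</b></p>'
whole = ET.fromstring(source)
# Split inside an attribute name, after a non-ASCII character, and inside "&amp;".
split = parse_in_pieces(['<p cl', 'ass="x">caf', '\u00e9 &am', 'p; <b>b', 'ar</b></p>'])

print(ET.tostring(whole) == ET.tostring(split))  # True
```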
When an XML parser reaches the end of its input, it must [=stop parsing=], following the same rules as the HTML parser. An XML
parser can also be aborted, which must again be done in
the same way as for an HTML parser.
For the purposes of conformance checkers, if a resource is determined to be in the XHTML syntax, then it is an XML document.
The XML fragment serialization algorithm for a Document or
{{Element}} node either returns a fragment of XML that represents that node or throws an
exception.
For Documents, the algorithm must return a string in the form of a document entity, if none of the error cases
below apply.
For {{Element}}s, the algorithm must return a string in the form of an internal general parsed entity, if none of the
error cases below apply.
In both cases, the string returned must be XML namespace-well-formed and must be an isomorphic
serialization of all of that node's relevant child nodes, in tree order.
User agents may adjust prefixes and namespace declarations in the serialization (and indeed might
be forced to do so in some cases to obtain namespace-well-formed XML). User agents may use a
combination of regular text and character references to represent {{Text}} nodes in the
DOM.
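A node's relevant child nodes are those that apply given the following rules:

- For <{template}> elements: the relevant child nodes are the child nodes of the <{template}>
  element's template contents, if any.
- For all other nodes: the relevant child nodes are the child nodes of the node itself, if any.

For {{Element}}s, if any of the elements in the serialization are in no namespace, the default
namespace in scope for those elements must be explicitly declared as the empty string. (This
doesn't apply in the Document case.) [[!XML]] [[!XML-NAMES]]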
For the purposes of this section, an internal general parsed entity is considered XML
namespace-well-formed if a document consisting of an element with no namespace declarations whose
contents are the internal general parsed entity would itself be XML namespace-well-formed.
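The definition above can be turned into a direct check: wrap the candidate string in an element that has no namespace declarations and ask a namespace-aware parser whether the result is namespace-well-formed. This sketch uses `xml.etree.ElementTree`, whose expat parser performs namespace processing; the wrapper element name and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_namespace_well_formed_fragment(fragment: str) -> bool:
    """Check `fragment` as the content of an element with no namespace declarations."""
    # The wrapper deliberately carries no xmlns or xmlns:* attributes.
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    except ET.ParseError:
        return False
    return True


print(is_namespace_well_formed_fragment('text <b>bold</b>'))               # True
print(is_namespace_well_formed_fragment('<x xmlns:a="urn:x"><a:b/></x>'))  # True
print(is_namespace_well_formed_fragment('<a:b/>'))  # False: prefix "a" is unbound
print(is_namespace_well_formed_fragment('<b>'))     # False: not well-formed
```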
If any of the following error cases are found in the DOM subtree being serialized, then the
algorithm must throw an {{InvalidStateError}} exception instead of returning a
string:

- A Document node with no child element nodes.
- A DocumentType node that has an external subset public identifier that contains
  characters that are not matched by the XML PubidChar production. [[!XML]]
- A DocumentType node that has an external subset system identifier that contains
  both a U+0022 QUOTATION MARK (") and a U+0027 APOSTROPHE (') or that contains characters that are
  not matched by the XML Char production. [[!XML]]
- A node with a local name that does not match the XML Name production. [[!XML]]
- An Attr node, {{Text}} node, Comment node, or
  ProcessingInstruction node whose data contains characters that are not matched by
  the XML Char production. [[!XML]]
- A Comment node whose data contains two adjacent U+002D HYPHEN-MINUS characters
  (-) or ends with such a character.
- A ProcessingInstruction node whose target name is an ASCII
  case-insensitive match for the string "xml".
- A ProcessingInstruction node whose target name contains a U+003A COLON (:).
- A ProcessingInstruction node whose data contains the string "?>".

These are the only ways to make a DOM unserializable. The DOM enforces all the other XML
constraints; for example, trying to append two elements to a Document node
will throw a {{HierarchyRequestError}} exception.
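Several of the error cases above are plain string tests, which the following sketch models in Python. `InvalidStateError` here is a hypothetical stand-in for the DOM exception, the function names are illustrative, and only the string-level checks are covered (not, for example, the Document-without-element case); the character classes follow the XML 1.0 Char and PubidChar productions.

```python
import string


class InvalidStateError(Exception):
    """Hypothetical stand-in for the DOM's InvalidStateError."""


# ASCII-only lowercasing, so "ASCII case-insensitive" is modeled exactly.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# XML 1.0 PubidChar: #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
_PUBID_CHARS = frozenset(" \r\n" + string.ascii_letters + string.digits + "-'()+,./:=?;!*#@$_%")


def is_xml_char(ch: str) -> bool:
    """XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def check_data(data: str) -> None:
    """Data of an Attr, Text, Comment, or ProcessingInstruction node."""
    if not all(is_xml_char(ch) for ch in data):
        raise InvalidStateError("data contains a character not matched by Char")


def check_comment(data: str) -> None:
    check_data(data)
    if "--" in data or data.endswith("-"):
        raise InvalidStateError("comment contains '--' or ends with '-'")


def check_processing_instruction(target: str, data: str) -> None:
    check_data(data)
    if target.translate(_ASCII_LOWER) == "xml":
        raise InvalidStateError("target is an ASCII case-insensitive match for 'xml'")
    if ":" in target:
        raise InvalidStateError("target contains ':'")
    if "?>" in data:
        raise InvalidStateError("data contains '?>'")


def check_doctype(public_id: str, system_id: str) -> None:
    if any(ch not in _PUBID_CHARS for ch in public_id):
        raise InvalidStateError("public identifier has a non-PubidChar character")
    both_quotes = '"' in system_id and "'" in system_id
    if both_quotes or not all(is_xml_char(ch) for ch in system_id):
        raise InvalidStateError("system identifier cannot be serialized")
```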
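The XML fragment parsing algorithm either returns a Document or throws
a "{{SyntaxError}}" {{DOMException}}. Given a string input and a
context element context, the algorithm is as
follows:

1. Create a new XML parser.
2. Feed the parser just created the string corresponding to the start tag of the context element,
   declaring all the namespace prefixes that are in scope on that element in the DOM, as well as
   declaring the default namespace (if any) that is in scope on that element in the DOM.
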
A namespace prefix is in scope if the DOM lookupNamespaceURI() method
on the element would return a non-null value for that prefix.
The default namespace is the namespace for which the DOM isDefaultNamespace()
method on the element would return true.
No DOCTYPE is passed to the parser, and therefore no external subset is
referenced, and therefore no entities will be recognized.
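A minimal sketch of this approach in Python with `xml.dom.minidom`: it feeds the parser a start tag for the context element carrying the in-scope namespace declarations, then the input, then the matching end tag, and returns the resulting children. Because minidom offers no way to enumerate every prefix in scope on an element, the context element is described by a tag name, a dictionary of in-scope prefixes, and the in-scope default namespace; that calling convention, the function name, and `FragmentSyntaxError` (standing in for a "SyntaxError" DOMException) are all hypothetical. Since no DOCTYPE is supplied, a named entity such as `&nbsp;` is rejected.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr


class FragmentSyntaxError(Exception):
    """Hypothetical stand-in for a "SyntaxError" DOMException."""


def parse_xml_fragment(fragment, context_name, prefixes, default_ns=None):
    """Parse `fragment` as if it were the content of the described context element.

    prefixes: hypothetical dict of in-scope prefix -> namespace URI.
    default_ns: the in-scope default namespace, or None if there is none.
    """
    declarations = []
    if default_ns is not None:
        declarations.append(" xmlns=" + quoteattr(default_ns))
    for prefix, uri in prefixes.items():
        declarations.append(f" xmlns:{prefix}={quoteattr(uri)}")
    start_tag = "<" + context_name + "".join(declarations) + ">"
    end_tag = "</" + context_name + ">"
    # No DOCTYPE is passed, so only the five predefined entities are known.
    try:
        doc = minidom.parseString(start_tag + fragment + end_tag)
    except ExpatError as err:
        raise FragmentSyntaxError(str(err)) from err
    return list(doc.documentElement.childNodes)


nodes = parse_xml_fragment(
    "<a:b/>text<p/>", "div", {"a": "urn:example"}, "http://www.w3.org/1999/xhtml"
)
print(nodes[0].namespaceURI, nodes[1].data, nodes[2].namespaceURI)
# urn:example text http://www.w3.org/1999/xhtml
```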