This section is non-normative.
To embed an image in HTML, when there is only a single image resource,
use the <{img}> element with its <{img/src}> and <{img/alt}> attributes.
However, there are a number of situations for which the author might wish to use multiple image
resources that the user agent can choose from:
Different users might have different environmental characteristics:
The users' physical screen size might be different from one another.
A mobile phone's screen might be 4 inches diagonally, while a laptop's screen might be 14
inches diagonally.
This is only relevant when an image's rendered size depends on the viewport size.
The users' screen pixel density might be different from one another.
A mobile phone's screen might have three times as many physical pixels per inch compared to
another mobile phone's screen, regardless of their physical screen size.
The users' zoom level might be different from one another, or might change for a single user
over time.
A user might zoom in to a particular image to be able to get a more detailed look.
The zoom level and the screen pixel density (the previous point) can both affect the
number of physical screen pixels per CSS pixel. This ratio is usually referred to as
device-pixel-ratio.
The users' screen orientation might be different from one another, or might change for a
single user over time.
A tablet can be held upright or rotated 90 degrees, so that the screen is either "portrait"
or "landscape".
The users' network speed, network latency and bandwidth cost might be different from one
another, or might change for a single user over time.
A user might be on a fast, low-latency and constant-cost connection while at work, on a
slow, low-latency and constant-cost connection while at home, and on a variable-speed,
high-latency and variable-cost connection anywhere else.
Authors might want to show the same image content but with different rendered size depending
on, usually, the width of the viewport. This is usually referred to as
viewport-based selection.
A Web page might have a banner at the top that always spans the entire viewport
width. In this case, the rendered size of the image depends on the physical size of the screen
(assuming a maximised browser window).
Another Web page might have images in columns, with a single column for screens with a small
physical size, two columns for screens with medium physical size, and three columns for screens
with big physical size, with the images varying in rendered size in each case to fill up the
viewport. In this case, the rendered size of an image might be bigger in
the one-column layout compared to the two-column layout, despite the screen being smaller.
Authors might want to show different image content depending on the rendered size of the image.
This is usually referred to as art direction.
When a Web page is viewed on a screen with a large physical size (assuming a maximised
browser window), the author might wish to include some less relevant parts surrounding the
critical part of the image. When the same Web page is viewed on a screen with a small physical
size, the author might wish to show only the critical part of the image.
Authors might want to show the same image content but using different image formats, depending
on which image formats the user agent supports. This is usually referred to as
image format-based selection.
A Web page might have some images in the JPEG, WebP and JPEG XR image formats, as the
latter two have better compression abilities compared to JPEG. Since different user agents
can support different image formats, with some formats offering better compression ratios,
the author would like to serve the better formats to user agents that support them, while
providing a JPEG fallback for user agents that don't.
The above situations are not mutually exclusive. For example, it is reasonable to combine
different resources for different device-pixel-ratio with different resources for
art direction.
While it is possible to solve these problems using scripting, doing so introduces some other
problems:
Some user agents aggressively download images specified in the HTML markup, before scripts
have had a chance to run, so that Web pages complete loading sooner. If a script changes which
image to download, the user agent will potentially start two separate downloads, which can
instead cause worse page loading performance.
If the author avoids specifying any image in the HTML markup and instead initiates a
single download from script, that avoids the double download problem above, but no image
is downloaded at all for users with scripting disabled, and the aggressive
image downloading optimization is lost.
With this in mind, this specification introduces a number of features to address the above
problems in a declarative manner.
Device-pixel-ratio-based selection when the rendered size of the image is fixed
The <{img/src}> and <{img/srcset}> attributes on the <{img}> element can be used, using the
x descriptor, to provide multiple images that only vary in their size (the smaller
image is a scaled-down version of the bigger image).
The user agent can choose any of the given resources depending on the user's screen's pixel
density, zoom level, and possibly other factors such as the user's network conditions.
To provide backwards compatibility with older user agents that don't understand the
<{img/srcset}> attribute, one of the URLs is specified in the <{img}> element's
src attribute. This will result in something useful (though perhaps at a
lower resolution than the user might expect) being displayed even in older user agents.
For user agents that understand <{img/srcset}>, the src attribute participates
in the resource selection, as if it had been specified by <{img/srcset}> with a
1x descriptor.
The image's rendered size is given in the width and height
attributes, which allows the user agent to allocate space for the image before it is
downloaded.
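A minimal sketch of the markup this describes. The file names and alt text are hypothetical; the three resources are assumed to be the same picture at 100, 150 and 200 pixels wide.

```html
<!-- Hypothetical file names. The rendered size is fixed at 100x150 CSS pixels. -->
<img src="/uploads/100-portrait.jpg"
     srcset="/uploads/150-portrait.jpg 1.5x, /uploads/200-portrait.jpg 2x"
     alt="Portrait of the featured author"
     width="100" height="150">
```

An older user agent would load only the `src` URL. A user agent that understands `srcset` can treat that URL as the 1x candidate and pick among all three.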
The <{img/srcset}> and sizes attributes can be used,
using the w descriptor,
to provide multiple images that only vary in their size
(the smaller image is a scaled-down version of the bigger image).
In this example, a banner image takes up the entire viewport width
(using appropriate CSS).
The user agent will calculate the effective pixel density of each image from the specified
w descriptors and the specified rendered size in the sizes
attribute.
It can then choose any of the given resources depending on the user's screen's pixel
density, zoom level, and possibly other factors such as the user's network conditions.
If the user's screen is 320 CSS pixels wide, this is equivalent to specifying
wolf-400.jpg 1.25x, wolf-800.jpg 2.5x, wolf-1600.jpg 5x.
On the other hand, if the user's screen is 1200 CSS pixels wide,
this is equivalent to specifying
wolf-400.jpg 0.33x, wolf-800.jpg 0.67x, wolf-1600.jpg 1.33x.
By using the w descriptors and the sizes attribute,
the user agent can choose the correct image source to download regardless of how large the user's device is.
For backwards compatibility,
one of the URLs is specified in the <{img}> element's src attribute.
In new user agents, the src attribute is ignored
when the <{img/srcset}> attribute uses w descriptors.
In this example, the sizes attribute could be omitted
because the default value is 100vw.
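The banner described above might be marked up roughly as follows. The file names come from the equivalences quoted above; the `w` values are inferred from those density figures (for instance 400 ÷ 320 = 1.25), and the alt text is an assumption.

```html
<!-- sizes="100vw" states that the image is laid out at the full viewport width. -->
<h1><img sizes="100vw"
         srcset="wolf-400.jpg 400w, wolf-800.jpg 800w, wolf-1600.jpg 1600w"
         src="wolf-400.jpg"
         alt="The rad wolf"></h1>
```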
In this example, the Web page has three layouts depending on the width of the viewport.
The narrow layout has one column of images (the width of each image is about 100%),
the middle layout has two columns of images (the width of each image is about 50%),
and the widest layout has three columns of images, and some page margin (the width of each image is about 33%).
It breaks between these layouts when the viewport is 30em wide and 50em wide, respectively.
The sizes attribute sets up the
layout breakpoints at 30em and 50em,
and declares the image sizes between these breakpoints to be
100vw, 50vw, or calc(33vw - 100px).
These sizes do not necessarily have to match up exactly with the actual image width as specified in the CSS.
The user agent will pick a width from the sizes attribute,
using the first item with a <media-condition> (the part in parentheses) that evaluates to true,
or using the last item (calc(33vw - 100px)) if they all evaluate to false.
For example, if the viewport width is 29em,
then (max-width: 30em) evaluates to true and 100vw is used,
so the image size, for the purpose of resource selection, is 29em.
If the viewport width is instead 32em,
then (max-width: 30em) evaluates to false,
but (max-width: 50em) evaluates to true and 50vw is used,
so the image size, for the purpose of resource selection, is 16em (half the viewport width).
Notice that the slightly wider viewport results in a smaller image because of the different layout.
The user agent can then calculate the effective pixel density and choose an appropriate
resource similarly to the previous example.
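A sketch of markup for the three-layout page described above. The `sizes` value is taken from the text; the file names, the `w` values and the alt text are hypothetical.

```html
<!-- One column up to 30em, two columns up to 50em, three columns (minus margin) beyond. -->
<img sizes="(max-width: 30em) 100vw, (max-width: 50em) 50vw, calc(33vw - 100px)"
     srcset="swing-200.jpg 200w, swing-400.jpg 400w, swing-800.jpg 800w, swing-1600.jpg 1600w"
     src="swing-400.jpg"
     alt="Kettlebell swing">
```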
The <{picture}> element and the <{source}> element,
together with the media attribute,
can be used to provide multiple images that vary the image content
(for instance the smaller image might be a cropped version of the bigger image).
The user agent will choose the first <{source}> element
for which the media query in the media attribute matches,
and then choose an appropriate URL from its srcset attribute.
The rendered size of the image varies depending on which resource is chosen.
To specify dimensions that the user agent can use before having downloaded the image,
CSS can be used.
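As an illustration, the following sketch pairs each <{source}> with a CSS rule under the same media query, so that space can be reserved before the image arrives. The file names, breakpoints and dimensions are all assumptions.

```html
<style>
  /* Assumed dimensions; each rule mirrors the media query of one image version. */
  img { width: 300px; height: 300px }
  @media (min-width: 32em) { img { width: 500px; height: 300px } }
  @media (min-width: 45em) { img { width: 700px; height: 400px } }
</style>
<picture>
  <!-- Sources are tried in order; the first whose media query matches is used. -->
  <source media="(min-width: 45em)" srcset="large.jpg">
  <source media="(min-width: 32em)" srcset="med.jpg">
  <img src="small.jpg" alt="The wolf runs through the snow.">
</picture>
```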
This example combines art direction- and device-pixel-ratio-based selection.
A banner that takes half the viewport is provided in two versions,
one for wide screens and one for narrow screens.
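A sketch of how such a banner could be written. The 500px breakpoint, the file names and the alt text are hypothetical.

```html
<h1>
  <picture>
    <!-- Narrow screens: a differently cropped banner, at 1x and 2x. -->
    <source media="(max-width: 500px)"
            srcset="banner-phone.jpeg, banner-phone-HD.jpeg 2x">
    <!-- Wide screens and fallback: the full banner, at 1x and 2x. -->
    <img src="banner.jpeg" srcset="banner-HD.jpeg 2x"
         alt="The Breakfast Combo">
  </picture>
</h1>
```

The media query selects the crop (art direction), and the `x` descriptors within the chosen candidate list select the resolution.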
In this example, the user agent will choose the first <{source}> that has a <{source/type}>
attribute with a supported MIME type. If the user agent supports WebP images, the first
<{source}> element will be chosen. If not, but the user agent does support JPEG XR images, the
second <{source}> element will be chosen. If neither of those formats is supported, the <{img}> element will be chosen.
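Markup matching this description might look like the following sketch. The file names are hypothetical; `image/webp` and `image/vnd.ms-photo` are the MIME types assumed here for WebP and JPEG XR.

```html
<picture>
  <!-- Tried in order; a source whose type is unsupported is skipped without fetching. -->
  <source srcset="/uploads/100-portrait.webp" type="image/webp">
  <source srcset="/uploads/100-portrait.jxr" type="image/vnd.ms-photo">
  <!-- JPEG fallback for user agents that support neither format. -->
  <img src="/uploads/100-portrait.jpg" alt="Portrait of the featured author"
       width="100" height="150">
</picture>
```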
The <{picture}> element is a container which provides multiple sources to its contained
<{img}> element to allow authors to declaratively control or give hints to the user agent about
which image resource to use, based on the screen pixel density, viewport size, image
format, and other factors. It represents its children.
The user agent first tries to match an image source contained within a <{source}> element and use that,
but if none is found, it falls back to what is contained within the <{img}> element, which must be present.
The following example utilizes art direction to provide the appropriate
image at a particular viewport for small and large screens.
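A minimal sketch of such markup, with hypothetical file names and an assumed 60em breakpoint:

```html
<picture>
  <!-- Large viewports get the wide shot that includes the surroundings. -->
  <source media="(min-width: 60em)" srcset="lighthouse-wide.jpg">
  <!-- Small viewports, and user agents without picture support, get the tight crop. -->
  <img src="lighthouse-cropped.jpg" alt="A lighthouse on a rocky headland at dusk.">
</picture>
```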
The <{picture}> element is somewhat different from the similar-looking
video and <{audio}> elements. While all of them contain <{source}> elements,
the <{source}> element's <{source/src}> attribute has no meaning when the element is
nested within a <{picture}> element, and the resource selection algorithm is different.
By itself, the <{picture}> element does not display anything; it merely provides
a context for its contained <{img}> element that enables it to choose from multiple
URLs.
The <{source}> element allows authors to specify multiple alternative source sets for
<{img}> elements or multiple alternative media resources for media elements. It
does not represent anything on its own.
The type attribute may be present. If present,
the value must be a valid MIME type.
The remainder of the requirements depend on whether the parent is a <{picture}> element or a
media element:
<{source}> element's parent is a <{picture}> element
The srcset content attribute must be
present, and must consist of one or more image candidate strings, each separated from
the next by a U+002C COMMA character (,). If an image candidate string contains no
descriptors and no [=space characters=] after the URL, the following
image candidate string, if there is one, must begin with one or more
[=space characters=].
If the <{source/srcset}> attribute has any image candidate strings using a
width descriptor, the sizes content
attribute must also be present, and the value must be a valid source size list.
The media content attribute may also be
present. If present, the value must contain a valid media query list.
The <{source/type}> attribute gives the type of the images in the source set, to allow the user
agent to skip to the next <{source}> element if it does not support the given type.
If the <{source/type}> attribute is not specified, the user agent will not select
a different <{source}> element if it finds that it does not support the image format after
fetching it.
When a <{source}> element has a following sibling <{source}> element or <{img}> element with
a srcset attribute specified, it must have at least one of the following:
* A <{source/media}> attribute specified with a value that, after
stripping leading and trailing white space, is not the empty string and is not an
ASCII case-insensitive match for the string "all".
* A <{source/type}> attribute specified.
The src attribute must not be present.
Dynamically modifying a <{source}> element and its attributes when the element is already
inserted in a <{video}> or <{audio}> element will have no effect. To change what
is playing, just use the src attribute on the media element directly,
possibly making use of the canPlayType() method to pick from amongst available
resources. Generally, manipulating <{source}> elements manually after the document has
been parsed is an unnecessarily complicated approach.
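One possible shape for this approach is sketched below. The resource list, the file names and the element ID are hypothetical; `canPlayType()` and the `src` attribute are used as the text describes.

```html
<video id="player" controls></video>
<script>
  // Hypothetical resources, listed in order of preference.
  const candidates = [
    { url: 'clip.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
    { url: 'clip.ogv', type: 'video/ogg; codecs="theora, vorbis"' }
  ];
  const video = document.getElementById('player');
  // canPlayType() returns "", "maybe" or "probably"; the empty string means
  // the user agent knows it cannot play that type.
  const playable = candidates.find(c => video.canPlayType(c.type) !== '');
  if (playable) {
    // Setting src on the media element itself starts loading the new resource.
    video.src = playable.url;
  }
</script>
```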
The <{source/type}> content attribute gives the type of the media resource, to help
the user agent determine if it can play this media resource before fetching it. If
specified, its value must be a valid MIME type. The codecs parameter,
which certain MIME types define, might be necessary to specify exactly how the resource
is encoded. [[!RFC6381]]
The following list shows some examples of how to use the codecs= MIME
parameter in the <{source/type}> attribute.
: H.264 Constrained baseline profile video (main and extended video compatible) level 3 and Low-Complexity AAC audio in MP4 container
::
: H.264 Extended profile video (baseline-compatible) level 3 and Low-Complexity AAC audio in MP4 container
::
: H.264 Main profile video level 3 and Low-Complexity AAC audio in MP4 container
::
: H.264 "High" profile video (incompatible with main, baseline, or extended profiles) level 3 and Low-Complexity AAC audio in MP4 container
::
: MPEG-4 Visual Simple Profile Level 0 video and Low-Complexity AAC audio in MP4 container
::
: MPEG-4 Advanced Simple Profile Level 0 video and Low-Complexity AAC audio in MP4 container
::
: MPEG-4 Visual Simple Profile Level 0 video and AMR audio in 3GPP container
::
: Theora video and Vorbis audio in Ogg container
::
: Theora video and Speex audio in Ogg container
::
: Vorbis audio alone in Ogg container
::
: Speex audio alone in Ogg container
::
: FLAC audio alone in Ogg container
::
: Dirac video and Vorbis audio in Ogg container
::
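
A few of the combinations listed above, written out as <{source}> elements. This is a sketch: the file names are hypothetical, and the `codecs` strings are the values commonly used for these profiles under RFC 6381, which are worth checking against the relevant codec registrations.

```html
<!-- H.264 Constrained Baseline profile level 3 with Low-Complexity AAC, in MP4 -->
<source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
<!-- H.264 Main profile level 3 with Low-Complexity AAC, in MP4 -->
<source src="video.mp4" type='video/mp4; codecs="avc1.4D401E, mp4a.40.2"'>
<!-- H.264 High profile level 3 with Low-Complexity AAC, in MP4 -->
<source src="video.mp4" type='video/mp4; codecs="avc1.64001E, mp4a.40.2"'>
<!-- Theora video and Vorbis audio, in Ogg -->
<source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'>
<!-- Vorbis audio alone, in Ogg -->
<source src="audio.ogg" type="audio/ogg; codecs=vorbis">
```

Because the parameter value contains double quotes and a comma, the attribute value is delimited with single quotes.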
The <{source/srcset}>, <{source/sizes}>, and <{source/media}> attributes must not be present.
If a <{source}> element is inserted as a child of a media element that has no
<{source/src}> attribute and whose networkState has the value
NETWORK_EMPTY, the user agent must invoke the media element's
resource selection algorithm.
The IDL attributes
src,
type,
srcset,
sizes and
media must reflect the
respective content attributes of the same name.
If the author isn't sure if user agents will all be able to render the media resources
provided, the author can listen to the error event on the last
<{source}> element and trigger fallback behavior:
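A sketch of one such fallback, which replaces the <{video}> element with its non-<{source}> contents once the last <{source}> has failed. The function name, file names and fallback paragraph are hypothetical.

```html
<script>
  // Replace the <video> with its fallback content, dropping the <source> elements.
  function fallback(video) {
    while (video.hasChildNodes()) {
      if (video.firstChild instanceof HTMLSourceElement)
        video.removeChild(video.firstChild);
      else
        video.parentNode.insertBefore(video.firstChild, video);
    }
    video.parentNode.removeChild(video);
  }
</script>
<video controls autoplay>
  <source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <!-- The error handler sits on the last source, so it runs only after every
       earlier source has already been rejected. -->
  <source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'
          onerror="fallback(parentNode)">
  <p>This video cannot be played here. <a href="video.mp4">Download it</a> instead.</p>
</video>
```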
An <{img}> element represents an image and its fallback content.
The image given by the src and srcset attributes,
and any previous sibling <{source}> elements'
srcset attributes if the parent is a <{picture}> element,
is the embedded content; the value of
the alt attribute and the content referred to by
the <{img/longdesc}> attribute are the
<{img}> element's fallback content, and provide equivalent content for
users and user agents who cannot process images or have image loading disabled.
Requirements for alternative representations of the image are described
in the next section.
The src attribute must be present, and must contain a
valid non-empty URL potentially surrounded by spaces referencing a non-interactive,
optionally animated, image resource that is neither paged nor scripted.
The srcset attribute may also be present.
If present, its value must consist of one or more
image candidate strings,
each separated from the next by a U+002C COMMA character (,).
If an image candidate string contains no descriptors
and no [=space characters=] after the URL,
the following image candidate string, if there is one,
must begin with one or more [=space characters=].
An image candidate string consists of the following components, in order, with the
further restrictions described below this list:
Zero or more [=space characters=].
A valid non-empty URL that does not start or end with a U+002C COMMA character (,),
referencing a non-interactive, optionally animated, image resource
that is neither paged nor scripted.
Zero or more [=space characters=].
Zero or one of the following:
A width descriptor, consisting of:
a space character,
a valid non-negative integer giving a number greater than zero
representing the width descriptor value,
and a U+0077 LATIN SMALL LETTER W character.
A pixel density descriptor, consisting of:
a space character,
a valid floating-point number giving a number greater than zero
representing the pixel density descriptor value,
and a U+0078 LATIN SMALL LETTER X character.
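The following sketch shows `srcset` values built from image candidate strings of each kind. All file names and alt texts are hypothetical.

```html
<!-- Pixel density descriptors: URL, a space, then a number followed by "x". -->
<img src="logo.png" srcset="logo.png 1x, logo-1.5x.png 1.5x, logo-2x.png 2x"
     alt="Example Corp">

<!-- Width descriptors: URL, a space, then the resource width followed by "w". -->
<img src="photo-480.jpg" srcset="photo-480.jpg 480w, photo-960.jpg 960w"
     sizes="50vw" alt="Harbour at sunrise">

<!-- The first candidate has no descriptor and no trailing space, so the
     candidate after the comma begins with a space. -->
<img src="icon.png" srcset="icon.png, icon-2x.png 2x" alt="Settings">
```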
The requirements above imply that images can be static bitmaps (e.g., PNGs, GIFs, JPEGs),
single-page vector documents (single-page PDFs, XML files with an SVG document element), animated
bitmaps (APNGs, animated GIFs), animated vector graphics (XML files with an SVG document element
that use declarative SMIL animation), and so forth. However, these definitions preclude SVG
files with script, multipage PDF files, interactive MNG files, HTML documents, plain text
documents, and so forth. [[!PNG]] [[!GIF]] [[!JPEG]] [[!PDF]] [[!XML]] [[APNG]] [[!SVG11]] [[!MNG]]
If the srcset attribute is present,
the sizes attribute may also be present.
If present, its value must be a valid source size list.
A valid source size list is a string that matches the following grammar:
[[!CSS-VALUES]] [[!MEDIAQ]]
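<source-size-list> = <source-size># [ , <source-size-value> ]? | <source-size-value>
<source-size> = <media-condition> <source-size-value>
<source-size-value> = <length>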
A <source-size-value> must not be negative.
Percentages are not allowed in a <source-size-value>,
to avoid confusion about what it would be relative to.
The vw unit can be used for sizes relative to the viewport width.
The <{img}> element must not be used as a layout tool. In particular, img
elements should not be used to display transparent images, as such images rarely convey meaning and
rarely add anything useful to the document.
The crossorigin attribute is a CORS
settings attribute. Its purpose is to allow images from third-party sites that allow
cross-origin access to be used with <{canvas}>.
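A sketch of that use, assuming a hypothetical third-party host that sends CORS headers permitting access. The element IDs and the URL are illustrative only.

```html
<img id="photo" crossorigin="anonymous"
     src="https://images.example.com/photo.jpg" alt="A third-party photo">
<canvas id="c" width="300" height="150"></canvas>
<script>
  const img = document.getElementById('photo');
  const canvas = document.getElementById('c');
  function draw() {
    const ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
    // With crossorigin set, the image only loads if the server's CORS headers
    // allow it, so the canvas is not tainted and pixels can be read back.
    const pixels = ctx.getImageData(0, 0, 1, 1);
    console.log(pixels.data);
  }
  // Draw immediately if the image has already loaded, otherwise wait for load.
  if (img.complete && img.naturalWidth > 0) draw();
  else img.addEventListener('load', draw);
</script>
```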
The decoding attribute is an enumerated attribute.
It is an [=image decoding hint=] to request synchronous or asynchronous image decoding.
Valid values are "sync", "async", and "auto".
The [=missing value default=] and [=invalid value default=] are both "auto".
The referrerpolicy attribute is a referrer policy attribute.
Its purpose is to set the referrer policy used when fetching the image. [[!REFERRERPOLICY]]
An <{img}> element has a current request and a pending request.
The current request is initially set to a new image request.
The pending request is initially set to null.
The current request is usually referred to as the <{img}> element itself.
An image request has a state, current URL and image data.
An image request's state is one of the following:
Unavailable
The user agent hasn't obtained any image data,
or has obtained some or all of the image data but
hasn't yet decoded enough of the image to get the image dimensions.
Partially available
The user agent has obtained some of the image data and at least the image dimensions are
available.
Completely available
The user agent has obtained all of the image data and at least the image dimensions are
available.
Broken
The user agent has obtained all of the image data that it can, but it cannot even decode
the image enough to get the image dimensions (e.g., the image is corrupted, or the format is
not supported, or no data could be obtained).
The element's src, srcset, width, or sizes attributes are set, changed, or removed.
The element's src attribute is set to the same value as the previous value.
This must set the restart animations flag for the update the image data algorithm.
The element's crossorigin attribute's state is changed.
The element's parent is a <{picture}> element and a <{source}> element is inserted as a
previous sibling.
The element's parent is a <{picture}> element and a
<{source}> element that was a previous sibling is removed.
The element's parent is a <{picture}> element and a
<{source}> element that is a previous sibling has its
srcset,
sizes,
media
or type attributes set, changed, or removed.
Each <{img}> element has a last selected source, which must initially be
null.
Each image request has a current pixel density, which must initially be undefined.
When an <{img}> element has a current pixel density that is not 1.0, the
element's image data must be treated as if its resolution, in device pixels per CSS pixel, was
the current pixel density.
The image's density-corrected intrinsic width and height are the intrinsic width and height
after taking into account the current pixel density.
For example, given a screen with 96 CSS pixels per CSS inch, if the
current pixel density is 3.125, that means that there are 96 × 3.125 = 300 device
pixels per CSS inch, and thus if the image data is 300x600, it has intrinsic dimensions of
300 ÷ 3.125 = 96 CSS pixels by 600 ÷ 3.125 = 192 CSS pixels. With a current pixel density
of 2.0 (192 device pixels per CSS inch) and the same image data (300x600), the intrinsic
dimensions would be 150x300.
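In markup, the second case above could arise as in the sketch below. The file names are hypothetical, and each file is assumed to hold the pixel dimensions its name suggests.

```html
<!-- No width/height attributes and no CSS sizing. If the 2x candidate is chosen,
     its 300x600 image data lays out at 150x300 CSS pixels; the 1x fallback in
     src is assumed to be 150x300 already, so the layout size is the same. -->
<img srcset="photo-300x600.jpg 2x" src="photo-150x300.jpg"
     alt="A tall, narrow photograph">
```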
Each {{Document}} object must have a list of available images. Each image
in this list is identified by a tuple consisting of an absolute URL, a CORS
settings attribute mode, and, if the mode is not No CORS, an
[=concept/origin=].
Each image furthermore has an ignore higher-layer caching flag.
User agents may copy entries from one {{Document}}
object's list of available images to another at any time (e.g., when the
{{Document}} is created, user agents can add to it all the images that are loaded in
other {{Document}}s), but must not change the keys of entries copied in this way when
doing so, and must unset the ignore higher-layer caching flag for the copied entry.
User agents may also remove images from such lists at any time (e.g., to save
memory).
User agents must remove entries in the list of available images as appropriate
given higher-layer caching semantics for the resource (e.g., the HTTP Cache-Control
response header) when the ignore higher-layer caching flag is unset.
The list of available images is intended to enable synchronous
switching when changing the src attribute to a URL that has
previously been loaded, and to avoid re-downloading images in the same document even when they
don't allow caching per HTTP. It is not used to avoid re-downloading the same image while the
previous image is still loading.
For example, if a resource has the HTTP response header Cache-Control: must-revalidate,
the user agent would remove it from the list of available images but could keep the image data separately,
and use that if the server responds with a 304 Not Modified status.
Image decoding is the process which converts
an encoded image's data into a bitmap form for presentation. Implementations may
choose when and what to decode for the best user experience.
Image decoding is said to be synchronous if it prevents presentation of other
content until it is finished. Typically, this has an effect of atomically
presenting the image and any other content at the same time. However, this
presentation is delayed by the amount of time it takes to perform the decode.
Image decoding is said to be asynchronous if it does not prevent presentation of
other content. This has an effect of presenting non-image content faster. However,
the image content is missing on screen until the decode finishes. Once the decode
is finished, the screen is updated with the image.
In both synchronous and asynchronous decoding modes, the final content is
presented to screen after the same amount of time has elapsed. The main
difference is whether the user agent presents non-image content ahead of
presenting the final content.
In order to aid the user agent in deciding whether to perform synchronous or
asynchronous decode, the decoding attribute can
be set on <{img}> elements. The possible values of the
decoding attribute are the following
image decoding hint keywords:
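<table>
 <thead>
  <tr><th>Keyword</th><th>State</th><th>Brief description</th></tr>
 </thead>
 <tbody>
  <tr><td>sync</td><td>Sync</td><td>Indicates a preference to decode this image synchronously for atomic presentation with other content.</td></tr>
  <tr><td>async</td><td>Async</td><td>Indicates a preference to decode this image asynchronously to avoid delaying presentation of other content.</td></tr>
  <tr><td>auto</td><td>Auto</td><td>Indicates no preference in decoding mode (the default).</td></tr>
 </tbody>
</table>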
When decoding an image, the user agent should respect the preference indicated
by the decoding attribute's state. If the state indicated is auto, then the
user agent is free to choose any decoding behavior.
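The three keywords might be used as in this sketch. The file names and the rationale in each comment are illustrative assumptions, not recommendations from this specification.

```html
<!-- Large photo: prefer not to delay the rest of the page while it decodes. -->
<img src="hero-large.jpg" alt="City skyline at night" decoding="async">

<!-- Small icon that should appear at the same time as the content around it. -->
<img src="icon-save.png" alt="Save" decoding="sync">

<!-- No stated preference; the same as omitting the attribute. -->
<img src="avatar.png" alt="User avatar" decoding="auto">
```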
When the user agent is to update the image data of an <{img}> element,
optionally with the restart animations flag set,
it must run the following steps:
If another instance of this algorithm for this <{img}> element was started after this instance
(even if it aborted and is no longer running), then abort these steps.
If the element does not use srcset or picture and
it does not have a parent or it has a parent but it is not a <{picture}> element,
and it has a src attribute specified and
its value is not the empty string, let selected source be the value of the
element's src attribute, and selected pixel
density be 1.0. Otherwise, let selected source be null and selected pixel density be undefined.
If selected source is not null, run these substeps:
Parse selected source, relative
to the element's node document. If that is not successful, then
abort this inner set of steps. Otherwise, let urlString be the
resulting URL string.
Let key be a tuple consisting of urlString, the
<{img}> element's <{img/crossorigin}>
attribute's mode, and, if that mode is not No CORS,
the node document's [=concept/origin=].
⌛ If another instance of this algorithm for this <{img}> element was started
after this instance (even if it aborted and is no longer running), then abort these steps.
Only the last instance takes effect, to avoid multiple requests when, for
example, the src, srcset,
and crossorigin attributes are all set in
succession.
⌛ Let selected source and selected pixel density be the
URL and pixel density that results from selecting an image source,
respectively.
⌛ Parse selected source, relative
to the element's node document, and let urlString be the
resulting URL string. If
that is not successful, run these substeps:
This, unfortunately, can be used to perform a rudimentary port scan of the user's local
network (especially in conjunction with scripting, though scripting isn't actually
necessary to carry out such an attack). User agents may implement cross-origin access
control policies that are stricter than those described above to mitigate this attack, but
unfortunately such policies are typically not compatible with existing Web content.
Otherwise, if image request is the current request,
it is in the unavailable state,
and the user agent is able to determine that image request's image
is corrupted in some fatal way such that the image dimensions cannot be obtained,
set the current request's state to broken.
Each task that is queued by the networking task source while the image is being
fetched must update the presentation of the image, but as each new body part comes in, it must
replace the previous image. Once one body part has been completely decoded, the user agent
must set the <{img}> element to the completely available state and queue a task to fire a simple event named
load at the <{img}> element.
The progress and loadend events are not fired for
multipart/x-mixed-replace image streams.
If the resource type and data correspond to a supported image format, as described below
If the user agent is able to determine image request's image's width and height,
and image request is pending request,
set image request's state to partially available.
Otherwise, if the user agent is able to determine image request's image's width and height,
and image request is current request,
update the <{img}> element's presentation appropriately
and set image request's state to partially available.
Otherwise, if the user agent is able to determine that image request's image
is corrupted in some fatal way such that the image dimensions cannot be obtained,
and image request is current request,
abort the image request for image request,
fire a simple event named error at the <{img}> element,
fire a simple event named loadend at the <{img}> element,
and abort these steps.
That task, and each subsequent task, that is queued by the
networking task source while the image is being fetched, if image
request is the current request, must update the presentation of the image
appropriately (e.g., if the image is a progressive JPEG, each packet can improve the
resolution of the image).
Furthermore, the last task that is queued by the networking task source once the resource has been
fetched must additionally run these steps:
To fire a progress event or simple event named type at an element e,
depending on resource r, means to
fire a progress event named type at e if r is CORS-same-origin,
and otherwise fire a simple event named type at e.
While a user agent is running the above algorithm for an element x, there
must be a strong reference from the element's node document to the element x,
even if that element is not in its Document.
An img element is said to use srcset or picture
if it has a <{img/srcset}> attribute specified or if it has a parent that is a
picture element.
When an <{img}> element is in the completely available
state and the user agent can decode the media data without errors, then the
<{img}> element is said to be fully decodable.
Whether the image is fetched successfully or not (e.g., whether the response status was an
ok status) must be ignored when determining the image's type and whether it is a
valid image.
This allows servers to return images with error responses, and have them displayed.
The user agent should apply the image sniffing rules to determine the type of the image,
with the image's associated Content-Type headers giving the official type. If
these rules are not applied, then the type of the image must be the type given by the image's
associated Content-Type headers.
User agents must not support non-image resources with the <{img}> element (e.g., XML files whose
document element is an HTML element). User agents must not run executable code (e.g.,
scripts) embedded in the image resource. User agents must only display the first page of a
multipage resource (e.g., a PDF file). User agents must not allow the resource to act in an
interactive fashion, but should honor any animation in the resource.
This specification does not specify which image types are to be supported.
An <{img}> element is associated with a source set.
A source set is an ordered set of zero or more image sources
and a source size.
An image source is a [=url/URL=],
and optionally either a density descriptor, or a width descriptor.
A source size is a <source-size-value>.
When a source size has a unit relative to the viewport,
it must be interpreted relative to the <{img}> element's document's viewport.
Other units must be interpreted the same as in Media Queries. [[!MEDIAQ]]
When asked to select an image source for a given <{img}>
element el, user agents must do the following:
If el's source set is empty,
return null as the URL and undefined as the pixel density and abort these steps.
Otherwise, take el's source set
and let it be source set.
If an entry b in source set has the same associated density descriptor
as an earlier entry a in source set, then remove entry b.
Repeat this step until none of the entries in source set have the same associated density descriptor as an earlier entry.
In a user agent-specific manner,
choose one image source from source set.
Let this be selected source.
Return selected source and its associated pixel density.
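This example is non-normative. The selection steps above can be sketched as follows. It is a minimal sketch, assuming that every image source already carries a pixel density. The names `ImageSource`, `select_image_source` and `device_pixel_ratio` are hypothetical, and the final choice is user agent-specific, so the preference shown here is only one possible policy.

```python
from typing import List, NamedTuple, Optional, Tuple


class ImageSource(NamedTuple):
    url: str
    density: float  # pixel density, e.g. 2.0 for a "2x" descriptor


def select_image_source(
    source_set: List[ImageSource], device_pixel_ratio: float
) -> Tuple[Optional[str], Optional[float]]:
    # An empty source set yields no URL and no pixel density
    # (None stands in for both "null" and "undefined" here).
    if not source_set:
        return None, None

    # Drop any entry whose density repeats that of an earlier entry.
    seen = set()
    unique = []
    for source in source_set:
        if source.density not in seen:
            seen.add(source.density)
            unique.append(source)

    # ASSUMPTION: the choice is user agent-specific. This sketch prefers the
    # lowest density that still covers the device pixel ratio, and otherwise
    # falls back to the highest density available.
    adequate = [s for s in unique if s.density >= device_pixel_ratio]
    if adequate:
        chosen = min(adequate, key=lambda s: s.density)
    else:
        chosen = max(unique, key=lambda s: s.density)
    return chosen.url, chosen.density
```

A real user agent might also weigh network conditions or user preferences when choosing among the remaining entries.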
When asked to update the source set for a given <{img}> element el,
user agents must do the following:
If el has a parent node and that is a <{picture}> element,
let elements be an array containing el's parent node's child elements, retaining relative order.
Otherwise, let elements be an array containing only el.
If el has a width attribute, and parsing
that attribute's value using the rules for parsing dimension values doesn't generate
an error or a percentage value, then let width be the returned integer value.
Otherwise, let width be null.
Iterate through elements,
doing the following for each item child:
If child has a src attribute
whose value is not the empty string
and source set does not contain an
image source with a density descriptor value of 1,
and no image source with a width descriptor,
append child's src attribute value to source set.
Each <{img}> element independently considers
its previous sibling <{source}> elements
plus the <{img}> element itself
for selecting an image source, ignoring any other (invalid) elements,
including other <{img}> elements in the same <{picture}> element,
or <{source}> elements that are following siblings
of the relevant <{img}> element.
When asked to parse a srcset attribute from an element,
parse the value of the element's srcset attribute as follows:
Let input be the value passed to this algorithm.
Let position be a pointer into input,
initially pointing at the start of the string.
Splitting loop: Collect a sequence of characters
that are [=space characters=] or U+002C COMMA characters.
If any U+002C COMMA characters were collected, that is a [=parse error=].
If position is past the end of input,
return candidates and abort these steps.
Let c be the character at position.
Do the following depending on the value of state.
For the purpose of this step, "EOF" is a special character representing
that position is past the end of input.
If current descriptor is not empty,
append current descriptor to descriptors
and let current descriptor be the empty string.
Set state to after descriptor.
U+002C COMMA (,)
Advance position to the next character in input.
If current descriptor is not empty,
append current descriptor to descriptors.
Jump to the step labeled descriptor parser.
U+0028 LEFT PARENTHESIS (()
Append c to current descriptor.
Set state to in parens.
EOF
If current descriptor is not empty,
append current descriptor to descriptors.
Jump to the step labeled descriptor parser.
Anything else
Append c to current descriptor.
In parens
Do the following, depending on the value of c:
U+0029 RIGHT PARENTHESIS ())
Append c to current descriptor.
Set state to in descriptor.
EOF
Append current descriptor to descriptors.
Jump to the step labeled descriptor parser.
Set state to in descriptor.
Set position to the previous character in input.
Advance position to the next character in input.
Repeat this substep.
In order to be compatible with future additions,
this algorithm supports multiple descriptors and descriptors with parens.
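This example is non-normative. The descriptor tokenizer above can be sketched as a small state machine. The state names "in descriptor", "in parens" and "after descriptor" come from the text. The grouping of the individual rules under the first and last state is an assumption, since those headings are not shown above, and the function name `tokenize_descriptors` is hypothetical.

```python
from typing import List, Optional, Tuple

SPACE_CHARACTERS = " \t\n\f\r"


def tokenize_descriptors(text: str, position: int) -> Tuple[List[str], int]:
    """Collect descriptors starting at position (just after a URL).

    Returns the descriptors and the position at which the descriptor
    parser would take over.
    """
    descriptors: List[str] = []
    current = ""
    state = "in descriptor"
    while True:
        # None plays the role of the special "EOF" character.
        c: Optional[str] = text[position] if position < len(text) else None
        if state == "in descriptor":
            if c is None:
                if current:
                    descriptors.append(current)
                return descriptors, position
            if c in SPACE_CHARACTERS:
                if current:
                    descriptors.append(current)
                    current = ""
                state = "after descriptor"
            elif c == ",":
                # A comma ends this candidate; step past it first.
                position += 1
                if current:
                    descriptors.append(current)
                return descriptors, position
            elif c == "(":
                current += c
                state = "in parens"
            else:
                current += c
        elif state == "in parens":
            if c is None:
                descriptors.append(current)
                return descriptors, position
            current += c
            if c == ")":
                state = "in descriptor"
        else:  # "after descriptor"
            if c is None:
                return descriptors, position
            if c not in SPACE_CHARACTERS:
                # Reprocess this character in the "in descriptor" state.
                state = "in descriptor"
                position -= 1
        position += 1
```

Commas and spaces inside parentheses stay part of the current descriptor, which is what lets the algorithm tolerate future parenthesized descriptors.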
Descriptor parser: Let error be no.
Let width be absent.
Let density be absent.
Let future-compat-h be absent.
For each descriptor in descriptors,
run the appropriate set of steps from the following list:
If the descriptor consists of a valid non-negative integer
followed by a U+0077 LATIN SMALL LETTER W character
If the user agent does not support the sizes attribute,
let error be yes.
A conforming user agent will support the sizes attribute.
However, user agents typically implement and ship features in an incremental manner in practice.
If width and density
are not both absent,
then let error be yes.
If density is zero, the intrinsic dimensions will be infinite.
User agents are expected to have limits in how big images can be rendered,
which is allowed by the hardware limitations clause.
If the descriptor consists of a valid non-negative integer
followed by a U+0068 LATIN SMALL LETTER H character
This is a [=parse error=].
If future-compat-h and density
are not both absent,
then let error be yes.
If future-compat-h is not absent and width is absent,
let error be yes.
If error is still no,
then append a new image source to candidates
whose URL is url,
associated with a width width if not absent
and a pixel density density if not absent.
Otherwise, there is a [=parse error=].
Return to the step labeled splitting loop.
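This example is non-normative. The descriptor parser above can be sketched as follows. The rules for the width (`w`) and future-compat (`h`) descriptors follow the text. The handling of the density (`x`) descriptor, of zero values, and of unrecognized descriptors is not spelled out above and is filled in here as an assumption. The function name `parse_descriptors` and the `supports_sizes` flag are hypothetical.

```python
import re
from typing import List, Optional, Tuple

NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")
FLOATING_POINT = re.compile(r"-?(?:[0-9]+|[0-9]*\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def parse_descriptors(
    descriptors: List[str], supports_sizes: bool = True
) -> Optional[Tuple[Optional[int], Optional[float]]]:
    """Return (width, density) for a candidate, or None on error."""
    error = False
    width: Optional[int] = None
    density: Optional[float] = None
    future_compat_h: Optional[int] = None

    for descriptor in descriptors:
        value, unit = descriptor[:-1], descriptor[-1:]
        if unit == "w" and NON_NEGATIVE_INTEGER.fullmatch(value):
            if not supports_sizes:
                error = True
            if width is not None or density is not None:
                error = True
            # ASSUMPTION: a width of zero is treated as an error.
            if int(value) == 0:
                error = True
            else:
                width = int(value)
        elif unit == "x" and FLOATING_POINT.fullmatch(value):
            # ASSUMPTION: this branch is not shown in the text above.
            if (width is not None or density is not None
                    or future_compat_h is not None):
                error = True
            if float(value) < 0:
                error = True
            else:
                density = float(value)
        elif unit == "h" and NON_NEGATIVE_INTEGER.fullmatch(value):
            # The text above also flags this descriptor as a parse error.
            if future_compat_h is not None or density is not None:
                error = True
            # ASSUMPTION: a height of zero is treated as an error.
            if int(value) == 0:
                error = True
            else:
                future_compat_h = int(value)
        else:
            # ASSUMPTION: anything unrecognized rejects the candidate.
            error = True

    if future_compat_h is not None and width is None:
        error = True
    return None if error else (width, density)
```

A candidate with no descriptors at all is accepted with neither a width nor a density.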
When asked to parse a sizes attribute from an element,
parse a comma-separated list of component values
from the value of the element's sizes attribute
(or the empty string, if the attribute is absent),
and let unparsed sizes list be the result. [[!CSS-SYNTAX-3]]
For each unparsed size in unparsed sizes list:
Remove all consecutive <whitespace-token>s
from the end of unparsed size.
If unparsed size is now empty,
that is a [=parse error=];
continue to the next iteration of this algorithm.
If the last component value in unparsed size
is a valid non-negative <source-size-value>,
let size be its value
and remove the component value from unparsed size.
Any CSS function other than the calc() function is invalid.
Otherwise, there is a [=parse error=];
continue to the next iteration of this algorithm.
Remove all consecutive <whitespace-token>s
from the end of unparsed size.
If unparsed size is now empty,
return size and exit this algorithm.
If this was not the last item in unparsed sizes list,
that is a [=parse error=].
Parse the remaining component values in unparsed size
as a <media-condition>.
If it does not parse correctly,
or it does parse correctly but the <media-condition> evaluates to false,
continue to the next iteration of this algorithm. [[!MEDIAQ]]
Return size and exit this algorithm.
If the above algorithm exhausts unparsed sizes list without returning a
size value, follow these steps:
If width is not null, return a <length> with the value
width and the unit px.
Return 100vw.
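This example is non-normative. The overall shape of the sizes algorithm, including its fallback, can be sketched as follows. It is a deliberately simplified sketch: a plain comma split stands in for CSS component-value parsing (so commas inside functions are not handled), the size is assumed to be the last whitespace-separated token (so calc() expressions containing spaces are not handled), and only a small set of length units is recognized. The names `parse_sizes` and `media_matches` are hypothetical. The caller supplies `media_matches` to evaluate a media condition.

```python
import re
from typing import Callable, Optional

# ASSUMPTION: a small, illustrative subset of CSS length units.
_LENGTH = re.compile(r"(?:0|[0-9]*\.?[0-9]+(?:px|em|rem|ex|ch|vw|vh|vmin|vmax))")


def parse_sizes(
    sizes: str,
    media_matches: Callable[[str], bool],
    width: Optional[int] = None,
) -> str:
    for unparsed_size in sizes.split(","):
        unparsed_size = unparsed_size.strip()
        if not unparsed_size:
            continue  # parse error: nothing left after trimming whitespace
        parts = unparsed_size.rsplit(None, 1)
        size = parts[-1]
        if not _LENGTH.fullmatch(size):
            continue  # parse error: last component is not a valid size
        if len(parts) == 1:
            return size  # bare size with no media condition
        if media_matches(parts[0]):
            return size
    # No entry produced a size: fall back to the width attribute, then 100vw.
    if width is not None:
        return f"{width}px"
    return "100vw"
```

With a matcher that accepts only `(max-width: 600px)`, the value `(max-width: 600px) 100vw, 50vw` yields `100vw`, and a value whose only condition fails yields the fallback.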
A [=parse error=] for the algorithms above
indicates a non-fatal mismatch between input and requirements.
User agents are encouraged to expose [=parse error=]s somehow.
While a valid source size list only contains a bare <source-size-value>
(without an accompanying <media-condition>)
as the last entry in the <source-size-list>,
the parsing algorithm technically allows such at any point in the list,
and will accept it immediately as the size
if the preceding entries in the list weren't used.
This is to enable future extensions,
and protect against simple author errors such as a final trailing comma.
An image source can have a density descriptor,
a width descriptor,
or no descriptor at all accompanying its URL.
Normalizing a source set gives every image source a density descriptor.
When asked to normalize the source densities of a source set source set,
the user agent must do the following:
Otherwise, give the image source a density descriptor of 1x.
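This example is non-normative. Normalization can be sketched as follows. Only the final rule (a source with no descriptor gets a density of 1x) is stated in the text above. Leaving an existing density untouched, and deriving a density by dividing a width descriptor by the source size in CSS pixels, are assumptions made for this sketch. The names `ImageSource`, `normalize_source_densities` and `source_size_px` are hypothetical, and the source size is assumed to be positive.

```python
from typing import List, NamedTuple, Optional


class ImageSource(NamedTuple):
    url: str
    density: Optional[float] = None  # from a density descriptor such as "2x"
    width: Optional[int] = None      # from a width descriptor such as "800w"


def normalize_source_densities(
    sources: List[ImageSource], source_size_px: float
) -> List[ImageSource]:
    normalized = []
    for source in sources:
        if source.density is not None:
            # ASSUMPTION: an existing density descriptor is kept as it is.
            normalized.append(source)
        elif source.width is not None:
            # ASSUMPTION: density = width descriptor / source size in CSS px.
            normalized.append(
                source._replace(density=source.width / source_size_px)
            )
        else:
            # No descriptor at all: give the source a density of 1x.
            normalized.append(source._replace(density=1.0))
    return normalized
```

After this step every image source can be compared on pixel density alone.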
The user agent may at any time run the following algorithm to update an img
element's image in order to react to changes in the environment. (User agents are not
required to ever run this algorithm; for example, if the user is not looking at the page any
more, the user agent might want to wait until the user has returned to the page before determining
which image to use, in case the environment changes again in the meantime.)
⌛ Parse selected source,
relative to the element's node document, and let urlString be the
resulting URL string.
If that is not successful, abort these steps.
⌛ Let corsAttributeState be the state of the element's
crossorigin content attribute.
⌛ Let origin be the [=concept/origin=] of the <{img}> element's
node document.
If the list of available images contains an entry for key,
then set image request's image data to that of the entry.
Continue to the next step.
Otherwise, run these substeps:
If response's unsafe response is a network error or
if the image format is unsupported (as determined by applying the image sniffing rules, again as mentioned earlier),
or if the user agent is able to determine that image request's image is corrupted in
some fatal way such that the image dimensions cannot be obtained, or if the resource type is
multipart/x-mixed-replace, then let pending request be null and abort
these steps.
Otherwise, response's unsafe response is image
request's image data. It can be either
CORS-same-origin or CORS-cross-origin; this affects the
[=concept/origin=] of the image itself (e.g., when used on a
canvas).
If the src attribute is set and the alt attribute is set to the empty string (e.g. alt="" or alt without a set value)
The image is either decorative or supplemental to the rest of the content, redundant with
some other information in the document.
If the image is available and the user agent is configured
to display that image, then the element represents the element's image data.
Otherwise, the element represents nothing, and may be omitted completely from
the rendering. User agents may provide the user with a notification that an image is present but
has been omitted from the rendering.
If the src attribute is set and the alt attribute is set to a value that isn't empty
The image is a key part of the content; the alt attribute
gives a textual equivalent or replacement for the image.
If the image is available and the user agent is configured
to display that image, then the element represents the element's image data.
Otherwise, the element represents the text given by the alt attribute. User agents may provide the user with a notification
that an image is present but has been omitted from the rendering.
If the src attribute is set and the alt attribute is not
There is no textual equivalent of the image available.
If the image is available and the user agent is configured
to display that image, then the element represents the element's image data.
Otherwise, the user agent should display some sort of indicator that there is an image that
is not being rendered, and may, if requested by the user, or if so configured, or when required
to provide contextual information in response to navigation, provide caption information for the
image, derived as follows:
If the image is a descendant of a <{figure}> element that has a child
<{figcaption}> element, and, ignoring the <{figcaption}> element and its
descendants, the <{figure}> element has no {{Text}} node descendants other
than inter-element white space, and no embedded content descendant
other than the <{img}> element, then the contents of the first such
<{figcaption}> element are the caption information; abort these steps.
There is no caption information.
If the src attribute is not set and either the alt attribute is set to the empty string or the alt attribute is not set at all
The element represents nothing.
Otherwise
The element represents the text given by the alt attribute.
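This example is non-normative. The cases above can be summarized as a decision function. It is a minimal sketch with hypothetical names (`img_represents`, `image_displayable`), where `image_displayable` means that the image is available and the user agent is configured to display it, and `alt=None` means the alt attribute is not set. For an element with no src attribute, the sketch assumes that an empty or missing alt attribute means the element represents nothing.

```python
from typing import Optional


def img_represents(src_set: bool, alt: Optional[str], image_displayable: bool) -> str:
    """Return a label for what an img element represents."""
    if src_set:
        if image_displayable:
            return "image data"
        if alt is None:
            # No textual equivalent: show an indicator that an image
            # is present but not being rendered.
            return "missing-image indicator"
        if alt == "":
            # Decorative or supplemental image: may be omitted entirely.
            return "nothing"
        return "alt text"
    # ASSUMPTION: with no src, an empty or missing alt represents nothing.
    if not alt:
        return "nothing"
    return "alt text"
```

The returned labels are illustrative only. A user agent may additionally notify the user that an image was omitted.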
The alt attribute does not represent advisory information.
User agents must not present the contents of the alt attribute
in the same way as content of the <{global/title}> attribute.
User agents may always provide the user with the option to display any image, or to prevent any
image from being displayed. User agents may also apply heuristics to help the user make use of the
image when the user is unable to see it, e.g., due to a visual disability or because they are using
a text terminal with no graphics capabilities. Such heuristics could include, for instance,
optical character recognition (OCR) of text found within the image.
In the case where an <{img}> without an alt attribute is the child of a <{figure}>
element with a non-empty <{figcaption}> element, the image's presence should be minimally conveyed
to a user by Assistive Technology, typically by identifying the image role.
The contents of <{img}> elements, if any, are ignored for the purposes of
rendering.
The usemap attribute,
if present, can indicate that the image has an associated
image map.
The ismap
attribute, when used on an element that is a descendant of an
<{a}> element with an <{a/href}> attribute, indicates by its
presence that the element provides access to a server-side image
map. This affects how events are handled on the corresponding
<{a}> element.
Users who do not have a pointing device, or who cannot see
the image referred to will generally not be able to use a server-side image map successfully.
Authors should use another mechanism, such as a "client-side" image map
made using the <{img/usemap}> attribute, wherever possible.
The ismap attribute is a
boolean attribute. The attribute must not be specified
on an element that does not have an ancestor <{a}> element
with an <{a/href}> attribute.
In this example the user is asked to identify a person in an image by
clicking on it. The server could respond to a click by confirming the
correct position or by asking the user to try again.
Where's Wally? Click on him!
The coordinates where the user clicks are sent to the server with the
request for the resource referenced by the <{a}> element's
href attribute.
The definition of the <{a}> element explains how the coordinates of the
click event on an <{img}> element with ismap
attribute inside an <{a}> element are communicated to the server.
The usemap and <{img/ismap}> attributes can result in confusing behavior
when used together with <{source}> elements with the media attribute specified
in a <{picture}> element.
The <{img}> element supports dimension attributes.
The alt, src, srcset and sizes IDL attributes must reflect the
respective content attributes of the same name.
The crossOrigin IDL attribute must
reflect the crossorigin content attribute, limited to only known values.
The useMap IDL attribute must
reflect the usemap content attribute.
The isMap IDL attribute must reflect
the <{img/ismap}> content attribute.
The referrerPolicy IDL attribute must
reflect the <{img/referrerpolicy}> content attribute, limited to only known values.
The longDesc IDL attribute is defined in [[!html-longdesc]]. The IDL attribute must reflect
the <{img/longdesc}> content attribute.
image . width [ = value ]
image . height [ = value ]
These attributes return the actual rendered dimensions of the
image, or zero if the dimensions are not known.
They can be set, to change the corresponding content
attributes.
image . naturalWidth
image . naturalHeight
These attributes return the intrinsic dimensions of the image,
or zero if the dimensions are not known.
image . complete
Returns true if the image has been completely downloaded or if
no image is specified; otherwise, returns false.
Returns the [=image decoding hint=] set for this image.
image . decode()
This method causes the user agent to decode the image
[=in parallel=], returning a promise that fulfills when decoding is complete.
The promise will be rejected with an "{{EncodingError}}" {{DOMException}}
if the image cannot be decoded.
image = new Image( [ width [, height ] ] )
Returns a new <{img}> element, with the width and height attributes set to the values
passed in the relevant arguments, if applicable.
The IDL attributes width and height must return the rendered width and height of the
image, in CSS pixels, if the image is being rendered, and is being rendered to a
visual medium; or else the density-corrected intrinsic width and height
of the image, in CSS pixels, if the image has intrinsic dimensions and is
available but not being rendered to a visual medium; or else 0, if
the image is not available or does not have
intrinsic dimensions. [[!CSS-2015]]
On setting, they must act as if they reflected the respective
content attributes of the same name.
The IDL attributes naturalWidth and
naturalHeight must return
the density-corrected intrinsic width and height
of the image, in CSS pixels, if the image has intrinsic dimensions and is
available, or else 0. [[!CSS-2015]]
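This example is non-normative. The getter rules above can be sketched for the width case (height is analogous). The function and parameter names are hypothetical: `rendered_width` is `None` unless the image is being rendered to a visual medium, and `intrinsic_width` is the density-corrected intrinsic width, or `None` if the image has no intrinsic dimensions.

```python
from typing import Optional


def width_getter(
    rendered_width: Optional[int],
    available: bool,
    intrinsic_width: Optional[int],
) -> int:
    # Rendered to a visual medium: report the rendered width in CSS pixels.
    if rendered_width is not None:
        return rendered_width
    # Available but not rendered: report the density-corrected intrinsic width.
    if available and intrinsic_width is not None:
        return intrinsic_width
    # Not available, or no intrinsic dimensions.
    return 0


def natural_width_getter(available: bool, intrinsic_width: Optional[int]) -> int:
    # naturalWidth ignores rendering and depends only on the image itself.
    if available and intrinsic_width is not None:
        return intrinsic_width
    return 0
```

The two getters differ only when the image is being rendered at a size other than its intrinsic size.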
The IDL attribute complete must return true if
any of the following conditions is true:
Both the src attribute and the srcset attribute are omitted.
The srcset attribute is omitted and the src attribute's value is the empty string.
The value of complete can thus change while
a [=concept/script=] is executing.
The currentSrc IDL attribute
must return the <{img}> element's current request's current URL.
The decode() method, when invoked,
must perform the following steps:
If any of the following conditions are true about this <{img}> element:
its [=node document=] is not an [=active document=];
Decode the image.
If decoding does not need to be performed for this image, resolve promise with
undefined. An example of this case would be for vector graphics.
If decoding fails, reject promise with an "{{EncodingError}}" {{DOMException}}.
An example of this would be corrupt image data or attempting to decode an
image encoded by an unsupported codec.
If the decoding process completes successfully, resolve promise with undefined.
User agents should ensure that the decoded media image is readily available
until at least the end of the next successful update the rendering step in the
event loop.
Implementations may choose to discard the decoded image data in the event it
is difficult to keep the decoded copy ready. Low memory situations or large
images would be cases where implementations may choose to do this.
Implementations are expected to treat animated images that have all frames
loaded as completely available.
Return promise.
A constructor is provided for creating HTMLImageElement objects (in addition to
the factory methods from DOM such as createElement()): Image(width, height).
When invoked as a constructor, this must return a new HTMLImageElement object (a new
<{img}> element). If the width argument is present, the new object's
width content attribute must be set to width. If the height argument is also present, the new object's
height content attribute must be set to height.
The element's node document must be the active document of the
browsing context of the Window object on which the interface object of
the invoked constructor is found.
Requirements for providing text to act as an alternative for images
Text alternatives, [[WCAG20]]
are a primary way of making visual information accessible, because they can be rendered through many
sensory modalities (for example, visual, auditory or tactile) to match the needs of the user. Providing
text alternatives allows the information to be rendered in a variety of ways by a variety of user agents.
For example, a person who cannot see a picture can hear the text alternative read aloud using synthesized speech.
The <{img/alt}> attribute on images is a very important accessibility attribute.
Authoring useful alt attribute content requires the author to
carefully consider the context in which the image appears and the function that
image may have in that context.
The <{img/longdesc}> attribute on images is likely to be read far less often by users
and is necessary for far fewer images. Nevertheless it provides an important way for
users who cannot see an image or cannot see it clearly, and user agents that cannot automatically process images,
to understand what it shows. The <{img/longdesc}> attribute's use cases are more fully described in [[!html-longdesc]].
The guidance included here addresses the most common ways authors use images.
Additional guidance and techniques are available in Resources on Alternative Text for Images.
Examples of scenarios where users benefit from text alternatives for images
They have a very slow connection and are browsing with images disabled.
They have a vision impairment and use text to speech software.
They have a cognitive impairment and use text to speech software.
They are using a text-only browser.
They are listening to the page being read out by a voice Web browser.
They have images disabled to save on download costs.
They have problems loading images or the source of an image is wrong.
General guidelines
Except where otherwise specified, the alt attribute must be specified and its value must not be empty;
the value must be an appropriate functional replacement for the image. The specific requirements for the
alt attribute content depend on the image's function in the page, as described in the following sections.
To determine an appropriate text alternative it is important to think about why an image is being included in a page.
What is its purpose? Thinking like this will help you to understand what is important about the image for the
intended audience. Every image has a reason for being on a page: it provides useful information, performs a
function, labels an interactive element, enhances aesthetics or is purely decorative. Knowing what the image
is for makes writing an appropriate text alternative easier.
A link or button containing nothing but an image
When an <{a}> element that is a hyperlink, or a <{button}> element, has no text content
but contains one or more images, include text in the alt attribute(s) that together convey the purpose of the link or button.
In this example, a portion of an editor interface is displayed. Each button has an icon representing an action a user can take on content they are editing. For users who cannot view the images, the action names are included within the alt attributes of the images:
In this example, a link contains a logo. The link points to the W3C web site from an external site.
The text alternative is a brief description of the link target.
This example is the same as the previous example, except that the link is on the W3C web site.
The text alternative is a brief description of the link target.
Depending on the context in which an image of a logo is used it could be appropriate to provide an indication, as part of the text alternative, that the image is a logo. Refer to section [[#logos-insignia-flags-or-emblems]].
In this example, a link contains a print preview icon. The link points to a version of the page with a
print stylesheet applied. The text alternative is a brief description of the link target.
In this example, a button contains a search icon. The button submits a search form. The text alternative is a
brief description of what the button does.
In this example, a company logo for the PIP Corporation has been split into the following two images,
the first containing the word PIP and the second with the abbreviated word CO. The images are the
sole content of a link to the PIPCO home page. In this case a brief description of the link target is provided.
As the images are presented to the user as a single entity the text alternative PIP CO home is in the
alt attribute of the first image.
Users can benefit when content is presented in graphical form, for example as a
flowchart, a diagram, a graph, or a map showing directions. Users who are unable to view the image also benefit when
content presented in a graphical form is provided in a text-based format. Software agents that process text content,
but cannot automatically process images (e.g. translation services, many search engines), also benefit from a
text-based description.
In the following example we have an image of a pie chart, with text in the alt
attribute representing the data shown in the pie chart:
In the case where an image repeats the previous paragraph in graphical form, the
<{img/alt}> attribute content labels the image and the <{img/longdesc}> attribute identifies it as a description.
According to a recent study Firefox has a 40% browser share,
Internet Explorer has 25%, Chrome has 25%, Safari has 6% and Opera has 4%.
It can be seen that when the image is not available, for example because the src
attribute value is incorrect, the text alternative provides the user with a brief description of
the image content:
In cases where the text alternative is lengthy, more than a sentence or two, or would benefit
from the use of structured markup, provide a brief description or label using the
alt attribute, and an associated text alternative.
Here's an example of a flowchart image with a short text alternative
included in the alt attribute. In this case the text alternative is a description of the link target,
as the image is the sole content of a link. The link points to a description, within the same document, of the
process represented in the flowchart.
...
...
Dealing with a broken lamp
Check if it's plugged in, if not, plug it in.
If it still doesn't work; check if the bulb is burned out. If it is, replace the bulb.
If it still doesn't work; buy a new lamp.
In this example, there is an image of a chart. It would be inappropriate to provide the information depicted in
the chart as a plain text alternative in an alt attribute as the information is a data set. Instead a
structured text alternative is provided below the image in the form of a data table using the data that is represented
in the chart image.
Indications of the highest and lowest rainfall for each season have been included in the
table, so trends easily identified in the chart are also available in the data table.
Average rainfall in millimetres by country and season.
          United Kingdom    Japan    Australia
Spring    5.3 (highest)     2.4      2 (lowest)
Summer    4.5 (highest)     3.4      2 (lowest)
Autumn    3.5 (highest)     1.8      1.5 (lowest)
Winter    1.5 (highest)     1.2      1 (lowest)
Rainfall Data
Rainfall in millimetres by Country and Season.
          UK               Japan    Australia
Spring    5.5 (highest)    2.4      2 (lowest)
Summer    4.5 (highest)    3.4      2 (lowest)
Autumn    3.5 (highest)    1.8      1.5 (lowest)
Winter    1.5 (highest)    1.2      1 (lowest)
The <{figure}> element is used to group the Bar Chart image and data table.
The <{figcaption}> element provides a caption for the grouped content.
For any of the examples in this section the details and summary
elements could be used so that the text descriptions for the images are only displayed on demand:
Dealing with a broken lamp
Check if it's plugged in, if not, plug it in.
If it still doesn't work; check if the bulb is burned out. If it is, replace the bulb.
If it still doesn't work; buy a new lamp.
The <{details}> and <{summary}> elements are not currently well supported by
browsers. Until they are, authors who use them will need scripting to
provide the functionality. A number of scripted polyfills and scripted custom
controls, available in popular JavaScript UI widget libraries, provide similar
functionality.
Images of text
Sometimes, an image only contains text, and the purpose of the image
is to display text using visual effects and/or fonts. It is strongly
recommended that text styled using CSS be used, but if this is not possible, provide
the same text in the alt attribute as is in the image.
This example shows an image of the text "Get Happy!" written in a fancy multicolored freehand
style. The image makes up the content of a heading. In this case the text alternative for the
image is "Get Happy!".
In this example we have an advertising image consisting of text. The phrase "The BIG sale" is
repeated 3 times, each time smaller and fainter, and the last line reads "...ends Friday".
In the context of use, as an advertisement, it is recommended that the image's text alternative only include the text "The BIG sale"
once, as the repetition is for visual effect and repeating the text for users who cannot view
the image is unnecessary and could be confusing.
In situations where there is also a photo or other graphic along with the image of text,
ensure that the words in the image text are included in the text alternative, along with any other description
of the image that conveys meaning to users who can view the image, so the information is also
available to users who cannot view the image.
When an image is used to represent a character that cannot otherwise be represented in Unicode,
for example gaiji, itaiji, or new characters such as novel currency symbols, the alternative text
should be a more conventional way of writing the same thing, e.g., using the phonetic hiragana or
katakana to give the character's pronunciation.
In this example from 1997, a new-fangled currency symbol that looks like a curly E with two
bars in the middle instead of one is represented using an image. The alternative text gives the
character's pronunciation.
Only 5.99!
Only 5.99!
An image should not be used if Unicode characters would serve an identical purpose. Only when
the text cannot be directly represented using Unicode, e.g., because of decorations or because the
character is not in the Unicode character set (as in the case of gaiji), would an image be
appropriate.
If an author is tempted to use an image because their default system font does not
support a given character, then Web Fonts are a better solution than images.
An illuminated manuscript might use graphics for some of its letters. The text alternative in
such a situation is just the character that the image represents.
nce upon a time and a long long time ago...
<p><img src="initials/fancyO.png" alt="O">nce upon a time and a long long time ago...
Where the design of the illuminated letter is important, the primary text alternative
is the character that the image represents, and <{img/longdesc}> can provide a link to a more detailed description:
nce upon a time and a long long time ago...
<p><img src="initials/story-o.jpg" alt="O" longdesc="letters/story-0.html">nce
upon a time and a long long time ago...
Images that include text
Sometimes, an image consists of a graphic such as a chart and associated text.
In this case it is recommended that the text in the image is included in the text alternative.
Consider an image containing a pie chart and associated text. It is recommended wherever
possible to provide any associated text as text, not an image of text.
If this is not possible include the text in the text alternative along with the pertinent
information conveyed in the image.
<p><img src="figure1.gif" alt="Figure 1. Distribution of Articles by Journal Category.
Pie chart: Language=68%, Education=14% and Science=18%."></p>
Here's another example of the same pie chart image,
showing a short text alternative included in the alt attribute
and a longer text alternative in text. The figure and figcaption
elements are used to associate the longer text alternative with the image. The alt attribute is used
to label the image.
Figure 1. Distribution of Articles by Journal Category. Pie chart: Language=68%, Education=14% and Science=18%.
The advantage of this method over the previous example is that the text alternative
is available to all users at all times. It also allows structured markup to be used in the text
alternative, whereas a text alternative provided using the alt attribute does not.
Images that enhance the themes or subject matter of the page content
An image that isn't discussed directly by the surrounding text but still has some relevance can
be included in a page using the <{img}> element. Such images are more than mere decoration: they
may augment the themes or subject matter of the page content and so still form part of the
content. In these cases, it is recommended that a text alternative be provided.
Here is an example of an image closely related to the subject matter of the page content
but not directly discussed: an image of a painting inspired by a poem, on a page reciting that poem.
The following snippet shows an example. The image is a painting titled "The Lady of Shalott". It is
inspired by the poem and its subject matter is derived from the poem. Therefore it is strongly
recommended that a text alternative is provided. There is a short description of the content of
the image in the alt attribute and
a link below the image to a longer description located at the bottom of the document. At the end
of the longer description there is also a link to further information about the painting.
<header>
<h1>The Lady of Shalott</h1>
<p>A poem by Alfred Lord Tennyson</p>
</header>
<img src="shalott.jpg" alt="Painting - a young woman with long hair, sitting in a wooden boat. Full description below." longdesc="#des">
<p><a href="#des">Description of the painting</a>.</p>
<!-- Full Recitation of Alfred, Lord Tennyson's Poem. -->
...
...
...
<p id="des">The woman in the painting is wearing a flowing white dress. A large piece of intricately patterned fabric is draped over the side. In her right hand she holds the chain mooring the boat. Her expression is mournful. She stares at a crucifix lying in front of her. Beside it are three candles. Two have blown out.
<a href="https://www.tate.org.uk/art/artworks/waterhouse-the-lady-of-shalott-n01543">Further information about the painting</a>.</p>
This example illustrates the provision of a text alternative identifying an image as a
photo of the main subject of a page.
Robin Berjon
What more needs to be said?
It is not always easy to write a useful text alternative for an image. Another option is to provide a link to a
description or further information about the image when one is available.
In this example of the same image, there is a short text alternative included in the alt attribute, and
there is a link after the image. The link points to a page containing information about the painting.
The Lady of Shalott
A poem by Alfred Lord Tennyson.
About this painting
Full recitation of Alfred, Lord Tennyson's poem.
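A sketch of how this variant could be marked up, assuming the same `shalott.jpg` file and reusing the link target given in the earlier example. The `alt` wording here is illustrative:

```html
<header>
  <h1>The Lady of Shalott</h1>
  <p>A poem by Alfred Lord Tennyson.</p>
</header>
<!-- Short text alternative in alt; the link after the image leads to more detail. -->
<img src="shalott.jpg" alt="Painting - a young woman with long hair, sitting in a wooden boat.">
<p><a href="https://www.tate.org.uk/art/artworks/waterhouse-the-lady-of-shalott-n01543">About this painting</a></p>
<!-- Full recitation of Alfred, Lord Tennyson's poem. -->
```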
A graphical representation of some of the surrounding text
In many cases, the image is actually just supplementary, and its presence merely reinforces the
surrounding text. In these cases, the alt attribute must be
present but its value must be unset or the empty string.
In general, an image falls into this category if removing the image doesn't make the page any
less useful, but including the image makes it a lot easier for users of visual browsers to
understand the concept.
This example includes a screenshot of part of a text editor displaying the file described in
the instruction:
In the text file, add SleepMode=1 under [options], then save and close.
A purely decorative image that doesn't add any information
Purely decorative images are visual enhancements, decorations or embellishments that provide no
function or information beyond aesthetics to users who can view the images.
Mark up purely decorative images so they can be ignored by assistive technology by using an
alt attribute with no value (e.g. alt="" or alt with
no set value). While it is acceptable to include decorative images inline,
it is recommended that purely decorative images be included using CSS.
Here's an example of an image being used as a decorative banner for a person's blog.
The image offers no information, so an alt attribute with no set value
is used.
Clara's Blog
Welcome to my blog...
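A minimal sketch of such a banner, assuming a hypothetical `banner.gif` file. The empty `alt` attribute is what lets assistive technology skip the image:

```html
<header>
  <!-- Decorative banner: empty alt, so it is ignored by assistive technology. -->
  <img src="banner.gif" alt="">
  <h1>Clara's Blog</h1>
</header>
<p>Welcome to my blog...</p>
```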
Inline images
When images are used inline as part of the flow of text in a sentence, provide a word or
phrase as a text alternative which makes sense in the context of the sentence it is a part of.
I [image] you.
My [image] breaks.
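The two sentences above can be sketched as follows, assuming the inline image is a heart pictogram stored in a hypothetical `heart.png`. The same image gets a different text alternative in each sentence, because the word it stands for differs:

```html
<!-- Reads as "I love you." -->
<p>I <img src="heart.png" alt="love"> you.</p>
<!-- Reads as "My heart breaks." -->
<p>My <img src="heart.png" alt="heart"> breaks.</p>
```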
A group of images that form a single larger picture with no links
When a picture has been sliced into smaller image files that are then displayed
together to form the complete picture again, include a text alternative for one
of the images using the alt attribute as per the relevant
guidance for the picture as a whole, and then include an empty alt
attribute on the other images.
In this example, a picture representing a company logo for the PIP Corporation
has been split into two pieces, the first containing the letters "PIP" and the second with
the word "CO". The text alternative PIP CO is in the alt attribute of the first image.
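A sketch of the two-slice logo, with hypothetical file names `pip.gif` and `co.gif`:

```html
<!-- No white space between the slices, so they render as one picture.
     The first slice carries the text alternative for the whole logo. -->
<img src="pip.gif" alt="PIP CO"><img src="co.gif" alt="">
```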
In the following example, a rating is shown as three filled
stars and two empty stars. While the text alternative could have
been "★★★☆☆", the author has
instead decided to more helpfully give the rating in the form "3
out of 5". That is the text alternative of the first image, and the
rest have empty alt attributes.
Rating:
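One way the rating could be marked up, assuming hypothetical `star-filled.png` and `star-empty.png` files:

```html
<p>Rating:
  <!-- Only the first star carries the text alternative for the whole group. -->
  <img src="star-filled.png" alt="3 out of 5"><img src="star-filled.png" alt=""><img src="star-filled.png" alt=""><img src="star-empty.png" alt=""><img src="star-empty.png" alt="">
</p>
```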
Image maps
If an <{img}> element has a usemap attribute which references a <{map}> element containing
<{area}> elements that have href attributes, the img is considered to be interactive content.
In such cases, always provide a text alternative for the image using the alt attribute.
Consider the following image, which is a map of Katoomba.
It has two interactive regions corresponding to the areas of North and South Katoomba:
The text alternative is a brief description of the image. The alt attribute on each
of the <{area}> elements provides text describing the content of the target page of each linked region:
<p>View houses for sale in North Katoomba or South Katoomba:</p>
<p><img src="imagemap.png" width="209" alt="Map of Katoomba" height="249" usemap="#Map">
<map name="Map">
<area shape="poly" coords="78,124,124,10,189,29,173,93,168,132,136,151,110,130"
href="north.html" alt="Houses in North Katoomba">
<area shape="poly" coords="66,63,80,135,106,138,137,154,167,137,175,133,144,240,49,223,17,137,17,61"
alt="Houses in South Katoomba" href="south.html">
</map>
A group of images that form a single larger picture with links
Sometimes, when you create a composite picture from multiple images, you may wish to
link one or more of the images. Provide an alt attribute
for each linked image to describe the purpose of the link.
In the following example, a composite picture is used to represent a "crocoduck", a fictional creature which
defies evolutionary principles by being part crocodile and part duck. You are asked to interact with the
crocoduck, but you need to exercise caution...
The crocoduck
You encounter a strange creature called a "crocoduck".
The creature seems angry! Perhaps some friendly stroking will help to calm
it, but be careful not to stroke any crocodile parts. This would just enrage
the beast further.
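A sketch of the linked slices for this example. The file names, query strings and `alt` wording are assumptions made for illustration; what matters is that each `alt` describes the purpose of its link:

```html
<h1>The crocoduck</h1>
<p>You encounter a strange creature called a "crocoduck".</p>
<!-- The tags are kept adjacent so no white space separates the two slices. -->
<a href="?stroke=head"><img src="crocoduck1.png" alt="Stroke the crocodile's angry, chomping head"></a><a
   href="?stroke=body"><img src="crocoduck2.png" alt="Stroke the duck's soft, feathery body"></a>
```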
Images of Pictures
Images of pictures or graphics include visual representations of objects, people, scenes, abstractions, etc.
This non-text content [[WCAG20]] can convey a significant amount of
information visually or provide a specific sensory experience [[WCAG20]] to
a sighted person. Examples include photographs, paintings, drawings and artwork.
An appropriate text alternative for a picture is a brief description, or name [[WCAG20]]. As in all text alternative authoring decisions, writing suitable text alternatives for pictures requires
human judgment. The appropriate text depends on the context in which the image is used and on the page author's writing style. Therefore,
there is no single "right" or "correct" piece of alt text for any particular image. In addition to providing a short text
alternative that gives a brief description of the non-text content, it may be useful to provide supplemental content through another means when
appropriate.
This first example shows an image uploaded to a photo-sharing site. The photo is of a cat, sitting in the bath. The image has a
text alternative provided using the <{img}> element's alt attribute. It also has a caption provided by including
the <{img}> element in a <{figure}> element and using a <{figcaption}> element to identify the caption text.
Lola prefers a bath to a shower.
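A minimal sketch of this pattern, assuming a hypothetical `lola.jpg` file and an illustrative `alt` value. The `alt` attribute describes what the photo shows, while the <{figcaption}> carries the caption:

```html
<figure>
  <img src="lola.jpg" alt="A cat sitting in the bath.">
  <figcaption>Lola prefers a bath to a shower.</figcaption>
</figure>
```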
This example is of an image that defies a complete description, as the subject of the image is open to interpretation.
The image has a text alternative in the alt attribute which gives users who cannot view the image a sense
of what the image is. It also has a caption provided by including the <{img}> element in a figure
element and using a <{figcaption}> element to identify the caption text.
The first of the ten cards in the Rorschach test.
Webcam images
Webcam images are static images that are automatically updated periodically. Typically the images are
from a fixed viewpoint. The images may update on the page automatically as each new image is uploaded from
the camera, or the user may be required to refresh the page to view an updated image. Examples include traffic
and weather cameras.
This example is fairly typical; the title and a time stamp are included in the image, automatically generated
by the webcam software. It would be better if the text information were not included in the image, but as it is part
of the image, include it as part of the text alternative. A caption is also provided using the <{figure}>
and <{figcaption}> elements. As the image is provided to give a visual indication of the current weather near a building,
a link to a local weather forecast is provided, because with automatically generated and uploaded webcam images it may be impractical to
provide such information as a text alternative.
The text of the alt attribute includes a prose version of the timestamp, designed to make the text more
understandable when announced by text-to-speech software. The text alternative also includes a description of those aspects
of what can be seen in the image that are unchanging, even though weather conditions and time of day change.
<figure>
<img src="webcam1.jpg" alt="Sopwith house weather cam. Taken on the 21/04/10 at 11:51 and 34 seconds. In the foreground are the safety rails on the flat part of the roof. Nearby there are low-rise industrial buildings, beyond are blocks of flats. In the distance there's a church steeple.">
<figcaption>View from Sopwith house, looking towards north Kingston. This image is updated every hour.</figcaption>
</figure>
<p>View the <a href="https://www.bbc.co.uk/weather/0/6690829">latest weather details</a> for Kingston upon Thames.</p>
When a text alternative is not available at the time of publication
In some cases an image is included in a published document, but the author is unable to provide an appropriate text alternative.
In such cases the minimum requirement is to provide a caption for the image using the figure and figcaption
elements under the following conditions:
The <{img}> element is in a <{figure}> element
The <{figure}> element contains a <{figcaption}> element
The <{figcaption}> element contains content other than inter-element white space
Ignoring the <{figcaption}> element and its descendants, the figure
element has no {{Text}} node descendants other than inter-element white space, and no
embedded content descendant other than the <{img}> element.
In other words, the only content of the figure is an <{img}> element and a figcaption
element, and the <{figcaption}> element must include (caption) content.
Such cases are to be kept to an absolute
minimum. If there is even the slightest possibility of the author
having the ability to provide real alternative text, then it would
not be acceptable to omit the alt
attribute.
In this example, a person uploads a photo, as part of a bulk upload of many images, to a photo sharing site. The user has not
provided a text alternative or a caption for the image. The site's authoring tool inserts a caption automatically using whatever useful
information it has for the image. In this case it's the file name and date the photo was taken.
The caption text in the example below is not a suitable text alternative and
does not conform to the Web Content Accessibility Guidelines 2.0. [[WCAG20]]
clara.jpg, taken on 12/11/2010.
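A sketch of what the authoring tool might emit in this case. The <{figure}> contains only the <{img}> element and a non-empty <{figcaption}>, which matches the conditions listed above for omitting the alt attribute:

```html
<figure>
  <!-- No alt attribute: the tool has no text alternative to offer. -->
  <img src="clara.jpg">
  <figcaption>clara.jpg, taken on 12/11/2010.</figcaption>
</figure>
```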
Notice that even in this example, as much useful information as possible is
still included in the <{figcaption}> element.
In this second example, a person uploads a photo to a photo sharing site. She has provided
a caption for the image but not a text alternative. This may be because the site does not provide users with the ability
to add a text alternative in the alt attribute.
Eloisa with Princess Belle
Sometimes the entire point of the image is that a textual
description is not available, and the user is to provide the
description. An example is software that displays images and
asks for alternative text, precisely so that it can then
write a page with correct alternative text. Such a page could
have a table of images, like this:
Image
Description
Image 640 by 100, filename 'banner.gif'
Image 200 by 480, filename 'ad3.gif'
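A possible shape for such a table, given as a sketch only. The image file names, the `name` values on the inputs and the `aria-label` text are hypothetical; the captions are the ones shown above:

```html
<table>
  <thead>
    <tr><th>Image</th><th>Description</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>
        <figure>
          <!-- alt is omitted because the description is what the user is asked for. -->
          <img src="2421.png">
          <figcaption>Image 640 by 100, filename 'banner.gif'</figcaption>
        </figure>
      </td>
      <td><input name="alt2421" aria-label="Description for banner.gif"></td>
    </tr>
    <tr>
      <td>
        <figure>
          <img src="2422.png">
          <figcaption>Image 200 by 480, filename 'ad3.gif'</figcaption>
        </figure>
      </td>
      <td><input name="alt2422" aria-label="Description for ad3.gif"></td>
    </tr>
  </tbody>
</table>
```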
Since some users cannot use images at all (e.g., because they are blind) the
alt attribute is only allowed to be omitted when no text
alternative is available and none can be made available, as in the above examples.
An image not intended for the user
Generally authors should avoid using <{img}> elements
for purposes other than showing images.
If an <{img}> element is being used for purposes other
than showing an image, e.g., as part of a service to count page
views, use an empty alt attribute.
An example of an <{img}> element used to collect web page statistics.
The alt attribute is empty as the image has no meaning.
For the example use above, it is recommended that the width and
height attributes be set to zero.
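A minimal sketch of a page-view counter image. The URL is a made-up placeholder, not a real statistics service:

```html
<!-- Carries no meaning for users: empty alt, zero width and height. -->
<img src="https://stats.example.com/count?page=home" width="0" height="0" alt="">
```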
Another example use is when an image such as a spacer.gif is used to aid positioning of content.
The alt attribute is empty as the image has no meaning.
It is recommended that CSS be used to position content instead of <{img}> elements.
Icon Images
An icon is usually a simple picture representing a program, action, data file or a concept.
Icons are intended to help users of visual browsers to recognize features at a glance.
Use an empty alt attribute when an icon is supplemental to
text conveying the same meaning.
In this example, a link points to a site's home page. The link contains a
house icon image and the text "home". The image has an empty alt attribute.
Home
Where images are used in this way, it would also be appropriate to add the image using CSS.
#home:before {
content: url(home.png);
}
Home
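The two variants can be sketched as follows. The `home.html` target is a hypothetical name, while `home.png` and the `home` id are taken from the CSS rule above:

```html
<!-- Inline icon: redundant with the link text, so alt is empty. -->
<a href="home.html"><img src="home.png" alt="">Home</a>

<!-- CSS variant: the #home:before rule supplies the icon, so no img is needed. -->
<a href="home.html" id="home">Home</a>
```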
In this example, there is a warning message, with a warning icon. The word "Warning!" is in emphasized
text next to the icon. As the information conveyed by the icon is redundant, the <{img}> element is given an empty alt attribute.
Warning! Your session is about to expire.
When an icon conveys additional information not available in text, provide a text alternative.
In this example, there is a warning message, with a warning icon. The icon emphasizes the
importance of the message and identifies it as a particular type of content.
Your session is about to expire.
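The two warning examples differ only in where the word "Warning!" comes from. A sketch, assuming a hypothetical `warning.png` icon:

```html
<!-- Redundant icon: the visible text already says "Warning!", so alt is empty. -->
<p><img src="warning.png" alt=""> <strong>Warning!</strong> Your session is about to expire.</p>

<!-- Informative icon: nothing in the text identifies this as a warning, so alt supplies it. -->
<p><img src="warning.png" alt="Warning!"> Your session is about to expire.</p>
```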
Logos, insignia, flags, or emblems
Many pages include logos, insignia, flags, or emblems, which stand for a company, organization, project,
band, software package, country, or other entity. As with all images, what can be considered an appropriate text alternative
depends upon the context in which the image is being used and what function it serves in that context.
If a logo is the sole content of a link, provide a brief description of the link target in the alt attribute.
This example illustrates the use of the HTML5 logo as the sole content of a link to the HTML specification.
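A sketch of a logo used as the sole content of a link. The link target and file name are placeholders; the `alt` value describes where the link goes rather than what the logo looks like:

```html
<a href="https://example.org/html-specification/">
  <img src="html5-logo.png" alt="HTML specification">
</a>
```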
If a logo is being used to represent the entity, e.g., as a page heading, provide the name of the
entity being represented by the logo as the text alternative.
This example illustrates the use of the WebPlatform.org logo being used to represent itself.
and other developer resources
The text alternative in the example above could also include the word "logo" to describe the
type of image content. If so, it is suggested that square brackets be used to delineate this
information: alt="[logo] WebPlatform.org".
If a logo is being used next to the name of what it represents, then the logo is
supplemental. Include an empty alt attribute, as the text alternative is already
provided.
This example illustrates the use of a logo next to the name of the organization it represents.
WebPlatform.org
If the logo is used alongside text discussing the subject or entity the logo represents, then
provide a text alternative which describes the logo.
This example illustrates the use of a logo next to text discussing the subject the logo
represents.
HTML is a language for structuring and presenting content for the World Wide
Web, a core technology of the Internet. HTML5 is the latest revision of the HTML specification
which was originally created in 1990. Its core aims are to improve the language with support for
the latest multimedia while keeping it easily readable by humans and consistently understood
by computers and devices.
CAPTCHA Images
CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart".
CAPTCHA images are used for security purposes to confirm that content is being accessed by a
person rather than a computer. This authentication is done through visual verification of an
image. CAPTCHA typically presents an image with characters or words in it that the user is to
re-type. The image is usually distorted and has some noise applied to it to make the characters
difficult to read.
To improve the accessibility of CAPTCHA, provide text alternatives that identify and describe the
purpose of the image, and provide alternative forms of the CAPTCHA using output modes for
different types of sensory perception. For instance, provide an audio alternative along with the
visual image. Place the audio option right next to the visual one. This helps but is still
problematic for people without sound cards, the deaf-blind, and some people with limited hearing.
Another method is to include a form that asks a question along with the visual image. This helps
but can be problematic for people with cognitive impairments.
It is strongly recommended that alternatives to CAPTCHA be used, as all forms of CAPTCHA
introduce unacceptable barriers to entry for users with disabilities. Further information is
available in Inaccessibility of CAPTCHA.
This example shows a CAPTCHA test which uses a distorted image of text. The text alternative in
the alt attribute provides instructions for users who cannot
access the image content.
Example code:
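A minimal sketch of such markup, assuming a hypothetical `captcha.png` file. The alternative challenges are indicated only by comments, since their form depends on the CAPTCHA service in use:

```html
<img src="captcha.png" alt="If you cannot view this image an audio challenge is provided.">
<!-- audio CAPTCHA option that allows the user to listen and type the word -->
<!-- form that asks a question -->
```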
An image in a <{picture}> element
The <{picture}> element and any <{source}> elements it contains have no semantics for users;
only the <{img}> element or its text alternative is displayed to users. Provide a text alternative for an
<{img}> element without regard to it being within a <{picture}> element. Refer to
Requirements for providing text to act as an alternative for images for more information on how to provide
useful alt text for images.
Art directed images that rely on picture need to depict
the same content (irrespective of size, pixel density, or any other discriminating
factor). Therefore the appropriate text alternative for an image will always be the
same irrespective of which source file ends up being chosen by the browser.
Is it a ghost?
The large and small versions (both versions are displayed for demonstration purposes) of
the image portray the same scene: the reflection of a girl's face in a train window.
While the small version (displayed on smaller screens) is cropped, this does not affect the subject matter
or the appropriateness of the alt text.
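A sketch of the art-directed markup described here. The file names and the `32em` breakpoint are assumptions. Whichever source the browser selects, the single `alt` value on the <{img}> element applies:

```html
<h2>Is it a ghost?</h2>
<picture>
  <!-- Wider viewports get the uncropped image; narrower ones fall back to the img. -->
  <source media="(min-width: 32em)" srcset="large.jpg">
  <img src="small.jpg" alt="Reflection of a girl's face in a train window.">
</picture>
```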
Guidance for markup generators
Wherever possible, markup generators (such as WYSIWYG authoring tools) should obtain
alternative text from their users. However, it is recognized that in many cases obtaining
alternative text from users may not be possible.
For images that are the sole contents of links, markup generators should examine the link
target to determine the title of the target, or the URL of the target, and use information
obtained in this manner as the alternative text.
For images that have captions, markup generators should use the
<{figure}> and <{figcaption}> elements to provide the image's caption.
As a last resort, implementors
must omit the alt attribute altogether.
[[!ATAG20]]
Setting an <{img}> element's <{img/alt}> attribute to the empty string means the image in
question provides no essential information. Assistive technology, such as screen readers,
will typically ignore the presence of such an image, as ignoring it will not stop the user
from understanding the document, and saves the user time.
In the following example, the <{img/alt}> is set to the empty string, so it will be silently
ignored by screen readers.
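A minimal sketch of such an element, using the `bar-graph.png` file name that this section mentions when it discusses omitting the attribute:

```html
<!-- Empty alt: screen readers will typically skip this image. -->
<img src="bar-graph.png" alt="">
```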
Unless the image is truly decorative, it is not recommended that implementors set an image's alt
attribute to the empty string. An empty string will indicate to content
management software and accessibility checking software that the image does not need
alternative text, and therefore they will not flag the image in question as a potential
problem, meaning it is less likely to be repaired.
In contrast, if an image is vital to understanding the contents of a document, implementors
should be aware that an image without an <{img/alt}> attribute is preferable to an
<{img/alt}> attribute set to the empty string.
Taking the previous example and omitting the alt attribute
will allow screen readers to announce the filename of the image (bar-graph.png). Doing so may
allow the user to try to learn what the image conveys, e.g. by inferring it from the announced filename,
asking a friend to describe it, running Optical Character Recognition to see if the image
contains text, or submitting it to an image search to get more information.
(None of these approaches is as good as giving the image correct alternative text.)
Markup generators may specify a generator-unable-to-provide-required-alt
attribute on <{img}> elements for which they have been unable to obtain a text alternative and
for which they have therefore omitted the alt attribute. The value of this
attribute must be the empty string. Documents containing such attributes are not conforming,
but conformance checkers will silently ignore this error.
This is intended to prevent markup generators from being pressured into replacing the error of
omitting the alt attribute with the even more egregious error of providing phony
text alternatives, because state-of-the-art automated conformance checkers cannot distinguish
phony text alternatives from correct text alternatives.
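A sketch of what a markup generator might emit when it cannot obtain a text alternative. The file name is hypothetical:

```html
<!-- Still non-conforming, but a conformance checker need not report the missing alt. -->
<img src="upload-0042.jpg" generator-unable-to-provide-required-alt="">
```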
Markup generators should generally avoid using the image's own
file name as the text alternative. Similarly, markup generators
should avoid generating text alternatives from any content that will
be equally available to presentation user agents (e.g., Web
browsers).
This is because once a page is generated, it will
typically not be updated, whereas the browsers that later read the
page can be updated by the user. The browser is therefore likely to
have more up-to-date and finely-tuned heuristics than the markup
generator did when generating the page.
Guidance for conformance checkers
A conformance checker must report the lack of an alt attribute as an error unless one
of the conditions listed below applies:
* The <{img}> element is in a <{figure}> element that satisfies
the conditions described above.
* The <{img}> element has a (non-conforming) generator-unable-to-provide-required-alt
attribute whose value is the empty string. A conformance checker that is not reporting the lack
of an alt attribute as an error must also not report the presence of the empty
generator-unable-to-provide-required-alt attribute as an error. (This case does not
represent a case where the document is conforming, only that the generator could not determine
appropriate alternative text — validators are not required to show an error in this case,
because such an error might encourage markup generators to include bogus alternative text purely
in an attempt to silence validators. Naturally, conformance checkers may report the
lack of an alt attribute as an error even in the presence of the
generator-unable-to-provide-required-alt attribute; for example, there could be a
user option to report all conformance errors even those that might be the more or less
inevitable result of using a markup generator.)
Any number of [=comments=] and [=space characters=].
Here a blog uses the srcdoc attribute in conjunction
with the <{iframe/sandbox}> attribute described below to provide users of user
agents that support this feature with an extra layer of protection from script injection in the
blog post comments:
I got my own magazine!
After much effort, I've finally found a publisher, and so now I have my own magazine! Isn't that awesome?! The first issue will come out in September, and we have articles about getting food, and about getting in boxes, it's going to be great!
Notice the way that quotes have to be escaped (otherwise the srcdoc attribute would end prematurely), and the way raw
ampersands (e.g., in URLs or in prose) mentioned in the sandboxed content have to be
doubly escaped — once so that the ampersand is preserved when originally parsing
the srcdoc attribute, and once more to prevent the
ampersand from being misinterpreted when parsing the sandboxed content.
Furthermore, notice that since the [=DOCTYPE=] is optional in
`iframe` `srcdoc` documents, and the <{html}>,
<{head}>, and <{body}> elements have optional
start and end tags, and the <{title}> element is also optional in `iframe` `srcdoc` documents, the markup in a srcdoc attribute can be
relatively succinct despite representing an entire document, since only the contents of the
<{body}> element need appear literally in the syntax. The other elements are still
present, but only by implication.
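A minimal sketch of one sandboxed comment that exercises the escaping rules described above. The comment text, the link URL and the surrounding <{article}> markup are illustrative assumptions:

```html
<article>
  <footer>Nine minutes ago, a commenter wrote:</footer>
  <!--
    Inside srcdoc, each quotation mark is written as &quot; and the ampersand
    in the URL is escaped twice, as &amp;amp;. Parsing the attribute yields
    href="/gallery?mode=cover&amp;page=1" in the inner document, and parsing
    that document yields the URL /gallery?mode=cover&page=1.
  -->
  <iframe sandbox srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."></iframe>
</article>
```

Only the contents of the <{body}> element appear in the attribute value; the DOCTYPE, <{html}>, <{head}> and <{title}> are all left implied.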
In the HTML syntax, authors need only remember to use U+0022
QUOTATION MARK characters (") to wrap the attribute contents and then to escape all U+0026
AMPERSAND (&) and U+0022 QUOTATION MARK (") characters, and to specify the
<{iframe/sandbox}> attribute, to ensure safe embedding of content.
Due to restrictions of the XHTML syntax, in XML
the U+003C LESS-THAN SIGN character (<) needs to be escaped as well.
In order to prevent attribute-value normalization, some of XML's
white space characters — specifically U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED
(LF), and U+000D CARRIAGE RETURN (CR) — also need to be escaped. [[!XML]]
If the src attribute and the srcdoc attribute are both specified
together, the srcdoc attribute takes priority. This allows authors to provide
a fallback [=url/URL=] for legacy user agents that do not support the
srcdoc attribute.
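A sketch of the fallback pattern, with a hypothetical URL in the src attribute:

```html
<!-- User agents that support srcdoc render its content; legacy ones load the src URL. -->
<iframe sandbox src="/comments/42.html" srcdoc="<p>Did you get a cover picture yet?"></iframe>
```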
Whenever an <{iframe}> element with a nested browsing context has its
srcdoc attribute set, changed, or removed, the user agent
must process the iframe attributes.
Similarly, whenever an <{iframe}> element with a nested browsing context
but with no srcdoc attribute specified has its src attribute set, changed, or removed, the user agent must
process the iframe attributes.
When the user agent is to process the iframe attributes, it must run
the first appropriate steps from the following list:
A load event is also fired at the <{iframe}> element when it is created if no
other data is loaded in it.
Each {{Document}} has an iframe load in progress flag and a
mute iframe load flag. When a {{Document}} is created, these flags must be unset
for that {{Document}}.
The iframe load event steps are as follows:
This, in conjunction with scripting, can be used to probe the URL space of the
local network's HTTP servers. User agents may implement cross-origin
access control policies that are stricter than those described above to mitigate this attack,
but unfortunately such policies are typically not compatible with existing Web content.
If, when the element is created, the srcdoc attribute is not set, and the
src attribute is either also not set or set but its value cannot be
parsed, the browsing context will remain at the initial about:blank page.
If the user navigates away from this page, the iframe's corresponding
WindowProxy object will proxy new Window objects for new
{{Document}} objects, but the src attribute will not change.
The name attribute, if present, must be a
valid browsing context name. The given value is used to name the
nested browsing context.
When the browsing context is created, if the attribute is present, the
browsing context name must be set to the value of this attribute;
otherwise, the browsing context name must be set to the empty string.
Whenever the name attribute is set, the nested browsing context's
name must be changed to the new value. If the attribute
is removed, the browsing context name must be set to the empty string.
The sandbox attribute, when specified,
enables a set of extra restrictions on any content hosted by the <{iframe}>. Its value
must be an unordered set of unique space-separated tokens that are ASCII
case-insensitive. The allowed values are allow-forms, allow-pointer-lock, allow-popups, allow-presentation, allow-same-origin, allow-scripts, and allow-top-navigation.
When the attribute is set, the content is treated as being from a unique [=concept/origin=],
forms, scripts, and various potentially annoying APIs are disabled, links are prevented from
targeting other browsing contexts, and plugins are secured.
The allow-same-origin keyword causes
the content to be treated as being from its real origin instead of forcing it into a unique
origin; the allow-top-navigation
keyword allows the content to navigate its top-level browsing context;
and the allow-forms, allow-pointer-lock, allow-popups,
allow-presentation and allow-scripts keywords re-enable forms, the
pointer lock API, popups, the presentation API, and scripts respectively. [[!POINTERLOCK]] [[!PRESENTATION-API]]
Setting both the allow-scripts and allow-same-origin keywords together when the
embedded page has the same origin as the page containing the iframe
allows the embedded page to simply remove the <{iframe/sandbox}>
attribute and then reload itself, effectively breaking out of the sandbox altogether.
These flags only take effect when the nested browsing context of
the iframe is navigated. Removing them, or removing the
entire <{iframe/sandbox}> attribute, has no effect on an
already-loaded page.
Potentially hostile files should not be served from the same server as the file
containing the <{iframe}> element. Sandboxing hostile content is of minimal help if an
attacker can convince the user to just visit the hostile content directly, rather than in the
<{iframe}>. To limit the damage that can be caused by hostile HTML content, it should be
served from a separate dedicated domain. Using a different domain ensures that scripts in the
files are unable to attack the site, even if the user is tricked into visiting those pages
directly, without the protection of the <{iframe/sandbox}>
attribute.
When an <{iframe}> element with a <{iframe/sandbox}>
attribute has its nested browsing context created (before the initial
about:blank {{Document}} is created), and when an iframe
element's <{iframe/sandbox}> attribute is set or changed while it
has a nested browsing context, the user agent must parse the sandboxing directive using the attribute's value as the input, the <{iframe}> element's nested browsing context's
<{iframe}> sandboxing flag set as the output, and, if the
iframe has an allowfullscreen
attribute, the allow fullscreen flag.
When an <{iframe}> element's <{iframe/sandbox}>
attribute is removed while it has a nested browsing context, the user agent must
empty the <{iframe}> element's nested browsing context's
<{iframe}> sandboxing flag set.
In this example, some completely-unknown, potentially hostile, user-provided HTML content is
embedded in a page. Because it is served from a separate domain, it is affected by all the normal
cross-site restrictions. In addition, the embedded page has scripting disabled, plugins disabled,
forms disabled, and it cannot navigate any frames or windows other than itself (or any frames or
windows it itself embeds).
We're not scared of you! Here is your content, unedited:
It is important to use a separate domain so that if the attacker convinces the
user to visit that page directly, the page doesn't run in the context of the site's origin,
which would make the user vulnerable to any attack found in the page.
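The first example above could be marked up roughly as follows. This is a minimal sketch: the host name, path, and query string are placeholders, and the only assumption is that the untrusted content is served from a domain separate from the embedding page.

```html
<p>We're not scared of you! Here is your content, unedited:</p>
<!-- An empty sandbox attribute applies every restriction: no scripts,
     no plugins, no forms, no navigation of other browsing contexts. -->
<!-- usercontent.example.net is a hypothetical, dedicated content domain. -->
<iframe sandbox src="https://usercontent.example.net/getusercontent.cgi?id=12193"></iframe>
```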
In this example, a gadget from another site is embedded. The gadget has scripting and forms
enabled, and the origin sandbox restrictions are lifted, allowing the gadget to communicate
with its originating server. The sandbox is still useful, however, as it disables plugins
and popups, thus reducing the risk of the user being exposed to malware and other annoyances.
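A sketch of the gadget embedding just described, assuming a hypothetical gadget URL. Only the three keywords named in the prose are listed, so plugins and popups stay disabled.

```html
<!-- maps.example.com is a placeholder for the gadget's originating site. -->
<iframe sandbox="allow-same-origin allow-forms allow-scripts"
        src="https://maps.example.com/embedded.html"></iframe>
```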
Suppose a file A contained the following fragment:
Suppose that file B contained an iframe also:
Further, suppose that file C contained a link:
Link
For this example, suppose all the files were served as [[#text-html|text/html]].
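The three fragments might look like the following sketch. The keyword sets are inferred from the discussion of this scenario (A allows same-origin access and forms, B allows only scripts), and the relative URLs B, C, and D are placeholders.

```html
<!-- File A (fragment): the outer iframe -->
<iframe sandbox="allow-same-origin allow-forms" src="B"></iframe>

<!-- File B (fragment): the inner iframe -->
<iframe sandbox="allow-scripts" src="C"></iframe>

<!-- File C (fragment): a link that navigates the inner iframe to D -->
<a href="D">Link</a>
```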
Page C in this scenario has all the sandboxing flags set. Scripts are disabled, because the
iframe in A has scripts disabled, and this overrides the
allow-scripts keyword set on the iframe in B. Forms are also
disabled, because the inner iframe (in B) does not have the
allow-forms keyword set.
Suppose now that a script in A removes all the <{iframe/sandbox}> attributes in A and B.
This would change nothing immediately. If the user clicked the link in C, loading page D into
the iframe in B, page D would now act as if the iframe in B had the
allow-same-origin and allow-forms keywords set, because that was the
state of the nested browsing context in the iframe in A when page B was
loaded.
Generally speaking, dynamically removing or changing the <{iframe/sandbox}> attribute is
ill-advised, because it can make it quite hard to reason about what will be allowed and
what will not.
The allowfullscreen attribute is a
boolean attribute. When specified, it indicates that {{Document}} objects in
the <{iframe}> element's browsing context are to be allowed to use requestFullscreen() (if it's not blocked for other
reasons, e.g., there is another ancestor iframe without this attribute set).
Here, an iframe is used to embed a player from a video site. The allowfullscreen attribute is needed to enable the
player to show its video fullscreen.
Check out my new video!
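A minimal sketch of such an embedding, assuming a hypothetical video site and player URL:

```html
<article>
 <p>Check out my new video!</p>
 <!-- Without allowfullscreen, the embedded player would not be allowed
      to go fullscreen. The URL is a placeholder. -->
 <iframe src="https://video.example.com/embed?id=92469812" allowfullscreen></iframe>
</article>
```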
The allowpaymentrequest
attribute is a boolean attribute. When specified, it indicates that
{{Document}} objects in the <{iframe}> element's browsing context
are to be allowed to use the PaymentRequest interface
to make payment requests.
The allowusermedia attribute
is a boolean attribute. When specified, it indicates that {{Document}} objects
in the <{iframe}> element's browsing context are to be allowed to use
{{Element/getUserMedia()}} (if it's not blocked for other reasons, e.g. there
is another ancestor <{iframe}> without this attribute set).
To determine whether a {{Document}} object document is
allowed to use the feature indicated by attribute name
allowattribute, run these steps:
The <{iframe}> element supports dimension attributes for cases where the
embedded content has specific dimensions (e.g., ad units have well-defined dimensions).
An <{iframe}> element never has fallback content, as it will always
create a nested browsing context, regardless of whether the specified initial
contents are successfully used.
The referrerpolicy attribute is a
referrer policy attribute.
Its purpose is to set the referrer policy used when
processing the iframe attributes. [[!REFERRERPOLICY]]
Descendants of <{iframe}> elements represent nothing. (In legacy user agents that do
not support <{iframe}> elements, the contents would be parsed as markup that could act as
fallback content.)
When used in HTML documents, the allowed content model
of <{iframe}> elements is text, except that invoking the HTML fragment parsing
algorithm with the <{iframe}> element as the context element and the text contents as
the input must result in a list of nodes that are all phrasing content,
with no [=parse errors=] having occurred, with no script
elements being anywhere in the list or as descendants of elements in the list, and with all the
elements in the list (including their descendants) being themselves conforming.
The <{iframe}> element must be empty in XML documents.
The HTML parser treats markup inside <{iframe}> elements as text.
The IDL attributes src, srcdoc, name, and sandbox must reflect the respective
content attributes of the same name.
The supported tokens for <{iframe/sandbox}>'s {{DOMTokenList}} are the
allowed values defined in the <{iframe/sandbox}> attribute and supported by the user agent.
The allowFullscreen IDL attribute
must reflect the allowfullscreen
content attribute.
The allowPaymentRequest IDL
attribute must reflect the allowpaymentrequest content attribute.
The allowUserMedia IDL
attribute must reflect the allowusermedia content attribute.
The referrerPolicy IDL attribute must
reflect the referrerpolicy content attribute, limited to only known values.
The contentDocument IDL attribute must
return the {{Document}} object of the active document of the <{iframe}> element's
nested browsing context, if any and if its [=concept/origin=] is the
same origin-domain as the [=concept/origin=] specified by the
incumbent settings object, or null otherwise.
The contentWindow IDL attribute must
return the WindowProxy object of the <{iframe}> element's nested browsing context, if any, or null otherwise.
Here is an example of a page using an iframe to include advertising from an
advertising broker:
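The markup might look like the following sketch. The broker URL, customer ID, and banner dimensions are illustrative assumptions; the dimension attributes reserve space for the fixed-size ad unit.

```html
<!-- ads.example.com and the query parameters are placeholders. -->
<iframe src="https://ads.example.com/?customerid=923513721&amp;format=banner"
        width="468" height="60"></iframe>
```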
Depending on the type of content instantiated by the <{embed}> element, the node may also
support other interfaces.
The <{embed}> element provides an integration point for an external (typically
non-HTML) application or interactive content.
The src attribute gives the address of the
resource being embedded. The attribute, if present, must contain a valid non-empty URL
potentially surrounded by spaces.
The type attribute, if present, gives the
MIME type by which the plugin to instantiate is selected. The value must be a
valid mime type. If both the type attribute and
the src attribute are present, then the type attribute must specify the same type as the explicit Content-Type metadata of the resource given by the src attribute.
While any of the following conditions are occurring, any plugin instantiated for
the element must be removed, and the <{embed}> element represents
nothing:
The element has neither a src attribute nor a type attribute.
The user agent must parse the value of the element's
src attribute, relative to the element. If that is
successful, the user agent should run these steps:
Determine the type of the content being embedded, as
follows (stopping at the first substep that determines the type):
If the element has a type attribute, and that
attribute's value is a type that a plugin supports, then the value of the
type attribute is the content's type.
Otherwise, if applying the URL parser algorithm to the [=url/URL=] of
the specified resource (after any redirects) results in a URL record whose
path component matches a pattern that a
plugin supports, then the content's type is the type that the plugin can handle.
For example, a plugin might say that it can handle resources with path components that end with the four character string
".swf".
Otherwise, find and instantiate an appropriate plugin based on the content's type, and hand that plugin the
content of the resource, replacing any previously instantiated plugin for the element. The
<{embed}> element now represents this plugin instance.
Whether the resource is fetched successfully or not (e.g., whether the response status was
an ok status) must be ignored when determining the content's type and when handing
the resource to the plugin.
This allows servers to return data for plugins even with error responses (e.g.,
HTTP 500 Internal Server Error codes can still contain plugin data).
The user agent should find and instantiate an appropriate plugin based on the
value of the type attribute. The embed
element now represents this plugin instance.
Once the plugin is completely loaded, queue a task to fire a simple
event named load at the element.
The <{embed}> element has no fallback content. If the user agent can't
find a suitable plugin when attempting to find and instantiate one for the algorithm above, then
the user agent must use a default plugin. This default could be as simple as saying "Unsupported
Format".
Whenever an <{embed}> element that was potentially
active stops being potentially active, any
plugin that had been instantiated for that element must be unloaded.
When a plugin is to be instantiated but it cannot be secured and the sandboxed plugins browsing context
flag is set on the <{embed}> element's node document's active
sandboxing flag set, then the user agent must not instantiate the plugin, and
must instead render the <{embed}> element in a manner that conveys that the
plugin was disabled. The user agent may offer the user the option to override the
sandbox and instantiate the plugin anyway; if the user invokes such an option, the
user agent must act as if the conditions above did not apply for the purposes of this element.
Plugins that cannot be secured are
disabled in sandboxed browsing contexts because they might not honor the restrictions imposed by
the sandbox (e.g., they might allow scripting even when scripting in the sandbox is disabled). User
agents should convey the danger of overriding the sandbox to the user if an option to do so is
provided.
All attributes in HTML documents get lowercased automatically, so the
restriction on uppercase letters doesn't affect such documents.
The four exceptions are to exclude legacy attributes that have side-effects beyond
just sending parameters to the plugin.
The user agent should pass the names and values of all the attributes of the embed
element that have no namespace to the plugin used, when one is instantiated.
The HTMLEmbedElement object representing the element must expose the scriptable
interface of the plugin instantiated for the <{embed}> element, if any. At a
minimum, this interface must implement the legacy caller
operation. (It is suggested that the default behavior of this legacy caller operation, e.g.,
the behavior of the default plugin's legacy caller operation, be to throw a
{{NotSupportedError}} exception.)
The <{embed}> element supports dimension attributes.
The IDL attributes src and
type each must reflect the
respective content attributes of the same name.
Here's a way to embed a resource that requires a proprietary plugin, like Flash:
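For instance, assuming a hypothetical Flash file named catgame.swf:

```html
<embed src="catgame.swf">
```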
If the user does not have the plugin (for example if the plugin vendor doesn't support the
user's platform), then the user will be unable to use the resource.
To pass the plugin a parameter "quality" with the value "high", an attribute can be
specified:
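A sketch using the same hypothetical file name. The attribute has no special meaning to HTML and is simply passed to the plugin as a parameter:

```html
<embed src="catgame.swf" quality="high">
```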
This would be equivalent to the following, when using an <{object}> element
instead:
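Again as a sketch with the hypothetical catgame.swf file, the parameter moves from an attribute to a <{param}> child:

```html
<object data="catgame.swf">
 <param name="quality" value="high">
</object>
```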
Depending on the type of content instantiated by the <{object}> element, the node also
supports other interfaces.
The <{object}> element can represent an external resource, which, depending on the
type of the resource, will either be treated as an image, as a nested browsing context, or as an external resource to be processed by a plugin.
The data attribute, if present, specifies the
address of the resource. If present, the attribute must be a
valid non-empty URL potentially surrounded by spaces.
Authors who reference resources from other origins
that they do not trust are urged to use the typemustmatch attribute defined below. Without that
attribute, it is possible in certain cases for an attacker on the remote host to use the plugin
mechanism to run arbitrary scripts, even if the author has used features such as the Flash
"allowScriptAccess" parameter.
The type attribute, if present, specifies the
type of the resource. If present, the attribute must be a valid mime type.
At least one of either the data attribute or the type attribute must be present.
The typemustmatch attribute is a
boolean attribute whose presence indicates that the resource specified by the data attribute is only to be used if the value of the type attribute and the Content-Type of the
aforementioned resource match.
The typemustmatch attribute must not be
specified unless both the data attribute and the type attribute are present.
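As an illustration, the following sketch embeds a resource from an origin the author does not fully trust. The URL and fallback text are assumptions. Because typemustmatch is set, the resource is used only if its Content-Type matches the declared type attribute.

```html
<!-- untrusted.example.net is a placeholder for a third-party host. -->
<object data="https://untrusted.example.net/widget.swf"
        type="application/x-shockwave-flash"
        typemustmatch>
 <p>The widget could not be loaded.</p>
</object>
```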
The name attribute, if present, must be a
valid browsing context name. The value is used to name the nested browsing context created for the object, if applicable.
This allows links to use the object's browsing context as a target:
Clicking the link "Check out the new icon" in the example below would cause the new.svg to
be loaded in place of the original old.svg.
If a nested browsing context is not created, e.g. for security reasons or due to
incorrect implementation of the <{object/name}> attribute, the target attribute
in this example will instead create a new browsing context - typically a new tab -
whose browsing context name is the attribute's value, and the resource that the link
references will be loaded there.
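The example referred to above might be written as follows. This is a sketch: the browsing context name "icon" is an assumed value, while old.svg and new.svg come from the description.

```html
<!-- The object's nested browsing context is named "icon". -->
<object data="old.svg" type="image/svg+xml" name="icon"></object>
<!-- The link targets that name, so new.svg loads inside the object. -->
<a href="new.svg" target="icon">Check out the new icon</a>
```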
one of the element's ancestor <{object}> elements changes to or from showing its
fallback content,
the element's classid attribute is set, changed, or
removed,
the element's classid attribute is not present, and
its data attribute is set, changed, or removed,
neither the element's classid attribute nor its
data attribute are present, and its type attribute is set, changed, or removed,
the element changes from being rendered to not being rendered, or vice versa,
...the user agent must queue a task to run the following steps to (re)determine
what the <{object}> element represents. This task
being queued or actively running must delay the load
event of the element's node document.
If the user has indicated a preference that this <{object}> element's
fallback content be shown instead of the element's usual behavior, then jump to
the step below labeled fallback.
For example, a user could ask for the element's fallback content to
be shown because that content uses a format that the user finds more accessible.
If the classid attribute is present, and has a
value that isn't the empty string, then: if the user agent can find a plugin
suitable according to the value of the classid
attribute, and either plugins aren't being sandboxed or that
plugin can be secured, then that
plugin should be used, and the value of the data attribute, if any, should be passed to the
plugin. If no suitable plugin can be found, or if the
plugin reports an error, jump to the step below labeled fallback.
If the data attribute is present and its value is not the empty string, then:
If the type attribute is present and its value is
not a type that the user agent supports, and is not a type that the user agent can find a
plugin for, then the user agent may jump to the step below labeled fallback
without fetching the content to examine its real type.
Parse the [=url/URL=] specified by the data attribute, relative to the element.
If that failed, fire a simple event named error at the element,
then jump to the step below labeled fallback.
If the resource is not yet available (e.g., because the resource was not available in the
cache, so that loading the resource required making a request over the network), then jump to
the step below labeled fallback. The task that is
queued by the networking task source once the
resource is available must restart this algorithm from this step. Resources can load
incrementally; user agents may opt to consider a resource "available" whenever enough data
has been obtained to begin processing the resource.
If the load failed (e.g., there was an HTTP 404 error, there was a DNS error), fire
a simple event named error at the element, then jump to
the step below labeled fallback.
If the <{object}> element has a typemustmatch attribute, jump to the step
below labeled handler.
If the user agent is configured to strictly obey Content-Type headers for this resource,
and the resource has associated Content-Type metadata,
then let the resource type be the type specified in
the resource's Content-Type metadata, and jump to the step below
labeled handler.
This can introduce a vulnerability, wherein a site is trying to embed a
resource that uses a particular plugin, but the remote site overrides that and instead
furnishes the user agent with a resource that triggers a different plugin with different
security characteristics.
If there is a type attribute present on the <{object}> element, and that
attribute's value is not a type that the user agent supports, but it is a type
that a plugin supports, then let the resource type be the type specified
in that type attribute, and jump to the step below labeled handler.
Run the appropriate set of steps from the following list:
If binary is false, then let the resource type be the type
specified in the resource's Content-Type metadata, and jump to the step
below labeled handler.
If there is a type attribute present on the <{object}> element, and its
value is not application/octet-stream, then run the following steps:
If the attribute's value is a type that a plugin supports, or the
attribute's value is a type that starts with "image/" that is
not also an XML MIME type, then let the resource type be the
type specified in that type attribute.
If there is a type attribute present on the <{object}> element, then
let the tentative type be the type specified in that type
attribute.
Otherwise, let tentative type be the computed type of the resource.
If tentative type is not application/octet-stream, then let resource type be
tentative type and jump to the step below labeled handler.
If applying the URL parser algorithm to the [=url/URL=] of the
specified resource (after any redirects) results in a URL record whose path
component matches a pattern that a plugin supports, then let
resource type be the type that the plugin can handle.
For example, a plugin might say that it can handle resources with
path components that end with the four character string ".swf".
It is possible for this step to finish, or for one of the substeps above to
jump straight to the next step, with resource type still being unknown. In
both cases, the next step will trigger fallback.
Handler: Handle the content as given by the first of the following cases that
matches:
If the resource type is not a type that the user agent supports, but
it is a type that a plugin supports
If plugins are being sandboxed and the plugin that supports resource type
cannot be secured, jump to the step below labeled fallback.
Otherwise, the user agent should use the plugin that supports
resource type and pass the content of the resource to that
plugin. If the plugin reports an error, then jump to the step
below labeled fallback.
If the resource type is an XML MIME type, or if the
resource type does not start with "image/"
The <{object}> element must be associated with a newly created
nested browsing context, if it does not already have one.
If the [=url/URL=] of the given resource is not about:blank, the element's
nested browsing context must then be navigated to that resource, with
replacement enabled, and with the <{object}> element's node document's
browsing context as the source browsing context. (The data
attribute of the <{object}> element doesn't get updated if the browsing context gets
further navigated to other locations.)
If the [=url/URL=] of the given resource is about:blank, then,
instead, the user agent must queue a task to fire a simple event
named load at the <{object}> element.
No load event is fired at the about:blank document itself.
The <{object}> element represents the nested browsing context.
If the name attribute is present, the
browsing context name must be set to the value of this attribute; otherwise,
the browsing context name must be set to the empty string.
If the resource type starts with "image/", and support
for images has not been disabled
Apply the image sniffing rules to determine the type of the image.
The <{object}> element represents the specified image. The image is
not a nested browsing context.
If the image cannot be rendered, e.g., because it is malformed or in an unsupported
format, jump to the step below labeled fallback.
Otherwise
The given resource type is not supported. Jump to the step below
labeled fallback.
If the previous step ended with the resource type being
unknown, this is the case that is triggered.
The element's contents are not part of what the <{object}> element represents.
If the data attribute is absent but the type attribute is present,
and the user agent can find a plugin suitable according to the value of the
type attribute, and either plugins aren't being sandboxed or the plugin
can be secured, then that plugin should be used. If these conditions
cannot be met, or if the plugin reports an error, jump to the step below labeled
fallback. Otherwise abort these steps; once the plugin is completely loaded,
queue a task to fire a simple event named load at the element.
Fallback: The <{object}> element represents the element's
children, ignoring any leading <{param}> element children. This is the element's
fallback content. If the element has an instantiated plugin, then unload it.
When the algorithm above instantiates a plugin, the user agent
should pass to the plugin used the names and values of all the attributes on the
element, in the order they were added to the element, with the attributes added by the parser
being ordered in source order, followed by a parameter named "PARAM" whose value is null, followed
by all the names and values of parameters given by
<{param}> elements that are children of the <{object}> element, in tree
order. If the plugin supports a scriptable interface, the
HTMLObjectElement object representing the element should expose that interface. The
<{object}> element represents the plugin. The
plugin is not a nested browsing context.
Plugins are considered sandboxed for the purpose of an
<{object}> element if the sandboxed plugins browsing context flag is set on
the <{object}> element's node document's active sandboxing flag
set.
Due to the algorithm above, the contents of <{object}> elements act as
fallback content, used only when referenced resources can't be shown (e.g., because it
returned a 404 error). This allows multiple <{object}> elements to be nested inside each other,
targeting multiple user agents with different capabilities, with the user agent picking the first
one it supports.
When an <{object}> element represents a nested browsing context: if the
<{object}> element's nested browsing context's active document
is not [=ready for post-load tasks=], and when anything is delaying the load event of
the <{object}> element's browsing context's active document, and when the
<{object}> element's browsing context is in the delaying load
events mode, the object must delay the load event of its document.
The task source for the tasks mentioned in this
section is the DOM manipulation task source.
Whenever the name attribute is set, if the
<{object}> element has a nested browsing context, its
name must be changed to the new value. If the attribute is
removed, if the <{object}> element has a browsing context, the
browsing context name must be set to the empty string.
The <{object}> element supports dimension attributes.
The IDL attributes data,
type and
name each must reflect the
respective content attributes of the same name. The
typeMustMatch IDL attribute must
reflect the <{object/typemustmatch}> content attribute.
The contentDocument IDL attribute must
return the {{Document}} object of the active document of the <{object}> element's
nested browsing context, if any and if its [=concept/origin=] is the
same origin-domain as the [=concept/origin=] specified by the
incumbent settings object, or null otherwise.
The contentWindow IDL attribute must
return the {{WindowProxy}} object of the <{object}> element's nested browsing context, if
it has one; otherwise, it must return null.
All <{object}> elements have a legacy caller operation. If the
<{object}> element has an instantiated plugin that supports a scriptable interface that
defines a legacy caller operation, then that must be the behavior of the object's legacy caller
operation. Otherwise, the object's legacy caller operation must be to throw a
{{NotSupportedError}} exception.
In the following example, a Java applet is embedded in a page using the object
element. To account for a user not having Java installed, a paragraph of
fallback content has been added after the <{param}>. The fallback <{p}> should not be
displayed if the Java applet loads.
My Java Clock
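A sketch of that markup follows. The applet class name and the wording of the fallback paragraph are illustrative assumptions.

```html
<figure>
 <object type="application/x-java-applet">
  <!-- "MyJavaClock" is a hypothetical applet class name. -->
  <param name="code" value="MyJavaClock">
  <!-- Fallback content, shown only when the applet cannot be used. -->
  <p>You do not have Java available, or it is disabled.</p>
 </object>
 <figcaption>My Java Clock</figcaption>
</figure>
```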
Applets have been removed from the web platform, and the example in this section is only provided
as a reference for providing fallback content when dealing with removed features such as applets.
All current implementations are expected to only show the fallback content in the example above.
In this example, an HTML page is embedded in another using the object element.
My HTML Clock
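For instance, assuming the embedded page is a hypothetical file named clock.html:

```html
<figure>
 <!-- An HTML resource is not an image, so it is loaded into a nested
      browsing context. -->
 <object data="clock.html"></object>
 <figcaption>My HTML Clock</figcaption>
</figure>
```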
The following example shows how a plugin can be used in HTML (in this case the Flash plugin,
to show a video file). A fallback is provided for users who do not have Flash enabled, in this
case using the <{video}> element to show the video for those using user agents that support
<{video}>, and finally providing a link to the video for those who have neither Flash
nor a video-capable browser.
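The nesting described above could be sketched as follows. All URLs and the Flash parameter names are illustrative assumptions; the point is the fallback order: the plugin first, then <{video}>, then a plain link.

```html
<p>Look at my video:</p>
<object type="application/x-shockwave-flash">
 <!-- Parameters handed to the (hypothetical) Flash player. -->
 <param name="movie" value="https://video.example.com/library/watch.swf">
 <param name="allowfullscreen" value="true">
 <param name="flashvars" value="https://video.example.com/vids/315981">
 <!-- Used only if the plugin cannot be instantiated. -->
 <video controls src="https://video.example.com/vids/315981">
  <!-- Shown only by browsers that do not support video. -->
  <a href="https://video.example.com/vids/315981">View video</a>.
 </video>
</object>
```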
The <{param}> element defines parameters for plugins invoked by object
elements. It does not represent anything on its own.
The name attribute gives the name of the
parameter.
The value attribute gives the value of the
parameter.
Both attributes must be present. They may have any value.
If both attributes are present, and if the parent element of the param is an
<{object}> element, then the element defines a parameter with the given name-value pair.
If either the name or value of a parameter defined
by a <{param}> element that is the child of an <{object}> element that
represents an instantiated plugin changes, and if that
plugin is communicating with the user agent using an API that features the ability to
update the plugin when the name or value of a parameter so changes, then the user agent must
appropriately exercise that ability to notify the plugin of the change.
The IDL attributes name and value must both reflect the respective
content attributes of the same name.
The following example shows how the <{param}> element can be used to pass a parameter
to a plugin, in this case the O3D plugin.
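A reduced sketch of such a page is shown here. The MIME type, parameter name and value, and file names are assumptions made for illustration rather than a statement of the O3D plugin's actual interface.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <title>O3D Utah Teapot</title>
 </head>
 <body>
  <object type="application/vnd.o3d.auto">
   <!-- One name-value pair passed to the plugin when it is instantiated. -->
   <param name="o3d_features" value="FloatingPointTextures">
   <!-- Fallback content for users without the plugin. -->
   <img src="o3d-teapot.png" alt="A 3D rendering of the Utah teapot.">
  </object>
 </body>
</html>
```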
If the element has a src attribute:
zero or more <{track}> elements, then transparent,
but with no media element descendants.
If the element does not have a src attribute: zero or more <{source}> elements,
then zero or more <{track}> elements, then transparent, but with no
media element descendants.
interface HTMLVideoElement : HTMLMediaElement {
attribute unsigned long width;
attribute unsigned long height;
readonly attribute unsigned long videoWidth;
readonly attribute unsigned long videoHeight;
attribute DOMString poster;
};
A <{video}> element is used for playing videos or movies, and audio files with captions.
Content may be provided inside the <{video}> element. User agents
should not show this content to the user; it is intended for older Web browsers which do
not support <{video}>, so that legacy video plugins can be tried, or to show text to the
users of these older browsers informing them of how to access the video contents.
In particular, this content is not intended to address accessibility concerns. To make video
content accessible to people with disabilities, a variety of features are available.
Captions and sign language tracks can be embedded in the video stream, or as external files
using the <{track}> element. Audio descriptions can be provided, either as a separate track
embedded in the video stream, or by referencing a WebVTT file with the <{track}>
element that the user agent can present as synthesized speech. WebVTT can also be used to
provide chapter titles. For users who would rather not use a media element at all, transcripts
or other textual alternatives can be provided by simply linking to them in the prose near the
<{video}> element. [[WEBVTT]]
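The following sketch combines several of these features. The file names and labels are hypothetical; the three <{track}> elements supply captions, text descriptions for synthesized speech, and chapter titles as external WebVTT files, and a transcript is linked from the nearby prose.

```html
<video src="lecture.webm" controls>
 <track kind="captions" src="lecture.captions.en.vtt" srclang="en"
        label="English captions" default>
 <track kind="descriptions" src="lecture.descriptions.en.vtt" srclang="en"
        label="English descriptions">
 <track kind="chapters" src="lecture.chapters.en.vtt" srclang="en"
        label="Chapters">
</video>
<p><a href="lecture-transcript.html">Read the transcript of this lecture</a>.</p>
```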
The <{video}> element is a media element whose media data is ostensibly video
data, possibly with associated audio data.
Because the <{video}> element is necessary to support accessibility for audio content,
user agents should support playback of the same set of audio formats and container formats
in the <{video}> element that they support for the <{audio}> element.
The src, preload, autoplay, loop,
muted, and <{video/controls}> attributes are the attributes common to all
media elements.
The poster content attribute gives the address of an
image file that the user agent can show while no video data is available. The attribute, if
present, must contain a valid non-empty URL potentially surrounded by spaces.
If the specified resource is to be used, then, when the element is created or when the poster attribute is set, changed, or removed, the user agent must
run the following steps to determine the element's poster frame (regardless of the
value of the element's show poster flag):
If there is an existing instance of this algorithm running for this video
element, abort that instance of this algorithm without changing the poster
frame.
If the poster attribute's value is the empty string
or if the attribute is absent, then there is no poster frame; abort these
steps.
Parse the poster attribute's value relative to the element. If this
fails, then there is no poster frame; abort these steps.
If an image is thus obtained, the poster frame is that image. Otherwise,
there is no poster frame.
The image given by the poster attribute, the poster frame, is intended to
be a representative frame of the video (typically one of the first non-blank frames) that
gives the user an idea of what the video is like.
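A minimal sketch, assuming hypothetical file names. The poster image is shown while no video data is available.

```html
<!-- trailer-poster.jpg would typically be a representative frame
     taken from the video itself. -->
<video src="trailer.webm" poster="trailer-poster.jpg" controls></video>
```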
A <{video}> element represents what is given for the first matching condition in the list below:
When no video data is available (the element's readyState attribute is
either HAVE_NOTHING, or HAVE_METADATA but no video data has yet been
obtained at all, or the element's readyState attribute is any
subsequent value but the media resource does not have a video channel)
The <{video}> element represents its poster frame, if any,
or else the first frame of the video.
When the <{video}> element is {{HTMLMediaElement/paused}}, and the
frame of video corresponding to the current playback
position is not available (e.g., because the video is seeking or buffering)
When the <{video}> element is neither potentially playing nor
{{HTMLMediaElement/paused}} (e.g., when seeking or stalled)
The <{video}> element represents the last frame of the video to have
been rendered.
When the <{video}> element is {{HTMLMediaElement/paused}}
Frames of video must be obtained from the video track that was selected when the
event loop last reached step 1.
Which frame in a video stream corresponds to a particular playback position is
defined by the video stream's format.
The <{video}> element also represents any text track cues whose
text track cue active flag is set and whose
text track is in the showing mode, and any
audio from the media resource, at the current playback position.
Any audio associated with the media resource must, if played, be played
synchronized with the current playback position, at the element's effective
media volume. The user agent must play the audio from audio tracks that were enabled when the event loop last reached step 1.
In addition to the above, the user agent may provide messages to the user (such as "buffering",
"no video loaded", "error", or more detailed information) by overlaying text or icons on the
video or other areas of the element's playback area, or in another appropriate manner.
User agents that cannot render the video may instead make the element represent a link
to an external video playback utility or to the video data itself.
When a <{video}> element's media resource has a video channel, the
element provides a paint source whose width is the media resource's
intrinsic width, whose height is the
media resource's intrinsic height, and whose appearance is
the frame of video corresponding to the current playback position, if that is available, or else
(e.g., when the video is seeking or buffering) its previous appearance, if any, or else (e.g.,
because the video is still loading the first frame) blackness.
video . videoWidth
video . videoHeight
These attributes return the intrinsic dimensions of the video,
or zero if the dimensions are not known.
The intrinsic width and intrinsic height of the media resource
are the dimensions of the resource in CSS pixels after taking into account the resource's
dimensions, aspect ratio, clean aperture, resolution, and so forth, as defined for the format used
by the resource. If an anamorphic format does not define how to apply the aspect ratio to the
video data's dimensions to obtain the "correct" dimensions, then the user agent must apply the
ratio by increasing one dimension and leaving the other unchanged.
The videoWidth IDL attribute must return
the intrinsic width of the video in CSS pixels.
The videoHeight IDL attribute must
return the intrinsic height of the video in CSS
pixels. If the element's readyState attribute is HAVE_NOTHING,
then the attributes must return 0.
Whenever the intrinsic width or intrinsic height of the
video changes (including, for example, because the selected video track was changed), if the
element's readyState attribute is not HAVE_NOTHING, the user agent must
queue a task to fire a simple event named resize at the
media element.
The <{video}> element supports dimension attributes.
In the absence of style rules to the contrary, video content should be rendered inside the
element's playback area such that the video content is shown centered in the playback area at the
largest possible size that fits completely within it, with the video content's aspect ratio being
preserved. Thus, if the aspect ratio of the playback area does not match the aspect ratio of the
video, the video will be shown letterboxed or pillarboxed. Areas of the element's playback area
that do not contain the video represent nothing.
In user agents that implement CSS, the above requirement can be implemented by
using the style rule suggested in [[#rendering]].
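The default rendering described above (largest size that fits, aspect ratio preserved, centered) can be illustrated with a small calculation. This is a minimal sketch, not part of any user agent's implementation; the helper name `fitVideo` and its plain-number parameters are assumptions made for illustration, and all values are taken to be CSS pixels.

```javascript
// Hypothetical helper: compute where video content would be painted inside
// the playback area under the default rendering described above.
function fitVideo(areaWidth, areaHeight, videoWidth, videoHeight) {
  if (videoWidth <= 0 || videoHeight <= 0) {
    return null; // intrinsic dimensions are not known, nothing to place
  }
  // Use the smaller scale factor so that both dimensions fit completely.
  const scale = Math.min(areaWidth / videoWidth, areaHeight / videoHeight);
  const width = videoWidth * scale;
  const height = videoHeight * scale;
  return {
    x: (areaWidth - width) / 2,   // equal bars left and right (pillarbox)
    y: (areaHeight - height) / 2, // equal bars top and bottom (letterbox)
    width,
    height,
  };
}
```

For instance, `fitVideo(300, 150, 640, 480)` yields a 200 by 150 rectangle offset 50 pixels from the left edge, that is, a pillarboxed 4:3 video in a 2:1 playback area.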
The intrinsic width of a <{video}> element's playback area is the
intrinsic width of the poster frame, if that is available and the
element currently represents its poster frame; otherwise, it is the
intrinsic width of the video resource, if that is
available; otherwise the intrinsic width is missing.
The intrinsic height of a <{video}> element's playback area is the
intrinsic height of the poster frame, if that is available and the
element currently represents its poster frame; otherwise it is the
intrinsic height of the video resource, if that is
available; otherwise the intrinsic height is missing.
The default object size is a width of 300 CSS pixels and a height of 150 CSS
pixels. [[!CSS3-IMAGES]]
User agents should provide controls to enable or disable the display of closed captions, audio
description tracks, and other additional data associated with the video stream, though such
features should, again, not interfere with the page's normal rendering.
User agents may allow users to view the video content in manners more suitable to the user
(e.g., fullscreen or in an independent resizable window). Captions, subtitles or other additional
visual tracks should remain available and visible when enabled. As for the other user interface
features, controls to enable this should not interfere with the page's normal rendering unless
the user agent is exposing a user interface. In such an independent context, however, user agents may make
full user interfaces visible (e.g., play, pause, seeking, and volume controls) even if the
<{mediaelements/controls}> attribute is absent.
User agents may allow video playback to affect system features that could interfere with the
user's experience; for example, user agents could disable screensavers while video playback is in
progress.
The poster IDL attribute must
reflect the <{video/poster}> content attribute.
This example shows how different video files can be offered to the browser. If the browser
does not support a specific codec, it can play one of the alternative files offered.
This example also shows the <{video/controls}> and <{media/preload}> attributes.
This example shows how to detect when a video has failed to play correctly:
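A minimal sketch of such detection follows. It maps the numeric code of a MediaError-like object to a message; the helper name `describeMediaError` and the message wording are assumptions made for illustration, while the numeric values are those defined on the MediaError interface.

```javascript
// Numeric error codes as defined on the MediaError interface.
const MEDIA_ERR_ABORTED = 1;
const MEDIA_ERR_NETWORK = 2;
const MEDIA_ERR_DECODE = 3;
const MEDIA_ERR_SRC_NOT_SUPPORTED = 4;

// Hypothetical helper: describe the error state of a media element.
// `error` is the value of the element's error attribute (an object with a
// numeric `code`, or null when there has not been an error).
function describeMediaError(error) {
  if (error === null || error === undefined) {
    return 'No error.';
  }
  switch (error.code) {
    case MEDIA_ERR_ABORTED:
      return 'Playback was aborted at the user\'s request.';
    case MEDIA_ERR_NETWORK:
      return 'A network error stopped the download.';
    case MEDIA_ERR_DECODE:
      return 'The media could not be decoded.';
    case MEDIA_ERR_SRC_NOT_SUPPORTED:
      return 'The media resource was not suitable.';
    default:
      return 'An unknown error occurred.';
  }
}

// In a page, this could be wired up roughly as follows (not run here):
//   video.addEventListener('error', () => {
//     alert(describeMediaError(video.error));
//   });
```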
If the element has a src attribute:
zero or more <{track}> elements, then transparent,
but with no media element descendants.
If the element does not have a src attribute: zero or more <{source}> elements,
then zero or more <{track}> elements, then transparent,
but with no media element descendants.
An <{audio}> element represents a sound or audio stream.
Content may be provided inside the <{audio}> element. User agents
should not show this content to the user; it is intended for older Web browsers which do
not support <{audio}>, so that legacy audio plugins can be tried, or to show text to the
users of these older browsers informing them of how to access the audio contents.
In this example, a browser which does not support inline audio can present a download link to
the user.
In particular, this content is not intended to address accessibility concerns. To
make audio content accessible to people with hearing, cognitive or other
disabilities, a variety of features are available. If captions or a sign language video are
available, the <{video}> element can be used instead of the <{audio}> element to
play the audio, allowing users to enable the visual alternatives. Chapter titles can be
provided to aid navigation, using the <{track}> element and a WebVTT file. And,
naturally, transcripts or other textual alternatives can be provided by simply linking to
them in the prose near the <{audio}> element. [[WEBVTT]]
The <{audio}> element is a media element whose media data is ostensibly audio data.
The src, preload,
autoplay, loop, muted, and <{audio/controls}>
attributes are the attributes common to all media elements.
When an <{audio}> element is potentially playing, it must have its audio
data played synchronized with the current playback position, at the element's
effective media volume. The user agent must play the audio from audio tracks that
were enabled when the event loop last reached step 1.
When an <{audio}> element is not potentially playing, audio must not play for the element.
audio = new Audio( [ url ] )
Returns a new <{audio}> element, with the src
attribute set to the value passed in the argument, if applicable.
A constructor is provided for creating HTMLAudioElement objects (in addition to
the factory methods from DOM such as createElement()):
Audio(src).
When invoked as a constructor, it must return a new HTMLAudioElement object
(a new audio element). The element must be created with its preload
attribute set to the literal value "auto". If the src argument is
present, the object created must be created with its src content attribute set
to the provided value (this will cause the user agent to invoke the object's
resource selection algorithm before returning).
The element's node document must be the active document of the
browsing context of the Window object on which the interface object
of the invoked constructor is found.
This example shows how different audio files can be offered to the browser. If the browser does
not support a specific codec, it can play one of the alternative files offered.
This example also shows the boolean <{audio/controls}>, <{media/autoplay}>, and
<{media/loop}> attributes.
The audio element can cause content to play which implements the "Dolphin" attack [[Dolphin]],
using sound the user cannot hear to trigger interactive voice devices. Mitigations include
limiting the range of audio reproduction to that which the user can hear, and disabling
<{audio/autoplay}> functionality.
The audio element can be used to mimic parts of a voice interface such as a screen reader
or "voice assistant", in a phishing attack. Mitigation strategies include disabling
<{audio/autoplay}> functionality, or advising users not to use default audio voices
in order to decrease the likelihood of a successful mimic attack.
The <{track}> element allows authors to specify explicit external text resources for
media elements. It does not represent anything on its own.
The kind attribute is an
enumerated attribute. The following table lists the keywords defined for this attribute.
The keyword given in the first cell of each row maps to the state given in the second cell.
Keyword | State | Brief description
subtitles | Subtitles | Transcription or translation of the dialog, suitable for when the sound is available but not understood (e.g., because the user does not understand the language of the media resource's audio track). Overlaid on the video.
captions | Captions | Transcription or translation of the dialog, sound effects, relevant musical cues, and other relevant audio information, suitable for when sound is unavailable or not clearly audible (e.g., because it is muted, drowned-out by ambient noise, or because the user is deaf). Overlaid on the video; labeled as appropriate for the hard-of-hearing.
descriptions | Descriptions | Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g., because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as audio.
chapters | Chapters | Chapter titles, intended to be used for navigating the media resource. Displayed as an interactive (potentially nested) list in the user agent's interface.
metadata | Metadata | Tracks intended for use from script. Not displayed by the user agent.
The attribute may be omitted. The missing value default is the
subtitles state. The invalid value default is the
metadata state.
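The mapping from the kind attribute's value to a state, including the two defaults, can be sketched as follows. This is an illustration only; the helper name `kindState` is an assumption, and `toLowerCase()` is used as an approximation of the ASCII case-insensitive keyword matching that applies to enumerated attributes.

```javascript
// Keyword-to-state mapping for the kind attribute.
const KIND_STATES = new Map([
  ['subtitles', 'Subtitles'],
  ['captions', 'Captions'],
  ['descriptions', 'Descriptions'],
  ['chapters', 'Chapters'],
  ['metadata', 'Metadata'],
]);

// Hypothetical helper: `attributeValue` is the attribute's string value,
// or null when the attribute is omitted.
function kindState(attributeValue) {
  if (attributeValue === null || attributeValue === undefined) {
    return 'Subtitles'; // missing value default
  }
  const state = KIND_STATES.get(attributeValue.toLowerCase());
  return state !== undefined ? state : 'Metadata'; // invalid value default
}
```

Note that an unrecognized keyword falls back to Metadata rather than Subtitles, so a misspelled value is not displayed by the user agent.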
The src attribute gives the address of the text
track data. The value must be a valid non-empty URL potentially surrounded by spaces.
This attribute must be present.
If the element has a <{track/src}> attribute whose value is not the empty string and whose
value, when the attribute was set, could be successfully parsed relative to the element's
node document, then the element's track URL is the
resulting URL string. Otherwise, the element's track URL is the empty string.
The srclang attribute gives the language of
the text track data. The value must be a valid BCP 47 language tag. This attribute must be
present if the element's kind attribute is in the subtitles state. [[!BCP47]]
If the element has a srclang attribute whose value is not the empty string,
then the element's track language is the value of the attribute.
Otherwise, the element has no track language.
The label attribute gives a user-readable
title for the track. This title is used by user agents when listing subtitle,
caption, and audio description tracks in
their user interface.
The value of the label attribute, if the attribute is present, must not be the
empty string. Furthermore, there must not be two track element children of the
same media element whose kind attributes are in the same state, whose
srclang attributes are both missing or have values that represent the same
language, and whose label attributes are again both missing or both have the
same value.
If the element has a label attribute whose value is not the empty string, then
the element's track label is the value of the attribute. Otherwise, the element's
track label is an empty string.
The default attribute is a
boolean attribute, which, if specified, indicates that the track is to be enabled
if the user's preferences do not indicate that another track would be more appropriate.
Each media element must have no more than one <{track}> element child
whose kind attribute is in the Subtitles or
Captions state and whose <{track/default}> attribute is specified.
Each media element must have no more than one <{track}> element child
whose kind attribute is in the Descriptions state
and whose <{track/default}> attribute is specified.
Each media element must have no more than one <{track}> element child whose
kind attribute is in the Chapters state and
whose <{track/default}> attribute is specified.
There is no limit on the number of <{track}> elements whose kind attribute is in the Metadata state and whose <{track/default}> attribute is specified.
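The constraints above on the default attribute could be checked by an authoring tool along these lines. This is a minimal sketch under stated assumptions: the helper name `findDefaultConflicts`, the group labels, and the plain-object track representation (`{ kind, isDefault }`, with `kind` holding a state name) are all hypothetical.

```javascript
// Kind states grouped by the "no more than one default" rules.
// Metadata is deliberately absent because it has no limit.
const DEFAULT_GROUPS = new Map([
  ['Subtitles', 'subtitles-or-captions'],
  ['Captions', 'subtitles-or-captions'],
  ['Descriptions', 'descriptions'],
  ['Chapters', 'chapters'],
]);

// Hypothetical helper: returns the groups that contain more than one
// track element child with the default attribute specified.
function findDefaultConflicts(tracks) {
  const counts = new Map();
  for (const track of tracks) {
    const group = DEFAULT_GROUPS.get(track.kind);
    if (!track.isDefault || group === undefined) {
      continue; // not a default track, or a kind with no limit
    }
    counts.set(group, (counts.get(group) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([group]) => group);
}
```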
Returns the {{TextTrack}} object corresponding to the text track of
the <{track}> element.
The readyState attribute must return the
numeric value corresponding to the text track readiness state of the <{track}> element's
text track, as defined by the following list:
The track IDL attribute must, on
getting, return the <{track}> element's text track's corresponding {{TextTrack}} object.
The src,
srclang,
label, and
default IDL attributes must
reflect the respective content attributes of the same name.
The kind IDL attribute must
reflect the content attribute of the same name, limited to only known values.
This video has subtitles in several languages:
(The <{global/lang}> attributes on the last two describe the language of the
label attribute, not the language of the subtitles themselves. The language of
the subtitles is given by the srclang attribute.)
Media elements
{{HTMLMediaElement}} objects (<{audio}> and <{video}>, in this specification) are simply known as
media elements.
The media element attributes, src, crossorigin,
preload, autoplay, <{media/disableRemotePlayback}>, loop,
muted, and <{mediaelements/controls}>, apply to all media elements. They are
defined in this section.
Media elements are used to present audio data, or video and audio data, to the user.
This is referred to as media data in this section, since this section applies equally
to media elements for audio or for video.
The term media resource is used to refer to the complete
set of media data, e.g., the complete video file, or complete audio file.
A media resource can have multiple audio and video tracks. For the purposes of a
media element, the video data of the media resource is only that of the
currently selected track (if any) as given by the element's videoTracks attribute
when the event loop last reached step 1, and the audio data of the media resource
is the result of mixing all the currently enabled tracks (if any) given by the element's
audioTracks attribute when the event loop last reached step 1.
Both <{audio}> and <{video}> elements can be used for both audio and video. The main
difference between the two is simply that the audio element has no playback area
for visual content (such as video or captions), whereas the video element does.
Except where otherwise explicitly specified, the task source for all the tasks
queued in this section and its subsections is the
media element event task source of the media element in question.
Error codes
media . error
Returns a MediaError object representing the current error state of the element.
Returns null if there is no error.
All media elements have an associated error status, which records the last error the
element encountered since its resource selection algorithm was last invoked. The
error attribute, on getting, must
return the MediaError object created for this last error, or null if there has
not been an error.
interface MediaError {
  const unsigned short MEDIA_ERR_ABORTED = 1;
  const unsigned short MEDIA_ERR_NETWORK = 2;
  const unsigned short MEDIA_ERR_DECODE = 3;
  const unsigned short MEDIA_ERR_SRC_NOT_SUPPORTED = 4;
  readonly attribute unsigned short code;
};
media . error . code
Returns the current error's error code, from the list below.
The code attribute of a
MediaError object must return the code for the error, which must be one of the
following:
MEDIA_ERR_ABORTED (numeric value 1)
The fetching process for the media resource was aborted by the user agent at the
user's request.
MEDIA_ERR_NETWORK (numeric value 2)
A network error of some description caused the user agent to stop fetching the media
resource, after the resource was established to be usable.
MEDIA_ERR_DECODE (numeric value 3)
An error of some description occurred while decoding the media resource, after
the resource was established to be usable.
MEDIA_ERR_SRC_NOT_SUPPORTED (numeric value 4)
The media resource indicated by the src attribute or assigned
media provider object was not suitable.
Returns the [=url/URL=] of the current media resource, if any.
Returns the empty string when there is no media resource, or it doesn't have a
[=url/URL=].
There are three ways to specify a media resource: the {{HTMLMediaElement/srcObject}}
IDL attribute, the <{media/src}> content attribute, and <{source}> elements. The IDL attribute
takes priority, followed by the content attribute, followed by the elements.
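The priority order can be sketched as a simple decision function. This is an illustration, not the resource selection algorithm itself; the helper name `selectionMode`, the plain-object stand-in for a media element, and the returned labels are assumptions ("attribute" and "children" echo the mode names used by the algorithm text, while "object" and "none" are labels chosen here).

```javascript
// Hypothetical helper. `element` is a plain object standing in for a media
// element:
//   srcObject: a media provider object, or null
//   src:       the src content attribute's value, or null if absent
//   sources:   array of <source> element children, possibly empty
function selectionMode(element) {
  if (element.srcObject !== null && element.srcObject !== undefined) {
    return 'object'; // the IDL attribute takes priority
  }
  if (element.src !== null && element.src !== undefined) {
    return 'attribute'; // attribute is present, even if its value is empty
  }
  if (element.sources && element.sources.length > 0) {
    return 'children'; // fall back to <source> elements
  }
  return 'none'; // nothing specifies a media resource
}
```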
MIME types
A media resource can be described in terms of its type, specifically a
MIME type, in some cases with a codecs parameter. (Whether the
codecs parameter is allowed or not depends on the MIME type.) [[!RFC6381]]
Types are usually somewhat incomplete descriptions; for example "video/mpeg"
doesn't say anything except what the container type is, and even a type like
"video/mp4; codecs="avc1.42E01E, mp4a.40.2"" doesn't include information like the
actual bitrate (only the maximum bitrate). Thus, given a type, a user agent can often only know
whether it might be able to play media of that type (with varying levels of
confidence), or whether it definitely cannot play media of that type.
A type that the user agent knows it cannot render is one that describes a resource
that the user agent definitely does not support, for example because it doesn't recognize the
container type, or it doesn't support the listed codecs.
The MIME type "application/octet-stream" with no parameters is never
a type that the user agent knows it cannot render. User agents must treat that type
as equivalent to the lack of any explicit Content-Type metadata
when it is used to label a potential media resource.
Only the MIME type "application/octet-stream" with no parameters is
special-cased here; if any parameter appears with it, it will be treated just like any other
MIME type. This is a deviation from the rule that unknown MIME type parameters
should be ignored.
media . canPlayType(type)
Returns the empty string (a negative response), "maybe", or "probably" based on how confident
the user agent is that it can play media resources of the given type.
The canPlayType(type) method
must return the empty string if type is a type that the user agent
knows it cannot render or is the type "application/octet-stream"; it must
return "probably" if the user agent is confident that the type
represents a media resource that it can render if used with this audio
or <{video}> element; and it must return "maybe" otherwise.
Implementors are encouraged to return "maybe" unless the type can be
confidently established as being supported or not. Generally, a user agent should never return
"probably" for a type that allows the codecs parameter if that
parameter is not present.
This script tests to see if the user agent supports a (fictional) new format to dynamically
decide whether to use a <{video}> element or a plugin:
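A minimal sketch of such a test follows. The helper name `chooseRenderer`, the `pluginAvailable` flag, and the MIME type string are assumptions made for illustration (the format is fictional); only the `canPlayType()` call and its three possible return values come from the definition above.

```javascript
// A made-up type, used purely as an example.
const FICTIONAL_TYPE = 'video/x-new-fictional-format;codecs="kittens,bunnies"';

// Hypothetical helper: decide between a <video> element and a plugin.
// `media` is any object exposing canPlayType(type), such as a video element
// created with document.createElement('video').
function chooseRenderer(media, type, pluginAvailable) {
  const support = media.canPlayType(type); // "", "maybe", or "probably"
  if (support !== 'probably' && pluginAvailable) {
    // Not confident about native playback, and a plugin exists: use it.
    return 'plugin';
  }
  // Either native support is likely, or there is no alternative.
  return 'video';
}
```

The sketch prefers the plugin whenever the answer is anything short of "probably"; a page that trusts "maybe" could relax that comparison.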
The type attribute of the <{source}> element allows the user agent to avoid
downloading resources that use formats it cannot render.
Network states
media . {{HTMLMediaElement/networkState}}
Returns the current state of network activity for the element, from the codes in the list
below.
As media elements interact with the network, their current network activity is represented
by the networkState attribute. On
getting, it must return the current network state of the element, which must be one of the
following values:
: NETWORK_EMPTY (numeric value 0)
:: The element has not yet been initialized. All attributes are in their initial states.
: NETWORK_IDLE (numeric value 1)
:: The element's resource selection algorithm is active and has selected a
resource, but it is not actually using the network at this time.
: NETWORK_LOADING (numeric value 2)
:: The user agent is actively trying to download data.
: NETWORK_NO_SOURCE (numeric value 3)
:: The element's resource selection algorithm is active, but it has not yet found a
resource to use.
The resource selection algorithm defined below describes exactly when the
{{HTMLMediaElement/networkState}} attribute changes value and what events fire to indicate
changes in this state.
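When logging or debugging, the numeric value returned by networkState can be turned back into its constant name. This is a small sketch; the helper name `networkStateName` and the "unknown" fallback are assumptions made for illustration.

```javascript
// Constant names indexed by their numeric values (0 through 3).
const NETWORK_STATE_NAMES = [
  'NETWORK_EMPTY',
  'NETWORK_IDLE',
  'NETWORK_LOADING',
  'NETWORK_NO_SOURCE',
];

// Hypothetical helper: map a networkState value to its constant name.
function networkStateName(value) {
  const inRange =
    Number.isInteger(value) && value >= 0 && value < NETWORK_STATE_NAMES.length;
  return inRange ? NETWORK_STATE_NAMES[value] : 'unknown';
}
```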
Loading the media resource
media . load()
Causes the element to reset and start selecting and loading a new media resource
from scratch.
All media elements have an autoplaying flag, which must begin in the true
state, and a delaying-the-load-event flag, which must begin in the false state.
While the delaying-the-load-event flag is true, the element must
delay the load event of its document.
When the load() method on a
media element is invoked, the user agent must run the media element load algorithm.
The media element load algorithm consists of the following steps.
Basically, pending events and callbacks for the media element are discarded and
pending promises are rejected when the media element starts loading a new resource.
Playback of any previously playing media resource for this element stops.
The resource selection algorithm for a media element is as follows. This
algorithm is always invoked as part of a task, but one of the first steps in the
algorithm is to return and continue running the remaining steps in parallel. In addition,
this algorithm interacts closely with the event loop mechanism; in particular, it has
synchronous sections (which are triggered as part of the event loop algorithm).
Steps in such sections are marked with ⌛.
Set the element's networkState attribute to
the NETWORK_NO_SOURCE value.
Wait for the task queued by the previous step to have executed.
Abort these steps. The element won't attempt to load another resource until this
algorithm is triggered again.
If mode is attribute
⌛ If the src attribute's value is the empty string, then end the
synchronous section, and jump down to the failed with attribute step below.
⌛ Let |urlString| and |urlRecord| be the [=resulting URL string=] and the
[=resulting URL record=], respectively, that would have resulted from [=parsing=] the
[=url/URL=] specified by the <{media/src}> attribute's value relative to the
[=media element=]'s [=node document=] when the <{media/src}> attribute was last changed.
⌛ If |urlString| was obtained successfully, set the currentSrc attribute to |urlString|.
If |urlRecord| was obtained successfully, run the [=resource fetch algorithm=] with
|urlRecord|. If that algorithm returns without aborting *this* one, then the load failed.
Wait for the task queued by the previous step to have executed.
Abort these steps. The element won't attempt to load another resource until this
algorithm is triggered again.
Otherwise (mode is children)
⌛ Let pointer be a position defined by two adjacent nodes in the
media element's child list, treating the start of the list (before the first
child in the list, if any) and end of the list (after the last child in the list, if any)
as nodes in their own right. One node is the node before pointer, and the
other node is the node after pointer. Initially, let pointer be the
position between the candidate node and the next node, if there are any, or the
end of the list, if it is the last node.
As nodes are inserted into and removed from the
media element, pointer must be updated as follows:
If a new node is inserted between the two nodes that define pointer
Let pointer be the point between the node before pointer
and the new node. In other words, insertions at pointer go after
pointer.
If the node before pointer is removed
Let pointer be the point between the node after pointer
and the node before the node after pointer. In other words,
pointer doesn't move relative to the remaining nodes.
If the node after pointer is removed
Let pointer be the point between the node before pointer
and the node after the node before pointer. Just as with the previous case,
pointer doesn't move relative to the remaining nodes.
Other changes don't affect pointer.
⌛ Process candidate: If candidate does not have a
src attribute, or if its src attribute's value is the
empty string, then end the synchronous section, and jump down to the
failed with elements step below.
⌛ Let |urlString| and |urlRecord| be the [=resulting URL string=] and the
[=resulting URL record=], respectively, that would have resulted from [=parsing=] the
[=url/URL=] specified by |candidate|'s <{source/src}> attribute's value relative to the
|candidate|'s [=node document=] when the <{source/src}> attribute was last changed.
⌛ If |urlString| was not obtained successfully, then end the
synchronous section, and jump down to the failed with elements step
below.
⌛ If candidate has a type attribute whose value, when
parsed as a MIME type (including any codecs described by the codecs
parameter, for types that define that parameter), represents
a type that the user agent knows it cannot render, then end the
synchronous section, and jump down to the failed with elements
step below.
⌛ Search loop: If the node after pointer is
the end of the list, then jump to the waiting step below.
⌛ If the node after pointer is a <{source}> element,
let candidate be that element.
⌛ Advance pointer so that the node before pointer is now the
node that was after pointer, and the node after pointer is the node
after the node that used to be after pointer, if any.
⌛ If candidate is null, jump back to the search loop step.
Otherwise, jump back to the process candidate step.
⌛ Waiting: Set the element's networkState attribute to the
NETWORK_NO_SOURCE value.
⌛ Set the element's delaying-the-load-event flag back to true (this
[=delay the load event|delays the load event=] again, in case it hasn't been fired yet).
⌛ Set the networkState back to NETWORK_LOADING.
⌛ Jump back to the find next candidate step above.
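The filtering applied to each candidate in the steps above can be sketched as a loop over the <{source}> children. This is a deliberate simplification, not the algorithm itself: it ignores URL parsing failures, the pointer bookkeeping, the waiting step, and the fact that a candidate whose fetch later fails causes the search to continue. The helper name `firstCandidate`, the plain-object sources, and the caller-supplied predicate `knowsItCannotRender` are assumptions made for illustration.

```javascript
// Hypothetical helper. `sources` is an array of plain objects
// { src, type } standing in for <source> element children, in tree order.
// `knowsItCannotRender(type)` returns true for a type that the user agent
// knows it cannot render.
function firstCandidate(sources, knowsItCannotRender) {
  for (const source of sources) {
    if (source.src === null || source.src === undefined || source.src === '') {
      continue; // no usable src attribute: this candidate fails
    }
    if (typeof source.type === 'string' && knowsItCannotRender(source.type)) {
      continue; // type is known to be unsupported: skip without fetching
    }
    return source; // first candidate worth attempting to fetch
  }
  return null; // end of the list: the algorithm would wait for new children
}
```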
The dedicated media source failure steps are the following steps:
Set the error attribute to a new MediaError object whose
code attribute is set to MEDIA_ERR_SRC_NOT_SUPPORTED.
If the algorithm was invoked with a [=media provider object=] or a [=URL record=] whose
[=url/object=] is a [=media provider object=], then let |mode| be *local*. Otherwise let |mode|
be *remote*.
If mode is remote, then let the current media resource be the
resource given by the [=URL record=] passed to this algorithm; otherwise, let the
current media resource be the resource given by the media provider
object. Either way, the current media resource is now the element's media
resource.
Run the appropriate steps from the following list:
If mode is remote
Optionally, run the following substeps. This is the expected behavior if the user agent
intends to not attempt to fetch the resource until the user requests it explicitly
(e.g., as a way to implement the preload attribute's none
keyword).
Wait for an implementation-defined event (e.g., the user requesting that the media
element begin playback).
Set the element's delaying-the-load-event flag back to true (this
[=delay the load event|delays the load event=] again, in case it hasn't been fired yet).
Set the networkState to NETWORK_LOADING.
Let request be the result of creating a potential-CORS request given
current media resource's [=URL record=] and the media element's
crossorigin content attribute value.
Set request's client to the media element's
node document's Window object's environment settings object
and type to "audio" if the media element is an
audio element and to "video" otherwise.
Fetch request.
The response's unsafe response obtained in this fashion, if any,
contains the media data. It can be CORS-same-origin or
CORS-cross-origin; this affects whether subtitles referenced in the media
data are exposed in the API and, for <{video}> elements, whether a
canvas gets tainted when the video is drawn on it.
The stall timeout is a user-agent defined length of time, which should be about
three seconds. When a media element that is actively attempting to obtain
media data has failed to receive any data for a duration equal to the stall
timeout, the user agent must queue a task to fire a simple
event named stalled at the element.
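The stall condition amounts to comparing the time since data was last received against the stall timeout. The following is a minimal sketch; the helper name `isStalled`, the millisecond timestamps, and the exact value of 3000 ms (standing in for "about three seconds") are assumptions, since the actual length is user-agent defined.

```javascript
// Assumed value; the stall timeout is user-agent defined.
const STALL_TIMEOUT_MS = 3000;

// Hypothetical helper: true once no data has been received for a duration
// equal to (or exceeding) the stall timeout.
function isStalled(lastDataTimeMs, nowMs, timeoutMs = STALL_TIMEOUT_MS) {
  return nowMs - lastDataTimeMs >= timeoutMs;
}
```

A user agent following this approach would reset `lastDataTimeMs` each time media data arrives and would queue the stalled event the first time the check becomes true.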
User agents may allow users to selectively block or slow media data downloads.
When a media element's download has been blocked altogether, the user agent must
act as if it was stalled (as opposed to acting as if the connection was closed). The rate
of the download may also be throttled automatically by the user agent, e.g., to balance
the download with other connections sharing the same bandwidth.
User agents may decide to not download
more content at any time, e.g., after buffering five minutes of a one hour media resource,
while waiting for the user to decide whether to play the resource or not, while waiting
for user input in an interactive resource, or when the user navigates away from the page.
When a media element's download has been suspended, the user agent must
queue a task to set the networkState to NETWORK_IDLE
and fire a simple event named suspend at the element. If and when
downloading of the resource resumes, the user agent must queue a task to set the
networkState to NETWORK_LOADING. Between the queuing of these
tasks, the load is suspended (so progress events don't fire, as described
above).
The preload attribute provides a hint regarding how much buffering the
author thinks is advisable, even in the absence of the autoplay attribute.
When a user agent decides to completely suspend a download, e.g., if it is waiting until
the user starts playback before downloading any further content, the user agent must
queue a task to set the element's delaying-the-load-event flag to
false. This stops delaying the load event.
The user agent may use whatever means necessary to fetch the resource (within the
constraints put forward by this and other specifications); for example, reconnecting to
the server in the face of network errors, using HTTP range retrieval requests, or
switching to a streaming protocol. The user agent must consider a resource erroneous only
if it has given up trying to fetch it.
To determine the format of the media resource, the user agent must use the
rules for sniffing audio and video specifically.
While the load is not suspended (see below), every 350ms (±200ms) or for every byte
received, whichever is least frequent, queue a task to fire a simple
event named progress at the element.
The networking task source tasks to process the data as it is being fetched
must each immediately queue a task to run the first appropriate steps from
the media data processing steps list below. (A new task is used for this so that
the work described below occurs relative to the media element event task source
rather than the networking task source.)
When the networking task source has queued the last task as part of
fetching the media resource (i.e., once the download has completed), if the
fetching process completes without errors, including decoding the media data, and if all
of the data is available to the user agent without network access, then, the user agent
must move on to the final step below. This might never happen, e.g., when streaming
an infinite resource such as Web radio, or if the resource is longer than the user agent's
ability to cache data.
While the user agent might still need network access to obtain parts of the
media resource, the user agent must remain on this step.
For example, if the user agent has discarded the first half of a video, the user agent
will remain at this step even once the playback has ended, because there is
always the chance the user will seek back to the start. In fact, in this situation,
once playback has ended, the user agent will end up firing a suspend
event, as described earlier.
Otherwise (mode is local)
The resource described by the current media resource, if any, contains the
media data. It is CORS-same-origin.
If the current media resource is a raw data stream (e.g., from a
File object), then to determine the format of the media resource,
the user agent must use the rules for sniffing audio and video specifically.
Otherwise, if the data stream is pre-decoded, then the format is the format given by the
relevant specification.
Set the element’s [=delaying-the-load-event flag=] to false. This stops
[=delay the load event|delaying the load event=] when the resource is local.
Whenever new data for the current media resource becomes available,
queue a task to run the first appropriate steps from the
media data processing steps list below.
When the current media resource is permanently exhausted (e.g., all the bytes of
a Blob have been processed), if there were no decoding errors, then the user
agent must move on to the final step below. This might never happen, e.g., if the
current media resource is a MediaStream.
The media data processing steps list is as follows:
If the media data cannot be fetched at all, due to network errors, causing the
user agent to give up trying to fetch the resource
If the media data can be fetched but is found by inspection to be in an
unsupported format, or can otherwise not be rendered at all
DNS errors, HTTP 4xx and 5xx errors (and equivalents in other protocols), and other fatal
network errors that occur before the user agent has established whether the
current media resource is usable, as well as the file using an unsupported
container format, or using unsupported codecs for all the data, must cause the user agent to
execute the following steps:
The user agent should cancel the fetching process.
Create an AudioTrack object to represent the audio track.
Update the media element's audioTracks attribute's
AudioTrackList object with the new AudioTrack object.
Let enable be unknown.
If either the media resource or the address of the
current media resource indicates a particular set of audio tracks to enable,
or if the user agent has information that would facilitate the selection of specific audio
tracks to improve the user's experience, then: if this audio track is one of the ones to
enable, then set enable to true, otherwise, set enable
to false.
This could be triggered by Media Fragments URI fragment
identifier syntax, but it could also be triggered e.g., by the user agent selecting
a 5.1 surround sound audio track over a stereo audio track. [[!MEDIA-FRAGS]]
If enable is still unknown, then, if the media element
does not yet have an enabled audio track, then set enable to true,
otherwise, set enable to false.
If enable is true, then enable this audio track,
otherwise, do not enable this audio track.
Fire a trusted event with the name addtrack, that does
not bubble and is not cancelable, and that uses the {{TrackEvent}} interface, with the
track attribute initialized to the new
AudioTrack object, at this AudioTrackList object.
Create a {{VideoTrack}} object to represent the video track.
Update the media element's videoTracks attribute's
VideoTrackList object with the new VideoTrack object.
Let enable be unknown.
If either the media resource or the address of the
current media resource indicates a particular set of video tracks to enable,
or if the user agent has information that would facilitate the selection of specific
video tracks to improve the user's experience, then: if this video track is the first
such video track, then set enable to true, otherwise, set
enable to false.
This could again be triggered by media fragments syntax.
If enable is still unknown, then, if the media element
does not yet have a selected video track, then set enable to
true, otherwise, set enable to false.
If enable is true, then select this track and unselect any
previously selected video tracks, otherwise, do not select this video track. If other
tracks are unselected, then a change event will be fired.
Fire a trusted event with the name addtrack, that does
not bubble and is not cancelable, and that uses the {{TrackEvent}} interface, with the
track attribute initialized to the new VideoTrack object,
at this {{VideoTrackList}} object.
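The track selection steps above can be sketched as follows. This is a non-normative illustration only: the names `Preference`, `shouldEnableAudioTrack`, `VideoTrackSketch` and `addVideoTrack` are hypothetical and not part of any API, and event dispatch is omitted. For video, the caller is assumed to pass `true` only for the first track that the resource, its address, or the user agent prefers.

```typescript
// Tri-state value of the "enable" variable used in the steps above.
type Preference = boolean | "unknown";

// Audio: any number of tracks can be enabled at once.
function shouldEnableAudioTrack(
  preferred: Preference,
  hasEnabledAudioTrack: boolean
): boolean {
  if (preferred !== "unknown") return preferred;
  // No stated preference: enable only if nothing is enabled yet.
  return !hasEnabledAudioTrack;
}

// Hypothetical stand-in for a VideoTrack; only the selected flag is modelled.
interface VideoTrackSketch {
  id: string;
  selected: boolean;
}

// Video: at most one track is selected, so selecting a new track
// unselects any previously selected one.
function addVideoTrack(
  list: VideoTrackSketch[],
  id: string,
  preferred: Preference
): VideoTrackSketch {
  const enable =
    preferred !== "unknown" ? preferred : !list.some((t) => t.selected);
  if (enable) {
    for (const t of list) t.selected = false;
  }
  const track: VideoTrackSketch = { id, selected: enable };
  list.push(track);
  return track;
}
```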
Once enough of the media data has been fetched to
determine the duration of the media resource, its dimensions, and other metadata
This indicates that the resource is usable. The user agent must follow these substeps:
Update the timeline offset to the date and time that corresponds to the zero time
in the media timeline established in the previous step, if any. If no explicit time
and date is given by the media resource, the timeline offset must be set to
Not-a-Number (NaN).
Update the duration attribute with the time of the last frame of the
resource, if known, on the media timeline established above. If it is not known
(e.g., a stream that is in principle infinite), update the duration attribute
to the value positive Infinity.
If either the media resource or the address of the
current media resource indicates a particular start time, then set the
initial playback position to that time and, if jumped is still
false, seek to that time and let jumped be true.
For example, with media formats that support the media fragment syntax,
the fragment can be used to indicate a start position.
[[!MEDIA-FRAGS]]
If there is no enabled audio track, then enable an audio track. This will cause a
change event to be fired.
If there is no selected video track, then select a video track. This will
cause a change event to be fired.
A user agent that is attempting to reduce network usage while still fetching
the metadata for each media resource would also stop buffering at this point,
following the rules described previously, which involve the
networkState attribute switching to the NETWORK_IDLE
value and a suspend event firing.
The user agent is required to determine the duration of the
media resource and go through this step before playing.
Once the entire media resource has been fetched (but potentially before any of
it has been decoded)
If the user agent can keep the media resource loaded, then the
algorithm will continue to its final step below, which aborts the algorithm.
If the connection is interrupted after some media data has been received,
causing the user agent to give up trying to fetch the resource
Fatal network errors that occur after the user agent has established whether the
current media resource is usable (i.e., once the media element's
readyState attribute is no longer HAVE_NOTHING) must cause
the user agent to execute the following steps:
The user agent should cancel the fetching process.
Set the error attribute to a new MediaError object whose
code attribute is set to MEDIA_ERR_NETWORK.
Set the element's networkState attribute to the
NETWORK_IDLE value.
Fatal errors in decoding the media data that occur after the user agent has
established whether the current media resource is usable (i.e., once the
media element's readyState attribute is no longer
HAVE_NOTHING) must cause the user agent to execute the following steps:
The user agent should cancel the fetching process.
Set the error attribute to a new MediaError object whose
code attribute is set to MEDIA_ERR_DECODE.
Set the element's networkState attribute
to the NETWORK_IDLE value.
If the media data fetching process is aborted by the user
If the fetching process is aborted by the user, e.g., because the user pressed a "stop" button,
the user agent must execute the following steps. These steps are not followed if the
load() method itself is invoked while these steps are running, as the steps
above handle that particular kind of abort.
The user agent should cancel the fetching process.
Set the error attribute to a new
MediaError object whose code attribute
is set to MEDIA_ERR_ABORTED.
If the media element's readyState attribute has a value equal to
HAVE_NOTHING, set the element's networkState attribute to the
NETWORK_EMPTY value, set the element's show poster flag to true, and
fire a simple event named emptied at the element.
Otherwise, set the element's networkState
attribute to the NETWORK_IDLE value.
If the media data can be fetched but has non-fatal
errors or uses, in part, codecs that are unsupported, preventing the user agent from
rendering the content completely correctly but not preventing playback altogether
The server returning data that is partially usable but cannot be optimally rendered must
cause the user agent to render just the bits it can handle, and ignore the rest.
Cross-origin videos do not expose their subtitles, since that would allow attacks such as
hostile sites reading subtitles from confidential videos on a user's intranet.
Final step: If the user agent ever reaches this step (which can only happen if
the entire resource gets loaded and kept available): abort the overall
resource selection algorithm.
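The state changes described above for fatal errors and user aborts can be summarized in a small model. This is a non-normative sketch: `ElementSketch`, `failAfterUsable` and `abortByUser` are hypothetical names, only the steps stated in the text above are modelled, and cancelling the fetch is left out.

```typescript
type ErrorCodeName =
  | "MEDIA_ERR_ABORTED"
  | "MEDIA_ERR_NETWORK"
  | "MEDIA_ERR_DECODE";
type NetworkStateName = "NETWORK_EMPTY" | "NETWORK_IDLE" | "NETWORK_LOADING";

// Hypothetical stand-in for the relevant media element state.
interface ElementSketch {
  readyState: number; // 0 is HAVE_NOTHING
  networkState: NetworkStateName;
  error: ErrorCodeName | null;
  showPoster: boolean;
  events: string[]; // names of events fired, in order
}

// Fatal network or decode error once the resource is known to be usable.
function failAfterUsable(el: ElementSketch, kind: "network" | "decode"): void {
  el.error = kind === "network" ? "MEDIA_ERR_NETWORK" : "MEDIA_ERR_DECODE";
  el.networkState = "NETWORK_IDLE";
}

// The user aborted the fetch (for example, by pressing a "stop" button).
function abortByUser(el: ElementSketch): void {
  el.error = "MEDIA_ERR_ABORTED";
  if (el.readyState === 0) {
    el.networkState = "NETWORK_EMPTY";
    el.showPoster = true;
    el.events.push("emptied");
  } else {
    el.networkState = "NETWORK_IDLE";
  }
}
```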
When a media element is to forget the media element's media-resource-specific
tracks, the user agent must remove from the media element's
list of text tracks all the media-resource-specific text tracks, then empty the
media element's audioTracks attribute's AudioTrackList object,
then empty the media element's videoTracks attribute's {{VideoTrackList}}
object. No events (in particular, no removetrack events) are fired as part of this;
the error and emptied events, fired by the algorithms that
invoke this one, can be used instead.
The preload attribute is an enumerated
attribute. The following table lists the keywords and states for the attribute — the
keywords in the left column map to the states in the cell in the second column on the same row as
the keyword. The attribute can be changed even once the media resource is being
buffered or played; the descriptions in the table below are to be interpreted with that in mind.
Keyword | State | Brief description
none | None | Hints to the user agent that either the author does not expect the user to need the media resource, or that the server wants to minimize unnecessary traffic. This state does not provide a hint regarding how aggressively to actually download the media resource if buffering starts anyway (e.g., once the user hits "play").
metadata | Metadata | Hints to the user agent that the author does not expect the user to need the media resource, but that fetching the resource metadata (dimensions, track list, duration, etc), and maybe even the first few frames, is reasonable. If the user agent precisely fetches no more than the metadata, then the media element will end up with its readyState attribute set to HAVE_METADATA; typically though, some frames will be obtained as well and it will probably be HAVE_CURRENT_DATA or HAVE_FUTURE_DATA. When the media resource is playing, hints to the user agent that bandwidth is to be considered scarce, e.g., suggesting throttling the download so that the media data is obtained at the slowest possible rate that still maintains consistent playback.
auto | Automatic | Hints to the user agent that the user agent can put the user's needs first without risk to the server, up to and including optimistically downloading the entire resource.
The empty string is also a valid keyword, and maps to the
Automatic state. The attribute's missing value default is
user-agent defined, though the Metadata state is suggested as a
compromise between reducing server load and providing an optimal user experience.
Authors might switch the attribute from "none" or "metadata" to
"auto" dynamically once the user begins playback. For example, on a page with
many videos this might be used to indicate that the many videos are not to be downloaded
unless requested, but that once one is requested it is to be downloaded aggressively.
The preload attribute is intended to provide a hint to the user agent about what
the author thinks will lead to the best user experience. The attribute may be ignored
altogether, for example based on explicit user preferences or based on the available
connectivity.
The preload IDL attribute must
reflect the content attribute of the same name, limited to only known values.
The autoplay attribute can override the preload attribute
(since if the media plays, it naturally has to buffer first, regardless of the hint given
by the preload attribute). Including both is not an error.
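The mapping from preload keywords to states can be sketched as below. This is a non-normative illustration: `PreloadState` and `preloadStateFor` are hypothetical names, keywords are assumed to be matched ASCII case-insensitively, and unrecognized values are assumed to fall back to the missing value default, as is usual for enumerated attributes that define no separate default for invalid values. The default is user-agent defined; Metadata is used here only because it is the suggested compromise.

```typescript
type PreloadState = "None" | "Metadata" | "Automatic";

// Keyword table from the text above; the empty string maps to Automatic.
const PRELOAD_KEYWORDS = new Map<string, PreloadState>([
  ["none", "None"],
  ["metadata", "Metadata"],
  ["auto", "Automatic"],
  ["", "Automatic"],
]);

// `value` is the content attribute value, or null when the attribute is absent.
// `uaDefault` stands in for the user-agent-defined missing value default.
function preloadStateFor(
  value: string | null,
  uaDefault: PreloadState = "Metadata"
): PreloadState {
  if (value === null) return uaDefault;
  const state = PRELOAD_KEYWORDS.get(value.toLowerCase());
  // Assumption: unknown keywords are treated like a missing attribute.
  return state === undefined ? uaDefault : state;
}
```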
media . buffered
Returns a TimeRanges object that represents the ranges of the
media resource that the user agent has buffered.
The buffered attribute must return a new
static normalized TimeRanges object that represents the ranges of the
media resource, if any, that the user agent has buffered, at the time the attribute
is evaluated. User agents must accurately determine the ranges available, even for media streams
where this can only be determined by tedious inspection.
Typically this will be a single range anchored at the zero point. However, if the user agent
uses HTTP range requests in response to seeking, then there could be multiple ranges.
User agents may discard previously buffered data.
Thus, a time position included within a range of the objects returned by the
buffered attribute at one time can end up being not included in the range(s) of
objects returned by the same attribute at later times.
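As a non-normative sketch of what a normalized set of ranges looks like, the function below sorts a list of buffered spans, drops empty ones, and folds spans that overlap or touch. `TimeSpan` and `normalizeSpans` are hypothetical names, and this reading of "normalized" (ordered, non-empty, non-overlapping, non-touching) is stated here as an assumption about the TimeRanges definition, which is given elsewhere.

```typescript
// A span of buffered media, in seconds: [start, end].
type TimeSpan = [number, number];

function normalizeSpans(spans: TimeSpan[]): TimeSpan[] {
  const sorted = spans
    .filter(([s, e]) => e > s) // drop empty spans
    .map(([s, e]): TimeSpan => [s, e]) // copy so the input is not mutated
    .sort((a, b) => a[0] - b[0]);
  const out: TimeSpan[] = [];
  for (const [s, e] of sorted) {
    const last = out[out.length - 1];
    if (last !== undefined && s <= last[1]) {
      // Overlapping or touching: fold into the previous span.
      last[1] = Math.max(last[1], e);
    } else {
      out.push([s, e]);
    }
  }
  return out;
}
```

For instance, spans produced by two seeks plus the initial download, such as 10 to 20, 0 to 5, 5 to 8 and 15 to 30, would be reported as the two ranges 0 to 8 and 10 to 30.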
Offsets into the media resource
media . duration
Returns the length of the media resource, in seconds, assuming that the start of
the media resource is at time zero.
Returns NaN if the duration isn't available.
Returns Infinity for unbounded streams.
A media resource has a media timeline that maps times (in seconds) to
positions in the media resource. The origin of a timeline is its earliest defined
position. The duration of a timeline is its last defined position.
Establishing the media timeline:
If the media resource somehow specifies an explicit timeline whose origin is not negative
(i.e., gives each frame a specific time offset and gives the first frame a zero or positive
offset), then the media timeline should be that timeline. (Whether the
media resource can specify a timeline or not depends on the media resource's
format.) If the media resource specifies an explicit start time and date,
then that time and date should be considered the zero point in the media timeline; the
timeline offset will be the time and date, exposed using the
getStartDate() method.
If the media resource has a discontinuous timeline, the user agent must extend the
timeline used at the start of the resource across the entire resource, so that the media
timeline of the media resource increases linearly starting from the
earliest possible position (as defined below), even if the underlying media
data has out-of-order or even overlapping time codes.
For example, if two clips have been concatenated into one video file, but the video format
exposes the original times for the two clips, the video data might expose a timeline that
goes, say, 00:15..00:29 and then 00:05..00:38. However, the user agent would not expose those
times; it would instead expose the times as 00:15..00:29 and 00:29..01:02, as a single video.
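The arithmetic in this example can be reproduced with a short non-normative sketch. `Clip` and `linearize` are hypothetical names; times are in seconds, and the first clip is assumed to keep its original start time while each later clip is placed directly after the previous one.

```typescript
// A clip as labelled in the underlying media data, in seconds.
interface Clip {
  start: number;
  end: number;
}

// Lay the clips end to end so that exposed times increase linearly,
// keeping the timeline used at the start of the resource.
function linearize(clips: Clip[]): Clip[] {
  const out: Clip[] = [];
  let cursor: number | null = null;
  for (const c of clips) {
    const start: number = cursor === null ? c.start : cursor;
    const end = start + (c.end - c.start);
    out.push({ start, end });
    cursor = end;
  }
  return out;
}

// 00:15..00:29 followed by 00:05..00:38
const exposedClips = linearize([
  { start: 15, end: 29 },
  { start: 5, end: 38 },
]);
// exposedClips is 15..29 and 29..62, that is, 00:15..00:29 and 00:29..01:02
```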
In the rare case of a media resource that does not have an explicit timeline, the
zero time on the media timeline should correspond to the first frame of the
media resource. In the even rarer case of a media resource with no
explicit timings of any kind, not even frame durations, the user agent must itself determine the
time for each frame in a user-agent-defined manner.
An example of a file format with no explicit timeline but with explicit frame durations is
the Animated GIF format. An example of a file format with no explicit timings at all is the
JPEG-push format (multipart/x-mixed-replace with JPEG frames, often used as the
format for MJPEG streams).
If, in the case of a resource with no timing information, the user agent will nonetheless be
able to seek to an earlier point than the first frame originally provided by the server, then the
zero time should correspond to the earliest seekable time of the media resource;
otherwise, it should correspond to the first frame received from the server (the point in the
media resource at which the user agent began receiving the stream).
At the time of writing, there is no known format that lacks explicit frame time
offsets yet still supports seeking to a frame before the first frame sent by the server.
Consider a stream from a TV broadcaster, which begins streaming on a sunny Friday afternoon in
October, and always sends connecting user agents the media data on the same media timeline,
with its zero time set to the start of this stream. Months later, user agents connecting to
this stream will find that the first frame they receive has a time offset in the millions of seconds.
The getStartDate() method would always return the date that the
broadcast started; this would allow controllers to display real times in their scrubber (e.g.,
"2:30pm") rather than a time relative to when the broadcast began ("8 months, 4 hours, 12
minutes, and 23 seconds").
Consider a stream that carries a video with several concatenated fragments, broadcast by a
server that does not allow user agents to request specific times but instead just streams the
video data in a predetermined order, with the first frame delivered always being identified as
the frame with time zero. If a user agent connects to this stream and receives fragments
defined as covering timestamps 2010-03-20 23:15:00 UTC to 2010-03-21 00:05:00 UTC and
2010-02-12 14:25:00 UTC to 2010-02-12 14:35:00 UTC, it would expose this with a
media timeline starting at 0s and extending to 3,600s (one hour). Assuming the
streaming server disconnected at the end of the second clip, the duration
attribute would then return 3,600. The getStartDate() method would return a
{{Date}} object with a time corresponding to 2010-03-20 23:15:00 UTC. However, if a
different user agent connected five minutes later, it would (presumably) receive
fragments covering timestamps 2010-03-20 23:20:00 UTC to 2010-03-21 00:05:00 UTC and 2010-02-12
14:25:00 UTC to 2010-02-12 14:35:00 UTC, and would expose this with a media timeline
starting at 0s and extending to 3,300s (fifty five minutes). In this case, the
getStartDate() method would return a {{Date}} object
with a time corresponding to 2010-03-20 23:20:00 UTC.
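The figures in this example can be checked with the following non-normative sketch. `BroadcastFragment` and `exposeBroadcast` are hypothetical names; the sketch assumes the stream ends after the last fragment and that the timeline offset is the start of the first fragment received.

```typescript
// A received fragment, labelled with wall-clock times in milliseconds since the epoch.
interface BroadcastFragment {
  startMs: number;
  endMs: number;
}

// The media timeline starts at 0s, so the duration is the sum of the
// fragment lengths and the start date is that of the first fragment.
function exposeBroadcast(fragments: BroadcastFragment[]): {
  durationSeconds: number;
  startDate: Date;
} {
  let totalMs = 0;
  for (const f of fragments) totalMs += f.endMs - f.startMs;
  const first = fragments[0];
  return {
    durationSeconds: totalMs / 1000,
    startDate: new Date(first !== undefined ? first.startMs : NaN),
  };
}

// Second fragment, identical for both viewers: 2010-02-12 14:25 to 14:35 UTC.
const secondFragment: BroadcastFragment = {
  startMs: Date.UTC(2010, 1, 12, 14, 25),
  endMs: Date.UTC(2010, 1, 12, 14, 35),
};

const firstViewer = exposeBroadcast([
  { startMs: Date.UTC(2010, 2, 20, 23, 15), endMs: Date.UTC(2010, 2, 21, 0, 5) },
  secondFragment,
]);
const laterViewer = exposeBroadcast([
  { startMs: Date.UTC(2010, 2, 20, 23, 20), endMs: Date.UTC(2010, 2, 21, 0, 5) },
  secondFragment,
]);
```

Here `firstViewer.durationSeconds` is 3600 and `laterViewer.durationSeconds` is 3300, matching the one hour and fifty-five minute timelines described above.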
In both of these examples, the seekable attribute would give the ranges that
the controller would want to actually display in its UI; typically, if
the servers don't support seeking to arbitrary times, this would be the range of time from the
moment the user agent connected to the stream up to the latest frame that the user agent has
obtained; however, if the user agent starts discarding earlier information, the actual range
might be shorter.
In any case, the user agent must ensure that the earliest possible position (as
defined below) using the established media timeline, is greater than or equal to zero.
The media timeline also has an associated clock. Which clock is used is user-agent
defined, and may be media resource-dependent, but it should approximate the user's
wall clock.
Media elements have a
current playback position,
which must initially (i.e., in the absence of media data) be zero seconds. The
current playback position is a time on the media timeline.
Media elements also have an official playback
position, which must initially be set to zero seconds. The official playback
position is an approximation of the current playback position that is kept
stable while scripts are running.
Media elements also have a default playback start position, which must
initially be set to zero seconds. This time is used to allow the
element to be seeked even before the media is loaded.
Each media element has a show poster flag. When a media element
is created, this flag must be set to true. This flag is used to control when the user agent
is to show a poster frame for a <{video}> element instead of showing the video contents.
The currentTime attribute must, on
getting, return the media element's default playback start position,
unless that is zero, in which case it must return the element's official playback
position. The returned value must be expressed in seconds. On setting, if the
media element's readyState is HAVE_NOTHING,
then it must set the media element's default playback start position to the
new value; otherwise, it must set the official playback position to the new value and
then seek to the new value. The new value must be interpreted as being in seconds.
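The getter and setter behavior of currentTime can be modelled as follows. This is a non-normative sketch: `PlaybackPositionSketch` is a hypothetical class, seeking is replaced by recording the requested time, and the `metadataLoaded` method reflects an assumption about the loading algorithm (that it seeks to a non-zero default playback start position and then resets it to zero), which is not described in this paragraph.

```typescript
class PlaybackPositionSketch {
  readyState: number = 0; // 0 is HAVE_NOTHING
  defaultPlaybackStartPosition: number = 0;
  officialPlaybackPosition: number = 0;
  seeks: number[] = []; // stand-in for the seek algorithm

  get currentTime(): number {
    return this.defaultPlaybackStartPosition !== 0
      ? this.defaultPlaybackStartPosition
      : this.officialPlaybackPosition;
  }

  set currentTime(seconds: number) {
    if (this.readyState === 0) {
      // Nothing loaded yet: remember where playback is to start.
      this.defaultPlaybackStartPosition = seconds;
    } else {
      this.officialPlaybackPosition = seconds;
      this.seeks.push(seconds);
    }
  }

  // Assumption: what happens once metadata becomes available.
  metadataLoaded(): void {
    this.readyState = 1; // HAVE_METADATA
    if (this.defaultPlaybackStartPosition > 0) {
      this.officialPlaybackPosition = this.defaultPlaybackStartPosition;
      this.seeks.push(this.defaultPlaybackStartPosition);
    }
    this.defaultPlaybackStartPosition = 0;
  }
}
```

With this model, a script can set currentTime to 12 before any data is loaded and read 12 back immediately, even though no seek takes place until metadata arrives.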
Media elements have an initial playback position,
which must initially (i.e., in the absence of media data) be zero seconds. The
initial playback position is updated when a media resource is loaded.
The initial playback position is a time on the media timeline.
If the media resource is a streaming resource, then the user agent might be unable
to obtain certain parts of the resource after it has expired from its buffer. Similarly, some
media resources might have a media timeline that doesn't start at zero.
The earliest possible position is the earliest position in the stream or resource
that the user agent can ever obtain again. It is also a time on the media timeline.
The earliest possible position is not explicitly exposed in the API; it corresponds to
the start time of the first range in the seekable attribute's
TimeRanges object, if any, or the current playback position otherwise.
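Since the earliest possible position is not exposed directly, a script would derive it as described above. The following non-normative sketch assumes the seekable ranges are already ordered by start time; `earliestPossiblePosition` is a hypothetical helper, not an API.

```typescript
// `seekable` is a list of [start, end] pairs in seconds, ordered by start time.
function earliestPossiblePosition(
  seekable: [number, number][],
  currentPlaybackPosition: number
): number {
  const first = seekable[0];
  // With no seekable range, fall back to the current playback position.
  return first !== undefined ? first[0] : currentPlaybackPosition;
}
```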
If at any time the user agent learns that an audio or video track has ended and all
media data relating to that track corresponds to parts of the media timeline that
are before the earliest possible position, the user agent may
queue a task to first remove the track from the audioTracks
attribute's AudioTrackList object or the videoTracks attribute's
{{VideoTrackList}} object as appropriate and then fire a trusted event with the
name removetrack, that does not bubble and is not cancelable, and that
uses the {{TrackEvent}} interface, with the track attribute initialized to the
AudioTrack or VideoTrack object representing the track, at the
media element's aforementioned AudioTrackList or {{VideoTrackList}} object.
The duration attribute must return
the time of the end of the media resource, in seconds, on the media timeline.
If no media data is available, then the attribute must return the Not-a-Number (NaN)
value. If the media resource is not known to be bounded (e.g., streaming radio, or a
live event with no announced end time), then the attribute must return the positive Infinity
value.
The user agent must determine the duration of the media resource before playing
any part of the media data and before setting readyState to a value equal
to or greater than HAVE_METADATA, even if doing so requires fetching multiple
parts of the resource.
When the length of the media resource changes to a known value (e.g., from being
unknown to known, or from a previously established length to a new length) the user agent must
queue a task to fire a simple event named
durationchange at the media element. (The event is not fired
when the duration is reset as part of loading a new media resource.) If the duration is
changed such that the current playback position ends up being greater than the time of
the end of the media resource, then the user agent must also
seek to the time of the end of the media resource.
If an "infinite" stream ends for some reason, then the duration would change from positive
Infinity to the time of the last frame or sample in the stream, and the
durationchange event would be fired. Similarly, if the user agent initially
estimated the media resource's duration instead of determining it precisely, and later
revises the estimate based on new information, then the duration would change and the
durationchange event would be fired.
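The duration update rules can be sketched as below. This is a non-normative model: `DurationSketch` and `setDuration` are hypothetical names, seeking is replaced by recording the target time, and any value other than NaN (including positive Infinity) is assumed to count as a known value.

```typescript
interface DurationSketch {
  duration: number; // NaN when unknown, Infinity when unbounded
  currentPlaybackPosition: number;
  events: string[]; // names of events fired, in order
  seeks: number[]; // stand-in for the seek algorithm
}

function setDuration(el: DurationSketch, newDuration: number): void {
  if (Object.is(el.duration, newDuration)) return; // no change
  el.duration = newDuration;
  // A reset to NaN while loading a new resource fires no event.
  if (Number.isNaN(newDuration)) return;
  el.events.push("durationchange");
  if (el.currentPlaybackPosition > newDuration) {
    // Playback position is now past the end: seek to the end.
    el.seeks.push(newDuration);
    el.currentPlaybackPosition = newDuration;
  }
}
```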
Some video files also have an explicit date and time corresponding to the zero time in the
media timeline, known as the timeline offset. Initially, the
timeline offset must be set to Not-a-Number (NaN).
The getStartDate() method must return
a new Date object representing the current timeline offset.
The loop attribute is a boolean
attribute that, if specified, indicates that the media element is to seek back
to the start of the media resource upon reaching the end.
The loop IDL attribute must
reflect the content attribute of the same name.
Ready states
media . readyState
Returns a value that expresses the current state of the element with respect to rendering the
current playback position, from the codes in the list below.
Media elements have a ready state, which describes to
what degree they are ready to be rendered at the current playback position. The
possible values are as follows; the ready state of a media element at any particular time is the
greatest value describing the state of the element:
HAVE_NOTHING (numeric value 0)
No information regarding the media resource is available. No data for the
current playback position is available. Media elements whose networkState attribute is set
to NETWORK_EMPTY are always in the HAVE_NOTHING state.
HAVE_METADATA (numeric value 1)
Enough of the resource has been obtained that the duration of the resource is available.
In the case of a <{video}> element, the dimensions of the video are also available. No
media data is available for the immediate current playback
position.
HAVE_CURRENT_DATA (numeric value 2)
Data for the immediate current playback position is available, but either not
enough data is available that the user agent could successfully advance the current
playback position in the direction of playback at all without immediately
reverting to the HAVE_METADATA state, or there is no
more data to obtain in the direction of playback. For example, in video this
corresponds to the user agent having data from the current frame, but not the next frame, when
the current playback position is at the end of the current frame; and to when playback has ended.
HAVE_FUTURE_DATA (numeric value 3)
Data for the immediate current playback position is available, as well as
enough data for the user agent to advance the current playback position in the
direction of playback at least a little without immediately reverting to the HAVE_METADATA state, and the text tracks are
ready. For example, in video this corresponds to the user agent having data for at least
the current frame and the next frame when the current playback position is at the
instant in time between the two frames, or to the user agent having the video data for the
current frame and audio data to keep playing at least a little when the current playback
position is in the middle of a frame. The user agent cannot be in this state if playback has ended, as the current playback position
can never advance in this case.
HAVE_ENOUGH_DATA (numeric value 4)
All the conditions described for the HAVE_FUTURE_DATA state are met, and, in addition,
either of the following conditions is also true:
The user agent has entered a state where waiting longer will not result in further data
being obtained, and therefore nothing would be gained by delaying playback any further. (For
example, the buffer might be full.)
In practice, the difference between HAVE_METADATA and
HAVE_CURRENT_DATA is negligible. Really the only time the difference is
relevant is when painting a <{video}> element onto a <{canvas}>, where it distinguishes the
case where something will be drawn (HAVE_CURRENT_DATA or greater) from the case
where nothing is drawn (HAVE_METADATA or less). Similarly, the difference between
HAVE_CURRENT_DATA (only the current frame) and HAVE_FUTURE_DATA
(at least this frame and the next) can be negligible (in the extreme, only one frame). The
only time that distinction really matters is when a page provides an interface for
"frame-by-frame" navigation.
When the ready state of a media element whose networkState is not
NETWORK_EMPTY changes, the user agent must follow the steps given below:
Apply the first applicable set of substeps from the following list:
If the previous ready state was HAVE_NOTHING,
and the new ready state is HAVE_METADATA
Before this task is run, as part of the event loop mechanism, the
rendering will have been updated to resize the <{video}> element if appropriate.
If the previous ready state was
HAVE_METADATA and the new ready state is HAVE_CURRENT_DATA
or greater
If this is the first time this occurs for this media element since the
load() algorithm was last invoked, the user agent must queue a task to
fire a simple event named loadeddata at the element.
If the new ready state is HAVE_FUTURE_DATA or HAVE_ENOUGH_DATA,
then the relevant steps below must then be run also.
If the previous ready state was HAVE_FUTURE_DATA or more, and the new
ready state is HAVE_CURRENT_DATA or less
User agents do not need to support autoplay, and it is suggested that user
agents honor user preferences on the matter. Authors are urged to use the
autoplay attribute rather than using script to force the
video to play, so as to allow the user to override the behavior if so desired.
It is possible for the ready state of a media element to jump between these states
discontinuously. For example, the state of a media element can jump straight from
HAVE_METADATA to HAVE_ENOUGH_DATA without passing through the
HAVE_CURRENT_DATA and HAVE_FUTURE_DATA states.
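Because the ready state is the greatest value that describes the element, it can be thought of as a function of what data is currently available. The following non-normative sketch uses the hypothetical names `DataAvailability` and `readyStateFor`; the `futureData` flag is assumed to include the text tracks being ready, and `enough` stands for the additional conditions listed under HAVE_ENOUGH_DATA.

```typescript
// Names indexed by numeric value, as defined in this section.
const READY_STATE_NAMES = [
  "HAVE_NOTHING",
  "HAVE_METADATA",
  "HAVE_CURRENT_DATA",
  "HAVE_FUTURE_DATA",
  "HAVE_ENOUGH_DATA",
] as const;

// Hypothetical summary of what the user agent currently has.
interface DataAvailability {
  metadata: boolean; // duration (and video dimensions) known
  currentFrame: boolean; // data for the current playback position
  futureData: boolean; // can advance a little; text tracks ready
  enough: boolean; // further waiting would gain nothing
}

function readyStateFor(a: DataAvailability): number {
  if (!a.metadata) return 0;
  if (!a.currentFrame) return 1;
  if (!a.futureData) return 2;
  return a.enough ? 4 : 3;
}

// A fast local resource can go from metadata only to everything at once.
const beforeData = readyStateFor({
  metadata: true, currentFrame: false, futureData: false, enough: false,
});
const afterData = readyStateFor({
  metadata: true, currentFrame: true, futureData: true, enough: true,
});
```

In this example `beforeData` is 1 (HAVE_METADATA) and `afterData` is 4 (HAVE_ENOUGH_DATA), with no intermediate value ever observed.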
The readyState IDL attribute must, on
getting, return the value described above that describes the current ready state of the
media element.
The autoplay attribute is a boolean
attribute. When present, the user agent (as described in the algorithm
herein) will automatically begin playback of the media resource as
soon as it can do so without stopping.
Authors are urged to use the autoplay attribute rather than using script to
trigger automatic playback, as this allows the user to override the automatic playback when
it is not desired, e.g., when using a screen reader. Authors are also encouraged to consider
not using the automatic playback behavior at all, and instead to let the user agent wait for
the user to start playback explicitly.
The autoplay IDL attribute must
reflect the content attribute of the same name.
Playing the media resource
media . {{HTMLMediaElement/paused}}
Returns true if playback is paused; false otherwise.
media . {{HTMLMediaElement/ended}}
Returns true if playback has reached the end of the media resource.
media . <{media/disableRemotePlayback}>
Whether the remote playback of a media element is disabled.
media . defaultPlaybackRate [ = value ]
Returns the default rate of playback, for when the user is not fast-forwarding or reversing
through the media resource.
Can be set, to change the default rate of playback.
The default rate has no direct effect on playback, but if the user switches to a fast-forward
mode, when they return to the normal playback mode, it is expected that the rate of playback
will be returned to the default rate of playback.
media . playbackRate [ = value ]
Returns the current rate of playback, where 1.0 is normal speed.
Can be set, to change the rate of playback.
media . played
Returns a TimeRanges object that represents the ranges of the media
resource that the user agent has played.
media . {{HTMLMediaElement/play()}}
Sets the {{HTMLMediaElement/paused}} attribute to false, loading the
media resource and beginning playback if necessary. If playback had ended, restarts
it from the start.
media . pause()
Sets the {{HTMLMediaElement/paused}} attribute to true, loading the
media resource if necessary.
A waiting DOM event can be fired as a result of an element that is
potentially playing stopping playback due to its readyState attribute
changing to a value lower than HAVE_FUTURE_DATA.
A media element is said to have
ended playback when:
The element's readyState attribute is HAVE_METADATA or greater, and
The ended attribute must return true if,
the last time the event loop reached step 1, the media element had
ended playback and the direction of playback was forwards, and false otherwise.
A media element is said to have stopped due to errors when the
element's readyState attribute is HAVE_METADATA or greater, and the
user agent encounters a non-fatal error during the
processing of the media data, and due to that error, is not able to play the content at
the current playback position.
A media element is said to have paused for user interaction when its
{{HTMLMediaElement/paused}} attribute is false, the {{HTMLMediaElement/readyState}} attribute is
either HAVE_FUTURE_DATA or HAVE_ENOUGH_DATA and the user agent has
reached a point in the media resource where the user has to make a selection for the
resource to continue.
It is possible for a media element to have both ended playback and
paused for user interaction at the same time.
When a media element that is potentially playing stops playing
because it has paused for user interaction, the user agent must queue a
task to fire a simple event named timeupdate at the element.
A media element is said to have paused for in-band content when its
{{HTMLMediaElement/paused}} attribute is false, the {{HTMLMediaElement/readyState}} attribute is
either HAVE_FUTURE_DATA or HAVE_ENOUGH_DATA and the user agent has
suspended playback of the media resource in order to play content that is temporally
anchored to the media resource and has a non-zero length, or to play content that is
temporally anchored to a segment of the media resource but has a length longer than
that segment.
Queue a task that, if the media element has still ended playback, the
direction of playback is still forwards, and {{HTMLMediaElement/paused}} is false,
changes {{HTMLMediaElement/paused}} to true, fires a simple event named
pause at the media element, and then runs [=take pending play promises=] and
[=reject pending play promises=] with the result and an {{AbortError}} {{DOMException}}.
The word "reaches" here does not imply that the current playback position needs to have
changed during normal playback; it could be via seeking, for instance.
The remote attribute MUST return the
RemotePlayback object associated with the media element.
The disableRemotePlayback attribute is a
boolean attribute. When present, the user agent MUST NOT play the
media element remotely or present any UI to do so.
When the <{media/disableRemotePlayback}> attribute is added to the
media element, the user agent MUST run the steps to
disable remote playback.
A corresponding disableRemotePlayback
IDL attribute which reflects the value of each element’s <{media/disableRemotePlayback}> content
attribute is added to the {{HTMLMediaElement}} interface.
The {{HTMLMediaElement/disableRemotePlayback}} IDL attribute MUST reflect the content
attribute of the same name.
The defaultPlaybackRate attribute
gives the desired speed at which the media resource is to play, as a multiple of its
intrinsic speed. The attribute is mutable: on getting it must return the last value it was set
to, or 1.0 if it hasn't yet been set; on setting the attribute must be set to the new value.
The playbackRate attribute gives the
effective playback rate, which is the speed at which the media resource plays, as a
multiple of its intrinsic speed. If it is not equal to the
{{HTMLMediaElement/defaultPlaybackRate}}, then the implication is that the user is using a
feature such as fast forward or slow motion playback. The attribute is mutable: on getting it
must return the last value it was set to, or 1.0 if it hasn't yet been set; on setting the
attribute must be set to the new value, and the playback will change speed (if the element is
potentially playing).
When the {{HTMLMediaElement/defaultPlaybackRate}} or {{HTMLMediaElement/playbackRate}} attributes
change value (either by being set by script or by being changed directly by the user agent, e.g.,
in response to user control) the user agent must queue a task to
fire a simple event named ratechange at the media element.
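This example is non-normative. The getter, setter, and ratechange behavior described above can be modeled in plain JavaScript. It is only a sketch: `PlaybackRateModel` and its `queuedEvents` array are hypothetical stand-ins for a media element and its task queue, not part of any API.

```javascript
// Hypothetical model of the two rate attributes; not a real media element.
class PlaybackRateModel {
  constructor() {
    this._defaultPlaybackRate = 1.0; // returned until the attribute is first set
    this._playbackRate = 1.0;        // returned until the attribute is first set
    this.queuedEvents = [];          // stands in for "queue a task to fire a simple event"
  }

  get defaultPlaybackRate() {
    return this._defaultPlaybackRate;
  }

  set defaultPlaybackRate(value) {
    const changed = value !== this._defaultPlaybackRate;
    this._defaultPlaybackRate = value;
    if (changed) this.queuedEvents.push("ratechange");
  }

  get playbackRate() {
    return this._playbackRate;
  }

  set playbackRate(value) {
    const changed = value !== this._playbackRate;
    this._playbackRate = value;
    if (changed) this.queuedEvents.push("ratechange");
  }
}

// Usage: enter a fast-forward mode, then return to the default rate.
const rateModel = new PlaybackRateModel();
rateModel.playbackRate = 2.0;
rateModel.playbackRate = rateModel.defaultPlaybackRate;
```

In this sketch each change of value queues one ratechange event; setting an attribute to the value it already has queues nothing.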
The played attribute must return a new
static normalized TimeRanges object that represents the ranges of points on
the media timeline of the media resource reached through the usual monotonic
increase of the current playback position during normal playback, if any, at the time
the attribute is evaluated.
Each [=media element=] has a list of pending play promises, which must be initially
empty.
To take pending play promises for a media element, the user agent must run
the following steps:
Let promises be an empty list of promises.
Copy the [=media element=]'s [=list of pending play promises=] to promises.
Clear the [=media element=]'s [=list of pending play promises=].
Return promises.
To resolve pending play promises for a [=media element=] with a list of
promises promises, the user agent must resolve each promise in
promises with undefined.
To reject pending play promises for a [=media element=] with a list of
promises promises and an exception name error, the user agent must reject each
promise in promises with error.
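This example is non-normative. The three operations on the list of pending play promises can be sketched in plain JavaScript as follows. The names `createPendingPlayEntry` and `PendingPlayModel`, and the `state` field used to observe settlement, are hypothetical and exist only for this illustration.

```javascript
// Hypothetical entry: a promise together with the functions that settle it.
function createPendingPlayEntry() {
  const entry = { state: "pending", value: undefined };
  entry.promise = new Promise((resolve, reject) => {
    entry.resolve = (value) => {
      entry.state = "fulfilled";
      entry.value = value;
      resolve(value);
    };
    entry.reject = (error) => {
      entry.state = "rejected";
      entry.value = error;
      reject(error);
    };
  });
  // Swallow rejections so this standalone sketch does not report them as unhandled.
  entry.promise.catch(() => {});
  return entry;
}

// Hypothetical stand-in for a media element's list of pending play promises.
class PendingPlayModel {
  constructor() {
    this.pendingPlayPromises = []; // initially empty
  }

  // "Take pending play promises": copy the list, clear it, return the copy.
  takePendingPlayPromises() {
    const promises = this.pendingPlayPromises.slice();
    this.pendingPlayPromises.length = 0;
    return promises;
  }
}

// "Resolve pending play promises": resolve each with undefined.
function resolvePendingPlayPromises(promises) {
  for (const entry of promises) entry.resolve(undefined);
}

// "Reject pending play promises": reject each with the given error.
function rejectPendingPlayPromises(promises, error) {
  for (const entry of promises) entry.reject(error);
}
```

Taking the list before settling it means that promises added after the take are left pending and are not affected by the resolution or rejection that follows.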
To notify about playing for a [=media element=], the user agent must run the
following steps:
When the play() method on a
media element is invoked, the user agent must run the following steps.
If the [=media element=] is not allowed to play, return a promise rejected
with a "{{NotAllowedError}}" {{DOMException}}.
For example, a user agent could require user interaction in order to start
playback. This specification does not require any particular behavior.
If the [=media element=]'s {{HTMLMediaElement/error}} attribute is not null and its
{{MediaError/code}} is {{MEDIA_ERR_SRC_NOT_SUPPORTED}}, return a promise rejected
with a "{{NotSupportedError}}" DOMException.
This means that the dedicated media source failure steps have run. Playback is not
possible until the media element load algorithm clears the error attribute.
If the media element's {{HTMLMediaElement/readyState}}
attribute has the value HAVE_NOTHING, HAVE_METADATA, or HAVE_CURRENT_DATA, queue a task to
fire a simple event named waiting at the
element.
Otherwise, the media element's {{HTMLMediaElement/readyState}} attribute has the value
HAVE_FUTURE_DATA or HAVE_ENOUGH_DATA: queue a task to notify about playing for the element.
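This example is non-normative. The checks quoted above can be sketched as a plain decision function. It covers only those checks, not the complete play() algorithm. The `describePlayChecks` function and the `allowedToPlay` property are hypothetical; the numeric constants mirror the readyState and MediaError code values.

```javascript
// readyState values of a media element.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;
const HAVE_CURRENT_DATA = 2;
const HAVE_FUTURE_DATA = 3;
const HAVE_ENOUGH_DATA = 4;
// MediaError code for an unsupported source.
const MEDIA_ERR_SRC_NOT_SUPPORTED = 4;

// Hypothetical helper: `media` is a plain object with allowedToPlay, error and readyState.
function describePlayChecks(media) {
  if (!media.allowedToPlay) {
    return { rejectedWith: "NotAllowedError" };
  }
  if (media.error !== null && media.error.code === MEDIA_ERR_SRC_NOT_SUPPORTED) {
    return { rejectedWith: "NotSupportedError" };
  }
  if (media.readyState <= HAVE_CURRENT_DATA) {
    // HAVE_NOTHING, HAVE_METADATA or HAVE_CURRENT_DATA
    return { queued: "fire waiting" };
  }
  // HAVE_FUTURE_DATA or HAVE_ENOUGH_DATA
  return { queued: "notify about playing" };
}
```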
The effective playback rate can be 0.0, in which case the
current playback position doesn't move, despite playback not being paused
({{HTMLMediaElement/paused}} doesn't become true, and the pause event
doesn't fire).
This specification doesn't define how the user agent achieves the appropriate playback rate —
depending on the protocol and media available, it is plausible that the user agent could
negotiate with the server to have the server provide the media data at the appropriate rate,
so that (except for the period between when the rate is changed and when the server updates
the stream's playback rate) the client doesn't actually have to drop or interpolate any frames.
Any time the user agent provides a stable state, the official playback position
must be set to the current playback position.
While the direction of playback is backwards, any corresponding audio must be
muted. While the effective playback rate is so low or so high that the user agent
cannot play audio usefully, the corresponding audio must also be muted. If the
effective playback rate is not 1.0, the user agent may apply pitch adjustments to the
audio as necessary to render it faithfully.
Media elements that are potentially playing while not
in a Document must not play any video, but should play any audio component.
Media elements must not stop playing just because all references to them have been removed; only
once a media element is in a state where no further audio could ever be played by that element
may the element be garbage collected.
It is possible for an element to which no explicit references exist to play audio, even if such
an element is not still actively playing: for instance, a media element whose
media resource has no audio tracks could eventually play audio again if it had an
event listener that changes the media resource.
Let last time be the current playback position at the
time this algorithm was last run for this media element, if this is not the first
time it has run.
If the current playback position has, since the last time this algorithm was
run, only changed through its usual monotonic increase during normal playback, then let
missed cues be the list of cues in other cues whose
start times are greater than or equal to last time and whose
end times are less than or equal to the current playback position.
Otherwise, let missed cues be an empty list.
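This example is non-normative. The selection of missed cues can be sketched as a filter over other cues. The function name, the plain cue objects, and the `onlyNormalPlayback` flag are hypothetical. Treating the first run (no last time) as producing an empty list is an assumption of this sketch.

```javascript
// Hypothetical helper: each cue is a plain object with startTime and endTime in seconds.
function computeMissedCues(otherCues, lastTime, currentPosition, onlyNormalPlayback) {
  // The position changed in some other way (for example, a seek): no missed cues.
  if (!onlyNormalPlayback) return [];
  // Assumption of this sketch: with no previous run there is nothing to have missed.
  if (lastTime === undefined) return [];
  // Cues that started at or after last time and ended at or before the current position.
  return otherCues.filter(
    (cue) => cue.startTime >= lastTime && cue.endTime <= currentPosition
  );
}
```

A cue counts as missed only when it lies entirely inside the interval that playback has just crossed.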
If the time was reached through the usual monotonic increase of the
current playback position during normal playback, and if the user agent has not fired
a timeupdate event at the element in the past 15 to 250ms and
is not still running event handlers for such an event, then the user agent must
queue a task to fire a simple event named timeupdate at the
element. (In the other cases, such as explicit seeks, relevant events get fired as part of
the overall process of changing the current playback position.)
The event thus is not to be fired faster than about 66Hz or slower than 4Hz
(assuming the event handlers don't take longer than 250ms to run). User agents are
encouraged to vary the frequency of the event based on the system load and the average cost
of processing the event each time, so that the UI updates are not any more frequent than the
user agent can comfortably handle while decoding the video.
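This example is non-normative. One way a user agent could apply the 15 to 250ms window is to pick an interval inside that window and queue the event only once the interval has elapsed. The function names and the shape of the argument object are hypothetical.

```javascript
const MIN_TIMEUPDATE_INTERVAL_MS = 15;  // 1000 / 15 is about 66 events per second
const MAX_TIMEUPDATE_INTERVAL_MS = 250; // 1000 / 250 is 4 events per second

// Hypothetical helper: clamp a preferred interval (e.g. derived from system load)
// into the permitted window.
function chooseTimeupdateInterval(preferredMs) {
  return Math.min(
    MAX_TIMEUPDATE_INTERVAL_MS,
    Math.max(MIN_TIMEUPDATE_INTERVAL_MS, preferredMs)
  );
}

// Hypothetical helper: decide whether to queue a timeupdate event now.
// Pass lastFiredMs = -Infinity if no timeupdate event has been fired yet.
function shouldQueueTimeupdate({ nowMs, lastFiredMs, handlersRunning, intervalMs }) {
  if (handlersRunning) return false; // handlers for a previous event are still running
  return nowMs - lastFiredMs >= intervalMs;
}
```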
In the other cases, such as explicit seeks, playback is not paused by going past
the end time of a cue, even if that cue has its
text track cue pause-on-exit flag set.
Let events be a list of tasks, initially empty. Each task in this
list will be associated with a text track, a text track cue, and a time, which
are used to sort the list before the tasks are queued.
Let affected tracks be a list of text tracks, initially empty.
When the steps below say to prepare an event named event for a
text track cue target with a time time, the
user agent must run these substeps:
Sort the tasks in events in ascending time order (tasks with
earlier times first).
Further sort tasks in events that have the same time by the relative
text track cue order of the text track cues associated with these tasks.
Finally, sort tasks in events that have the same time and same
text track cue order by placing tasks that fire enter events before
those that fire exit events.
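This example is non-normative. The three sorting rules above amount to a single comparison with two tie-breakers. In this sketch each task is a plain object. The `cueOrder` number is a hypothetical stand-in for the position of the associated cue in text track cue order.

```javascript
// Hypothetical helper: returns a sorted copy of the task list.
// Each task looks like { name: "enter" | "exit", time: <seconds>, cueOrder: <number> }.
function sortCueEventTasks(events) {
  const nameRank = (name) => (name === "enter" ? 0 : 1); // enter before exit
  return events.slice().sort(
    (a, b) =>
      a.time - b.time ||                 // 1. ascending time
      a.cueOrder - b.cueOrder ||         // 2. text track cue order
      nameRank(a.name) - nameRank(b.name) // 3. enter events before exit events
  );
}
```

Each later rule applies only when every earlier rule compares the two tasks as equal.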
Returns true if the user agent is currently seeking.
media . seekable
Returns a TimeRanges object that represents the ranges of the media
resource to which it is possible for the user agent to seek.
media . fastSeek( time )
Seeks to near the given time as fast as possible, trading precision for
speed. (To seek to a precise time, use the currentTime attribute.)
This does nothing if the media resource has not been loaded.
The seeking attribute must initially
have the value false.
The fastSeek(double time) method
must seek to the time given by the method's argument, with the
approximate-for-speed flag set.
When the user agent is required to seek to a particular
new playback position in the media resource, optionally with the
approximate-for-speed flag set, it means that the user agent must run the following steps.
This algorithm interacts closely with the event loop mechanism; in particular, it has
a synchronous section (which is triggered as part of the event loop
algorithm). Steps in that section are marked with ⌛.
If the media element's readyState
is HAVE_NOTHING, abort these steps.
If the element's seeking IDL attribute is true,
then another instance of this algorithm is already running. Abort that other instance of the
algorithm without waiting for the step that it is running to complete.
Set the seeking IDL attribute to true.
If the seek was in response to a DOM method call or setting of an IDL attribute, then
continue the script. The remainder of these steps must be run in parallel. With the
exception of the steps marked with ⌛, they could be aborted at any time by another instance
of this algorithm being invoked.
If the new playback position is later than the end of the media
resource, then let it be the end of the media resource instead.
If the new playback position is less than the earliest possible
position, let it be that position instead.
If the (possibly now changed) new playback position is not in one of
the ranges given in the seekable attribute, then let it
be the position in one of the ranges given in the seekable attribute that is the
nearest to the new playback position. If two positions both satisfy that constraint
(i.e., the new playback position is exactly in the middle between two ranges in the
seekable attribute) then use the position that is closest to
the current playback position. If there are no ranges given in the
seekable attribute then set the seeking IDL attribute to false
and abort these steps.
If the approximate-for-speed flag is set, adjust the new playback
position to a value that will allow for playback to resume promptly. If new
playback position before this step is before current playback position, then
the adjusted new playback position must also be before the current
playback position. Similarly, if the new playback position before
this step is after current playback position, then the adjusted new
playback position must also be after the current playback position.
For example, the user agent could snap to a nearby key frame, so that it doesn't have to
spend time decoding then discarding intermediate frames before resuming playback.
If the media element was potentially playing immediately before it started
seeking, but seeking caused its readyState attribute to change to a value lower
than HAVE_FUTURE_DATA, then a waiting event will be
fired at the element.
This step sets the current playback position, and thus can immediately trigger other
conditions, such as the rules regarding when playback "reaches the end of the
media resource" (part of the logic that handles looping), even before the user
agent is actually able to render the media data for that position (as determined in the
next step).
Wait until the user agent has established whether or not the media data for
the new playback position is available, and, if it is, until it has decoded
enough data to play back that position.
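This example is non-normative. The steps above that adjust the new playback position can be sketched as two pure functions. All names are hypothetical. The `seekable` argument is an array of `{ start, end }` objects standing in for a TimeRanges object, and snapping to the nearest key frame is just one possible approximate-for-speed strategy.

```javascript
// Hypothetical helper: returns the adjusted position, or null if the steps abort.
function adjustNewPlaybackPosition(newPos, state) {
  const { earliestPossible, endOfResource, seekable, currentPos } = state;
  // Clamp to the end of the media resource and to the earliest possible position.
  if (newPos > endOfResource) newPos = endOfResource;
  if (newPos < earliestPossible) newPos = earliestPossible;
  // No seekable ranges: seeking is set to false and the steps abort.
  if (seekable.length === 0) return null;
  // Already inside a seekable range: nothing more to adjust.
  if (seekable.some((r) => newPos >= r.start && newPos <= r.end)) return newPos;
  // Otherwise use the nearest position in any range. On a tie, prefer the
  // candidate closest to the current playback position.
  let best = null;
  for (const r of seekable) {
    const candidate = newPos < r.start ? r.start : r.end;
    const distance = Math.abs(candidate - newPos);
    const closerToCurrent =
      best !== null &&
      Math.abs(candidate - currentPos) < Math.abs(best.candidate - currentPos);
    if (
      best === null ||
      distance < best.distance ||
      (distance === best.distance && closerToCurrent)
    ) {
      best = { candidate, distance };
    }
  }
  return best.candidate;
}

// Hypothetical approximate-for-speed adjustment: snap to the nearest key frame
// that lies on the same side of the current playback position as the target.
function snapToKeyFrame(newPos, currentPos, keyFrameTimes) {
  const allowed = keyFrameTimes.filter((k) => {
    if (newPos < currentPos) return k < currentPos;
    if (newPos > currentPos) return k > currentPos;
    return true;
  });
  if (allowed.length === 0) return newPos; // nothing suitable: keep the exact position
  return allowed.reduce((best, k) =>
    Math.abs(k - newPos) < Math.abs(best - newPos) ? k : best
  );
}
```

With seekable ranges of 0 to 10 and 20 to 30 seconds, a seek to 15 is equally near both ranges, so the result depends on the current playback position: it becomes 20 when the current position is 25, and 10 when the current position is 2.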
The seekable attribute must return a new
static normalized TimeRanges object that represents the ranges of the
media resource, if any, that the user agent is able to seek to, at the time the
attribute is evaluated.
If the user agent can seek to anywhere in the media resource, e.g., because it is a
simple movie file and the user agent and the server support HTTP Range requests, then the
attribute would return an object with one range, whose start is the time of the first
frame (the earliest possible position, typically zero), and whose end is the
time of the last frame.
The range might be continuously changing, e.g., if the user agent is buffering a sliding
window on an infinite stream. This is the behavior seen with DVRs viewing live TV,
for instance.
User agents should adopt a very liberal and optimistic view of what is seekable. User
agents should also buffer recent content where possible to enable seeking to be fast.
For instance, consider a large video file served on an HTTP server without support for
HTTP Range requests. A browser could implement this by only buffering the current
frame and data obtained for subsequent frames, never allowing seeking except to
the very start by restarting playback. However, this would be a poor implementation. A high
quality implementation would buffer the last few minutes of content (or more, if sufficient
storage space is available), allowing the user to jump back and rewatch something surprising
without any latency, and would in addition allow arbitrary seeking by reloading the file from
the start if necessary, which would be slower but still more convenient than having to
literally restart the video and watch it all the way through just to get to an earlier
unbuffered spot.
Media resources might be internally scripted or interactive. Thus, a media element
could play in a non-linear fashion. If this happens, the user agent must act as if the algorithm
for seeking was used whenever the current playback position changes in a
discontinuous fashion (so that the relevant events fire).
Media resources with multiple media tracks
A media resource can have multiple embedded audio and video tracks. For example,
in addition to the primary video and audio tracks, a media resource could have
foreign-language dubbed dialogs, director's commentaries, audio descriptions, alternative
angles, or sign-language overlays.
media . audioTracks
Returns an AudioTrackList object representing the audio tracks available in the
media resource.
media . videoTracks
Returns a {{VideoTrackList}} object representing the video tracks available in the
media resource.
There is only ever one AudioTrackList object and one {{VideoTrackList}} object
per media element, even if another media resource is loaded into the element:
the objects are reused. (The AudioTrack and VideoTrack objects
are not, though.)
In this example, a script defines a function that takes the URL of a video, and a reference to
an element where the video is to be placed. The function then tries to load the video, and,
once it is loaded, checks to see if there is a sign-language track available. If there is, it
also displays that track. Both tracks are placed in the given container; it's assumed that
styles have been applied to appropriately display the video and sign-language track.
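The script for this example is not reproduced in this text. A sketch consistent with the description might look like the following. The `doc` parameter is an addition made here so that the function can also run against a stand-in document. The `#track=` fragment assumes a media format that supports the media fragments syntax.

```javascript
// Sketch only: loads a video and, if a sign-language track exists, shows it too.
function loadVideo(url, container, doc = globalThis.document) {
  const video = doc.createElement("video");
  video.src = url;
  video.autoplay = true;
  video.controls = true;
  container.appendChild(video);
  // Track information is available once the metadata has loaded.
  video.onloadedmetadata = () => {
    for (let i = 0; i < video.videoTracks.length; i += 1) {
      const track = video.videoTracks[i];
      if (track.kind === "sign") {
        // Display the sign-language track in a second video element.
        const sign = doc.createElement("video");
        sign.src = url + "#track=" + track.id;
        sign.autoplay = true;
        container.appendChild(sign);
        return;
      }
    }
  };
  return video;
}
```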
AudioTrackList and VideoTrackList objects
The AudioTrackList and VideoTrackList interfaces are used by
attributes defined in the previous section.
Returns the specified AudioTrack or VideoTrack object.
audioTrack = media . audioTracks . getTrackById( id )
videoTrack = media . videoTracks . getTrackById( id )
Returns the AudioTrack or VideoTrack object with the given identifier, or null if no track has that identifier.
audioTrack . id
videoTrack . id
Returns the ID of the given track. This is the ID that can be used with a fragment
if the format supports the media fragments syntax, and that can be used with
the getTrackById() method. [[!MEDIA-FRAGS]]
audioTrack . label
videoTrack . label
Returns the label of the given track, if known, or the empty string otherwise.
audioTrack . language
videoTrack . language
Returns the language of the given track, if known, or the empty string otherwise.
audioTrack . enabled [ = value ]
Returns true if the given track is active, and false otherwise.
Can be set, to change whether the track is enabled or not. If multiple audio tracks are
enabled simultaneously, they are mixed.
media . videoTracks . selectedIndex
Returns the index of the currently selected track, if any, or -1 otherwise.
videoTrack . selected [ = value ]
Returns true if the given track is active, and false otherwise.
Can be set, to change whether the track is selected or not. Either zero or one video track is
selected; selecting a new track while a previous one is selected will unselect the previous
one.
An AudioTrackList object represents a dynamic list of zero or more audio tracks,
of which zero or more can be enabled at a time. Each audio track is represented by an
AudioTrack object.
A {{VideoTrackList}} object represents a dynamic list of zero or more video tracks, of
which zero or one can be selected at a time. Each video track is represented by a
VideoTrack object.
Tracks in AudioTrackList and {{VideoTrackList}} objects must be
consistently ordered. If the media resource is in a format that defines an order,
then that order must be used; otherwise, the order must be the relative order in which the tracks
are declared in the media resource. The order used is called the natural order
of the list.
Each track in one of these objects thus has an index; the first has the index 0, and each
subsequent track is numbered one higher than the previous one. If a media resource
dynamically adds or removes audio or video tracks, then the indices of the tracks will change
dynamically. If the media resource changes entirely, then all the previous tracks will
be removed and replaced with new tracks.
The AudioTrackList.length and
VideoTrackList.length
attributes must return the number of tracks represented by their objects at the time of getting.
The supported property indices of AudioTrackList and {{VideoTrackList}}
objects at any instant are the numbers from zero to the number of tracks represented by the
respective object minus one, if any tracks are represented. If an AudioTrackList
or {{VideoTrackList}} object represents no tracks, it has no supported property indices.
To determine the value of an indexed property for a given index index in an
AudioTrackList or {{VideoTrackList}} object list, the user agent must
return the AudioTrack or VideoTrack object that represents the
indexth track in list.
The
AudioTrackList.getTrackById(id)
and
VideoTrackList.getTrackById(id)
methods must return the first {{AudioTrack}} or {{VideoTrack}} object (respectively) in the
{{AudioTrackList}} or {{VideoTrackList}} object (respectively) whose identifier is equal to the
value of the id argument (in the natural order of the list, as defined above). When no
tracks match the given argument, the methods must return null.
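This example is non-normative. The length, indexed property, and getTrackById() behavior described above can be modeled with a small class. `TrackListModel` and its `item()` method are hypothetical; a real list is accessed with `list[index]`.

```javascript
// Hypothetical model of an AudioTrackList or VideoTrackList.
class TrackListModel {
  constructor(tracks) {
    this._tracks = tracks.slice(); // kept in the natural order of the list
  }

  // Number of tracks represented at the time of getting.
  get length() {
    return this._tracks.length;
  }

  // Stand-in for the indexed property getter: the index-th track.
  item(index) {
    return this._tracks[index];
  }

  // First track in natural order whose identifier equals id, or null.
  getTrackById(id) {
    const found = this._tracks.find((track) => track.id === id);
    return found === undefined ? null : found;
  }
}
```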
The AudioTrack and VideoTrack objects represent specific tracks of a
media resource. Each track can have an identifier, category, label, and language.
These aspects of a track are permanent for the lifetime of the track; even if a track is removed
from a media resource's AudioTrackList or VideoTrackList
objects, those aspects do not change.
In addition, AudioTrack objects can each be enabled or disabled; this is the audio
track's enabled state. When an AudioTrack is created, its
enabled state must be set to false (disabled). The resource fetch algorithm
can override this.
Similarly, a single VideoTrack object per {{VideoTrackList}} object can
be selected; this is the video track's selection state. When a VideoTrack is
created, its selection state must be set to false (not selected). The resource fetch algorithm can override this.
The AudioTrack.id and
VideoTrack.id attributes must return
the identifier of the track, if it has one, or the empty string otherwise. If the
media resource is in a format that supports the Media Fragments URI fragment
identifier syntax, the identifier returned for a particular track must be the same identifier
that would enable the track if used as the name of a track in the track dimension of such a
fragment identifier. [[!MEDIA-FRAGS]] [[INBANDTRACKS]]
For example, in Ogg files, this would be the Name header field of the track. [[!OGGSKELETON]]
The AudioTrack.kind and
VideoTrack.kind attributes must
return the category of the track, if it has one, or the empty string otherwise.
The category of a track is the string given in the first column of the table below that is the
most appropriate for the track based on the definitions in the table's second and third columns,
as determined by the metadata included in the track in the media resource. The cell
in the third column of a row says what the category given in the cell in the first column of that
row applies to; a category is only appropriate for an audio track if it applies to audio tracks,
and a category is only appropriate for video tracks if it applies to video tracks. Categories
must only be returned for AudioTrack objects if they are appropriate for audio,
and must only be returned for VideoTrack objects if they are appropriate for video.
Return values for AudioTrack.kind and VideoTrack.kind
| Category | Definition | Applies to... |
|---|---|---|
| "alternative" | A possible alternative to the main track, e.g., a different take of a song (audio), or a different angle (video). | Audio and video. |
| "captions" | A version of the main video track with captions burnt in. (For legacy content; new content would use text tracks.) | Video only. |
| "descriptions" | An audio description of a video track. | Audio only. |
| "main" | The primary audio or video track. | Audio and video. |
| "main-desc" | The primary audio track, mixed with audio descriptions. | Audio only. |
| "sign" | A sign-language interpretation of an audio track. | Video only. |
| "subtitles" | A version of the main video track with subtitles burnt in. (For legacy content; new content would use text tracks.) | Video only. |
| "translation" | A translated version of the main audio track. | Audio only. |
| "commentary" | Commentary on the primary audio or video track, e.g., a director's commentary. | Audio and video. |
| "" (empty string) | No explicit kind, or the kind given by the track's metadata is not recognized by the user agent. | Audio and video. |
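This example is non-normative. The "Applies to..." column can be expressed as a lookup that filters a category by track type. The names below are hypothetical. Returning the empty string for a category that does not apply to the given track type is an assumption of this sketch, since the text only states that such a category must not be returned.

```javascript
// Categories from the table above, with the track types each one applies to.
const TRACK_KIND_APPLIES_TO = {
  "alternative": ["audio", "video"],
  "captions": ["video"],
  "descriptions": ["audio"],
  "main": ["audio", "video"],
  "main-desc": ["audio"],
  "sign": ["video"],
  "subtitles": ["video"],
  "translation": ["audio"],
  "commentary": ["audio", "video"],
};

// Hypothetical helper: trackType is "audio" or "video".
function kindForTrack(metadataKind, trackType) {
  const known = Object.prototype.hasOwnProperty.call(TRACK_KIND_APPLIES_TO, metadataKind);
  if (!known) return ""; // no explicit kind, or not recognized
  // Assumption: a recognized category that does not apply is reported as "".
  return TRACK_KIND_APPLIES_TO[metadataKind].includes(trackType) ? metadataKind : "";
}
```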
The AudioTrack.label and
VideoTrack.label attributes must
return the label of the track, if it has one, or the empty string otherwise. [[INBANDTRACKS]]
The AudioTrack.language and
VideoTrack.language attributes
must return the BCP 47 language tag of the language of the track, if it has one, or the empty
string otherwise. If the user agent is not able to express that language as a BCP 47 language tag
(for example because the language information in the media resource's format is a
free-form string without a defined interpretation), then the attribute must return the empty
string, as if the track had no language.
Source attribute values for id, kind, label and language of multitrack audio and video tracks as
described for the relevant media resource format. [[INBANDTRACKS]]
The AudioTrack.enabled attribute,
on getting, must return true if the track is currently enabled, and false otherwise. On setting,
it must enable the track if the new value is true, and disable it otherwise. (If the track is no
longer in an {{AudioTrackList}} object, then the track being enabled or disabled has no
effect beyond changing the value of the attribute on the {{AudioTrack}} object.)
Whenever an audio track in an AudioTrackList that was disabled is enabled, and
whenever one that was enabled is disabled, the user agent must queue a task to
fire a simple event named change at the AudioTrackList object.
An audio track that has no data for a particular position on the media timeline,
or that does not exist at that position, must be interpreted as being silent at that point on the
timeline.
The
VideoTrackList.selectedIndex
attribute must return the index of the currently selected track, if any. If the {{VideoTrackList}}
object does not currently represent any tracks, or if none of the tracks are selected, it must
instead return -1.
The VideoTrack.selected
attribute, on getting, must return true if the track is currently selected, and false otherwise.
On setting, it must select the track if the new value is true, and unselect it otherwise. If the
track is in a {{VideoTrackList}}, then all the other {{VideoTrack}} objects in that list must
be unselected. (If the track is no longer in a {{VideoTrackList}} object, then the track
being selected or unselected has no effect beyond changing the value of the attribute on the
{{VideoTrack}} object.)
Whenever a track in a VideoTrackList that was previously not selected is selected,
and whenever the selected track in a VideoTrackList is unselected without a new
track being selected in its stead, the user agent must queue a task to
fire a simple event named change at the {{VideoTrackList}} object.
This task must be queued before the task that fires the
resize event, if any.
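This example is non-normative. The selection rules and the conditions for the change event can be modeled as follows. `VideoSelectionModel`, `select()`, `unselect()`, and `queuedEvents` are hypothetical. The sketch only models tracks that are still in the list, and it unselects the other tracks only when a track is being selected.

```javascript
// Hypothetical model of the selection state of the tracks in a VideoTrackList.
class VideoSelectionModel {
  constructor(trackCount) {
    // Every track starts out not selected.
    this.selected = new Array(trackCount).fill(false);
    this.queuedEvents = []; // stands in for "queue a task to fire a simple event"
  }

  // Index of the selected track, or -1 if there are no tracks or none is selected.
  get selectedIndex() {
    return this.selected.indexOf(true);
  }

  // Corresponds to setting selected to true on the track at `index`.
  select(index) {
    const wasSelected = this.selected[index];
    this.selected.fill(false); // at most one track is selected
    this.selected[index] = true;
    if (!wasSelected) this.queuedEvents.push("change");
  }

  // Corresponds to setting selected to false on the track at `index`.
  unselect(index) {
    const wasSelected = this.selected[index];
    this.selected[index] = false;
    // The selected track was unselected without a new one taking its place.
    if (wasSelected) this.queuedEvents.push("change");
  }
}
```

Selecting a track while another is selected queues a single change event in this sketch, because the implicit unselection is paired with a new selection.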
A video track that has no data for a particular position on the media timeline
must be interpreted as being fully transparent black at that point on the timeline, with the same
dimensions as the last frame before that position, or, if the position is before all the data for
that track, the same dimensions as the first frame for that track. A track that does not exist at
all at the current position must be treated as if it existed but had no data.
For instance, if a video has a track that is only introduced after one hour of playback, and
the user selects that track then goes back to the start, then the user agent will act as if
that track started at the start of the media resource but was simply transparent
until one hour in.
Selecting specific audio and video tracks declaratively
The audioTracks and videoTracks attributes allow scripts to select
which track should play, but it is also possible to select specific tracks declaratively, by
specifying particular tracks in the fragment of the [=url/URL=] of the
media resource. The format of the fragment depends on the
MIME type of the media resource. [[!RFC2046]] [[!URL]]
In this example, a video that uses a format that supports the
media fragments syntax is embedded in such a way that the alternative angles
labeled "Alternative" are enabled instead of the default video track. [[!MEDIA-FRAGS]]
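The markup for this example is not reproduced in this text. As a non-normative sketch, a script could build an equivalent URL by appending a track dimension to the resource's address. The function name and the "myvideo" address are hypothetical, and the `track=` syntax applies only to formats that support the media fragments syntax.

```javascript
// Hypothetical helper: replaces any existing fragment with a track selection.
function withTrackFragment(resourceUrl, trackName) {
  const withoutFragment = resourceUrl.split("#")[0];
  return withoutFragment + "#track=" + encodeURIComponent(trackName);
}

// Example address selecting the tracks labeled "Alternative".
const alternativeAngleUrl = withTrackFragment("myvideo", "Alternative");
```

The resulting string could then be used as the address of the media resource, for instance as the value of a video element's src attribute.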
Timed text tracks
Text track model
A media element can have a group of associated text tracks, known as the
media element's list of text tracks. The text tracks are sorted
as follows:
This decides how the track is handled by the user agent. The kind is represented by a string.
The possible strings are:
subtitles
captions
descriptions
chapters
metadata
The kind of track can change dynamically, in the case of
a text track corresponding to a <{track}> element.
A label
This is a human-readable string intended to identify the track for the user.
The label of a track can change dynamically, in the
case of a text track corresponding to a <{track}> element.
When a text track label is the empty string, the user agent should automatically
generate an appropriate label from the text track's other properties (e.g., the kind of text
track and the text track's language) for use in its user interface.
This automatically-generated label is not exposed in the API.
An in-band metadata track dispatch type
This is a string extracted from the media resource specifically for in-band
metadata tracks to enable such tracks to be dispatched to different scripts in the document.
For example, a traditional TV station broadcast streamed on the Web and
augmented with Web-specific interactive features could include text tracks with metadata for ad
targeting, trivia game data during game shows, player states during sports games, recipe
information during food programs, and so forth. As each program starts and ends, new tracks
might be added or removed from the stream, and as each one is added, the user agent could bind
them to dedicated script modules using the value of this attribute.
This is a string (a BCP 47 language tag) representing the language of the text track's cues.
[[!BCP47]]
The language of a text track can change dynamically,
in the case of a text track corresponding to a <{track}> element.
A readiness state
One of the following:
Not loaded
Indicates that the text track's cues have not been obtained.
Loading
Indicates that the text track is loading and there have been no fatal errors encountered so
far. Further cues might still be added to the track by the parser.
Loaded
Indicates that the text track has been loaded with no fatal errors.
Failed to load
Indicates that the text track was enabled, but when the user agent attempted to obtain it,
this failed in some way (e.g., the [=url/URL=] could not be parsed, network error,
unknown text track format). Some or all of the cues are likely missing and will not be
obtained.
Indicates that the text track is not active. Other than for the purposes of exposing the
track in the DOM, the user agent is ignoring the text track. No cues are active, no events are
fired, and the user agent will not attempt to obtain the track's cues.
Hidden
Indicates that the text track is active, but that the user agent is not actively displaying
the cues. If no attempt has yet been made to obtain the track's cues, the user agent will
perform such an attempt momentarily. The user agent is maintaining a list of which cues are
active, and events are being fired accordingly.
Showing
Indicates that the text track is active. If no attempt has yet been made to obtain the
track's cues, the user agent will perform such an attempt momentarily. The user agent is
maintaining a list of which cues are active, and events are being fired accordingly. In
addition, for text tracks whose kind is subtitles or
captions, the cues are being overlaid on the video as appropriate; for text
tracks whose kind is descriptions, the user agent is making the
cues available to the user in a non-visual fashion; and for text tracks whose kind
is chapters, the user agent is making available to the user a mechanism by
which the user can navigate to any point in the media resource by selecting a cue.
The task source for the tasks listed in this section is the
DOM manipulation task source.
A text track cue is the unit of time-sensitive data in a
text track. For subtitles and captions, for instance, a cue corresponds to the text that
appears at a particular time and disappears at another time.
Each text track cue consists of:
An identifier
An arbitrary string.
A start time
The time, in seconds and fractions of a second, that describes the beginning of the range of
the media data to which the cue applies.
An end time
The time, in seconds and fractions of a second, that describes the end of the range of the
media data to which the cue applies.
A pause-on-exit flag
A boolean indicating whether playback of the media resource is to pause when the
end of the range to which the cue applies is reached.
Some additional format-specific data
Additional fields, as needed for the format. For example, WebVTT has a text track cue
writing direction and so forth. [[WEBVTT]]
Rules for extracting the chapter title
An algorithm which, when applied to the cue, returns a string that can be used in user
interfaces that use the cue as a chapter title.
Each text track cue has a corresponding TextTrackCue object (or more
specifically, an object that inherits from TextTrackCue — for example, WebVTT
cues use the VTTCue interface). A text track cue's in-memory
representation can be dynamically changed through this TextTrackCue API. [[WEBVTT]]
A text track cue is associated with rules for updating the text track rendering,
as defined by the specification for the specific kind of text track cue. These rules are
used specifically when the object representing the cue is added to a {{TextTrack}} object using
the addCue() method.
In addition, each text track cue has two pieces of dynamic information:
This is used as part of the rendering model, to keep cues in a consistent position. It must
initially be empty. Whenever the text track cue active flag is unset, the user
agent must empty the text track cue display state.
The text track cues of a media element's text tracks are ordered relative
to each other in the text track cue order, which is determined as follows: first
group the cues by their text track, with the groups being sorted in the same order
as their text tracks appear in the media element's list of text tracks;
then, within each group, cues must be sorted by their start time, earliest first;
then, any cues with the same start time must be sorted by their end time,
latest first; and finally, any cues with identical end times must be sorted in the
order they were last added to their respective text track list of cues, oldest first (so
e.g., for cues from a WebVTT file, that would initially be the order in which the cues were
listed in the file). [[WEBVTT]]
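The ordering described above can be sketched as a comparator. This is a minimal, non-normative sketch: the cues are plain objects, and the `trackIndex` and `addedOrder` fields are hypothetical bookkeeping values (the track's position in the list of text tracks, and the order in which the cue was last added to its track's list of cues) that are not part of any API.

```javascript
// Non-normative sketch of the text track cue order.
// Each cue is a plain object: { trackIndex, startTime, endTime, addedOrder }.
// trackIndex and addedOrder are hypothetical fields used only for illustration.
function compareCues(a, b) {
  // 1. Group by text track, in list-of-text-tracks order.
  if (a.trackIndex !== b.trackIndex) return a.trackIndex - b.trackIndex;
  // 2. Within a track, earliest start time first.
  if (a.startTime !== b.startTime) return a.startTime - b.startTime;
  // 3. For equal start times, latest end time first.
  if (a.endTime !== b.endTime) return b.endTime - a.endTime;
  // 4. For identical end times, oldest addition first.
  return a.addedOrder - b.addedOrder;
}

// Returns a sorted copy; the input array is left untouched.
function sortCues(cues) {
  return cues.slice().sort(compareCues);
}
```

Sorting longer cues ahead of shorter ones that start at the same time means an enclosing cue (for example, a chapter) comes before the cues nested inside it.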
Sourcing in-band text tracks
A media-resource-specific text track is a text track that corresponds
to data found in the media resource.
Rules for processing and rendering such data are defined by the relevant specifications, e.g.,
the specification of the video format if the media resource is a video. Details for
some legacy formats can be found in the Sourcing In-band Media Resource Tracks from Media
Containers into HTML specification. [[INBANDTRACKS]]
When a media resource contains data that the user agent recognizes and supports as
being equivalent to a text track, the user agent runs the steps to expose a
media-resource-specific text track with the relevant data, as follows.
Set the new text track's kind, label, and
language based on the semantics of the relevant data, as defined for the relevant
format [[INBANDTRACKS]]. If there is no label in that data, then the
label must be set to the empty string.
Let stream type be the value of the "stream_type" field describing the
text track's type in the file's program map section, interpreted as an 8-bit unsigned
integer.
Let length be the value of the "ES_info_length" field for the track in the
same part of the program map section, interpreted as an integer as defined by the MPEG-2
specification. Let descriptor bytes be the length bytes
following the "ES_info_length" field. The text track in-band metadata track dispatch
type must be set to the concatenation of the stream type byte and
the zero or more bytes in descriptor bytes, expressed in hexadecimal using
uppercase ASCII hex digits. [[!MPEG2TS]]
Let the
first stsd box of the
first stbl box of the
first minf box of the
first mdia box of the
text track's trak box in the
first moov box
of the file be the stsd box, if any.
If the file has no stsd box, or if the stsd box has neither a mett box nor a metx box, then the text track
in-band metadata track dispatch type must be set to the empty string.
Otherwise, if the stsd box has a mett box then the text
track in-band metadata track dispatch type must be set to the concatenation of the
string "mett", a U+0020 SPACE character, and the value of the first mime_format field of the first mett box of the stsd
box, or the empty string if that field is absent in that box.
Otherwise, if the stsd box has no mett box but has a metx box then the text track in-band metadata track dispatch type
must be set to the concatenation of the string "metx", a U+0020 SPACE
character, and the value of the first namespace field of the first metx box of the stsd box, or the empty string if that field is absent in
that box.
[[!MPEG4]]
Set the new text track's mode to the
mode consistent with the user's preferences and the requirements of the relevant specification
for the data.
For instance, if there are no other active subtitles, and this is a forced subtitle track
(a subtitle track giving subtitles in the audio track's primary language, but only for audio
that is actually in another language), then those subtitles might be activated here.
Fire a trusted event with the name addtrack, that does not
bubble and is not cancelable, and that uses the {{TrackEvent}} interface, with the
track attribute initialized to the text track's {{TextTrack}} object,
at the media element's textTracks attribute's
{{TextTrackList}} object.
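The construction of the in-band metadata track dispatch type for MPEG-2 transport streams and MPEG-4 files, as described in the steps above, can be sketched as follows. This is a non-normative sketch: the function names, the `descriptorBytes` parameter, and the shape of the `stsd` object are assumptions made for illustration, and no real container parser is involved.

```javascript
// Non-normative sketch. Inputs are assumed to have been extracted by a demuxer.

// MPEG-2: concatenate the stream_type byte and the descriptor bytes,
// written as uppercase two-digit hexadecimal.
function mpeg2DispatchType(streamType, descriptorBytes) {
  const toHex = (byte) => (byte & 0xff).toString(16).toUpperCase().padStart(2, '0');
  return [streamType, ...descriptorBytes].map(toHex).join('');
}

// MPEG-4: `stsd` is a hypothetical parsed form of the stsd box,
// { mett: [ { mime_format } ], metx: [ { namespace } ] }, or null if the file has none.
function mpeg4DispatchType(stsd) {
  const mett = stsd && stsd.mett && stsd.mett[0];
  const metx = stsd && stsd.metx && stsd.metx[0];
  // A mett box takes precedence over a metx box.
  if (mett) return 'mett ' + (mett.mime_format ?? '');
  if (metx) return 'metx ' + (metx.namespace ?? '');
  // No stsd box, or neither a mett nor a metx box.
  return '';
}
```

When the relevant field is absent, the result still contains the box name and the space, for example `"metx "`.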
Sourcing out-of-band text tracks
When a <{track}> element is created, it must be associated with a new text track (with
its value set as defined below) and its corresponding new {{TextTrack}} object.
The text track kind is determined from the state of the element's kind
attribute according to the following table; for a state given in a cell of the first column,
the kind is the string given in the second column:
The text track label is the element's track label.
The text track language is the element's track language, if any, or
the empty string otherwise.
As the kind, label, and srclang attributes are set,
changed, or removed, the text track must update accordingly, as per the definitions above.
Changes to the track URL are handled in the algorithm below.
When the user agent is required to
honor user preferences for automatic text track selection for a media element,
the user agent must run the following steps:
For example, the user could have set a browser preference to the effect of "I want French
captions whenever possible", or "If there is a subtitle track with 'Commentary' in
the title, enable it", or "If there are audio description tracks available, enable one,
ideally in Swiss German, but failing that in Standard Swiss German or Standard German".
Otherwise, if there are any text tracks in candidates that correspond to
<{track}> elements with a default attribute set whose text track mode is
set to disabled, then set the text track mode of the first such track to
showing.
The <{track}> element's parent element changes and the new parent is a
media element.
When a user agent is to start the track processing model for a
text track and its <{track}> element, it must run the following algorithm.
This algorithm interacts closely with the event loop mechanism; in particular, it has
a synchronous section (which is triggered as part of the event loop
algorithm). The steps in that section are marked with ⌛.
If another occurrence of this algorithm is already running for this text track and
its <{track}> element, abort these steps, letting that other algorithm take care of this
element.
⌛ Let URL be the track URL of the <{track}> element.
⌛ If the <{track}> element's parent is a media element then
let corsAttributeState be the state of the parent media element's
crossorigin content attribute. Otherwise, let
corsAttributeState be No CORS.
The tasks queued by the fetching algorithm on the
networking task source to process the data as it is being fetched must determine the
type of the resource. If the type of the resource is not a supported text track format, the
load will fail, as described below. Otherwise, the resource's data must be passed to the
appropriate parser (e.g., the WebVTT parser) as it is received, with the
text track list of cues being used for that parser's output. [[WEBVTT]]
This specification does not currently say whether or how to check the MIME types of text
tracks, or whether or how to perform file type sniffing using the actual file data.
Implementors differ in their intentions on this matter and it is therefore unclear what
the right solution is. In the absence of any requirement here, the HTTP specification's
strict requirement to follow the Content-Type header prevails ("Content-Type specifies the
media type of the underlying data." ... "If and only if the media type is not given by a
Content-Type field, the recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the resource.").
Whenever a <{track}> element has its src attribute set, changed, or removed, the
user agent must immediately empty the element's text track's text track list of
cues. (This also causes the algorithm above to stop adding cues from the resource being
obtained using the previously given URL, if any.)
Guidelines for exposing cues in various formats as text track cues
How a specific format's text track cues are to be interpreted for the purposes of processing by
an HTML user agent is defined by that format [[INBANDTRACKS]]. In the absence of such a
specification, this section provides some constraints within which implementations can
attempt to consistently expose such formats.
To support the text track model of HTML, each unit of timed data is converted to a
text track cue. Where the mapping of the format's features to the aspects of a
text track cue as defined in this specification is not defined, implementations must
ensure that the mapping is consistent with the definitions of the aspects of a
text track cue as defined above, as well as with the following constraints:
textTrack = media . textTracks . getTrackById( id )
Returns the {{TextTrack}} object with the given identifier, or null if no track has that identifier.
A {{TextTrackList}} object represents a dynamically updating list of text tracks in
a given order.
The textTracks attribute of media elements
must return a {{TextTrackList}} object representing the {{TextTrack}} objects of the
text tracks in the media element's list of text tracks, in the same order
as in the list of text tracks.
The length attribute of a {{TextTrackList}} object must
return the number of text tracks in the list represented by the {{TextTrackList}} object.
The supported property indices of a {{TextTrackList}} object at any
instant are the numbers from zero to the number of text tracks in
the list represented by the {{TextTrackList}} object minus one, if any. If there are no
text tracks in the list, there are no supported property indices.
To determine the value of an indexed property of a {{TextTrackList}}
object for a given index index, the user agent must return the
indexth text track in the list represented by the {{TextTrackList}} object.
The getTrackById(id) method must return the
first {{TextTrack}} in the {{TextTrackList}} object whose id IDL
attribute would return a value equal to the value of the id argument. When no tracks
match the given argument, the method must return null.
Returns the text track label, if there is one, or the empty string otherwise
(indicating that a custom label probably needs to be generated from the other attributes of the
object if the object is exposed to the user).
Returns the ID of the given track.
For in-band tracks, this is the ID that can be used with a fragment if the format
supports the media fragments syntax, and that can be used with the getTrackById() method. [[!MEDIA-FRAGS]]
For {{TextTrack}} objects corresponding to <{track}> elements, this is the
ID of the <{track}> element.
Queue a task to fire a trusted event with the name addtrack, that does not bubble and is not cancelable, and
that uses the {{TrackEvent}} interface, with the track attribute initialized to the new text
track's {{TextTrack}} object, at the media element's textTracks attribute's TextTrackList
object.
Return the new {{TextTrack}} object.
The kind attribute must return the
text track kind of the text track that the {{TextTrack}} object
represents.
The label attribute must return the
text track label of the text track that the {{TextTrack}}
object represents.
The language attribute must return the
text track language of the text track that the {{TextTrack}}
object represents.
The id attribute returns the track's
identifier, if it has one, or the empty string otherwise. For tracks that correspond to
<{track}> elements, the track's identifier is the value of the element's <{global/id}> attribute, if any. For in-band tracks, the track's identifier is
specified by the media resource. If the media resource is in a format
that supports the media fragments syntax, the identifier
returned for a particular track must be the same identifier that would enable the track if used as
the name of a track in the track dimension of such a fragment. [[!MEDIA-FRAGS]]
The inBandMetadataTrackDispatchType
attribute must return the text track in-band metadata track dispatch type of the
text track that the {{TextTrack}} object represents.
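One way a page might use this value is to route each metadata track to a dedicated handler. The following is a non-normative sketch under stated assumptions: `dispatchMetadataTrack` and the `handlers` map are hypothetical names, and the track is treated as any object exposing `kind`, `mode`, and `inBandMetadataTrackDispatchType`.

```javascript
// Non-normative sketch: route a metadata track to a handler keyed by dispatch type.
// `handlers` is a hypothetical Map from dispatch type string to a callback.
function dispatchMetadataTrack(track, handlers) {
  // Only metadata tracks carry a meaningful dispatch type.
  if (track.kind !== 'metadata') return false;
  const handler = handlers.get(track.inBandMetadataTrackDispatchType);
  if (!handler) return false;
  // "hidden" keeps the cues active and events firing without any rendering.
  track.mode = 'hidden';
  handler(track);
  return true;
}
```

In a page, a function like this would typically be called from a listener for the addtrack event on the media element's textTracks object, passing the event's track attribute.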
The mode attribute, on getting, must return
the string corresponding to the text track mode of the text track that
the {{TextTrack}} object represents, as defined by the following list:
The removeCue(cue)
method of {{TextTrack}} objects, when invoked, must run the following steps:
If the given cue is not currently listed in the method's
{{TextTrack}} object's text track's text track list of cues,
then throw a {{NotFoundError}} exception and abort these steps.
In this example, an <{audio}> element is used to play a specific sound-effect from a
sound file containing many sound effects. A cue is used to pause the audio, so that it ends
exactly at the end of the clip, even if the browser is busy running some script. If the page had
relied on script to pause the audio, then the start of the next clip might be heard if the
browser was not able to run the script at the exact time specified.
var sfx = new Audio('sfx.wav');
var sounds = sfx.addTextTrack('metadata');

// add sounds we care about
function addFX(start, end, name) {
  var cue = new VTTCue(start, end, '');
  cue.id = name;
  cue.pauseOnExit = true;
  sounds.addCue(cue);
}
addFX(12.783, 13.612, 'dog bark');
addFX(13.612, 15.091, 'kitten mew');

function playSound(id) {
  // getCueById() is a method of the track's TextTrackCueList
  sfx.currentTime = sounds.cues.getCueById(id).startTime;
  sfx.play();
}

// play a bark as soon as we can
sfx.oncanplaythrough = function () {
  playSound('dog bark');
}
// meow when the user tries to leave
window.onbeforeunload = function () {
  playSound('kitten mew');
  return 'Are you sure you want to leave this awesome page?';
}
interface TextTrackCueList {
readonly attribute unsigned long length;
getter TextTrackCue (unsigned long index);
TextTrackCue? getCueById(DOMString id);
};
A TextTrackCueList object represents a dynamically updating list of text track cues in a given order.
The length attribute must return
the number of cues in the list represented by the
TextTrackCueList object.
The supported property indices of a TextTrackCueList object at any
instant are the numbers from zero to the number of cues in the
list represented by the TextTrackCueList object minus one, if any. If there are no
cues in the list, there are no supported property
indices.
To determine the value of an indexed property for a given index index, the user agent must return the indexth text track
cue in the list represented by the TextTrackCueList object.
The getCueById(id) method, when called with an argument other than the empty string,
must return the first text track cue in the list represented by the
TextTrackCueList object whose text track cue identifier is id, if any, or null otherwise. If the argument is the empty string, then the method
must return null.
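The lookup behavior described above, including the special case for the empty string, can be sketched over a plain array. This is a non-normative illustration; the free function `getCueById` and the array of plain cue objects are stand-ins for the real TextTrackCueList object.

```javascript
// Non-normative sketch of getCueById() over a plain array of cue-like objects.
function getCueById(cues, id) {
  // The empty string never matches, even if a cue has an empty identifier.
  if (id === '') return null;
  for (const cue of cues) {
    // Return the first cue whose identifier matches.
    if (cue.id === id) return cue;
  }
  return null;
}
```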
Media resources often contain one or more
media-resource-specific text tracks
containing data that browsers don't render, but want to expose to script so that it can be
handled there.
If the browser is unable to identify a TextTrackCue interface that is more
appropriate to expose the data in the cues of a media-resource-specific text track,
the {{DataCue}} object is used. [[INBANDTRACKS]]
The data attribute, on getting, must
return the raw text track cue data of the text track cue that the
TextTrackCue object represents. On setting, the text track cue data must
be set to the new value.
The user agent will use {{DataCue}} to expose only text track cue
objects that belong to a text track that has a text track kind of
metadata.
{{DataCue}} has a constructor to allow script to create {{DataCue}}
objects in cases where generic metadata needs to be managed for a text track.
Chapters are segments of a media resource with a given title. Chapters can be
nested, in the same way that sections in a document outline can have subsections.
Each text track cue in a text track being used for describing
chapters has three key features: the text track cue start time, giving the start time
of the chapter, the text track cue end time, giving the end time of the chapter, and
the text track rules for extracting the chapter title.
The rules for constructing the chapter tree from a text track are as follows. They
produce a potentially nested list of chapters, each of which has a start time, end time, title,
and a list of nested chapters. This algorithm discards cues that do not correctly nest within each
other, or that are out of order.
Let output be an empty list of chapters, where a chapter is a record
consisting of a start time, an end time, a title, and a (potentially empty) list of nested
chapters. For the purpose of this algorithm, each chapter also has a parent chapter.
Let current chapter be a stand-in chapter whose start time is negative
infinity, whose end time is positive infinity, and whose list of nested chapters is output. (This is just used to make the algorithm easier to describe.)
Loop: If list is empty, jump to the step labeled
end.
Let current cue be the first cue in list, and then
remove it from list.
If current cue's text track cue start time is less than
the start time of current chapter, then return to the step labeled
loop.
While current cue's text track cue start time is greater
than or equal to current chapter's end time, let current
chapter be current chapter's parent chapter.
If current cue's text track cue end time is greater than
the end time of current chapter, then return to the step labeled
loop.
Create a new chapter new chapter, whose start time is current cue's text track cue start time, whose end time is current cue's text track cue end time, whose title is current cue's text track cue data interpreted according to its
rules for rendering the cue in isolation, and whose list of nested chapters is
empty.
Append new chapter to current chapter's list of
nested chapters, and let current chapter be new chapter's
parent.
Let current chapter be new chapter.
Return to the step labeled loop.
End: Return output.
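The steps above can be sketched in script as follows. This is a non-normative sketch under stated assumptions: the input is taken to be a copy of the track's list of cues, already in text track cue order, and each cue is a plain object whose `text` field stands in for whatever title the cue's own rules would produce. The function name `buildChapterTree` is hypothetical.

```javascript
// Non-normative sketch of constructing the chapter tree from a list of cues.
// Each cue is assumed to be { startTime, endTime, text }.
function buildChapterTree(cues) {
  const output = [];
  // Stand-in chapter that spans all time and owns the top-level list.
  let current = { startTime: -Infinity, endTime: Infinity, chapters: output, parent: null };
  for (const cue of cues) {
    // Out-of-order cue: it starts before the chapter being processed.
    if (cue.startTime < current.startTime) continue;
    // Climb out of chapters that have already ended.
    while (cue.startTime >= current.endTime) current = current.parent;
    // Cue does not nest correctly inside the current chapter.
    if (cue.endTime > current.endTime) continue;
    const chapter = {
      startTime: cue.startTime,
      endTime: cue.endTime,
      title: cue.text,
      chapters: [],
      parent: current,
    };
    current.chapters.push(chapter);
    current = chapter;
  }
  return output;
}
```

Because the stand-in chapter ends at positive infinity, the climbing loop always stops before running past the top of the tree.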
The following snippet of a WebVTT file shows how nested chapters can be marked
up. The file describes three 50-minute chapters, "Astrophysics", "Computational Physics", and
"General Relativity". The first has three subchapters, the second has four, and the third has
two. [[WEBVTT]]
WEBVTT
00:00:00.000 --> 00:50:00.000
Astrophysics
00:00:00.000 --> 00:10:00.000
Introduction to Astrophysics
00:10:00.000 --> 00:45:00.000
The Solar System
00:45:00.000 --> 00:50:00.000
Coursework Description
00:50:00.000 --> 01:40:00.000
Computational Physics
00:50:00.000 --> 00:55:00.000
Introduction to Programming
00:55:00.000 --> 01:30:00.000
Data Structures
01:30:00.000 --> 01:35:00.000
Answers to Last Exam
01:35:00.000 --> 01:40:00.000
Coursework Description
01:40:00.000 --> 02:30:00.000
General Relativity
01:40:00.000 --> 02:00:00.000
Tensor Algebra
02:00:00.000 --> 02:30:00.000
The General Relativistic Field Equations
This section is non-normative.
Text tracks can be used for storing data relating to the media data, for interactive or
augmented views.
For example, a page showing a sports broadcast could include information about the current
score. Suppose a robotics competition was being streamed live. The image could be overlaid with
the scores, as follows:
In order to make the score display render correctly whenever the user seeks to an arbitrary
point in the video, the metadata text track cues need to be as long as is appropriate for the
score. For example, in the frame above, there would be maybe one cue that lasts the length of the
match that gives the match number, one cue that lasts until the blue alliance's score changes, and
one cue that lasts until the red alliance's score changes. If the video is just a stream of the
live event, the time in the bottom right would presumably be automatically derived from the
current video time, rather than based on a cue. However, if the video was just the highlights,
then that might be given in cues also.
The following shows what fragments of this could look like in a WebVTT file:
The key here is to notice that the information is given in cues that span the length of time to
which the relevant event applies. If, instead, the scores were given as zero-length (or very
brief, nearly zero-length) cues when the score changes, for example saying "red+2" at
05:11:17.198, "red+3" at 05:11:25.912, etc., problems arise: primarily, seeking is much harder to
implement, as the script has to walk the entire list of cues to make sure that no notifications
have been missed; but also, if the cues are short it's possible the script will never see that
they are active unless it listens to them specifically.
When using cues in this manner, authors are encouraged to use the cuechange event
to update the current annotations. (In particular, using the timeupdate event
would be less appropriate as it would require doing work even when the cues haven't changed,
and, more importantly, would introduce a higher latency between when the metadata cues become
active and when the display is updated, since timeupdate events are rate-limited.)
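To illustrate why cues that span the whole period are easier to work with, the following non-normative sketch computes the state at an arbitrary seek position with a single filter over the cue list. The function name `activeCuesAt` and the sample cue data are hypothetical; in a page, the same information is available from the track's activeCues list inside a cuechange handler.

```javascript
// Non-normative sketch: with cues that span the period they describe, the
// display state at any time is simply the set of cues covering that time.
function activeCuesAt(cues, time) {
  return cues.filter((cue) => cue.startTime <= time && time < cue.endTime);
}

// Hypothetical score cues for one match, in seconds from the start of the video.
const scoreCues = [
  { startTime: 0, endTime: 600, text: 'match 7' },
  { startTime: 0, endTime: 120, text: 'blue 0' },
  { startTime: 120, endTime: 600, text: 'blue 3' },
  { startTime: 0, endTime: 300, text: 'red 0' },
  { startTime: 300, endTime: 600, text: 'red 2' },
];
```

With zero-length "red+2" style cues, the same query would instead require replaying every earlier cue to reconstruct the running totals.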
Identifying a track kind through a URL
Other specifications or formats that need a URL to identify the return values of
the AudioTrack.kind or VideoTrack.kind IDL attributes, or identify the kind of text track, must use the about:html-kind URL.
User interface
The controls attribute is a
boolean attribute. If present, it indicates that the author has not provided a scripted
controller and would like the user agent to provide its own set of controls.
If the attribute is present, or if scripting is
disabled for the media element, then the user agent should expose a user interface to the user. This user interface should include features to begin playback, pause
playback, seek to an arbitrary position in the content (if the content supports arbitrary
seeking), change the volume, change the display of closed captions or embedded sign-language
tracks, select different audio tracks or turn on audio descriptions, and show the media content in
manners more suitable to the user (e.g., fullscreen video or in an independent resizable window).
Other controls may also be made available.
Even when the attribute is absent, however, user agents may provide controls to affect playback
of the media resource (e.g., play, pause, seeking, track selection, and volume controls), but
such features should not interfere with the page's normal rendering. For example, such features
could be exposed through platform media keys or a remote
control. The user agent may implement this simply by exposing a user interface to the user
as described above (as if the <{mediaelements/controls}> attribute was present).
If the user agent exposes a user interface to
the user by displaying controls over the media element, then the user agent
should suppress any user interaction events while the user is interacting with this
interface. (For example, if the user clicks on a video's playback control, mousedown
events and so forth would not simultaneously be fired at elements on the page.)
Where possible (specifically, for starting, stopping, pausing, and unpausing playback, for
seeking, for changing the rate of playback, for fast-forwarding or rewinding, for listing,
enabling, and disabling text tracks, and for muting or changing the volume of the audio), user
interface features exposed by the user agent must be implemented in terms of the DOM API described
above, so that, e.g., all the same events fire.
For the purposes of listing chapters in the media resource, only text tracks in the
media element's list of text tracks that are showing and
whose text track kind is chapters should be used. Such tracks must be
interpreted according to the rules for constructing the chapter tree from a text
track. When seeking in response to a user manipulating a chapter selection interface, user
agents should not use the approximate-for-speed flag.
The controls IDL attribute must
reflect the content attribute of the same name.
media . volume [ = value ]
Returns the current playback volume, as a number in the range 0.0 to 1.0, where 0.0 is the
quietest and 1.0 the loudest.
Can be set, to change the volume.
Throws an {{IndexSizeError}} exception if the new value is not in the range 0.0 .. 1.0.
media . muted [ = value ]
Returns true if audio is muted, overriding the volume
attribute, and false if the volume attribute is being
honored.
Can be set, to change whether the audio is muted or not.
A media element has a playback volume, which is a fraction in the range 0.0 (silent) to 1.0 (loudest).
Initially, the volume should be 1.0, but user agents may remember the last set value across
sessions, on a per-site basis or otherwise, so the volume may start at other values.
The volume IDL attribute must return the
playback volume of any audio portions of the
media element. On setting, if the new value is in the range 0.0 to 1.0 inclusive, the
media element's playback volume must be
set to the new value. If the new value is outside the range 0.0 to 1.0 inclusive, then, on
setting, an {{IndexSizeError}} exception must be thrown instead.
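The range check on setting can be sketched with a small stand-in class. This is a non-normative sketch: `MediaVolume` is a hypothetical name, and a plain `Error` whose name is set to "IndexSizeError" stands in for the real DOM exception.

```javascript
// Non-normative sketch of the volume getter and setter behavior.
class MediaVolume {
  constructor() {
    this._volume = 1.0; // initial playback volume
  }
  get volume() {
    return this._volume;
  }
  set volume(value) {
    // Reject anything outside 0.0 to 1.0 inclusive (this sketch also rejects NaN).
    if (!(value >= 0.0 && value <= 1.0)) {
      const error = new Error('volume must be in the range 0.0 to 1.0');
      error.name = 'IndexSizeError'; // stand-in for the DOM exception
      throw error;
    }
    this._volume = value;
  }
}
```

A rejected assignment leaves the previous volume in place, so callers that clamp user input before assigning never observe the exception.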
A media element can also be muted. If
anything is muting the element, then it is muted. (For example, when the direction of
playback is backwards, the element is muted.)
The muted IDL attribute must return the value
to which it was last set. When a media element is created, if the element has a muted content attribute specified, then the muted IDL attribute should be set to true; otherwise, user
agents may set the value to the user's preferred value (e.g., remembering the last set value across
sessions, on a per-site basis or otherwise). While the muted
IDL attribute is set to true, the media element must be muted.
Whenever either of the values that would be returned by the volume and muted IDL
attributes changes, the user agent must queue a task to fire a simple
event named volumechange at the media element.
An element's effective media volume is determined as follows:
If the user has indicated that the user agent is to override the volume of the element,
then the element's effective media volume is the volume desired by the user. Abort
these steps.
Let volume be the playback
volume of the audio portions of the media element, in range 0.0 (silent) to
1.0 (loudest).
The element's effective media volume is volume,
interpreted relative to the range 0.0 to 1.0, with 0.0 being silent, and 1.0 being the loudest
setting, values in between increasing in loudness. The range need not be linear. The loudest
setting may be lower than the system's loudest possible setting; for example the user could have
set a maximum volume.
The muted content attribute on media elements is a boolean attribute that controls the
default state of the audio output of the media resource, potentially overriding user
preferences.
The defaultMuted IDL attribute must
reflect the <{media/muted}> content attribute.
This attribute has no dynamic effect (it only controls the default state of the
element).
This video (an advertisement) autoplays, but to avoid annoying users, it does so without
sound, and allows the user to turn the sound on.
Time ranges
Objects implementing the TimeRanges interface
represent a list of ranges (periods) of time.
interface TimeRanges {
readonly attribute unsigned long length;
double start(unsigned long index);
double end(unsigned long index);
};
media . length
Returns the number of ranges in the object.
time = media . start(index)
Returns the time for the start of the range with the given index.
Throws an {{IndexSizeError}} exception if the index is out of range.
time = media . end(index)
Returns the time for the end of the range with the given index.
Throws an {{IndexSizeError}} exception if the index is out of range.
The length IDL attribute must return the
number of ranges represented by the object.
The start(index)
method must return the position of the start of the indexth range represented
by the object, in seconds measured from the start of the timeline that the object covers.
The end(index) method
must return the position of the end of the indexth range represented by the
object, in seconds measured from the start of the timeline that the object covers.
These methods must throw {{IndexSizeError}} exceptions if called with an index argument greater than or equal to the number of ranges represented by the
object.
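The indexing behavior of these members can be sketched with a small stand-in class. This is a non-normative illustration: `SimpleTimeRanges` is a hypothetical name, ranges are stored as `[start, end]` pairs in seconds, and a `RangeError` renamed to "IndexSizeError" stands in for the real DOM exception.

```javascript
// Non-normative sketch of the TimeRanges length, start() and end() members.
class SimpleTimeRanges {
  constructor(ranges) {
    // Copy the [start, end] pairs so later changes to the input have no effect.
    this._ranges = ranges.map(([start, end]) => [start, end]);
  }
  get length() {
    return this._ranges.length;
  }
  start(index) {
    return this._rangeAt(index)[0];
  }
  end(index) {
    return this._rangeAt(index)[1];
  }
  _rangeAt(index) {
    // An index at or beyond the number of ranges is out of range.
    if (!Number.isInteger(index) || index < 0 || index >= this._ranges.length) {
      const error = new RangeError('index is out of range');
      error.name = 'IndexSizeError'; // stand-in for the DOM exception
      throw error;
    }
    return this._ranges[index];
  }
}
```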
When a TimeRanges object is said to be a normalized TimeRanges
object, the ranges it represents must obey the following criteria:
The start of a range must be greater than the end of all earlier ranges.
The start of a range must be less than or equal to the end of that same range.
In other words, the ranges in such an object are ordered, don't overlap, and don't touch
(adjacent ranges are folded into one bigger range). A range can be empty (referencing just a
single moment in time), e.g., to indicate that only one frame is currently buffered in the case
that the user agent has discarded the entire media resource except for the current
frame, when a media element is paused.
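This example is non-normative. One way to obtain ranges that satisfy the two criteria above is to sort the ranges by start time and fold together any that overlap or touch. The function name normalizeRanges and the use of [start, end] arrays are assumptions made for illustration, as is the choice to discard inverted pairs; user agents are free to maintain normalized ranges in any way they like.

```javascript
// Turns arbitrary [start, end] pairs (in seconds) into a normalized list:
// ordered, non-overlapping, and with touching ranges folded together.
function normalizeRanges(pairs) {
  const sorted = pairs
    .filter(([start, end]) => start <= end) // empty ranges are fine, inverted ones are dropped
    .map(([start, end]) => [start, end])    // copy so the input is not mutated
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      // Overlapping or touching: extend the previous range.
      last[1] = Math.max(last[1], end);
    } else {
      // Strictly after everything so far: start a new range.
      merged.push([start, end]);
    }
  }
  return merged;
}
```

Because a new range is only started when its start is strictly greater than the end of the previous merged range, every start in the result is greater than the end of all earlier ranges.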
Ranges in a TimeRanges object must be inclusive.
Thus, the end of a range would be equal to the start of a following adjacent
(touching but not overlapping) range. Similarly, a range covering a whole timeline anchored at
zero would have a start equal to zero and an end equal to the duration of the timeline.
The timelines used by the objects returned by the buffered, seekable and
played IDL attributes of media elements must be that element's media timeline.
The TrackEvent interface
[Constructor(DOMString type, optional TrackEventInit eventInitDict)]
interface TrackEvent : Event {
readonly attribute (VideoTrack or AudioTrack or TextTrack)? track;
};
dictionary TrackEventInit : EventInit {
(VideoTrack or AudioTrack or TextTrack)? track = null;
};
event . track
Returns the track object ({{TextTrack}}, {{AudioTrack}}, or {{VideoTrack}}) to which the event
relates.
The track attribute must return the value it
was initialized to. When the object is created, this attribute must be initialized to null. It
represents the context information for the event.
Event summary
This section is non-normative.
The following events fire on media elements as part of the processing model described
above:
The user agent is intentionally not currently fetching media data.
networkState equals NETWORK_IDLE
abort
{{Event}}
The user agent stops fetching the media data before it is completely
downloaded, but not due to an error.
error is an object with the code MEDIA_ERR_ABORTED. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted.
error
{{Event}}
An error occurs while fetching the media data or the type of the resource is not a supported media format.
error is an object with the code MEDIA_ERR_NETWORK or higher. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted.
emptied
{{Event}}
A media element whose networkState
was previously not in the NETWORK_EMPTY state has
just switched to that state (either because of a fatal error during load that's about to be
reported, or because the load() method was invoked while
the resource selection algorithm was already
running).
networkState is NETWORK_EMPTY; all the IDL attributes are in their
initial states.
stalled
{{Event}}
The user agent is trying to fetch media data, but data is unexpectedly not
forthcoming.
readyState newly increased to HAVE_CURRENT_DATA or greater for the first time.
canplay
{{Event}}
The user agent can resume playback of the media data, but estimates that if
playback were to be started now, the media resource could not be rendered at the
current playback rate up to its end without having to stop for further buffering of content.
readyState newly increased to HAVE_FUTURE_DATA or greater.
canplaythrough
{{Event}}
The user agent estimates that if playback were to be started now, the media
resource could be rendered at the current playback rate all the way to its end without
having to stop for further buffering.
readyState is newly equal to HAVE_ENOUGH_DATA.
playing
{{Event}}
Playback is ready to start after having been paused or delayed due to lack of media
data.
{{HTMLMediaElement/readyState}} is newly equal to or greater than
HAVE_FUTURE_DATA and {{HTMLMediaElement/paused}} is false, or {{HTMLMediaElement/paused}} is newly false and readyState is equal to or greater than HAVE_FUTURE_DATA. Even if this event fires, the
element might still not be potentially playing, e.g., if the element is paused for user interaction or paused for in-band content.
waiting
{{Event}}
Playback has stopped because the next frame is not available, but the user agent expects
that frame to become available in due course.
readyState is equal to or less than HAVE_CURRENT_DATA, and {{HTMLMediaElement/paused}} is false. Either seeking is true, or the current playback position
is not contained in any of the ranges in buffered. It
is possible for playback to stop for other reasons without {{HTMLMediaElement/paused}} being false, but those reasons do not fire this event
(and when those situations resolve, a separate playing
event is not fired either): e.g., the playback ended, or playback
stopped due to errors, or the element has paused for user interaction
or paused for in-band content.
seeking
{{Event}}
The seeking IDL attribute changed to true, and the user agent has started seeking to a new position.
One or both of the videoWidth and videoHeight attributes have just been updated.
Media element is a <{video}> element; readyState is not HAVE_NOTHING
volumechange
{{Event}}
Either the volume attribute or the muted attribute has changed. Fired after the relevant
attribute's setter has returned.
The following event fires on <{source}> elements:
Event name
Interface
Fired when...
error
Event
An error occurs while fetching the media data or the type of the resource
is not a supported media format.
The following events fire on {{AudioTrackList}}, {{VideoTrackList}}, and
{{TextTrackList}} objects:
Event name
Interface
Fired when...
change
{{Event}}
One or more tracks in the track list have been enabled or disabled.
addtrack
{{TrackEvent}}
A track has been added to the track list.
removetrack
{{TrackEvent}}
A track has been removed from the track list.
The following event fires on {{TextTrack}} objects and <{track}> elements:
Event name
Interface
Fired when...
cuechange
{{Event}}
One or more cues in the track have become active or stopped being active.
The following events fire on <{track}> elements:
Event name
Interface
Fired when...
error
Event
An error occurs while fetching the track data or the type of the resource is not a supported text track format.
load
Event
The track data has been fetched and successfully processed.
The following events fire on {{TextTrackCue}} objects:
Event name
Interface
Fired when...
enter
{{Event}}
The cue has become active.
exit
{{Event}}
The cue has stopped being active.
Security and privacy considerations
The main security and privacy implications of the video and audio
elements come from the ability to embed media cross-origin. There are two directions that threats
can flow: from hostile content to a victim page, and from a hostile page to victim content.
If a victim page embeds hostile content, the threat is that the content might contain scripted
code that attempts to interact with the {{Document}} that embeds the content. To avoid
this, user agents must ensure that there is no access from the content to the embedding page. In
the case of media content that uses DOM concepts, the embedded content must be treated as if it
was in its own unrelated top-level browsing context.
For instance, if an SVG animation was embedded in a <{video}> element,
the user agent would not give it access to the DOM of the outer page. From the perspective of
scripts in the SVG resource, the SVG file would appear to be in a lone top-level browsing context
with no parent.
If a hostile page embeds victim content, the threat is that the embedding page could obtain
information from the content that it would not otherwise have access to. The API does expose some
information: the existence of the media, its type, its duration, its size, and the performance
characteristics of its host. Such information is already potentially problematic, but in practice
the same information can be obtained using the <{img}> element, and so it has been deemed
acceptable.
However, significantly more sensitive information could be obtained if the user agent further
exposes metadata within the content such as subtitles or chapter titles. Such information is
therefore only exposed if the video resource passes a CORS resource sharing check.
The crossorigin attribute allows authors to control
how this check is performed. [[!FETCH]]
Without this restriction, an attacker could trick a user running within a
corporate network into visiting a site that attempts to load a video from a previously leaked
location on the corporation's intranet. If such a video included confidential plans for a new
product, then being able to read the subtitles would present a serious confidentiality breach.
Best practices for authors using media elements
This section is non-normative.
Playing audio and video resources on small devices such as set-top boxes or mobile phones is
often constrained by limited hardware resources in the device. For example, a device might only
support three simultaneous videos. For this reason, it is a good practice to release resources
held by media elements when they are done playing, either by
being very careful about removing all references to the element and allowing it to be garbage
collected, or, even better, by removing the element's src
attribute and any <{source}> element descendants, and invoking the element's load() method.
Similarly, when the playback rate is not exactly 1.0, hardware, software, or format limitations
can cause video frames to be dropped and audio to be choppy or muted.
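The release procedure recommended above (removing the src attribute and any <{source}> element descendants, then invoking load()) can be sketched as follows. The function name releaseMediaElement is hypothetical; the argument is assumed to behave like a media element, exposing removeAttribute(), querySelectorAll(), and load().

```javascript
// Releases the resources held by a media element that is done playing.
// `media` is assumed to behave like an HTMLMediaElement.
function releaseMediaElement(media) {
  media.removeAttribute("src");
  // Copy the list first so that removing nodes does not disturb iteration.
  for (const source of Array.from(media.querySelectorAll("source"))) {
    source.remove();
  }
  // load() restarts resource selection; with no src attribute and no
  // source children left, there is nothing for the element to fetch.
  media.load();
}
```

A page that plays a series of short clips might call this from an ended event listener before creating the next media element.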
Best practices for implementors of media elements
This section is non-normative.
How accurately various aspects of the media element API are implemented is
considered a quality-of-implementation issue.
For example, when implementing the buffered attribute,
how precisely an implementation reports the ranges that have been buffered depends on how carefully
the user agent inspects the data. Since the API reports ranges as times, but the data is obtained
in byte streams, a user agent receiving a variable-bit-rate stream might only be able to determine
precise times by actually decoding all of the data. User agents aren't required to do this,
however; they can instead return estimates (e.g., based on the average bitrate seen so far) which
get revised as more information becomes available.
As a general rule, user agents are urged to be conservative rather than optimistic. For
example, it would be bad to report that everything had been buffered when it had not.
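As a rough illustration of a bitrate-based estimate, the sketch below converts a byte count into a duration using the average bitrate seen so far, and rounds the result down so that the estimate errs on the conservative side. The function name, its parameters, and the choice of whole-second rounding are all assumptions made for this example; they are not part of any API.

```javascript
// Estimates how many seconds of media `bytesReceived` covers, given the
// average bitrate observed so far (in bits per second). The result is
// rounded down to whole seconds so that it under-reports rather than
// over-reports what has been buffered.
function estimateBufferedSeconds(bytesReceived, averageBitsPerSecond) {
  if (!(averageBitsPerSecond > 0) || !(bytesReceived >= 0)) {
    return 0; // nothing trustworthy to report yet
  }
  const seconds = (bytesReceived * 8) / averageBitsPerSecond;
  return Math.floor(seconds);
}
```

Such an estimate would be revised as more of the stream is decoded and the true frame timestamps become known.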
Another quality-of-implementation issue would be playing a video backwards when the codec is
designed only for forward playback (e.g., there aren't many key frames, and they are far apart, and
the intervening frames only have deltas from the previous frame). User agents could do a poor job,
e.g., only showing key frames; however, better implementations would do more work and thus do a
better job, e.g., actually decoding parts of the video forwards, storing the complete frames, and
then playing the frames backwards.
Similarly, while implementations are allowed to drop buffered data at any time (there is no
requirement that a user agent keep all the media data obtained for the lifetime of the media
element), it is again a quality-of-implementation issue: user agents with sufficient resources to
keep all the data around are encouraged to do so, as this allows for a better user experience. For
example, if the user is watching a live stream, a user agent could allow the user only to view the
live video; however, a better user agent would buffer everything and allow the user to seek
through the earlier material, pause it, play it forwards and backwards, etc.
When a media element that is paused is removed from a document and not reinserted before the next time the event loop reaches step 1, implementations that are resource constrained are encouraged to take
that opportunity to release all hardware resources (like video planes, networking resources, and
data buffers) used by the media element. (User agents still have to keep track of the
playback position and so forth, though, in case playback is later restarted.)
The <{map}> element, in conjunction with an <{img}> element and any <{area}> element
descendants, defines an image map. The element represents its children.
The name attribute gives the map a name so that
it can be referenced. The attribute must be present and must have a non-empty value with no
[=space characters=]. The value of the name attribute must not be equal to the value
of the name attribute of another <{map}> element in the same document.
If the <{global/id}> attribute is also specified, both attributes must have the same value.
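This example is non-normative. A conformance checker could test the authoring requirements on the name attribute along the following lines. The function name checkMapName and the plain-record representation of a <{map}> element are hypothetical; the regular expression lists the space characters (space, tab, line feed, form feed, and carriage return).

```javascript
// The HTML space characters: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = /[ \t\n\f\r]/;

// Returns a list of problems with a map's name. `map` is a plain record
// { name, id } (id may be undefined); `otherNames` holds the names of the
// other map elements in the same document.
function checkMapName(map, otherNames) {
  const problems = [];
  if (typeof map.name !== "string" || map.name === "") {
    problems.push("name must be present and non-empty");
    return problems;
  }
  if (SPACE_CHARACTERS.test(map.name)) {
    problems.push("name must not contain space characters");
  }
  if (otherNames.includes(map.name)) {
    problems.push("name duplicates another map's name");
  }
  if (map.id !== undefined && map.id !== map.name) {
    problems.push("id and name must have the same value");
  }
  return problems;
}
```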
map . areas
Returns an HTMLCollection of the <{area}> elements in the <{map}>.
map . images
Returns an HTMLCollection of the img and object
elements that use the <{map}>.
The areas IDL attribute must return an
HTMLCollection rooted at the <{map}> element, whose filter matches only
<{area}> elements.
The images IDL attribute must return an
HTMLCollection rooted at the {{Document}} node, whose filter matches only
<{img}> elements that are associated with this <{map}> element according
to the image map processing model.
The IDL attribute name must reflect
the content attribute of the same name.
Image maps can be defined in conjunction with other content on the page, to ease maintenance.
This example is of a page with an image map at the top of the page, and a corresponding set of
text links at the bottom.
The <{area}> element represents either a hyperlink with some text and a
corresponding area on an image map, or a dead area on an image map.
An <{area}> element with a parent node must have a <{map}> element ancestor
or a <{template}> element ancestor.
If the <{area}> element has an href
attribute, then the <{area}> element represents a hyperlink. In this case,
the alt attribute must be present. It specifies the
text of the hyperlink. Its value must be text that informs the user about the destination of the link.
If the <{area}> element has no href
attribute, then the area represented by the element cannot be selected, and the alt attribute must be omitted.
In both cases, the shape and coords attributes specify the area.
The shape attribute is an enumerated
attribute. The following table lists the keywords defined for this attribute. The states
given in the first cell of the rows with keywords give the states to which those keywords map.
Some of the keywords are non-conforming, as noted in the last column.
The attribute may be omitted. The missing value default is the rectangle state.
The coords attribute must, if specified,
contain a valid list of floating-point numbers. This attribute gives the coordinates for the shape
described by the shape attribute.
The processing for this attribute is described as part of the image map processing model.
In the circle state, <{area}> elements must
have a coords attribute present, with three integers, the
last of which must be non-negative. The first integer must be the distance in CSS pixels from the
left edge of the image to the center of the circle, the second integer must be the distance in CSS
pixels from the top edge of the image to the center of the circle, and the third integer must be
the radius of the circle, again in CSS pixels.
In the default state, <{area}>
elements must not have a coords attribute. (The area is the
whole image.)
In the polygon state, <{area}> elements must
have a coords attribute with at least six integers, and the
number of integers must be even. Each pair of integers must represent a coordinate given as the
distances from the left and the top of the image in CSS pixels respectively, and all the
coordinates together must represent the points of the polygon, in order.
In the rectangle state, <{area}> elements must
have a coords attribute with exactly four integers, the
first of which must be less than the third, and the second of which must be less than the fourth.
The four points must represent, respectively, the distance from the left edge of the image to the
left side of the rectangle, the distance from the top edge to the top side, the distance from the
left edge to the right side, and the distance from the top edge to the bottom side, all in CSS
pixels.
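This example is non-normative. The per-state authoring requirements on coords given above can be summarized as a small check. The function name coordsConformToState and the string labels used for the four states are assumptions made for this sketch; coords is taken to be an already-parsed array of numbers, or null when the attribute is absent.

```javascript
// Checks the authoring requirements on coords for a given shape state.
// `state` is one of "circle", "default", "polygon", "rectangle";
// `coords` is an array of numbers, or null when the attribute is absent.
function coordsConformToState(state, coords) {
  // The default state covers the whole image and takes no coords at all.
  if (state === "default") return coords === null;
  if (coords === null || !coords.every(Number.isInteger)) return false;
  switch (state) {
    case "circle":
      // x, y, radius; the radius must be non-negative.
      return coords.length === 3 && coords[2] >= 0;
    case "polygon":
      // At least three x,y pairs, and no dangling value.
      return coords.length >= 6 && coords.length % 2 === 0;
    case "rectangle":
      // left < right and top < bottom.
      return coords.length === 4 && coords[0] < coords[2] && coords[1] < coords[3];
    default:
      return false;
  }
}
```

These are requirements on authors only; the processing model is more forgiving about what user agents accept.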
When user agents allow users to follow hyperlinks or
download hyperlinks created using the
<{area}> element, as described in the next section, the href, target,
and download attributes decide how the link is followed. The <{area/rel}> and hreflang attributes may be used to indicate to the user the likely nature of the target resource before the user follows the link.
The target, download, <{area/rel}>, hreflang,
type, and referrerpolicy attributes must be omitted if the
href attribute is not present.
The activation behavior of <{area}> elements is to run the following
steps:
If the <{area}> element has a download
attribute and the algorithm is not allowed to show a popup; or, if the user has not indicated a specific browsing context for following the link, and the element's target attribute is present, and applying the rules
for choosing a browsing context given a browsing context name, using the value of the
target attribute as the browsing context name, would
result in there not being a chosen browsing context, then run these substeps:
Abort these steps without following the hyperlink.
Otherwise, the user agent must follow the
hyperlink or download the hyperlink created
by the <{area}> element, if any, and as determined by the download attribute and any expressed user
preference.
The IDL attributes alt, coords, target, download,
rel, and hreflang must each reflect the respective
content attributes of the same name.
The IDL attribute shape must
reflect the <{area/shape}> content attribute.
The IDL attribute relList must
reflect the <{links/rel}> content attribute.
The IDL attribute referrerPolicy must
reflect the <{link/referrerpolicy}> content attribute, limited to only known values.
The <{area}> element also supports the HTMLHyperlinkElementUtils interface. [[!URL]]
When the element is created, and whenever the element's href content attribute is set, changed, or removed, the user
agent must invoke the element's HTMLHyperlinkElementUtils interface's set the input algorithm with the value of the href content attribute, if any, or the empty string otherwise,
as the given value.
The element's HTMLHyperlinkElementUtils interface's get the base algorithm must simply return the document base URL.
The element's HTMLHyperlinkElementUtils interface's query encoding is the document's character encoding.
When the element's HTMLHyperlinkElementUtils interface invokes its update steps with a string value, the user
agent must set the element's href content attribute to
the string value.
Image maps
Authoring
An image map allows geometric areas on an image to be associated with hyperlinks.
An image, in the form of an <{img}> element, may be associated with an image map (in the form of a map
element) by specifying a usemap attribute on
the <{img}> element. The usemap attribute, if specified, must be a valid
hash-name reference to a <{map}> element.
Consider an image that looks as follows:
If we wanted just the colored areas to be clickable, we could do it as follows:
Please select a shape:
Processing model
If an <{img}> element has a
usemap attribute specified, user agents must process it
as follows:
If that returned null, then abort these steps. The image is not associated with an image
map after all.
Otherwise, the user agent must collect all the <{area}> elements that are
descendants of the map. Let those be the areas.
Having obtained the list of <{area}> elements that form the image map (the areas), interactive user agents must process the list in one of two ways.
If the user agent intends to show the text that the <{img}> element represents, then
it must use the following steps.
In user agents that do not support images, or that have images disabled,
<{object}> elements cannot represent images, and thus this section never applies (the
fallback content is shown instead). The following steps therefore only apply to
<{img}> elements.
Remove all the <{area}> elements in areas that have no href attribute.
Remove all the <{area}> elements in areas that have no alt attribute, or whose alt
attribute's value is the empty string, if there is another <{area}> element in
areas with the same value in the href attribute and with a non-empty alt attribute.
Each remaining <{area}> element in areas represents a
hyperlink. Those hyperlinks should all be made available to the user in a manner
associated with the text of the <{img}>.
In this context, user agents may represent area and <{img}> elements
with no specified alt attributes, or whose alt
attributes are the empty string or some other non-visible text, in a user-agent-defined fashion
intended to indicate the lack of suitable author-provided text.
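This example is non-normative. The filtering steps used when the text of the <{img}> is shown can be sketched as follows. The function name areasForTextPresentation is hypothetical, and each <{area}> element is modeled as a plain record whose href and alt properties are undefined when the corresponding attribute is absent.

```javascript
// `areas` is an array of plain records { href, alt }; a missing attribute
// is represented as undefined. Returns the areas that remain as hyperlinks.
function areasForTextPresentation(areas) {
  const hasText = (area) => area.alt !== undefined && area.alt !== "";

  // Drop areas that have no href attribute.
  const linked = areas.filter((area) => area.href !== undefined);

  // Drop areas lacking alt text when another area with the same href
  // carries non-empty alt text.
  return linked.filter(
    (area) =>
      hasText(area) ||
      !linked.some(
        (other) => other !== area && other.href === area.href && hasText(other)
      )
  );
}
```

An area with no usable alt text survives only when no other area offers text for the same destination, which is the case in which a user-agent-defined placeholder might be shown.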
If the user agent intends to show the image and allow interaction with the image to select
hyperlinks, then the image must be associated with a set of layered shapes, taken from the
<{area}> elements in areas, in reverse tree order (so the last
specified <{area}> element in the map is the bottom-most shape, and
the first element in the map, in tree order, is the top-most shape).
Each <{area}> element in areas must be processed as follows to
obtain a shape to layer onto the image:
Find the state that the element's shape attribute
represents.
Use the rules for parsing a list of floating-point numbers to parse the element's
coords attribute, if it is present, and let the result be the coords
list. If the attribute is absent, let the coords list be the empty list.
If the number of items in the coords list is less than the minimum number
given for the <{area}> element's current state, as per the following table, then the
shape is empty; abort these steps.
If the shape attribute represents the rectangle state, and the first number in the list is
numerically greater than the third number in the list, then swap those two numbers around.
If the shape attribute represents the rectangle state, and the second number in the list is
numerically greater than the fourth number in the list, then swap those two numbers around.
If the shape attribute represents the circle state, and the third number in the list is less than
or equal to zero, then the shape is empty; abort these steps.
Now, the shape represented by the element is the one described for the entry in the list
below corresponding to the state of the shape attribute:
Let x be the first number in coords, y be the second number, and r be the third number.
The shape is a circle whose center is x CSS pixels from the left edge
of the image and y CSS pixels from the top edge of the image, and whose
radius is r CSS pixels.
Let xi be the (2i)th entry in coords, and yi be the (2i+1)th entry in coords (the first entry in coords being the one with index 0).
Let the coordinates be (xi, yi),
interpreted in CSS pixels measured from the top left of the image, for all integer values of
i from 0 to (N/2)-1, where N is the number of items in coords.
The shape is a polygon whose vertices are given by the coordinates, and
whose interior is established using the even-odd rule. [[GRAPHICS]]
Let x1 be the first number in coords, y1 be the second number, x2 be the third number, and y2 be the fourth number.
The shape is a rectangle whose top-left corner is given by the coordinate (x1, y1) and whose
bottom right corner is given by the coordinate (x2,
y2), those coordinates being interpreted as CSS pixels
from the top left corner of the image.
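This example is non-normative. The steps above can be sketched as a function that turns a state and a parsed coords list into a shape. The function name shapeFromCoords, the string labels for the states, and the shape of the returned records are hypothetical. The minimum counts (3 for circle, 0 for default, 6 for polygon, 4 for rectangle) are the values given in the published HTML specification and are stated here as an assumption.

```javascript
// Minimum number of coords per state (an assumption; see the lead-in).
const MINIMUM_COORDS = { circle: 3, default: 0, polygon: 6, rectangle: 4 };

// Returns a shape record, or null when the shape is empty.
function shapeFromCoords(state, coordsList) {
  const coords = coordsList.slice(); // do not mutate the caller's list
  if (coords.length < MINIMUM_COORDS[state]) return null;

  if (state === "rectangle") {
    // Swap so that (x1, y1) is the top-left corner.
    if (coords[0] > coords[2]) {
      const x = coords[0];
      coords[0] = coords[2];
      coords[2] = x;
    }
    if (coords[1] > coords[3]) {
      const y = coords[1];
      coords[1] = coords[3];
      coords[3] = y;
    }
    return { type: "rectangle", x1: coords[0], y1: coords[1], x2: coords[2], y2: coords[3] };
  }

  if (state === "circle") {
    if (coords[2] <= 0) return null; // zero or negative radius: empty shape
    return { type: "circle", x: coords[0], y: coords[1], r: coords[2] };
  }

  if (state === "polygon") {
    const points = [];
    // i runs from 0 to (N/2)-1; a trailing unpaired number is never reached.
    for (let i = 0; i < Math.floor(coords.length / 2); i++) {
      points.push({ x: coords[2 * i], y: coords[2 * i + 1] });
    }
    return { type: "polygon", points };
  }

  return { type: "default" }; // the whole image
}
```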
For historical reasons, the coordinates must be interpreted relative to the
displayed image after any stretching caused by the CSS 'width' and 'height' properties
(or, for non-CSS browsers, the image element's width and height attributes
— CSS browsers map those attributes to the aforementioned CSS properties).
Browser zoom features and transforms applied using CSS or SVG do not affect the
coordinates.
Pointing device interaction with an image associated with a set of layered shapes per the above
algorithm must result in the relevant user interaction events being first fired to the top-most
shape covering the point that the pointing device indicated, if any, or to the image element
itself, if there is no shape covering that point. User agents should make <{area}> elements
representing hyperlinks focusable, to ensure that they can be selected and
activated by all users.
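This example is non-normative. Finding the shape that receives a pointing device event can be sketched as a search, in tree order, for the first shape that covers the indicated point. All names below are hypothetical, the shape records ({ type, ... }) are an assumed representation, and treating points on the boundary of a circle or rectangle as covered is a choice made for this sketch. The polygon test uses the even-odd rule.

```javascript
// Even-odd rule: a point is inside when a ray cast from it towards
// increasing x crosses the polygon's edges an odd number of times.
function insidePolygon(points, px, py) {
  let inside = false;
  for (let i = 0, j = points.length - 1; i < points.length; j = i++) {
    const a = points[i];
    const b = points[j];
    const straddles = (a.y > py) !== (b.y > py);
    if (straddles && px < ((b.x - a.x) * (py - a.y)) / (b.y - a.y) + a.x) {
      inside = !inside;
    }
  }
  return inside;
}

// Tells whether a single shape covers the point (px, py), in CSS pixels
// measured from the top left of the image.
function shapeCovers(shape, px, py) {
  switch (shape.type) {
    case "circle":
      return Math.hypot(px - shape.x, py - shape.y) <= shape.r;
    case "rectangle":
      return px >= shape.x1 && px <= shape.x2 && py >= shape.y1 && py <= shape.y2;
    case "polygon":
      return insidePolygon(shape.points, px, py);
    default:
      return true; // the default state covers the whole image
  }
}

// `shapes` is in tree order, so the first match is the top-most shape.
// Returns its index, or -1 when the event goes to the image itself.
function topMostShapeAt(shapes, px, py) {
  return shapes.findIndex((shape) => shapeCovers(shape, px, py));
}
```

Because the first <{area}> element in tree order is the top-most shape, an area in the default state is usually placed last so that it only catches points no other shape covers.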
Because a <{map}> element (and its <{area}> elements) can be associated with multiple
<{img}> and <{object}> elements, it is possible for an <{area}> element to correspond to
multiple focusable areas of the document.
Image maps are live; if the DOM is mutated, then the user agent must act as if it
had rerun the algorithms for image maps.
MathML
The MathML <{math}> element falls into the embedded content,
phrasing content, flow content, and palpable content categories for the
purposes of the content models in this specification.
When the MathML annotation-xml element contains elements from the
HTML namespace, such elements must all be flow content.
When the MathML token elements (MathML mi, MathML mo, MathML mn, MathML ms,
and MathML mtext) are descendants of HTML elements, they may contain
phrasing content elements from the HTML namespace. [[!MATHML]]
User agents must handle text other than inter-element white space found in MathML
elements whose content models do not allow straight text by pretending for the purposes of MathML
content models, layout, and rendering that the text is actually wrapped in a MathML mtext element in
the MathML namespace. (Such text is not, however, conforming.)
User agents must act as if any MathML element whose contents do not match the element's
content model was replaced, for the purposes of MathML layout and rendering, by a MathML merror
element containing some appropriate error message.
To enable authors to use MathML tools that only accept MathML in its XML form, interactive HTML
user agents are encouraged to provide a way to export any MathML fragment as an XML
namespace-well-formed XML fragment.
The semantics of MathML elements are defined by the MathML specification and
[=other applicable specifications=]. [[!MATHML]]
Here is an example of the use of MathML in an HTML document.
Some browsers may not be able to render it correctly.
The quadratic formula
SVG
The SVG <{svg}> element falls into the embedded content,
phrasing content, flow content, and palpable content categories for the
purposes of the content models in this specification.
To enable authors to use SVG tools that only accept SVG in its XML form, interactive HTML user
agents are encouraged to provide a way to export any SVG fragment as an XML namespace-well-formed
XML fragment.
When the SVG <{foreignObject}> element contains elements from the HTML namespace, such
elements must all be flow content. [[!SVG11]]
The content model for SVG title elements inside HTML documents is
phrasing content. (This further constrains the requirements given in the SVG specification.)
The semantics of SVG elements are defined by the SVG specification and other applicable
specifications. [[!SVG11]]
Dimension attributes
Author requirements: The
width and
height
attributes on <{img}>, <{iframe}>, <{embed}>, <{object}>, <{video}>, and, when their
type attribute is in the <{input/Image|Image Button}> state, <{input}> elements
may be specified to give the dimensions of the visual content of the element (the width and
height respectively, relative to the nominal direction of the output medium), in CSS pixels.
The attributes, if specified, must have values that are valid non-negative integers.
The specified dimensions may differ from the dimensions specified in the resource itself,
since the resource may have a resolution that differs from the CSS pixel resolution. (On screens,
CSS pixels have a resolution of 96ppi, but in general the CSS pixel resolution depends on the
reading distance.) If both attributes are specified, then one of the following statements must be
true:
The target ratio is the ratio of the intrinsic width to the
intrinsic height in the resource. The specified width and
specified height are the values of the width and height
attributes respectively.
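This example is non-normative. A checker for the relationship between the two attributes might look like the sketch below. It assumes the three statements as they appear in the published HTML specification: the specified height multiplied by the target ratio is within 0.5 of the specified width; the specified width divided by the target ratio is within 0.5 of the specified height; or both specified values are zero. The function name and parameter names are hypothetical.

```javascript
// Returns true when the width and height attribute values are consistent
// with the intrinsic dimensions of the resource (all in CSS pixels).
function dimensionsAreConsistent(specifiedWidth, specifiedHeight, intrinsicWidth, intrinsicHeight) {
  // Both zero: the element is not intended for the user.
  if (specifiedWidth === 0 && specifiedHeight === 0) return true;

  const targetRatio = intrinsicWidth / intrinsicHeight;
  const widthFromHeight = specifiedHeight * targetRatio;
  const heightFromWidth = specifiedWidth / targetRatio;

  // Either derived value may differ from the specified one by up to half
  // a pixel, which allows for rounding to integer attribute values.
  return (
    Math.abs(widthFromHeight - specifiedWidth) <= 0.5 ||
    Math.abs(heightFromWidth - specifiedHeight) <= 0.5
  );
}
```

For a 1600 by 900 resource, a specified size of 800 by 450 passes the check, while 800 by 600 does not, since it would stretch the image.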
The two attributes must be omitted if the resource in question does not have both an
intrinsic width and an intrinsic height.
If the two attributes are both zero, it indicates that the element is not intended for the user
(e.g., it might be a part of a service to count page views).
The dimension attributes are not intended to be used to stretch the image.
User agent requirements: User agents are expected to use these attributes as hints for the rendering.
The
width
and
height
IDL attributes on the <{iframe}>, <{embed}>, <{object}>, and <{video}> elements must
reflect the respective content attributes of the same name.
For <{iframe}>, <{embed}>, and <{object}> the IDL
attributes are DOMString; for <{video}> the IDL attributes are
unsigned long.
The corresponding IDL attributes for <{img}> and
<{input}> elements are defined in those respective elements'
sections, as they are slightly more specific to those elements' other behaviors.