Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in this non-normative format: EPUB
Copyright © 2020 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This specification defines a general manifest format for expressing information about a digital publication. It uses [schema.org] metadata augmented to include various structural properties about publications, serialized in [json-ld11], to enable interoperability between publishing formats while accommodating variances in the information that needs to be expressed.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Publishing Working Group as a Recommendation.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list (archives).
A W3C Recommendation is a specification that, after extensive consensus-building, has received the endorsement of the W3C and its Members. W3C recommends the wide deployment of this specification as a standard for the Web. Future updates to this Recommendation may incorporate new features.
This document was produced by a group operating under the 1 August 2017 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 15 September 2020 W3C Process Document.
This specification defines a general manifest format to describe publications. It is designed to be adaptable to the needs of specific areas of publishing, such as audiobook production, by specifying a modular approach for creating specializations.
This specification is also intended to facilitate different user agent architectures. While it is expected that traditional Web user agents (browsers) will be able to consume a publication manifest, this should not limit the capabilities of any other possible type of user agent (e.g., applications, whether standalone or running within a user agent, or even publications that include their own user interface).
This specification does not define how user agents are expected to render publications that use the manifest format.
This section is non-normative.
A digital publication is described by its manifest, which provides a set of properties expressed using a specific shape of JSON-LD [json-ld11] (a variant of JSON [ecma-404] for linked data).
The manifest is what enables user agents to understand the bounds of a digital publication and the connection between its resources. It includes metadata that describes the digital publication, as a publication has an identity and nature beyond its constituent resources. The manifest also provides a list of resources that belong to the digital publication and a default reading order, which is how it connects resources into a single contiguous work.
The properties of the manifest describe the basic information a user agent requires to process and render a publication. For ease of understanding, these properties are categorized as follows:
Descriptive properties describe aspects of a digital publication, such as its title, creator, and language.
Resource categorization properties describe or identify common sets of resources, such as the resource list and default reading order. These properties refer to one or more resources, such as HTML documents, images, scripts, and metadata records.
The manifest also identifies key resources of a digital publication using link relations. These relations are defined in the rel property of LinkedResource objects (i.e., the JSON objects that represent each resource in the default reading order, resource list, and links sections).
The types of resources these relations identify are categorized as follows:
Informative resources are resources that contain additional information about the publication, such as its privacy policy, accessibility report, or preview.
Structural resources are key meta structures of the publication, such as the cover image, table of contents, and page list.
This specification defines the publication manifest as a specific "shape" of [json-ld11]. This means that the manifest SHOULD be expressed using only the syntactic constructions defined in this specification, as opposed to all the possibilities offered by the JSON-LD syntax.
This shape is also defined, informally, through a JSON schema [json-schema] that expresses the constraints defined in this specification. This schema is maintained at https://www.w3.org/ns/pub-schema/manifest/.
The publication manifest also has several authoring flexibilities and compact authoring expressions. For example, it is not always required that object types be explicitly authored, as these are automatically generated during processing when missing (see § 4.2.4 Explicit and Implied Objects for more information). An internal representation of the manifest data is defined separately; see § A. Internal Representation Data Model for further details.
Consequently, a user agent does not have to be a full JSON-LD processor. User agents only need to be able to read the manifest's specific shape and internalize the data.
This section is non-normative.
Manifest properties, in particular those categorized as descriptive properties, are primarily drawn from Schema.org and its hosted extensions [schema.org]. Consequently, these properties inherit their syntax and semantics from Schema.org, making manifest authoring compatible with Schema.org authoring.
When a manifest item corresponds to a Schema.org property, its property definition identifies its mapping and includes the defining type (e.g., CreativeWork or Book) in parentheses.
Schema.org additionally includes many properties that, though relevant for publishing, are not mentioned in this specification. These properties can be used in a manifest, as this document defines only the minimal set of manifest items (see § 4.7.3.2 Additional Manifest Properties).
When using additional Schema.org properties, ensure that they are valid for the type of publication specified in the manifest. Properties are often available in many Schema.org types, as a result of the inheritance model used by the vocabulary, but not all properties are available for all types. For more detailed information about which types accept which properties, refer to [schema.org].
More information about using additional Schema.org properties is also available in § 4.5 Publication Types and § 4.7.3.2 Additional Manifest Properties.
This specification depends on the Infra Standard [infra].
A digital publication consists of a finite set of resources that represent its content. This extent is known as its bounds and is defined within its manifest as described in § 5. Publication Resources.
A digital publication is any publication authored in a format that uses a profile of the manifest.
The internal representation of a manifest is the data structure that user agents create when they process the manifest, removing all possible ambiguities and incorporating any missing values that can be inferred from another source.
It is possible for the information expressed in the manifest to be the equivalent of the internal representation created by user agents if there are no ambiguities or missing information.
A manifest represents structured information about a publication, such as informative metadata, a list of resources, and a default reading order.
Profiles are publication formats (e.g., audiobooks) that use the manifest format defined in this specification to describe their bounds and content. These formats can extend the core definition in this specification with profile-specific terms and/or new requirements.
Although profiles can differ in their structural and content requirements, such variances are restricted to maintain a high degree of predictability between formats. (See § 8. Modular Extensions.)
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, OPTIONAL, RECOMMENDED, REQUIRED, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
All algorithm explanations are informative.
The following properties MUST be set in the manifest:
The following properties are RECOMMENDED:
The priority of all other properties and resource relations is OPTIONAL, but MAY be modified by implementations of the manifest format.
Some properties are implicitly required, as they are compiled from alternative information when not explicitly authored. See § A. Internal Representation Data Model for more information.
This section describes the categories of values that can be used with properties of the publication manifest.
When a manifest property expects a literal text string — one that is not language-dependent, such as a code value or date — as its value, the value MUST be expressed as a [json] string.
Literal values are not changed during processing of the manifest, unlike other values which might be, for example, converted to objects.
When a manifest property expects a number as its value, the value MUST be expressed as a [json] number.
When a manifest property expects a boolean as its value, the value MUST be expressed as an [ecmascript] Boolean value (true or false).
Various manifest properties are expected to be expressed as [json] objects. Although the use of explicit objects is usually advised, the following sections identify cases where it is also acceptable to use string values. These strings are automatically translated into objects during processing of the manifest by a user agent (the exact mapping of text values to objects is included in each definition).
When a manifest property expects a localizable text string as its value, the value MUST be expressed either as a single string value or as an instance of a LocalizableString.
A single string value represents an implied object whose value property is the string's text and whose language and base direction are determined from other information in the manifest.
As localizable strings are intended to facilitate multiple language representations of a value, properties that accept a localizable string always accept an array of these values. For this reason, although only a single string or object has to be authored, such values are converted to arrays for consistency of processing.
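The conversion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the normative processing algorithm: the function name `to_localizable_strings` and its `global_language` and `global_direction` parameters are hypothetical, standing in for whatever global declarations the manifest provides.

```python
from typing import Any, Dict, List, Optional


def to_localizable_strings(
    value: Any,
    global_language: Optional[str] = None,
    global_direction: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Normalize a string, an object, or an array of both into a list of objects."""
    # A single string or object is treated as a one-element array.
    items = value if isinstance(value, list) else [value]
    result = []
    for item in items:
        if isinstance(item, str):
            # A bare string is an implied object whose "value" is the string's text.
            item = {"value": item}
        entry = {"value": item["value"]}
        for key, fallback in (
            ("language", global_language),
            ("direction", global_direction),
        ):
            # A locally authored key wins, even when it is null (None);
            # the global declaration is used only when the key is absent.
            resolved = item.get(key, fallback)
            if resolved is not None:
                entry[key] = resolved
        result.append(entry)
    return result


print(to_localizable_strings(["Moby Dick", {"value": "Google", "language": None}], "en"))
```

The sketch always returns a list, so later processing steps never need to distinguish between the compact and the expanded authoring forms.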
A LocalizableString is a [json] object consisting of the following properties:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
value | The value of the localizable string. REQUIRED. | Text. | Literal | (None) |
language | The language of the value. OPTIONAL. | A well-formed language tag [bcp47]. | Literal | (None) |
direction | The base direction of the value. OPTIONAL. | ltr or rtl | Literal | (None) |
The meanings of the base direction values are:
ltr: indicates that the textual value is explicitly directionally set to left-to-right text.
rtl: indicates that the textual value is explicitly directionally set to right-to-left text.
A missing base direction value means that the textual value is directionally set to the direction of the first character with a strong directionality, following the rules of the Unicode Bidirectional Algorithm [bidi].
If the base direction value were not set in the last example, the text would be displayed, following the Unicode Bidirectional Algorithm [bidi] and due to the presence of a Latin character starting the string, as:
HTML היא שפת סימון.
However, that would be incorrect. The extra direction value is necessary to control the display to yield:
HTML היא שפת סימון.
Note that the value field in the example represents the text as it is stored in memory, hence the discrepancy between it and the two renderings depicted here. Text editors might also display the JSON value differently (e.g., using the Unicode Bidirectional Algorithm only).
See also the [string-meta] document for further explanations and examples.
When a manifest property expects an entity (i.e., an individual or organization responsible for the various aspects of creation), its value MUST be expressed either as a single string value or as an instance of an Entity object.
A single string value represents an instance of an Entity object whose name property is the string's text and whose type is assumed to be Person [schema.org].
An Entity is defined as an instance of either the [schema.org] Person or Organization type with the following minimal property set:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
type | The type of entity. OPTIONAL. | One or more Text. Sequence MUST include "Person" or "Organization". | Array of Literals | (None) |
name | Name of the entity. REQUIRED. | One or more Text. | Array of Localizable Strings | name |
id | A canonical identifier associated with the entity. OPTIONAL. | A URL record [url]. | Identifier | (None) |
url | An address associated with the entity. OPTIONAL. | A valid URL string [url]. | URL | url |
identifier | An identifier associated with the entity (e.g., ORCID). OPTIONAL. | One or more Text. | Array of Literals | identifier |
This minimal set of properties is not restrictive. Authors can include any additional properties defined for the [schema.org] Person or Organization types, as appropriate. User agents are similarly not limited to interpreting only the preceding properties.
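The string shorthand for entities can be pictured with a short Python sketch. The helper name `to_entities` is an assumption made for this illustration, and the further expansion of name into localizable strings is deliberately left out to keep the example small.

```python
from typing import Any, Dict, List


def to_entities(value: Any) -> List[Dict[str, Any]]:
    """Normalize a string, an entity object, or an array of both into entity objects."""
    # A single string or object is treated as a one-element array.
    items = value if isinstance(value, list) else [value]
    entities = []
    for item in items:
        if isinstance(item, str):
            # A bare string is a Person whose name is the string's text.
            entities.append({"type": ["Person"], "name": item})
        else:
            entity = dict(item)  # copy so the authored manifest is not mutated
            if isinstance(entity.get("type"), str):
                # The type is an array of literals; wrap a single authored value.
                entity["type"] = [entity["type"]]
            entities.append(entity)
    return entities


print(to_entities(["Jeni Tennison", {"type": "Organization", "name": "World Wide Web Consortium"}]))
```

Additional properties on an authored object are copied through untouched, in line with the minimal property set not being restrictive.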
When a manifest property links to one or more resources, it MUST be expressed either as a single string value or as an instance of a LinkedResource.
A string value represents an implied LinkedResource object whose url property is set to the string value.
A LinkedResource object is defined as follows:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
type | The type of resource. OPTIONAL. | One or more Text. Sequence MUST include "LinkedResource". | Array of Literals | (None) |
url | Location of the resource. REQUIRED. | A valid URL string [url]. Refer to the property definitions that accept this type for additional restrictions. | URL | url |
encodingFormat | Media type of the resource (e.g., text/html). OPTIONAL. | MIME Media Type [rfc2046]. | Literal | encodingFormat |
name | Name of the item. OPTIONAL. | One or more Text. | Array of Localizable Strings | name |
description | Description of the item. OPTIONAL. | One or more Text. | Array of Localizable Strings | description |
rel | The relation of the resource to the publication. OPTIONAL. | One or more relations. Keywords are ASCII case-insensitive [infra] and MUST be compared as such. | Array of Literals | (None) |
integrity | A cryptographic hashing of the resource that allows its integrity to be verified. OPTIONAL. | One or more whitespace-separated sets of integrity metadata [sri]. The value MUST conform to the metadata definition [sri]. Refer to [sri] for the list of cryptographic hashing functions that user agents are expected to support. | Literal | (None) |
duration | Overall duration of a time-based media resource. OPTIONAL. | Duration value as defined by [iso8601-1]. | Literal | duration (Property) |
alternate | References to one or more reformulation(s) of the resource in alternative formats, where the … | One or more strings and/or LinkedResource objects. A string value represents an implied LinkedResource object. | Array of Linked Resources | (None) |
Although user agent support for the integrity property is OPTIONAL, user agents that support cryptographic hashing comparisons using this property MUST do so in accordance with [sri].
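As a rough illustration of what such a comparison involves, the following Python sketch builds and checks integrity metadata in the form used by [sri], that is, a hash algorithm name, a hyphen, and the base64-encoded digest. The function names are assumptions made for this example. The matching logic is also a simplification: [sri] additionally prioritizes the strongest algorithm when several are listed, which is not modeled here.

```python
import base64
import hashlib

# Hash functions named by [sri]; anything else is ignored by this sketch.
SUPPORTED = ("sha256", "sha384", "sha512")


def integrity_metadata(data: bytes, algorithm: str = "sha256") -> str:
    """Return '<algorithm>-<base64 digest>' for the given resource bytes."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Return True if any supported, whitespace-separated entry matches the data."""
    for item in integrity.split():
        algorithm, _, encoded = item.partition("-")
        encoded = encoded.split("?", 1)[0]  # ignore any trailing options
        if algorithm not in SUPPORTED:
            continue
        if integrity_metadata(data, algorithm) == f"{algorithm}-{encoded}":
            return True
    return False


content = b"<html><body>Chapter 1</body></html>"
value = integrity_metadata(content)
print(value, matches_integrity(content, value))
```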
This specification only defines the alternate property for selecting from alternative formats (i.e., based on encodingFormat or by inspecting URLs). Profiles MAY extend this behaviour to allow selection based on other criteria. The process for selecting an alternate is described in § B. Selecting an Alternate Resource.
When defining a LinkedResource object, it is advised to always specify the media type of the resource using the encodingFormat property. Doing so allows user agents to more readily determine the usability of the resource.
{
    "type" : "LinkedResource",
    "url" : "chapter1.html",
    "encodingFormat" : "text/html",
    "name" : "Chapter 1 - Loomings",
    "integrity" : "sha256-13AE04E21177BABEDFDE721577615A638341F963731EA936BBB8C3862F57CDFC"
}
{
    "type" : "LinkedResource",
    "url" : "chapter1.mp3",
    "encodingFormat" : "audio/mpeg",
    "name" : "Chapter 1 - Loomings",
    "alternate" : [
        "chapter1.html",
        {
            "type" : "LinkedResource",
            "url" : "chapter1.json",
            "encodingFormat" : "application/vnd.syncnarr+json",
            "duration" : "PT1669S"
        }
    ]
}
{
    …
    "resources" : [
        "datatypes.svg",
        {
            "type" : "LinkedResource",
            "url" : "test-utf8.csv",
            "encodingFormat" : "text/csv",
            "name" : "Test Results",
            "description" : "CSV file containing the full data set used."
        },
        {
            "type" : "LinkedResource",
            "url" : "terminology.html",
            "encodingFormat" : "text/html",
            "rel" : "glossary"
        }
    ],
    …
}
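The expansion of string values into LinkedResource objects, including those nested in alternate, might be sketched in Python as follows. The helper name `to_linked_resources` is hypothetical, and the sketch covers only the string-to-object mapping and the type array, not URL resolution or validation.

```python
from typing import Any, Dict, List


def to_linked_resources(value: Any) -> List[Dict[str, Any]]:
    """Normalize strings and objects into a list of LinkedResource-like dicts."""
    # A single string or object is treated as a one-element array.
    items = value if isinstance(value, list) else [value]
    resources = []
    for item in items:
        if isinstance(item, str):
            # A bare string is an implied object whose url is the string value.
            item = {"url": item}
        resource = dict(item)  # copy so the authored manifest is not mutated
        declared = resource.get("type", [])
        types = [declared] if isinstance(declared, str) else list(declared)
        if "LinkedResource" not in types:
            types.append("LinkedResource")
        resource["type"] = types
        if "alternate" in resource:
            # Alternates accept the same string shorthand, so recurse.
            resource["alternate"] = to_linked_resources(resource["alternate"])
        resources.append(resource)
    return resources


authored = [
    "datatypes.svg",
    {"type": "LinkedResource", "url": "chapter1.mp3", "alternate": ["chapter1.html"]},
]
print(to_linked_resources(authored))
```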
When a manifest property expects a type of object not defined in this section, or by a profile, it MUST be expressed as a [json] object (i.e., the property's value will not be processed to create an object).
URLs are used to identify resources associated with a digital publication. When a property expects a URL value, it MUST be a valid URL string [url].
In the case of relative-URL strings, these are resolved to absolute-URL strings using a base URL [url].
The base URL for relative-URL strings is determined as follows:
As a consequence, relative-URL strings in embedded manifests are resolved against the URL of the document that references the manifest unless the document declares a base URL (i.e., in a <base> element in its header).
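Resolution of a relative-URL string against a base URL can be illustrated with Python's standard library. This is only an approximation: `urllib.parse.urljoin` follows RFC 3986 rather than the parsing rules of [url], and the base URL and function name used here are made up for the example.

```python
from urllib.parse import urljoin


def resolve_manifest_url(url: str, base_url: str) -> str:
    """Resolve a possibly relative URL string against the given base URL."""
    # Absolute URLs are returned unchanged; relative ones are joined to the base.
    return urljoin(base_url, url)


# Hypothetical base URL: the document that references an embedded manifest.
base = "https://publisher.example.org/books/moby-dick/index.html"
print(resolve_manifest_url("chapter1.html", base))
print(resolve_manifest_url("https://cdn.example.org/cover.jpg", base))
```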
Identifiers are used to refer to a digital publication and the entities responsible for its creation in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.
Identifiers MUST be expressed as URL records [url].
When a manifest property allows one or more values of its respective type (e.g., literal, object, or URL), these values are expressed as [json] arrays. When a property value is a single element, however, the array syntax MAY be omitted.
A manifest MUST set its JSON-LD context [json-ld11] with the following two components, in the specified order:
https://schema.org
https://www.w3.org/ns/pub-context
Although Schema.org is often referenced using the http URI scheme, the vocabulary is being migrated to use the secure https scheme as its default. As a result, only the https scheme is recognized in the publication manifest context.
{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context"
    ],
    …
}
The publication context document adds features to the properties defined in Schema.org (e.g., the requirement for the creator property to be order preserving).
Profiles of this specification MAY require additional context URLs, but such URLs MUST be ordered after these two components.
The context can be extended by including additional parameters — such as the global language and direction declarations — in an object following the publication context.
{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
            "language" : "es"
        }
    ],
    …
}
Each natural language property value in a manifest (e.g., title, creators) has a default natural language, which is the language that it is expressed in (e.g., English, French, Chinese). It also has a natural base direction in which it is written — the display direction, either left-to-right or right-to-left.
The digital publication manifest provides the ability to set both these concepts globally as well as on individual items to aid user agents in interpreting and presenting the metadata.
The ability to set the base direction is a JSON-LD 1.1 [json-ld11] feature. In other words, the Publication Manifest has a dependency on that version of the JSON-LD specification (as opposed to the earlier 1.0 [json-ld10] version).
The global language and base direction declarations for natural language manifest properties are set in the context using the language and direction keywords [json-ld11], respectively. These values are used to expand simple string values into localizable strings during the processing of the manifest, as well as to provide a language and the base direction for localizable strings that omit one.
The value of language MUST be a well-formed language tag [bcp47].
The value of direction MUST have one of the following values:
"ltr": indicates that the textual values are explicitly directionally set to left-to-right text.
"rtl": indicates that the textual values are explicitly directionally set to right-to-left text.
The global language and base direction declaration, when present, MUST follow the publication context.
Default values are not specified for the global language or base direction.
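The context requirements and the global declarations can be read together, as in the following Python sketch. The function name `read_context` and its error handling are assumptions made for this illustration; validation of the language tag against [bcp47] is omitted.

```python
from typing import Any, Dict, Optional, Tuple

# The two required context components, in the required order.
REQUIRED_CONTEXT = ["https://schema.org", "https://www.w3.org/ns/pub-context"]


def read_context(manifest: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Check the context order and return the global (language, direction)."""
    context = manifest.get("@context")
    context = context if isinstance(context, list) else [context]
    if context[:2] != REQUIRED_CONTEXT:
        raise ValueError("context must start with the Schema.org and publication contexts")
    language = direction = None
    # Declarations are looked for only in objects that follow the publication context.
    for entry in context[2:]:
        if isinstance(entry, dict):
            language = entry.get("language", language)
            direction = entry.get("direction", direction)
    if direction not in (None, "ltr", "rtl"):
        raise ValueError(f"invalid base direction: {direction!r}")
    # No defaults are applied: both values stay None when not declared.
    return language, direction


manifest = {"@context": REQUIRED_CONTEXT + [{"language": "es"}]}
print(read_context(manifest))
```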
It is possible to set the language or a base direction locally for any natural language value in the manifest using a localizable string.
The possible values of the language and direction keywords [json-ld11] are the same as for the global declaration. Both values can also be the JSON value null, indicating that no explicit language or direction, respectively, is set.
Setting the value of language to null can be useful if a value (e.g., the name of an organization) is commonly used without any associated language (e.g., "Google").
A local declaration of the language or of the base direction takes precedence over the corresponding global declaration.
A digital publication's manifest defines its Publication Type using the type keyword [json-ld11]. The type MAY be mapped onto any [schema.org] type, but CreativeWork is assumed as the default when no type is specified.
{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type" : "CreativeWork",
    …
}
More specific subtypes of CreativeWork, such as Article, Book, TechArticle, and Course, can be used instead of, or in addition to, CreativeWork.
{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type" : "Book",
    …
}
Each Schema.org type defines a set of properties that are valid for use with it. To ensure that the manifest can be validated and processed by Schema.org-aware processors, the manifest SHOULD contain only the properties associated with the selected type.
If properties from more than one type are needed, the manifest MAY include multiple type declarations.
{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type" : ["Book", "VisualArtwork"],
    …
}
User agents SHOULD NOT fail to process manifests that are not valid to their declared Schema.org type(s).
Refer to the Schema.org site for the complete list of CreativeWork subtypes.
A digital publication indicates the profile its manifest and content conform to using the conformsTo property.
Term | Description | Required Value | Value Category | [dcterms] Mapping |
---|---|---|---|---|
conformsTo | URL of the profile. | An absolute-URL-with-fragment string [url]. | Array of Literals | conformsTo |
The URL to use for a profile is defined in its respective specification.
The conformsTo property can also be used to indicate conformance to other specifications and standards (e.g., to [wcag21]).
{
    …
    "conformsTo" : "https://www.w3.org/TR/audiobooks/",
    …
}
The abridged property provides information on whether or not a digital publication has been shortened from its original form.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
abridged | Indicates whether the book is an abridged edition. | Either true or false. | Boolean | abridged (Book) |
{
    …
    "abridged" : true,
    …
}
The accessibility properties provide information about the suitability of a digital publication for consumption by users with different preferred reading modalities. These properties typically supplement an evaluation against established accessibility criteria, such as those provided in [wcag21].
The following properties are categorized as accessibility properties:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
accessMode | The human sensory perceptual system or cognitive faculty through which a person may process or perceive information. | One or more Text. | Array of Literals | accessMode (CreativeWork) |
accessModeSufficient | A list of single or combined access modes that are sufficient to understand all the intellectual content of a resource. | One or more ItemList. | Array of Objects | accessModeSufficient (CreativeWork) |
accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. | One or more Text. | Array of Literals | accessibilityFeature (CreativeWork) |
accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users. | One or more Text. | Array of Literals | accessibilityHazard (CreativeWork) |
accessibilitySummary | A human-readable summary of specific accessibility features or deficiencies that is consistent with the other accessibility metadata. | Text. | Array of Localizable Strings | accessibilitySummary (CreativeWork) |
Detailed descriptions of these properties, including the expected values to use with them, are available at [webschemas-a11y].
A reference to a detailed accessibility report can also be provided if more information is needed than can be expressed by these properties.
{
    …
    "accessMode" : ["textual", "visual"],
    "accessibilityFeature" : ["alternativeText", "longDescription"],
    "accessModeSufficient" : [
        {
            "type" : "ItemList",
            "itemListElement" : ["textual", "visual"]
        },
        {
            "type" : "ItemList",
            "itemListElement" : ["textual"]
        }
    ],
    …
}
An address is a URL that identifies the source location of a digital publication. It is expressed using the url property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
url | URL of the publication. | A valid URL string [url]. | Array of URLs | url (Thing) |
A digital publication MAY have more than one address, but all the addresses MUST resolve to the same document.
{
    …
    "url" : "https://publisher.example.org/frankenstein",
    …
}
A digital publication's canonical identifier property provides a unique identifier for a digital publication. It is expressed using the id property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
id | Preferred version of the publication. | A URL record [url]. | Identifier | (None) |
Ensuring uniqueness of canonical identifiers is outside the scope of this specification. The actual achievable uniqueness depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.
If a canonical identifier is not provided in the manifest, or the value is an invalid URL, the digital publication does not have a canonical identifier. User agents MUST NOT attempt to construct a canonical identifier from any other identifiers provided in the manifest.
The specification of the canonical identifier MAY be complemented by the inclusion of additional types of identifiers using the identifier property [schema.org] and/or its subtypes.
{
    …
    "id" : "http://www.w3.org/TR/tabular-data-model/",
    "url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    …
}
{
    …
    "id" : "urn:isbn:9780123456789",
    "url" : "https://publisher.example.org/wuthering-heights",
    …
}
A creator is an individual or organization responsible for the creation of a digital publication.
The following properties are categorized as creators:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
artist | The primary artist for the publication, in a medium other than pencils or digital line art. | One or more Person. | Array of Entities | artist (VisualArtwork) |
author | The author of the publication. | One or more Person and/or Organization. | Array of Entities | author (CreativeWork) |
colorist | The individual who adds color to inked drawings. | One or more Person. | Array of Entities | colorist (VisualArtwork) |
contributor | Contributor whose role does not fit to one of the other roles in this table. | One or more Person and/or Organization. | Array of Entities | contributor (CreativeWork) |
creator | The creator of the publication. Use of this property might lead to inconsistent results in user agents. It is marked as a synonym for author in [schema.org], but there is no guidance on which takes precedence or how to combine them. It is advised to use only one or the other, with preference given to the more specific author property. | One or more Person and/or Organization. | Array of Entities | creator (CreativeWork) |
editor | The editor of the publication. | One or more Person. | Array of Entities | editor (CreativeWork) |
illustrator | The illustrator of the publication. | One or more Person. | Array of Entities | illustrator (Book) |
inker | The individual who traces over the pencil drawings in ink. | One or more Person. | Array of Entities | inker (VisualArtwork) |
letterer | The individual who adds lettering, including speech balloons and sound effects, to artwork. | One or more Person. | Array of Entities | letterer (VisualArtwork) |
penciler | The individual who draws the primary narrative artwork. | One or more Person. | Array of Entities | penciler (VisualArtwork) |
publisher | The publisher of the publication. | One or more Person and/or Organization. | Array of Entities | publisher (CreativeWork) |
readBy | A person who reads (performs) the publication (for audiobooks). | One or more Person. | Array of Entities | readBy (Audiobook) |
translator | The translator of the publication. | One or more Person and/or Organization. | Array of Entities | translator (CreativeWork) |
Creators MUST be represented either as a string giving the name of a Person [schema.org], or as an instance of a Person or Organization [schema.org].
A single string value is a shorthand for a [schema.org] Person whose name property is set to that string value. (See also § 4.2.4.2 Entities.)
The manifest MAY include more than one of each type of creator.
{
    …
    "url" : "https://publisher.example.org/alice-in-wonderland",
    "author" : {
        "type" : "Person",
        "name" : "Lewis Carroll"
    }
}
{
    …
    "author" : [
        "Jeni Tennison",
        {
            "type" : "Person",
            "name" : "Gregg Kellogg"
        },
        {
            "type" : "Person",
            "name" : "Ivan Herman",
            "id" : "https://www.w3.org/People/Ivan/",
            "identifier" : "0000-0003-0782-2704"
        }
    ],
    "editor" : [
        "Jeni Tennison",
        {
            "type" : "Person",
            "name" : "Gregg Kellogg"
        }
    ],
    "publisher" : {
        "type" : "Organization",
        "name" : "World Wide Web Consortium",
        "id" : "https://www.w3.org/"
    },
    …
}
The global duration indicates the overall length of a time-based digital publication (e.g., an audiobook or a book consisting of a series of video clips). It is expressed using the duration property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
duration | Overall duration of a time-based publication. | Duration value as defined by [iso8601-1]. | Literal | duration (Property) |
{
    …
    "type" : "Audiobook",
    "id" : "https://example.org/flatland-a-romance-of-many-dimensions/",
    "url" : "https://w3c.github.io/pub-manifest/experiments/audiobook/",
    "name" : "Flatland: A Romance of Many Dimensions",
    …
    "duration" : "PT15153S",
    …
}
The relevant Wikipedia page gives a concise description of the ISO duration syntax.
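A user agent that needs the global duration as a number (e.g., to show total listening time) has to parse the ISO 8601 string. The following is a minimal, non-normative sketch. The helper name duration_to_seconds is hypothetical, and it assumes that only the day and time components (PnDTnHnMnS) are used; years, months, and weeks are not handled.

```python
import re

# Hypothetical helper: supports only the PnDTnHnMnS subset of ISO 8601 durations.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_seconds(value: str) -> float:
    """Convert a day/time duration such as "PT15153S" to seconds."""
    match = _DURATION.match(value)
    # Reject strings with no components ("P", "PT") or a dangling "T".
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {key: float(text) if text else 0.0
             for key, text in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])
```

With this sketch, "PT15153S" and "PT4H12M33S" both evaluate to 15153 seconds.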
The last modification date is the date when a digital publication was last updated (i.e.,
whenever changes were last made to any of the resources of the publication, including the manifest). It is expressed using the
dateModified
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
dateModified
|
Last modification date of the publication. | A Date or DateTime
value [schema.org], both
expressed in ISO 8601 Date, or Date Time formats, respectively [iso8601-1]. |
Literal |
dateModified (CreativeWork) |
The last modification date does not necessarily reflect all changes to a publication (e.g., if a digital publication format allows references to third-party content). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.
{
…
"dateModified" : "2015-12-17",
…
}
The publication date is the date on which a digital publication was originally
published. It represents a static event in the lifecycle of a publication and allows
subsequent revisions to be identified and compared. It is expressed using the
datePublished
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
datePublished
|
Creation date of the publication. | A Date or DateTime , both expressed
in ISO 8601 Date, or Date Time formats, respectively [iso8601-1]. |
Literal |
datePublished (CreativeWork) |
The exact moment of publication is intentionally left open to interpretation: it could be when the publication is first made available or could be a point in time before publication when the publication is considered final.
{
…
"datePublished" : "2015-12-17",
"dateModified" : "2016-01-30",
…
}
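Because datePublished is static while dateModified tracks later changes, comparing the two gives a simple revision check. The sketch below is illustrative only. The function names are hypothetical, and it relies on Python's datetime.fromisoformat, which accepts the common ISO 8601 date and date-time forms but not every variant the standard allows.

```python
from datetime import datetime


def parse_manifest_date(value: str) -> datetime:
    """Parse a Date ("2015-12-17") or DateTime value into a datetime."""
    # Python versions before 3.11 do not accept a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def has_been_revised(manifest: dict) -> bool:
    """Return True when dateModified is later than datePublished."""
    published = manifest.get("datePublished")
    modified = manifest.get("dateModified")
    if published is None or modified is None:
        return False
    # Note: comparing a value with a time zone against one without
    # raises TypeError; a real implementation would need a policy for that.
    return parse_manifest_date(modified) > parse_manifest_date(published)
```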
A digital publication has at least one natural language, which is the language that the content is expressed in (e.g., English, French, Chinese). The manifest includes the following property to set this concept, which can influence, for example, the behavior of a user agent (e.g., to preload a dictionary or text-to-speech engine).
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
inLanguage
|
Default language for the publication. | One or more well-formed language tags [bcp47]. | Array of Literals |
inLanguage (Property) |
The natural language MUST be a well-formed language tag [bcp47].
If a user agent requires the publication language and it is not available in the manifest, or the obtained value is not well-formed [bcp47], the user agent MAY attempt to determine the publication language when generating its internal representation. This specification does not mandate how such a language tag is created. The user agent might:
If a user agent requires a primary language for the publication and more than one language is
specified, the first entry in the inLanguage
array MUST be recognized as the primary.
It is important to differentiate the language of the publication from the language of the individual resources that compose it. If such resources are, for example, in HTML, the language needs to be set in those resources, too. The language of the publication is not inherited.
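The rule that the first inLanguage entry is treated as the primary language can be sketched as follows. This is a non-normative illustration: the function name is hypothetical, and it reads the authored manifest, where a single string is allowed in place of an array.

```python
def primary_language(manifest: dict):
    """Return the first inLanguage entry, or None when no language is given."""
    value = manifest.get("inLanguage")
    if value is None:
        return None
    # Authors may write a single string instead of a one-element array.
    languages = [value] if isinstance(value, str) else list(value)
    return languages[0] if languages else None
```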
The reading progression direction establishes the reading direction from one resource to the next within a digital publication. It is used to adapt publication-level interactions such as menu position, touch gestures, swipe direction, and tap zones for the next and previous page. The reading progression is expressed using the readingProgression
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
readingProgression
|
Reading progression direction from one resource to the other. | One of: ltr or rtl . |
Literal | (None) |
The value of this property MUST be either:
ltr: left-to-right; or
rtl: right-to-left.
The default value is ltr. If readingProgression is not set, user agents MUST use the default value when generating their internal representation.
This property has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.
{
…
"readingProgression" : "ltr",
…
}
The title provides the human-readable name of a digital publication. It is expressed using the name
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
name
|
Human-readable title of the publication. | One or more Text. | Array of Localizable Strings |
name (Thing) |
If a title is not included in the manifest, the user agent MUST create one. The process for obtaining the title is defined in § 7.4.3 Add Default Values.
A user agent is not expected to produce a meaningful title [wcag21] for a publication when one is not specified.
{
…
"name" : "Heart of Darkness",
…
}
Publication resources are specified via the default reading order, the resource list, and the links, as defined in this section. These lists contain references to informative resources like the privacy policy, and structural resources like the table of contents.
It is not necessary to include a reference to the manifest in any of these lists.
The default reading order is a specific progression through a set of digital publication resources. A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.
The default reading order is expressed using the readingOrder
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
readingOrder
|
Order of progression through the resources of a digital publication. |
One or more |
Array of Linked Resources | (None) |
Each element of the readingOrder property MUST be expressed either as:
a string, representing the URL of the resource; or
an instance of a LinkedResource object.
A single string value represents an instance of a LinkedResource object whose url property is the string's text.
The order of items is significant.
The URLs expressed in the reading order MAY include fragment identifiers, although profiles of this specification MAY restrict both their use as well as what schemes and features are supported. Fragment identifiers are to be interpreted as defined by their respective specifications (e.g., the start location to move the user to, or the range of content to render before moving to the next item in the reading order).
Resources SHOULD NOT be listed more than once in the reading order, as this can lead to unexpected results in user agents (e.g., links to the resource might not resolve to the right instance in the reading order).
The default reading order MAY be omitted when a digital publication consists only of the resource that links to the manifest. When the default reading order is absent, user agents MUST include an entry for the linking resource when compiling the internal representation. See § 7.4.3 Add Default Values for more information.
The default reading order MUST include at least one resource after processing of the manifest.
{
…
"readingOrder" : [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
…
],
…
}
{
…
"readingOrder" : [
{
"type" : "LinkedResource",
"url" : "html/title.html",
"encodingFormat" : "text/html",
"name" : "Title page"
},
{
"type" : "LinkedResource",
"url" : "html/copyright.html",
"encodingFormat" : "text/html",
"name" : "Copyright page"
},
…
],
…
}
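The two examples above are equivalent ways of authoring a reading order: a bare string stands for a LinkedResource whose url is that string. A minimal sketch of that expansion follows. The function name is hypothetical, and the "type" value mirrors the authored examples rather than the internal representation.

```python
def to_linked_resources(items) -> list:
    """Expand string entries into LinkedResource-style dicts, keeping order."""
    # A single value is accepted in place of a one-element array.
    if isinstance(items, (str, dict)):
        items = [items]
    resources = []
    for item in items:
        if isinstance(item, str):
            resources.append({"type": "LinkedResource", "url": item})
        elif isinstance(item, dict):
            resources.append(item)
        # Any other value is dropped in this sketch.
    return resources
```

The order of the input is preserved, which matters for readingOrder.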
The resource list enumerates any additional resources used
in the processing or rendering of a digital publication
that are not already listed in the default reading order.
It is expressed using the resources
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
resources
|
List of additional publication resources used in the processing or rendering of a publication. |
One or more |
Array of Linked Resources | (None) |
Each element of the resources property MUST be expressed either as:
a string, representing the URL of the resource; or
an instance of a LinkedResource object.
A single string value represents an instance of a LinkedResource object whose url property is the string's text.
The order of items is not significant.
To avoid conflicting information about a resource, a particular resource's URL SHOULD NOT be repeated within the resource list.
The URLs expressed in the resource list SHOULD NOT include fragment identifiers.
The completeness of the resource list can affect the usability of a digital publication in certain reading scenarios (e.g., the ability to read it offline). For this reason, it is strongly advised to provide a comprehensive list of all of the publication's constituent resources beyond those listed in the default reading order.
In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a publication even if some of these resources are not identified as belonging to the publication (e.g., if it is taken offline without them).
{
…
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"type" : "LinkedResource",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},
{
"type" : "LinkedResource",
"url" : "test-utf8-bom.csv",
"encodingFormat" : "text/csv"
},
…
],
…
}
The Links list is used to provide a list of resources that are
not required for the processing and rendering of a digital publication (i.e., the content of
the publication remains unaffected even if these resources are not available). Links are
expressed using the links
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
links
|
List of resources associated with a publication but not required for its processing or rendering. |
One or more |
Array of Linked Resources | (None) |
Each element of the links property MUST be expressed either as:
a string, representing the URL of the resource; or
an instance of a LinkedResource object.
A single string value represents an instance of a LinkedResource object whose url property is the string's text.
The order of items is not significant.
It is RECOMMENDED to use LinkedResource
objects with
their rel
values set.
Linked resources are typically made available to user agents to augment or enhance the processing or rendering, such as:
Links can also be used to identify resources used in the online rendering of a publication, but that are not essential to include when the publication is taken offline or packaged (e.g., to minimize the size). These include:
The links
list SHOULD include resources necessary to
render a linked resource (e.g., scripts, images, style sheets).
Resources listed in the links
list MUST NOT be listed
in the default reading order or resource list.
User agents MAY ignore linked resources and are not required to take them offline with a publication. These resources SHOULD NOT be included when packaging a publication.
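The requirement that linked resources not appear in the default reading order or resource list lends itself to a simple authoring check. The sketch below is illustrative only. The function name is hypothetical, the three properties are assumed to be arrays, and URLs are compared as written, without resolving them against a base URL.

```python
def links_also_in_publication(manifest: dict) -> list:
    """Return URLs from links that also occur in readingOrder or resources."""

    def urls(term):
        # Entries are either bare URL strings or LinkedResource objects.
        return [item if isinstance(item, str) else item.get("url")
                for item in manifest.get(term, [])]

    within_bounds = set(urls("readingOrder")) | set(urls("resources"))
    return [url for url in urls("links") if url in within_bounds]
```

An empty result means no conflict was found under these assumptions.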
The manifest is designed to provide a basic set of properties for use by user agents in presenting and rendering a digital publication, but MAY be extended in the following ways:
This specification does not define how such additional properties are compiled, stored or exposed by user agents in their internal representation of the manifest. A user agent MAY ignore some or all extended properties.
The manifest MAY be extended through links to metadata records, such as an ONIX [onix] or BibTeX [bibtex] record, using a LinkedResource object, where:
the rel property of the LinkedResource includes a relevant identifier (e.g., if the linked record contains descriptive metadata, the describedby identifier [iana-link-relations] can be used); and
encodingFormat identifies the MIME media type [rfc2046] defined for that particular type of record, if applicable.
Linked records are included in the resource list when they are part of the publication (i.e., are needed for more than just manifest extensibility). Otherwise, they are included in the links list.
{
…
"links" : [
{
"type" : "LinkedResource",
"url" : "https://www.publisher.example.org/time-machine/onix.xml",
"encodingFormat" : "application/onix+xml",
"rel" : "describedby"
},
…
],
…
}
The application/onix+xml
MIME type has not yet been registered by
IANA at the time of writing this document and is included in the example for
illustrative purposes only.
Additional properties MAY be included directly in the manifest using public schemes like [schema.org] or [dcterms]. Proprietary terms MAY be used, but it is RECOMMENDED that such terms be included using Compact IRIs [json-ld11], with prefixes defined as part of the context.
Proper use of prefixes and compact IRIs is necessary to use a manifest with a full JSON-LD processor, but is not a requirement for the processing algorithm defined by this specification. Validation of prefixed terms has to be carried out separately if full JSON-LD processing is expected.
{
"@context" : [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{
"language" : "en",
"ex" : "https://example.org/vocab#"
}
],
…
"ex:region" : "North America",
…
}
The Schema.org context file
[schema.org] defines several prefixes for commonly used
vocabularies, such as the Dublin Core Terms (dcterms
) [dcterms] and Element Set (dc
) [dc11], the FOAF
vocabulary (foaf
) [foaf], and the Bibliographic Ontology (bibo
) [bibo]. Properties from these
vocabularies can be used without their prefixes having to be declared.
{
…
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
…
}
{
…
"dcterms:subject" : ["Web data description languages","Data integration","Data Exchange"],
…
}
The cover is a resource that user agents can use to present a digital publication (e.g., in a library or bookshelf, or when initially loading the publication).
The cover is identified by the cover
link relation.
The link to the cover MUST NOT be specified in the links list.
The cover
term is not currently registered in the IANA link
relations, but the Working Group expects to add it.
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "cover.html",
"encodingFormat" : "text/html",
"rel" : "cover"
},
…
],
…
}
If the cover is an image (whether embedded in an HTML resource or not), it is strongly advised to follow Success Criterion 1.1.1 [wcag21] for the provision of alternative text and extended descriptions. For image formats that do not provide the ability to embed this information, the name and description properties of LinkedResource can be used to provide alternative text and extended descriptions, respectively. In these cases, the name property SHOULD always be set; the property can be left empty for decorative images.
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "whale-image.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "cover",
"name" : "Moby Dick attacking hunters",
"description" : "A white whale is seen surfacing from the water to attack a small whaling boat"
},
…
],
…
}
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "cover.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "cover",
"name" : ""
},
…
],
…
}
If a user agent requires alternative text for a cover image to make an interface accessible, and the name property is not specified, it MAY attempt to construct the alternative text from the publication metadata. This specification does not mandate how such alternative text is created. One method is to construct the alternative text as a string that identifies the image as the cover, followed by the publication title.
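The fallback method described in the preceding paragraph could look like the following sketch. It is not a prescribed algorithm: the function name and the "Cover: " wording are hypothetical, and name is assumed to be a plain string as in the authored examples.

```python
def cover_alt_text(cover: dict, title: str = "") -> str:
    """Return alternative text for a cover LinkedResource."""
    if "name" in cover:
        # An explicitly empty name marks a decorative image; keep it as-is.
        return cover["name"]
    # Hypothetical wording: identify the image as the cover, then add the title.
    return f"Cover: {title}" if title else "Cover"
```

An explicitly empty name is returned unchanged, so decorative covers are not given generated text.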
Only one resource MAY be identified as the cover, but additional covers MAY be specified using the alternate property (e.g., to provide alternative dimensions or resolution).
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "lilliput.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "cover",
"alternate" : [
{
"type" : "LinkedResource",
"url" : "lilliput.svg",
"encodingFormat" : "image/svg+xml",
"rel" : "cover"
}
]
},
…
],
…
}
The page list is a navigational aid that contains a list of static page demarcation points within a digital publication.
The page list is identified by the pagelist
link relation.
The pagelist
term is not currently registered in the IANA link
relations but the Working Group expects to add it.
Only one resource MAY be identified as containing a page list. If multiple instances are specified, user agents MUST use the first instance encountered, with precedence given to the reading order.
The link to the page list MUST NOT be specified in the links list.
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "toc_file.html",
"rel" : "pagelist"
},
…
],
…
}
The table of contents is a navigational aid that provides links to the major structural sections of a digital publication.
The resource that contains the table of contents is identified by the contents link relation [iana-link-relations]. The table of contents proper is the first element inside that resource with the role value doc-toc, as defined in § C.2 HTML Structure.
Only one resource MAY be identified as containing the table of contents. If multiple instances are specified, user agents MUST use the first instance encountered, with precedence given to resources in the reading order.
Profiles of this specification MAY define how to locate a
resource containing the table of contents when no resource is identified by the
contents
relation.
The link to the table of contents MUST NOT be specified in the links list.
The RECOMMENDED structure and processing model for the table of contents is defined in § C. Machine-Processable Table of Contents.
{
…
"resources" : [
{
"type" : "LinkedResource",
"url" : "toc_file.html",
"rel" : "contents"
},
…
],
…
}
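When more than one resource carries the contents relation, the first instance is used, with precedence given to the reading order. A minimal sketch of that lookup is shown below. The function names are hypothetical, and both properties are assumed to be arrays of strings or LinkedResource objects. The same lookup applies to pagelist.

```python
def _rels(resource) -> list:
    """Return the rel values of an entry as a list."""
    if not isinstance(resource, dict):
        return []  # a bare URL string carries no rel value
    rel = resource.get("rel", [])
    return [rel] if isinstance(rel, str) else list(rel)


def find_by_rel(manifest: dict, rel: str):
    """Return the first resource with the given rel, reading order first."""
    for term in ("readingOrder", "resources"):
        for resource in manifest.get(term, []):
            if rel in _rels(resource):
                return resource
    return None
```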
An accessibility report provides information about the suitability of a digital publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [wcag21], and are an important source of information in determining the usability of a publication.
An accessibility report is identified using the accessibility-report
link
relation.
The accessibility-report
term is not currently registered in the
IANA link relations but the Working Group expects to add it.
It is helpful to include the report as a resource of the publication so that it is available, for example, when a publication is read offline.
Providing the accessibility report in a human-readable format, such as HTML [html], helps ensure that it can be accessed and understood by users. Augmenting the report with machine-processable metadata, such as provided in Schema.org [schema.org], will additionally aid in machine processing.
{
…
"resources" : [
…
{
"type" : "LinkedResource",
"url" : "https://www.publisher.example.org/sherlock-holmes-accessibility.html",
"rel" : "accessibility-report"
},
…
],
…
}
Not all digital publications will be available to all users (e.g., they might be restricted to registered users of a site). In such cases, the publisher might wish to provide a preview of the content to entice users to access the full version.
A preview is identified using the
preview
link relation [iana-link-relations].
Previews MAY be located externally or included as resources of digital publications.
{
…
"links" : [
{
"type" : "LinkedResource",
"url" : "preview.mp3",
"encodingFormat" : "audio/mpeg",
"rel" : "preview"
},
…
],
…
}
{
…
"links" : [
{
"type" : "LinkedResource",
"url" : "https://publisher.example.org/jekyll-hyde-preview.html",
"encodingFormat" : "text/html",
"rel" : "preview"
},
…
],
…
}
Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses such privacy concerns is consequently an important part of publishing digital publications. Even if no information is collected, such a declaration increases the trust users have in the content.
A link to a privacy policy can be included in the manifest for this purpose. It is helpful to include the privacy policy as a resource of the publication so that it is available, for example, when a publication is read offline.
A privacy policy is
identified using the privacy-policy
link relation [iana-link-relations].
{
…
"resources" : [
…
{
"type" : "LinkedResource",
"url" : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
"encodingFormat" : "text/html",
"rel" : "privacy-policy"
},
…
],
…
}
If additional relations beyond those defined in this specification need to be expressed, the rel
property can be extended in one of the
following ways:
The list of unique resources belonging to a digital publication (its bounds) is obtained from the union of resources listed in readingOrder and resources, including any alternate resources. The exact process for creating this list is described in the manifest processing algorithm.
All other resources are outside the bounds of the digital publication (e.g., resources listed in the links
section and hyperlinks in the content to external resources on the
Web).
This specification does not place any restrictions on publication resources, but profiles of this specification MAY restrict both the content type and location of resources.
User agents MAY opt to process and render resources differently depending on whether they are within the bounds of a digital publication (e.g., exclude external resources from an offline or packaged version of a publication).
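The union that defines the bounds can be sketched as follows. This is only an approximation of the process defined by the manifest processing algorithm: the function name is hypothetical, and URLs are compared as written rather than being resolved first.

```python
def publication_bounds(manifest: dict) -> list:
    """Collect unique URLs from readingOrder and resources, with alternates."""
    seen = set()
    bounds = []

    def visit(item):
        url = item if isinstance(item, str) else item.get("url")
        if url and url not in seen:
            seen.add(url)
            bounds.append(url)
        if isinstance(item, dict):
            alternates = item.get("alternate", [])
            # A single alternate may be authored without the enclosing array.
            if isinstance(alternates, (str, dict)):
                alternates = [alternates]
            for alternate in alternates:
                visit(alternate)

    # Entries in links are deliberately not visited: they are out of bounds.
    for term in ("readingOrder", "resources"):
        for item in manifest.get(term, []):
            visit(item)
    return bounds
```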
Links to the manifest MUST take one or both of the following forms:
An HTTP Link header field [rfc5988] with its rel parameter set to the value "publication".
Link: <https://example.com/pub/manifest>; rel=publication
A link element [html] with its rel attribute set to the value "publication".
<link href="https://example.com/pub/manifest" rel="publication"/>
When a manifest is embedded within an HTML document, the link MUST include a fragment identifier that references the script
element that contains the manifest (see § 6.2 Embedding).
<link href="#example_manifest" rel="publication">
…
<script id="example_manifest" type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
…
}
</script>
The resource that links to the manifest MUST be included in either the default reading order or the resource list.
When a digital publication format allows manifests to be
embedded within an HTML document, the manifest MUST be included in a script
element [html] whose type
attribute is set to application/ld+json
[json-ld11].
<script type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
…
}
</script>
Digital publication formats MAY define alternative methods of discovering a manifest that do not involve linking to, or embedding, a manifest (e.g., the manifest could be discovered using a restricted name and/or location). This specification does not add any restrictions on such methods.
This section depends on the Infra Standard [infra].
This section is non-normative.
Although a digital publication's manifest is authored as [json-ld11], the steps for processing a manifest described in this section detail how a user agent transforms the manifest into its internal representation of the data. The algorithm describes the process using the terminology and data types defined in [infra], and, if successful, results in an [infra] map of the data being returned.
An actual implementation of this algorithm will use the corresponding constructs and data types of whatever language is used.
The following error types are used in the processing algorithm:
User agents SHOULD expose both validation and fatal errors, but this specification does not prescribe the way this is done.
For validation errors, user agents SHOULD differentiate the severity of the error (i.e., whether a required or recommended practice has been violated).
Some steps in the processing algorithm depend on the expected value category of a term, so the context in which a term is used can affect processing (e.g., url expects an Array of URLs only when it is a direct property of the Publication Manifest). To differentiate these uses, a context is provided to certain function calls. This context is set to the type of object that initiates the processing call.
The default list of recognized
types includes Person
, Organization
and
LinkedResource
. Profiles
MAY extend this list to include additional object types.
If a context is not provided to a function, the term being processed is considered part of the global context (i.e., it is a direct child of the manifest).
When extending the list of recognized types, the normalize data function might also need to be extended to ensure that all objects have their type specified (e.g., when string values are automatically expanded to objects).
This algorithm takes the following arguments:
This algorithm does not describe how the manifest is discovered and obtained. The steps by which to do so are defined by each digital publication format.
To generate the internal representation, run the following steps:
Let processed be an empty map that will contain the internal representation of the manifest.
Let manifest be the result of parsing JSON into Infra values given text. If manifest is not a map, fatal error, return failure.
(§ 4.3 Manifest Contexts) If manifest["@context"] is not set to a list, or the first and second items in manifest["@context"] are not the string values "https://schema.org" and "https://www.w3.org/ns/pub-context", in this order, fatal error, return failure.
If the context URLs are not set as expected, the JSON data does not represent a publication manifest.
(§ 4.6 Profile Conformance) Let processed["profile"] be the profile the manifest conforms to. Set processed["profile"] as follows:
If manifest["conformsTo"] is not set, or does not include a profile the user agent recognizes as capable of processing and/or rendering, the user agent SHOULD inspect the media type(s) of the resources in the reading order to determine if the publication matches a profile it is capable of processing or rendering. If so, validation error, set processed["profile"] to the matching profile. Otherwise, fatal error, return failure.
Otherwise, set processed["profile"] to the first URL in manifest["conformsTo"] the user agent is capable of processing and/or rendering.
The profile the publication conforms to determines any additional extension steps that have to be performed during processing. These steps are defined by their respective specifications.
The new term profile is created because conformsTo is not restricted to profile identifiers (i.e., the new term provides a persistent identifier of the profile within the internal representation).
(§ 4.4.1 Global Declarations) Let lang be the global language and dir be the global direction obtained from this step. Set each initially to an empty string.
For each context of manifest["@context"], moving from the last item to the first, if context is a map:
If lang is neither an empty string nor a well-formed [bcp47] language tag, validation error, set lang to an empty string.
If dir is neither an empty string nor one of the values "ltr" or "rtl", validation error, set dir to an empty string.
The global language and direction declarations obtained here are used to set the language and base direction, respectively, for localizable strings without a declaration.
The iterator moves backwards through @context as the last language and direction declarations override any earlier ones.
(§ 4.3 Manifest Contexts) If a profile requires additional validation of the manifest context, those steps are performed here.
This extension step allows verification of any information a profile requires be present
in the manifest context (e.g., additional context URLs or parameters). These steps have to be
performed at this point, as @context
terms are removed as part of the data normalization in the next step. A more general
step for processing profile data is provided at a later
step.
For each term → value of manifest, set processed[term] to the result, when successful, of calling normalize data given term, value, lang, dir and base. If failure is returned, do not add term to processed.
The data normalization steps standardize the incoming manifest data to remove any authoring conveniences, such as the ability to use strings where objects or arrays are expected. The resulting processed data are added to the processed variable and are operated on in subsequent steps.
Set processed to the result of running data validation given processed.
The data validation checks ensure that the incoming data matches its expected value categories. Any restrictions on the expected values are also enforced at this step, and any invalid data is removed from the final representation.
If a profile specifies additional processing functions that need to be run, those steps are executed at this point.
Set processed to the result of running add default values, when successful, given processed and document, when specified. Otherwise, terminate processing, return failure.
This step checks if any information missing from the manifest can be obtained from the HTML document that links to the manifest, or from other sources.
Return processed.
For a visualization of the resulting structure, see § A. Internal Representation Data Model.
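The opening steps of the algorithm (parsing, the context check, and the global language and direction declarations) can be sketched as below. This is a non-normative illustration under several assumptions: FatalError and read_context are hypothetical names, the declarations are assumed to be carried by "language" and "direction" keys in the context maps, validation errors are silently reduced to empty strings, and the language check is a rough pattern rather than a full [bcp47] well-formedness test.

```python
import json
import re

REQUIRED_CONTEXT = ["https://schema.org", "https://www.w3.org/ns/pub-context"]
# Rough approximation only; a real check would follow the BCP 47 grammar.
_LANG_TAG = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


class FatalError(Exception):
    """Hypothetical stand-in for 'fatal error, return failure'."""


def read_context(text: str):
    """Parse the manifest and return (manifest, lang, direction)."""
    manifest = json.loads(text)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(manifest, dict):
        raise FatalError("manifest is not a JSON object")
    context = manifest.get("@context")
    if not isinstance(context, list) or context[:2] != REQUIRED_CONTEXT:
        raise FatalError("not a publication manifest")

    lang = direction = ""
    # Walk backwards so that the last declaration takes precedence.
    for entry in reversed(context):
        if isinstance(entry, dict):
            if not lang and isinstance(entry.get("language"), str):
                lang = entry["language"]
            if not direction and isinstance(entry.get("direction"), str):
                direction = entry["direction"]

    if lang and not _LANG_TAG.match(lang):
        lang = ""  # a validation error in the specification's terms
    if direction not in ("", "ltr", "rtl"):
        direction = ""  # likewise a validation error
    return manifest, lang, direction
```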
To normalize data for a property term's value, with the global language lang, global direction dir, base URL base, and optional context context run these steps:
Let normalized be the value of value.
The data normalization steps are performed on the copy of the incoming value held in the normalized variable defined in this step. This variable is returned at the end of a successful normalization process.
(§ 4.3 Manifest Contexts) If term is @context, return failure.
@context provides information for the initial processing of the manifest, but is not retained in the internal data representation. Returning a failure signals to remove the term.
(§ 4.2.7 Arrays) If, depending on context, term expects an array and value is not a list, set normalized to the list: « value ».
Various terms require their values to be arrays, but, for the sake of convenience, authors are allowed to use a single value instead of a one element array. For example,
{
…
"name" : "Et dukkehjem",
"author" : "Henrik Ibsen",
…
}
yields:
«[
…
"name" → « "Et dukkehjem" »,
"author" → « "Henrik Ibsen" »,
…
]»
(§ 4.2.4.2 Entities) If, depending on context, term expects an array of entities, for each entity of normalized:
if entity is a string, set entity to the map:
«[
"type" → « "Person" »,
"name" → entity
]»
otherwise, if entity is not a map, validation error, remove entity from normalized.
otherwise, if entity["type"] is not set, set it to the list: « "Person" ». If entity["type"] is set but does not include the value Person or Organization, append the value Person to the list.
Creators (authors, editors, etc.), are expected to be explicitly defined as an object, but, for the sake of convenience, only their name has to be specified in the manifest. For example:
{
…
"author": "Ralph Ellison",
…
}
This rule converts such string values to maps with a default type of Person, yielding the following for the preceding example:
«[
…
"author" → «
«[
"type" → « "Person" »,
"name" → "Ralph Ellison"
]»
»,
…
]»
For simplicity, the conversion of name to a localizable string is described by a later step.
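The array and entity steps described so far can be sketched together as follows. The sketch is illustrative only: normalize_entities is a hypothetical name, maps are modelled as Python dicts, validation errors are handled by silently dropping the entry, and a type authored as a single string is wrapped in a list here for convenience.

```python
def normalize_entities(value) -> list:
    """Normalize an author-style value into a list of entity dicts."""
    # Array step: a single value stands for a one-element array.
    items = value if isinstance(value, list) else [value]
    normalized = []
    for entity in items:
        if isinstance(entity, str):
            # A bare string is shorthand for a Person with that name.
            normalized.append({"type": ["Person"], "name": entity})
        elif isinstance(entity, dict):
            entity = dict(entity)  # work on a copy of the authored object
            if "type" not in entity:
                entity["type"] = ["Person"]
            else:
                types = entity["type"]
                types = list(types) if isinstance(types, list) else [types]
                if "Person" not in types and "Organization" not in types:
                    types.append("Person")
                entity["type"] = types
            normalized.append(entity)
        # Any other value is a validation error and is dropped.
    return normalized
```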
(§ 4.2.4.1 Localizable Strings) If, depending on context, term expects an array of localizable strings, for each item of normalized:
if item is a string, set item to the map:
«[
"value" → item,
"language" → lang,
"direction" → dir
]»
if lang or dir is not set, or is an empty string, remove item["language"] or item["direction"], respectively.
otherwise, if item is not a map, validation error, remove item from normalized.
otherwise, process the map in item as follows:
If item["language"] is not set, set it to the value of lang when lang is set and is not an empty string.
Otherwise, if item["language"] is null, remove item["language"].
If item["direction"] is not set, set it to the value of dir when dir is set and is not an empty string.
Otherwise, if item["direction"] is null, remove item["direction"].
Natural language text values are expected to be explicitly defined as localizable string objects, but, for the sake of convenience, can be simple strings in the manifest. For example, if no language information has been provided via the global language declaration then:
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"name" : ["La Comédie humaine"],
…
}
yields:
«[
"name" → «
«[
"value" → "La Comédie humaine"
]»
»,
…
]»
If, however, an explicit language has been provided in the manifest, that language is added to the localizable string object. For example,
{
"@context" : [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{"language": "fr"}
],
"name" : ["La Comédie humaine"],
…
}
yields:
«[
"name" → «
«[
"value" → "La Comédie humaine",
"language" → "fr"
]»
»,
…
]»
A local setting or a local null value prevents the global value from taking effect. For example,
{
"@context" : [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{"language":"fr"}
],
…
"name" : [{
"value" : "La Comédie humaine"
}],
"publisher" : [{
"type":["Organization"],
"name":[{
"value": "Hachette",
"language": null
}]
}],
…
}
yields:
«[
"name" → «
«[
"value" → "La Comédie humaine",
"language" → "fr"
]»
»,
"publisher" → «
«[
"type" → « "Organization" »,
"name" → «
«[
"value" → "Hachette"
]»
»
]»
»,
…
]»
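The handling of global and local language settings can be sketched in Python as follows. The function name normalize_localizable and its keyword parameters are hypothetical; lang and direction stand in for the lang and dir variables of the algorithm, and Python's None plays the role of a JSON null.

```python
def normalize_localizable(items, lang=None, direction=None):
    """Convert strings to localizable string maps and apply global defaults."""
    result = []
    for item in items:
        if isinstance(item, str):
            item = {"value": item}
            if lang:
                item["language"] = lang
            if direction:
                item["direction"] = direction
        elif not isinstance(item, dict):
            continue  # validation error: the item is removed
        else:
            item = dict(item)  # work on a copy
            for key, default in (("language", lang), ("direction", direction)):
                if key not in item:
                    if default:
                        item[key] = default  # global value applies
                elif item[key] is None:
                    del item[key]  # a local null blocks the global value
        result.append(item)
    return result
```

With lang set to "fr", a map that has no language entry inherits "fr", whereas a map whose language is null ends up with no language at all.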
(§ 4.2.4.3 Linked Resources) If, depending on context, term expects an array of LinkedResources, for each resource of normalized:
if resource is a string, convert resource to the map:
«[
"type" → « "LinkedResource" »,
"url" → resource
]»
otherwise, if resource is not a map, validation error, remove resource from normalized.
otherwise, if resource["type"] is not set, set it to the list « "LinkedResource" ». If resource["type"] is set but does not include the value LinkedResource, append that value to the list.
Resource links are expected to be explicitly defined as objects of type LinkedResource, but, for the sake of convenience, only their absolute or relative URLs have to be specified in the manifest. For example,
{
…
"resources" : [
"css/book.css",
…
],
…
}
This step converts the string values to objects, yielding the following for the preceding example:
«[
…
"resources" → «
«[
"type" → « "LinkedResource" »,
"url" → "css/book.css"
]»,
…
»,
…
]»
For simplicity, the conversion of relative paths to absolute is described by a later step.
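The linked resource step follows the same pattern as the entity step. A minimal Python sketch, assuming that any existing type value is already a list (the function name normalize_linked_resources is hypothetical):

```python
def normalize_linked_resources(resources):
    """Convert URL strings to LinkedResource maps and ensure the type is listed."""
    result = []
    for resource in resources:
        if isinstance(resource, str):
            resource = {"type": ["LinkedResource"], "url": resource}
        elif not isinstance(resource, dict):
            continue  # validation error: the item is removed
        else:
            resource = dict(resource)  # work on a copy
            # Assumption: "type", when present, is already a list of strings.
            types = list(resource.get("type") or [])
            if "LinkedResource" not in types:
                types.append("LinkedResource")
            resource["type"] = types
        result.append(resource)
    return result
```

The url values are left untouched here, relative or not, because resolution against the base URL is a separate step.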
(§ 4.2.5 URLs) If, depending on context, term expects a URL or array of URLs:
if normalized is a string, set normalized to the result of running convert to absolute URL, when successful, given normalized. If failure is returned, return failure.
otherwise, if normalized is a list, for each item of normalized, set item to the result of running convert to absolute URL, when successful, given item. If failure is returned, remove item from normalized.
otherwise, validation error, return failure.
Relative URLs in the manifest are resolved against the base value to obtain absolute URLs. For example:
"url": "chapter01.html"
for a publication hosted at
https://example.org/publications/wuthering-heights
would yield:
"url" → "https://example.org/publications/wuthering-heights/chapter01.html"
(§ 8. Modular Extensions, extension point) If a profile defines processing steps for profile-specific terms, those steps are executed at this point.
Recursively check normalized as follows to ensure that all properties get normalized:
if normalized is a list, for each item of normalized that is a map:
if item["type"] is set and includes a recognized type, for each key → keyValue of item, set key to the result of running normalize data, when successful, given key, keyValue, lang, dir, base and using item["type"] as the context. If failure is returned, remove key from item.
otherwise, do nothing.
otherwise, if normalized is a map:
if normalized["type"] is set and includes a recognized type, for each key → keyValue of normalized, set key to the result of running normalize data, when successful, given key, keyValue, lang, dir, base and using normalized["type"] as the context. If failure is returned, remove key from normalized.
otherwise, do nothing.
otherwise, do nothing.
To ensure that all the properties in the manifest get processed, this step recursively checks normalized for additional map entries to process. If normalized is a list, each item is inspected to determine if it is a map that can be processed.
If a failure is returned, the item is removed from the map.
return normalized.
To convert to absolute URL url, with a base URL base, run the following steps:
If url or base is not a string, or is an empty string, validation error, return failure.
This step checks that both url and base are non-empty strings before attempting to use them.
Set url to the result of running the URL parser [url], when successful, with url as input and base as the base URL. If failure is returned, validation error, return failure.
This step calls the URL parser function on the url to be processed. If the url is not an absolute URL, the parser converts it to one using the base URL.
If parsing returns a failure, a failure is returned to the caller to indicate to remove the URL.
Return url.
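A minimal Python sketch of these steps is shown below. It uses urllib.parse.urljoin from the standard library as a stand-in for the URL parser [url]; the two do not behave identically in every edge case, so this is an approximation only. Failure is represented by None, and the scheme check is an added assumption to reject results that are still relative.

```python
from urllib.parse import urljoin, urlsplit


def convert_to_absolute_url(url, base):
    """Resolve url against base; return None to signal failure."""
    # Both inputs have to be non-empty strings.
    if not isinstance(url, str) or not isinstance(base, str):
        return None
    if not url or not base:
        return None
    absolute = urljoin(base, url)
    # Assumption: a result without a scheme is treated as a parse failure.
    if not urlsplit(absolute).scheme:
        return None
    return absolute
```

Note that relative resolution replaces the last path segment of the base unless the base ends with a slash, so "chapter01.html" resolves inside "wuthering-heights/" only when the base URL carries that trailing slash (or names a file within that directory).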
To perform data validation on map data, run the following steps:
For each term → value of data, set term to the result of running the global data checks, when successful, given term and value. If failure is returned, remove data[term].
This step passes each entry to a set of global validation checks that need to be run on the value and recursively on any properties within the value.
A failure is returned if the property is invalid and has to be removed.
If a profile specifies data validation checks, those steps are executed at this point.
Profile validation steps are prioritized over the default steps so that if profiles have, for example, different default values to apply, those values get applied.
(§ 4.5 Publication Types) If data["type"] is not set or is an empty list, validation error, set to « "CreativeWork" ».
(§ 4.7.1.2 Accessibility) If data["accessModeSufficient"] is set, for each item of data["accessModeSufficient"], if item["type"] is not set or does not contain "ItemList", remove item from data["accessModeSufficient"].
(§ 4.7.1.4 Canonical Identifier) If data["id"] is not set or is an empty string, validation error.
(§ 4.7.1.6 Duration) If data["duration"] is set and is not a valid duration value, per [iso8601-1], validation error, remove data["duration"].
(§ 4.7.1.7 Last Modification Date) If data["dateModified"] is set and is not a valid date or date-time per [iso8601-1], validation error, remove data["dateModified"].
(§ 4.7.1.8 Publication Date) If data["datePublished"] is set and is not a valid date or date-time per [iso8601-1], validation error, remove data["datePublished"].
(§ 4.7.1.9 Publication Language) If data["inLanguage"] is set, for each item of data["inLanguage"], if item is not well-formed [bcp47], validation error, remove item from data["inLanguage"].
(§ 4.7.1.10 Reading Progression Direction) If data["readingProgression"] is not set, set to "ltr". Otherwise, if it is not one of the required directional values, validation error, set to "ltr".
(§ 5. Publication Resources) Obtain and verify the unique URLs within the publication bounds as follows:
If readingOrder is set, let readingOrderURLs be the result of running get unique URLs given readingOrder. Otherwise, let readingOrderURLs be an empty ordered set.
If resources is set, let resourcesURLs be the result of running get unique URLs given resources. Otherwise, let resourcesURLs be an empty ordered set.
Set data['uniqueResources'] to the union of readingOrderURLs and resourcesURLs.
This step gets the list of unique URLs within the reading order and the resource list. It then sets data['uniqueResources'] to the union of these two sets, which represents the complete list of unique resources within the bounds of the publication.
This step also warns if either the readingOrder or resources contains duplicate resource declarations. The validation errors are emitted as part of obtaining the unique URLs from each list.
(§ 4.7.2.3 Links) If data["links"] is set, for each link in data["links"]:
let url be the result of running URL serializer [url] on link["url"] with the exclude fragment flag set.
if data["uniqueResources"] contains url, validation error, remove link from data["links"], then continue.
if link["rel"] is not set or is an empty list, validation error, then continue.
if link["rel"] contains any of the case-insensitive values "contents", "pagelist" or "cover", validation error, remove link from data["links"].
After obtaining the list of unique publication resources in the previous step, the links property is checked to ensure that any linked resources are not also listed as publication resources.
If the link does not specify a rel value, a warning is raised. If its rel property specifies a structural resource, the link is removed, as structural resources have to be within the publication bounds.
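The link checks can be sketched in Python as follows. The function name check_links and the errors list used to collect messages are hypothetical, and urllib.parse.urldefrag is used as an approximation of serializing a URL with the exclude fragment flag set.

```python
from urllib.parse import urldefrag

STRUCTURAL_RELS = {"contents", "pagelist", "cover"}


def check_links(links, unique_resources, errors):
    """Return the links that survive the checks; append messages to errors."""
    kept = []
    for link in links:
        url = urldefrag(link["url"]).url  # drop any fragment
        if url in unique_resources:
            errors.append(f"link is also a publication resource: {url}")
            continue  # the link is removed
        rels = link.get("rel") or []
        if not rels:
            errors.append(f"link has no rel value: {url}")
            kept.append(link)  # warning only; the link is kept
            continue
        if any(rel.lower() in STRUCTURAL_RELS for rel in rels):
            errors.append(f"structural relation used in links: {url}")
            continue  # the link is removed
        kept.append(link)
    return kept
```

Only the first and third checks remove a link; a missing rel value results in a message but leaves the link in place.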
(§ 4.8.1 Structural Resources) Verify the use of structural relations as follows:
Set resources to the value of data["readingOrder"], when defined, otherwise to an empty list. Extend resources with data["resources"], when defined.
If more than one item in resources has a rel entry that contains the case-insensitive value "contents", validation error.
If more than one item in resources has a rel entry that contains the case-insensitive value "pagelist", validation error.
If more than one item in resources has a rel entry that contains the case-insensitive value "cover", validation error.
If the cover(s) have an encodingFormat entry that specifies an image media type (image/*), and do not have a name entry, validation error.
This checks the resources specified in the reading order and resource list to verify that only one instance of a table of contents, page list and cover has been specified.
For covers, it also checks that a name has been set on image-based formats for accessibility purposes.
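A minimal Python sketch of the structural relation checks, under the assumption that rel values have already been normalized to lists of strings (the function name and the errors list are hypothetical):

```python
def check_structural_resources(data, errors):
    """Append a message to errors for each structural relation problem found."""
    resources = list(data.get("readingOrder", [])) + list(data.get("resources", []))
    for rel in ("contents", "pagelist", "cover"):
        # Case-insensitive match against each resource's rel list.
        matches = [
            r for r in resources
            if rel in [value.lower() for value in r.get("rel", [])]
        ]
        if len(matches) > 1:
            errors.append(f"more than one '{rel}' resource")
        if rel == "cover":
            for cover in matches:
                is_image = (cover.get("encodingFormat") or "").startswith("image/")
                if is_image and not cover.get("name"):
                    errors.append("image cover without a name")
```

Nothing is removed by this step; it only reports problems, leaving the choice of which duplicate to use to the user agent.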
For each term → value of data, if running remove empty arrays given the variables term and value returns failure, remove data[term].
As the processing of the manifest involves removing invalid values at various stages, the final data structure might end up with some lists that no longer contain any values. This step iterates back over the data and removes any such empty lists.
Return data.
To process the global data checks on a property term's value with an optional context context, run these steps:
(§ 4.2 Value Categories) If term has a known value category, set value to the result of calling verify value category, when successful, given the variables term, value and context. If failure is returned, return failure.
Otherwise, return value.
This step verifies that the value of the term matches the expected category required for the term. For example, the abridged term requires a boolean value, so any other value used with the term will result in a failure.
If a failure occurs calling the function, this step also returns a failure so that the property is removed from the final data set.
Terms without a known value category are not processed, so the incoming value is returned.
Recursively descend into value as follows to check any sub-properties first:
if value is a map:
if value["type"] includes a recognized type, for each key → keyValue of value, set value[key] to the result of running global data checks, when successful, given key, keyValue and using value["type"] as the context. If failure is returned, remove value[key].
otherwise, do nothing.
otherwise, if value is a list, for each item of value, if item is a map:
if item["type"] includes a recognized type, for each key → keyValue of item, set item[key] to the result of running global data checks, when successful, given key, keyValue and using item["type"] as the context. If failure is returned, remove item[key].
otherwise, do nothing.
otherwise, do nothing.
To ensure that all the properties in the manifest get processed, this step recursively checks each entry for additional map entries to process. If the value is a list, each item is inspected to determine if it is a map that can be processed.
Its placement also ensures that all subproperties are checked first, so that the higher-level checks later in the step are tested after any invalid values are removed.
(§ 4.4.1 Global Declarations and § 4.4.2 Item-Specific Declarations) If term expects an array of LocalizableStrings, for each item of value:
if item["value"] is not set, remove item from value.
if item["language"] is set and its value is not well-formed [bcp47], validation error, remove item["language"].
if item["direction"] is set and its value is not one of "ltr" or "rtl", validation error, remove item["direction"].
This step checks that localizable strings have values, that their language declarations are well formed, and that their direction declarations have either the value "ltr" or "rtl".
(§ 4.2.4.2 Entities) If term expects an array of entities, for each item of value, check whether item["name"] is set:
If not, validation error, remove item from value.
This step ensures that all entities have a name. Entities without a name are removed.
(§ 4.2.4.3 Linked Resources) If term expects an array of LinkedResources, for each resource of value:
if resource["url"] is not set, or its value is an empty string, validation error, remove resource from value, then continue.
Otherwise, if resource["url"] is not a valid URL [url], validation error, remove resource from value, then continue.
if resource["duration"] is set and is not a valid duration value, per [iso8601-1], validation error, remove resource["duration"].
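This step performs the following checks on the terms of a LinkedResource: if its url is not set, is an empty string, or is not a valid URL, the LinkedResource is removed; if its duration is set but is not a valid duration value, the duration is removed.
Return value.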
To verify value category of a property term's value with a context context, run these steps:
If, depending on the context, term expects an array:
if value is not a list, validation error, return failure.
otherwise, for each item of value:
if item does not match the expected value category of the array, validation error, remove item from value, then continue.
if item is a map, for each key → keyValue of item, if key has an expected value category, set key to the result of running verify value category given key, keyValue, and using item["type"] as the context. If the result of processing item is an empty map, validation error, remove item from value.
If the result of processing value is an empty array, validation error, return failure.
Otherwise, if, depending on the context, term expects a map:
if value is not a map, validation error, return failure.
otherwise, for each key → keyValue of value, if key has an expected value category, set key to the result of running verify value category given key, keyValue and using value["type"] as the context. If the result of processing value is an empty map, validation error, return failure.
Otherwise, if, depending on the context, value does not match the expected value category of term, validation error, return failure.
Return value.
This function checks that the value of the term being processed matches its expected value category. The function is recursively called when the value is a list or map to ensure that all properties in the manifest get checked.
To get unique URLs from resources, run the following steps:
Let uniqueURLs be an empty ordered set.
For each resource of resources:
let url be the result of running URL serializer [url] on resource["url"] with exclude fragment flag set.
if uniqueURLs contains url, validation error. Otherwise, append url to uniqueURLs.
if resource["alternate"] is set, for each alternate of resource["alternate"]:
let alt_url be the result of running URL serializer [url] on alternate["url"] with exclude fragment flag set.
if uniqueURLs contains alt_url, validation error.
otherwise, append alt_url to uniqueURLs.
Return uniqueURLs.
This function takes a list of LinkedResource objects (from either the reading order or resource list) and returns the set of unique URLs. If duplicates are encountered, warnings are issued.
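The following Python sketch approximates this function. A list that is never given duplicates stands in for the ordered set, urllib.parse.urldefrag approximates the URL serializer with the exclude fragment flag set, and the errors parameter is a hypothetical way of collecting validation errors.

```python
from urllib.parse import urldefrag


def get_unique_urls(resources, errors):
    """Return the fragment-free URLs of resources and their alternates, in order."""
    unique = []

    def add(url):
        url = urldefrag(url).url  # strip the fragment, if any
        if url in unique:
            errors.append(f"duplicate resource: {url}")
        else:
            unique.append(url)

    for resource in resources:
        add(resource["url"])
        for alternate in resource.get("alternate", []):
            add(alternate["url"])
    return unique
```

Two entries that differ only by fragment identifier count as the same resource, so the second one triggers a validation error and is not added again.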
To remove empty arrays from a property term's value, run these steps:
If value is an empty list, return failure.
Otherwise, if value is a map, for each key → keyValue of value, if running remove empty arrays given key and keyValue returns failure, remove value[key].
This function checks that the value of the term being processed is not an empty list. A term that initially has a list can lose entries as it gets processed (i.e., when the list items are invalid).
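As an illustration, the function and the loop that calls it from the data validation steps could be sketched in Python like this. Failure is represented by a False return value, and the sample data is hypothetical.

```python
def remove_empty_arrays(value):
    """Return False (failure) for an empty list; prune nested maps in place."""
    if isinstance(value, list) and not value:
        return False
    if isinstance(value, dict):
        # Iterate over a copy of the keys, since entries may be deleted.
        for key in list(value):
            if not remove_empty_arrays(value[key]):
                del value[key]
    return True


# Hypothetical processed manifest in which some lists have lost all items.
data = {
    "name": [{"value": "Title"}],
    "links": [],
    "publisher": {"id": "urn:example:1", "name": []},
}
for term in list(data):
    if not remove_empty_arrays(data[term]):
        del data[term]
```

As in the steps above, the recursion only descends into maps; maps nested inside lists are not visited by this function.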
To add default values for missing properties in map data with an optional HTML Document (DOM) Node [html] document, run the following steps:
(§ 4.7.1.11 Title) If data["name"] is not set:
Let title be an empty map. Set its values as follows:
if document is set, if the title element [html] of document is set and is not empty, set title["value"] to the text content of the title element. Set title["language"] to the language [html], if available, and title["direction"] to the base direction [html] if that value is available and its value is either "ltr" or "rtl".
otherwise, validation error, generate a value for title["value"] (see the separate note for details). Set title["language"] and title["direction"] as appropriate for the generated title.
Set data["name"] to the list « title ».
This step adds the content of the title element of document when the name property is not specified in the manifest. For example:
<html>
<head lang="en">
<title>The Golden Bough</title>
…
<script type="application/ld+json">
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
…
}
</script>
yields:
«[
…
"name" → «
«[
"value" → "The Golden Bough",
"language" → "en"
]»
»,
…
]»
(§ 4.7.2.1 Default Reading Order and § 6.1 Linking) If data["readingOrder"] is not set:
if either document or document.URL is not set, fatal error, return failure.
set data["readingOrder"] to an empty list and append the map «[ "url" → document.URL ]».
append document.URL to data["uniqueResources"].
If the Digital Publication consists only of the referencing document, the default reading order can be omitted; it will consist, automatically, of that single resource.
If a profile specifies default values the user agent has to generate, those steps are executed at this point.
(§ 6.1 Linking) If document.URL is set and data["uniqueResources"] does not contain document.URL, validation error.
If the page that links to the manifest is not listed as a unique resource of the publication after processing core and extension default value rules, an error is raised as it has to be a publication resource.
Return data.
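The default value steps can be approximated by the Python sketch below. Instead of a DOM node, it takes the relevant pieces of the referencing document as plain, hypothetical parameters (doc_title, doc_lang, doc_url). Base direction and profile extension steps are omitted, the placeholder title "Untitled" is an arbitrary choice, and the fatal error is modelled as an exception.

```python
def add_default_values(data, doc_title=None, doc_lang=None, doc_url=None, errors=None):
    """Fill in a missing title and reading order from the referencing document."""
    errors = [] if errors is None else errors
    if "name" not in data:
        title = {}
        if doc_title:
            title["value"] = doc_title
            if doc_lang:
                title["language"] = doc_lang
        else:
            errors.append("no title available; a placeholder was generated")
            title["value"] = "Untitled"  # hypothetical generated value
        data["name"] = [title]
    if "readingOrder" not in data:
        if not doc_url:
            raise ValueError("fatal error: no reading order and no document URL")
        data["readingOrder"] = [{"url": doc_url}]
        data.setdefault("uniqueResources", []).append(doc_url)
    if doc_url and doc_url not in data.get("uniqueResources", []):
        errors.append("referencing document is not a publication resource")
    return data
```

Called with an empty manifest and a document titled "The Golden Bough", the sketch produces a name list and a single-item reading order that points at the document itself.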
The manifest format defined in this specification is designed to be implemented and extended by publishing communities in the production of new profiles (e.g., audiobooks and scholarly publications). The flexibility the manifest format offers allows it to be tailored to each community's specific needs while also providing a common base for user agents that need to process the profiles (i.e., minimizing the differences between each profile and simplifying interoperability).
For a profile to be compatible with this specification, the following conditions MUST be met:
conformsTo property.
Adding an example of a term added by, e.g., the audiobook profile would be a good idea, when available.
As the manifest is expressed using JSON-LD, the privacy and security considerations [json-ld11] detailed in that specification are applicable to all profiles of the manifest.
Some additional general considerations for profiles include:
More specific security and privacy considerations are left to each profile to detail, as these will vary depending on the nature of the digital publication format.
This section is non-normative.
The manifest includes several authoring conveniences, such as default values, the ability to use strings where objects would normally be required, and the automatic compilation of information from other sources (e.g., for the title and reading order). The processing of the manifest normalizes these conveniences and results in a consistent data set for user agents (the internal representation), but this set is not easily visualized from the processing algorithm.
This appendix provides an informative abstract data model using [WebIDL] that describes the resulting data structure. This definition expresses the expected names, datatypes, and possible restrictions for each member of the manifest after processing.
The choice of WebIDL is only for illustrative purposes. This specification does not define an API for exposing the manifest data.
PublicationManifest Dictionary
dictionary PublicationManifest {
sequence<DOMString> type = "CreativeWork";
required DOMString profile;
sequence<DOMString> conformsTo;
DOMString id;
boolean abridged;
sequence<DOMString> accessMode;
sequence<DOMString> accessModeSufficient;
sequence<DOMString> accessibilityFeature;
sequence<DOMString> accessibilityHazard;
sequence<LocalizableString> accessibilitySummary;
sequence<Entity> artist;
sequence<Entity> author;
sequence<Entity> colorist;
sequence<Entity> contributor;
sequence<Entity> creator;
sequence<Entity> editor;
sequence<Entity> illustrator;
sequence<Entity> inker;
sequence<Entity> letterer;
sequence<Entity> penciler;
sequence<Entity> publisher;
sequence<Entity> readBy;
sequence<Entity> translator;
sequence<DOMString> url;
DOMString duration;
sequence<DOMString> inLanguage;
DOMString dateModified;
DOMString datePublished;
TextDirection readingProgression = "ltr";
required sequence<LocalizableString> name;
required sequence<LinkedResource> readingOrder;
sequence<LinkedResource> resources;
sequence<LinkedResource> links;
sequence<DOMString> uniqueResources;
};
enum TextDirection {
"ltr",
"rtl"
};
LinkedResource Dictionary
dictionary LinkedResource {
required DOMString url;
DOMString encodingFormat;
sequence<LocalizableString> name;
sequence<LocalizableString> description;
sequence<DOMString> rel;
DOMString integrity;
DOMString duration;
sequence<LinkedResource> alternate;
};
Entity Dictionary
dictionary Entity {
sequence<DOMString> type;
required sequence<LocalizableString> name;
DOMString id;
DOMString url;
sequence<DOMString> identifier;
};
LocalizableString Dictionary
dictionary LocalizableString {
required DOMString value;
DOMString language;
TextDirection direction;
};
This appendix depends on the Infra Standard [infra].
To select an alternate resource for a LinkedResource resource, run the following steps.
If successful, this algorithm returns an alternate resource. Otherwise, it returns failure.
Let possibleAlternates be an empty list.
If resource["alternate"] is not set, return failure.
For each alternate of resource["alternate"]:
if alternate["encodingFormat"] is set and the user agent supports the specified media type, append alternate to possibleAlternates.
otherwise, if a profile defines additional selection criteria, evaluate alternate against them in this extension step.
otherwise, optionally inspect alternate["url"] for clues about the media type. If the resource appears to be supported, append alternate to possibleAlternates.
If possibleAlternates is an empty list, return failure.
Otherwise, if the size of possibleAlternates is 1, return the resource from possibleAlternates.
Otherwise, return a resource from possibleAlternates as determined by the user agent.
This function iterates the alternative formats for a resource and compiles a list of possibilities. If more than one possibility is found, the user agent determines how to prioritize and select the best alternative.
User agents are not required to add alternatives to the list of possibilities if they do not specify an explicit media type.
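A minimal Python sketch of the selection follows. It only considers alternates with an explicit encodingFormat, skips the optional URL inspection and any profile criteria, and picks the first candidate when several are possible; that tie-breaking policy and the supported_types parameter are assumptions, since the choice is left to the user agent.

```python
def select_alternate(resource, supported_types):
    """Return a supported alternate of resource, or None to signal failure."""
    alternates = resource.get("alternate")
    if not alternates:
        return None
    possible = [
        alternate for alternate in alternates
        if alternate.get("encodingFormat") in supported_types
    ]
    if not possible:
        return None
    # Assumption: this user agent simply prefers the first supported alternate.
    return possible[0]
```

A user agent with a richer preference model (for example, favouring smaller files or open formats) would replace the final line with its own ranking.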
This section is non-normative.
To facilitate navigation within pages and across sites, HTML uses the nav element [html] to express lists of links. Although generic in nature by default, the purpose of a nav element can be more specifically identified by use of the role attribute [html]. In particular, the doc-toc role from the [dpub-aria-1.0] vocabulary identifies the nav element as the digital publication's table of contents.
Including an identifiable table of contents is an accessible way to produce any digital publication, but due to the flexibility of HTML markup, it also presents challenges for user agents trying to extract a meaningful hierarchy of links (e.g., to provide a custom view available from any page). To avoid duplicating the tables of contents for different uses, this section defines a syntax that is both human friendly and commonly used while still providing enough structure for user agent extraction.
Authors have a choice of lists (ordered or unordered) to construct their table of contents. When each link within these lists is tagged with an anchor (an a element), user agents can easily differentiate the information they need from any peripheral content (asides) or stylistic tagging that has also been added. The table of contents can consist of both active links (with an href attribute) and inactive links (excluding the href attribute), providing additional flexibility in how the table of contents is constructed (e.g., to omit links to certain headings or only link to certain content in a preview).
Note, however, that user agents are not required to preserve the presentational aspects of the table of contents (i.e., the user agent is typically extracting the information in order to present it in a common way across all publications). User agents are only expected to retain the text content of the link elements, for example, so text styling, inline images and other non-text content might be lost. Similarly, list styling and even how many levels deep of linking to display are at the discretion of the user agent. For this reason, linking to the presentational table of contents so that users are not limited to the machine-processed one is advised.
The table of contents is expressed via an [html] element (typically a nav element). This element MUST be identified by the role attribute [html] value "doc-toc" [dpub-aria-1.0], and MUST be the first element in the document in document tree order [dom] with that role value. The element MAY be hidden from users.
The manifest SHOULD identify the resource that contains the table of contents.
Although the content model of the nav element is not restricted, user agents will only be able to extract a usable table of contents when the following markup guidelines are followed:
Although a title for the table of contents is optional, to avoid having a user agent generate a placeholder title when one is needed, it is advised to add one. Titles are specified using any of the [html] h1 through h6 elements. Note that only the first such element is recognized as the title. If a heading element is not found before the list of links, user agents will assume that one has not been specified.
The first [html] ol or ul list element encountered in the nav element is assumed to contain the list that defines the links into the content. This list will be found even if it is nested inside of div elements, for example, as the algorithm ignores elements that are not relevant to its processing. The list cannot occur inside of any skipped elements, however, since their internal contents are not evaluated.
If the nav element does not contain one of these elements, then user agents will not register the digital publication as containing a usable table of contents (e.g., a machine-rendered option will not be available).
If the table of contents is considered as a tree of links, then each list item (li element) inside of the list of links represents one branch. Each of these branches has to have a name and optional destination in order to be presented to users, and this information is obtained from the first a element found within the list item, wherever it is nested (again, excluding any a elements inside of skipped elements).
The link destination for the branch is obtained from the a element's href attribute, when specified. This attribute can be omitted if a link is not available (e.g., in a preview) or not relevant (e.g., a grouping header). When providing a link into the content, it is also possible to specify the relation of the linked document (in a rel attribute) and the media type of the linked resource (in a type attribute).
After finding the a element that labels the branch, user agents will continue to inspect the markup for another list element (i.e., sub-branches). If a list is found, it is similarly processed to extract its links, and so on, until there are no more nested branches left to process.
A small set of elements is skipped when parsing the table of contents to avoid misinterpretation. These are the [html] sectioning content elements and sectioning root elements. They are skipped because they can define their own outlines (i.e., they can represent embedded content that is self-contained and not necessarily related to the structure of content links).
Any element that has its hidden attribute set is also skipped, since hidden elements are not intended to be directly accessed by users.
Although these elements can be included in the nav element, care has to be taken not to embed important content within them (e.g., do not wrap a section element around the list item that contains all the links into the content).
All elements that are not relevant to extracting the table of contents, and are not skipped, are ignored. Unlike skipped elements, ignoring means that user agents will continue to search inside them for relevant content, allowing greater flexibility in terms of the tagging that can be used.
This section is non-normative.
This section depends on the Infra Standard [infra].
This section defines an algorithm for extracting a table of contents from a nav element. It is defined in terms of a walk over the nodes of a DOM tree, in tree order [dom], with each node being visited when it is entered and when it is exited during the walk. Each time a node is visited, it can be seen as triggering an enter or exit event. In some steps, user agents are provided a choice in how to process the content to provide flexibility for different presentation models.
This algorithm is not defined in purely event driven terms, as inspecting all descendant nodes is not always necessary to obtain the needed information from the DOM. In some cases, an element, and all its descendants, is skipped immediately after it is processed on enter. An event approach could be applied but would require modifying the algorithm to process/ignore the skipped nodes.
User agents can process and internalize the resulting structure using any language that can represent the final form of the data.
For the purposes of this algorithm, a list element is defined as either an [html] ol or ul element.
The following algorithm MUST be applied to a walk of a DOM subtree rooted at the first element in document order with the role attribute value doc-toc, regardless of whether the element has been declaratively hidden [html] or styled by CSS not to be visible:
The rules for locating the resource containing the table of contents element are defined in § 4.8.1.3 Table of Contents.
If a table of contents element is not found, the publication does not have a table of contents that can be used for machine rendering purposes.
Let toc be the map «[ "name" → "", "entries" → « » ]» representing the table of contents.
This step initializes the map that will store the title and the branches of the table of contents. In this map:
Initialize the stack branches to hold branches of the table of contents as they are created.
The stack is used to hold branches that are not yet complete. As a new sub-branch is encountered, the parent gets pushed onto the stack so it can be retrieved later.
Let current_toc_node be a variable set to null.
current_toc_node is used to hold the map that represents the branch of the table of contents that is currently being processed.
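As a rough illustration, the three pieces of state initialized above could be represented in Python as shown below. Plain dicts and lists stand in for Infra maps, lists and stacks; this representation, and the sample "Part I" branch, are assumptions made for the sketch.

```python
# Root map: the title and the top-level branches of the table of contents.
toc = {"name": "", "entries": []}

# Stack of parent branches that are not yet complete.
branches = []

# The branch currently being processed (None stands in for null).
current_toc_node = None

# Stack discipline: when a nested list starts, the parent branch is pushed
# so that it can be retrieved once the nested list has been processed.
parent = {"name": "Part I", "url": None, "type": None, "rel": None, "entries": []}
branches.append(parent)            # push the parent
current_toc_node = None            # child list items will create new branches

# ... child branches would be processed here ...

current_toc_node = branches.pop()  # pop: the parent becomes current again
print(current_toc_node["name"])
```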
Walk over the DOM in tree order [dom], starting with the element the table of contents is being built from, and trigger the first relevant step below for each element as the walk enters and exits it.
When entering a heading content element:
Run these steps:
If branches is empty, and toc["name"] is an empty string, set toc["name"] to one of the following:
If the resulting value of toc["name"] is an empty string (e.g., after removing any presentational elements and trimming all leading and trailing whitespace), set toc["name"] either to a placeholder value or to null.
This step identifies the heading for the table of contents. A heading is only processed if the value of toc["name"] is an empty string (i.e., no headings have yet been encountered).
Whether a user agent sets name to the descendant content of the heading element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
«[
"name" → "Contents",
"entries" → « »
]»
If name is not an empty string (including when it is null), then either a previous heading has already been encountered, or content has been encountered that indicates the nav element does not have a heading (e.g., a list has already been processed, since the heading would not follow the list of links).
«[
"name" → null,
"entries" → « »
]»
If a heading is not specified, the user agent can provide its own for later use.
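The heading step could be sketched like this. The sketch assumes the user agent chooses to generate a text string from the heading (rather than keeping its descendant content) and uses None for null. The function name enter_heading and its placeholder parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def enter_heading(el, toc, branches, placeholder=None):
    """Process a heading element on enter.

    Only the first heading found before any list is used as the title.
    """
    if branches or toc["name"] != "":
        return  # a heading or a list has already been encountered
    # Assumption: the name is the trimmed text content of the heading.
    text = "".join(el.itertext()).strip()
    # An empty result becomes the placeholder value (None by default).
    toc["name"] = text if text else placeholder


toc = {"name": "", "entries": []}
enter_heading(ET.fromstring("<h2> <span>Contents</span> </h2>"), toc, [])
enter_heading(ET.fromstring("<h3>Ignored</h3>"), toc, [])
print(toc)
```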
When entering a list element:
Run these steps:
If toc["name"] is an empty string, set toc["name"] to null.
If current_toc_node is not null:
Otherwise, if branches is empty:
This algorithm does not process multiple lists in a single branch or at the root of the nav element, so if a list has already been encountered (the entries property contains one or more branches or is set to null), this list is skipped.
If a list is encountered and the table of contents (toc) still does not have a name (i.e., no heading element has been encountered), the table of contents is assumed not to have a heading (i.e., the heading for the table of contents cannot appear after the first list of entries). The value of the name property is changed from an empty string to null because no heading encountered later can apply either.
When exiting a list element:
If branches is not empty, pop the top map from branches and set current_toc_node to it.
Otherwise, if toc.entries contains an empty list, set it to null.
This step resets current_toc_node back to the parent object after all of its child branches have been processed.
If there are no branches in the stack, toc.entries is set to null if it does not contain any items (to avoid processing any further lists at the root level).
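The list steps might be sketched as below. The detailed sub-steps for entering a list are inferred here from the explanatory text (skip a list when the relevant entries property already contains branches or is null; otherwise push the current branch onto the stack), so treat that part as an assumption. The state dict and the function names are hypothetical.

```python
def enter_list(state):
    """Return True to process the list, False to skip it."""
    toc = state["toc"]
    if toc["name"] == "":
        toc["name"] = None  # no heading can follow the first list
    node = state["current"]
    if node is not None:
        # Assumption: a branch that already has entries (or null) has
        # already seen a list, so this one is skipped.
        if node["entries"] is None or node["entries"]:
            return False
        state["branches"].append(node)  # parent waits on the stack
        state["current"] = None
    elif not state["branches"]:
        # Root level: only one list is processed.
        if toc["entries"] is None or toc["entries"]:
            return False
    return True


def exit_list(state):
    if state["branches"]:
        state["current"] = state["branches"].pop()
    elif state["toc"]["entries"] == []:
        state["toc"]["entries"] = None  # blocks further root-level lists


state = {"toc": {"name": "", "entries": []}, "branches": [], "current": None}
print(enter_list(state), state["toc"]["name"])
exit_list(state)
print(state["toc"]["entries"], enter_list(state))
```

In this sketch, exit_list is only expected to run for lists that enter_list accepted, which matches a walk in which skipped elements do not trigger an exit step.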
When entering a list item element, set current_toc_node to the following map:
«[
"name" → null,
"url" → null,
"type" → null,
"rel" → null,
"entries" → « »
]»
Each list item represents a possible new branch in the table of contents, so whenever one is encountered a new blank object is created in current_toc_node.
This object gets populated with information as a descendant a element and list are encountered.
When exiting a list item element:
Run these steps:
If current_toc_node["entries"] contains an empty list, set it to null.
If current_toc_node["name"] is null or an empty string:
If branches is not empty, append current_toc_node to the entries property of the map at the top of branches. Otherwise, append current_toc_node to toc["entries"].
Set current_toc_node to null.
Exiting a list item indicates that processing of the current branch is complete. Before adding this branch to its parent's entries array, the branch needs to be tested to see if it has a name and/or any sub-branches. If it does not have a name but has sub-branches, the branch is kept. The user agent can either supply a placeholder value of its own creation or set the value to null. If it does not have a name or any branches, it is invalid and is discarded.
To determine where to merge the branch, the stack is checked. If there are no items in the stack, it is added into the entries property of the root toc object (i.e., it is a top-level branch). Otherwise, it gets added into the entries property of the object immediately preceding it in the stack.
As a final step, current_toc_node is reset back to null.
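A possible sketch of the steps for exiting a list item follows. It relies on the explanation above for the handling of unnamed branches (kept with a placeholder or null name when they have sub-branches, discarded otherwise). The state dict, the function name exit_list_item and the placeholder parameter are hypothetical.

```python
def exit_list_item(state, placeholder=None):
    node = state["current"]
    if node is None:
        return  # defensive: nothing is being processed
    if node["entries"] == []:
        node["entries"] = None  # no sub-branches were found
    keep = True
    if node["name"] is None or node["name"] == "":
        if node["entries"] is None:
            keep = False        # no name and no sub-branches: discard
        else:
            node["name"] = placeholder  # unnamed but has sub-branches
    if keep:
        if state["branches"]:
            # Merge into the parent at the top of the stack.
            state["branches"][-1]["entries"].append(node)
        else:
            # Top-level branch.
            state["toc"]["entries"].append(node)
    state["current"] = None


def blank(name):
    return {"name": name, "url": None, "type": None, "rel": None, "entries": []}


state = {"toc": {"name": None, "entries": []}, "branches": [], "current": blank("Chapter 1")}
exit_list_item(state)
state["current"] = blank(None)  # no name, no sub-branches: dropped
exit_list_item(state)
print(state["toc"]["entries"])
```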
When entering an anchor element and current_toc_node is not null:
Run these steps:
If current_toc_node["name"] is not null, do nothing.
Otherwise:
Set current_toc_node["name"] to one of the following:
If the element has an href attribute and the URL in the attribute resolves to a resource in uniqueResources, set current_toc_node["url"] to the value.
If the element has a type attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, set current_toc_node["type"] to the trimmed value.
If the element has a rel attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, split the trimmed value on whitespace and set current_toc_node["rel"] to the resulting list of tokens.
Skip further processing of the element and continue to the next.
This step processes anchor tags to obtain values for the name and url properties of a branch.
If the name of the current branch is already defined, then processing of this element is terminated (i.e., to avoid processing multiple links for a single branch).
Whether a user agent sets the name of the entry to the descendant content of the a element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
In addition to the a element having an href attribute specified, the URL in that attribute must resolve to a resource that belongs to the digital publication to meet the requirements of this specification. If it does not, the branch is retained but the entry will not be linkable.
Additional information about the target of the link — the type of resource and its relation — is also retained.
«[
"name" → "In the Beginning",
"url" → "http://example.com/page1.svg",
"type" → "image/svg+xml",
"rel" → null,
"entries" → « »
]»
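The anchor step could look roughly like the sketch below. Several details are assumptions rather than requirements stated here: the name is generated as a trimmed text string, the href is resolved against a base_url argument, fragments are ignored when checking membership in the unique resources, and the stored url is the resolved absolute URL. The function name enter_anchor and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin


def enter_anchor(el, node, unique_resources, base_url):
    """Fill in a branch from an a element; later anchors are ignored."""
    if node is None or node["name"] is not None:
        return
    # Assumption: use the text string option for the name.
    node["name"] = "".join(el.itertext()).strip()
    href = el.get("href")
    if href is not None:
        resolved = urljoin(base_url, href)
        # Assumption: the fragment is dropped for the membership test.
        if urldefrag(resolved).url in unique_resources:
            node["url"] = resolved
    media_type = (el.get("type") or "").strip()
    if media_type:
        node["type"] = media_type
    rel = (el.get("rel") or "").strip()
    if rel:
        node["rel"] = rel.split()  # split on whitespace


node = {"name": None, "url": None, "type": None, "rel": None, "entries": []}
anchor = ET.fromstring('<a href="page1.svg" type=" image/svg+xml ">In the Beginning</a>')
enter_anchor(anchor, node, {"http://example.com/page1.svg"}, "http://example.com/toc.html")
print(node)
```

A link whose target is not among the unique resources leaves url as None, so the branch is kept but is not linkable.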
When entering a sectioning content element, a sectioning root element, or an element with a hidden attribute:
Skip further processing of the element and continue to the next.
As sectioning and sectioning root elements can define their own outlines, descending into them poses problems for generating the table of contents (i.e., they may contain content that is not directly related). As a result, they are skipped over when encountered to prevent their child content from being processed.
Otherwise: do nothing.
For all other elements, this step allows their descendant elements to continue to be processed.
After completing the DOM walk, if toc["entries"] contains a non-empty list, return toc. Otherwise, return null.
If the entries array in the root toc object does not contain any branches (either because no list was found in the nav element or the list did not contain any conforming list items), then the algorithm did not produce a usable table of contents.
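The final check can be expressed compactly, as in the sketch below. It assumes the dict-based representation with None for null, and the function name finish is hypothetical. Note that entries can be either an empty list (no list was found) or None (an empty root list was exited), and both cases yield no table of contents.

```python
import json


def finish(toc):
    """Return toc if it has at least one branch, otherwise None."""
    return toc if toc["entries"] else None


usable = {
    "name": "Contents",
    "entries": [
        {"name": "Chapter 1", "url": None, "type": None, "rel": None, "entries": None}
    ],
}

# Any format that can represent the final structure works; JSON is one option.
print(json.dumps(finish(usable)))
print(finish({"name": None, "entries": []}))
print(finish({"name": "Contents", "entries": None}))
```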
Substantive changes since the First Public Working Draft:
For a complete list of issues addressed, refer to the GitHub tracker.
This section is non-normative.
The following is a manifest with a basic set of metadata for an example book profile.
A JSON encoding of the internal representation of this manifest is also available.
{
"@context": [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{"language" : "en"}
],
"conformsTo": "https://example.com/publication",
"type": "Book",
"url": "https://publisher.example.org/mobydick",
"author": "Herman Melville",
"dateModified": "2018-02-10T17:00:00Z",
"readingOrder": [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
"html/c002.html",
"html/c003.html",
"html/c004.html",
"html/c005.html",
"html/c006.html"
],
"resources": [
"css/mobydick.css",
{
"type": "LinkedResource",
"rel": "cover",
"url": "images/cover.jpg",
"encodingFormat": "image/jpeg"
},{
"type": "LinkedResource",
"url": "html/toc.html",
"rel": "contents"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneral.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralBol.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralBolIta.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralItalic.otf",
"encodingFormat": "application/vnd.ms-opentype"
}
]
}
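As an informal illustration of how a processor might use such a manifest, the sketch below looks up resources by their rel value, for example to find the table of contents resource. It is an assumption of this sketch that only the readingOrder and resources properties are searched and that entries given as bare strings carry no rel. The function name find_by_rel is hypothetical, and the manifest is a trimmed copy of the example above.

```python
import json

manifest = json.loads("""
{
  "type": "Book",
  "readingOrder": ["html/title.html", "html/c001.html"],
  "resources": [
    "css/mobydick.css",
    {"type": "LinkedResource", "rel": "cover", "url": "images/cover.jpg"},
    {"type": "LinkedResource", "url": "html/toc.html", "rel": "contents"}
  ]
}
""")


def find_by_rel(manifest, rel):
    """Return the URLs of all linked resources carrying the given rel."""
    found = []
    for key in ("readingOrder", "resources"):
        for item in manifest.get(key, []):
            if isinstance(item, str):
                continue  # a bare URL string has no rel
            rels = item.get("rel", [])
            if isinstance(rels, str):
                rels = [rels]  # a single value may be given as a string
            if rel in rels:
                found.append(item["url"])
    return found


print(find_by_rel(manifest, "contents"))
```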
The following is a manifest for an example article profile. The article consists only of the document the manifest is embedded in. The title and reading order are omitted from the manifest, as these properties are automatically generated during processing from the title and URL of the containing document, respectively.
A JSON encoding of the internal representation of the manifest is also available, as well as a more elaborate version for the same document.
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Model for Tabular Data and Metadata on the Web</title>
<link href="#wpm" rel="publication" />
...
<script id="wpm" type="application/ld+json">
{
"@context" : [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{"language" : "en-US"}
],
"conformsTo" : "https://example.com/article",
"type" : "TechArticle",
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
"creator" : ["Jeni Tennison", "Gregg Kellogg", "Ivan Herman"],
"publisher" : {
"type" : "Organization",
"name" : "World Wide Web Consortium",
"id" : "https://www.w3.org/"
},
"datePublished" : "2015-12-17",
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"type" : "LinkedResource",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},
{
"type" : "LinkedResource",
"url" : "test.xlsx",
"encodingFormat" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
}
]
}
</script>
</head>
<body>
....
<section id="toc" role="doc-toc">
<h2 resource="#h-toc" id="h-toc" class="introductory">Table of Contents</h2>
<ul class="toc">
<li class="tocline"><a class="tocxref" href="#intro">
<span class="secno">1. </span>Introduction</a>
</li>
...
</ul>
</section>
...
</body>
</html>
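One way to locate an embedded manifest like the one above is sketched below with Python's standard html.parser. The sketch assumes that the link element with rel="publication" appears before the script element and that its href is a same-document fragment. The class name ManifestFinder is hypothetical, and the sample document is a shortened stand-in for the example.

```python
import json
from html.parser import HTMLParser


class ManifestFinder(HTMLParser):
    """Collect the text of the script element the publication link points to."""

    def __init__(self):
        super().__init__()
        self.target_id = None
        self.in_manifest = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and "publication" in (attributes.get("rel") or "").split():
            href = attributes.get("href") or ""
            if href.startswith("#"):
                self.target_id = href[1:]
        elif (tag == "script" and self.target_id is not None
              and attributes.get("id") == self.target_id):
            self.in_manifest = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_manifest = False

    def handle_data(self, data):
        if self.in_manifest:
            self.chunks.append(data)


document = """<!DOCTYPE html>
<html lang="en-US"><head>
<title>Model for Tabular Data and Metadata on the Web</title>
<link href="#wpm" rel="publication" />
<script id="wpm" type="application/ld+json">
{"type": "TechArticle", "resources": ["datatypes.html"]}
</script>
</head><body></body></html>"""

finder = ManifestFinder()
finder.feed(document)
manifest = json.loads("".join(finder.chunks))
print(manifest["type"])
```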
The following example shows a manifest that conforms to the Audiobooks profile [audiobooks].
A JSON encoding of the internal representation of this manifest is also available.
{
"@context": [
"https://schema.org",
"https://www.w3.org/ns/pub-context",
{"language": "en"}
],
"conformsTo": "https://www.w3.org/TR/audiobooks/",
"type": "Audiobook",
"id": "https://librivox.org/flatland-a-romance-of-many-dimensions-by-edwin-abbott-abbott/",
"url": "https://w3c.github.io/pub-manifest/experiments/audiobook/",
"name": "Flatland: A Romance of Many Dimensions",
"author": "Edwin Abbott Abbott",
"readBy": "Ruth Golding",
"publisher": "Librivox",
"inLanguage": "en",
"dateModified": "2019-11-14",
"datePublished": "2008-10-12",
"duration": "PT13774S",
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"abridged": false,
"accessMode": "auditory",
"accessModeSufficient": [{
"type": "ItemList",
"itemListElement": ["auditory"],
"description": "Audio"
}],
"accessibilityFeature": ["readingOrder", "unlocked"],
"accessibilityHazard": "noSoundHazard",
"accessibilitySummary": "This is just a test summary",
"readingProgression": "ltr",
"resources": [
{
"rel": "cover",
"url": "http://ia800704.us.archive.org/9/items/LibrivoxCdCoverArt12/Flatland_1109.jpg",
"encodingFormat": "image/jpeg",
"name": "Cover page with title and author"
},{
"rel": "contents",
"url": "toc.html",
"encodingFormat": "text/html"
},{
"rel": "accessibility-report",
"url": "a11y.html",
"encodingFormat": "text/html"
},{
"rel": "privacy-policy",
"url": "privacy.html",
"encodingFormat": "text/html"
}
],
"readingOrder": [
{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1371S",
"name": "Part 1, Sections 1 - 3"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_2_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1669S",
"name": "Part 1, Sections 4 - 5"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_3_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1506S",
"name": "Part 1, Sections 6 - 7"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_4_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1669S",
"name": "Part 1, Sections 8 - 10"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_5_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1506S",
"name": "Part 1, Sections 11 - 12"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_6_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1798S",
"name": "Part 2, Sections 13 - 14"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_7_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1225S",
"name": "Part 2, Sections 15 - 17"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_8_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1371S",
"name": "Part 2, Sections 18 - 20"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_9_abbott.mp3",
"encodingFormat": "audio/mpeg",
"duration": "PT1659S",
"name": "Part 2, Sections 21 - 22"
}
]
}
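As a quick consistency check on this example, the durations of the nine reading order items can be summed and compared with the publication-level duration. The sketch below is an assumption-laden helper, not a general ISO 8601 parser: it only understands the seconds-only form PT<n>S used in the manifest above, and the function name seconds is hypothetical.

```python
import re


def seconds(duration):
    """Parse a seconds-only ISO 8601 duration such as 'PT1371S'."""
    match = re.fullmatch(r"PT(\d+)S", duration)
    if match is None:
        raise ValueError("unsupported duration: " + duration)
    return int(match.group(1))


# Durations of the reading order items, copied from the manifest above.
track_durations = [
    "PT1371S", "PT1669S", "PT1506S", "PT1669S", "PT1506S",
    "PT1798S", "PT1225S", "PT1371S", "PT1659S",
]

total = sum(seconds(d) for d in track_durations)
print(total == seconds("PT13774S"))
```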
This section is non-normative.
The following table identifies where manifest properties are defined and extended.
This section is non-normative.
The following table identifies where the use of resource relations is defined.
Name | Publication Manifest |
---|---|
accessibility-report | § 4.8.2.1 Accessibility Report |
contents | § 4.8.1.3 Table of Contents |
cover | § 4.8.1.1 Cover |
pagelist | § 4.8.1.2 Page List |
privacy-policy | § 4.8.2.3 Privacy Policy |
preview | § 4.8.2.2 Preview |
This section is non-normative.
The editors would like to thank the members of the Publishing Working Group for their contributions to this specification:
The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.