APEX@IGP

Infogrid Pacific-The Science of Information

The Manifest

ePub Packaging-3

ePub3 Packaging-3 discusses the manifest in detail. The manifest is a critical component: it tells Reading Systems what is in the package and whether they can use that content. This topic discusses some of the more complex manifest structures, which are supported by demonstration ePub3 resources. Updated: 2012-07-28

ePub3 packaging is not very different from ePub2. ePub3 provides more flexibility, but the real differences are in the wider range of content types that can be packaged. The manifest has the job of letting reader applications know what to expect before a file is opened.

The manifest is substantially the same as in ePub2, with the addition of the properties attribute and, if you ever get around to using them, SMIL media-overlay references. We are not discussing SMIL in this article.

The IGP approach to manifest creation is to assemble the files in a predetermined order so files can be located quickly if required.

There is no technical requirement for this as the manifest is nothing but a list of all files included in the package. We do this to make it easy if and when a production engineer needs to open and explore an ePub for any debugging or development purpose. It is just as easy to do it as not!

Our assembly sequence is highlighted with comments: 

<manifest>
<!-- navigation -->
    <item id="toc" properties="nav" href="toc.xhtml" media-type="application/xhtml+xml"/>
<!-- covers -->
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="cover-image" properties="cover-image" href="SVG-Cover3Portrait.svg" media-type="image/svg+xml"/>
<!-- primary content -->
    <item id="s001" href="s001-Part-001.xhtml" media-type="application/xhtml+xml"/>
    <item id="s002" href="s002-Chapter-001.xhtml" media-type="application/xhtml+xml"/>
    <item id="s003" href="s003-Chapter-002.xhtml" media-type="application/xhtml+xml"/>
<!-- css -->
    <item id="css-001" href="css/book.css" media-type="text/css"/>
<!-- images -->
    <item id="images-001" href="image1.jpg" media-type="image/jpeg"/>
<!-- audio -->
    <item id="audio-001" href="audio/sound1.mp3" media-type="audio/mpeg"/>
<!-- video -->
    <item id="video-001" href="video/video1.webm" media-type="video/webm"/>
<!-- fonts -->
    <item id="font-001" href="fonts/font1.woff" media-type="application/font-woff"/>
</manifest>

As content becomes more complex, keeping the files organized inside the OPF manifest can be a big time-saver in the packager development phase, as it makes testing easier. In addition, as enhancements and new requirements emerge, they are more easily absorbed and integrated if there is a reasonable attempt at organization. Of course, if you are hand-packaging your files, none of this matters particularly.
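The assembly sequence described above could be automated along the following lines. This is only a minimal sketch, not the IGP packager: the dictionary layout for an item, the GROUP_ORDER list, and the function names classify and order_manifest are all assumptions made for illustration.

```python
# Hypothetical group order, mirroring the comments in the manifest listing.
GROUP_ORDER = ["nav", "cover", "content", "css", "image",
               "audio", "video", "font", "other"]


def classify(item):
    """Return the group name for one manifest item.

    An item is assumed to be a dict with 'id', 'href', 'media_type'
    and an optional space-separated 'properties' string.
    """
    props = item.get("properties", "").split()
    media_type = item["media_type"]
    if "nav" in props:
        return "nav"
    if "cover-image" in props or item["id"] == "cover":
        return "cover"
    if media_type == "application/xhtml+xml":
        return "content"
    if media_type == "text/css":
        return "css"
    if media_type.startswith("image/"):
        return "image"
    if media_type.startswith("audio/"):
        return "audio"
    if media_type.startswith("video/"):
        return "video"
    if "font" in media_type or media_type == "application/vnd.ms-opentype":
        return "font"
    return "other"


def order_manifest(items):
    """Sort items by group first, then by id, so related files sit together."""
    return sorted(items,
                  key=lambda it: (GROUP_ORDER.index(classify(it)), it["id"]))
```

Sorting by id within a group keeps s001, s002, s003 in reading order as long as the ids are zero-padded, which is the convention used in the listings here.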

The Manifest Properties attribute

A big change for the manifest is the properties attribute. This attribute lets the reading device understand if there is anything in the content that may need special treatment or alternative display methods. The properties are:

  1. cover-image
  2. nav
  3. mathml
  4. svg
  5. scripted
  6. remote-resources
  7. switch

The properties cover-image and nav are the two core properties needed to get a basic ePub 3 up and running. For example, if a reader cannot find properties="cover-image" it can reasonably assume there is no cover image in the package and generate an avatar. It is possible for a packager to generate an avatar image at packing time, but at this stage we decided to leave that alone.

The packager has a lot of work to do to create the nav file, so there is no evaluation to be done for it: just put it into the package. Navigation is discussed in a separate topic.

We are not currently processing the switch property as there are a number of issues around this and how it will be supported by devices. So this is wait and see at this time.

The packager must evaluate all other HTML files for inclusion of elements and references that need to be marked: 

  1. Is there MathML inline in the file?  If yes, set the mathml property
  2. Is there SVG inline in the file? If yes, set the svg property
  3. Is there JavaScript, or are there HTML5 forms, inline in the file? If yes, set the scripted property
  4. Are there any media links to permitted external resources in the file? If yes, set the remote-resources property
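The checks in the list above might be sketched as follows. This is an illustration under stated assumptions rather than the actual packager code: the function name detect_properties is hypothetical, the input is assumed to be well-formed XHTML without named HTML entities (so the standard-library XML parser can read it), and "media links to external resources" is interpreted as audio, video, or source elements whose src starts with http:// or https://. The remote-resources name is taken from the list of properties earlier in this topic.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
MATHML = "{http://www.w3.org/1998/Math/MathML}"
SVG = "{http://www.w3.org/2000/svg}"

# Fixed output order so the generated attribute value is predictable.
PROPERTY_ORDER = ["mathml", "svg", "scripted", "remote-resources"]


def detect_properties(xhtml_text):
    """Return the space-separated properties value for one XHTML file."""
    root = ET.fromstring(xhtml_text)
    found = set()
    for el in root.iter():
        if el.tag == MATHML + "math":
            found.add("mathml")
        elif el.tag == SVG + "svg":
            found.add("svg")
        elif el.tag in (XHTML + "script", XHTML + "form"):
            found.add("scripted")
        elif el.tag in (XHTML + "audio", XHTML + "video", XHTML + "source"):
            # Assumption: a remote resource is one fetched over HTTP(S).
            if el.get("src", "").startswith(("http://", "https://")):
                found.add("remote-resources")
    return " ".join(p for p in PROPERTY_ORDER if p in found)
```

An empty return value means the item needs no properties attribute at all.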

If more than one property applies to a file, they are all included in the properties attribute as a space-separated list. Here is an illustrative set for a test book containing the various items in combinations.

<manifest>
<!-- navigation -->
  <item id="toc" 
     properties="nav" href="toc.xhtml" 
     media-type="application/xhtml+xml" />
<!-- covers -->
  <item id="cover" 
     href="cover.html" media-type="application/xhtml+xml" />
  <item id="cover-image" 
     properties="cover-image" href="cover-image.svg" 
     media-type="image/svg+xml"/>
<!-- reading content -->
  <item id="s001" href="s001-Chapter-01.html" 
     media-type="application/xhtml+xml" />
  <item id="s002" 
     properties="mathml" href="s002-Chapter-02.html" 
     media-type="application/xhtml+xml" />
  <item id="s003" 
     properties="svg scripted" href="s003-Chapter-03.html" 
     media-type="application/xhtml+xml" />
  <item id="s004" 
     properties="mathml svg scripted" href="s004-Chapter-04.html" 
     media-type="application/xhtml+xml" />
  <item id="s005" 
     properties="remote-resources" href="s005-Chapter-05.html" 
     media-type="application/xhtml+xml" />
</manifest>

The specification states:

  1. There must be exactly one file with properties="nav"
  2. There can be at most one file with properties="cover-image"
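A packager could verify these two constraints before writing the OPF. The sketch below is an assumption-laden illustration, not part of any real tool: the function names count_property and check_manifest are hypothetical, and it reads the rules as "exactly one nav item" and "at most one cover-image item". Elements are matched by local name so that it works on a full OPF file as well as on a bare manifest fragment like the listings in this topic.

```python
import xml.etree.ElementTree as ET


def count_property(manifest_xml, prop):
    """Count manifest items whose properties attribute contains prop."""
    root = ET.fromstring(manifest_xml)
    count = 0
    for el in root.iter():
        # Strip any namespace so both namespaced and bare XML are handled.
        if el.tag.rsplit("}", 1)[-1] != "item":
            continue
        if prop in el.get("properties", "").split():
            count += 1
    return count


def check_manifest(manifest_xml):
    """Return a list of human-readable problems; empty means no problems."""
    problems = []
    if count_property(manifest_xml, "nav") != 1:
        problems.append("expected exactly one item with the nav property")
    if count_property(manifest_xml, "cover-image") > 1:
        problems.append("expected at most one item with the cover-image property")
    return problems
```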

 In our ePub2 packager we used rather complex IDs for sections based on the ISBN. It isn't required so we have gone with "art-IDs" in the ePub3 packaging.

If the files are available, we always package in the sequence: navigation, covers, content pages, CSS, images, scripts, and fonts, with fonts always at the bottom. That is just our thing. There is no rule for it.

<manifest>
    <item id="toc" properties="nav" href="toc.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover-image" properties="cover-image" href="cover.svg" media-type="image/svg+xml"/>
    <item id="s001" href="s001-HalfTitle-01.xhtml" media-type="application/xhtml+xml"/>
    <item id="s002" href="s002-BookTitlePage-01.xhtml" media-type="application/xhtml+xml"/>
    <item id="s003" href="s003-Preface-01.xhtml" media-type="application/xhtml+xml"/>
    <item id="s004" href="s004-Part-001.xhtml" media-type="application/xhtml+xml"/>
    <item id="s005" href="s005-Chapter-001.xhtml" media-type="application/xhtml+xml"/>
    <item id="s006" href="s006-Chapter-002.xhtml" media-type="application/xhtml+xml"/>
    <item id="s007" href="s007-Chapter-003.xhtml" media-type="application/xhtml+xml"/>
    <item id="s008" href="s008-Part-002.xhtml" media-type="application/xhtml+xml"/>
    <item id="s009" href="s009-Chapter-004.xhtml" media-type="application/xhtml+xml"/>
    <item id="s010" href="s010-Chapter-005.xhtml" media-type="application/xhtml+xml"/>
    <item id="s011" href="s011-Chapter-006.xhtml" media-type="application/xhtml+xml"/>
    <item id="s012" href="s012-Part-003.xhtml" media-type="application/xhtml+xml"/>
    <item id="s013" href="s013-Chapter-007.xhtml" media-type="application/xhtml+xml"/>
    <item id="s014" href="s014-Chapter-008.xhtml" media-type="application/xhtml+xml"/>
    <item id="s015" href="s015-Chapter-009.xhtml" media-type="application/xhtml+xml"/>
    <item id="css-001" href="css/fx-ptb-sections.css" media-type="text/css"/>
</manifest>

File Naming

This is a big deal when it comes time to unravel an ePub. ePub2 books have a mishmash of HTML and XHTML file extensions. We recommend sticking with the XHTML file extension for ePub3. In AZARDI it works well, as the rendering engine recognizes this as an XHTML file type without any further effort.

To ensure nice file presentation in the package, the HTML file names use the ID sequence number, the section type (retrieved from the IGP:FoundationXHTML the package is constructed from) and the section sequence number. It is like having a TOC structure in the files.

In the interest of presentation flexibility, AZARDI turns each XHTML5 page into standard HTML5 after extracting the content and loading it into memory. This makes the presentation of non-ePub3 standard HTML5 content easy when required. This is especially important for extensive forms and interactive content. However, if a file has the HTML extension, AZARDI must first turn the file into XHTML to ensure all UTF-8 encoding conforms to the XHTML 1.0 DTD.
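The naming pattern can be illustrated with a short sketch. The function names section_filename and name_sections, and the width parameter, are assumptions for illustration only; the listings in this topic use both two-digit and three-digit section sequence numbers, so the padding is left configurable.

```python
from collections import Counter


def section_filename(item_number, section_type, section_number, width=3):
    """Build a name such as s005-Chapter-001.xhtml.

    item_number    -- position of the file in the book (the ID sequence number)
    section_type   -- for example 'Part', 'Chapter' or 'Preface'
    section_number -- running count within that section type
    width          -- zero-padding of the section number (an assumption)
    """
    return f"s{item_number:03d}-{section_type}-{section_number:0{width}d}.xhtml"


def name_sections(section_types, width=3):
    """Name every section in reading order, counting each type separately."""
    counts = Counter()
    names = []
    for number, section_type in enumerate(section_types, start=1):
        counts[section_type] += 1
        names.append(
            section_filename(number, section_type, counts[section_type], width))
    return names
```

For example, name_sections(["Part", "Chapter", "Chapter"]) yields s001-Part-001.xhtml, s002-Chapter-001.xhtml and s003-Chapter-002.xhtml, matching the primary content names in the first manifest listing.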

The inclusion of cover-image in the manifest is a big thing for readers and removes a significant problem of ePub2. Now it remains to be seen how devices and readers will handle it.

Notice that this example includes the ncx reference which makes it an ePub3+2 transitional package (if the guide is also included in the final OPF).

Cover

ePub2 books mostly have a cover.html in the spine. This is not needed now and the reader should be responsible for cover presentation.

At present AZARDI will only show a cover if there is a reference to the cover-image. Internally it (like most readers) is not interested in the HTML page. If no image is found, AZARDI will display an avatar image showing the metadata title and creator.

We originally had a large "search for the cover" algorithm, but decided to drop it in favour of the ePub3 properties approach. We still look for the meta cover ID, but nothing else. That means some ePubs, especially older ePub2s, may not display covers in AZARDI at present.
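The two-step lookup described above might look something like this. It is a sketch of the general idea and not AZARDI's code: the function name find_cover_href is hypothetical, and "the meta cover ID" is assumed to mean the common ePub2 convention of a meta element with name="cover" whose content attribute holds the id of a manifest item.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix of the form {uri} from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_cover_href(opf_xml):
    """Return the href of the cover image, or None if the OPF declares none."""
    root = ET.fromstring(opf_xml)
    items = [el for el in root.iter() if local_name(el.tag) == "item"]

    # Step 1: the ePub3 way, an item carrying the cover-image property.
    for item in items:
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")

    # Step 2: the ePub2 convention, <meta name="cover" content="item-id"/>.
    for el in root.iter():
        if local_name(el.tag) == "meta" and el.get("name") == "cover":
            for item in items:
                if item.get("id") == el.get("content"):
                    return item.get("href")

    # Nothing declared: the caller would fall back to an avatar image.
    return None
```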

AZARDI

You can see ePub3 files in operation in AZARDI.

The AZARDI desktop reader has very high conformance support for all properties. Because all of these properties are supported natively in AZARDI it does not have to use the properties references for display.

AZARDI supports MathML, SVG, remote resources and scripted content.
