APEX@IGP

Infogrid Pacific-The Science of Information

The ePub 3 Package

ePub3 Packaging-1

This first topic is a quick overview of a standard ePub3 package.

Updated: 2012-07-28

An ePub package is simple.

  1. Create a linked collection of XHTML, CSS, images, and possibly audio, video and JavaScript files.
  2. Add the files that ePub itself defines: a mimetype text file and a container.xml file that tells a reading system how to use the content.
  3. Add a package information file called the OPF (Open Packaging Format) containing metadata, a manifest and a spine.
  4. Add a Table of Contents file for moving around the content.
  5. Zip everything together and rename the archive to *.epub.

There are rules of course. There are always rules!       

Packaging objectives

The Infogrid Pacific (IGP) core ePub3 packaging strategy is simple and lean - learning from the lessons of the last four years. Theoretically no-one needs to go inside an ePub package once it has been assembled, but every now-and-then some customization may have to be applied to an ePub, or more likely a strange behaviour has to be examined and understood. Content can be crazy stuff!

The IGP ePub3 package is designed to be well organized and easily used by any production engineer who may need to open the package for any reason. The packaging objectives were:

  1. Simple IDs (which make checking links, the spine, toc.xhtml, etc. very easy), but IDs that can easily handle the modern trend of hundreds of chapters/sections.
  2. Structurally named and sequenced content filenames so they can be quickly located, even with deeply nested sections such as text-books.
  3. Dublin Core (DC) only metadata. Don't get clever with metadata in the ePub until there is a market-driven requirement.
  4. Exploit the new ePub 3 navigation structures to the limit.
  5. Package conformance. It is always a valid ePub 2, 2+3 or 3.
  6. Make sure all files are in a tidy and predictable structure.
  7. Make it easy to add the ePub3 extras and new things as demand arises.

Package Root

The ePub3 package root contains three items:

mimetype (file)
META-INF (Directory)
OPS (Directory)

mimetype - an ASCII-encoded text file that must be the first file in the package. It says: application/epub+zip

META-INF - There has to be one directory with this name. It must have the mandatory container.xml file inside, which points to the OPF file. This is the starting point for any reader.

OPS - contains all text and other content files in the package. Importantly it contains the defining ePub OPF file, which for a single document package in our system is always named package.opf. No need to be creative here.

We also name the primary navigation page toc.xhtml. This makes these files fast and easy to locate when required. There can only be one toc page in the package so again, creative naming is not required.

META-INF

The META-INF directory must have one container.xml file. This file can remain the same for every book with a simple package strategy. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OPS/package.opf" 
            media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>

This little XML nugget has one important job: it tells the reading system where to find the OPF file, relative to the package root. It is effectively the start-up instructions for any ePub 3 reader.
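As a rough illustration of what a reading system does with this file, here is a minimal Python sketch that reads container.xml and returns the path of every rootfile. It assumes the container element carries the namespace declaration required by the OCF specification; the function name find_rootfiles and the CONTAINER_XML constant are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace the OCF specification defines for container.xml.
OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

# Sample input for the sketch (assumes the namespace is declared).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OPS/package.opf"
            media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>
"""


def find_rootfiles(container_xml):
    """Return the full-path of every <rootfile>, in document order."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    return [
        rootfile.attrib["full-path"]
        for rootfile in root.iter("{%s}rootfile" % OCF_NS)
    ]


if __name__ == "__main__":
    print(find_rootfiles(CONTAINER_XML))
```

The returned paths are relative to the package root, so the first entry is where a reader would go looking for the OPF file.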

container.xml is not restricted to a single rootfile entry. An ePub3 package could contain different renderings of the same book (but not different books), for example a text version and an image version. They must have clearly separate directories and be two different and complete packages.

We don't need to get worked up with the multiple renditions issue just yet. We have to know that there is a reader available that can use it first. 

There are other optional files that can go into the META-INF directory, such as DRM information, digital signatures, etc. We don't need to worry about those unless the ePub is having DRM applied.

OPS

In addition to the previously mentioned OPF file, all HTML content and components are in a folder called OPS (Open Publication Structure). Remember this is the IGP package; the directory can have any name you like. Sticking to the spec examples makes a tidy package that is clean and easy to disassemble and hand-assemble when required.

The package.opf, toc.xhtml and all content xhtml files are directly in the root of the OPS directory in the IGP package.

All other files and components are in sub-directories immediately inside the OPS. There are no nested directories permitted in our packaging (currently).
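The one-level rule is easy to check mechanically. The following Python sketch flags any archive entry that sits more than one directory below the content directory; the function name too_deeply_nested is hypothetical, and the entry names are assumed to be zip-style paths with forward slashes.

```python
from pathlib import PurePosixPath


def too_deeply_nested(entry_names, content_dir="OPS"):
    """Return entries that sit more than one directory below content_dir."""
    offenders = []
    for name in entry_names:
        parts = PurePosixPath(name).parts
        # "OPS/css/main.css" splits into ("OPS", "css", "main.css"): allowed.
        # Anything with more than three parts is nested too deeply.
        if parts and parts[0] == content_dir and len(parts) > 3:
            offenders.append(name)
    return offenders
```

An empty result means every file is either in the OPS root or in a sub-directory immediately inside it.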

The Packaging Police

Now that JavaScript is allowed, "standard-library" bloat is something to guard against.

For example, by default jQuery and jQuery UI come with a large number of skin structures. With automated packaging we have to specifically prevent overly clever (and careless) content assembly from bloating a package with unused components from standard external libraries.

You can end up with a manifest full of unused files, in a perfectly valid ePub; or a manifest full of large files of which the ePub only uses a small part.
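One way to police this is to compare the manifest against what the content actually references. The Python sketch below is only a heuristic, not the IGP packager's method: it uses a regular expression rather than a real HTML/CSS parser, it does not follow references made from JavaScript, and the function name unreferenced_items and its arguments are invented for the example.

```python
import posixpath
import re

# Matches src="...", href="..." and CSS url(...) references.
# A rough heuristic, not a real HTML/CSS parser.
REFERENCE = re.compile(
    r"""(?:src|href)\s*=\s*["']([^"'#]+)|url\(\s*["']?([^"')]+)"""
)


def unreferenced_items(manifest_hrefs, text_files):
    """Return manifest hrefs that no XHTML or CSS file refers to.

    manifest_hrefs: iterable of hrefs relative to the OPF file.
    text_files: dict mapping an href to the text content of that file.
    """
    referenced = set()
    for source, text in text_files.items():
        base = posixpath.dirname(source)
        for attr_ref, css_ref in REFERENCE.findall(text):
            ref = attr_ref or css_ref
            # Resolve the reference relative to the file that contains it.
            referenced.add(posixpath.normpath(posixpath.join(base, ref)))
    # Content documents are reached through the spine, so they are not flagged.
    return sorted(
        href
        for href in manifest_hrefs
        if href not in referenced and not href.endswith(".xhtml")
    )
```

Anything this returns is a candidate for removal, to be confirmed by hand before it is dropped from the package.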

IGP packaging puts the following directories inside the OPS directory, but only when there is content to put in them. Empty directories are not allowed: if there are no files for a directory, it is left out of the package.

  1. css - contains all CSS files
  2. js - contains all JavaScript files
  3. images - contains all PNG, JPG and SVG images
  4. audio - contains all audio files in the package
  5. video - contains all video files in the package
  6. proprietary - contains any PDF, SWF, Silverlight or other non-standard files
  7. fonts - contains all fonts in all formats
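The directory list above can be sketched as a simple lookup from file extension to target directory. This Python example is illustrative only: the extension table is an assumption (the article does not list extensions), sending unknown extensions to proprietary is a guess, and plan_layout is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Extension-to-directory map; the extensions chosen here are assumptions.
DIRECTORY_FOR_EXTENSION = {
    ".css": "css",
    ".js": "js",
    ".png": "images", ".jpg": "images", ".jpeg": "images", ".svg": "images",
    ".mp3": "audio", ".m4a": "audio",
    ".mp4": "video", ".webm": "video",
    ".pdf": "proprietary", ".swf": "proprietary",
    ".otf": "fonts", ".ttf": "fonts", ".woff": "fonts",
}


def plan_layout(filenames):
    """Group filenames by OPS sub-directory; the key "" means the OPS root."""
    layout = defaultdict(list)
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in (".xhtml", ".opf"):
            # package.opf, toc.xhtml and content pages stay in the OPS root.
            layout[""].append(name)
        else:
            # Assumption: anything unrecognised is treated as proprietary.
            target = DIRECTORY_FOR_EXTENSION.get(suffix, "proprietary")
            layout[target].append(name)
    return dict(layout)
```

Because a directory only appears in the result when at least one file maps to it, the "no empty directories" rule falls out of the data structure for free.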

Discussion on file placement

In our ePub2 packager we placed all files into the root of the package. There is no reason not to do this. A good reason to do this is that there are no file paths and the package construction is considerably simplified. The test books provided by the IDPF put all the files in a package directory /EPUB/. ePubs created by Calibre put them directly in the package root.

Some ePub2 packaging systems put the HTML files in a separate directory; Sigil is an example. Again, there is no reason why this cannot be done.

With the IGP ePub3 packaging we walked deftly down the center.

All files go into a single directory /OPS/. This has the advantage that all the content can be moved from the package as a single directory. We also don't like the structural linking complication of putting the XHTML files into a separate directory, mainly because it is unnecessary. So the core content, the text XHTML pages, sits at the same level as toc.xhtml and package.opf in the root of the /OPS/ folder. All XML/XHTML together.

All the other files - CSS, JavaScript, Fonts, Images, Media, etc. are placed inside /OPS/ in separate sub-directories more or less defined by media type. There is probably no real reason to separate CSS (which is intimately tied to the XHTML), except that our experience with EPUB3 UNLEASHED and a number of commercial fixed-layout products hints that CSS is a sleeping monster, especially with highly interactive publications. So CSS gets a directory.

There is no pro or con on internal directories and file location as long as files and paths are present and correct. If you do have to open and examine an ePub you will find our strategy very straightforward and fast to use. Who wants to navigate back and forth between directories!

Mime-type

This file must be in the root and must be the first file in the zip package, stored uncompressed.

It is named mimetype and contains the following text, always encoded as ASCII with no trailing end-of-line.

application/epub+zip
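Ordinary zip tools do not guarantee the entry order, so the mimetype rule is usually handled explicitly when assembling the archive. The following Python sketch uses the standard zipfile module; build_epub is a hypothetical helper, and it assumes source_dir already holds the META-INF and OPS directories and that epub_path lies outside source_dir.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with mimetype as the first, stored entry."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as epub:
        # ASCII text, no trailing newline, written first and uncompressed.
        epub.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for path in sorted(source.rglob("*")):
            arcname = path.relative_to(source).as_posix()
            # Skip directories and any mimetype file already on disk.
            if path.is_file() and arcname != "mimetype":
                epub.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Writing the string directly, rather than copying a file from disk, avoids the common mistake of a text editor adding a trailing newline to the mimetype file.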

Final Package form

mimetype
[META-INF] 
    container.xml
[OPS]
    package.opf
    toc.xhtml
    *.xhtml
    [css]
    [js]
    [images]
    [svg]
    [mathml]
    [audio]
    [video] 
    [proprietary]
    [fonts]

You can download ePub3 sample books and open them to see the construction of the IGP package.

To look inside, rename the *.epub to *.zip, and then extract the file. If you haven't done it before, make a copy of the ePub so you don't destroy your favorite novel.
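If you would rather not rename or extract anything, an ePub can also be read in place, since it is an ordinary zip archive. This is a small Python sketch using the standard zipfile module; list_epub is a name chosen for the example.

```python
import zipfile


def list_epub(epub_path):
    """Return (entry name, uncompressed size in bytes) in archive order."""
    with zipfile.ZipFile(epub_path) as epub:
        return [(info.filename, info.file_size) for info in epub.infolist()]
```

The first entry in the returned list should be mimetype, followed by the META-INF and OPS contents.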

References & Resources

EPUB Open Container Format (OCF) 3.0

EPUB 3 Sample files

AZARDI

You can see ePub3 files in operation in AZARDI.

The AZARDI desktop reader has high conformance support for all properties. Because all of these properties are supported natively in AZARDI it does not have to use the properties references for display.

AZARDI supports MathML, SVG, Javascript and SMIL audio overlay content.

Download the free version of Azardi here
