ePub3 Packaging-1
This first topic is a quick overview of a standard ePub3 package.
Updated: 2012-07-28
An ePub package is simple.
There are rules of course. There are always rules!
The Infogrid Pacific (IGP) core ePub3 packaging strategy is simple and lean, drawing on the lessons of the last four years. In theory no-one needs to go inside an ePub package once it has been assembled, but every now and then some customization has to be applied to an ePub, or, more likely, a strange behaviour has to be examined and understood. Content can be crazy stuff!
The IGP ePub3 package is designed to be well organized and easily used by any production engineer who may need to open the package for any reason. The packaging objectives were:
The ePub3 package root contains three items:
mimetype - an ASCII-encoded file that must be the first file in the package. It contains the text: application/epub+zip
META-INF - there must be exactly one directory with this name. It must contain the mandatory container.xml file, which points to the OPF file. This is the starting point for any reading system.
OPS - contains all text and other content files in the package. Importantly it contains the defining ePub OPF file, which for a single document package in our system is always named package.opf. No need to be creative here.
We also name the primary navigation page toc.xhtml. This makes these files fast and easy to locate when required. There can only be one toc page in the package, so again, creative naming is not required.
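Naming the navigation page toc.xhtml is an IGP convention; a reading system actually locates the ePub3 navigation document through the manifest item in the OPF file whose properties attribute includes the token nav. The Python sketch below shows that lookup with only the standard library. The function name find_nav_href is ours and purely illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

OPF_NS = "http://www.idpf.org/2007/opf"  # namespace of the ePub package document

def find_nav_href(opf_xml: Union[str, bytes]) -> Optional[str]:
    """Return the href of the manifest item flagged as the navigation document."""
    root = ET.fromstring(opf_xml)
    # Path is relative to the root <package> element: manifest/item.
    for item in root.iterfind(f"{{{OPF_NS}}}manifest/{{{OPF_NS}}}item"):
        # 'properties' is a space-separated token list; the nav document carries 'nav'.
        if "nav" in item.get("properties", "").split():
            return item.get("href")
    return None
```

For an IGP package this returns toc.xhtml; the href is relative to package.opf, which sits in the root of /OPS/.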
The META-INF directory must have one container.xml file. This file can remain the same for every book with a simple package strategy. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="OPS/package.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
The important job this little XML nugget does is to tell the reading system where to find the OPF file, relative to the package root. It is effectively the startup instruction for any ePub3 reader.
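A reading system's first step can be sketched in a few lines of Python using the standard zipfile and xml.etree.ElementTree modules. The helper names are ours and purely illustrative. Matching on the local element name keeps the lookup working whether or not container.xml declares the OCF namespace (the specification requires it). Because a container may list more than one rootfile, rootfile_paths returns them all, and opf_path simply takes the first.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Union

def rootfile_paths(container_xml: Union[str, bytes]) -> List[str]:
    """Return the full-path of every <rootfile> element in a container.xml document."""
    root = ET.fromstring(container_xml)
    paths = []
    for el in root.iter():
        # Compare the local name only: "{urn:...:container}rootfile" or plain "rootfile".
        if el.tag.split("}")[-1] == "rootfile":
            paths.append(el.get("full-path"))
    return paths

def opf_path(epub_path: str) -> str:
    """Open an ePub (a zip archive) and return the first OPF path it declares."""
    with zipfile.ZipFile(epub_path) as zf:
        paths = rootfile_paths(zf.read("META-INF/container.xml"))
    if not paths:
        raise ValueError("container.xml lists no rootfile")
    return paths[0]
```

For the container.xml shown above, both functions yield OPS/package.opf.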
The container.xml file is not restricted to a single rootfile. An ePub3 package could contain different renditions of the same book (but not different books), for example a text version and an image version. These must be in clearly separate directories and each must be a complete package in its own right.
We don't need to get worked up about the multiple renditions issue just yet. First we need a reading system that can actually use it.
Other optional files can go into the META-INF directory, such as DRM information, digital signatures, etc. We don't need to worry about those unless DRM is being applied to the ePub.
In addition to the previously mentioned OPF file, all HTML content and components are in a folder called OPS (Open Publication Structure). Remember this is the IGP package: the directory can have any name you like. Sticking to the spec examples makes for a tidy package, and it is clean and easy to disassemble and hand-assemble an ePub when required.
The package.opf, toc.xhtml and all content xhtml files are directly in the root of the OPS directory in the IGP package.
All other files and components are in sub-directories immediately inside the OPS. There are no nested directories permitted in our packaging (currently).
For example, by default jQuery and jQuery UI come with a large number of skin structures. With automated packaging we have to specifically prevent overly clever (and careless) content assembly from getting into a package and bloating it with unused components of standard external libraries.
Otherwise you can end up with a perfectly valid ePub whose manifest is full of unused files, or full of large files of which the ePub uses only a small part.
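One way to catch this kind of bloat before packaging is to compare the manifest against what the content actually references. The sketch below is a rough, hypothetical check (the helpers referenced_paths and unreferenced and the regular expression are our assumptions, not an IGP tool). It scans XHTML and CSS text for href, src and url(...) references, resolves each one relative to the file it appears in, and reports manifest entries that nothing points at. Files reached only from package.opf, such as spine documents and toc.xhtml, are passed in as roots. A regex scan is not a real parser and will miss references built by scripts.

```python
import posixpath
import re
from typing import Dict, Iterable, Set

# Rough pattern for href="...", src='...' and CSS url(...) references (an assumption,
# not a full HTML/CSS parser). Fragment-only links such as href="#top" never match.
REF = re.compile(r"""(?:href|src)\s*=\s*["']([^"'#]+)|url\(\s*["']?([^"')#]+)""")

def referenced_paths(documents: Dict[str, str]) -> Set[str]:
    """Map every reference in each OPS-relative document to an OPS-relative path."""
    found = set()
    for doc_path, text in documents.items():
        base = posixpath.dirname(doc_path)
        for m in REF.finditer(text):
            ref = (m.group(1) or m.group(2)).strip()
            if ":" in ref:
                continue  # skip http:, mailto:, data: and similar URLs
            # e.g. "../images/bg.png" inside css/book.css -> "images/bg.png"
            found.add(posixpath.normpath(posixpath.join(base, ref)))
    return found

def unreferenced(manifest_hrefs: Iterable[str], documents: Dict[str, str],
                 roots: Iterable[str] = ()) -> Set[str]:
    """Return manifest entries that no document (and no root) refers to."""
    used = referenced_paths(documents) | set(roots)
    return {href for href in manifest_hrefs if href not in used}
```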
IGP packaging puts the following directories inside the OPS directory only if there is content to include in them. Empty directories are not allowed: if there are no files for a directory, it is removed from the package.
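Removing empty directories is easy to automate. A minimal sketch, assuming the package is assembled in a working directory before it is zipped; prune_empty_dirs is a hypothetical helper name, not part of any IGP tool.

```python
import os
import tempfile
from typing import List

def prune_empty_dirs(ops_dir: str) -> List[str]:
    """Delete empty sub-directories below ops_dir and return the removed paths."""
    removed = []
    # Walk bottom-up so each directory is checked after its children were handled.
    for dirpath, _dirnames, _filenames in os.walk(ops_dir, topdown=False):
        if dirpath == ops_dir:
            continue  # never delete the OPS directory itself
        if not os.listdir(dirpath):  # re-check now: children may have been removed
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ops = os.path.join(tmp, "OPS")
        for sub in ("css", "images", "svg"):
            os.makedirs(os.path.join(ops, sub))
        open(os.path.join(ops, "css", "book.css"), "w").close()
        print(sorted(os.path.basename(p) for p in prune_empty_dirs(ops)))  # ['images', 'svg']
```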
In our ePub2 packager we placed all files in the root of the package. There is no reason not to do this, and there is a good reason to do it: there are no file paths, so package construction is considerably simplified. The test books provided by the IDPF put all the files in a package directory, /EPUB/. ePubs created by Calibre put them directly in the package root.
Some ePub2 packaging systems, Sigil for example, put the HTML files in a separate directory. Again, there is no reason why this cannot be done.
With the IGP ePub3 packaging we walked deftly down the center.
All files go into a single directory, /OPS/. This has the advantage that all the content can be moved out of the package as a single directory. We also don't like the structural linking complication of putting the XHTML files into a separate directory, mainly because it is unnecessary. So the core content, the text XHTML pages, sits in the root of the /OPS/ folder at the same level as toc.xhtml and package.opf. All XML/XHTML together.
There is no pro or con on internal directories and file locations as long as the files and paths are present and correct. If you do have to open and examine an ePub, you will find our strategy very straightforward and fast to use. Who wants to navigate back and forth between directories?
The mimetype file must be in the root of the package, must be the first file in the zip archive, and must be stored uncompressed.
Its name is exactly mimetype (no hyphen), and it contains the text application/epub+zip, always encoded as ASCII with no trailing end-of-line.
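These mimetype rules are the ones most often broken by general-purpose zip tools, which compress everything and order entries arbitrarily. A minimal sketch using Python's standard zipfile module, assuming the package (META-INF and OPS) has already been assembled in a source directory; write_epub is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

def write_epub(src_dir: str, epub_path: str) -> None:
    """Zip src_dir into an ePub with mimetype first and stored uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 1. mimetype: first entry, ZIP_STORED, ASCII bytes, no trailing newline.
        zf.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # 2. Everything else (META-INF, OPS), deflated, in a stable sorted order.
        for dirpath, dirnames, filenames in os.walk(src_dir):
            dirnames.sort()  # in-place sort controls the walk order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "book")
        os.makedirs(os.path.join(src, "META-INF"))
        os.makedirs(os.path.join(src, "OPS"))
        for rel in ("META-INF/container.xml", "OPS/package.opf", "OPS/toc.xhtml"):
            with open(os.path.join(src, rel), "w") as f:
                f.write("<placeholder/>")
        out = os.path.join(tmp, "book.epub")
        write_epub(src, out)
        with zipfile.ZipFile(out) as zf:
            print(zf.namelist())
            # ['mimetype', 'META-INF/container.xml', 'OPS/package.opf', 'OPS/toc.xhtml']
```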
mimetype
[META-INF]
    container.xml
[OPS]
    package.opf
    toc.xhtml
    *.html
    [css]
    [images]
    [svg]
    [mathml]
    [audio]
    [video]
    [proprietary]
    [fonts]
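This layout can also be checked mechanically after packaging. The sketch below encodes IGP's house rules as we read them from this page: mimetype first, everything else under META-INF/ or OPS/, at most one level of sub-directory inside OPS/, and only the listed sub-directory names. These are IGP conventions, not ePub3 requirements, and layout_problems is a hypothetical name. It takes zip entry names, for example from zipfile.ZipFile.namelist().

```python
from typing import Iterable, List

# Sub-directory names from the IGP layout; an assumption based on this page.
IGP_OPS_SUBDIRS = {"css", "images", "svg", "mathml", "audio", "video", "proprietary", "fonts"}

def layout_problems(names: Iterable[str]) -> List[str]:
    """Report departures from the IGP house layout for a list of zip entry names."""
    names = list(names)
    problems = []
    if not names or names[0] != "mimetype":
        problems.append("mimetype is not the first entry")
    for name in names:
        if name.endswith("/") or name == "mimetype":
            continue  # directory entries and the mimetype file need no further checks
        parts = name.split("/")
        if parts[0] == "META-INF":
            continue
        if parts[0] != "OPS":
            problems.append(f"{name}: not inside META-INF/ or OPS/")
        elif len(parts) > 3:
            problems.append(f"{name}: nested directories are not used")
        elif len(parts) == 3 and parts[1] not in IGP_OPS_SUBDIRS:
            problems.append(f"{name}: unexpected OPS sub-directory '{parts[1]}'")
    return problems
```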
You can download ePub3 sample books and open them to see the construction of the IGP package.
To look inside, make a copy of the ePub, rename the copy from *.epub to *.zip, and then extract it. If you haven't done this before, working on a copy ensures you don't destroy your favorite novel.
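If Python is at hand, you can skip the rename: an .epub file is an ordinary zip archive, so the standard zipfile module opens it directly and only reads it. A small sketch; epub_entries and the script name list_epub.py are hypothetical.

```python
import sys
import zipfile
from typing import List, Tuple

def epub_entries(epub_file) -> List[Tuple[str, int, bool]]:
    """List (name, uncompressed size, stored?) per entry; epub_file is a path or binary file object."""
    with zipfile.ZipFile(epub_file) as zf:  # opened read-only, the ePub is not modified
        return [(i.filename, i.file_size, i.compress_type == zipfile.ZIP_STORED)
                for i in zf.infolist()]

if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size, stored in epub_entries(sys.argv[1]):
        print(f"{name:<40} {size:>9}  {'stored' if stored else 'compressed'}")
```

Run it as python list_epub.py mybook.epub. To unpack everything, zipfile.ZipFile("mybook.epub").extractall("mybook") does the same job as extracting the renamed copy.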
You can see ePub3 files in operation in AZARDI.
The AZARDI desktop reader supports all of these properties with a high level of conformance. Because they are supported natively, AZARDI does not have to use the property references for display.