APEX@IGP

Infogrid Pacific-The Science of Information

Font Subsetting

Fonts in e-books are starting to grow up fast. The main sticking point at present is font foundry policies on embedded fonts in ePub. The IGP:Digital Publisher font packaging strategy now includes font sub-setting (just like a PDF), abstract naming, WOFF conversion and obfuscation. Updated: 2012-08-05

 

Throwing all these fonts at an e-book is one thing. Handling the embedded font file-size explosion is another.

Font subsetting reduces a font's file size by keeping only the characters and glyphs in a font-face that are used in a specific document. All other characters are removed.

IGP:Digital Publisher licensees produce a lot of books in Languages Other Than English (LOTE). This includes East European languages as well as scripts and CJK. While new reading devices are supporting a wider range of languages with default fonts, they are still far from complete for the languages of the world. The poor language handling of yesterday's readers also has to be addressed with embedded fonts for the foreseeable future.

Currently the option for handling LOTE is to embed one of the open-source fonts with extensive language support, like Liberation (TTF), DejaVu (TTF) or Linux Libertine (OTF/TTF). The problem here is that they are big fonts, typically 150-300KB per font-face, because they do such a good job of supporting multiple languages. This means that putting the four major font faces (regular, bold, italic, bold-italic) into an ePub in both serif and sans-serif starts at around 1.1MB of overhead. That is just to get maybe 20-30 specific language characters.

It has to be better than this! Font sub-setting to the rescue. Font sub-setting has been used in PDFs for years, and it results in a dramatic reduction in font file size. For most specific language font usage purposes, think of a 150KB font-face reduced to 15-30KB. That's an 80-90% file size reduction for most script and alphabet languages!
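
The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above; the function and constant names are made up for illustration, and 150KB is the low end of the quoted per-face range.

```python
# Back-of-envelope font sizes, using only the figures quoted in the text.
FACES_PER_FAMILY = 4   # regular, bold, italic, bold-italic
FAMILIES = 2           # serif and sans-serif
FULL_FACE_KB = 150     # low end of the 150-300KB range


def embed_overhead_kb(face_kb, faces=FACES_PER_FAMILY, families=FAMILIES):
    """Total size in KB of embedding every face of every family."""
    return face_kb * faces * families


def reduction_percent(before_kb, after_kb):
    """Percentage saved when a file shrinks from before_kb to after_kb."""
    return 100.0 * (before_kb - after_kb) / before_kb


full_kb = embed_overhead_kb(FULL_FACE_KB)
print(full_kb, "KB")                  # 1200 KB
print(round(full_kb / 1024, 2), "MB")  # 1.17 MB
print(reduction_percent(150, 30))     # 80.0
print(reduction_percent(150, 15))     # 90.0
```

Eight full faces at 150KB each come to roughly 1.2MB, which is in line with the "around 1.1MB" starting point, and the same eight faces subset to 15-30KB each would total 120-240KB.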

The Requirement

The need to use embedded fonts in ePubs is broad and diverse. The serious reason is of course the availability of appropriately styled language characters within an ePub, irrespective of a Reading System's supplied fonts (historically think ADE here: very, very bad).

Other reasons to embed fonts are artistic and communication design; and let's not forget "just for fun". Our experience is that a range of fonts is generally necessary in most K-12 textbooks, and even tertiary textbooks. You can design your ePubs for the fonts available in a specific Reading System, or design your ePubs with your embedded fonts, with appropriate font fallbacks for the dumb Reading Systems.

Digital content is now ubiquitous. The Internet has handled the world's language requirements primarily through a combination of browser, operating system and default font standardization.

Reading Systems are different. Fonts on devices take valuable storage space. If every font is stored in every ePub2/3 for every book (as is required in many languages) at its full font size, they become storage guzzling monsters. It is important to reduce the guzzle as much as possible. Here are just some situations where font sub-setting is of value:

Languages Not Latin. Arabic, Cyrillic, Greek, Hebrew, Indic scripts, Thai or even Amharic can be included without the character overhead of a hundred other languages and specific language presentation glyphs. 

Other Alphabets with Latin. If you have just a few letters of Greek, Arabic, Hebrew or Cyrillic in your book, you can bring them to life with a small and tidy 10K or less font file.

Symbols and decorations. You may want even your simple book to have distinctive symbols, icons and decorations. Just load a Dingbats or custom symbols font at 10K or less. 

Available Sub-setting Tools

www.fontsquirrel.com is an amazing resource available to everyone who needs language optimized sub-set fonts. Once you know which fonts to select you can create an ePub3 sub-set of WOFF fonts with just a few clicks. Try it in expert mode.

Upload a font such as Liberation, select expert mode and choose your language or character requirements and in a matter of seconds Font Squirrel will deliver you your subset fonts.

If you are working with a single language all the time you only have to do this once. You can even rename your font to your language so you can end up with a set of fonts such as de-Sans-Regular.woff, de-Sans-Bold.woff, etc. It's easy to do and it's free.

Font Sub-setting Action

Font subsetting the Font Squirrel way is excellent, but we needed to move forward and automate font sub-setting. While creating the new IGP:Font Manager 2 (FM2) for IGP:Digital Publisher, we learned more about "font internals" than normal, nice people really should want or need to know. FM2 is the only font manager environment we are aware of that gives a view of any font by all Unicode characters, organized by Unicode block, AND all glyphs in OTF font features. Two examples are listed in the References below. This "Deep Font" experience gave us the confidence to move to the next step: font sub-setting for ePubs and, of course, any other packaged or delivered format.

There are two issues font sub-setting addresses: 

Excessive file sizes

Our first ePub3 experiment with massive font embedding was EPUB3 UNLEASHED. The file size is 12MB, and the fonts contribute 10MB of that file size. With sub-setting, the font size is reduced to a mere 3MB, a 70% reduction. That is quite a saving in file size, transfer times and ultimately bandwidth costs for both consumers and suppliers.

Font rights

Font subsetting plus font obfuscation will go a long way to making font foundries more relaxed about allowing their valuable fonts to be packaged into ePubs and other content. The combination of font sub-setting plus obfuscation is more secure than obfuscation alone. Of course, the IDPF or other appropriate organizations now have to have that dialogue with font foundries and independent font sellers.

Font sub-setting tools

We decided to use the Google font subsetter as the core engine. We don't like reinventing the wheel; the Google project is aggressively supported, and it is MIT licensed. Green lights all the way there.

It only works for TTF at present, but that was not a significant obstacle, as the really big fonts out there with broad language coverage are open-source masterpieces such as Linux Libertine, Liberation, DejaVu, STIX and the like.

Our main task was parsing and analysing the characters of an FX production XHTML document while it was being packaged as an ePub, to get the list of Unicode characters (whether the source is UTF-8 or UTF-16 encoded) used in the document. This is relatively efficient, even for very large books.
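
A minimal sketch of this kind of character collection follows. It is not the IGP:Digital Publisher implementation; it assumes a well-formed XHTML string with no undeclared named entities, uses the Python standard library parser, and the names used_characters, local_name and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def used_characters(xhtml):
    """Return the set of non-whitespace characters in the document body."""
    root = ET.fromstring(xhtml)
    # Fall back to the root element if the fragment has no <body>.
    body = next((el for el in root.iter() if local_name(el.tag) == "body"), root)
    chars = set()
    for text in body.itertext():
        chars.update(ch for ch in text if not ch.isspace())
    return chars


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Ignored</title></head>
<body><p>Καλημέρα <b>abc</b></p></body>
</html>"""

if __name__ == "__main__":
    # Print the code points a subsetter would be asked to keep.
    for ch in sorted(used_characters(SAMPLE)):
        print(f"U+{ord(ch):04X} {ch}")
```

The resulting set is what would be handed to a subsetter as the list of characters to keep. Text outside the body, such as the title, is deliberately left out in this sketch.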

We decided to make the font sub-setting a little more efficient by handling italic, bold and bold-italic styles explicitly. That is somewhat more complicated, as we have to find all the characters styled by CSS classes that apply bold (for example) within each font-family. It is important to get the correct character map for each font-face for optimal size reduction.
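
The idea of one character map per font-face can be sketched as below. This is a simplified illustration, not the production logic: it ignores font-family, treats b/strong and i/em as bold and italic, and uses a hard-coded, hypothetical class table (BOLD_CLASSES, ITALIC_CLASSES) where a real pipeline would read the CSS.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

BOLD_TAGS = {"b", "strong"}
ITALIC_TAGS = {"i", "em"}
# Hypothetical class-to-style table; a real pipeline would derive it from the CSS.
BOLD_CLASSES = {"term"}
ITALIC_CLASSES = {"foreign"}

FACE_NAMES = {
    (False, False): "regular",
    (True, False): "bold",
    (False, True): "italic",
    (True, True): "bold-italic",
}


def chars_by_face(xhtml):
    """Return {face name: set of non-whitespace characters set in that face}."""
    faces = defaultdict(set)

    def add(text, bold, italic):
        if text:
            face = FACE_NAMES[(bold, italic)]
            faces[face].update(ch for ch in text if not ch.isspace())

    def walk(el, bold, italic):
        name = el.tag.rsplit("}", 1)[-1]
        classes = set(el.get("class", "").split())
        # Styles are inherited, so a flag stays on once an ancestor sets it.
        bold = bold or name in BOLD_TAGS or bool(classes & BOLD_CLASSES)
        italic = italic or name in ITALIC_TAGS or bool(classes & ITALIC_CLASSES)
        add(el.text, bold, italic)
        for child in el:
            walk(child, bold, italic)
            # Text after a child element is styled by this (parent) element.
            add(child.tail, bold, italic)

    walk(ET.fromstring(xhtml), False, False)
    return dict(faces)


SAMPLE = ('<p>Plain <b>bold <i>both</i> more</b> '
          '<span class="foreign">ital</span> end</p>')

if __name__ == "__main__":
    for face, chars in sorted(chars_by_face(SAMPLE).items()):
        print(face, "".join(sorted(chars)))
```

Each face then gets its own, usually much smaller, character list, which is why the bold and italic files can shrink further than the regular face.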

The outcome

Having lots of very small fonts in an ePub is now easy, automatic and no longer a stress item. If ePub3 Reading System font support is reasonable, the days of images as characters are over.

The downside is that the font sub-setter currently only subsets TrueType fonts. OTF is probably in the "do it later" basket. But TTF is not an approved font format in ePub3! WOFF to the rescue.

The TTF fonts are sub-set, then WOFF converted. Voila, instant ePub3 specification conformance. With SIL open licensed fonts there is no requirement to obfuscate fonts, as sub-setting is obfuscation enough. However, the IDPF font obfuscation option is supported and available in IGP:Formats On Demand packaging.
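
For readers curious about what the IDPF obfuscation option does to a font file, here is a minimal sketch of the algorithm as it is generally described in the EPUB Open Container Format specification: a SHA-1 key is derived from the publication's unique identifier with whitespace removed, and the first 1040 bytes of the font are XORed with that key. The function names are hypothetical and this is not the IGP:Formats On Demand code; check the specification before relying on the details.

```python
import hashlib

OBFUSCATED_BYTES = 1040         # only the start of the font file is altered
XML_WHITESPACE = "\x20\x09\x0d\x0a"


def idpf_key(unique_identifier):
    """SHA-1 digest (20 bytes) of the identifier with XML whitespace removed."""
    cleaned = "".join(ch for ch in unique_identifier if ch not in XML_WHITESPACE)
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def idpf_obfuscate(font_bytes, unique_identifier):
    """XOR the first 1040 bytes with the repeating key; the rest is untouched.

    XOR is its own inverse, so the same call also restores the original font.
    """
    key = idpf_key(unique_identifier)
    head = bytes(
        byte ^ key[i % len(key)]
        for i, byte in enumerate(font_bytes[:OBFUSCATED_BYTES])
    )
    return head + font_bytes[OBFUSCATED_BYTES:]


if __name__ == "__main__":
    fake_font = bytes(range(256)) * 8            # stand-in for real font data
    book_id = "urn:uuid:00000000-0000-0000-0000-000000000000"  # made-up ID
    scrambled = idpf_obfuscate(fake_font, book_id)
    print(scrambled != fake_font)                          # True
    print(idpf_obfuscate(scrambled, book_id) == fake_font)  # True
```

Because the key comes from the book's own identifier, the font only unscrambles correctly inside the package it was prepared for. An obfuscated font also has to be declared in the package's META-INF/encryption.xml so that a Reading System knows to reverse the process.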

Summary

So our font packaging processors seem to be complete at this time; and rather powerful. Here is what IGP:Digital Publisher can do with fonts at format generation and package generation time:

  1. Sub-set TTF fonts. Let the few open OTF fonts ride the ride unmolested.
  2. Provide different Font Schemes for print and reader outputs, so you can stay away from commercial fonts until the various foundries work out what they are going to allow for ePub3 font embedding.
  3. Optionally auto rename the fonts and update the CSS.
  4. Optionally convert fonts to WOFF.
  5. Optionally obfuscate the WOFF fonts using the IDPF algorithm if required.
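
Step 3 in the list above, renaming the fonts and updating the CSS, can be sketched roughly as follows. The naming scheme (font001, font002, ...), the function names and the sample CSS are all assumptions made for illustration; the sketch only rewrites file names inside url(...) references and leaves font-family names alone.

```python
import re


def abstract_names(font_files, prefix="font"):
    """Map each original font file name to a neutral name, keeping the extension."""
    mapping = {}
    for index, name in enumerate(sorted(font_files), start=1):
        ext = name.rsplit(".", 1)[-1]
        mapping[name] = f"{prefix}{index:03d}.{ext}"
    return mapping


# url( + optional quote, the path, optional quote + )
URL_PATTERN = re.compile(r"""(url\(\s*['"]?)([^'")]+?)(['"]?\s*\))""")


def update_css(css, mapping):
    """Rewrite url(...) references whose file name appears in the mapping."""
    def swap(match):
        opening, path, closing = match.groups()
        filename = path.rpartition("/")[2]
        folder = path[: len(path) - len(filename)]   # keeps any trailing slash
        return opening + folder + mapping.get(filename, filename) + closing

    return URL_PATTERN.sub(swap, css)


if __name__ == "__main__":
    names = abstract_names(["DejaVuSans.ttf", "DejaVuSans-Bold.ttf"])
    css = '@font-face { font-family: "Body"; src: url("../fonts/DejaVuSans.ttf"); }'
    print(update_css(css, names))
```

References to files that are not in the mapping, such as background images, pass through unchanged.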

The time, cost, complexity and compromises involved with embedding fonts for language, design or entertainment are reduced to a few checkboxes on the IGP:Digital Publisher Document Processing Instructions interface.

There is nothing like option-powered automation of annoying, fiddly tasks to help make the job of digital content production easier and specification conformant.

References

Font Squirrel http://www.fontsquirrel.com/

Linux Libertine. Character and font-feature glyphs exposed.

MEgalopolis Extra. Character and font-feature glyphs exposed.

 
