Preview. This article is not yet finalized. With IGP:Digital Publisher and the IGP:SMIL Toolkit it is possible to create interactive highlighting audio quickly and easily. Updated: 2013-08-21
This article is about creating SMIL files for e-books in general and ePub3 in particular.
SMIL means Synchronized Multimedia Integration Language. It is an XML language for scripting timing, layout, animations, transitions and other presentation behavior, which is then executed by a software application or reading device that understands SMIL.
There has not been much mainstream support for SMIL production tools or SMIL presentation tools. Existing presentation engines are aged and implement only limited SMIL functionality. In many respects ePub3 support of SMIL is one of the more significant events in the life of the language.
SMIL is very difficult to author and use, which is probably why there is little or no current browser support. Fortunately the ePub audio overlay requirement is a small subset of what is otherwise a tough and punishing way to create interactive content.
We have started out with full support for SMIL audio overlay in production, and linear audio overlay support in all versions of AZARDI. Apple's minimal support in the iBooks ePub fixed layout format has also lifted awareness.
If AZARDI recognizes from the manifest properties that a book section contains SMIL, it loads the SMIL XML file and uses an XSL transform to convert it to a JavaScript array. This is very flexible and means we can easily make changes in the future for variations, customizations, and handling additional complexity if and when required.
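As a rough illustration of that flattening step, the sketch below turns a small ePub3 media overlay document into a flat list of cues. It is written in Python rather than XSL and JavaScript, so it is not the AZARDI implementation. The file names, the `smil_to_cues` function and the dictionary keys are assumptions made for the example, and only clock values written as plain seconds (such as `0.971s`) are handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by ePub3 media overlay documents.
SMIL_NS = "{http://www.w3.org/ns/SMIL}"

# Hypothetical two-cue overlay; file names and IDs are illustrative only.
SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par1">
      <text src="chapter1.xhtml#azs1"/>
      <audio src="audio/chapter1.mp3" clipBegin="0.000s" clipEnd="0.971s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#azs2"/>
      <audio src="audio/chapter1.mp3" clipBegin="0.971s" clipEnd="1.738s"/>
    </par>
  </body>
</smil>"""


def seconds(clock):
    """Convert a plain-seconds clock value such as '0.971s' to a float."""
    return float(clock.rstrip("s"))


def smil_to_cues(smil_xml):
    """Flatten every <par> into a dict, in document order."""
    root = ET.fromstring(smil_xml)
    cues = []
    for par in root.iter(SMIL_NS + "par"):
        text = par.find(SMIL_NS + "text")
        audio = par.find(SMIL_NS + "audio")
        if text is None or audio is None:
            continue  # skip incomplete pairs rather than guess
        cues.append({
            "target_id": text.get("src").split("#", 1)[-1],
            "audio": audio.get("src"),
            "begin": seconds(audio.get("clipBegin")),
            "end": seconds(audio.get("clipEnd")),
        })
    return cues


cues = smil_to_cues(SAMPLE_SMIL)
```

A player can then walk the list in order, or look up a cue by `target_id` when the reader clicks on a piece of text.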
The primary purpose of SMIL audio overlays in ePub3 is so text can be highlighted in synchronization with audio at any required text granularity.
The primary highlighting granularity options provided in the IGP:SMIL Toolkit are word, phrase, sentence and paragraph highlighting.
Each option has applicability in certain scenarios, and of course there are times when you want a mix and match of all of them. That just takes more time to produce, and costs increase accordingly.
Sequential word highlighting in a reading context is probably most relevant for young learners and second-language learners. It is useful for vocabulary building, but unless the reading is very slow, the interaction can be distracting.
Word highlighting can be used in dictionary and grammar building products as well. Remember there is no requirement for every word to be highlighted.
0.000000 1.397163
1.397163 2.000738
2.000738 2.514894
2.514894 3.587916
3.587916 4.448568
4.448568 4.884483
4.884483 5.700427
5.700427 6.326356
6.326356 8.818896
8.818896 9.145832
9.145832 9.383350
9.383350 9.592924
9.592924 10.598882
10.598882 10.758158
10.758158 10.945378
10.945378 11.291875
11.291875 11.697052
11.697052 12.580000
This is word highlighting at a slow reading speed.
This is word highlighting at a fast reading speed.
This means synchronizing at the minor terminator level, that is, at commas, dashes, colons and semicolons. It creates a relatively compelling engagement process at normal reading speeds.
0.000000 0.971074
0.971074 1.738222
1.738222 2.816114
2.816114 3.612394
3.612394 4.544625
4.544625 6.059500
6.059500 6.982020
6.982020 9.050406
9.050406 10.817760
10.817760 13.090073
13.090073 13.847510
13.847510 17.081185
17.081185 21.334488
Wow! Hey, do you like this, I mean this—hang on a minute—phrase based highlighting. I do because: it makes my eyes follow the text; it helps me concentrate; especially with boring content; and, it's like getting a lot of SMS's at the same time. It really fits my 2012 attention span deficit problem.
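Splitting text at terminators can be sketched with a single regular expression. This is a minimal sketch and not the toolkit's own splitter. The `split_phrases` name is hypothetical, sentence terminators are treated as phrase boundaries alongside the minor ones, and any period splits the text, so decimal numbers and abbreviations would need extra handling.

```python
import re

# Minor terminators (comma, semicolon, colon, en and em dash) plus
# sentence terminators, which also close a phrase.
TERMINATORS = ",;:.!?\u2013\u2014"

# One phrase = a run of non-terminators followed by any run of terminators.
PHRASE_RE = re.compile(f"[^{TERMINATORS}]+[{TERMINATORS}]*")


def split_phrases(text):
    """Return the phrases of `text`, each keeping its trailing terminator."""
    return [m.strip() for m in PHRASE_RE.findall(text) if m.strip()]


SAMPLE = (
    "Wow! Hey, do you like this, I mean this\u2014hang on a minute\u2014"
    "phrase based highlighting. I do because: it makes my eyes follow "
    "the text; it helps me concentrate; especially with boring content; "
    "and, it's like getting a lot of SMS's at the same time. It really "
    "fits my 2012 attention span deficit problem."
)

phrases = split_phrases(SAMPLE)
```

On the sample paragraph this yields 13 phrases, the same number of cues as the timing pairs listed for it.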
The final output is a paragraph-aligned, processor-ready label track for final SMIL file updating. The generated and processed label track file looks something like the example below. Each entry contains the highlight section ID, start time, end time, cue sequence number, and the first few characters of the text for editing. Currently punctuation and special characters are stripped out. It has been used for Chinese, but only the sequence number survives the process.
#azs1 0.000000 0.971074 1. Wow
#azs2 0.971074 1.738222 2. Hey
#azs3 1.738222 2.816114 3. Do you lik
#azs4 2.816114 3.612394 4. I mean the
#azs5 3.612394 4.544625 5. Hang on
#azs6 4.544625 6.059500 6. phrase bas
#azs7 6.059500 6.982020 7. I do becau
#azs8 6.982020 9.050406 8. it makes m
#azs9 9.050406 10.817760 9. it helps m
#azs10 10.817760 13.090073 10. especially
#azs11 13.090073 13.847510 11. and
#azs12 13.847510 17.081185 12. its like g
#azs13 17.081185 21.334488 13. It really
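A processed label track in this shape is straightforward to read back into structured records. The sketch below assumes one entry per line in the form `#id start end seq. text`, separated by whitespace. That layout is inferred from the example and is not a documented file format, and the `Cue` and `parse_label_track` names are hypothetical.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    target_id: str   # ID of the highlighted element, without the '#'
    start: float     # seconds
    end: float       # seconds
    seq: int         # cue sequence number
    text: str        # truncated text hint; may be empty


# '#id start end seq. text' with whitespace between fields (assumed layout).
LINE_RE = re.compile(
    r"^#(\S+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+)\.\s*(.*)$"
)


def parse_label_track(track):
    """Parse a processed label track into a list of Cue records."""
    cues = []
    for line in track.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised label line: {line!r}")
        target_id, start, end, seq, text = match.groups()
        cue = Cue(target_id, float(start), float(end), int(seq), text)
        if cue.end < cue.start:
            raise ValueError(f"cue {cue.target_id} ends before it starts")
        cues.append(cue)
    return cues


SAMPLE_TRACK = """\
#azs1 0.000000 0.971074 1. Wow
#azs2 0.971074 1.738222 2. Hey
#azs3 1.738222 2.816114 3. Do you lik
"""

cues = parse_label_track(SAMPLE_TRACK)
```

Because the text field is optional in the pattern, an entry where only the sequence number survives still parses.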
Sentence highlighting is a nice middle ground between phrases and paragraphs. Sentence lengths tend to be relatively uneven but generally do break long paragraphs into relevant parts.
The problem with sentence highlighting is relative paragraph length. Sentence length is highly variable. Even in a single paragraph. Charles Dickens was the master at creating the long, comma separated, noun, adjective, verb, adverb rich sentence, to build a strong mental picture; while keeping the story moving. So? Bah! Humbug!
Paragraph lengths in all types of books are highly variable. However, a paragraph theoretically does contain the expression of a self-contained theme or idea, and has presentation styling rather than punctuation to give it isolation within text. For accessibility this is probably the preferable option, and even for classroom learning it is a highly relevant approach.
Paragraph highlighting is probably the most useful granularity for accessibility, bringing the flow of ideas and interactivity together.
Language education, learning or training content can be easily enriched with SMIL tagging. It can be used in structures such as vocabularies, glossary words, terms and much more.
Click on the nouns in the list of words and listen.
The core content digital production is carried out in IGP:Digital Publisher. This enables simultaneous print, e-book and audio book production from a single master XHTML source. This has to be completed before any audio processing can start, primarily because we need IDs on the content.
IGP:FoundationXHTML has full paragraph IDs by default, so paragraph level audio sync needs no additional work. More granular highlighting options need more processing.
It also allows the timing information to be directly inserted into the XHTML to allow instant testing, evaluation and quality control.
Audacity is the well known, premier, open source audio editing application. It is relatively easy to learn to use for basic operations.
The main reason for using this application for SMIL production is to create an Audacity label.txt file. This lets you set and fine-tune the text highlight synchronization points to the millisecond.
You can also use Audacity for recording your audio, and if required mixing in a few effects. Or you can spend hours with Audacity and become a budding world class audio engineer.
Creating those complex, annoying SMIL files is the big production issue and must be directly addressed for cost effective AND high quality user experiences. In our system SMIL production is a two-pass process.
Step 1. Generate the Audacity Label track. Edit and fine tune it.
Step 2. Use the Audacity Label track to create the final SMIL file.
To alleviate some of the pain of inserting anywhere from 1 to 10,000 label points in each Audacity label track for each chapter, we had to write a bit of software. The algorithm works like this.
The HTML and audio files are sent to the processor. It returns the HTML files with granular tagging and generated IDs, along with matching Audacity label tracks.
The label file is now available for the human-touch fine-tune, which now takes only reasonable effort. The algorithm is moderately obvious, moderately clever, and really needs to grow and evolve. We are on the first step of a thousand mile journey here! Anyone got anything better? We would love to hear about it, see it, test it, etc. We will show you ours if you will show us yours!
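The article does not describe how the processor arrives at its first-pass timings. Purely as an illustration of the input and output shape, the sketch below wraps each phrase in a span with a generated ID and seeds the label track by sharing out the audio duration in proportion to character count. That proportional split is an assumption chosen for simplicity, not the IGP algorithm, and `tag_and_seed` and the `azs` prefix default are hypothetical names.

```python
from html import escape


def tag_and_seed(phrases, audio_seconds, prefix="azs"):
    """Return (html, labels) for a list of phrases.

    html   : the phrases wrapped in <span id="..."> elements
    labels : (id, start, end) tuples, contiguous from 0 to audio_seconds
    """
    total_chars = sum(len(p) for p in phrases)
    if total_chars == 0:
        raise ValueError("no text to tag")

    spans, labels = [], []
    consumed = 0      # characters covered so far
    start = 0.0
    for n, phrase in enumerate(phrases, start=1):
        consumed += len(phrase)
        # Naive seed: time advances in step with the share of text read.
        end = audio_seconds * (consumed / total_chars)
        cue_id = f"{prefix}{n}"
        spans.append(f'<span id="{cue_id}">{escape(phrase)}</span>')
        labels.append((cue_id, start, end))
        start = end
    return " ".join(spans), labels


html, labels = tag_and_seed(["Wow!", "Hey,", "do you like this,"], 10.0)
```

Seeds like these only put the label points in roughly the right place. The fine-tuning against the real audio still happens by hand in Audacity.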
If you are going to do this SMIL production you have to learn to love audio editing. Audacity is brilliant, fantastic, smooth, easy, glorious (OK I have been reading too much Dickens).
Learn Audacity shortcuts, and learn what CTRL-1, CTRL-2 and CTRL-3 do. It's the best damn zooming key combination in the desktop UI business.
With a little practice a chapter can be QC'ed, while applying subtle fine-tuning, in near real time.
The IGP:SMIL Toolkit requires only that you upload the section HTML file(s) and the Audacity label.txt files you edited in Step 1. Wait a few seconds and a perfect SMIL file is disgorged. The whole process is nearly fun!
For the nervous, there is also a Step 3 available. It takes the Step 2 inputs and generates a playable version of the HTML files using the AZARDI Interactive Engine. Once generated they can be played instantly on the desktop.
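To give a feel for what Step 2 produces, here is a minimal sketch that turns a list of cues into an ePub3 media overlay document. It is not the IGP:SMIL Toolkit. The `build_smil` function, the `parN` ID scheme and the file names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_smil(cues, xhtml_name, audio_name):
    """Build a media overlay from (target_id, start, end) tuples."""
    # Serialize SMIL elements without a namespace prefix.
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for n, (target_id, start, end) in enumerate(cues, start=1):
        # One <par> per cue: the text fragment and its audio clip.
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{n}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text",
                      {"src": f"{xhtml_name}#{target_id}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}audio", {
            "src": audio_name,
            "clipBegin": f"{start:.3f}s",
            "clipEnd": f"{end:.3f}s",
        })
    return ET.tostring(smil, encoding="unicode")


smil_xml = build_smil(
    [("azs1", 0.0, 0.971074), ("azs2", 0.971074, 1.738222)],
    "chapter1.xhtml",        # hypothetical section file
    "audio/chapter1.mp3",    # hypothetical audio file
)
```

Clip times are rounded to milliseconds here, which matches the precision the label points are tuned to.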
Just open the XHTML files in a browser of your choice, click through the text to hear the audio, or click play, lean back and just listen, watch or give it to another person to apply a final critical QC eye.
The generated SMIL files and the audio files are now available in the IGP:Digital Publisher Components directory. The work is done.
You can apply a little personalization on the CSS for your highlighting appearance preferences. You can give the SMIL highlighting effects of your choice book by book or even at fine text granularity.
Make sure you have included all required SMIL metadata.
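One piece of that metadata is the `media:duration` property, which the ePub3 package document carries for each media overlay. As a hedged sketch, the duration can be derived from the cue timings as below. The `clock_value` and `duration_meta` names and the `ch1_overlay` manifest ID are hypothetical, and the result should be checked against the real audio length.

```python
def clock_value(seconds):
    """Format seconds as an h:mm:ss.mmm clock value."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def duration_meta(overlay_id, cues):
    """Return a media:duration meta element for one overlay.

    cues are (target_id, start, end) tuples; overlay_id is the manifest
    ID of the SMIL file (hypothetical here).
    """
    total = sum(end - start for _, start, end in cues)
    return (
        f'<meta property="media:duration" refines="#{overlay_id}">'
        f"{clock_value(total)}</meta>"
    )


meta = duration_meta("ch1_overlay", [("azs1", 0.0, 1.5), ("azs2", 1.5, 4.0)])
```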
Finally, from IGP:Formats on Demand, click the generate ePub3 button. Wait a few seconds and the complete ePub3 audio book package is delivered, validated and ready to start earning its living.
High volume, high quality production of fine-granularity ePubs is non-trivial work whether it is for accessibility, entertainment or learning. Going past a few dozen words or lines in a kiddies book is not just more of the same. Moving audio book production to the mainstream and making it cost-effective, and as easy as it should be, takes significant tools and attention to detail.
ePub3 has the opportunity to bring audio books out of the Amazon/Audible lock-in, make them much more than an MP3 file, and bring the spoken and written word together in ways that have never been seen before.
Packaging SMIL in ePub3 at APEX@IGP Digital Formats
W3C SMIL 3 Recommendation http://www.w3.org/TR/2008/REC-SMIL3-20081201/
W3C Audio & Video activities information page. http://www.w3.org/AudioVideo/