Audio Compression for ePub3 Textbooks

When producing digital textbooks with audio, it is important that the audio is compressed to the optimal level for the type of audio and the delivery platforms. Updated: 21 December 2013.

Overview
Technical Background
Producing Textbook Audio
Production
Summary

Overview

This article is a guideline for producing audio files for delivery in digital textbooks. It is about aggressive optimization for file size. There is also some background information on audio files in general.

The audio in this article was processed with the open source Audacity audio editing program.

All files need to be optimized for quality and file size when creating large textbooks containing a lot of audio. The default recommendations for the highest quality given by some reading system guidelines are just wrong.

When delivering textbooks, package size matters, especially in bandwidth-constrained areas such as developing countries.

With a little bit of production effort, a lot of narrative audio can be reduced to about 20% of its original file size. This means the package size is drastically reduced, or more audio can be included in the product.

Technical Background

Digital audio has two major settings that determine both the quality and the file size of audio. Raising either one improves quality but makes the file larger. These are:

Sample rate: This is the number of times per second the audio is sampled. It needs to be at least twice the highest frequency in the audio being processed. Theoretically, the bigger the better.

Bit rate: This is the number of bits processed each second to transmit or play the audio. Again, the bigger the number the better.
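
As a rough illustration of how bit rate drives file size, the sketch below estimates the size of a constant bit rate file from its bit rate and duration. The function name is hypothetical, and the estimate ignores file headers, tags and variable bit rate encoding, so real files will differ somewhat.

```python
def estimated_size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Approximate size of constant-bit-rate audio, ignoring headers and tags."""
    bits = bit_rate_kbps * 1000 * duration_s  # 1 kbps is 1000 bits per second
    return bits / 8  # 8 bits per byte


if __name__ == "__main__":
    # Compare one minute of audio at three bit rates.
    for kbps in (256, 128, 64):
        size_kb = estimated_size_bytes(kbps, 60) / 1000
        print(f"{kbps} kbps for 60 s is about {size_kb:.0f} KB")
```

Halving the bit rate halves the estimated size, which is why the bit rate setting matters so much for package size.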

For more technical information on this you can read the Wikipedia pages, although the technical information is not really required in live production.

http://en.wikipedia.org/wiki/Sampling_rate

http://en.wikipedia.org/wiki/Bit_rate

In addition, you will need to package the files into an audio format. At present that will probably be MP3, the most common and best supported audio format. More information here:

http://en.wikipedia.org/wiki/Mp3

Producing Textbook Audio

We are discussing common K-12 type textbooks here rather than those teaching music or other esoteric subjects that may need heavier file processing treatment.

Some of the types of audio that may be included for textbooks are:

Language teaching. This may be conversations, word pronunciation or any other narrative story line.
Instructions. Narratives explaining what to do, expected outcomes, etc. These may also be accessibility items.
Songs. Sing-along music with and without voices.
Sound Effects. Pops, bangs and whizzes with icons, for feedback rewards, etc.

The average human voice (child to adult, male and female) creates sounds between 100 Hz and 7000 Hz. We need a Sample Rate greater than 2 x 7000 Hz = 14000 Hz, twice the top of that range. 16000 Hz is ideal. This is also a Sample Rate used in VoIP services.
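
The reasoning above can be sketched as a small helper that picks the lowest usable sample rate for a given highest frequency. The list of candidate rates is an assumption for illustration (rates commonly offered for MP3 export), and the function name is hypothetical.

```python
# Sample rates commonly offered for MP3 export (an assumed list for illustration).
COMMON_SAMPLE_RATES_HZ = (8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000)


def pick_sample_rate(highest_frequency_hz: float) -> int:
    """Return the lowest common rate above twice the highest frequency."""
    minimum = 2 * highest_frequency_hz
    for rate in COMMON_SAMPLE_RATES_HZ:  # the tuple is in ascending order
        if rate > minimum:
            return rate
    raise ValueError("no common sample rate is high enough")


if __name__ == "__main__":
    print(pick_sample_rate(7000))  # top of the voice range quoted above
```

For a highest frequency of 7000 Hz the minimum is 14000 Hz, and the next rate up in this list is 16000 Hz.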

This Sample Rate is very suitable for sound effects and even songs.

If you want your music to be a little sharper in the higher registers you can move to 22050 Hz, but there is probably no real practical reason to. Let your ears be the judge.

The following example shows a 44100 Hz/256 kbps file converted down to 16000 Hz/125 kbps for dramatic file size reduction.

Production

We are assuming your audio has been produced reasonably professionally and you have been delivered large audio files.

The audio file is courtesy Phoenix Publishing House, Quezon City, Philippines.

This demo conversation has an out-of-studio Sample Rate of 44.1 kHz and a Bit Rate of 256 kbps. These 11 seconds of audio have a file size of 259 KB.

It has been recorded in stereo with audio balanced on both tracks. It also has been recorded at a slightly low level.

The source file waveform. The recording level is a little low.

Play the audio at 44100 Hz / 256 kbps

Here is the same audio exported from Audacity at a 16000 Hz sample rate and MPEG quality option 6 (95-135 kbps), without any other processing. The file size is reduced to 49 KB.

Play the audio at 16000 Hz / 125 kbps
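
To put the two file sizes side by side, here is a minimal sketch of the arithmetic using the 259 KB source and the 49 KB export from this demo. The function name is hypothetical.

```python
def percent_of_original(original_kb: float, compressed_kb: float) -> float:
    """Return the compressed size as a percentage of the original size."""
    return compressed_kb / original_kb * 100


if __name__ == "__main__":
    source_kb = 259    # 44100 Hz / 256 kbps source file
    exported_kb = 49   # 16000 Hz export at quality option 6
    share = percent_of_original(source_kb, exported_kb)
    print(f"The export is {share:.1f}% of the source size")
    print(f"That is a saving of {100 - share:.1f}%")
```

The export comes out at a little under a fifth of the source size.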

Finally, the Audacity Effects processors were used to adjust the levels. Leveller was set to moderate and applied. This is useful for balancing softer and louder sounds. It should only be used for conversation-type audio and never with music.

The audio was then Normalized using the default settings. The file size is a little larger at 56 KB.
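
For readers curious what normalizing does to the samples, the following is a minimal sketch of peak normalization. It is not Audacity's implementation. It assumes samples are floats between -1.0 and 1.0, and the -1 dB target is an assumption, so check the value shown in your own Audacity version. Any DC offset removal is left out.

```python
def normalize_peak(samples, target_db=-1.0):
    """Scale samples so the loudest one sits at target_db relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    target_amplitude = 10 ** (target_db / 20)  # convert decibels to linear amplitude
    gain = target_amplitude / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet = [0.1, -0.25, 0.2]  # a recording made at a low level
    print(normalize_peak(quiet))
```

Every sample is multiplied by the same gain, so a recording made at a slightly low level is raised without changing the balance between its soft and loud parts.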

The audio after the Leveller and Normalize Effects tools have done their work.

Final Audio.

When exporting your MP3 from Audacity also select:

Channel Mode: Joint Stereo.

Bit Rate Mode: Variable. Speech has a lot of silent gaps and this reduces the file size.
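
If you have many files to convert, a command-line encoder may be quicker than exporting each one by hand. The sketch below is not part of the Audacity workflow described in this article. It assumes the ffmpeg tool with its libmp3lame encoder is installed, and it assumes that VBR quality 6 roughly corresponds to Audacity's quality option 6. The function name and file names are hypothetical. It only builds the command and does not run it.

```python
def build_ffmpeg_mp3_command(source, target, sample_rate_hz=16000, vbr_quality=6):
    """Build an ffmpeg argument list for a variable-bit-rate, joint stereo MP3."""
    return [
        "ffmpeg",
        "-i", source,                # input file, for example the studio WAV
        "-ar", str(sample_rate_hz),  # resample to the target sample rate
        "-codec:a", "libmp3lame",    # encode the audio with the LAME MP3 encoder
        "-q:a", str(vbr_quality),    # VBR quality: lower numbers mean larger files
        "-joint_stereo", "1",        # allow joint stereo coding
        target,
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    print(" ".join(build_ffmpeg_mp3_command("conversation.wav", "conversation.mp3")))
```

The returned list can be passed to Python's subprocess.run once you have checked the settings against a listening test.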

Summary

With a little bit of knowledge and effort, publishers can create audio files for digital textbooks that are small, fast to load and sound subjectively as good as their larger studio source files.

The only way to assess audio processing and compression is with listening tests. If you have compressed using the guidelines above (or even more aggressively), let your ear be the judge. Avoid A/B comparisons of the studio audio with the delivered audio. Do your listening tests for clarity and quality using the equipment that will be used in the classroom, whether computer or mobile device.

Record at the highest quality and archive the files. Use a process similar to the one outlined above to get the smallest package size and save everyone money and time.