APEX@IGP-FX

Infogrid Pacific – The Science of Information


Frequently Asked Questions

Knowing that we have several thousand books that we would like to have available for the new system, how do you see this conversion to XML being done, and would you recommend that we create a new content model, use an existing internal model, or adopt an open model?

This is correctly positioned as question No. 1, as it is definitely the most important issue for long-term modular content ownership.

All well-designed and implemented content models can be effective as technology strategies. Each approach has a different set of complexities and maintenance issues. There is always a better technology, and a set of pundits pushing a particular approach. The world of XML schemas and interchange protocols has probably been one of the most fragmented. We do not take a technology position on the content model requirements but rather a pragmatic one. We look at advanced, complex content business requirements and say that we must be able to show that the content model we recommend delivers everything a customer needs today, and that it can reasonably be expected to address unknown future requirements.

Based on this, our recommendation is to use W3C-standard XHTML with a controlled class attribute grammar design. This is an open model with some new aspects to consider. Our strategy has been demonstrated to address all known publisher tagging-for-reuse issues, and it is the lowest possible long-term cost-of-ownership solution for large collections of content.
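As a minimal sketch of what a controlled class attribute grammar can look like (the class names below are illustrative only, not the actual controlled vocabulary), the structural elements stay plain XHTML while the publishing semantics are carried in the class values:

    <div class="chapter" id="chap-03">
      <h1 class="title-chapter">The Water Cycle</h1>
      <p class="para-first">Evaporation begins the cycle over the oceans.</p>
      <p class="para">Condensation follows as moist air rises and cools.</p>
      <div class="sidebar-casestudy">
        <h2 class="title-sidebar">Case study: river basin management</h2>
        <p class="para">A short, self-contained narrative that could be reused on its own.</p>
      </div>
    </div>

Because the element set is plain XHTML, any browser or toolchain can process it immediately; it is the controlled class vocabulary that gives each block its reuse semantics.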

This approach has many advantages:

  1. XHTML is the lingua franca of the Internet and a universal content model. The core element structure is simple but sufficient for any content-processing need.

  2. XHTML offers the lowest possible cost of conversion from legacy content to reusable content. We have the tools and processes to ensure that the interpretation of complex page layouts is handled in a consistent manner.

  3. It establishes the lowest possible cost for development of supporting processors.

  4. Subjective quality assessment can be done in a standards-compliant browser, and it is WYSIWYG: what you see, in any given stylesheet instance, is what you get.

  5. If the stylesheet collapses, you still get working content.

  6. The block progression model is essentially flat, making XSL and other processing easier, more reusable, and often hardly required at all, especially if commerce at the XML block level becomes an objective at some point.

  7. It is easy to process with legacy technology, and it is ready for the newest technology.

Our deep experience tagging the widest range of content has shown that models such as DocBook, TEI, and complex custom XML schemas do not have the flexibility to address all content variations, which are in effect infinite. In the past we have worked with many customers as they developed custom XML schemas.

The work effort differs considerably between retrodigitizing to a schema and front-list publishing with a schema. In the latter case the authoring is constrained by the environment. Retrodigitization has to deal with books that were set in other environments with different editorial and typesetting standards. Tagging is the interpretation of existing work to a harmonious standard. That is what we specialize in: a controlled environment for tagging content consistently, no matter what the content or its source.

This document is probably not the place to describe the entire strategy, but in summary, because of the care taken with ID and class semantics, we have an XHTML strategy that can do anything. We have processed XHTML 1.1 with our ID and class semantics, and have even processed XHTML back to complex DocBook with the usual custom DTD extensions.

User-initiated front-list assembly of content in a browser needs to be done in a visual manner. It has to be XHTML; there is no other practical option. Therefore, if the content can be created, archived, processed, presented, and compiled without additional conversion steps, the benefits are immediately apparent in the bottom line.

Is XHTML really XML?

Yes. That is the whole point of it. It is a standard maintained by the World Wide Web Consortium (W3C) and has several variants, including Transitional and Strict. We generally use Strict, but some aspects of practical tagging mean the Transitional DTD is needed.
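For example, a minimal XHTML 1.0 Strict document is simultaneously a valid web page and a well-formed XML document that any XML parser will accept:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Chapter One</title>
      </head>
      <body>
        <p>Every element is closed and correctly nested, so the file parses with any standard XML toolchain.</p>
      </body>
    </html>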

What are some of the key considerations for us to address during content creation to ensure that the content is both modular enough and developed in such a way as to facilitate the custom publishing model envisaged?

The application philosophy is very clear on the modularization of content. Every publisher has a different number and set of modular requirements. We support any number of object genres, and objects can be harvested from, and replaced in, active content.

For example, a textbook is digitized. Generally a chapter would be regarded as a suitable Content Object to be used for rebuilding a custom course. However, if case studies and figures were thought to be valuable as objects, and reuse rights were clear, they could be extracted as well.

Because of the legacies of publishing, let's say this book has a single glossary and book-level end notes. It may also have references to earlier or later chapters. There are multiple strategies that can be considered for this specific scenario:

  1. The notes and glossary terms can be compiled into the object at object-creation time (probably the best strategy; see the sketch after this list).

  2. The glossary and book notes can be included in the object as a value addition.

  3. The notes and glossary references can be processed out.
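To illustrate the first strategy, here is a hedged sketch (class names and structure are illustrative only, not the production tagging) of a chapter extracted as a content object with its glossary entries and end notes compiled in at object-creation time:

    <div class="content-object chapter" id="obj-chap-05">
      <h1 class="title-chapter">Photosynthesis</h1>
      <p class="para">Light energy is captured by
        <a class="glossary-ref" href="#gl-chlorophyll">chlorophyll</a> in the leaf.</p>
      <div class="glossary-local">
        <dl>
          <dt id="gl-chlorophyll">chlorophyll</dt>
          <dd>The green pigment that absorbs light for photosynthesis.</dd>
        </dl>
      </div>
      <div class="notes-local">
        <ol>
          <li id="note-12">Book end note 12, carried into the object so it stands alone.</li>
        </ol>
      </div>
    </div>

The object then carries everything it needs to be reused in a custom course without reaching back to the source book.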

Another significant issue is special content such as maths and chemistry (for a start), and how you see it being produced and presented. The decision here will be a significant cost contributor to a number of parts of the solution.

Our core content object philosophy is about user needs, not technology. We address the immediate business drivers while providing the flexibility for emerging content strategies. The fact that the content is stored as XML in one form or another is a given. But our focus is business contexts, using semantic content (the content object has some meaning and value for the user), for immediate delivery in multiple formats and channels. The application brings the massive dual benefits of immediate cost savings and an immediate increase in sales options and opportunities from existing content assets.

XML database systems may let you discover content and assemble it through some means, but there is still an implied cost of packaging and publishing after the discovery process. Our application eliminates last-mile costs and uncertainties by ensuring the initial packaging is right for any subsequent use.

For example, the methods and site operation of many online content assembly sites that compile PDFs are tradesmanlike and do work: chapters can be selected and assembled. But the approach tends to fail from an end-user perspective in a number of areas. The user can't see what they are getting. They have limited customization options. The stored content is not flexible enough to provide the value-added customizations clients expect. Output options are limited. And content often needs to be assembled in from other sources.

We take the approach that publishers already know what they have, what they want, and to whom they want to sell their content. It already has semantic value and/or pedagogical characteristics; that just needs to be harnessed with better metadata and a supporting assembly and distribution environment. Obviously content collections can be very large, staff change, and customers change, so discovery is the leading business driver. Some content ages gracefully and is reusable more or less forever (classic literature, first-principles curriculum education), some content can be extended with updating and modification (travel guides and catalogues, vocational material), and some content has a six-month shelf life.

Identify the top challenges that could prevent this project from being a success.

Risk analysis is an important issue. However, because our solution is fully functional and working, from manuscript to formats and on to distribution, the development of a core working variable-content application is not a risk. It is the associated issues that would affect success.

  1. A decision that content must be in some other XML schema for other business objectives. This immediately eliminates our framework unless that XML is converted to XHTML for content object management. It would also increase the cost of retrodigitization production and content reuse, probably two to three times.

  2. Lack of clarity in the required business rules and user stories, and insufficient testing and critiquing of these using spike development. These types of projects cannot sustain a waterfall development model.

It's a given that requirements change. How does your implementation process allow for change while keeping projects on time and on budget?

Our core development process is RAD, collaborative, and consultative. This is a new area for publishers, and we are possibly the most experienced practitioners of business-driven variable content models. We scope each requirement change and advise the customer whether the change is feasible within the project budget and timetable. We also assess all requirements against our development plans to see if strategies can be implemented earlier.

The core application has been designed for integration and change. It has also been designed for extension with additional services.

Your strategy states that you "strongly discourage the use of proprietary XML schemas". We currently require that all composition vendors XML-tag all books using our DTD. How will this affect your proposed solution?

Not at all. Your content can be tagged in your XML, but it would be converted to IGP:FoundationXHTML for long-term management. The result would be a digitization cost increase. Another factor to take into account is that this job is not composition but retrodigitization. If all legacy data is available in your XML, it MAY be possible to do a conversion through processors. This would have to be evaluated.

However, a number of issues arise in this scenario. Books that are already in the marketplace inevitably have page references made to them from other sources. If the XML is reflowed, that original pagination (and lineation) is lost. If this is not an issue, then XML reflow is adequate.

All of your examples appear to be trade- or professional-oriented; have you ever worked with a higher education publisher? If so, what did you do?

Yes. We only use trade- and retail-type content in sales demonstrations to get the concept across. The system is capable of assembling complex content, as well as sites with supporting content such as rich media, downloadable files, and interactive quizzes.

In terms of page display within our system, how do you plan to address the issue of proper image placement, with respect to a printed page vs. XML layout?

We use one of two technologies: a CSS-3 renderer (preferred because of its cost of ownership and flexibility) and/or XSL:FO rendering. Both are able to position images. Images are tagged at production time for their flow state, and this can be modified prior to rendering in a flow-check loop. Other flow parameters can also be adjusted, such as text alignment, word and character kerning, hyphenation, and widow/orphan control. Setting figure captions to the outer margin is supported, and generated text such as running headers, renumbered chapters and figures, and tables of contents is handled.
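As a sketch of what flow-state tagging for an image might look like (the class names are hypothetical, not the production grammar), the figure block carries its placement hint so the renderer can position it and the flow-check loop can change it without touching the content:

    <div class="figure float-top" id="fig-3-2">
      <img src="images/fig-3-2.png" alt="Rainfall distribution map" />
      <p class="caption">Figure 3.2 Rainfall distribution across the region</p>
    </div>

Changing the flow-state class in the flow-check loop is a tagging adjustment, not a content change.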

Original pagination is always preserved which means legacy page references can be hyperlinked to the correct location even with reflow. In addition original page breaks can be presented in a number of ways – or not at all.
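A common way to preserve original pagination in reflowed XHTML is an empty, anchored marker at each original page break (illustrative markup only):

    <span class="page-break" id="page-214" title="214"></span>

A legacy reference such as

    <a href="chapter05.html#page-214">see page 214</a>

still resolves to the correct location after reflow, while the stylesheet decides whether the marker is rendered as a visible page number, a margin note, or nothing at all.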

Our system uses three stylesheets, each optimized for its application: an editing stylesheet, a Reader presentation stylesheet, and a print stylesheet (when the CSS-3 renderer is used). The print stylesheet can also be used with office printers if DRM-enabled. The Reader stylesheet is also used as the style-processing base for additional online and e-book formats.
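In XHTML terms, the same document can simply be bound to the appropriate stylesheet for each channel. A hypothetical head fragment (file names are illustrative only) might look like this:

    <link rel="stylesheet" type="text/css" href="igp-reader.css" media="screen" />
    <link rel="alternate stylesheet" type="text/css" href="igp-editing.css" media="screen" title="Editing" />
    <link rel="stylesheet" type="text/css" href="igp-print.css" media="print" />

The content stays identical; only the presentation layer changes with the delivery context.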

 

 © 2005-2012 Infogrid Pacific. All rights reserved.
