Infogrid Pacific-The Science of Information


This IGP:FoundationXHTML specification document is designed to be accessible to both non-technical and technical users. It combines philosophical rationale, explanation, practical comments and code fragments to convey, as completely and clearly as possible, the tagging strategies used to create a reusable, future-proof XML strategy for content owners of all types.

This IGP:FoundationXHTML (FX) Specification is not a theoretical examination or representation of the structure of a document. It presents practical, real-world strategies that are the culmination of well over a decade of experience analysing, tagging and processing over 30 million pages of high-quality content with many different XML DTDs and schemas.

IGP:FoundationXHTML presents a real-world content ownership strategy that actually works. It allows the use of the best tools available today, addresses legacy, current and emerging content issues, and can survive into the future.

In the late 1990s, XML knowledge and experience were limited and tools were crude. Browser support for XML was virtually non-existent. By 2007 the changing landscape of the Internet, the ascendancy of mobile devices, and the persistence of the Open Source community in implementing ever-improving standards-based applications had produced better open methods for complex content ownership than complex custom or specialist content DTDs or proprietary systems.

IGP:FoundationXHTML takes this freedom of standards, processors, and presentation frameworks, starts with a clean slate, and drops a decade of legacy thinking about encoding complex unstructured content into the rubbish bin. The power of XHTML 1 as the de facto method to tag, process and print content has not yet hit home in most areas because there is this crazy notion that semantics is more important than structure. There are even those who still do not think XHTML is XML (which of course it is). Dominant alternatives such as DocBook and TEI both carry legacy overheads that create confusion and cost. They must still be transformed into HTML or XHTML for final presentation on most portable devices, and they bring no advantages for print processing, only more cost and lower quality output.

I would like to thank all those who assisted me in the development and refinement of IGP:FoundationXHTML.

Much of this assistance was unintentional but contributed to the learning curve. There was the customer who insisted that complex academic content be tagged in DocBook, and then found that advanced drama and poetry just cannot be done without extensive customization; and the customer who wanted TEI, but had to constantly customize it for their own content issues. There were the arbitrary change horrors of early computer book sites, and the stylistic customization of OEB for various aggregators and formats. There was the Mobipocket format, which has been a constant pain and even today is little more than 1997 HTML 3.2. And there were so many more. Thank you for the experience, if not the pains.

Developing DX-XML from 1999 to 2001 while operating Versaware/Digital Publishing Solutions and DX Technologies was a large contributing experience, and for three years meant looking at the mind-numbing details of a vast library of content. Especially valuable was the trust and cooperation of Taylor and Francis, where the focus was retro-digitization and the special requirements of complete and accurate content capture and tagging from print. DX-XML expanded into one of the most complete, powerful and practical XML schemas available, and specifically eliminated compromises and weaknesses of DocBook and TEI. But DX-XML, designed for retro-digitization and multi-format production, had significant limitations when applied to front-list publishing and non-book content. Like all "clever" XML content schemas, it contained significant complexities of implied content value that only introduced production and processing cost overheads.

The opportunity to design a better XML encoding system was a luxury that emerged in 2005, at the same time as a number of developments that were emerging then and continue to emerge: Web 2.0, Service Oriented Architectures, PrinceXML for print from CSS (destroying XSLT), ePub as the e-book format with the most potential, and more pervasive and variable use and reuse of content.

After a few false starts, and a bit of cogitation on the issue, the penny dropped. If everything is processed to XHTML/HTML for final presentation, and PrinceXML and Antenna House Formatter allow high-quality print processing from XHTML/CSS, why use any intermediary XML encoding? So the outcome of time and experience in the content trenches is IGP:FoundationXHTML. It is complete and complex, can address any content need, and is easy to create, add value to, cross-walk and process for any conceivable purpose.

I reserve my most special thanks for the thousands of production editors who actually did the work in all the companies I managed through these last two decades, and whose tireless professionalism proved that the tools, schemas and systems work: Reality Information Systems, Versaware, Digital Publishing Solutions, DX Technologies, Estel Conversion Labs, and of course Infogrid Pacific. Exceptional content processing ultimately requires real people with dedication to quality. Thank you all.

Richard Pipe
Auckland, New Zealand
