Updated: 2012-10-20
Documents of all types have sections. Sections are document structures that interrupt the content flow and break a document into parts based on some mutually agreed and understood rules for different document genres.
Any FX document must have one or more document sections classified into a section group contained in the galley-rw element.
Sections are essential in all XML publisher content encoding, and possibly more so in IGP:FoundationXHTML (FX). In FX they are regarded as both structures and Processing Instructions. The emphasis is on accurate section semantics, which make processing and future content reuse easier.
A high-quality approach with the section elements yields a number of benefits such as:
FX defined section selectors can be viewed in the next topic.
The use of special sections in body-rw is allowed but discouraged, as by definition they are not body content. There is no rule to prevent (for example) advertisements being processed between chapters in a generated format. It is preferable for special sections to be listed after backmatter-rw and to be processed into position at format generation time.
Parent section elements must have a section group and section name value. The sequence is of no importance to processors or parsers, but for human readability they should be maintained in the group - section sequence.
The advantage of the dual selector approach is that a simple processor can use the group selector alone when that is all it needs. For example, the group selector is all that is required for most format processing. A more specific processor can target exact section groups, values or custom values.
<div class="galley-rw">
  <div class="body-rw Section-rw">
    <!-- Section Content -->
  </div>
</div>
HTML5 introduces the optional use of the <section> element which processors must understand. There are no other changes.
<div class="galley-rw">
  <section class="body-rw Section-rw">
    <!-- Section Content -->
  </section>
</div>
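The dual selector approach can be sketched in a few lines of Python. This is only an illustration, not an FX tool: the `select_sections` helper and the sample galley are assumptions made for this example, and the sketch assumes the galley is well-formed XML without namespaces. It accepts both `<div>` and `<section>` containers and does not depend on the order of the class values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample galley; class values follow the examples in this topic.
SAMPLE = """<div class="galley-rw">
  <div class="frontmatter-rw Title-rw">Title page</div>
  <div class="body-rw Chapter-rw">Chapter one</div>
  <section class="body-rw Chapter-rw">Chapter two</section>
  <div class="backmatter-rw Index-rw">Index</div>
</div>"""


def select_sections(galley, group=None, name=None):
    """Return direct child sections that carry the requested selectors.

    Either selector may be omitted. The order of the class values is
    ignored, because the sequence is not significant to processors.
    """
    matches = []
    for child in galley:
        if child.tag not in ("div", "section"):
            continue
        classes = child.get("class", "").split()
        if group is not None and group not in classes:
            continue
        if name is not None and name not in classes:
            continue
        matches.append(child)
    return matches


galley = ET.fromstring(SAMPLE)

# A simple processor only needs the group selector.
body_sections = select_sections(galley, group="body-rw")

# A more specific processor targets an exact group and section name.
index_sections = select_sections(galley, group="backmatter-rw", name="Index-rw")

print(len(body_sections), len(index_sections))
```

A format generator would typically call the group-only form, while a tool that extracts a single section type would pass both selectors.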
FX discriminates strongly between document structures and content structures.
Document structures are the dominant section separators in any specific document genre that interrupt the flow based on some generally agreed and understood rules for different document genres. Any document must have at least one section with a group value by default. It is common for some types of documents to have only a single structural section, for example a simple contract, a web page or a printable form.
Headings (or sub-section headings) are not part of the document structure as defined in FX. FX regards them as flow elements that are part of any content structure. Sections are always flow breaks.
For example a Chapter in a book may start on a recto (right-hand) page. A major section break in a business document may also result in a new page in a print presentation or online web page context. In addition there are separately maintainable structures such as document Title, Copyright, Contents, Dedication, Appendix, etc. that signify clear semantic document breaks which must be made obvious to a user by some mechanism.
Content structure consists of items such as title blocks, headings, lists, noteboxes, tables and other items that describe and qualify the semantic or presentation purpose of their enclosed content.
The FX approach makes it relatively easy to create new works from old by recombining document structures. For example, a book can be released under a reprint imprint just by changing the frontmatter sections and applying an appropriate presentation stylesheet.
FX provides content production and ownership strategies for all types of content and all distribution and fulfilment methods. A section may be created specifically for a packaging context other than print and e-books.
FX is primarily concerned with high value structural and semantic content that has value for immediate application and use into the future. The document section structure strategy is the most important component in this strategy.
FX supports a very wide range of document section structures which can be extended at any time, which can be mixed and remixed in any manner and can be transposed from one section type to another if required.
Generally, sections are used in commonly agreed and accepted patterns for specific document genres, for example in a book. However, in reuse scenarios the assembly of documents with significantly different structures is possible and likely. For example, a book chapter, an instructional topic and a magazine article may be combined in a new document for which no previously defined and agreed document structure exists. This means there must be harmony in the basic XHTML structure and naming conventions across a complete range of content genres. FX addresses this requirement at XML creation time.
A simple, but very real example is eBook re-structuring where half title pages are removed, the copyright is moved to the back of the book, template pages are inserted and other arbitrary changes are made.
Within FX, structure indicators act as primary Processing Instructions. If a processor needs to break a document apart for packaging, or any other kind of use, it can use its own processing rules to create the required output.
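The eBook re-structuring case can be sketched as a small processor that applies its own rules to the section selectors. This is a minimal illustration, not an FX tool: the `HalfTitle-rw` and `Copyright-rw` class names and the `restructure_for_ebook` function are assumptions made for this example, and the input is assumed to be well-formed XML without namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical galley. HalfTitle-rw and Copyright-rw are assumed names.
SAMPLE = """<div class="galley-rw">
  <div class="frontmatter-rw HalfTitle-rw">Half title</div>
  <div class="frontmatter-rw Title-rw">Title</div>
  <div class="frontmatter-rw Copyright-rw">Copyright</div>
  <div class="body-rw Chapter-rw">Chapter</div>
  <div class="backmatter-rw Index-rw">Index</div>
</div>"""


def restructure_for_ebook(galley):
    """Drop half title sections and move copyright sections to the end.

    The class values are left untouched; only the position changes.
    """
    # Iterate over a copy because the galley is modified in the loop.
    for section in list(galley):
        classes = section.get("class", "").split()
        if "HalfTitle-rw" in classes:
            galley.remove(section)
        elif "Copyright-rw" in classes:
            galley.remove(section)
            galley.append(section)
    return galley


galley = restructure_for_ebook(ET.fromstring(SAMPLE))
order = [section.get("class") for section in galley]
print(order)
```

Because the sections sit in a flat sequence inside the galley, the rules only need to remove or reposition whole containers and never have to look inside them.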
FX wraps all major document sections into named <div> elements, but keeps them in a flat structure. Book contents are contained in a single galley <div class="galley-rw"> element. The XHTML <body> element is not used.
A book structure is illustrated with the following FX fragment. Note that each class attribute carries two values, allowing a processor to understand the part and its membership. Each document section also has an ID, which is omitted for clarity.
<div class="galley-rw">
  <div class="metadata-rw MetadataWork-rw"> .... </div>
  <div class="frontmatter-rw Title-rw"> .... </div>
  <div class="body-rw Chapter-rw"> .... </div>
  <div class="backmatter-rw Index-rw"> .... </div>
  <div class="specials-rw Advertisement-rw"> .... </div>
  <div class="processor-rw ConfigurationFixedLayout-rw"> .... </div>
</div>
The section block tagging pattern looks like this:
Book
Metadata
Frontmatter
Body
Backmatter
Specials
Processors
Each document section is a container of tagged content that, by design, adds pragmatic container value to its semantic value. That means all content inside a specific named section selector can be processed independently of all other named section selectors, even though the internal content structures are similar. A strong example of the use of block inheritance in FX is the title block.
/* Preface Reader Titleblock Style */
.Preface-rw .title-block-rw h1 {
  font-size: 1.5em;
  line-height: 1.2em;
  font-weight: bold;
  font-style: italic;
  padding: 0 0 0.25em 0;
  color: rgb(0,100,150);
}
/* Chapter Reader Titleblock Style */
.Chapter-rw .title-block-rw h1 {
  font-size: 2em;
  line-height: 2.2em;
  font-weight: bold;
  padding: 0 0 0.25em 0;
  color: rgb(0,100,150);
}
The pragmatic approach allows the change of a single selector value to empower content reuse and remixing rather than relying on complex and confusing semantics.
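As a rough sketch of how a selector change repurposes a section, the following Python swaps class values on a section container. The `replace_selector` helper and the sample markup are assumptions made for this illustration, and the input is assumed to be well-formed XML without namespaces. After the change, a stylesheet rule scoped to `.Preface-rw .title-block-rw h1` would apply to the same title block that was previously matched through `.Chapter-rw`.

```python
import xml.etree.ElementTree as ET


def replace_selector(section, old_value, new_value):
    """Swap one class value on a section, leaving the others untouched.

    Returns True if the section carried old_value, otherwise False.
    """
    classes = section.get("class", "").split()
    if old_value not in classes:
        return False
    updated = " ".join(new_value if c == old_value else c for c in classes)
    section.set("class", updated)
    return True


# Hypothetical chapter with a title block.
section = ET.fromstring(
    '<div class="body-rw Chapter-rw">'
    '<div class="title-block-rw"><h1>Reuse</h1></div>'
    "</div>"
)

# Changing the section name is the essential step. The group selector is
# updated as well so that the section is filed with the front matter.
replace_selector(section, "Chapter-rw", "Preface-rw")
replace_selector(section, "body-rw", "frontmatter-rw")

print(section.get("class"))
```

The content inside the container is not edited at all; only the selectors on the section change.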
There are many other types of documents which have their own section structure vocabulary. FX is unique in that it can handle any type of section. The physical layout of these can also be considerably different from a book. For example, articles in an academic work, a magazine and a newspaper are all quite different in scope, size, complexity and purpose. FX has the depth to allow NITF to be generated for a news article and NLM for an academic journal article.
Following are some major document types and their document sections. Many of these are significantly different to books and do not represent page breaks.
The primary structures of periodicals are columns (department), articles and advertisements. Columns and articles are of many types and often have content blocks that continue on different pages in the print context (an invidious and evil mechanical process).
They have difficult to handle structures such as:
This list can be extended a lot further to include specific named sections and content blocks such as departments, classifieds (which should be able to go seamlessly in and out of a database), and of course the cartoon section!
Manuscripts may or may not be turned into text. Where the source document is maintained as an image, the metadata tools of FX can be used effectively to create preliminary and expanding metadata about a document image. Metadata is very important in this work as often there is no text involved or the text is not easily extracted, and certainly not with OCR. Examples include hand-written manuscripts in many languages and on many mediums including palm leaves.
Commercial documents can have parts, but generally don't have chapters. They contain sections and topics which are often arbitrary, author defined section types.
Legal documents do not break down into a significant number of parts. They are similar in style to corporate documents, but have some significant internal structures that can be beneficially maintained in template format. Many contracts can be highly repetitive and strictly maintained (such as the IGP Reseller agreement or various software licenses). Maintaining these in template form is productive and ensures there is no dilution or unwarranted modification of essential clauses.
Templated documents may not be ideal for "creative" marcom. They can be very powerful when used with repetitive marcom that uses and reuses similar specification items, such as product fact sheets.
This is a massive set of information which utilizes features of documents and more formal publications such as books.
Each of these content domains is a specialist area in itself. But all of them can be treated with a considered XHTML strategy with appropriate metadata, and still be able to interface with specific domain standards. Some of the interchange standards that can be easily processed from FX are:
NewsML - News Markup Language
NITF - News Industry Text Format
EAD - Encoded Archival Description
METS - Metadata Encoding and Transmission Standard
NLM - National Library of Medicine
TEI - Text Encoding Initiative