APEX@IGP-FX

Infogrid Pacific-The Science of Information

13

F0: Sections

Updated: 2012-10-20

Overview

Documents of all types have sections. Sections are document structures that interrupt the content flow and break a document into parts based on some mutually agreed and understood rules for different document genres.

Any FX document must have one or more document sections classified into a section group contained in the galley-rw element.

Sections are essential in all XML publisher content encoding, and possibly more so in IGP:FoundationXHTML (FX). In FX they are regarded as both structures and Processing Instructions. The emphasis is on accurate section semantics, which makes processing and future content reuse easier.

A high-quality approach with the section elements yields a number of benefits such as:

  1. Section type identification
  2. Primary document navigation
  3. Section styling and presentation variations
  4. Packaging into different types of packages (ePub, Kindle, AZD, SCORM, etc.)
  5. Splitting a document into separate sub-documents
  6. Extracting and reusing content for any new context
  7. Generating named lists such as TOC, LOT and LOF
  8. Section, header and page number counter controllers
  9. Note and footnote processing and positioning
  10. Rights tracking

The FX-defined section selectors are listed in the next topic.

Section Selector Rules

  1. All sections must be direct (first-level) children of galley-rw.
  2. Sections must be part of a section group within a document. Section groups are currently frontmatter-rw, body-rw, backmatter-rw, specials-rw, metadata-rw, processor-rw.
  3. Section names are always expressed in full and keyed in camel case with an initial capital, e.g. ByTheSameAuthor-rw. This allows the section value to be used for presentation and clearly distinguishes section names from all other CSS selectors, which are lowercase.
  4. Named special sections can appear in frontmatter-rw, body-rw, backmatter-rw, or in their own specials-rw place before or after front and back matter.
  5. If included, metadata sections (metadata-rw) must always be at the start of a document.
  6. If included, processor instruction sections (processor-rw) must always be the last sections in a document.
  7. Section selectors can be extended with custom values.
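Rules 2, 5 and 6 lend themselves to a mechanical check. The following is a minimal Python sketch, not part of FX itself: the function name `check_section_order` is hypothetical, and it assumes the galley is the root element of a well-formed, namespace-free XHTML fragment.

```python
import xml.etree.ElementTree as ET

# Section groups as listed in rule 2.
SECTION_GROUPS = {
    "frontmatter-rw", "body-rw", "backmatter-rw",
    "specials-rw", "metadata-rw", "processor-rw",
}

def check_section_order(markup):
    """Return a list of rule violations found in a galley (empty if none)."""
    galley = ET.fromstring(markup)  # assumption: the galley is the root element
    problems = []
    groups = []
    for index, section in enumerate(galley):
        found = [t for t in section.get("class", "").split() if t in SECTION_GROUPS]
        if len(found) != 1:
            problems.append(f"section {index}: expected exactly one section group")
        else:
            groups.append(found[0])
    # Rule 5: metadata-rw sections, if present, sit at the start.
    first_other = next(
        (i for i, g in enumerate(groups) if g != "metadata-rw"), len(groups))
    if "metadata-rw" in groups[first_other:]:
        problems.append("metadata-rw section is not at the start")
    # Rule 6: processor-rw sections, if present, are the last sections.
    first_processor = next(
        (i for i, g in enumerate(groups) if g == "processor-rw"), len(groups))
    if any(g != "processor-rw" for g in groups[first_processor:]):
        problems.append("processor-rw section is not at the end")
    return problems
```

An empty list means the checked rules hold. The other rules (for example the naming convention in rule 3) are not covered by this sketch.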

The use of special sections in body-rw is allowed but discouraged because, by definition, they are not body content. There is no rule to prevent (for example) advertisements being processed between chapters in a generated format. It is preferable for special sections to be listed after backmatter-rw and to be processed into position at format generation time.

Parent section elements must have both a section group value and a section name value. Their order is of no importance to processors or parsers, but for human readability they should be kept in group-then-section order.

The advantage of the dual selector approach is that a simple processor can use the group selector when required. For example the group selector is all that is required for most format processing. A more specific processor can target exact section groups, values or custom values.
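The dual selector can be illustrated with a short Python sketch that separates a class attribute into its group, section name and any remaining values. The function name `parse_section_class` is hypothetical. Treating every leftover token as a custom value is an assumption, as is recognising section names by their initial capital and -rw suffix.

```python
# Section groups as currently defined for FX.
SECTION_GROUPS = {
    "frontmatter-rw", "body-rw", "backmatter-rw",
    "specials-rw", "metadata-rw", "processor-rw",
}

def parse_section_class(class_attr):
    """Split a class attribute into (group, section_name, custom_values)."""
    tokens = class_attr.split()
    groups = [t for t in tokens if t in SECTION_GROUPS]
    # Section names are camel case with an initial capital; other selectors are lowercase.
    names = [t for t in tokens if t[:1].isupper() and t.endswith("-rw")]
    if len(groups) != 1 or len(names) != 1:
        raise ValueError(f"expected one group and one section name: {class_attr!r}")
    # Assumption: anything else on the element is a custom value.
    custom = [t for t in tokens if t not in groups and t not in names]
    return groups[0], names[0], custom
```

A simple processor would look only at the first item of the result, while a more specific one can match on the section name or the custom values. Because the tokens are classified rather than read by position, the order of the class values does not matter.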

XHTML

Tagging Pattern: galley and section

<div class="galley-rw">
    <div class="body-rw Section-rw">
        <!-- Section Content -->
    </div>
</div>

HTML5

HTML5 introduces the optional use of the <section> element which processors must understand. There are no other changes.

Tagging Pattern: Section group and section

<div class="galley-rw">
    <section class="body-rw Section-rw">
        <!-- Section Content -->
    </section>
</div>
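Because the two patterns differ only in the element name, a processor can treat them alike. The sketch below is one way to do that in Python with the standard library. The function name `galley_sections` is hypothetical, and the markup is assumed to be well-formed and free of XML namespaces.

```python
import xml.etree.ElementTree as ET

def galley_sections(markup):
    """Return (tag, class tokens) for each section that is a direct child of the galley."""
    root = ET.fromstring(markup)
    if root.get("class") == "galley-rw":
        galley = root
    else:
        # Look for the galley among the descendants of the root.
        galley = root.find(".//*[@class='galley-rw']")
    if galley is None:
        raise ValueError("no galley-rw element found")
    return [
        (child.tag, child.get("class", "").split())
        for child in galley
        if child.tag in ("div", "section")  # XHTML and HTML5 patterns
    ]
```

Applied to either tagging pattern above, the class tokens returned are the same and only the tag differs.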

Document Structure vs. Content Structure

FX discriminates strongly between document structures and content structures.

Document structures are the dominant section separators in a document genre. They interrupt the flow according to rules that are generally agreed and understood for that genre. Any document must have at least one section with a group value by default. It is common for some types of documents to have only a single structural section, for example a simple contract, a web page or a printable form.

Headings (or sub-section headings) are not part of the document structure as defined in FX. FX regards them as flow elements that are part of any content structure. Sections are always flow breaks.

For example a Chapter in a book may start on a recto (right-hand) page. A major section break in a business document may also result in a new page in a print presentation or online web page context. In addition there are separately maintainable structures such as document Title, Copyright, Contents, Dedication, Appendix, etc. that signify clear semantic document breaks which must be made obvious to a user by some mechanism.

Content structure consists of items such as title blocks, headings, lists, noteboxes, tables and other items that describe and qualify the semantic or presentation purpose of their enclosed content.

The FX approach makes it relatively easy to create new works from old by combining document structures in different ways. For example, a book can be released under a reprint imprint simply by changing the frontmatter sections and applying an appropriate presentation stylesheet.

Document Sections

FX provides content production and ownership strategies for all types of content and all distribution and fulfilment methods. A section may be created specifically for a packaging context other than print and e-books.

FX is primarily concerned with high-value structural and semantic content that has value for immediate application and for use into the future. The document section structure is the most important component of this strategy.

FX supports a very wide range of document section structures. These can be extended at any time, mixed and remixed in any manner, and transposed from one section type to another if required.

Generally, sections are used in commonly agreed and accepted patterns for specific document genres, for example in a book. However, in reuse scenarios the assembly of documents with significantly different structures is possible and likely. For example, a book chapter, an instructional topic and a magazine article may be combined in a new document for which there is no previously defined and agreed document structure. This means there must be harmony in the basic XHTML structure and naming conventions across a complete range of content genres. FX addresses this requirement at XML creation time.

A simple, but very real example is eBook re-structuring where half title pages are removed, the copyright is moved to the back of the book, template pages are inserted and other arbitrary changes are made.
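This kind of restructuring can be sketched in Python as follows. The section names HalfTitle-rw and Copyright-rw are hypothetical stand-ins, and `restructure_for_ebook` is not an FX function. Regrouping the moved copyright section as backmatter-rw, and placing it ahead of any processor-rw sections, are assumptions made to keep the result consistent with the section selector rules.

```python
import xml.etree.ElementTree as ET

def classes(section):
    """Return the class tokens of a section element."""
    return section.get("class", "").split()

def restructure_for_ebook(galley):
    """Drop half-title sections and move copyright sections to the back."""
    for section in list(galley):  # iterate over a snapshot while editing the galley
        tokens = classes(section)
        if "HalfTitle-rw" in tokens:      # hypothetical section name
            galley.remove(section)
        elif "Copyright-rw" in tokens:    # hypothetical section name
            galley.remove(section)
            # Assumption: the moved section is regrouped as back matter.
            tokens = ["backmatter-rw" if t == "frontmatter-rw" else t for t in tokens]
            section.set("class", " ".join(tokens))
            # Keep processor-rw sections last: insert before the first one.
            position = len(galley)
            for i, other in enumerate(galley):
                if "processor-rw" in classes(other):
                    position = i
                    break
            galley.insert(position, section)
    return galley
```

Inserting template pages would follow the same pattern: build a new section element with the required group and section name, then insert it at the chosen position.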

Sections are Implied Processing Instructions

Within FX, structure indicators act as primary Processing Instructions. If a processor needs to break a document apart for packaging, or for any other kind of use, it can apply its own processing rules to create the required output.

FX wraps all major document sections into named <div> elements, but keeps them in a flat structure. Book contents are contained in a single galley <div class="galley-rw"> element. The XHTML <body> element is not used.

A book structure is illustrated with the following FX fragment. Note that each element carries two class values, allowing a processor to understand the part and its membership. Each document section also has an ID, which is omitted here for clarity.

A book example

Tagging Pattern: Book example

<div class="galley-rw">
  <div class="metadata-rw MetadataWork-rw"> .... </div>
  <div class="frontmatter-rw Title-rw"> .... </div>
  <div class="body-rw Chapter-rw"> .... </div>
  <div class="backmatter-rw Index-rw"> .... </div>
  <div class="specials-rw Advertisement-rw"> .... </div>
  <div class="processor-rw ConfigurationFixedLayout-rw"> .... </div>
</div>
</div>
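Because the sections sit flat inside the galley, breaking a book apart for packaging needs very little logic. The following Python sketch wraps each section in its own galley. The function name `split_into_subdocuments` is hypothetical, and the input is assumed to be well-formed, namespace-free markup with the galley as its root element.

```python
import xml.etree.ElementTree as ET

def split_into_subdocuments(markup):
    """Return one serialized galley per section of the source galley."""
    galley = ET.fromstring(markup)  # assumption: the galley is the root element
    documents = []
    for section in galley:
        section.tail = None  # drop the whitespace that followed the section
        wrapper = ET.Element("div", {"class": "galley-rw"})
        wrapper.append(section)
        documents.append(ET.tostring(wrapper, encoding="unicode"))
    return documents
```

For the book fragment above this yields six sub-documents, one per section. A real packager would apply its own rules, for example keeping all frontmatter-rw sections together in one output file.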

The section block tagging pattern looks like this:

[Diagram: a Book block containing, in order, Metadata, Frontmatter, Body, Backmatter, Specials and Processors blocks]

Using Document Sections Pragmatically

Each document section is a container of tagged content. By design, the container adds pragmatic value on top of its semantic value. That means all content inside a specific named section selector can be processed independently of all other named section selectors, even though the internal content structures are similar. A strong example of the use of block inheritance in FX is the title block.

CSS: Custom section styling by selector

/* Preface Reader Titleblock Style */
.Preface-rw .title-block-rw h1 {
  font-size: 1.5em;
  line-height: 1.2em;
  font-weight: bold;
  font-style: italic;
  padding: 0 0 0.25em 0;
  color: rgb(0,100,150);
}
/* Chapter Reader Titleblock Style */
.Chapter-rw .title-block-rw h1 {
  font-size: 2em;
  line-height: 2.2em;
  font-weight: bold;
  padding: 0 0 0.25em 0;
  color: rgb(0,100,150);
}

The pragmatic approach allows the change of a single selector value to empower content reuse and remixing rather than relying on complex and confusing semantics. 
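As an illustration, changing that single value can be as small as the following Python sketch. The function name `transpose_section` is hypothetical, and reusing a book chapter as a Topic-rw section is only an example.

```python
def transpose_section(class_attr, old_name, new_name):
    """Swap one section name for another, leaving all other class values alone."""
    tokens = class_attr.split()
    if old_name not in tokens:
        raise ValueError(f"{old_name!r} not found in {class_attr!r}")
    return " ".join(new_name if token == old_name else token for token in tokens)
```

After the swap, stylesheet rules scoped to the old section name, such as `.Chapter-rw .title-block-rw h1`, no longer match the section, and any rules scoped to the new name apply instead. The content inside the section is untouched.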

Other Document Genre Section Structures

There are many other types of documents which have their own section structure vocabulary. FX is unique in that it can handle any type of section. The physical layout of these can also be considerably different from a book. For example, articles in an academic work, a magazine and a newspaper are all quite different in scope, size, complexity and purpose. FX has the depth to allow NITF to be generated for a news article and NLM for an academic journal article.

Following are some major document types and their document sections. Many of these are significantly different to books and do not represent page breaks.

Periodicals (Magazines & Newspapers)

The primary structures of periodicals are columns (departments), articles and advertisements. Columns and articles are of many types and often have content blocks that continue on different pages in the print context (an invidious and evil mechanical process).

They have difficult-to-handle structures such as:

  • Page-rw
  • Section-rw
  • SectionContinued-rw
  • Article-rw
  • ArticleContinued-rw
  • Column-rw / Department-rw (synonym)
  • ColumnContinued-rw, DepartmentContinued-rw
  • Advertisement-rw

This list can be extended a lot further to include specific named sections and content blocks such as departments, classifieds (which should be able to go seamlessly in and out of a database), and of course the cartoon section!

Historical Manuscripts

Manuscripts may or may not be turned into text. Where the source document is maintained as an image, the metadata tools of FX can be used effectively to create preliminary and expanding metadata about a document image. Metadata is very important in this work because often there is no text involved, or the text is not easily extracted, and certainly not with OCR. Examples include handwritten manuscripts in many languages and on many media, including palm leaves.

  • covers
  • Folio-rw
  • Item-rw

Business Documents

Commercial documents can have parts, but generally don't have chapters. They contain sections and topics which are often arbitrary, author-defined section types.

  • TitlePage-rw
  • DocumentControl-rw
  • Contents-rw
  • Topic-rw
  • Section-rw
  • Appendix-rw
  • References-rw

Legal documents

Legal documents do not break down into a significant number of parts. They are similar in style to corporate documents, but have some significant internal structures that can be beneficially maintained in template format. Many contracts can be highly repetitive and strictly maintained (such as the IGP Reseller agreement or various software licenses). Maintaining these in template form is very beneficial and productive, and ensures there is no dilution or unwarranted modification of essential clauses.

  • Parties-rw
  • Recitals-rw
  • Signatures-rw
  • Schedules-rw
  • Appendices-rw

Marketing documents (brochures)

Templated documents may not be ideal for "creative" marcom. Templating can be very powerful, however, when used with repetitive marcom that uses and reuses similar specification items, such as product fact sheets.

  • outside-front
  • outside-back
  • inside-left
  • inside-right
  • inside
  • branding-block
  • product-block
  • contact-block

Learning Collateral

This is a massive set of information which utilizes features of documents and more formal publications such as books.

  • Unit-rw
  • Topic-rw
  • Lesson-rw
  • LessonPlan-rw
  • AssignmentRequirement-rw
  • Assignment-rw
  • SelfStudy-rw

Social Web

  • WebPage-rw
  • BlogPage-rw
  • ForumTopic-rw
  • WikiPage-rw

Each of these content domains is a specialist area in itself. But all of them can be treated with a considered XHTML strategy with appropriate metadata, and still be able to interface with specific domain standards. Some of the interchange standards that can be easily processed from FX are:

NewsML - News Markup Language

NITF - News Industry Text Format

EAD - Encoded Archival Description

METS - Metadata Encoding and Transmission Standard

NLM - National Library of Medicine

TEI - Text Encoding Initiative
