IGP:FoundationXHTML (FX) has a set of inline elements for tagging inline structure, presentation, semantics, interactivity and links. FX uses a combination of HTML/5 defined elements and the <span> element with controlling selectors.
FX inline elements and selectors are defined at these levels:
Semantic. There is a set of semantic selectors to allow inline elements to describe their purpose to a processing environment. These are termed descriptive paragraphs.
Pragmatic. There are semantic selectors where the same term is used on block, paragraph and inline elements. The term has the same meaning in each case, but the structural context extends its usability.
Structure. There is a set of selectors (applied or inherited) to allow inline elements to describe their structural purpose within a document to a processing environment. These are termed structural paragraphs.
Presentation. There is a set of selectors that allow inline elements to be modified for presentation, layout and typographic requirements. These are termed presentation paragraphs.
Linking. There is a set of selectors that allow processable links to be created anywhere in a document.
Fonts. There is a set of selectors available for applying font features on a design-profile-specific basis.
Processing Instructions. There is a set of selectors that instruct a processor to apply a specific processing requirement to an inline element.
There is a special set of inline elements used for content generation and counter generation. The FX grammar for these is num-*-rw and ref-*-rw, where num-*-rw marks a numbered target and ref-*-rw is a pointer to that target.
Special-purpose Content Blocks may contain additional semantic and processing inline styles for block-specific use; these are not addressed in this topic.
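As a rough illustration of counter generation, the Python below fills every num-*-rw span with a running number, counted separately for each selector value (note, figure, and so on). Numbering in plain document order with one counter per type, and the function name number_targets, are assumptions made for this sketch, not defined FX behaviour. It uses only the standard library and expects a well-formed XHTML fragment.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches a target selector such as "num-note-rw"; group 1 is the "*" value.
NUM_CLASS = re.compile(r"^num-(.+)-rw$")

def number_targets(xhtml: str) -> str:
    """Replace the placeholder text of each num-*-rw span with a running
    count, kept separately per selector value (assumed numbering scheme)."""
    root = ET.fromstring(xhtml)
    counters = defaultdict(int)
    for el in root.iter():
        for token in el.get("class", "").split():
            m = NUM_CLASS.match(token)
            if m:
                counters[m.group(1)] += 1
                el.text = str(counters[m.group(1)])
                break  # one number per element
    return ET.tostring(root, encoding="unicode")
```

For example, two notes tagged <span class="num-note-rw">XX</span> would come out numbered 1 and 2, while a num-figure-rw span in the same fragment starts its own count at 1.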
HTML5 has defined or redefined many inline, or "phrasing", elements and added new ones. There is a strong "webbishness" to many of these elements; the grammar is restricted and, in some cases, rigid.
This section contains an HTML5 reference list for phrasing content with recommendations on its use in FX. Nothing prevents any HTML5 element from being used in an FX document; these recommendations are based on the future value and trustworthiness of the content as valuable publisher content, not web content. As the list is reviewed and discussed, it will inevitably change.
FX uses a wide set of relevant HTML/5 inline, or "phrasing", elements (as they are grandly called in HTML5). Their use is based on each element being non-specific, with class attributes applied where required.
a Anchor. Recommended. Use with HTML5 attribute rules.
abbr An abbreviation or acronym.
b Bold text. A span of text offset from its surrounding content without conveying any extra emphasis or importance, and for which the conventional typographic presentation is bold text.
bdo A direction override of the Unicode BiDi algorithm
bdi A span of text that is isolated from its surroundings for the purposes of bidirectional text formatting
br A line break.
del A range of text that has been deleted from a document.
i Italic text. A span of text offset from its surrounding content without conveying any extra emphasis or importance, and for which the conventional typographic presentation is italic text.
img An image.
ins A range of text that has been inserted (added) to a document.
s Strikeout. Content that is no longer accurate or no longer relevant and that therefore has been “struck” from the document.
span A generic wrapper for phrasing content that by itself does not represent anything.
u Underlined text. A span of text offset from its surrounding content without conveying any extra emphasis or importance, and for which the conventional typographic presentation is underlining.
The following phrasing elements are used for digital content products that are born digital and are explicitly designed to be interactive. Where elements have HTML5 attributes, all attributes are allowed.
area The area element represents either a hyperlink with some text and a corresponding area on an image map, or a dead area on an image map.
audio Represents an audio stream.
button A multipurpose element for representing buttons.
canvas A resolution-dependent bitmap canvas, which can be used for dynamically rendering images such as game graphics, graphs, or other images.
datalist A set of option elements that represent predefined options for other controls.
iframe The iframe element introduces a new nested browsing context.
input A multipurpose element for representing input controls.
label A caption for a form control.
map The map element, in conjunction with any area element descendants, defines an image map.
noscript The noscript element is used to present different markup to user agents that don’t support scripting, by affecting how the document is parsed.
object Represents external content.
script Enables dynamic script and data blocks to be included in documents.
select A control for selecting among a list of options.
sup Superscript. Use with care: named spans are recommended because sup is overloaded.
sub Subscript. Use with care: named spans are recommended because sub is overloaded.
textarea A multi-line plain-text edit control for the element’s raw value.
video Represents a video or movie. Used where content is HTML5 and media-explicit, e.g. all digital interactive products.
cite The cited title of a work mentioned within the main text flow of a document.
code A fragment of computer code. Not currently used in FX. Use <span class="code-rw">
command A multipurpose element for representing commands.
dfn The defining instance of a term.
em Emphatic stress. Web accessibility specific. Too general and a format value. Not used in FX. Use <i>.
embed An integration point for external content.
kbd Represents user input. Not used in FX.
keygen A control for generating a public-private key pair and for submitting the public key from that key pair.
mark A run of text in one document marked or highlighted for reference purposes. Usage too web specific. Not used in FX. Best used as a dynamic runtime property.
meter A scalar gauge providing a measurement within a known range, or a fractional value.
output The result of a calculation.
progress An element that represents the completion progress of a task.
q Quoted text. Phrasing content quoted from another source. Not used in FX.
ruby The ruby element allows spans of phrasing content to be marked with ruby annotations.
samp Represents (sample) output from a program or computing system. Not used in FX as too domain specific.
small Represents “small print”. Too general. Not used in FX.
strong Strong stress. Web accessibility specific. Too general and a format value. Not used in FX. Use <b>.
time The element represents a date and/or time.
var A variable in a mathematical expression or programming context, or placeholder text that the reader is meant to mentally replace with some other literal value. Not used in FX. Too general.
wbr Represents a line-break opportunity. It has no end tag, so it must be written in self-closing form (<wbr />) to be well-formed in an XHTML context. Not used in FX.
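The recommendations in the lists above lend themselves to a simple automated check. The sketch below, in Python with only the standard library, walks an XHTML fragment and reports elements the lists mark as not used in FX (with the stated replacement where one is given) and the "use with care" elements sup and sub. The function name lint_fx_inline and the message wording are hypothetical; it only reports and does not rewrite anything.

```python
import xml.etree.ElementTree as ET

# Replacements stated in the recommendation lists.
REPLACEMENTS = {
    "em": "<i>",
    "strong": "<b>",
    "code": '<span class="code-rw">',
}
# Elements marked as not used in FX, with no stated replacement.
NOT_USED = {"kbd", "mark", "q", "samp", "small", "var", "wbr"}
# Elements allowed but flagged "use with care" (named spans preferred).
USE_WITH_CARE = {"sup", "sub"}

def lint_fx_inline(xhtml: str) -> list[str]:
    """Return one message per element that the FX recommendations discourage."""
    root = ET.fromstring(xhtml)
    messages = []
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]  # drop any XHTML namespace prefix
        if tag in REPLACEMENTS:
            messages.append(f"<{tag}>: not used in FX, use {REPLACEMENTS[tag]}")
        elif tag in NOT_USED:
            messages.append(f"<{tag}>: not used in FX")
        elif tag in USE_WITH_CARE:
            messages.append(f"<{tag}>: use with care, a named span is recommended")
    return messages
```

Since the lists are expected to change, keeping the three sets as plain data makes them easy to update without touching the walking logic.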
Hyperlinking is an essential property of digital content: the better the linking, the more valuable the content can be. Linking can be labour intensive at the production stage, so linking and tagging strategies that reduce that labour burden with the assistance of processes should be employed.
Linking targets are defined in the pattern num-*-rw or term-*-rw, where the * represents the FX selector value. These are used in two different contexts.
Linking pointers are defined in the pattern ref-*-rw.
<p>.... (See Fig.<span class="ref-figure-rw">xx</span>)....</p>
<div class="media-rw figure-rw">
<img />
<p class="caption-rw"><span class="variable-rw">XX</span>The caption</p>
</div>
<p>.... (See Note <span class="ref-note-rw">xx</span>)....</p>
<div class="note-rw">
<p class="note-rw"><span class="num-note-rw">XX</span>The note text</p>
</div>
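Because the target and pointer patterns share the same * value, a processor can pair them by class name alone. The Python sketch below classifies a class token as a target (num-*-rw, term-*-rw) or a pointer (ref-*-rw) and lists pointer values that have no target of the same value. The function names and the idea of checking for dangling pointers this way are assumptions for illustration, not part of the FX definition.

```python
import re

# Linking selector grammar: targets are num-*-rw or term-*-rw,
# pointers are ref-*-rw; "*" is the FX selector value (e.g. figure, note).
LINK_CLASS = re.compile(r"^(num|term|ref)-(.+)-rw$")

def classify_link_class(token: str):
    """Return ("target" | "pointer", selector value), or None for other classes."""
    m = LINK_CLASS.match(token)
    if m is None:
        return None
    role = "pointer" if m.group(1) == "ref" else "target"
    return role, m.group(2)

def dangling_pointers(tokens):
    """Selector values used by ref-*-rw pointers with no matching target."""
    targets, pointers = set(), set()
    for token in tokens:
        result = classify_link_class(token)
        if result is None:
            continue
        role, value = result
        (targets if role == "target" else pointers).add(value)
    return sorted(pointers - targets)
```

For instance, the class tokens ["ref-note-rw", "num-note-rw", "ref-table-rw"] would report ["table"], because no num-table-rw or term-table-rw target exists.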
See the IGP:FX Note and Footnote Tagging Test Book for comprehensive details on tagging notes and footnotes, including processing for roll-over pop-ups using the HTML5 aside block.
Well-constructed XML should be created with correct text case, independent of any presentation requirement. Font transforms and variants are applied to help processors and display formats present content with the additional information that is conveyed through font styling.
There is also significant content that specifically requires font presentation to allow it to make sense to the consumer.
Font-variants are tagged with span.class statements. They can be processed and displayed according to the abilities of various rendering devices. The common inline styles have both verbose (recommended for future interpretation) and terse forms.
Some eBook reader formats do not support font-variant: small-caps (and its clones), or even other font transforms, although these are used extensively in publishing. Transforms and variants can be simulated by a processor. For small caps, for example, the processor converts the tagged content to capitals and the stylesheet then reduces the font size by an appropriate percentage. Small Caps Title Case exists so that a format processor can do a good job of simulating a title-case string with small caps applied.
For higher-quality production, such as print output, the approach is to convert the content from its normal form to lower case (or title case) and apply the correct font face during processing.
Where content is sourced from OCR, the source document's pagination and lineation can be maintained. This enables formats such as the IGP:LBL PDF (Line by Line) digital restoration PDF to be produced. It is also important for tracking and analysing soft-hyphenation extraction errors from the source document during retrodigitization.
There are a number of critical issues surrounding retrodigitized line-ends expressed in XML, as there are with page breaks, especially correct end-of-line space processing.
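The simulation described above can be sketched in Python. Plain small caps uppercases the whole run and wraps it in a reduced-size span; the title-case variant leaves characters that were already capitals at full size and reduces only the converted runs. The class name sc-sim (with a stylesheet rule such as font-size: 80%) and the function name are hypothetical, chosen only for this example.

```python
import html
from itertools import groupby

# Hypothetical class for reduced-size runs; a stylesheet is assumed to set
# something like .sc-sim { font-size: 80%; }
SIM_CLASS = "sc-sim"

def simulate_small_caps(text: str, title_case: bool = False) -> str:
    """Simulate small caps for readers without font-variant support."""
    def reduced(run: str) -> str:
        return f'<span class="{SIM_CLASS}">{html.escape(run.upper())}</span>'

    if not title_case:
        # Plain small caps: everything becomes capitals at reduced size.
        return reduced(text)
    out = []
    # Group consecutive characters by whether they are already capitals.
    for is_cap, chars in groupby(text, key=str.isupper):
        run = "".join(chars)
        out.append(html.escape(run) if is_cap else reduced(run))
    return "".join(out)
```

With title_case=True, "Small Caps" becomes S<span class="sc-sim">MALL </span>C<span class="sc-sim">APS</span>, so the initial capitals stand taller than the simulated small capitals.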
<p>.... <span class="linebreak-rw">&#160;</span> ....</p>
<p>page break <span class="pagebreak-rw">1</span> in the middle of a paragraph</p>
<p>page break at the end of a paragraph. <b>It must be the last item.</b><span class="pagebreak-rw">1</span></p>
The linebreak-rw span contains a non-breaking space because an empty <span> is not allowed in XHTML. The space is hidden by the CSS when required, or processed out during format generation.
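One way to process line breaks out for reflowable output while keeping end-of-line spacing correct is to replace each linebreak-rw marker, together with any whitespace around it, by exactly one space. The Python sketch below does this with a regular expression on serialized XHTML; it assumes the marker is written exactly as in the example above (a class attribute of linebreak-rw containing &#160; or a literal non-breaking space), and the function name is hypothetical. It deliberately does not rejoin words hyphenated across a line end, since soft-hyphenation handling needs its own rules.

```python
import re

# An FX line-break marker plus any whitespace on either side of it.
# Assumes the span holds either the character reference &#160; or a literal NBSP.
LINEBREAK = re.compile(r'\s*<span class="linebreak-rw">(?:&#160;|\u00a0)</span>\s*')

def strip_linebreaks(xhtml: str) -> str:
    """Replace each linebreak-rw marker with exactly one space, so words on
    either side of a retrodigitized line-end stay separated but not doubled."""
    return LINEBREAK.sub(" ", xhtml)
```

Collapsing the surrounding whitespace into a single space covers both cases met in OCR output: a source line that ended with a trailing space and one that did not.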