HTML 4.1

In relation to previous versions of HTML, this draft focuses on an HTML document at its highest levels. Whereas the cHTML draft deals with markup in the sense of opening tags and closing tags, tag markers such as < and >, and quoted attribute values, this draft deals only with elements and element contents; attributes and attribute values; and DOM attributes and DOM methods. In other words, this portion of the HTML 4.1 specification deals with what is sometimes called the infoset for HTML, together with the DOM attributes and methods used to mutate that infoset. Other pieces of HTML are covered by other specifications. One is the serialization of HTML, which is sometimes also called “HTML” but which we call “cHTML” to distinguish it as one of many serializations of the HTML infoset. Another is the parsing algorithm sometimes referred to as “HTML” parsing, which we call “tHTML parsing”, again to keep clear the separate pieces that have all come to be called “HTML”. Therefore, in this specification, unqualified “HTML” refers to the HTML infoset alone, or sometimes to the DOM methods and attributes that accompany that infoset.

This separation of serialization and parsing on the one hand from the resulting HTML document on the other implies that well-formedness issues do not exist for such HTML documents. It is not that this draft eliminates the problem of well-formedness violations, but rather that well-formedness violations are an issue for the various serialization and parsing recommendations to deal with. By the time a document has been parsed, the result is by definition a well-formed tree of elements, though it may still be invalid in that it does not conform to the requirements of this infoset specification. (In the case of XML and some other serializations, well-formedness errors may be fatal, in which case only a partial representation of the serialized document reaches the HTML object graph addressed by this draft.)

For processing UAs dealing only with HTML documents themselves, and not with encoding (serializing) and decoding (parsing) those HTML documents, well-formedness is not an issue. By moving such fragile and complex code to a well-behaved parsing/serialization layer and focusing on HTML document infosets, processing applications can avoid many of the fragility issues associated with text processing of HTML. For example, a content management system that allows users to edit HTML documents directly (not in a serialized source code form) will be much simpler to maintain than one that presents a source code text editing interface and then sanitizes, parses and transforms that text into a form suitable for subsequent tHTML or XML parsing. For applications that do expose a source editing interface, a tHTML parser is more suitable for presenting a preview of the user's composed HTML, followed by a serialization to XML for compatibility with XML processing tools.

Note: Often, these HTML objects will be referred to as “markup”, but that should not be confused with markup tags in the sense of the text source serialization of an HTML document. Here “markup” designates a hierarchical markup infoset of the kind that SGML (and its cousins and descendants XML and HTML) enables.

  • Layer: Presentation
    • Description: Presentation is left to CSS as the reference presentational layer as much as possible. In other words the default presentation of HTML documents is almost entirely expressible by CSS (HTML4All will separately propose enhancements to provide all default HTML presentational needs through CSS).
    • HTML4All Initiatives: CSS Enhancements for HTML
  • Layer: HTML Infoset
    • Description: Defines the set of object graphs representable by the HTML vocabulary along with: the document object model interfaces that allow the dynamic mutation of an object graph; and the events fired when processing an HTML document by an interactive UA. This draft also includes: HTML Data Types, HTML QNameTypes, HTML QName Vocabulary, and HTML Events.
    • HTML4All Initiatives: HTML (this draft); DOM and Namespace enhancements for HTML
  • Layer: Parsing / Serializing
    • Description: Algorithms which define how to encode an HTML Object Graph into a specific serialization and decode a serialization into an HTML DOM. HTML has its own parsing algorithm that thoroughly defines error-recovery for a wide variety of cHTML-like text based serializations. Other specifications define their own parsing such as HTML5, XML, and EXI.
    • HTML4All Initiatives: tHTML parsing / cHTML serializing
  • Layer: Serialization
    • Description: cHTML (Canonical HTML), XML, EXI, HTML 4.01
    • HTML4All Initiatives: cHTML


Introduction

The vocabulary for HTML builds heavily on the work of the W3C, in particular the previous work of W3C Working Groups in defining HTML, XHTML, XForms, XSD and Namespaces in XML. In many cases, this draft simply adopts entire W3C recommendations as part of a new HTML. In other instances, an existing module (an XHTML2 module, for example) may be referenced but a new module created specifically for this draft, particularly where important features are missing from the pre-existing module or where the module needs other adjustments to add compatibility with legacy content and tHTML parsing.

An element is defined by 1) its structure or meaning; 2) its name; 3) the attributes permitted and required on the element; 4) the content model of the element (whether the element permits non-whitespace text, and which elements are permitted as child elements); and 5) the interactive processor requirements associated with the element.

An attribute is similarly defined by 1) its meaning, 2) its name, 3) the data types permitted as the value of that attribute; 4) the interactive processor requirements associated with that attribute.

A DOM interface is defined by its methods and attributes (a DOM attribute being a one-argument setter method and a zero-argument getter method), which are in turn defined by 1) their name, 2) the arguments passed with the method, and 3) the return value for that method (or DOM attribute). Some DOM attributes are defined as 'readonly', meaning the DOM allows the author to get the value but not set the value. Otherwise the DOM attribute serves as both a setter and a getter.
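The getter/setter reading of DOM attributes can be illustrated with a minimal Python sketch. The class ElementSketch and its attributes tag_name and title are hypothetical names chosen for illustration; they are not interfaces defined by this draft.

```python
class ElementSketch:
    """Hypothetical stand-in for a DOM interface (not defined by the draft)."""

    def __init__(self, tag_name: str) -> None:
        self._tag_name = tag_name
        self._title = ""

    @property
    def tag_name(self) -> str:
        # 'readonly' DOM attribute: only the zero-argument getter exists.
        return self._tag_name

    @property
    def title(self) -> str:
        # Getter half of a read/write DOM attribute.
        return self._title

    @title.setter
    def title(self, value: str) -> None:
        # Setter half: a one-argument method.
        self._title = value


def try_set_tag_name(element: ElementSketch, value: str) -> bool:
    """Return True if the assignment succeeded, False if it was rejected."""
    try:
        element.tag_name = value  # no setter is defined, so this raises
    except AttributeError:
        return False
    return True
```

Here title behaves as both a setter and a getter, while an attempt to assign tag_name is rejected, mirroring the 'readonly' case.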

For authors, it is important to understand each element, attribute, and DOM interface in terms of these general properties: what does the element or attribute convey to the consumer of this document, either the user directly or through the user agent (UA)? Automated conformance checkers can verify that elements are used in their proper context, that their content conforms to the specified content model, and that they include valid attributes with valid attribute values. However, authors must ensure that the element actually conveys the meaning the author intends.

For example, the 'em' element conveys emphasis of the phrase it contains, which is often displayed as oblique or italicized text in visual user agents or spoken with more stress in speech-based user agents. However, the purpose of emphasis is not simply to italicize the text or stress it aurally, but to indicate that the text is emphasized independent of these various presentational mechanisms. In another context the text might instead be underlined or painted green to stand out from the surrounding text. By choosing the correct element, authors ensure that their content integrates into other environments without requiring a meticulous reworking of the markup.

Also, by making use of a rich vocabulary of elements, authors need not concern themselves with the final presentation of their content. A particular style guide might call for the titles of books and emphasized text to be presented as italicized text. Another style guide might instead use italics for emphasis but bold for book titles. When authors use the precise semantic markup for the intended meaning, the change in style requires no changes to the HTML document. Otherwise, updating to the new style requires careful examination of the HTML document to differentiate book titles from emphasized text and from any other semantic originally conveyed through italicized text.

For implementations, conformance is defined in terms of support for several other related specifications needed to process an HTML document and then, added to that, HTML-specific processing requirements. For a non-interactive HTML consuming implementation, most of the requirements are covered simply by support for either an XML or tHTML conforming parsing application, DOM support including the DOM attributes and methods in this specification, limited support for events, and CSS support for presentation. For example a laser printer might fully conform to this specification by meeting only those requirements. On the other hand an interactive application requires adherence to more HTML-specific processing—including a complete list of interactive events—than the non-interactive printer engine UA.

However, like previous versions of HTML, this version is designed to allow documents to include graceful fallback mechanisms for partially compliant UAs. Various features allow authors the flexibility to provide document-customized content that also participates in that graceful fallback. Using the laser printer example from above, a partially compliant UA might provide insufficient DOM support or no DOM support at all. Authors could provide static content that makes it clear some content is missing, yet provides as much information as possible relying on a static document. Similarly, a UA might not support all of the CSS features necessary to present the document's content; HTML provides authors with mechanisms to cope with these non-conforming UAs whenever possible. As an example, HTML adds the 'marks' attribute to provide better support for UAs that do not sufficiently support the CSS :before and :after pseudo-element selectors or generated content. Graceful fallback is not the ideal way authors want their content presented, but it is an important ingredient in creating device-independent documents. As UA support for essential features improves, the loss of presentational options becomes less and less crucial to the task of conveying the meaning inscribed in HTML documents.

Conformance Definitions

Defines conformance for documents and general user agent conformance.

Document Conformance

  • No HTML vocabulary element contains HTML vocabulary elements or non-whitespace text not permitted by its content model as defined in this specification
  • No HTML vocabulary element contains HTML vocabulary attributes not permitted by this specification and any required attributes are included as well
  • No HTML vocabulary attribute is assigned a value that is not the assigned data type for the attribute or a data type derived by restriction from the data type assigned by this specification
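As a rough illustration of how a checker might test the first two requirements above, the following Python sketch walks a small element tree against a rule table. The Node type, the RULES table, and the content models in it are invented for this example and are not taken from this draft; data type checking of attribute values (the third requirement) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node instances or str


# Hypothetical rules for illustration only; the draft's real content
# models and attribute lists are defined elsewhere.
RULES = {
    "ul": {"children": {"li"}, "text": False, "attrs": {"id"}, "required": set()},
    "li": {"children": {"em"}, "text": True, "attrs": {"id"}, "required": set()},
    "em": {"children": set(), "text": True, "attrs": {"id"}, "required": set()},
    "img": {"children": set(), "text": False,
            "attrs": {"id", "src", "alt"}, "required": {"src", "alt"}},
}


def check(node: Node, errors=None) -> list:
    """Collect content-model and attribute violations below `node`."""
    if errors is None:
        errors = []
    rule = RULES.get(node.name)
    if rule is None:
        return errors  # not part of the sketch vocabulary; out of scope
    for child in node.children:
        if isinstance(child, str):
            if child.strip() and not rule["text"]:
                errors.append(f"{node.name}: non-whitespace text not permitted")
        else:
            if child.name in RULES and child.name not in rule["children"]:
                errors.append(f"{node.name}: child '{child.name}' not permitted")
            check(child, errors)
    for attr in node.attrs:
        if attr not in rule["attrs"]:
            errors.append(f"{node.name}: attribute '{attr}' not permitted")
    for attr in sorted(rule["required"] - set(node.attrs)):
        errors.append(f"{node.name}: required attribute '{attr}' missing")
    return errors
```

Elements outside the rule table are skipped rather than rejected, on the assumption that foreign-vocabulary content in a compound document is governed by its own specification.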

HTML Hosting Document Conformance

In addition to the requirements for document conformance:

  • The root element must be the 'html' element as defined in this specification and its namespace URI must be the namespace designated by this specification
  • The document conforms to Namespaces in XML in terms of the declaration of namespace prefixes and the use of QName and NCName for elements, attributes and attribute values.
Note: the default namespace (the namespace without a prefix URI mapping) need not be the HTML namespace, but rather the root element defines a compound document as an HTML hosted document.
  • The 'head' element, the 'body' element and the 'title' element must be included.

HTML Hosted By Another Vocabulary

The requirements for document conformance apply, except that authors may optionally omit the 'html', 'head', and 'body' elements.

UA Conformance

General Conformance:

  • Support for either tHTML or XML parsing or both
  • CSS2 support or equivalent to achieve default presentation—including the CSS2 speech-related properties from the CSS2 aural stylesheets for speech-capable browsers.[1]
  • Level 2 DOM Core, and DOM HTML [Other DOM?]
  • The DOM methods and attributes specified in this draft
  • XML Namespaces for XML parsing UAs and tHTML Namespaces for tHTML parsing UAs
  • XML Events
  • Those Events specified in this draft except for the user-interaction events

Interactive UA conformance

  • Interactive processing norms specified in this draft including:
    • the user-interaction events specified in this draft
    • user interface to inspect the HTML properties of elements independent of the presentation

Conformance checkers

  • validity in terms of:
    • content models
    • permitted and required attributes
    • attribute values data types
    • cross attribute validity (i.e., ensuring values for attributes such as 'content', 'datatype', 'units' and others for an element are consistent with one another)
    • alternate text requirements
  • warnings
    • always warn of potentially inappropriate use of the 'processas' attribute (this attribute should not be used to correct mistaken metadata from servers and other content-type conveying protocols; authors should instead correct the metadata elsewhere but may use this attribute for this purpose only as a last resort.)
    • alternate text composition
    • table summary composition
    • providing abbr for table columns
    • header/data cell association for complex tables
  • suppression of errors and warnings (requirements and recommendations)

HTML Graphical/Visual editors

  • User interface labels must convey the semantics of the element (e.g., to toggle an 'em' element the user interface should indicate “em” or “emphasis” and not “italics”, and to toggle an 'i' element the user interface should indicate “italics” and not “emphasis”).

Converters and Automatic Generators

This category includes word processors that convert a non-HTML format to HTML, OCR software that outputs HTML, software that extracts content from a disk or another device and wraps it in HTML, etc.

  • UAs must not fabricate content to meet other conformance criteria (e.g., don't add a charset value when converting plain text to HTML unless the charset can be determined with reasonable certainty; and do not add an alt attribute or dummy alternate text simply to “fool” other conformance checking UAs)
In other words, automated HTML converters and generators are exempted from whichever document conformance norms require human intervention (though UAs are welcome to develop sophisticated new methods to substitute for human intervention, if possible). Automated UAs should also consider, as part of an overarching design, junctures for user interaction where the user can fill in missing information such as alternate text or character encoding to make automatically generated HTML content fully conforming.

Definitions and Conventions

Specifies the syntactic conventions, and a glossary of terms

There is much confusion and poorly developed vocabulary surrounding HTML. Several concepts are designated by the term HTML, and this leads to confusion. At times other specifications use HTML to refer to 1) a serialization of a DOM tree (e.g., cHTML); 2) the HTML vocabulary utilized in a DOM tree (HTML); 3) a specific parsing algorithm for deserializing a DOM tree serialization into a DOM tree (tHTML); 4) a DOM tree flagged as HTML (flagged as non-XML). In this specification we try to keep these concepts clearly separated (using the terms in parentheses). Here, only the HTML DOM tree vocabulary schema retains the moniker of HTML.

A typical scenario might avoid all of these intricacies. For example, the processing of a pure HTML document might involve:

  1. a cHTML file handed to a UA processor with a Content-Type 'text/html'
  2. a non-XML flag is set
  3. a tHTML parser is used to parse the file into an HTML DOM tree
  4. HTML DOM attributes and methods are used to mutate the DOM tree according to the non-XML flagged document rules (often ASCII lower case conversion of attribute arguments and case insensitive comparisons)
  5. the mutated DOM tree is serialized back to a cHTML file properly marked with 'text/html' content-type metadata

On the other extreme a document might be:

  1. a cHTML file is handed to a UA processor with a Content-Type 'application/xhtml+xml'
  2. no non-XML flag is set
  3. an XML parser is used to parse the file into an HTML DOM tree
  4. HTML DOM attributes and methods are used to mutate the DOM tree according to the XML DOM rules (e.g., case sensitive comparisons and case preserving mutations)
  5. the mutated DOM tree is serialized back to a cHTML file properly marked with 'application/xhtml+xml' content-type metadata

The two prior examples mark the two extreme scenarios: the pure cases. However, it is also possible for some of these steps to muddy the difference between XML processing and non-XML processing. For example, once the document is in memory as a DOM tree, it is possible to add content that cannot be serialized back to a form compatible with a tHTML parser. Likewise, the in-memory DOM tree might mutate into a form whose non-cHTML serialization is incompatible with parsing by an XML parser (for example, by containing invalid HTML elements whose names are also not well-formed element names in XML, or by requiring noscript {display:none}).
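The two pure scenarios can be summarized as a choice driven by the Content-Type. The Python sketch below is only an illustration under that assumption: the Pipeline record, select_pipeline, and normalize_argument are hypothetical names, and the only DOM rule modelled is the ASCII lower case conversion of arguments mentioned for non-XML flagged documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    parser: str               # "tHTML" or "XML"
    non_xml_flag: bool        # True when the document is flagged non-XML
    output_content_type: str  # metadata attached when serializing back


def select_pipeline(content_type: str) -> Pipeline:
    """Pick parser and flag from a Content-Type value (parameters ignored)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return Pipeline("tHTML", True, "text/html")
    if media_type == "application/xhtml+xml":
        return Pipeline("XML", False, "application/xhtml+xml")
    raise ValueError(f"unsupported content type: {content_type!r}")


def ascii_lowercase(text: str) -> str:
    """Lowercase only A-Z, leaving every non-ASCII character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def normalize_argument(argument: str, pipeline: Pipeline) -> str:
    """Non-XML flagged documents lowercase arguments; XML preserves case."""
    return ascii_lowercase(argument) if pipeline.non_xml_flag else argument
```

ASCII-only lowercasing is written out by hand because Python's str.lower() also alters non-ASCII letters, which is a broader conversion than the one described in the steps above.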

[Some discussion of the various confusions applied when describing a document as an HTML document. In some sense everything here refers to HTML documents. However, even in the DOM HTML adds its own DOMHTML*Element with specific interfaces for each.

  • delivered to the processor as a 'text/html' or 'application/xhtml+xml' Content-Type.
  • Created programmatically using implementation.createDocument() (XML) and subsequently calling openDocument() (non-XML; this is a one-way method that cannot be reversed [is that right?; what does that affect? perhaps only the default serialization in many implementations]).
  • In all other ways a parsed document once parsed is largely indistinguishable whether XML or non-XML, however the parsing and serialization limits the vocabulary that can be used in memory and the case-sensitivity of selectors and DOM calls and the case of elements.
    • [following HTML5, noscript elements should not be in documents about to be serialized to XML; changing the implementation stylesheet for HTML to noscript {display:none} should solve the problem]
    • [following HTML5 a 'charset' attribute should not be in XML serializations, but in cHTML a charset attribute value would be required to match the encoding of the document, so it does no harm if it is there]
    • Many DOM attributes have their arguments converted to ASCII lowercase when a document is flagged as non-XML
    • HTML 5 calls for altered XPath/XSLT behavior for non-XML flagged documents
    • non-XML flagged documents generally must have both the default namespace set to the html namespace URI and the root element as the html element in the html namespace
    • xml:lang
    • xml:base
    • uppercase DOM versus case preserved DOM
    • case insensitive selectors (and DOM attributes and methods?) versus case sensitive
    • Tree Tables may be unachievable with tHTML parsing
    • when flagged non-XML, document.write(), document.writeln(), open(), and close() are available for API use, whereas for XML documents these are not available.

HTML Foundation

This draft defines most of its features and capabilities through modules. However, there are some features and capabilities that are defined here to serve as reusable features for modules. These include the common DOM interfaces which other DOM interfaces can include as arguments or return values and from which other DOM interfaces might inherit methods and DOM attributes.

Content sets provide a shorthand for specifying commonly grouped elements in the content models of an element. Likewise, attribute collections provide a shorthand for specifying common attributes permitted or required on an element. Finally, vocabulary serves as a place to define the data types permitted for attribute values and the QName types and the associated QName values defined within HTML.

Parsing and Serialization

The parsing and serialization of HTML documents is not defined in this specification. Authors are free to use any of several existing serialization/parsing specifications, such as XML, EXI, and the tHTML/cHTML parsing and serialization companion specifications from HTML4All. Implementors might also define other serialized forms of HTML that use their own serialization algorithms and either make use of existing parsing specifications or a completely new parsing specification.

Content Sets

Attribute Collections

Vocabulary

DOM and Common DOM Interfaces

Events and Interactivity

Core Events

Interactive Events

Inherent Accessibility States

Describes how various elements map to accessibility states based on the element name, attached attributes and attribute values.

Presentation

Whenever possible, default normative and suggested presentation of HTML documents is defined in terms of the latest CSS recommendation and its capabilities [CSS2 as of now]. Some exceptions are necessary when CSS lacks the capabilities to define the presentation of some HTML semantics. In those cases a prose description will be provided instead. HTML UAs are not required to also adopt CSS, but the default presentation must be equivalent to the presentational norms expressed as CSS.

Presentational recommendations/requirements where CSS is insufficient:

  • Document title presented as the title of a window
  • Link navigation
  • Rich interactive user interface access to document data (metadata, monikers, etc)

Identifiers and Data Types

Discussion of:

  • Identifier production (including ID, QIDRef, QNIDRef, and HName identifiers)
    • the relation of HTML identifiers to XML Names, Namespaces in XML NCNames, xsd:Name, xsd:NCName, xsd:QName
    • the varying uniqueness constraints for different identifier types
  • non-identifier names including xsd:NCNames, xsd:QNames, xsd:CURIEs, etc.
  • xsd:NMToken production and use
  • Other external Names (ExtNames)
  • xsd:boolean versus boolean (HTML boolean) data types

In a sense, all names identify. However, identifiers in HTML refer to names used to identify fragments, specific marks (or points), or hierarchically arranged elements within the HTML document. Also Names (as opposed to names) have specific definitions: they are comprised of a very specific set of characters. Identifiers as a type of a Name have the additional restriction of specific uniqueness constraints depending on the identifier type.

Identifiers are somewhat complicated by legacy processing issues and the desire to accommodate compound documents both with HTML as a host and HTML as a hosted sub-content-type. Due to this heterogeneous environment in which HTML is designed to function, there are four different attributes used to uniquely identify an element within a document (one is a fairly redundant legacy attribute and the others serve different identifying functions).

  1. the id attribute which accepts a document-unique ID or QIDRef data type identifier
  2. the xml:id attribute which accepts a document-unique ID or QIDRef data type identifier
  3. the legacy name attribute which accepts a specialized legacy HName identifier data type which must be document unique or form element unique depending on the circumstances
  4. the nid attribute which accepts a node-unique QNIDRef data type identifier

[Differentiating ID and QNIDRef on the one hand and IDREF and NIDREF on the other hand. Here we use 'Ref' in terms of a reference to a text URI QNameRef as a flat string reference to compound QName (namespace name; local name). IDREF and NIDREF on the other hand are references to a targeted attribute with a particular ID/QIDRef or QNIDRef respectively. Perhaps 'Rep' for representation is more appropriate in that a prefix:localname string is a representation of a compound QName structure with a namespace name and a local name.]

The legacy name attribute which accepts this HName identifier is valid only on a few elements. These elements with this name attribute are:

  1. object (document-wide uniqueness)
  2. img (document-wide uniqueness)
  3. a (document-wide uniqueness)
  4. map (document-wide uniqueness)
  5. frame (document-wide uniqueness)
  6. frameset (document-wide uniqueness)
  7. iframe (document-wide uniqueness)
  8. form (document-wide uniqueness)
  9. input (form-wide uniqueness)
  10. select (form-wide uniqueness)
  11. textarea (form-wide uniqueness)
  12. button (form-wide uniqueness)
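The two uniqueness scopes in the list above could be checked along the following lines. This is a sketch under stated assumptions: the function name duplicate_hnames and the (element, hname, form_key) input shape are hypothetical, and it assumes that document-wide names share one pool across element types while form-wide names are compared only within the same form.

```python
DOCUMENT_WIDE = frozenset(
    {"object", "img", "a", "map", "frame", "frameset", "iframe", "form"}
)
FORM_WIDE = frozenset({"input", "select", "textarea", "button"})


def duplicate_hnames(entries):
    """Return (element, hname) pairs that repeat a name within their scope.

    `entries` is an iterable of (element_name, hname, form_key), where
    form_key identifies the enclosing form (None when there is none).
    """
    seen = set()
    duplicates = []
    for element, hname, form_key in entries:
        if element in DOCUMENT_WIDE:
            scope = ("document", None)
        elif element in FORM_WIDE:
            scope = ("form", form_key)
        else:
            continue  # e.g. param or meta: not an HName, so not checked
        key = (scope, hname)  # comparison is case-sensitive
        if key in seen:
            duplicates.append((element, hname))
        else:
            seen.add(key)
    return duplicates
```

For instance, two input elements named "q" in different forms are not reported, while an img and an a element sharing a name are.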

While the following elements also use 'name' as the name of an attribute, these attributes accept an NMToken for their value and do not have any conflict with the id and nid attribute values. These are not identifiers in the HTML sense nor ID values in the XML sense. They may act as identifiers for other protocols or specifications, but not within HTML. In particular, the prohibited characters for HTML Name, NCName, NMToken, and so forth do not apply to the data types for these attributes.

  1. param (this name is a parameter name intended for an object handler)
  2. meta (this name is for a metadata schema, and has some overlap with the http-equiv and property attributes. Authors may specify all three on a meta element, but UAs should treat property as authoritative over name, and name over http-equiv)
    • In addition the http-equiv attribute takes a name of some sort, but it too is not governed by the character restrictions in HTML names.

Understanding identifier types

For document conformance, all ID and HName identifiers must match the requirements of Name in XML 1.0 and should match the requirements for NCName as defined by Namespaces in XML. All QNIDRef and QIDRef identifiers must match the NCName requirements for their local name and namespace prefix as defined in Namespaces in XML.

In addition to conforming to the XML and Namespaces in XML specifications for Names and NCNames, valid identifiers in HTML must exhibit uniqueness according to the identifier type.

Table columns: Uniqueness; Must conform to xsd:NCName; Must conform to xsd:QName; Should conform to xsd:NCName; Details

  • Uniqueness: document-wide
    • Identifier types: ID, QIDRef
    • Details: any identifier used as an ID or QIDRef value must not be used as a QNIDRef within the same document. QIDRefs may only be used within xml:id attributes and must not be used within xml:id attributes
  • Uniqueness: node-wide
    • Identifier types: QNIDRef
    • Details: any identifier used as a QNIDRef must not be used as an ID or QIDRef within the same document, though it may be used any number of times as a QNIDRef so long as it adheres to the node-unique requirement.
  • Uniqueness: document or form
    • Identifier types: HName
    • Details: HNames adhere to the same rules as IDs, though authors may include the same redundant value on either an 'id' or 'xml:id' attribute as the author includes on the element’s 'name' attribute (for the 'name' attributes which take an HName data type, i.e., not 'param' nor 'meta').
  • Uniqueness: object
    • Identifier types: ExtName for 'param@name'
    • Details: ExtNames are defined external to HTML and the production of these names is not defined nor restricted by HTML. However those defining such names for use as ExtNames within HTML should consider adhering to the xsd:NCName restrictions for the production of such names.
  • Uniqueness: n/a
    • Identifier types: ExtName for 'meta@name'
  • Uniqueness: n/a
    • Identifier types: ExtName for 'meta@http-equiv'

IDREF and NIDREF in contrast to QIDRef

The all caps form REF indicates an attribute conforming to the ID or NID (QNIDRef) production, but referencing a targeted identifier. Target NIDs are referenced as an absolute or relative XPath. IDs are referenced as a literal string matching the ID itself, while NIDREFs match a modified XPath expression involving "/#" delimited QIDRef or QNIDRef path component strings.

ID, QIDRef, and QNIDRef interaction
  • Any value used as an ID or QIDRef should appear only once in a document – i.e., the value cannot match any other ID, QIDRef, or QNIDRef within the document, where value equality is defined in terms of:
    • a case-sensitive string comparison of the full ID, QIDRef, and QNIDRef values; and
    • for QIDRefs and QNIDRefs, both
      • a case-sensitive string comparison of the associated namespace URI/name; and
      • a case-sensitive string comparison of the local name (i.e., the portion after the colon for a prefixed name or the entire string for un-prefixed references)
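One possible reading of the value equality rules above is sketched below in Python. The function names and the (value, kind) input shape are hypothetical. The sketch assumes that two values clash if their full strings are equal, or if they expand to the same namespace URI and local name; it also assumes that a plain ID is never split at a colon.

```python
def expand(value: str, kind: str, prefixes: dict) -> tuple:
    """Expand an identifier to a (namespace URI, local name) pair.

    `kind` is "ID", "QIDRef" or "QNIDRef". Plain IDs and un-prefixed
    references have no namespace; the whole string is the local name.
    """
    if kind == "ID" or ":" not in value:
        return (None, value)
    prefix, local = value.split(":", 1)
    return (prefixes[prefix], local)  # KeyError for an undeclared prefix


def same_identifier(a: tuple, b: tuple, prefixes: dict) -> bool:
    """Case-sensitive equality of two (value, kind) identifiers."""
    (a_value, a_kind), (b_value, b_kind) = a, b
    if a_value == b_value:
        return True  # the full strings match
    return expand(a_value, a_kind, prefixes) == expand(b_value, b_kind, prefixes)
```

Under this reading, two differently prefixed references collide when both prefixes are bound to the same namespace URI, while namespace URIs or local names that differ only by case do not.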

Though IDs and HNames both 1) permit colons and 2) define identifier uniqueness in terms of case-sensitive comparisons, authors are encouraged A) to avoid colons in all Names and to treat all Names as NCNames; and B) to avoid using identifiers that differ only by case.

Producing Names and NCNames

[Note: XML 1.1 introduced a confused definition of the Name data type, which should be addressed in some way. XML 1.0 has since been revised to adopt the 1.1 confused definition of NameChar and NameStartChar characters. Unicode has adopted the XID_Start and XID_Continue properties, so in terms of machine processing we can easily state that NameChar must be a character that has the XID_Continue Unicode property true and NameStartChar must be a character that has the XID_Start property true.]

[Note: In terms of authoring the best way to understand which characters are allowed and which are not is to understand that the main approach is designed to exclude characters that may be reserved for special syntax (ASCII and ISO8859-1 commonly encountered keyboard enterable punctuation)]

The preferred approach is to use the id attribute in non-compound HTML documents, or in compound documents where every intra-document content-type supports its own ID typed attribute named 'id' (or even when the foreign content uses an ID typed attribute that has another name). Whenever legacy targeted UAs require the use of the name attribute in certain circumstances (such as in forms), authors should add the name attribute and may also include one of the other identifier attributes, so long as they have identical string values under a case-sensitive comparison.

All Unicode characters are allowed as characters in Names and NCNames (for use as IDs, QIDRefs, QNIDRefs, and HNames) except for those few listed in the following table. The subsequent table lists a few more characters and types of characters that are prohibited as the initial start character for a Name or NCName (Names and NCNames only differ in their production in terms of the colon character).

Char  Code Point  Name

Excluded ASCII and 8859-1 derived symbols and punctuation

!     U+0021      EXCLAMATION MARK
"     U+0022      QUOTATION MARK
#     U+0023      NUMBER SIGN
$     U+0024      DOLLAR SIGN
%     U+0025      PERCENT SIGN
&     U+0026      AMPERSAND
'     U+0027      APOSTROPHE
(     U+0028      LEFT PARENTHESIS
)     U+0029      RIGHT PARENTHESIS
*     U+002A      ASTERISK
+     U+002B      PLUS SIGN
,     U+002C      COMMA
/     U+002F      SOLIDUS
:     U+003A      COLON (Not prohibited by XML but by Namespaces in XML. HTML recommends authors follow this production for Names as well as prohibiting colons from NCNames)
;     U+003B      SEMICOLON
<     U+003C      LESS-THAN SIGN
=     U+003D      EQUALS SIGN
>     U+003E      GREATER-THAN SIGN
?     U+003F      QUESTION MARK
@     U+0040      COMMERCIAL AT
[     U+005B      LEFT SQUARE BRACKET
\     U+005C      REVERSE SOLIDUS
]     U+005D      RIGHT SQUARE BRACKET
^     U+005E      CIRCUMFLEX ACCENT
`     U+0060      GRAVE ACCENT
{     U+007B      LEFT CURLY BRACKET
|     U+007C      VERTICAL LINE
}     U+007D      RIGHT CURLY BRACKET
×     U+00D7      MULTIPLICATION SIGN
÷     U+00F7      DIVISION SIGN
      U+0000–U+001F, U+007F, U+0080–U+009F      All control characters (C0, C1)
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ ¸ ¹ º » ¼ ½ ¾ ¿      U+00A0–U+00B6, U+00B8–U+00BF      Some symbols and punctuation, and other characters from the Latin-1-Supplement Block (excepting · MIDDLE DOT U+00B7)

Other characters prohibited from Name production

;     U+037E      GREEK QUESTION MARK (The canonical decomposition for this character is the SEMICOLON U+003B which must be excluded for syntactical reasons)
‐ ‑ ‒ – — ― ‖ ‗ ‘ ’ ‚ ‛ “ ” „ ‟ † ‡ • ‣ ․ ‥ … ‧ ‰ ‱ ′ ″ ‴ ‵ ‶ ‷ ‸ ‹ › ※ ‼ ‽ ‾ ⁁ ⁂ ⁃ ⁄ ⁅ ⁆ ⁇ ⁈ ⁉ ⁊ ⁋ ⁌ ⁍ ⁎ ⁏ ⁐ ⁑ ⁒ ⁓ ⁔ ⁕ ⁖ ⁗ ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞      U+2000–U+206F      General Punctuation block (including Word Joiner, Fraction Slash and many space, bidi-control, math-invisible, and deprecated characters)

All but a few characters in the General Punctuation block are excluded. The exceptions are:

  1. UNDERTIE: ‿ (U+203F)
  2. CHARACTER TIE: ⁀ (U+2040)
  3. ZERO WIDTH NON-JOINER (U+200C)
  4. ZERO WIDTH JOINER (U+200D)

(Note that this leaves punctuation from the Supplemental Punctuation block and punctuation within each specific script block.)

U+2100–U+2BFF Symbol blocks: All characters from the BMP dedicated Symbol blocks
U+2FF0–U+2FFF Ideographic Description Character block: All characters from the Ideographic Description Character (IDC) block
U+D800–U+DFFF Surrogate code points: All Surrogate Code Points (not characters by definition, and a name must be composed of characters only)
U+E000–U+F8FF Private Use Characters: All BMP Private Use Characters (note that this leaves all Private Use Characters outside the BMP according to XML, but authors are discouraged by this HTML specification from using any Private Use characters for names and identifiers)
U+FDD0–U+FDEF, U+FFFE, U+FFFF NonCharacter code points: All BMP NonCharacters (not characters by definition, and a name must be composed of characters; note that the XML definition neglects to explicitly exclude non-BMP NonCharacters, but authors should avoid those too in producing names)
Though not prohibited by XML and Namespaces in XML, authors should also avoid the following characters and other code points as Name characters
U+FEFF ZERO WIDTH NO-BREAK SPACE (Byte-order Mark): The use of this character as a zero width no-break space is deprecated, and it should only ever appear as a byte-order mark at the very beginning of an encoded string. Authors must use U+2060 instead; however, that character is already prohibited by the XML Name production, so the Word Joiner semantics are simply not allowed in Names
any U+xxFFFE, any U+xxFFFF, U+10FFFE, U+10FFFF NonCharacters: All code points outside the BMP ending in FFFE and FFFF. These are NonCharacter code points (within the BMP, U+FFFE and U+FFFF and the other BMP NonCharacters are already prohibited by XML).
U+FFF0–U+FFFD Specials block: Interlinear Annotation, Object Replacement Character, and Replacement Character
U+3000 IDEOGRAPHIC SPACE
U+3001 IDEOGRAPHIC COMMA
U+3002 IDEOGRAPHIC FULL STOP
U+3300–U+33FF CJK Compatibility block

BMP Compatibility Zone Blocks with some notable exceptions (1 block of 16 plus 40 additional [or perhaps 286 additional] characters): The indented portion of the table lists exceptions to the BMP Compatibility Zone exclusion, meaning authors are welcome to use these excepted characters in the Compatibility Zone.

Though Unicode officially has no designation of a Compatibility Zone, all of the BMP blocks beyond the Surrogate code point and Private Use character blocks (1,775 code points) consist almost entirely of unambiguous compatibility characters, in the sense of characters encoded only for round-trip compatibility with other non-Unicode/non-ISO 10646 character set encodings. In addition, these characters are generally forms of other characters best handled at the presentation layer using fonts, glyphs and rendering engines capable of separating the character layer from the presentation layer. In general these blocks have the term "Compatibility" or "Form" in their name; the Combining Half Marks block also belongs to this group.

In general authors can then exclude from Names and NCNames all characters in the BMP from the start of the Surrogate blocks at D800 (decimal 55,296) to the end of the BMP at FFFF (decimal 65,535), for a total of 10,240 code points at the end of the BMP, though with the following 302 exception code points which are available for use.

U+FE00–U+FE0F Variation Selectors block: 16 Variation Selectors
﨎 U+FA0E CJK COMPATIBILITY IDEOGRAPH-uuuu
﨏 U+FA0F
﨑 U+FA11
﨓 U+FA13
﨔 U+FA14
﨟 U+FA1F
﨡 U+FA21
﨣 U+FA23
﨤 U+FA24
﨧 U+FA27
﨨 U+FA28
﨩 U+FA29
ﬞ U+FB1E HEBREW POINT JUDEO-SPANISH VARIKA
﴾ U+FD3E ORNATE LEFT PARENTHESIS
﴿ U+FD3F ORNATE RIGHT PARENTHESIS
﷽ U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
﷼ U+FDFC RIAL SIGN
﹅ U+FE45 SESAME DOT
﹆ U+FE46 WHITE SESAME DOT
[NOTE: should probably add exceptions for the 12 other Arabic ligatures with the <isolated> decomposition keyword in this compatibility block range (three decomposing to 3 characters, seven decomposing to 4 characters, one decomposing to 8 characters, and one decomposing to 16 characters): ﷰ U+FDF0, ﷱ U+FDF1, ﷲ U+FDF2, ﷳ U+FDF3, ﷴ U+FDF4, ﷵ U+FDF5, ﷶ U+FDF6, ﷷ U+FDF7, ﷸ U+FDF8, ﷹ U+FDF9, ﷺ U+FDFA, ﷻ U+FDFB]
⓿ U+24FF NEGATIVE CIRCLED DIGIT ZERO
⓫–⓴ U+24EB–U+24F4 NEGATIVE CIRCLED NUMBER ELEVEN through NEGATIVE CIRCLED NUMBER TWENTY
⓵–⓾ U+24F5–U+24FE DOUBLE CIRCLED DIGIT ONE through DOUBLE CIRCLED NUMBER TEN
[NOTE: should probably add exceptions for all 234 other characters with the <circle> decomposition keyword in this compatibility block range]
Note that U+FE73 ARABIC TAIL FRAGMENT (ﹳ) has no compatibility mapping but is not excepted here since it is intended for legacy text rendering systems that do not support the separation between characters and glyphs
Unicode Properties to consider for excluded characters used in producing and validating Name Characters
property XID_Continue=NO: Ostensibly created for XML and HTML support, but it is incomplete in terms of characters which should be excluded
property ID_Continue=NO
property Pattern_Syntax=YES: Such characters potentially inhibit the use of processing syntax and differentiating name and identifier characters from syntax characters.
property Pattern_White_Space=YES
property White_Space=YES
property Default_Ignorable_Code_Point=YES: Such characters potentially inhibit the use of processing syntax and differentiating name and identifier characters from syntax characters.
General_Category Zs, Zl, Zp, Cc, Cf, Cs, Co: Space Separator (Zs), Line Separator (Zl), Paragraph Separator (Zp), Control (Cc), Format (Cf), Surrogate (Cs), Private Use (Co)
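
As a rough illustration of how the General_Category row above might be applied, the following Python sketch uses the standard 'unicodedata' module to flag characters in the listed categories. The function name is hypothetical and not part of this draft, and the result for rarely used characters depends on the Unicode version bundled with the Python runtime.

```python
import unicodedata

# General categories listed in the table above as candidates for exclusion.
EXCLUDED_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co"}


def has_excluded_category(ch: str) -> bool:
    """Return True if ch belongs to one of the listed general categories.

    Hypothetical helper for illustration only; not defined by the draft.
    """
    return unicodedata.category(ch) in EXCLUDED_CATEGORIES


if __name__ == "__main__":
    # Print the code point, its general category, and the check result.
    for ch in [" ", "a", "\n", "\u200d", "\ue000"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), has_excluded_category(ch))
```

Note that ZERO WIDTH JOINER (U+200D) has the Format (Cf) category, so a check based on general category alone would reject a joining control that the exception list above permits. A validator along these lines would need to consult the exceptions before applying the category test.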

[Note: Possibly recommend all Name Characters for any Name come from the same script potentially supplemented by characters from Common and Inherited scripts]

Note that Names do not support word breaks, sentence breaks, line-breaking controls, interlinear annotation (in HTML, though XML 1.0 technically does support it), or bidirectional text. They do support joining controls, variation selectors and COMBINING GRAPHEME JOINER.

Additional characters prohibited as the initial start character for a Name
Char Code Point Name Details
- U+002D HYPHEN-MINUS
. U+002E FULL STOP
[0-9] U+0030–U+0039 Basic Latin block decimal digits: Note this only applies to the Basic Latin block decimal digits, so all other decimal digits from other scripts and Mathematical Symbols outside the BMP are suitable for Name start characters
U+203F UNDERTIE
U+2040 CHARACTER TIE
· U+00B7 MIDDLE DOT
U+0300–U+036F Combining Diacritical Marks block
Though not prohibited by XML and Namespaces in XML, authors should also avoid the following characters as a Name start character
Unicode Properties to consider for excluded characters used in producing and validating Name Characters
property XID_Start=NO - [XID_Continue=YES] [XID_Continue=NO]: Ostensibly created for XML and HTML support, but it is incomplete in terms of characters which should be excluded
property ID_Start=NO - [ID_Continue=YES] - [ID_Continue=NO]
Any character with the property Grapheme_Extend=YES: Such characters have no meaning as the start of a string.
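
The two tables above can be read as a pair of range checks. The following Python sketch is one possible transcription, not a normative algorithm: it covers only the ranges the tables prohibit outright, it omits the "should also avoid" recommendations (the Compatibility Zone, non-BMP NonCharacters, and the Unicode property tests), and it adds SPACE and TILDE as labeled assumptions because XML excludes them even though the tables do not list them. All function and constant names are hypothetical.

```python
# Inclusive code point ranges prohibited in a Name or NCName,
# transcribed from the first table above.
EXCLUDED_NAME_CHAR_RANGES = [
    (0x0000, 0x001F),  # C0 controls
    (0x0020, 0x0020),  # SPACE (assumption: not in the table, excluded by XML)
    (0x0021, 0x002C),  # ! through ,
    (0x002F, 0x002F),  # SOLIDUS
    (0x003A, 0x0040),  # : through @ (the colon makes this an NCName check)
    (0x005B, 0x005E),  # [ through ^
    (0x0060, 0x0060),  # GRAVE ACCENT
    (0x007B, 0x007D),  # { through }
    (0x007E, 0x007E),  # TILDE (assumption: not in the table, excluded by XML)
    (0x007F, 0x009F),  # DELETE and C1 controls
    (0x00A0, 0x00B6),  # Latin-1 Supplement symbols, up to MIDDLE DOT
    (0x00B8, 0x00BF),  # Latin-1 Supplement symbols, after MIDDLE DOT
    (0x00D7, 0x00D7),  # MULTIPLICATION SIGN
    (0x00F7, 0x00F7),  # DIVISION SIGN
    (0x037E, 0x037E),  # GREEK QUESTION MARK
    (0x2000, 0x206F),  # General Punctuation (exceptions handled below)
    (0x2100, 0x2BFF),  # BMP symbol blocks
    (0x2FF0, 0x2FFF),  # Ideographic Description Characters
    (0xD800, 0xF8FF),  # surrogates and BMP private use
    (0xFDD0, 0xFDEF),  # BMP noncharacters
    (0xFFFE, 0xFFFF),  # BMP noncharacters
]

# Exceptions within General Punctuation: ZWNJ, ZWJ, UNDERTIE, CHARACTER TIE.
GENERAL_PUNCTUATION_EXCEPTIONS = {0x200C, 0x200D, 0x203F, 0x2040}

# Additional exclusions for the first character, from the second table.
EXCLUDED_START_RANGES = [
    (0x002D, 0x002E),  # HYPHEN-MINUS and FULL STOP
    (0x0030, 0x0039),  # Basic Latin digits
    (0x00B7, 0x00B7),  # MIDDLE DOT
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x203F, 0x2040),  # UNDERTIE and CHARACTER TIE
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_char(ch):
    cp = ord(ch)
    if cp in GENERAL_PUNCTUATION_EXCEPTIONS:
        return True
    return not _in_ranges(cp, EXCLUDED_NAME_CHAR_RANGES)


def is_name_start_char(ch):
    return is_name_char(ch) and not _in_ranges(ord(ch), EXCLUDED_START_RANGES)


def is_ncname(text):
    """Check a string against the exclusions transcribed above."""
    return (
        bool(text)
        and is_name_start_char(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )
```

Under these assumptions, is_ncname("section-1") is accepted, while is_ncname("1abc") and is_ncname("a:b") are rejected. Keeping the start-character exclusions in a separate list mirrors the way the draft splits the rules across two tables.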

Core Modules

Document Module

This module provides the overarching elements to serve as a host document vocabulary. When used within another host document, these elements are not required. The attributes included here are all defined elsewhere, and their inclusion here serves only to 1) define the inherited default values on elements in the absence of author overrides and 2) define the fixed values of attributes that are implied for any HTML document written to this specification.

[xmlns attribute as a required attribute to serve as a namespace processing switch indicated by the author?]

Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
html manifest, version 'head', 'body' none (as root element) | anyForeignElement HTMLHtmlElement The 'html' element is required except in compound documents, where it may be omitted.
head profile 'meta@charset' 'title', ['base']? 'meta'*, 'link'*, 'style'*, 'script'*, 'handler'*, 'object'*, 'eventsource'*, 'menu'*, 'access'*, 'x:model'* 'html' | anyForeignElement (along with a next adjacent sibling 'body' element | as the first child element of a 'section' or 'article' element)* HTMLHeadElement The 'head' element is required except in compound documents, where it may be omitted. In documents targeted for tHTML parsing, when using a non-UTF encoding or a UTF encoding with no BOM, authors should include a 'meta' element with a 'charset' attribute indicating the character encoding for the document. For XML serializations the 'meta' element with a 'charset' attribute has no meaning.
body Structural 'html' | anyForeignElement HTMLBodyElement The 'body' element is required except in compound documents hosted by a non-HTML vocabulary, where it may be omitted.


Attributes

'html' element attributes
Attribute name Type Default DOM Other UA behavior Details
version CDATA [FIXED] "4.1" (or 4.2, 4.3, 5.1 as the case may be, where either the UA indicates the version applied to processing the resource or the author indicates the version targeted) R If this attribute is set explicitly by authors, the UA must return the value as indicated by the author. If this attribute is not set explicitly, then a UA must return the value for the latest version the UA targets for conformance.
manifest URIRep The manifest attribute gives the address of the document's application cache manifest, if there is one. If the attribute is present, the attribute's value must be a valid URL reference.
xmlns:idns URIRep [FIXED]"html4all.org/ns/idns#" R The 'idns' prefix defines its own namespace which supports attributes of any arbitrary Name an author chooses. All attributes prefixed with the 'idns' prefix define a namespace declaration for Name namespaced unique identifier values for use with the 'id' attribute. Regarding Namespaces in XML 1.0 terminology these are no longer IDs as unique XML Name identifiers, but instead unique QName identifiers or QIDS. However these namespaced IDs can also be used as node-unique identifiers or QNIDRefs which, when namespaced, are Qualified Node-unique IDentifiers or QNIDRefs.
xmlns:classns URIRep [FIXED]"html4all.org/ns/classns#"R The 'classns' prefix defines its own namespace which supports attributes of any arbitrary Name an author chooses. All attributes prefixed with the 'classns' prefix define a namespace declaration for NMToken namespaced values for use with the 'class' attribute. Parallel with Namespaces in XML 1.0 terminology these are no longer simply NMTokens for the 'class' attribute, but instead qualified NMTokens, or QNMTokens. It is important to note that the classns prefix itself is a Name as defined in XML 1.0, while the vocabulary defined by the NCName prefix are all NMTokens. In other words the NMTokens defined by the vocabulary do not restrict the starting character the way XML Names do, but the prefix must conform to the NCName production of Namespaces in XML 1.0.
'html' element attributes defined elsewhere with fixed values in HTML regardless of serialization and regardless of parsing specification
xmlns:xmlns URIRep [FIXED] "http://www.w3.org/2000/xmlns" R imposed by the parser in all cases to bootstrap the namespace processing
xmlns:xml URIRep [FIXED] "http://www.w3.org/XML/1998/namespace" R imposed by the parser in all cases to bootstrap the namespace processing
xmlns:xsi URIRep [FIXED] "http://www.w3.org/2001/XMLSchema-instance" R imposed by the parser in all cases to bootstrap the namespace processing [is this the case or if not, should we make it part of a fixed or default declaration for the HTML schema?]
xsi:schemaLocation URIRep [FIXED] "http://www.w3.org/1999/xhtml [TBD]" R

As the above attributes all have fixed values (with the exception of 'manifest'), authors may omit them in most cases. Along with the DocType declaration these attributes can participate in identifying the markup vocabulary as HTML for generic XML and SGML processors. For example, with a DocType declaration authors might eliminate all of these attributes, since the systemID of the DocType declaration will identify the schema which populates all of the attributes with fixed and default values. Such schema awareness does not require any retrieval of the schema, since UAs can hard-code awareness of the association between the DocType systemID and the HTML schema. On the other hand, authors might eliminate the DocType declaration and instead rely on the namespace mechanism to identify the content as HTML content in HTML-aware UAs. Alternatively, including the xsi:schemaLocation or xml:schemaLocation attributes can provide generic XML- and CSS-aware UAs with all of the schema information necessary to process the document. In any event, the HTML markup vocabulary defined in this specification works alongside other hierarchical markup vocabularies, so it must be uniquely identified as HTML for user agents to properly process the content.

Proposed new 'xml:schemaLocation' global and scoped attribute

The xml:schemaLocation attribute is supported whether the document is parsed by an XML parser or a tHTML parser, and it provides schema location information that does not depend on any one schema language. Its value is a semicolon-delimited list of schema location declarations, each a whitespace-separated triplet: schemaTypeNamespaceURI namespaceURI schemaURL. If a declaration has only two URIs, then those URIs represent the namespaceURI and the schemaURL, and the schemaTypeNamespaceURI is presumed to be XSD (i.e., "http://www.w3.org/2001/XMLSchema-instance"). If a declaration has only one URI, then it represents the schemaURL location of the schema for the document’s only vocabulary (regardless of its namespace).

If the URIs themselves contain any literal semicolons, then those semicolons must all be percent encoded (i.e., replaced with %3B). Likewise, if the URIs themselves contain any literal spaces, then those spaces must all be percent encoded (i.e., replaced with %20).

xml:schemaLocation="
	schemaTypeNamespaceURI namespaceURI schemaURL;
	schemaTypeNamespaceURI namespaceURI schemaURL;
	...
	schemaTypeNamespaceURI namespaceURI schemaURL;
"
Equivalents

<html xsi:schemaLocation="
http://www.w3.org/1999/xhtml 
http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd
" >

<html xml:schemaLocation="
http://www.w3.org/2001/XMLSchema-instance
http://www.w3.org/1999/xhtml 
http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd
" >
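
The declaration syntax described above can be sketched as a small parser. This Python example is an illustration under stated assumptions rather than a normative algorithm: the function name is hypothetical, empty declarations (such as the one after a trailing semicolon) are skipped, and for the one-URI form the schema type and namespace are returned as None because the draft does not say what they default to.

```python
XSD_INSTANCE_NS = "http://www.w3.org/2001/XMLSchema-instance"


def parse_schema_location(value):
    """Split an xml:schemaLocation value into a list of
    (schemaTypeNamespaceURI, namespaceURI, schemaURL) tuples.

    Hypothetical helper; URIs are kept percent-encoded as written.
    """
    declarations = []
    for entry in value.split(";"):
        uris = entry.split()  # any run of whitespace separates URIs
        if not uris:
            continue  # assumption: ignore empty declarations
        if len(uris) == 3:
            schema_type, namespace, url = uris
        elif len(uris) == 2:
            # Two URIs: the schema type is presumed to be XSD.
            schema_type = XSD_INSTANCE_NS
            namespace, url = uris
        elif len(uris) == 1:
            # One URI: the schema for the document's only vocabulary.
            schema_type, namespace, url = None, None, uris[0]
        else:
            raise ValueError(f"too many URIs in declaration: {entry.strip()!r}")
        declarations.append((schema_type, namespace, url))
    return declarations


if __name__ == "__main__":
    two_uri_form = """
        http://www.w3.org/1999/xhtml
        http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd
    """
    three_uri_form = """
        http://www.w3.org/2001/XMLSchema-instance
        http://www.w3.org/1999/xhtml
        http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd
    """
    # Both forms from the "Equivalents" example yield the same result.
    print(parse_schema_location(two_uri_form) == parse_schema_location(three_uri_form))
```

Because semicolons and spaces inside URIs must be percent encoded, splitting on those two delimiters is sufficient and no escaping logic is needed in the parser itself.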

In addition to the fixed values above, the following namespaces are all declared by default for any document hosted in the HTML namespace:

'html' element
Attribute name Type Default DOM Other UA behavior Details
xmlns URIRep X

default: "html4all.org/ns/html#" or "http://www.w3.org/1999/xhtml" if the W3C cooperates with our work. Namespace declarations are the best way to differentiate one markup from another in both compound documents and homogenous documents. Without some indication of the markup type there is no way to determine that the markup is HTML and therefore that the 'xmlns' has a default value already set on the root element. Various alternatives exist to signal to a processor that the markup it encounters is the HTML defined by this specification.

  1. For tHTML parsers the xmlns attribute actually has a default value and therefore the parser assumes HTML for all unprefixed names which is the result of its parsing process.
  2. A DocType declaration can indicate to SGML processors (including XML processors) that the content is HTML according to either a public identifier or a system identifier.
  3. An xsi:schemaLocation attribute can associate a schema with default attribute values with the markup even if it does not have a namespace declared.
  4. The namespace itself can signal that the content is HTML to processors with a hard-coded association between the HTML namespace and the HTML schema, which therefore recognize the HTML namespace URI as HTML (i.e., "html4all.org/ns/html#" or "http://www.w3.org/1999/xhtml" or even both, where both are treated as synonyms for this HTML and all preceding and succeeding HTMLs of the same abstract namespace).
xmlns:math URIRep X default: "http://www.w3.org/1998/Math/MathML"
xmlns:svg URIRep X default: "http://www.w3.org/2000/svg"
xmlns:xlink URIRep X default: "http://www.w3.org/1999/xlink"
xmlns:rdf URIRep X default: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:aria URIRep X default: "http://www.w3.org/2005/07/aaa/"
xmlns:aaa URIRep X default: "http://www.w3.org/2005/07/aaa/"
xmlns:x URIRep X default: "http://www.w3.org/2002/xforms"
xmlns:xforms URIRep X default: "http://www.w3.org/2002/xforms"
xmlns:xsd URIRep X default: "http://www.w3.org/2001/XMLSchema"
xmlns:ev URIRep X default: "http://www.w3.org/2001/xml-events"
xmlns:f URIRep X default: "http://www.w3.org/2002/06/xframes/". [Perhaps this is not necessary if HTML frames are available in HTML regardless of parsing/serialization. The most important thing XFrames contributes is a new URI syntax for specifying a complete frame state. The other contributions are minimal or could be accomplished by simply deprecating the HTML Frames presentational attributes and encouraging the use of CSS for presentation instead].
xmlns:foaf URIRep X default: "http://xmlns.com/foaf/0.1/"
xmlns:dc URIRep X default: "http://purl.org/dc/terms/"
xmlns:cc URIRep X default: "http://creativecommons.org/ns#"

[Issue: xforms and aria are mapped to two prefixes. Is this permitted and is it advisable or useful?]
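
To illustrate how the default declarations above might interact with author overrides, the following Python sketch resolves a prefixed name against the default table. It is only a sketch: the function name is hypothetical, the default (unprefixed) namespace is assumed to be "http://www.w3.org/1999/xhtml" although the draft also proposes "html4all.org/ns/html#", and author declarations are assumed to simply take precedence over the defaults.

```python
# Default prefix bindings from the table above. The "" key stands for the
# default namespace (assumption: the W3C XHTML namespace URI is used).
DEFAULT_NAMESPACES = {
    "": "http://www.w3.org/1999/xhtml",
    "math": "http://www.w3.org/1998/Math/MathML",
    "svg": "http://www.w3.org/2000/svg",
    "xlink": "http://www.w3.org/1999/xlink",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "aria": "http://www.w3.org/2005/07/aaa/",
    "aaa": "http://www.w3.org/2005/07/aaa/",
    "x": "http://www.w3.org/2002/xforms",
    "xforms": "http://www.w3.org/2002/xforms",
    "xsd": "http://www.w3.org/2001/XMLSchema",
    "ev": "http://www.w3.org/2001/xml-events",
    "f": "http://www.w3.org/2002/06/xframes/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
    "cc": "http://creativecommons.org/ns#",
}


def resolve_qname(qname, declared=None):
    """Return (namespace URI, local name) for a possibly prefixed name.

    'declared' holds author-supplied prefix bindings, which are assumed
    to override the defaults. Hypothetical helper for illustration.
    """
    bindings = {**DEFAULT_NAMESPACES, **(declared or {})}
    prefix, separator, local = qname.partition(":")
    if not separator:
        # No colon: the whole string is the local name.
        prefix, local = "", qname
    if prefix not in bindings:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return bindings[prefix], local


if __name__ == "__main__":
    print(resolve_qname("svg:rect"))
    print(resolve_qname("p"))
    print(resolve_qname("x:model") == resolve_qname("xforms:model"))
```

The last line of the usage example shows the situation raised in the issue note: two prefixes mapped to the same namespace resolve to the same expanded name.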

Attribute Root element default values

The following attributes are defined elsewhere, but their value is inherited and both implementors and authors should be conscientious about their value. Except for 'lang', all of these values have an initial setting on the root element that is inherited throughout the document unless and until author overridden. Authors should provide a valid 'lang' value whenever possible.

'html' element (default declarations; fully conforming UAs must impute these default mappings even when omitted from the document)
Attribute name Type Default DOM Other UA behavior Details
dir "ltr" | "rtl" | "lro" | "rlo" implementation dependent W Authors should set this value on the root 'html' or any other root element in a compound document to ensure proper dir handling by UAs. UAs must not use language information to determine the value for dir so authors should indicate either 'ltr' or 'rtl' as needed on every document.
lang LanguageCode implementation dependent W Authors should set this value on the root 'html' or any other root element in a compound document to ensure proper language handling by UAs
phoneticsystem URIRep "unicode" W ["unicode", if phonemes are approved in Unicode, otherwise "ipa"]
xml:base URIRep "" xmlBase This null default value indicates a base equivalent to the location of the document or the base set by the 'base' element’s 'href' attribute.
URI Character Set Encoding attributes
These defaults match the most prevalent behavior among current UAs; however, authors may sometimes require more fine-grained control over such encodings. Authors who are able to use UTF-8 for documents, server-side processing, and other treatments of URIs will generally enjoy much greater ease of authoring.
scheme-charset Encoding "utf-8" W
authority-charset Encoding "utf-8" W
path-charset Encoding "utf-8" W
query-charset Encoding "" W The null value indicates the encoding of the present document (percent encodings should always be sent as their literal octet encoding)
fragment-charset Encoding "" W The null value indicates the encoding of the present document (percent encodings should always be sent as their literal octet encoding)
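
The effect of the per-component defaults above can be sketched with Python's standard 'urllib.parse.quote' function, which accepts an 'encoding' argument. The function and dictionary names below are hypothetical, and the handling of overrides is an assumption; the sketch only shows how a null value falls back to the document encoding.

```python
from urllib.parse import quote

# Default encodings from the table above; "" means "use the encoding of
# the present document".
DEFAULT_CHARSETS = {
    "scheme": "utf-8",
    "authority": "utf-8",
    "path": "utf-8",
    "query": "",
    "fragment": "",
}


def encode_component(text, component, document_encoding, overrides=None):
    """Percent-encode one URI component using its configured charset.

    'overrides' models author-set *-charset attribute values (assumed to
    replace the defaults). Hypothetical helper for illustration.
    """
    charsets = {**DEFAULT_CHARSETS, **(overrides or {})}
    encoding = charsets[component] or document_encoding
    return quote(text, safe="", encoding=encoding)


if __name__ == "__main__":
    word = "caf\u00e9"
    # In an ISO-8859-1 document the path is still encoded as UTF-8 ...
    print(encode_component(word, "path", "iso-8859-1"))
    # ... while the query falls back to the document encoding.
    print(encode_component(word, "query", "iso-8859-1"))
```

For a document in ISO-8859-1, the path component becomes "caf%C3%A9" while the query component becomes "caf%E9", which is the kind of mismatch that using UTF-8 throughout avoids.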

DOM Interface

  • Legacy 'body' DOM interface
  • Legacy 'head' DOM interface [or HTMLElement if we discontinue 'profile']

UA Processing

In terms of the default values and fixed attributes, HTML UAs must treat the 'html' element as implied even when omitted by authors. However, in compound documents hosted by another vocabulary, only the FIXED attribute values should be imputed and then it is only relevant for validation purposes.

Presentation Considerations

  • 'head' should be display: none in most circumstances
  • In tHTML-parsed documents, the 'body' element background determines the canvas background, while in XML-parsed documents, the root element determines the canvas background.

Authoring Considerations

Due to the presentation considerations raised above, and for consistency's sake, authors should include the same background style declaration for both the 'html' and the 'body' elements. This ensures the same presentation regardless of the content-type processing of the document.

Command and Event Module

The Command and Event Module provides declarative markup facilities to dispatch events, issue user commands, and handle events. It includes the legacy HTML 'script' element as well as the newly introduced 'handler' element which is introduced to ease the learning curve for future authors.

The module includes two declarative mechanisms to establish specific sockets for events and dispatch those events in clearly defined ways. One is the 'eventsource' element, which serves as a declarative structure for URI response events, allowing authors to create connections to external processing applications which can fire events that, for example, mutate the DOM. The second is the 'access' element, which serves as a declarative structure for input events such as keyboard events, mouse events, and events from other standard input devices. The 'eventsource' and 'access' elements provide the archival state information to establish common event responders and then to dispatch events to existing handlers within a document.

The 'command' element and other commands allow highly reusable and non-localized 'eventsource' and 'access' elements to make use of localized commands within the document which can include localized content for help, hints, labels, and titles.

Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
access Target (targetid, targetnid, targetrole), Action (handler, command, action, dispatch), important, key, inputevent, modifierkeys EMPTY HTMLAccessElement
eventsource Target (targetid, targetnid, targetrole), Action (handler, command, action, dispatch), src, pulse EMPTY HTMLEventsourceElement
command Common, Action (handler, command, action, dispatch) help?, hint?, label?, alert? HTMLCommandElement
handler Common, execution, type (CDATA)* HTMLHandlerElement
script async, defer, type, charset, (CDATA)* HTMLScriptElement


Attributes

The following table defines the attributes for the access, eventsource, command, handler, and script elements. Target Specifying Attributes are used on access and eventsource elements while the Action Specifying Attributes are used on the access, eventsource, and command elements.

Target Specifying Attributes (Target) (One of these attributes may be specified, but authors must use only one of these attributes)
Attribute name Type Default DOM Other UA behavior Details
targetid IDREF W indicates the id of an element to perform an action specified by the 'action' attribute (only one of targetid, targetnid, or targetrole should be specified)
targetnid NIDREF W indicates the nid of the next element in document order to perform an action specified by the 'action' attribute (only one of targetid, targetnid, or targetrole should be specified)
targetrole QNameRef W indicates the role of the next element in document order to perform an action specified by the 'action' attribute (only one of targetid, targetnid, or targetrole should be specified)
Action Specifying Attributes (Action) (One of these attributes may be specified, but authors must use only one of these attributes)
handler URIRep | NIDREF W Indicates the URI of a handler to invoke when an event occurs.
command IDREF | NIDREF W Indicates the id or nid of a command element or another element expressing a command. The command’s corresponding action must be invoked when the response event occurs. The intervening command allows some separation of event handling and localizable command associated text (e.g., help, hint, label, title). Since IDs and QNIDRefs share the same namespace, it is an authoring error for both an 'id' attribute and a 'nid' attribute to share the same value within the same document. In the case of such a name conflict error, UAs must treat the value as a QNIDRef (ignoring any 'id' attribute with the same value).
action ScriptExpression focus() W indicates an action to perform on the targeted element
dispatch QNameRef (event) focus W indicates an action to perform on the targeted element; the default is the same as the script expression focus()
'access' element
important xsd:boolean W indicates an event binding is important for cascade resolution of conflicting bindings.
key Character W indicates a character to serve as a proxy for a matching keyboard event
inputevent QNameRef W indicates a QName for an input device event
modifierkeys QNameRefs W indicates QNames for a single or combination of modifier keys on an input device
'eventsource' element
Attribute name Type Default DOM Other UA behavior Details
src URIRep W
pulse Integer W If present, indicates a value in seconds that a comment should be sent to the agent identified by the URI to keep a connection open. If absent, the connection is considered persistent and no pulse is required.
'handler' element
Attribute name Type Default DOM Other UA behavior Details
src URIRep W
execution ("async" | "defer" | "immediate") "immediate" W
'script' element
Attribute name Type Default DOM Other UA behavior Details
async boolean W
defer boolean W
type ContentType W
charset Encoding W
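
The rule that only one attribute from each of the Target and Action groups may be used lends itself to a simple conformance check. The Python sketch below is illustrative only: the function name and the shape of the error messages are hypothetical, and an element's attributes are modeled as a plain dictionary.

```python
# Attribute groups from the tables above.
TARGET_ATTRS = ("targetid", "targetnid", "targetrole")
ACTION_ATTRS = ("handler", "command", "action", "dispatch")


def validate_binding(attrs):
    """Return a list of error strings for an 'access', 'eventsource',
    or 'command' element whose attributes are given as a dict.

    Hypothetical checker: it reports every group in which more than one
    attribute is present, and returns an empty list otherwise.
    """
    errors = []
    for label, group in (("Target", TARGET_ATTRS), ("Action", ACTION_ATTRS)):
        present = [name for name in group if name in attrs]
        if len(present) > 1:
            errors.append(
                f"{label}: only one of {', '.join(group)} may be used, "
                f"found {', '.join(present)}"
            )
    return errors


if __name__ == "__main__":
    print(validate_binding({"targetid": "save", "command": "save-cmd", "key": "s"}))
    print(validate_binding({"targetid": "save", "targetrole": "button"}))
```

The first call reports no errors, while the second reports a Target conflict. Attributes outside the two groups, such as 'key', are ignored by the check.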

DOM Interface

UA Processing

Presentation Considerations

Authoring Considerations

Structural Module

[introductory explanation of module]

Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
address Common ( PCData | Structural )* | l+ Structural HTMLElement
article Common ( PCData | Text )* Structural HTMLElement
blockcode Common ( PCData | Text | Heading | Structural | List )* | l+ Structural HTMLElement
blockquote Common ( PCData | Text | Heading | Structural | List )* Structural HTMLElement
div Common ( PCData | Flow )* Structural HTMLDivElement
figure Common legend?, ( PCData | Flow )* Structural HTMLElement
h Common ( PCData | Text )*, h? Structural HTMLHeadingElement
h1 … h6 Common ( PCData | Text )*, Heading? Structural HTMLHeadingElement
l Common ( PCDATA | Text )* Structural | Text HTMLElement
p Common ( PCData | Text | List | blockcode | blockquote | pre | table )* | l+ Structural HTMLElement
pre Common ( PCData | Text )* | l+ Structural HTMLElement
section Common ( PCData | Structural )* Structural HTMLElement
separator Common EMPTY Structural HTMLElement

Attributes

Attribute name Type Default DOM Other UA behavior Details
W

DOM Interface

UA Processing

Presentation Considerations

Authoring Considerations

Text Module

This module provides elements to express core text/phrase markup.

Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
a Common ( PCData | Structural | Text )* Structural | Text HTMLElement Each 'a' element defines an anchor which is a document fragment subresource that participates in a link relation either as a source or a destination in the link relation or both.
  1. The 'a' element's content defines the position of the anchor.
  2. The 'id' attribute provides an ID for the anchor so that it may be the destination of zero or more links.
  3. The 'href' attribute makes this anchor the source anchor of exactly one link
  4. The 'rel' attribute provides further details on the type of link relation.
em Common ( PCDATA | Text )* Text HTMLElement
q Common ( PCDATA | Text )* Text HTMLElement
bdo Common (dir required) ( PCDATA | Text )* Text HTMLElement Much like the 'span' element except it changes the embedding level within the Unicode bidirectional text algorithm. Authors can accomplish the same results by using the 'dir' attribute on an existing element if the embedded or overridden bidirectional text is already within another element.
span Common ( PCDATA | Text )* Text HTMLSpanElement

Authors will also find elements suitable for marking up phrases in other modules. For example:

  • the 'subtext' element from the Note Module;
  • several elements in the Moniker Module: 't', 'pn', 'abbr', 'iabbr', 'hom', 'icon', 'var', 'dfn' and 'define'
  • elements in the Software Documentation Module: 'samp', 'kbd', 'code', and 'blockcode'
  • elements in the Presentational Module: 'b', 'i', 'strike', 'u', 'strong', 'sub', 'sup', and 'pre'
  • the 'cite' element from the Citation and Attribution Module
  • the 'mark' element from the Bookmark and Clipping Module

These elements along with the Text Module elements are collectively defined as the Text Element Collection (i.e., "Text" without the quotation marks).

While somewhat of a hybrid, the 'l' element from the Structural module is also similar to the Text Module elements.

[Note: should we also add other text elements to convey semantics such as irony, emphatic quotation, sarcasm, etc.? Except for emphatic quotations, which generally get wrapped in quotation marks, these others do not traditionally get any specific typographical presentation.]

Attributes

DOM Interface

UA Processing

Presentation Considerations

Authoring Considerations

List Module

The list module supports unordered, ordered and definition lists.

Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
ul Common, type(QNameRef), start(Integer), sortable(boolean), reorderable(boolean), numbered(boolean), descending(boolean) label?, ( li )* Structural | Text HTMLListElement
ol Common, type(QNameRef), start(Integer), sortable(boolean), reorderable(boolean), numbered(boolean), descending(boolean) label?, ( li )* Structural | Text HTMLListElement
dl Common, type(QNameRef), start(Integer), sortable(boolean), reorderable(boolean), numbered(boolean), descending(boolean) label?, ( li )+ | ( dt, dd )* Structural | Text HTMLListElement The definition list allows for a
nl Common, type(QNameRef), start(Integer), sortable(boolean), reorderable(boolean), numbered(boolean), descending(boolean) label, ( li )* Structural | Text HTMLListElement Indicates a list of items to navigate around a document or to other related documents.
menu Common, type(QNameRef), start(Integer), sortable(boolean), reorderable(boolean), numbered(boolean), descending(boolean) label, ( li )* Structural | Text HTMLListElement Like nl, but for a non-navigational menu of list items for users to select among (also offers greater compatibility with tHTML parsing). The menu element can also be used for menu-bar menus, toolbars, and contextual menus.
li Common, value(Integer) ( Structural | Text )* | (dt, dd)+ ul | ol | dl HTMLLIElement
dt Common ( Text )* dl | li HTMLElement
dd Common, Moniker ( Structural | Text )* 'dl' | 'li' HTMLDefineElement

Attributes

Attribute name Type Default DOM Other UA behavior Details
type QNameRef W
start Integer W
value Integer W
sortable boolean W
reorderable boolean W
descending boolean W
numbered boolean W

DOM Interface

HTMLListElement

interface HTMLListElement : HTMLElement {

    attribute DOMString type;
    attribute unsigned long start;
    attribute boolean sortable;
    attribute boolean reorderable;
    attribute boolean numbered;
    attribute boolean descending;
    // The draft gives no return type for sortList; void is assumed here.
    void sortList([Optional] in DOMString nid);

};


HTMLLIElement

interface HTMLLIElement : HTMLElement {

attribute unsigned long value;

};

UA Processing

Presentation Considerations

Authoring Considerations

Table Module

The table module is quite similar to the long familiar HTML 4.01 support for tables. However, there is a change in the content model for XML and other serializations (other than tHTML). For those serializations, any table row can also support a table body (tbody). Together with the 'disclosure' attribute this enhanced content model provides for hierarchical outline tables. The table body for each row must have the same number of columns as the parent table and UAs much impute the same columns for non-conforming child table bodies.

[edit] Elements

Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
table Common, summary(string), sortable(boolean), reorderable(boolean) ( caption?, summary?, (col* | colgroup*), thead?, tfoot?, tbody+ ) Structure | Text HTMLTableElement
caption Common ( PCDATA | Text | Structure ) table HTMLElement
summary Common ( PCDATA | Text | Structure ) table HTMLElement
tbody, thead, & tfoot Common, valign, align, char, sortable(boolean), reorderable(boolean) tr+ table | tr HTMLTableSectionElement
colgroup Common, sortable(boolean), reorderable(boolean), width(Length), span, align, char(token) col* table HTMLTableColElement
col Common, sortable(boolean), reorderable(boolean), width, span, align, char(token) EMPTY table | colgroup HTMLTableColElement
tr Common, sortable(boolean), reorderable(boolean), valign, align, char(token), (th | td)+, tbody? tbody HTMLTableRowElement the tbody must have the same number of columns as its parent table body (the parent table body where each table body is understood as hierarchically arranged in a system of table body ancestors and descendants and not the parent element which is a tr or table element)
td Common, headers, colspan, rowspan, width, align, valign, char Text | Structure tr HTMLTableDataCellElement
th Common, scope, extent, abbr, colspan, rowspan, width, align, valign, char Text | Structure tr HTMLTableHeaderCellElement

Attributes

Attribute name Type Default DOM Other UA behavior Details
Purely Semantic Attributes
abbr String W
axis CDATA W
summary String W Useful for the tHTML parsed documents which cannot use the SUMMARY element within the Table content model
scope vertical | horizontal | both W Data cell / header cell association algorithm Separates the multiple roles of the 'scope' attribute: scope indicates only the direction of header cell association, while extent indicates how far that association reaches. A 'local' extent indicates the header cell is associated only with the immediately adjacent span of contiguous data cells; the association ends when the next header cell after one or more data cells is reached. The value 'table' indicates the header cell is associated with the entire table, such that every data cell in the direction indicated by scope is associated with this header cell. The value 'group' indicates the header is in effect for all data cells in the same group as the header cell (the group is the tbody, tfoot, or thead for scope='vertical' and the colgroup for scope='horizontal').
extent local | group | table W
headers IDREF W
sortable boolean W Indicates an interactive UA should provide sorting capabilities on the table, where the entire table is sorted based on a column or row. When sortable=, the sort is performed on the column or row using a basic string comparison for the locale represented by the lang attribute for that column or row. When a NID is indicated, the sort should use only the string contained in the descendant element of the cell identified by that NID.
reorderable boolean W Y Allows drag-and-drop reordering of a column, row, or cell within a column or row.
Column Presentational (while presentational, these attributes have not yet been adequately replaced by CSS and also reflect some varied semantics of the table)
align left | center | right | justify | char W On all table elements (except caption and summary) descending from table to the table cell.
char token W On all table elements (except caption and summary) descending from table to the table cell. The char attribute accepts a string value; the start of the first glyph corresponding to the first character in the string (before any horizontal advance) is the point that should be aligned for each row in the table. Whenever the string does not appear in the cell for a particular row, that row does not participate in the alignment and the alignment of the cell in that row is determined by the UA.
width MultiLength W on col and colgroup elements
span Integer 1 W on col and colgroup elements
colspan Integer 1 W (colSpan) on th, and td elements
Row Presentational (while presentational, these attributes have not yet been adequately replaced by CSS and also reflect some varied semantics of the table)
valign top | middle | bottom | baseline W On all table elements (except caption and summary) descending from table to the table cell.
rowspan Integer 1 W (rowSpan) on th, and td elements
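
The 'local', 'group', and 'table' values of the extent attribute described in the table above can be sketched as follows. This is only an illustration, not the data cell / header cell association algorithm itself. The Cell class, its group field, and the associated_data_cells function are hypothetical names, and the sketch assumes the cells have already been flattened into the one-dimensional sequence met when walking away from the header in the direction given by scope.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str               # "th" for a header cell, "td" for a data cell
    text: str
    group: int = 0          # assumed index of the enclosing tbody/thead/tfoot or colgroup
    extent: str = "local"   # only meaningful on "th" cells


def associated_data_cells(line, header_index):
    """Return the data cells governed by the header at header_index.

    `line` is the sequence of cells met when walking in the direction
    named by the header's 'scope' attribute (an assumption of this sketch).
    """
    header = line[header_index]
    following = line[header_index + 1:]
    if header.extent == "table":
        # Every data cell in the scope direction.
        return [c for c in following if c.kind == "td"]
    if header.extent == "group":
        # Only data cells in the same group as the header.
        return [c for c in following if c.kind == "td" and c.group == header.group]
    # extent == "local": take the contiguous run of data cells and stop at
    # the next header cell that follows one or more data cells.
    result = []
    for cell in following:
        if cell.kind == "th":
            if result:
                break
            continue
        result.append(cell)
    return result
```

For a line of cells `th, td, td, th, td`, the first header with extent 'local' is associated with only the two data cells that directly follow it, while the same header with extent 'table' is associated with all three.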

DOM Interface

Conceptual Background

While an HTML table is made up of a hierarchical tree of table related elements, its dual DOM representation is both as a tree of elements and as a 2-dimensional grid of cells. Cells can occupy one or more rows vertically (where the number of rows occupied is the rowspan of the cell) and can occupy one or more columns horizontally (where the number of columns occupied is the colspan of the cell). Regardless of the colspan or rowspan of a cell it can be said to have a columnStartIndex and a rowStartIndex.

The 2-dimensional relation of the table's cells is not always symmetrical: while an HTMLTableSectionElement can return its TableRowElement, a TableColumnSectionElement can have, but is not required to have, an HTMLTableColElement. However, it will always have one or more HTMLTableColumns. That is, an HTMLTableColumn exists regardless of whether an element hook (HTMLTableColElement or HTMLTableColSectionElement) exists to control its properties.
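
The grid view described above can be sketched by placing each cell into the first free slot of its row. This is a minimal illustration under stated assumptions: the layout_grid function and the (name, colspan, rowspan) tuple shape are hypothetical, and the sketch assumes that slots taken by a cell spanning down from an earlier row are skipped when computing a start index, which this draft does not spell out.

```python
def layout_grid(rows):
    """Map each cell name to its (rowStartIndex, columnStartIndex).

    `rows` is a list of rows; each row is a list of
    (name, colspan, rowspan) tuples in document order.
    """
    occupied = set()   # grid slots already covered by some cell
    starts = {}
    for r, row in enumerate(rows):
        c = 0
        for name, colspan, rowspan in row:
            # Skip slots covered by cells spanning down from earlier rows.
            while (r, c) in occupied:
                c += 1
            starts[name] = (r, c)
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
    return starts
```

With a first row holding a cell "a" that spans two rows and a cell "b" that spans two columns, the first cell of the second row starts at column 1, because column 0 is still occupied by "a".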

HTMLTableElement

interface HTMLTableElement : HTMLElement {

attribute HTMLTableCaptionElement caption;
HTMLElement createCaption();
void deleteCaption();
attribute HTMLTableSectionElement tHead;
HTMLElement createTHead();
void deleteTHead();
attribute HTMLTableSectionElement tFoot;
HTMLElement createTFoot();
void deleteTFoot();
readonly attribute HTMLCollection tBodies;
HTMLElement createTBody();
insertTBody([Optional] in long index);
deleteTBody(in long index);
readonly attribute HTMLCollection rows;
HTMLElement insertRow([Optional] in long index);
void deleteRow(in long index);
readonly attribute HTMLCollection columns;
HTMLTableColumn insertColumn([Optional] in long index);
void deleteColumn(in long index);
readonly attribute HTMLCollection tColGroups;
HTMLElement createTColGroup();
insertTColGroup(in long index, in long span);
deleteTColGroup(in long index);
HTMLElement insertColgroup([Optional] in long index);
void deleteColgroup(in long index);
transposeTable();

};

HTMLTableColumn

interface HTMLTableColumn : HTMLCollection {

readonly attribute long columnIndex;
readonly attribute long columnSectionColumnIndex;
readonly attribute HTMLCollection cells; // returns an HTMLCollection of all cells whose columnStartIndex is equal to the column's columnIndex
HTMLElement insertCell([Optional] in long index); // inserts a cell at the index indicated
// and moves each cell whose columnStartIndex is equal
// to the column's columnIndex to the next row at the same index
// in the row and with the same span (starting from the last
// row and working back to the row for the inserted cell). If the
// last row requires a moved cell for this column, the last row
// must be duplicated up to the cell element (the contents of
// the TD and TH must not be duplicated, but their attributes
// must be identical for the duplicated row).
void deleteCell(in long index); // when deleting a cell, the subsequent cells must be moved to the prior nth row (where n = span of the deleted cell) starting from the index + n row until the last n row(s), where an empty cell should be created (with colspan='1' and rowspan='1') in place of the cell moved up n row(s).

};

[does this symmetry work for all tables? what about cells whose columnStartIndex is less than the column's columnIndex, but where the cell spans the columnIndex of the HTMLTableColumn?] [Perhaps we need to specify how to build this HTMLTableColumn and to insert it into a table by using a 'br' element or some other placeholder for rows with no table cell in the particular column. In this way the column can be round-tripped from the table and back into the table exactly as it was. Especially when moving columns, this needs to specify precisely how to handle cells that span columns (probably in a way analogous to HTMLTableRowElement with missing cells due to rowspan)]

HTMLTableColElement

interface HTMLTableColElement : HTMLElement {

attribute unsigned long span;
readonly attribute long columnElementIndex;
readonly attribute long colGroupColumnElementIndex;
readonly attribute HTMLColumnCollection columns; // an HTMLCollection of HTMLTableColumn objects equal to the 'span' of the element
long insertColumn([Optional] in long index);
void deleteColumn(in long index);

};

HTMLTableSectionElement

interface HTMLTableSectionElement : HTMLElement {

readonly attribute HTMLCollection rows;
HTMLElement insertRow([Optional] in long index);
void deleteRow(in long index);

};

HTMLTableRowElement

interface HTMLTableRowElement : HTMLElement {

readonly attribute long rowIndex;
readonly attribute long sectionRowIndex;
readonly attribute HTMLCollection cells;
HTMLElement insertCell([Optional] in long index);
void deleteCell(in long index);
attribute HTMLTableSectionElement tBody;
HTMLElement createTBody();
void deleteTBody();

};

HTMLTableCellElement

interface HTMLTableCellElement : HTMLElement {

attribute long colSpan;
attribute long rowSpan;
attribute DOMString headers;
readonly attribute HTMLCollection headerCells; // headerCells according to the data/header cell association algorithm, in their canonical order
readonly attribute HTMLCollection rowHeaderCells; // headerCells in their canonical order
readonly attribute HTMLCollection columnHeaderCells; // headerCells in their canonical order
readonly attribute long cellIndex;
readonly attribute long columnStartIndex; // a convenience attribute: the sum of the span values for all preceding cells
readonly attribute long rowStartIndex; // a convenience attribute: the rowIndex of the parent row element

};

HTMLTableDataCellElement

HTMLTableHeaderCellElement

interface HTMLTableHeaderCellElement : HTMLTableCellElement {

attribute DOMString scope;
attribute DOMString extent;

};


  • sortable
  • sort([Optional] in DOMString nidref)
  • reorderable
  • column and row moves
  • abbr
  • axis
  • summary
  • summary (for attribute or element)
  • align
  • valign
  • width
  • char
  • dealing with columns, column elements, and column group elements
    • index of the element in the table
    • index of the column element within the group
    • count of columns for table, colgroup, and col
    • columns for table, colgroup and col
    • representation of a column as a HTMLCollection descending interface (possibly with placeholder elements for rows without cells in that column)
    • setting and getting a column (read/write attribute?)
  • moving groups, rows, and columns
  • transpose algorithm

UA Processing

  • DOM attributes and methods (with 2D symmetry)
  • Sorting Tables
  • Tree tables
  • Relation between HTML and CSS alignment
  • Relation between HTML and CSS widths
  • Descendant selectors and columns
  • HTML attribute inheritance and columns
  • CSS property inheritance and columns
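
For the "Sorting Tables" item, the behavior given for the 'sortable' attribute (a string comparison on one column, optionally restricted to the descendant element named by a NID) might look like the sketch below. Everything here is an assumption made for illustration: the sort_rows function, the dictionary shape of a cell, and the use of str.casefold as a stand-in for the locale-aware comparison the attribute description calls for.

```python
def sort_rows(rows, column, nid=None):
    """Return the rows ordered by the text of one column.

    Each cell is a dict with "text" (the full cell text) and "parts"
    (text of descendant elements keyed by their nid). When `nid` is
    given, only that descendant's text takes part in the comparison.
    """
    def key(row):
        cell = row[column]
        text = cell["parts"][nid] if nid is not None else cell["text"]
        # casefold() is a placeholder for a real locale-aware collation.
        return text.casefold()

    return sorted(rows, key=key)
```

Sorting on the whole cell text and sorting on a named descendant can give different orders for the same table, which is the reason the NID form exists.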

2D descendant selectors, property inheritance, and attribute inheritance

        ---> colgroup -> col ---
table <                          > cell
        ---> rowgroup -> tr  ---

  • all inherited HTML attributes and CSS properties are inherited by the colgroup, col, thead, tfoot, and tbody elements
  • the inheritance of interest is the inheritance from colgroup -> (th | td) or col -> (th | td)
  • when applying inheritance of HTML attributes and CSS properties
    • column presentational attributes and properties (e.g., align, char, etc.) from col and colgroup override values from tr
    • row presentational attributes and properties (e.g., valign, etc.) from tr override values from col and colgroup
  • cells only inherit attributes and properties from the col, colgroup, or tr where their startIndex is equal to the startIndex of the column or row element
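
The precedence rules in the list above can be sketched as a lookup order. This is an illustration under stated assumptions: the resolve function and the attribute sets are hypothetical, a value set on the cell itself is assumed to win over any inherited value, and the caller is assumed to pass only the col, colgroup, and tr whose start index matches the cell's.

```python
COLUMN_ATTRS = {"align", "char"}   # column presentational attributes named in the list
ROW_ATTRS = {"valign"}             # row presentational attributes named in the list


def resolve(attr, cell, col, colgroup, tr):
    """Return the effective value of `attr` for a cell, or None.

    Each argument after `attr` is a dict of the attributes set on
    that element.
    """
    if attr in cell:
        return cell[attr]          # assumption: the cell's own value wins
    if attr in ROW_ATTRS:
        order = [tr, col, colgroup]    # tr overrides col and colgroup
    else:
        order = [col, colgroup, tr]    # col and colgroup override tr
    for source in order:
        if attr in source:
            return source[attr]
    return None
```

With align set on both the col and the tr, the col value is used. With valign set on both, the tr value is used.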

Presentation Considerations

Authoring Considerations

  • Layout tables should be avoided
  • Hierarchical tables
  • 2D Tables and vertical / horizontal axis differences
  • Restraint in authoring table summaries
    • summary information should include information available to visual users at a glance, but not easily apparent for non-visual users
    • no redundant information with the caption or with table structure easily discerned from the table markup
  • provide 'abbr' attribute values whenever heading contents are longer than a few words (length of uttered syllables is more important than string length)

Handler Module

XHTML2 Handler Module

Stylesheet Module

XHTML2 Stylesheet Module

XML Events Module

Global Attribute Modules

Core Attributes Module

The core attributes let authors uniquely identify elements, classify elements, indicate the relevance of whitespace layout characters, provide human-readable titles, and indicate other core properties of elements.

Attributes

Attribute name Type Default DOM Other UA behavior Details
id ID | QID W The 'id' and 'xml:id' attributes take an ID or Qualified ID depending on the state of the 'idns' and 'idns:*' attributes. If either of these attributes has a valid non-null value, the 'id' or 'xml:id' attribute takes a qualified ID (QID). Otherwise it takes a conventional ID. When using a QID, the value should generally be an unprefixed QID for conformance with Namespaces in XML 1.0. Only one of 'id' or 'xml:id' may be used on any element. Either may be used regardless of the serialization of the document (i.e., xml:id may be used on non-XML serializations and should still be in the xml namespace).
xml:id ID | QID W
nid QNIDRef W A qualified node-unique, rather than document-unique, identifier (QNIDRef), where uniqueness is required among all siblings. QNIDRef types share the same namespace within a document as QID and ID types, so documents cannot have the same value on an 'id' attribute and on a 'nid' attribute. However, the formal meaning of namespace in this sense is slightly different from the namespaces defined by Namespaces in XML. In this sense 1) the actual string values of all 'id' attributes must be unique; and 2) 'nid' attributes are prohibited from carrying a string value that case-insensitively matches the value of any 'id' attribute.
role URIRep W Accessibility Processing Algorithm Most HTML elements have an inherent role mapping; however, authors may provide additional role information using this attribute. In some cases additional role keywords are required.
class NMTOKENS | QNMTOKENS W The 'class' attribute takes a space-separated list of NMTokens or Qualified NMTokens depending on the state of the 'classns' and 'classns:*' attributes. If either of these attributes has a valid non-null value, the 'class' attribute takes a space-separated list of qualified NMTokens (QNMTokens). Otherwise it takes a space-separated list of NMTokens. When taking qualified NMTokens, the QNMTokens have the additional constraint that they must not contain a colon (U+003A) except as a delimiter between the NCName namespace prefix and the local NMToken portion.
title token W
layout "relevant" | "irrelevant" "irrelevant" W
declared "true" | "false" "false" W Indicates an element should not be displayed in the normal flow of the document but is available as document data for inspection and for subsequent or alternative presentation (such as in the UA's user interface).
contenteditable "true" | "false" "false" W Y
xml:base URIRep INHERITED
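
The uniqueness constraints stated for 'id' and 'nid' in the table above can be checked with a small tree walk. The sketch below is illustrative only: the check_identifiers function and the dictionary representation of an element are hypothetical, namespace qualification (QID, QNIDRef prefixes) is ignored, and sibling 'nid' values are assumed to be compared by exact string match.

```python
from collections import Counter


def check_identifiers(root):
    """Return a list of human-readable constraint violations.

    Each element is a dict with optional "id", "nid", and "children" keys.
    """
    errors, ids, nids = [], [], []

    def walk(node):
        if "id" in node:
            ids.append(node["id"])
        if "nid" in node:
            nids.append(node["nid"])
        seen = set()
        for child in node.get("children", []):
            nid = child.get("nid")
            if nid is not None:
                # 'nid' values must be unique among siblings.
                if nid in seen:
                    errors.append(f"nid '{nid}' repeated among siblings")
                seen.add(nid)
            walk(child)

    walk(root)
    # 1) the string values of all 'id' attributes must be unique
    for value, count in Counter(ids).items():
        if count > 1:
            errors.append(f"id '{value}' used {count} times")
    # 2) no 'nid' may case-insensitively match any 'id'
    folded_ids = {value.casefold() for value in ids}
    for nid in nids:
        if nid.casefold() in folded_ids:
            errors.append(f"nid '{nid}' collides with an id")
    return errors
```

A document that passes returns an empty list, so the function can be used directly as a conformance predicate with `not check_identifiers(doc)`.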

DOM Interface

UA Processing

Presentation Considerations

  • contenteditable / visual editing presentation
  • mapping 'layout' attribute values to whitespace handling presentational properties (wrap, nowrap, etc).

Authoring Considerations

Disclosure Attribute Module

This module includes the 'disclosure' attribute, which controls the hierarchical display of a document’s content.

Attributes

Attribute name Type Default DOM Other UA behavior Details
disclosure ( "disclosed" | "replaced" | "undisclosed" | "collapsed" ), (<U+0020>, ( "disclosed" | "replaced" | "undisclosed" | "collapsed" ) )+ W Y Indicates an element should include UI for disclosure / non-disclosure and the UI should cycle through the listed states in their author-specified order. The initial state for the document is the first state in the list. The meaning of each state is:
  1. disclosed: the element's contents are displayed normally, including the last child element. A disclosure control should be included at the top leading edge of the element indicating the 'disclosed' state;
  2. replaced: indicates only the element’s last child should be displayed. Normally such an element will have two child elements: a first child legend element and a second (last) child element of any type. All but the last child element should not be displayed (as if the CSS 'display' property for the legend element was set to 'none'). A disclosure control should be included at the top leading edge of the element indicating the 'replaced' state;
  3. undisclosed: indicates all elements should be displayed except for the last child element, which should not be displayed (as if the CSS rule display: none were in effect for that last child element). Normally only two child elements will be included, with the first child 'legend' element displayed. A disclosure control should be included at the top leading edge of the element indicating the 'undisclosed' state;
  4. collapsed: indicates no part of the element's contents is displayed and only the element's border and margin are rendered, with no height to the element other than that needed for the required disclosure control, which should be included at the top leading edge of the element indicating the 'collapsed' state.

Generally authors should use disclosure states in pairs such as: disclosed/undisclosed (the legend and all but the last element are displayed in both cases), replaced/undisclosed, replaced/collapsed (only the last child element is ever displayed), or disclosed/collapsed. When used on a hierarchical table row ('tr'), all cells in the row occur before the tbody and the tbody is disclosed or undisclosed (NOTE: Hierarchical tables are not available in tHTML parser compatible serialization, but can be added through DOM calls and used for any other parser compatible serializations).
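
The four states and the author-ordered cycle can be sketched as below. This is a minimal model and not normative behavior: the function names are hypothetical, child elements are represented as plain strings, and 'replaced' is read as showing only the last child, which is the reading consistent with the replaced/collapsed pairing described above.

```python
STATES = ("disclosed", "replaced", "undisclosed", "collapsed")


def parse_disclosure(value):
    """Split a 'disclosure' attribute value into its ordered state list."""
    states = value.split(" ")   # the grammar separates states with one U+0020
    if len(states) < 2 or any(s not in STATES for s in states):
        raise ValueError(f"invalid disclosure value: {value!r}")
    return states


def advance(states, position):
    """Return the position of the state shown after activating the control."""
    return (position + 1) % len(states)


def visible_children(children, state):
    """Return the child elements displayed in the given state."""
    if state == "disclosed":
        return list(children)          # everything, including the last child
    if state == "replaced":
        return list(children[-1:])     # only the last child
    if state == "undisclosed":
        return list(children[:-1])     # everything except the last child
    return []                          # collapsed: no contents displayed
```

The initial state is `states[0]`. For an element with a 'legend' and a 'div' child and the value "undisclosed disclosed", the legend alone is shown first and activating the control reveals both.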

DOM Interface

UA Processing

Presentation Considerations

While this module involves a disclosure control which can cycle through four states (disclosed, replaced, undisclosed, and collapsed), we say nothing here about the presentation of the control other than it should appear in the upper leading (left in left-to-right text) corner of the element frame. The actual appearance of this control should be abstracted from its model states. The default appearance of the control should be based on the implementation environment and conventions. Later CSS and other styling mechanisms may provide authoring hooks for the appearance of this control in its various states.

A CSS property could be specified which largely mirrored this attribute so that a 'disclosure' property would take the keywords: 'disclosed', 'replaced', 'undisclosed', and 'collapsed'. An HTML default implementation stylesheet could then add the following CSS rule to its universal ('*') declaration:

	disclosure: attr(disclosure);

Authoring Considerations

Since the Disclosure Attribute Module distinguishes between the last child element of an element and all prior sibling elements of that last child, it is designed primarily for two specific cases in HTML. However, authors can easily structure their content to fit this design whenever such a disclosure feature is desirable. The first scenario is to create an element with exactly two child elements. Typically the first element will be a 'legend' element and the second element can be any element the author chooses, for example a 'div' element. In this way the 'legend' element serves as an abbreviated indication of the contents of its only sibling element. The second situation supported by this design is the presentation of hierarchical tree tables. The rows ('tr') in these tables allow an optional last child 'tbody' element. The row then serves as the 'undisclosed' displayed content while the last child 'tbody' element is available for disclosure (and recursively for the rows in that table body).

[Would it be desirable to provide two different approaches: one for the last child and one for the first child? The first child could then be used with a section -> heading – to 'undisclosed' sections down to just a heading and then disclose the section for details. An fdisclosure attribute could be used for final disclosure while an idisclosure could indicate initial disclosure. Both would indicate the display of a disclosure control in the proper state and then whether the initial element is treated as the element for disclosure or whether the first element is treated as the element for summarization of what is disclosed. For an element such as 'p' the text itself would be displayed in both states but a last child element would be displayed on disclosed and replaced. For the idisclosure attribute the paragraph text itself would not be displayed except in the replaced and disclosed state while an initial child element would be displayed in the undisclosed and undisclosed state]

I18N Module

XHTML2 I18N Attribute Module

The I18N Module from XHTML includes the xml:lang attribute. For simplicity and for backwards and forwards compatibility, HTML gives the same semantic interpretation to the lang attribute, the xml:lang attribute, or both when they carry exactly the same value. When UAs encounter conflicting values between the two attributes, the UA must treat the xml:lang value (the lang value in the xml namespace) as the overriding language declaration.
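
The precedence rule above reduces to a short lookup. The sketch below is illustrative only: the effective_language function and the dictionary of attributes are hypothetical, and inheritance of the language from ancestor elements is left out.

```python
def effective_language(attributes):
    """Return the language declared on one element, or None.

    `attributes` maps attribute names ("lang", "xml:lang") to values.
    When both are present and differ, xml:lang overrides lang.
    """
    if "xml:lang" in attributes:
        return attributes["xml:lang"]
    return attributes.get("lang")
```

Because xml:lang wins whenever it is present, the case where both attributes agree needs no special handling.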

Bi-directional Text Attribute Module

XHTML2 Bi-directional Text Attribute Module

Style Attribute Module

XHTML2 Style Attribute Module

Hypertext Attributes Module

Attribute name Type Default DOM Other UA behavior Details
href URIRep W
charset Charset W
hreflang LanguageCode hrefLang
hrefmedia MediaQuery hrefMedia
hreftype ContentType hrefType
target Name W Includes target='_download' to indicate the linked resource should be downloaded on activation and target='_utility' to indicate the resource should be loaded in a utility window and not in a tab (as well as '_parent', '_self', '_blank', or '_top'), plus any other author-named browsing context associated with a frame, object, or tab/window.

Phonetic Attribute Module

The phonetic attributes module lets authors provide phonetic pronunciation information for the contents of an element, particularly for neologisms, jargon, abbreviations, and homographs. The phonemes can either be treated as graphemes and displayed using a specified font or the implementation's user defaults, or be passed to speech synthesizers as the necessary phonetic metadata. In addition, even UAs that do not support full speech synthesis (which requires substantial, complete, localized pronunciation dictionaries) should provide speech synthesis for the phonetic attributes. Doing so does not require UAs to compile such specialized locale information, and it provides a valuable service to users less familiar with deciphering phonetic graphemes.

Attributes

Attribute name Type Default DOM Other UA behavior Details
phonemes token W A Token of phonemes that comprise the author specified pronunciation for the entire text contents of an element interpreted by the norms defined by the phonetic system specified in the phoneticsystem attribute (where the default is 'unicode' [or 'ipa' if Unicode rejects phonemes])
phoneticsystem QNameRef INHERITED but either 'unicode' or 'ipa' as the default on the root element W (phoneticSystem) A QName indicating which phonetic system is in use. HTML specified values include 'ipa' for the International Phonetic Alphabet, 'apa' for the American Phonetic Alphabet or 'unicode' for Unicode phonemes (proposed) [if Unicode accepts the introduction of phoneme characters, then this attribute will not really be needed. Instead all phonetic alphabets will be expressed in the phoneme characters with the graphemes left to the font-glyph layer instead of the character layer].

Unicode Proposed Phonemes

  • Encoding phonemes along with graphemes
  • Phonetic writing systems mapping phonemes to graphemes
  • Mapping graphemes to phonemes for aural rendering through language specific mappings
  • Operating systems and other environments then permit the user to select the preferred phonetic writing system for display of phonemes just as they select the preferred language for the rendering of speech or other functions
Benefits
  • Provides greater support for aural rendering of text
  • Provides greater support for internationalization through modularized support for phonetic writing systems
  • Better support for non-grapheme characters to avoid the awkward attempts to add more graphemes to accommodate non-graphemic character needs
 | Written Characters | Spoken Characters | Details
Basic unit | grapheme | phoneme |
Rendered with | glyph | phone |
Collected in | font | voice | Both serve as a container for multiple rendering units and metadata for each unit and by range
Rendered variations | serif, sans serif | male, female |
Rich Text Attributes | weight, obliqueness, color | prosody, richness, stress |
Rendering supporting character properties | Joining_Type, Joining_Group, Bidi_Class | stress, intonation |

ISO defines a character as a member of a set of elements used for the organisation, control, or representation of data. While Unicode has been properly focussed initially on grapheme characters and algorithms and properties to abstractly process graphemes, the time has come for Unicode to address phonemes as well as another set of elements [...] for the organisation, control, or representation of data.

Encoded as → (columns), Rendered as ↓ (rows)

Rendered as Grapheme, encoded as Grapheme
  • Unicode encoded abstract grapheme with Unicode properties supporting visual rendering combined with
  • a font’s glyph mapping (one-to-many, many-to-one, or many-to-many)
Rendered as Grapheme, encoded as Phoneme
  • Proposed Unicode Phonetic Writing Systems mapping grapheme properties to abstract phonemes combined with
  • a Phonetic Writing System font to map the abstract phoneme characters to glyphs (phonemes which are now augmented with Unicode grapheme properties using the Phonetic Writing System data).
Rendered as Phoneme, encoded as Grapheme
  • language specific:
    grapheme to phoneme mappings (with newly proposed Unicode data)
    word to phonemes mappings and
    phrase to phonemes mappings (CLDR or proprietary)
  • and then the same phoneme rendering for abstract phoneme characters
Rendered as Phoneme, encoded as Phoneme
  • Proposed Unicode phoneme to phone mapping through a voice (one-to-many, many-to-one, or many-to-many)

IPA is by far the leader among Phonetic Writing Systems in terms of international adoption. However, it is not the role of Unicode to endorse leaders, but to facilitate the encoding of data as characters.

Typically a Phonetic Writing System may tend to map phoneme clusters to grapheme clusters, but no such mapping is required. A single phoneme character might be mapped to a sequence of graphemes. Alternatively several phonemes comprising a phoneme cluster may be mapped to a single grapheme. For example, [an example with rounded vowel and breathy phonetic modifier/extender mapped one-to-one to graphemes or mapped many-to-one to a single grapheme].

To associate multiple graphemes with the same phoneme or phoneme sequence, the Canonical_Combining_Class property can be used to indicate that repeated phoneme code point members in the array are for graphemes which combine with a base grapheme to produce a single phoneme. In general, the phonemes in Unicode should be finer-grained than most phonetic writing systems, but it is impossible to predict the future of phonetic writing systems.

Phoneme Characters
Blocks
  • Phoneme Consonants
  • Phoneme Vowels
  • Phonemic Prosody
Script
Nonscript (indicating the characters are not intended directly as graphemes, but only become visually renderable as graphemes through a Phonetic Writing System phoneme to grapheme mapping)
General_Category=Ap
A new General_Category Ap for Aural phoneme
New Phoneme Properties
  • Phoneme_Combining_Class (N)
  • Phoneme_Base (B)
  • Phoneme_Extend (B)
  • Phoneme_Cluster_Break (B)
  • Phoneme_Accent (B)
  • Phoneme_Syllable_Break (B)
Phonetic Writing System phoneme to grapheme mapping
A separate compound record for each Phonetic Writing System with the following properties
Identifier (S)
A persistent stable identifier for the phonetic writing system suitable for use in user interface after localization
Direction (E)
indicates either LtR or RtL for the mapped grapheme to support right-to-left phonetic writing systems
Members (Array)
An array with one member for each grapheme associated with a phoneme code point, where each member includes the following properties:
Code_Points (S)
the code point of a phoneme or a sequence of phoneme code points
Alias_Code_Point (S)
a delimited set of code point sequences which also map to the same grapheme (some phonetic writing system may map multiple Unicode phonemes to the same grapheme)
Grapheme_Glyph (S)
indicates a space-separated code point sequence for a suitably mapped grapheme glyph, if one exists (not all graphemes needed for phonetic writing systems will necessarily be suitable for encoding as a grapheme and this data provides the grapheme specific properties needed for text processing associated with visual rendering of graphemes; other text processing properties are provided by the phoneme character properties themselves)
PWS_Grapheme_Name (S)
a name for the grapheme within this Phonetic Writing System
General_Category (E)
of the mapped grapheme which will generally be ‘Lo’
Canonical_Combining_Class (N)
Bidi_Class (E)
of the mapped grapheme
Bidi_Mirrored (B)
in general this should be NO since these are not punctuation or other mirrored glyphs
Grapheme_Base (B)
[indicates the phoneme character is treated as a grapheme base in the phonetic writing system]
Grapheme_Extend (B)
[indicates the phoneme character is treated as a grapheme extender in the phonetic writing system]
Extender (B)
Soft_Dotted (B)
indicates the grapheme is a dotted grapheme whose dot should be replaced when it is combined with another phoneme-mapped diacritic grapheme
Joining_Type (E)
Joining_Group (E)
East_Asian_Width (E)
Line_Break (E)
generally phoneme characters will not affect line-breaking, since syllables and word breaks are delineated by the phonemes, but there could be subtle exceptions
Grapheme_Cluster_Break (E)
Hangul_Syllable_Type (E)
Jamo_Short_Name (M)
Related_Script (S)
indicates the related script of the mapped grapheme (that is, the relation of a phonetic writing system to a graphic writing system, potentially null; for example, IPA would map many characters to Latin and some to Greek, etc.)
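
A Phonetic Writing System record of this kind could drive a phoneme-to-grapheme conversion along the lines sketched below. This is only an illustration under stated assumptions. The to_graphemes function is hypothetical, the phoneme code points are stand-ins taken from the Private Use Area (no phoneme characters are actually encoded), Grapheme_Glyph is stored as a ready-made string instead of a space-separated code point sequence, Alias_Code_Point is ignored, and longest-match-first is an assumed strategy for letting a phoneme cluster map to a single grapheme.

```python
def to_graphemes(phonemes, members):
    """Map a sequence of phoneme code points to a grapheme string.

    `members` is a list of dicts with "Code_Points" (a tuple of phoneme
    code points) and "Grapheme_Glyph" (the grapheme text to emit).
    """
    table = {tuple(m["Code_Points"]): m["Grapheme_Glyph"] for m in members}
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(phonemes):
        # Try the longest phoneme cluster first, then shorter ones.
        for n in range(min(longest, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            raise KeyError(f"no grapheme mapping for U+{phonemes[i]:04X}")
    return "".join(out)
```

With members that map U+E000 to "t", U+E001 to "s", and the cluster (U+E000, U+E001) to "ʦ", the sequence U+E000 U+E001 U+E001 yields "ʦs": the cluster is consumed first and the remaining phoneme maps on its own.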

    It is important to acknowledge that this proposal represents a significant departure from past Unicode practice. However, with the BMP nearing capacity and with nearly all of the commonly used languages already encoded, the time has come to consider what other roles characters can serve in computing environments. One such role is representing values [perhaps an example of key irrational numbers] that are otherwise impossible to encode in the other native data types. Characters thus become a data type for any important and basic unit of representation. For example, a non-grapheme character could be encoded to represent the ratio of the circumference of a perfect circle to its diameter: what we have termed pi (π). While the grapheme π is already encoded in Unicode, the geometric ratio we call π is something other than the grapheme conventionally used to represent it. The complaint often arises that if we were to encode every irrational number then the Universal Character Set would quickly fill to capacity. Yet the number of irrational numbers used as basic units of representation in mathematics is actually quite small, even compared to the number of superfluous graphemes that have already been encoded, ostensibly for the representation of mathematics in character strings. The same is true of the IPA and other phonetic writing systems: Unicode has tried to shoehorn into grapheme encoding the needs of computer users to store and manipulate a rather limited repertoire of basic units. For π and other irrational numbers, and even rational numbers with infinitely repeating decimal representations, imagine a Double_Double character property: a very small supplementary data file for Unicode that provides the serialized value for the few characters whose numeric value would benefit from a double double representation (and double, and long long, and long). 1/3 and

    Traditionally phonemes are represented by grapheme characters that correspond to the phonetic letters and marks from a phonetic alphabet such as the International Phonetic Alphabet. However, the phonemes represented in many phonetic alphabets are mostly those from a repertoire of long-established phonemes from linguistics and related disciplines. Phonetic alphabets do evolve over time: occasionally by identifying new phonemes, but more often by assigning new graphemes to represent existing phonemes in a more intuitive mnemonic fashion. Adding new phonemes is not a problem for computer processing of phonemes, but changing grapheme assignments tends to undermine the ability to compare character strings of one vintage with those of another.

    Unicode has until now focussed on the encoding of graphemes only, with the occasional exception of a limited set of supplementary formatting characters and another set of characters accommodating legacy control characters. However, to facilitate both I18N and accessibility, it would be better to provide another exception to grapheme encoding by encoding phonemes directly rather than as graphemes assigned to phonemes by a third-party organization.

    HTML4All proposes a block of code points set aside for phoneme rather than grapheme assignment. A block of approximately 300 or fewer code points would likely be sufficient for most recognized phonemes. While rare, the identification and acceptance of new phonemes does occur in linguistic communities, and so some block of code points should be allocated for future expansion. Speech synthesizers can make use of these phonemes. For visual display of the phonemes, the basic features of font standards such as OpenType and AAT allow the assignment of sequences of characters to one or more glyphs, so that an IPA font can display the phonemes according to the International Phonetic Alphabet while an APA font would display the phonemes according to the American Phonetic Alphabet. Moreover, by encoding phonemes, Unicode facilitates other localized phonetic systems with mnemonics suitable to the specific locale. Unicode, and therefore HTML, could then support phonetic systems based on any alphabet, syllabary, abugida, abjad, or featural writing system, using mnemonics from any locale. For example, a Spanish international phonetic alphabet might use different mnemonic graphemes to represent phonemes than an English international phonetic alphabet. By encoding phonemes, we facilitate locale, script and language independent document and data encoding of phonetic information that can be presented in locale, script and language specific ways through simple font assignment.

    Ideally, these phonemes would be encoded in Unicode's Basic Multilingual Plane (BMP); however, few blocks remain for a code block assignment of this size (see the BMP roadmap). Some rearrangement of roadmap blocks might accommodate phonemes, but that may be unduly cumbersome. As an alternative, the Supplementary Special-purpose Plane might be another suitable place for such an assignment. While an assignment outside the BMP requires 4 octets per character in either UTF-8 or UTF-16, the use of phonemes is likely much less common than the use of graphemes. In any given document or other data source, the proportion of phonemes to other characters will typically be quite small. The only significant concern with an assignment outside the BMP is inadequate non-BMP support in some Unicode implementations.
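The octet counts mentioned above can be checked with a small sketch. It assumes a stand-in code point, U+E1000, in the Supplementary Special-purpose Plane; the proposal itself does not assign any code point, so that value is purely illustrative.

```python
# Compare encoded sizes for a BMP code point and a code point in the
# Supplementary Special-purpose Plane (plane 14, U+E0000..U+EFFFF).
# U+E1000 is only a stand-in: the proposal does not fix any code point.

BMP_EXAMPLE = "\u0250"           # LATIN SMALL LETTER TURNED A, an existing IPA letter
SSP_PLACEHOLDER = "\U000E1000"   # hypothetical position for a phoneme character


def octets(text: str) -> dict:
    """Return the encoded length of text in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


print(octets(BMP_EXAMPLE))      # {'utf-8': 2, 'utf-16': 2}
print(octets(SSP_PLACEHOLDER))  # {'utf-8': 4, 'utf-16': 4}
```

For a document in which phoneme characters are rare, the extra two octets per phoneme would add little to the total size.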

    Shifted responsibilities from character layer to font and rendering layer

    In terms of the usual Unicode character assignment and character properties, this proposal represents a fundamental shift. Therefore it makes sense to encode these characters within Unicode’s Supplementary Special-purpose Plane. That plane has only a few hundred out of 65 thousand code points assigned. A block of 512 code points for phonemes would provide a very forward-looking buffer of reserved characters, of which about 212 would be assigned in this proposal. The reassignment of responsibilities implies that the phoneme characters are assigned properties appropriate to characters in general and phoneme characters in particular, but leave more of the grapheme-relevant properties to the font and glyph layer. There, Phonetic Writing System creators would also create fonts that define grapheme behaviors corresponding to the Unicode phonemes. In terms of Unicode character properties and algorithms:

    • Collations: The characters could be collated according to their code point assignments and phonetic writing system-specific collations could be defined just as Unicode does by language for other character block/script assignments.
    • Combining: grapheme combining into clusters could be defined in a phoneme-specific way. For example, a reduction in code points could be achieved by adding a “breathed” or “breathless” combining character with its own canonical combining class added for that property. However, in terms of grapheme combining, these phoneme characters should not be allowed to combine nor serve as base characters for other combining characters (nor be extended by the grapheme extending characters). A phoneme whose grapheme involves a combining character would instead be represented through a character-to-glyph mapping within a font. For example, the phoneme character PLOSIVE LABIO-DENTAL U+E0??? might be represented in a phonetic writing system by the same grapheme as p̪: LATIN SMALL LETTER P U+0070 COMBINING BRIDGE BELOW U+032A. However, this grapheme cluster combination would be produced by mapping the phoneme character to a glyph, within a phonetic writing system font, where the glyph produced is an instance of the same grapheme designated by the phonetic writing system to represent the phoneme.
    • Grapheme Cluster Boundaries: In general grapheme cluster boundaries would be much the same as in the current Unicode grapheme cluster boundary algorithm. However, a writing system might represent a single phoneme by a sequence of graphemes so that, again, some sort of font introspection would be needed to determine grapheme cluster boundaries in that case. For many implementations, the character itself would be a sufficient marker of a grapheme cluster for any simplified grapheme cluster boundary algorithm.
    • Phoneme Boundaries: Phoneme boundaries would likely be a one-to-one mapping, unless modifier phonemes are added, in which case, for example, a breathed or breathless combining phoneme would have some sort of phoneme canonical combining class. Accents, syllables and other phonetic characters would need to be differentiated from phoneme characters themselves as a sort of phonemic punctuation.
    • Syllable Boundaries: syllable boundaries could be provided by existing characters such as the HYPHEN-MINUS U+002D or new phoneme syllable boundary characters.
    • Accents: accents could be provided by existing characters or new phonemic accent characters.
    • Other Boundaries: other boundaries would remain much the same as the current algorithm with SPACE U+0020 serving as the boundary between phoneme words (other boundaries such as sentence, paragraph, line-breaking would need no phoneme-specific changes)
    • Ligature Joining: since this too would vary from one phonetic writing system to another, the joining would be controlled at the font layer (such as in AAT fonts), if needed by the phonetic writing system (consider an Arabic mnemonic phonetic writing system where grapheme joining would be important)
    • Direction and bidi algorithm: direction too would be determined by the font, though new font extensions would need to be introduced to support any right-to-left phonetic writing systems. With new font properties, direction could be added to a font by glyph range or character range. Rendering systems would then use direction properties from the font to determine the direction of graphemes for visual rendering of the phoneme characters. Again, consider the example of an Arabic derived mnemonic phonetic writing system. As glyph layout progressed for a line of text, the bidirectional algorithm would be used, but the character direction would come from the font in the case of the phoneme characters rather than the Unicode Character Database. The default would be left-to-right, but for the phoneme character range, fonts would be permitted to override that character property.
    • Phoneme folding: a phoneme folding property could be added to the existing characters used for the IPA, the APA, or any other participating phonetic writing system, to map the existing grapheme characters to their equivalent phoneme characters (and then discourage or deprecate the use of the grapheme characters for phoneme use).
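The shift of grapheme behavior to the font layer described in the list above can be illustrated with a minimal sketch. It models each font as a plain mapping from phoneme character to grapheme sequence. The code point U+F0000 (Plane 15 private use) is a hypothetical stand-in for the unassigned phoneme character, and the second font is invented for illustration; real fonts would use their own character-to-glyph tables.

```python
import unicodedata

# Hypothetical stand-in for the proposed phoneme character. The source leaves
# the real code point unassigned, so a Plane 15 private-use code point is used.
PLOSIVE_LABIO_DENTAL = "\U000F0000"

# Each "font" is modelled as a mapping from phoneme character to the grapheme
# sequence its glyph would depict. The second mapping is invented purely to
# show that a different phonetic writing system can pick a different grapheme.
IPA_STYLE_FONT = {PLOSIVE_LABIO_DENTAL: "p\u032a"}
OTHER_STYLE_FONT = {PLOSIVE_LABIO_DENTAL: "P"}


def render(text: str, font: dict) -> str:
    """Replace every phoneme character with the grapheme sequence of the font."""
    return "".join(font.get(ch, ch) for ch in text)


shown = render(PLOSIVE_LABIO_DENTAL, IPA_STYLE_FONT)
print([unicodedata.name(ch) for ch in shown])
# ['LATIN SMALL LETTER P', 'COMBINING BRIDGE BELOW']
```

The stored text never changes; only the mapping chosen at rendering time does, which is the property the proposal relies on when a phonetic writing system revises its graphemes.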

    Phoneme characters thus provide an improved abstraction for phonetic writing systems: simplifying, easing, and improving support for both internationalization and speech rendering of text. A single set of characters provides all of the character-level information needed to handle phonemic text. Visual rendering is then achieved through phonetic writing system-specific fonts. In this way any phonetic writing system can change the grapheme associated with a phoneme simply by releasing an updated font. New phonetic writing systems can also be created by releasing a font that associates grapheme-inspired glyphs with each phoneme in a mnemonic manner for the language and locale targeted by the phonetic writing system.

    Within a user’s operating system or other environment, users would select which phonetic writing system they preferred and the font associated with that phonetic writing system would be used for visual rendering of phonemes everywhere those characters required visual rendering.

    Proposed character phonemes

    The assignment of phoneme code points would include:

    type                   subtype                            code points
    voiced consonants      places of articulation             7
                           manners of articulation            16
                           theoretically possible phonemes    7 × 16 = 112
    voiceless consonants   places of articulation             7
                           manners of articulation            16
                           theoretically possible phonemes    7 × 16 = 112
    rounded vowels         places of articulation             7
                           manners of articulation            5
                           theoretically possible phonemes    7 × 5 = 35
    unrounded vowels       places of articulation             7
                           manners of articulation            5
                           theoretically possible phonemes    7 × 5 = 35
    Total (theoretically possible phonemes, though only about 212 are recognized today)    294
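The totals in the table above follow from multiplying places by manners of articulation for each category. A short sketch of that arithmetic, using only the figures quoted in the table (the variable names are illustrative):

```python
# Counts taken from the table above: (places, manners) for each category.
CATEGORIES = {
    "voiced consonants": (7, 16),
    "voiceless consonants": (7, 16),
    "rounded vowels": (7, 5),
    "unrounded vowels": (7, 5),
}
RECOGNIZED_TODAY = 212  # approximate figure quoted in the table


def theoretical_total(categories: dict) -> int:
    """Sum places x manners over every phoneme category."""
    return sum(places * manners for places, manners in categories.values())


total = theoretical_total(CATEGORIES)
print(total)                     # 294
print(total - RECOGNIZED_TODAY)  # 82 theoretical phonemes not recognized today
```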
    Total block assignment size

    A block of 288 to 320 code points should be more than adequate even for future assignments (with perhaps 82 to 108 in reserve) if allocating within the Supplementary Special-purpose Plane. If trying to encode in the BMP, the number of code points could easily be reduced to 212, or even to as few as about 108, by resorting to combining characters such as voiced or rounded modifier characters.

    DOM Interface

    UA Processing

    Speech synthesis

    For visual UAs such as visual-only browsers, there is no need to process these attributes; however, general purpose HTML engines must parse these attributes and include them in the DOM infoset presented to engine-using applications. For implementations with aural presentation capabilities, these attributes indicate the author-preferred pronunciation for the contents of an element or, in the case of a definition-referenced element, the pronunciation of the referring moniker. While CSS also provides a parallel mechanism for pronunciation, the inclusion in HTML allows HTML authors (some of whom may never touch a stylesheet) to include the essential pronunciation information for neologisms, homographs, variables, data, and other monikers.

    In resolving words (as defined by Unicode word boundaries) into aural pronunciation, UAs should use the following steps:

    1. For HTML documents with author-provided pronunciations, or HTML documents supplemented by CSS-provided pronunciations, the UA should use those pronunciations [the default implementation stylesheet should map the HTML attributes to CSS speech presentation so this would participate in the usual CSS cascade]
    2. Otherwise, the UA should consult a comprehensive localized dictionary, included with the UA for the language of the user, that maps words to phonemes
      1. For words that have multiple pronunciation variants, UAs may provide a mechanism for users to select the preferred pronunciation and persistently store such a selection in the user's defaults
    3. For any remaining words, the UA should rely on heuristics to sound out the pronunciation of a word
      1. In such cases, the UA may provide a mechanism for the user to add a suitable word-to-phonemes mapping for the missing word
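The resolution order above can be sketched as a simple fallback chain. The function name, the lowercase dictionary keys, the placeholder phoneme strings and the toy heuristic are all assumptions made for illustration; the draft does not define a DOM or API for this step.

```python
from typing import Callable, Mapping, Optional


def resolve_pronunciation(
    word: str,
    author_phonemes: Optional[str],
    dictionary: Mapping[str, str],
    sound_out: Callable[[str], str],
) -> tuple:
    """Return (phonemes, source) for one word, trying each step in order."""
    # Step 1: a pronunciation supplied by the author (HTML attribute or CSS).
    if author_phonemes:
        return author_phonemes, "author"
    # Step 2: the localized dictionary shipped with the UA
    # (keys are assumed to be lowercase in this sketch).
    entry = dictionary.get(word.lower())
    if entry is not None:
        return entry, "dictionary"
    # Step 3: heuristics for anything still unresolved.
    return sound_out(word), "heuristic"


# Placeholder data: "B AW" stands in for a real phoneme string.
demo_dictionary = {"bow": "B AW"}


def spell_out(word: str) -> str:
    """Toy heuristic that simply separates the letters."""
    return " ".join(word)


print(resolve_pronunciation("Bow", None, demo_dictionary, spell_out))
print(resolve_pronunciation("zxq", None, demo_dictionary, spell_out))
```

A fuller implementation would also record user-selected variants (step 2.1) and user-added mappings (step 3.1), for example by writing them back into the dictionary.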
    Visual
    • Font assignment, where the font is designed to work with a particular phonetic system by assigning sequences of one or more phoneme characters to a specific glyph according to the phonetic system's phonetic grapheme assignment
    • User defaults system, where the user selects the preferred phonetic system and the font fallback mechanism uses that font when another font is not specified for the phoneme characters

    Presentation Considerations

    Authoring Considerations

    Authors should only specify phonemes for cases where the author can reasonably expect that the word or phrase will not appear in the targeted users’ phonetic dictionary. There may be times when an author’s UA phonetic dictionary is more comprehensive than some of the targeted users’ dictionaries. In this case the author’s testing will suggest no need for specifying the 'phonemes' attribute even though some users will need it specified. In other cases an author’s environment may lack pronunciations that most users have available in their own dictionaries, so that the author specifies phonemes that are not necessary for most users. In general it is better to err on the side of caution and specify phonemes when in doubt. However, authors should understand that most common dictionary words do not need the 'phonemes' attribute specified.

    For markup that provides a highly reusable mechanism for specifying phonemes, authors should use the Moniker Module, which supports author dictionaries for phonetic pronunciation (as well as hyphenation, definition, and abbreviation). The Moniker Module also provides a mechanism to differentiate between various homographs that differ in pronunciation (like the ‘bow’ of a ship and the ‘bow’ made with a ribbon).

    XForms Attributes Module

    The XForms attributes permit the binding of HTML and XForms elements to XForms data as well as controlling the input method, incremental keyboard events and including some appearance hints. These attributes can be used on any HTML element to bind the contents of the element to XForms instance data. Authors may also use the 'appearance' attribute on the 'data' element to provide an appearance hint for various data types. For example:

    <data datatype=xsd:date appearance='full'>2010-01-14</data>
    (<data datatype=xsd:date appearance='minimal'>2010-01-14</data>)
    

    might indicate a UA should present the date as “14 January 2010 (14/1/10)” according to user defaults and stylesheet cascade.
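As a rough sketch of how a UA might turn the 'appearance' hint into the presentation quoted above, the following assumes English month names and a day/month/year order as the user defaults. The function name and the specific formats are hypothetical; the draft leaves presentation to user defaults and the stylesheet cascade.

```python
from datetime import date

# Assumed user default: English month names, day-first ordering.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def present(value: str, appearance: str) -> str:
    """Format an xsd:date string according to an 'appearance' hint."""
    d = date.fromisoformat(value)
    if appearance == "full":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if appearance == "minimal":
        return f"{d.day}/{d.month}/{d.year % 100:02d}"
    # "compact" and QNameRef hints are not modelled here; show the raw value.
    return value


print(f"{present('2010-01-14', 'full')} ({present('2010-01-14', 'minimal')})")
# 14 January 2010 (14/1/10)
```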

    For XForms elements, see the XForms module below. For XForms elements, authors may use the XForms namespace in namespace-aware UAs (both XML and tHTML namespace-aware UAs already exist). [Should we also strive to incorporate the XForms elements into the HTML namespace? This will not work for 'input', 'select', and 'textarea' elements with tHTML parsers, and considering the default namespace mapping, using 'x:select' rather than 'select', for example, is not an undue burden for authors (two extra characters, 'x:').]

    Attributes

    Attribute name     Type                                       Default  DOM  Other UA behavior  Details
    ref                LocationPath                                        W
    repeat-nodeset     LocationPath                                        W
    model              IDREF                                               W
    repeat-model       IDREF                                               W
    bind               IDREF                                               W
    repeat-bind        IDREF                                               W
    startindex         Integer                                             W
    repeat-startindex  Integer                                             W
    number             Integer                                             W
    repeat-number      Integer                                             W
    incremental        xsd:boolean                                         W
    inputmode          xsd:string                                          W
    appearance         "full" | "compact" | "minimal" | QNameRef  "full"   W

    DOM Interface

    UA Processing

    UA processing follows the XForms Processing Model.

    Presentation Considerations

    • CSS enhancements needed to provide variations in presentation and styling
      • presentation variations could then be tied to selectors for the 'appearance' attribute

    Authoring Considerations

    Keyboard Focus Attributes Module

    These attributes provide authors with fine-grained control over the keyboard focus cycle.


    Attributes

    nextfocus
      Type: IDREF | NIDREF
      Default: Allows implementation discretion and user default variations, but should be something like the next sibling or, recursively, the parent element’s next sibling; focus should proceed to focusable descendants first, in tree-depth-first order.
      DOM: W (nextFocus)
      Other UA behavior: FocusAlgorithm
      Details: Since IDs and QNIDRefs share the same namespace, it is an authoring error for both an 'id' attribute and a 'nid' attribute to share the same value within the same document. In the case of such a name conflict error, UAs must treat the value as a NIDREF (ignoring any 'id' attribute with the same value).
    prevfocus
      Type: IDREF | NIDREF
      Default: Allows implementation discretion and user default variations, but should be something like the previous focusable element, applying the reverse order of the focus cycle determined by the focus algorithm.
      DOM: W (prevFocus)
      Details: Like nextfocus, NIDREF takes precedence over IDREF in the case of authoring errors.
    firstfocus
      Type: IDREF | NIDREF
      Default: Allows implementation discretion and user default variations, but should be something like the first focusable descendant. If the element itself is focusable, focus should advance to its descendants only after the element itself has gained focus.
      DOM: W (firstFocus)
      Details: It is an error if the value of the attribute is not a reference to a descendant element, and in such cases UAs should treat the attribute as if it were absent. Like nextfocus, NIDREF takes precedence over IDREF in the case of authoring errors. However, in this case both 'id' and 'nid' attributes on descendant elements take precedence over any 'nid' or 'id' attributes on ancestor elements (which must be ignored by the UA for firstfocus references).
    focusable
      Type: "true" | "nocycle" | "false" | "auto"
      Default: "auto"
      DOM: W

    DOM Interface

    UA Processing

    • FocusAlgorithm

    Focus Algorithm

    • If the 'focusable' attribute exists on an element and is set to a valid value, it takes precedence over any non-positive 'tabindex' attribute value, and UAs must ignore that 'tabindex' attribute. Positive 'tabindex' values are treated differently: a positive 'tabindex' value declares that the element wants to receive focus from the element with the next lowest 'tabindex' value, while the focus attributes indicate which element will receive focus next. A positive 'tabindex' value is therefore not ignored even when 'nextfocus', 'prevfocus', or any other focus attribute is set; however, a positive 'tabindex' value on an element with a valid 'nextfocus' attribute value is ignored in determining the next focussed element.
    • A 'tabindex' value < 0 is equivalent to focusable='false'.
    • A 'tabindex' value = 0 is equivalent to focusable='nocycle'.
    • The HTML5 focus algorithm forms the basis for this focus algorithm. The HTML 4.1 focus attributes override the HTML5-defined behavior. Moreover, UAs may deviate from the recommended behaviors in the HTML5 algorithm to give users more fine-grained preferences over focus behavior, such as the list of auto-focussed elements, focus of elements with specific attribute values, etc.
    • Positive 'tabindex' attribute values alter the automatic sequencing of focus according to the HTML5 Sequential Focus Algorithm.
    • For any element that has a 'nextfocus', 'prevfocus', or 'firstfocus' attribute with a valid value, that value overrides the HTML5 behavior.
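The interaction between 'focusable' and 'tabindex' in the rules above can be sketched as a small function. This is one possible reading under stated assumptions: the function name is hypothetical, and returning the default "auto" for a positive 'tabindex' reflects the assumption that positive values only influence ordering, not whether the element is focusable.

```python
from typing import Optional

VALID_FOCUSABLE = {"true", "nocycle", "false", "auto"}


def effective_focusable(focusable: Optional[str], tabindex: Optional[int]) -> str:
    """Combine 'focusable' and 'tabindex' into one effective focusable value."""
    # A valid 'focusable' attribute wins over any non-positive tabindex.
    if focusable in VALID_FOCUSABLE:
        return focusable
    if tabindex is not None:
        if tabindex < 0:
            return "false"    # tabindex < 0 is equivalent to focusable='false'
        if tabindex == 0:
            return "nocycle"  # tabindex = 0 is equivalent to focusable='nocycle'
    # No usable 'focusable' value and no non-positive tabindex: fall back to
    # the attribute default. A positive tabindex is assumed to affect order only.
    return "auto"


print(effective_focusable("true", -1))  # true
print(effective_focusable(None, -1))    # false
print(effective_focusable(None, 0))     # nocycle
```

An invalid 'focusable' value is treated here as if the attribute were absent, so the 'tabindex' mapping still applies in that case.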

    Presentation Considerations

    Authoring Considerations

    • Authors should use these attributes as sparingly as possible and instead allow the default UA focus algorithm to handle the focus cycle, assigning these attributes only to alter the default behavior.
    • Authors are encouraged to use the new focus attributes module attributes rather than the legacy 'tabindex' attribute.

    Using the focus attributes module allows authors to easily copy and paste content within the same document and get consistent focus behavior (whereas 'tabindex' creates repeated conflicting attribute values).

    URI Encoding Attributes Module

    For specialized cases, these attributes allow authors to control the mapping of query and fragment identifier parts of a URI into encoded bytes.

    Since a document has one encoding in which all text in the document is encoded, and a server may expect query components in a different encoding, these attributes allow authors to control the encoding of query components sent to a server. Similarly, because the UA may be expected to retrieve another document, or to direct a separate UA to retrieve a document, in another encoding, the 'fragment-charset' attribute allows authors to specify an alternate encoding in those circumstances.

    Existing protocols call for the use of UTF-8 in the other URI components sent over the network. However, the query and fragment components of a URI remain ambiguous for non-ASCII characters. When using percent-escapes, the literal byte corresponding to the percent-escape should be used (e.g., %E2%80%94 for a Unicode/UCS 'EM DASH' ensures those actual UTF-8 bytes are sent between client UA and server UA).
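The EM DASH example can be verified with Python's standard `urllib.parse` functions. This is only a demonstration of the byte-level round trip, not part of the proposed attribute processing.

```python
from urllib.parse import quote, unquote_to_bytes

EM_DASH = "\u2014"

# Percent-escape the UTF-8 bytes of the character.
escaped = quote(EM_DASH, safe="", encoding="utf-8")
print(escaped)                    # %E2%80%94

# Decoding the escapes yields exactly the bytes that travel on the wire.
print(unquote_to_bytes(escaped))  # b'\xe2\x80\x94'
```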

    [The idea behind these attributes is to allow UAs to push towards UTF-8 as the interchange encoding for all parts of a URI, yet still allow a simple opt-out method for authors who need another encoding for legacy processing. Should we have a keyword for the equivalent of the document's encoding? This could be the default without the attribute, but that undermines the goal of pushing towards UTF-8 as the future default. UTF-16 escaped encoding may also be an appropriate option for UTF-16 documents.]

    Attributes

    Attribute name Type Default DOM Other UA behavior Details
    Defaults match the current behavior of most current UAs. As UAs add these HTML 4.1 conforming features, authors will gain some control over the encoding of URIs, which currently only works for authors who control the encoding on both the URI producer and URI consumer ends of any particular resource referenced.
    scheme-charset Encoding INHERITED W Default of utf-8 declared on the root 'html' element by this specification’s schema definition.
    authority-charset Encoding INHERITED W Default of utf-8 declared on the root 'html' element by this specification’s schema definition.
    path-charset Encoding INHERITED W Default of utf-8 declared on the root 'html' element by this specification’s schema definition.
    query-charset Encoding INHERITED W Default of current document encoding character set declared on the root 'html' element by this specification’s schema definition.
    fragment-charset Encoding INHERITED W Default of current document’s encoded character set declared on the root 'html' element by this specification’s schema definition.

    DOM Interface

    UA Processing

    The encoding must be used to transcode a URIRep into a URI for transmission. In the case of a fragment identifier, the issue is often moot since the initiating and the receiving UA are typically identical, though there are circumstances when this may not always be the case.
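A minimal sketch of this transcoding step follows, assuming the path is always escaped as UTF-8 and the query and fragment use the charsets named by 'query-charset' and 'fragment-charset'. The function name `encode_uri` is hypothetical, the authority is left untouched (internationalized host names are out of scope here), and existing percent-escapes are passed through unchanged.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri(uri_rep: str, query_charset: str, fragment_charset: str) -> str:
    """Percent-escape each component of uri_rep with its own charset."""
    parts = urlsplit(uri_rep)
    # '%' is kept safe so that author-written percent-escapes pass through.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding=query_charset)
    fragment = quote(parts.fragment, safe="%", encoding=fragment_charset)
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# A windows-1252 document that keeps its own encoding for query and fragment.
print(encode_uri("http://example.org/caf\u00e9?q=\u2014#\u00e9",
                 query_charset="cp1252", fragment_charset="cp1252"))
# http://example.org/caf%C3%A9?q=%97#%E9
```

With both charsets set to "utf-8" the same input would escape the EM DASH as %E2%80%94, which is the direction the draft hopes authors will move in.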

    Presentation Considerations

    Authoring Considerations

    Namespace Declaration Module

    These attributes allow authors to declare the namespaces for elements, attributes and attribute values (for QNameRef/CURIE values). Moreover, by using the 'classns' and 'idns' attributes authors can declare the use of QName participating values for class and id values.

    Attributes

    Attribute name Type Default DOM Other UA behavior Details
    classns URIRep Inherited W

    Declares the namespace mapping for 'class' attribute values with no prefix. Without this attribute any class attribute values in scope are presumed to be opaque NMToken data types with no meaning except for the author.

    The 'classns' prefix defines its own namespace which supports attributes of any arbitrary Name an author chooses. All attributes prefixed with the 'classns' prefix define a namespace declaration for namespaced NMToken values for use with the 'class' attribute. In parallel with Namespaces in XML 1.0 terminology, these are no longer simply NMTokens for the 'class' attribute, but instead qualified NMTokens, or QNMTokens. It is important to note that the classns prefix itself is a Name as defined in XML 1.0, while the members of the vocabulary defined by the NCName prefix are all NMTokens. In other words, the NMTokens defined by the vocabulary do not restrict the starting character the way XML Names do, but the prefix must conform to the NCName production of Namespaces in XML 1.0.

    classns:* URIRep Inherited W Declares the namespace mapping for 'class' attribute QName values with an NCName prefix (represented by the asterisk "*"). Without this attribute any class attribute values in scope are presumed to be opaque NMToken data types with no meaning except for the author.
    idns URIRep Inherited W

    Declares the namespace for mapping 'id' attribute values with no prefix. Without this attribute any 'id' attribute values in scope are presumed to be opaque unique Name identifier data types (as either ID or NID) with no meaning except for the author. HTML QName values for the 'id' attribute include: "header", "footer", "nav", "main", "sidebar", "lsidebar", and "rsidebar".

    The 'idns' prefix defines its own namespace which supports attributes of any arbitrary Name an author chooses. All attributes prefixed with the 'idns' prefix define a namespace declaration for namespaced Name unique identifier values for use with the 'id' attribute. In Namespaces in XML 1.0 terminology, these are no longer IDs as unique XML Name identifiers, but instead unique QName identifiers, or QIDs. However, these namespaced IDs can also be used as node-unique identifiers, or NIDs, which, when namespaced, are Qualified Node-unique IDentifiers, or QNIDRefs.

    idns:* URIRep Inherited W Declares the namespace mapping for 'id' attribute values with an NCName prefix (represented by the asterisk "*"). Without this attribute any 'id' attribute values in scope are presumed to be opaque unique Name identifier data types (as either ID or NID) with no meaning except for the author.
    Namespace declaration attributes defined elsewhere (in Namespaces for XML and the tHTML parser specification)
    xmlns URIRep Inherited W Declares the default namespace for the document, regardless of parsing and serialization, as defined by Namespaces in XML. Since xmlns is an unprefixed attribute, it must be declared as part of the HTML vocabulary to allow authors to specify a default namespace declaration on the root and descendant elements. HTML declares the default value for this attribute on the root 'html' element, but it is a global attribute whose value is inherited until overridden by a descendant declaration. Many implementations already treat tHTML and XML parsed HTML as having a default namespace declaration of "http://www.w3.org/1999/xhtml" as if it is declared in the document schema definition.

    Note however that some tHTML parsers may only support a fixed value for the xmlns attribute. For those parsers the default namespace may only be 'http://www.w3.org/1999/xhtml' and cannot be overridden by authors. However even for such tHTML parsing, authors are free to specify other namespaces using the 'xmlns:' prefix on other author-coined attributes (i.e., their own prefix declarations but not an un-prefixed declaration).

    xmlns:* URIRep Inherited W Declares the namespace / prefix mapping for the present element and all descendant elements, attributes and attribute values until a descendant element's xmlns:* attributes override these declarations (unprefixed attributes either inherit their namespace from their parent elements or are in the null namespace, depending on implementation; unprefixed attribute values always inherit their parent attribute's namespace). Strictly speaking it is unnecessary to include the xmlns and other namespaced attributes and elements in the HTML vocabulary, since the mixing of namespaces is more appropriately handled by the namespace specification (XML Namespaces or Namespaces in tHTML). However, the lack of namespace awareness in DTD processing applications and other schema definition processing applications (and not in the schema definitions themselves) has led to the misconception that local schemas need to allow foreign namespaces, rather than the higher protocol handling the mixing of vocabularies.

    DOM Interface

    UA Processing

    These attributes do not require any special processing for interactive UAs. However, in general the 'xmlns' and 'xmlns:*' attributes indicate the document is a namespaced document and elements and attributes should be placed in the appropriate namespaces (including the xml prefixed attributes and the xmlns attribute itself) regardless of serialization. However, namespace unaware processing applications should still be able to process the document since even without namespace awareness the NCNames simply become SGML/XML Names that happen to contain colons. [CSS3 may be in danger of undermining this treatment with its change in non-namespace selectors, but appears to be fine for any UA that is unaware of namespaces (including unaware for a specific serialization of HTML). Namespace aware/unaware comparisons may need further research.]

    For non-namespace-aware UAs, the namespace declaration attributes still provide an organizational mechanism for authors and other UAs to extend vocabularies while avoiding name collisions.
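
    For namespace-aware UAs, the scoping rule for 'xmlns' and 'xmlns:*' declarations can be sketched as a walk up the ancestor chain. This is only a minimal illustration: the Node class, the resolve_prefix function and the example.org URIs are hypothetical names introduced here, not part of this draft or of any DOM interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass
class Node:
    """Hypothetical stand-in for an element: a name, attributes, and a parent."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None


def resolve_prefix(node: Optional[Node], prefix: str) -> Optional[str]:
    """Return the namespace URI bound to prefix at node, or None if unbound.

    The nearest declaration wins: a descendant's xmlns:* attribute overrides
    the same prefix declared on an ancestor. An empty prefix looks up the
    default namespace declared with a plain 'xmlns' attribute.
    """
    attr = "xmlns:" + prefix if prefix else "xmlns"
    while node is not None:
        if attr in node.attrs:
            return node.attrs[attr]
        node = node.parent
    return None


# Illustrative tree: the inner element re-declares the 'ex' prefix.
root = Node("html", {"xmlns": XHTML_NS, "xmlns:ex": "http://example.org/a"})
body = Node("body", {}, root)
inner = Node("ex:widget", {"xmlns:ex": "http://example.org/b"}, body)
```

    Here resolve_prefix(body, "ex") finds the declaration on the root element, while resolve_prefix(inner, "ex") finds the overriding declaration on the element itself.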

    [edit] Presentation Considerations

    These attributes have no special presentational needs.

    [edit] Authoring Considerations

    • perhaps an example using a microformat with CURIE class names (e.g., hcard).

    [edit] Marks Attribute Module

    Attribute name Type Default DOM Other UA behavior Details
    marks "provided" | "needed" "needed" W Used on quotation and narrative dialog elements, where the enumerated value 'needed' indicates that the element does not include quotation marks according to customary styling methods and so the presentation layer should provide such quotation marks as desired. The enumerated value 'provided' indicates the HTML author has included those quotation marks and the presentation layer should not add them.
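
    As a rough sketch of how a presentation layer might honor the two values, consider the following Python function. The function name and the choice of curly double quotes as the default marks are assumptions made for illustration; in practice the marks would come from the style sheet and the language of the content.

```python
def render_quotation(text: str, marks: str = "needed",
                     open_mark: str = "\u201c", close_mark: str = "\u201d") -> str:
    """Return the text as it would be presented for a given 'marks' value.

    'needed' (the default) means the author left the quotation marks out,
    so the presentation layer supplies them. 'provided' means the author
    already included them, so the text is passed through untouched.
    """
    if marks == "provided":
        return text
    if marks == "needed":
        return f"{open_mark}{text}{close_mark}"
    raise ValueError(f"unsupported marks value: {marks!r}")
```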

    [edit] Edit Attributes Module

    Attribute name Type Default DOM Other UA behavior Details
    edit "inserted" | "deleted" | "changed" | "moved" W Indicates the type of edit last made to the element.
    editedby URIRep editedBy A URI that references a person, organization or other entity/resource responsible for the edit last made to the element. [this replaces the 'cite' attribute in XHTML2 which is an overloaded attribute as currently proposed by XHTML2]
    datetime dateTime dateTime (DOMTimeStamp) Indicates a timestamp when the edit was last made to the element.

    [edit] Embedding Attributes Module

    These global attributes allow authors to treat any element as a replaced element for the purpose of embedding resources whether textual or non-textual, or whether referenced or resource embedded. Specialized elements from the Embedding Module provide legacy UA support while also allowing authors to incorporate semantic differentiation among their embedded content. Authors may use features from the Resource Embedding Module to embed resources directly in a document rather than by way of reference.

    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    src URIRep W
    srctype ContentType W (srcType)
    encoding Encoding W XHTML2 used this attribute for character encoding; however, that fails to clear up confusion and needlessly introduces a new attribute. There may be some need for authors to have control over compression encodings, and that makes some sense within these embedding attributes. If that is not needed then this attribute should simply be dropped in favor of the continued 'charset' attribute (which adequately connotes “character encoding”). An 'encoding' attribute serves pedantically, but does little or nothing to clear up confusion over character set and character set encoding. The 'encoding' attribute should be used to make HTTP and other protocol requests for the specified encoding.
    processas "" | ContentType W (processAs)
    alt W
    description token | ( "URI(", URIRep, ")" ) W

    [edit] DOM Interface

    [edit] UA Processing

    [edit] Presentation Considerations

    [edit] Authoring Considerations

    [edit] Image Map Attributes Module

    This module provides some global attributes to define image maps on embedded images. The module also provides rich image map capabilities to facilitate better accessibility and interactive document behavior.

    [edit] Global Attributes

    for map elements with descendant mapped area elements ('map' element for legacy support)
    Attribute name Type Default DOM Other UA behavior Details
    noquery xsd:boolean W

    When a mapped area is activated, and that area has a valid 'id' or 'nid' attribute, the UA must append a query component to the URI with the name-value pair mapName=areaName, inserting any '?' or '&' necessary to produce a valid query component for the URI. The mapName is the value of the 'name' attribute if a valid value is given, or else the value of the nid, id, or xml:id attribute if one of those is given and valid.

    The noquery xsd:boolean attribute, if true, indicates that the UA should not append the mapName=areaName pair to the request href URI when activating an image map area, but should instead revert to the legacy behavior even when 'area' elements have valid identifiers. This is included to allow an easy fix for authors who simultaneously rely on server-side image maps and implement poor server-side name-value parsing of request URIs.
    [Note: instead of an opt-out noquery, this could be an opt-in addquery boolean depending on the predominance of errant processors]
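
    The query-appending behavior described above might look like the following Python sketch. The function name append_area_query and its parameters are hypothetical; the sketch only shows how the '?' or '&' separator falls out of splitting and reassembling the URI, and how a true 'noquery' value leaves the URI untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def append_area_query(href: str, map_name: str, area_name: str,
                      noquery: bool = False) -> str:
    """Append mapName=areaName to the query component of href.

    When noquery is true the URI is returned unchanged (legacy behavior).
    Otherwise the pair is added after any existing query, and before any
    fragment, so the result remains a valid URI.
    """
    if noquery:
        return href
    parts = urlsplit(href)
    pair = f"{quote(map_name, safe='')}={quote(area_name, safe='')}"
    # Join with '&' only when a query is already present; urlunsplit
    # supplies the leading '?' itself.
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

    For instance, with a map named nav and an area named home, a URI that already carries a query of x=1 and a fragment of top gains &nav=home between the two.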

    for mapped area elements ('area' and 'a' element for legacy support)
    Attribute name Type Default DOM Other UA behavior Details
    shape "default" | "rect" | "circle" | "poly" W
    coords Coordinates W
    nohref boolean W [do we need this?]
    for associating embedding elements with maps
    Attribute name Type Default DOM Other UA behavior Details
    usemap IDREF useMap To associate mapped areas of an embedded image with various behaviors usually only provided for an entire element – for example, displaying the title attribute on hover, firing events, or activating links.
    ismap boolean isMap (boolean) When an embedded element is associated with a server-side image map, this attribute can be used to at least indicate that users in non-graphical contexts are missing a part of the user interface.
    selfmap boolean selfMap (boolean) Indicates the current element (for a replaced element) serves as its own image map, signaling descendant elements will define area shapes, coordinates and other map/area properties.

    [edit] DOM Interface

    [edit] HTMLMapElement

    Legacy interface (usable for authoring purposes but not recommended for implementor strategy where interfaces should be included from the global attributes module):

    interface HTMLMapElement : HTMLElement {

    attribute DOMString name;
    readonly attribute HTMLCollection areas;
    readonly attribute HTMLCollection images;

    };

    [edit] HTMLAreaElement

    Legacy interface (usable for authoring purposes but not recommended for implementor strategy where interfaces should be included from the global attributes module):

    interface HTMLAreaElement : HTMLElement {

    attribute DOMString shape;
    attribute DOMString coords;
    attribute DOMString alt;
    attribute DOMString href;
    attribute DOMString target;
    attribute DOMString rel;
    readonly attribute DOMTokenList relList;
    attribute DOMString media;
    attribute DOMString hreflang;
    attribute DOMString type;

    };

    [edit] UA Processing

    • UAs should append the mapName=areaName pair to the query component of the image map request URI
    • areas should be focusable for each associated image along with:
      • applied CSS
      • 'area' element 'title' attribute view on hover (if that is the UA's presentation of title attributes)


    [edit] Presentation Considerations

    • CSS box model applied broadly to non-rectangular boxes: not only a rectangle, but also a circle and a polygon

    [edit] Authoring Considerations

    [edit] Metainformation Attributes Module

    [introductory explanation of module]

    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    property QNameRef W Serves as the predicate of a triplet RDF statement: subject-predicate-object
    about URIRep W Serves as the subject of a triplet RDF statement: subject-predicate-object. When absent, the RDF statement subject is instead this element's nearest ancestor sectioning element or the entire document if this element is in the document’s 'head' element.
    content CDATA W Serves as the object of a triplet RDF statement: subject-predicate-object. When absent, the RDF object is the contents of the element instead. When the 'datatype' attribute is present along with the 'content' attribute, the value of the 'content' attribute should correspond to the data type indicated by the 'datatype' attribute. Authoring UAs and validating UAs may provide alerts to users informing them of non-conforming 'content' attribute values and providing assistance to ensure conforming attribute values.
    datatype QNameRef dataType Required on 'data' elements
    units QNameRefs W The units attribute provides more detailed metadata about the content of the element when the 'datatype' attribute has a valid and UA recognized value. The units attribute applies to the element through either the value of the 'content' attribute when present, or otherwise the contents of the element itself. The unit QName must be one suitable for the data type expressed in the 'datatype' attribute and can include a calendar QName when the data type expressed in the 'datatype' attribute permits dates in various calendars. Authoring UAs and validating UAs may provide features to ensure such conforming 'units' and 'datatype' attribute values.
    rel LinkRelation QNameRef W Indicates the type of relation from the current element anchor to the destination anchor referenced by the 'href' attribute.
    rev LinkRelation QNameRef W Indicates the type of relation from the destination anchor referenced by the 'href' attribute to the current element anchor.
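
    The fallback rules for 'property', 'about' and 'content' can be illustrated with a small Python sketch. The function rdf_statement is a hypothetical helper: it assumes the caller has already determined the fallback subject (the nearest ancestor sectioning element, or the document itself for elements in 'head') and passes it in as a string.

```python
from typing import Dict, Optional, Tuple


def rdf_statement(attrs: Dict[str, str], text: str,
                  fallback_subject: str) -> Optional[Tuple[str, str, str]]:
    """Build a (subject, predicate, object) triple for one element.

    attrs holds the element's attributes and text its contents.
    Without a 'property' attribute there is no predicate, so no statement.
    """
    predicate = attrs.get("property")
    if predicate is None:
        return None
    # 'about' names the subject; otherwise use the caller-supplied fallback.
    subject = attrs.get("about", fallback_subject)
    # 'content' names the object; otherwise the element contents are used.
    obj = attrs.get("content", text)
    return (subject, predicate, obj)
```

    An element carrying only a 'property' attribute therefore yields a statement whose subject is the fallback and whose object is the element's own contents.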

    [edit] DOM Interface

    [edit] UA Processing

    Interactive UAs may provide an inspector or sidebar to display the data type, units, RDF statement, and other metadata information related to each element to users.

    [edit] Presentation Considerations

    • CSS to provide fine-grained control over the display of structured data in a localizable manner
    • Possibly presentation that allows unit conversions

    [edit] Authoring Considerations

    [edit] Supplemental Modules

    [edit] Notes Module

    The note module provides facilities for authors to structure annotations, notes or subordinate text to the main text. Authors can then use CSS3 capabilities (including CSS3 generated content for paged media) to display subordinate text within a separate frame or iframe or as footnotes, or endnotes. UAs should also provide options to display subordinate text in accessory windows or sidebars.

    The Note Module supports two different modes of markup for subordinate text. One is to include subordinate text within the element it annotates. This is helpful when source markup text is cut or copied and pasted into other documents or otherwise rearranged. In these circumstances, authors can easily maintain the subordinate text alongside the main text it supports. Unfortunately, such markup will not necessarily work in legacy UAs that do not parse the element correctly; for those UAs the element needs to have its CSS 'display' property set to 'none' in order to at least degrade gracefully.

    The other mode of annotation provides better support for legacy UAs, by including the annotations in a separate element (for example, at the end of a document) while the 'sref' attribute provides a link from the main text to the subordinate text.

    Both modes should find CSS3 support for footnotes and endnotes. Authors are free to choose the mode that best serves their needs.

    [edit] Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    subtext Common ( PCDATA | Text | Structure )* Text | Structure HTMLSubtextElement Viewport Scroll Synchronization


    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    sref URIRep W A URIRep [or possibly an IDRef] to an element (of any type) containing the subordinate text, annotation or note for the text contents of this element or the immediately preceding text if the attribute is on a void element.

    [edit] DOM Interface

    [edit] UA Processing

    Viewport scroll synchronization of main text and subordinate text

    [edit] Presentation Considerations

    For CSS, authors may want to present subordinate text either as:

    • endnotes (visual or tactile CSS)
    • footnotes (visual, paged CSS)
    • frame notes, where subordinate text appears alongside the main text in a frame or iframe (interactive visual); ideally, either CSS or HTML or the UAs themselves will provide a mechanism to maintain the scroll-to of the main frame and subordinate text frame in proper alignment.
    • interactive query (aural CSS), where the UA provides some aural cue that subordinate text is available

    [edit] Authoring Considerations

    The capabilities of the note module are largely provided through proper pairing of the one element and one attribute for the module. Interactive UAs should provide a user interface to view the subordinate text through interaction with the main text. This could be through an inspector accessory window, a sidebar or a view that displays on hover over the main text. Also when viewing the subordinate text, interactive UAs should provide a mechanism to view and navigate to the primary text associated with the subordinate text (note that when using the 'sref' attribute this involves maintaining a reverse link index or another mechanism to resolve the URI references to a specific remote subordinate text element).

    [edit] Citation and Attribution Module

    Citation and attribution features allow authors to embed citation and attribution metadata within HTML documents. It builds upon the HTML 4.01 features by separating attribution into its own attribute and adding URN resolution, citation grouping (with the 'cite' and 'subcite' attributes), further citation details (also with 'subcite'), attribution and reference list presentation capabilities and other UA norms. By adding either URLs, URNs, or any URI to the 'cite' and 'subcite' attributes authors can provide citation and attribution information within their documents in the simplest manner, yet still take advantage of a rich diversity of citation and attribution presentation mechanisms.

    Authors use attributions to provide a clear URIRep indicating the person, organization or other entity to which they ascribe the quotation, idea, or work within the element's contents. Authors use the 'cite' and 'subcite' attributes to provide a URIRep indicating the resource where the quotation or idea can be found or for a citation to the work itself. HTML includes processing norms to allow URNRef values within the 'cite' and 'subcite' attributes to be resolved into URLs for retrieval of metadata regarding the resource or an instance or facsimile of the resource itself.

    [edit] Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    cite Common, citetype(QNameRef) ( PCDATA | Text )* Text HTMLCiteElement The 'citetype' attribute provides a mechanism to categorize citations contained in the cite element, for example as a book or journalarticle. However, for machine-readable content, authors should use the specialized 'attributeto', 'cite' and 'subcite' attributes rather than the generic 'content' attribute. A citetype='book' therefore indicates that the contents of the 'cite' element refer to a book and the 'cite' attribute provides a URI that identifies the book, while the 'attributeto' attribute may be used to identify the author(s) of the book.

    The Citation and Attribution Module also makes use of the Moniker Module’s 'pn' element for attribution to persons and organizations and the List Module for source / reference lists including the 'type' attribute with the value 'reference' and the definition list’s 'dd' element for annotations of bibliographic references.

    [edit] Attributes

    [edit] Cite Element Attributes
    Attribute name Type Default DOM Other UA behavior Details
    citetype CiteType QNameRef W (citeType)
    [edit] Global Attributes
    Attribute name Type Default DOM Other UA behavior Details
    cite URIRep W
    subcite token | ( "URI(" & URIRep & ")" ) W (subCite) The URIRep contained within "URI()" is interpreted as a relative URI reference with the 'cite' attribute serving as a base URIRep for resolving to an URI. Any other base mechanism is also applied to the URIRep value of the 'cite' attribute.
    attributeto token | ( "URI(" & URIRep & ")" ) W (attributeTo) The URIRep should serve as an identifier for a person, organization or other entity responsible for a quotation, cited work, or idea within the element.
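
    Resolving a 'subcite' value against its 'cite' base can be sketched in Python with the standard library's relative reference resolution. The helper name resolve_subcite and the example.org URIs used with it are assumptions for illustration; plain token values are returned as None because the draft defines no URI resolution for them.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches the "URI(...)" form of the attribute value.
_URI_FORM = re.compile(r"^URI\((.*)\)$")


def resolve_subcite(cite: str, subcite: str) -> Optional[str]:
    """Resolve the URIRep inside "URI(...)" relative to the 'cite' URI.

    Returns None when subcite is a plain token rather than the URI form.
    The cite value is assumed to be absolute already, that is, any other
    base mechanism has been applied to it beforehand.
    """
    match = _URI_FORM.match(subcite.strip())
    if match is None:
        return None
    return urljoin(cite, match.group(1).strip())
```

    With a 'cite' of http://example.org/books/hamlet/ and a 'subcite' of URI(act3/scene1), the resolved citation is http://example.org/books/hamlet/act3/scene1.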

    [edit] DOM Interface

    [edit] HTMLCiteElement

    interface HTMLCiteElement : HTMLElement {

    attribute DOMString citeType; //reflects the citetype content attribute
    readonly attribute DOMString type;

    };


    [edit] Relevant HTMLElement accessors

    interface HTMLElement : Element {

    attribute DOMString cite; //reflects the cite content attribute
    attribute DOMString subCite; //reflects the subcite content attribute
    attribute DOMString attributeTo; //reflects the attributeto content attribute
    readonly attribute HTMLCreativeWork creativeWork;
    readonly attribute DOMString workTitle;
    readonly attribute HTMLAttributedAuthorities workCreators;
    readonly attribute DOMTimeStamp workPublishingDate;

    };

    [edit] HTMLCreativeWork

    interface HTMLCreativeWork : DOMObject {

    readonly attribute DOMString title;
    readonly attribute HTMLAttributedAuthorities creators;
    readonly attribute DOMTimeStamp publishingDate;

    };

    [edit] HTMLAttributedAuthority

    interface HTMLAttributedAuthority : DOMObject {

    readonly attribute DOMString indexedName;
    readonly attribute DOMString fullname;
    readonly attribute DOMString entityType;

    };

    [edit] HTMLAttributedAuthorities

    interface HTMLAttributedAuthorities : DOMObject {

    readonly attribute unsigned long length;
    [IndexGetter] HTMLAttributedAuthority item(in unsigned long index);

    };

    [edit] Presentation Considerations

    Citations are presented in many ways and we recommend developing mechanisms with CSS to address the most common modes of presentation. These include:

    • parenthetical references
    • footnote or endnote references separate from other notes
    • footnote or endnote references mingled with other notes
    • bibliographic source lists

    However, since the HTML approach involves embedding only the essential information for a citation within the document and relying on presentational mechanisms to provide a more human-readable form for these citations, CSS also requires new functions to extract the salient properties of a cited resource. Moreover, the HTML Data Module includes a 'cache' attribute to embed more detailed bibliographic properties within a document for offline support. The 'cache' attribute indicates that the citation information is only for offline caching reasons and that the element and all its descendants can be safely removed from the document without any loss in meaning (only presentational loss). When the 'cache' attribute indicates a matching URI for the 'cite' or 'subcite' attribute's URIRep, the element should contain properties for use in author or user designated presentation of citations.

    [some example markup here]

    [edit] UA Processing

    When the 'cite' or 'subcite' attributes include a URIRep value that is a URN rather than a URL, UAs should provide a mechanism to resolve recognized URNs into:

    • URLs for resource metadata
    • optionally URLs for resource instance or facsimile retrieval

    Interactive UAs, especially general purpose UAs, should provide a mechanism through the user defaults system for users to select alternate systems for URN to URL resolution for either of the two locatable resources (metadata and instance/facsimile). Many users may have no access to instance/facsimile retrieval, but metadata retrieval is quite widespread and readily available already.
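
    One way to picture the two resolutions is a user-configurable table keyed by URN namespace, as in this Python sketch. Everything in it is hypothetical: the RESOLVERS table, the example.org URL templates and the resolve_urn function merely stand in for whatever resolution services a UA or user might actually configure.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Resolved(NamedTuple):
    metadata_url: str
    instance_url: Optional[str]  # None when no instance/facsimile service is set


# Hypothetical user defaults: URN namespace -> (metadata template, instance template).
RESOLVERS: Dict[str, Tuple[str, Optional[str]]] = {
    "isbn": ("https://metadata.example.org/isbn/{nss}",
             "https://library.example.org/isbn/{nss}"),
    "issn": ("https://metadata.example.org/issn/{nss}", None),
}


def resolve_urn(urn: str,
                resolvers: Dict[str, Tuple[str, Optional[str]]] = RESOLVERS
                ) -> Optional[Resolved]:
    """Map a recognized URN to a metadata URL and, optionally, an instance URL."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        return None  # not a URN at all
    namespace, specific = parts[1].lower(), parts[2]
    templates = resolvers.get(namespace)
    if templates is None:
        return None  # unrecognized URN namespace
    metadata_tpl, instance_tpl = templates
    instance = instance_tpl.format(nss=specific) if instance_tpl else None
    return Resolved(metadata_tpl.format(nss=specific), instance)
```

    Keeping the two templates separate reflects the point that a user may have metadata access without instance or facsimile access.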

    [some examples; perhaps wikipedia examples]

    • URN resolution involves two separate resolutions: 1) URN to URL metadata for a structured metadata representation of the resource 2) URN to URL document for the full text document [many users might only have online access to resource metadata while others might have access to complete fulltext resources.]
    • Presentation layer
      • presentation of citations and attributions from: 1) online metadata, 2) document cached metadata, 3) placeholder
      • presentation of reference lists from: 1) online metadata, 2) document cached metadata, 3) placeholder
    • UI norms to display additional information about resources.

    Citation grouping groups together all elements with identical 'cite' attributes. The 'subcite' attribute can then be used with relative path syntax to provide specific citations within a more general source represented by the 'cite' attribute.
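
    A minimal Python sketch of this grouping follows, assuming the UA has already collected each element's 'cite' and 'subcite' values as pairs in document order. The function name group_by_cite is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_cite(citations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group 'subcite' values under their shared 'cite' value.

    citations yields (cite, subcite) pairs in document order. Sources keep
    the order of their first appearance, and the specific citations within
    each source keep document order.
    """
    groups: Dict[str, List[str]] = {}
    for cite, subcite in citations:
        groups.setdefault(cite, []).append(subcite)
    return groups
```

    A reference list could then emit one entry per key, with the more specific 'subcite' values listed beneath it.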

    [edit] Authoring Considerations

    • including cached metadata for presentational purposes
    • using 'cite' and 'subcite' to unite references to the same resource and provide more specific citations
    • attributing quotations, ideas and works to persons, organizations, and other entities which are:
      • notable/famous
      • online colleagues
      • non-famous offline authorities

    Attribution using URIs. Examples of URI sources:

    • A notable creator (for William Shakespeare attributeto='http://en.wikipedia.org/wiki/William_Shakespeare'; persistency issues)
    • a colleague (ldap: or mailto: scheme URI),
    • my grandmother (document local IDRef)
    • A blogger, a blog commenter (blog site URL)
    [edit] Reference lists

    Using nId(nid)


    <ul sortable='true' type='reference' >
    <li nid='DCC0BDD7-2520-483C-BEDA-34121BFB56F1' >
    <span nid='title' ></span>
    <span nid='creators' ></span>
    <span nid='pubdate' datatype='DateTime' content='' ></span>
    </li>
    <li type='reference' nid='4FA78CBD-8EBD-4B35-88EA-20E4609185E9' >
    <span nid='title' ></span>
    <span nid='creators' ></span>
    <span nid='pubdate' datatype='DateTime' content='' ></span>
    </li>
    </ul>

    [edit] Data Module

    The data module provides an element to contain specialized data structures such as integers, floating point numbers, dates, etc. The module also includes specialized attributes which authors may use on any element, but which are required on the 'data' element. While providing

    [edit] Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    data Common ( PCDATA | Text )* Text HTMLDataElement datatype(QNameRef), content(CDATA) are both required while units(QNames) is recommended. The default units can be implied by the datatype’s QName definition or by the lexical entry of the content(CDATA). Normally the data element contains the same information encoded in the 'content' attribute but presented for the main target audience. Authors may then use CSS, scripts or other presentation methods to localize the presentation of the data canonically represented in the 'content' attribute in place of the element’s contents.

    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    cache URIRep W Indicates that the author regards the element and all of its content (including all descendants), as merely a cache of data for the indicated URIRep. This content can be discarded without diminishing the data integrity of the document although the presentational aspects of the document might be detrimentally affected. The cached data might include cached resources for 'attributeto', 'cite', or even 'src', and 'href' attributes. The data can be used to display the information when the user is offline.
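
    Because cached content can be discarded without harming the document's data, a processor could strip it as in the following Python sketch. The sketch uses the standard library's ElementTree on a well-formed XML serialization; the function name strip_cached is hypothetical, and keeping the text that follows a removed element is a design choice of this sketch, not something the draft specifies.

```python
import xml.etree.ElementTree as ET


def strip_cached(root: ET.Element) -> ET.Element:
    """Remove every descendant element carrying a 'cache' attribute.

    The element and all of its descendants are dropped. Text that follows
    the removed element (its tail) belongs to the parent and is kept.
    """
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if "cache" in child.attrib:
                if child.tail:
                    # Re-attach the trailing text to whatever now precedes it.
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return root
```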

    [edit] DOM Interface

    • Conversion of content values to different units
    • Components of content values and accessors returning non-DOMString representations of content values

    [edit] UA Processing

    [edit] Presentation Considerations

    • Presentation of data (numbers, dates, times, etc.)
    • Presentation of alternate styles of data (unit/calendar/era conversion, alternate characters, base conversion, etc.)

    [edit] Authoring Considerations

    [edit] Clipping and Bookmark Module

    Clippings and bookmarks allow authors to mark important points (bookmark) or ranges (clipping) within a document independent of the hierarchy of the document. This can be useful for documents editable by a user or workgroup, or for electronic instances of archival documents where authors might use these features to mark up physical pages, lines, important passages, etc. CSS provides complementary features for these semantics to allow various presentation idioms for these features.

    [edit] Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    m Common EMPTY Any element HTMLElement Used as a bookmark, usually with an 'id', 'nid', or 'clipping' attribute.
    mark Common ( Text )* Text HTMLElement This element provides a way to highlight phrase or text content within a document (see HTML5 mark element).

    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    clipping NMTOKENS W

    [edit] DOM Interface

    interface HTMLElement {

    attribute DOMString clipping;
    readonly attribute DOMRange clipRange; // need to refine this, but the idea is to get all the clipping ranges including the element
    readonly attribute DOMCollection allBookmarks;

    };

    interface HTMLDocument {

    readonly attribute DOMCollection allBookmarks;
    readonly attribute DOMCollection allClippings; // need to refine this, but the idea is to get a collection (of some kind of DOMRange interfaces) for a document

    };

    [edit] UA Processing

    [edit] Presentation Considerations

    Clippings also complement some corresponding presentational enhancements in CSS3. Clippings might also be used with the CSS 'break-before' or 'break-after' properties to preserve conventional pagination from archival documents: especially since conventional page breaks are typically orthogonal to the hierarchy of a document. However, under normal circumstances the 'm' element receives no visible presentation, but is meant to be used as a void (empty) anchor for URI fragment identifiers or other bookmarking purposes.

    [edit] Authoring Considerations

    Pagination might be handled in this way:

    <m clipping='p221 p222'/>
    <h1>Chapter Heading</h1>
    <p>A paragraph …</p>
    <p>Another paragraph …</p>
    
    <p>A paragraph which breaks <m clipping='p222 p223' />across a page …</p>
    <p>A paragraph …</p>
    <p>Another paragraph …</p>
    
    <p>A paragraph which breaks <m clipping='p223 p224' />across a page …</p>
    
    …
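
    To show how a processor might recover page ranges from such markup, the following Python sketch indexes the 'm' markers by clipping token. It rests on an assumed reading of the example above, namely that a clipping runs between the markers that name it; the draft does not yet define this. The SAMPLE string is a shortened, well-formed variant of the example, and clipping_markers is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Shortened variant of the pagination example, wrapped in a root element.
SAMPLE = """<body>
<m clipping='p221 p222'/>
<h1>Chapter Heading</h1>
<p>A paragraph which breaks <m clipping='p222 p223'/>across a page</p>
<p>A paragraph which breaks <m clipping='p223 p224'/>across a page</p>
</body>"""


def clipping_markers(root: ET.Element) -> Dict[str, List[int]]:
    """Map each clipping token to the positions of the 'm' markers naming it.

    Positions count 'm' elements in document order, starting at zero.
    A token named by two markers is read here as the range between them.
    """
    markers: Dict[str, List[int]] = {}
    for position, marker in enumerate(root.iter("m")):
        for token in marker.get("clipping", "").split():
            markers.setdefault(token, []).append(position)
    return markers
```

    Under that reading, page p222 in the sample runs from the first marker to the second, independent of the heading and paragraph boundaries it crosses.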
    

    [edit] Alternate Content Module

    The alternate content module provides facilities to embed alternate content within a single HTML document. This provides an alternative to other mechanisms such as HTTP's alternate representation support or using the link element's rel='alternate' attribute-value pair.

    [edit] Elements

    While this module defines no elements itself, it makes use of the XForms elements: switch, case, and toggle.

    [edit] Attributes

    Attribute name Type Default DOM Other UA behavior Details
    media MediaQuery W (media) Indicates a media query which, if it matches the current UA properties and user defaults, makes this element’s content relevant.
    altcollection NMTOKENS W (altCollection)
    altdeclare xsd:boolean W (altDeclare)
    contenttype ContentType W (contentType) the default when null is the equivalent of the document's own ContentType
    altlangs LanguageCodes W (altLangs)
    illustratedby URIRep W (illustratedBy)

    [edit] DOM Interface

    [edit] UA Processing

    UAs apply a function to each element participating in the embedded alternates algorithm to return a boolean result. If 'true', the element is displayed normally. If 'false', the element is not displayed, as if its CSS 'display' property were set to 'none'.

    • altdeclare must be false
    • ContentType must be displayable, meaning the UA or an available plugin has the ability to process and display the content type and the user has indicated a desire to process such content types.
    • at least one language in the altlangs attribute value is the user's current primary browsing language

    If the element meets all of the above criteria:

    • the altcollection attribute is:
      1. either empty;
      2. not empty, but the element is the only element in the document with that altcollection NMToken; or
      3. the element is the first element in document order in at least one of the alternate collections represented by an altcollection NMToken
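
    The criteria above can be sketched as a Python function that returns one boolean per element. This is an interpretation rather than a definitive algorithm: the Alt class and the displayed function are hypothetical, and the sketch assumes that "first in document order" is judged only against earlier elements that themselves meet the three basic criteria, a point the draft leaves open.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass
class Alt:
    """Hypothetical record of one element's alternate content attributes."""
    altdeclare: bool = False
    contenttype: str = "text/html"
    altlangs: List[str] = field(default_factory=list)
    altcollection: List[str] = field(default_factory=list)


def displayed(elements: Sequence[Alt], displayable_types: Set[str],
              primary_lang: str) -> List[bool]:
    """Return, per element in document order, whether it is displayed."""

    def eligible(e: Alt) -> bool:
        # The three basic criteria from the list above.
        return (not e.altdeclare
                and e.contenttype in displayable_types
                and primary_lang in e.altlangs)

    results: List[bool] = []
    for index, element in enumerate(elements):
        if not eligible(element):
            results.append(False)
        elif not element.altcollection:
            results.append(True)  # belongs to no collection
        else:
            earlier = [e for e in elements[:index] if eligible(e)]
            # Shown if it leads at least one of its collections; this also
            # covers the case of being the only member of a collection.
            results.append(any(
                all(token not in e.altcollection for e in earlier)
                for token in element.altcollection))
    return results
```

    With a French alternate followed by two English alternates in the same collection, a user browsing in English would see only the first English one.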

    Embedded alternate content and document states.

    [edit] Presentation Considerations

    [edit] Authoring Considerations

    [edit] Software Documentation Module

    This module is for technical documents that need to clearly differentiate user input, sample output, and inline and block code. It includes the 'samp', 'kbd', 'code', and 'blockcode' elements along with the 'var' element from the Moniker Module if that module is not already included.

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    samp Common ( PCData | Text )* Text HTMLElement
    kbd Common ( PCData | Text )* Text HTMLElement
    code Common ( PCData | Text )* Text HTMLElement
    blockcode Common ( PCData | Text | Heading )* | l+ Structural HTMLElement
    The following elements from the Moniker Module are also useful for this module [contingently added if the moniker module is missing?].
    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    var Common, monikerType ( PCData | Text )* Text HTMLVarElement
    define Common ( PCData | Structural | Text )* Structural | Text HTMLDefineElement

    [edit] Moniker Module

    The Moniker Module provides elements and attributes allowing for extensive reuse of shared information about names, terms, abbreviations, variables, neologisms, homophones, icons, and other monikers. Using these facilities, authors can provide information about a moniker within the same document or in a separate reusable shared linkable document and then reference the information from any other HTML document. This means authors can introduce monikers once for a document or even an entire social network and then provide a definition, hyphenation information, and phonetic information for every occurrence throughout a site.

    The full form of abbreviations, pronunciation information, hyphenation information, biographical information, background information, and definitions can all be provided in a single moniker referenced set of elements. Then only minimal markup is needed to use the monikers repeatedly throughout a document or even an endless collection of documents referencing the moniker information. This module includes long-standing HTML elements 'dfn', 'var', and 'dd' and introduces several other elements and attributes.

    The use of monikers eases the creation of glossaries, indexes and tables of authorities for printed documents, but also provides interactive exploration of such information for interactive media.

    [edit] New 'rel' attribute keyword

    This module adds a new keyword for the 'rel' attribute: 'monikers'. For example:

    <link rel='monikers' href='monikers.html' />

    The linked monikers.html might include a definition list using the 'string', 'applytoall', 'monikerphonemes', and other moniker attributes to provide complete hyphenation, definition, biographical, phonetic and other more elaborated information for each moniker. For each moniker a UA would make use of the monikers.html data to augment its built-in hyphenation dictionaries and pronunciation dictionaries. The monikers.html document would also provide information that UAs present to users directly such as the definition of a term, the full form for an abbreviation, or the IPA phonetic expression for a term.

    [edit] Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    Definition referencing elements
    abbr Common, full(IDRef), variantof(Token), define(Token), monikerfor(IDRef), pronounceas(Enumerated) ( PCDATA | Text )* Text HTMLAbbrElement Moniker matching algorithm
    iabbr Common, full(IDRef), variantof(Token), define(Token), monikerfor(IDRef), pronounceas(Enumerated) ( PCDATA | Text )* Text HTMLAbbrElement
    icon Common, full(IDRef), variantof(Token), define(Token), monikerfor(IDRef), pronounceas(Enumerated) ( PCDATA | Text )* Text HTMLIconElement The 'icon' element serves as a moniker that is typically replaced with a graphic or icon through presentational methods.
    var Common, variantof(Token), define(Token), monikerfor(IDRef) ( PCDATA | Text )* Text HTMLVarElement
    t Common, variantof(Token), define(Token), monikerfor(IDRef) ( PCDATA | Text )* Text HTMLTermElement
    hom Common, variantof(Token), define(Token), monikerfor(IDRef) ( PCDATA | Text )* Text HTMLHomophoneElement
    pn Common, variantof(Token), define(Token), monikerfor(IDRef) ( PCDATA | Text )* Text HTMLPNounElement
    Definition referencing and definition referenced element
    dfn Common, variantof(Token), define(Token), monikerfor(IDRef), string(Token), applytoall(Boolean), casesensitive(Boolean) ( PCDATA | Text )* Text HTMLDfnElement Moniker matching algorithm
    Definition referenced elements
    define Common, string(Token), applytoall(Boolean), casesensitive(Boolean) ( PCDATA | Text )* Structural | Text HTMLDefineElement Moniker matching algorithm
    dd Common, string(Token), applytoall(Boolean), casesensitive(Boolean) ( PCDATA | Structural | Text )* dl | li HTMLDefineElement The string attribute should provide the precise string of the moniker being defined or described while the associated peer dt element can contain either the precise string or other moniker related information such as variants, hyphenation, and phonetic information (though these may also be provided through CSS generated content capabilities).

    Attributes

    Attribute name Type Default DOM Other UA behavior Referencing Referenced Details
    Self-Describing Referencing Monikers
    For self-describing monikers, authors may use a soft hyphen (&shy; &#x00AD;) within the element's contents to define hyphenation for a moniker that is a neologism or otherwise non-dictionary moniker.
    define token W Y a plain text single paragraph definition of the contents of the element
    fullform token W Y The fullform attribute value is a token that expresses the full form corresponding to the abbreviated form contained in the element. Authors may use a soft hyphen (&shy; &#x00AD;) within the token to define hyphenation for a full form that is a neologism or otherwise non-dictionary moniker.
    Matching (from referencing to referenced)
    abbrfor IDRef W Y a reference to an element containing the full expanded form counterpart to the abbreviation contained in this element
    monikerfor IDRef W a reference to an element containing the full definition or other descriptive information of the moniker contained in this element
    Matching (from referenced to referencing; provides the greatest reuse with the minimum of markup especially when using a link element to link to a reusable moniker document)
    variantof token W Y Used to match the moniker when the moniker used is a variant of the token used for matching
    string token W Y The string attribute value is a token that is used to match the contents of an element to this element's moniker-defining attributes and contents. Authors may use a soft hyphen (&shy;) within the token to define hyphenation for a neologism or otherwise non-dictionary moniker.
    applytoall boolean W Y Indicates the moniker referenced information for this element should be matched to any matching token, even those not contained within an element. This is particularly useful for providing distinguishing pronunciation information for homographs.
    casesensitive boolean W Y Indicates the monikers should be matched case-sensitively, as would be the case for a variable in many programming environments.
    Categorization
    type QName W Y Indicates the type of moniker, variable, abbreviated form, icon, or term. Each kind of moniker has its own associated QNames. For example, for abbreviated forms the types are: 'abbreviation', 'initialism', 'camelcase', 'alphanumeric', and 'acronym'.
    Pronunciation
    pronounceas characters | word | full W Y For abbreviated forms, provides the author's recommended pronunciation of the abbreviated form or, in the case of 'full', indicates that the full expanded counterpart to the abbreviated form should be pronounced.
    monikerphonemes token W Y provides the phonemes for pronouncing the moniker instances referencing this moniker referenced element
    phoneticsystem token W Y As defined in the Phonetic Attributes Module

    Moniker Matching Algorithm

    Definition: A moniker referencing element is any element with one of the following names within HTML’s namespace: 'abbr', 'iabbr', 'hom', 'acronym', 'icon', 't', 'pn', 'var', or 'dfn'.

    Definition: A moniker referenced element is any element with one of the following names within HTML’s namespace: 'dfn', 'define', or 'dd'; or any element whose ID matches the IDRef value of a moniker referencing element's 'abbrfor' or 'monikerfor' attribute. ['abbrfor' is meant to replace XHTML 2's proposed 'full' attribute in a manner more consistent with the moniker module, but it may not be necessary either, since an element will not typically be both an abbreviation for one thing and a moniker for something else.]

    Note that while 'dt' is not included among the list of moniker referencing elements, it is closely associated with a companion moniker referenced 'dd' element. Note also that the 'dfn' element can serve as either a moniker referencing element or a moniker referenced element or both simultaneously as one way to provide a chained variant, referencing a moniker variant providing phonetic information which in turn references another moniker referenced element providing phonetic information for the canonical form of the moniker and even a complete definition.

    Moniker matching occurs not only for the contents of any of the moniker referencing elements, but also for any word (as defined by Unicode Word_Boundaries) within a document. In interactive UAs, users should be able to interact with a Unicode word or marked-up moniker (e.g., through hover or selection) to view the details about the moniker provided by the moniker referenced element.

    By permitting authors to omit markup for Unicode words, HTML greatly simplifies the authoring of moniker references. It also simplifies the authoring of homograph monikers: one homograph in a set of homographs can be left without markup while each of the other homographs in the set is included as the contents of a 'hom' element.

    Locating moniker properties: type, hyphenation, pronunciation, definition.

    If the moniker element includes a type attribute with a valid QName, that attribute determines the moniker's type. Otherwise the matching moniker referenced element determines the moniker's type.
    If the moniker element includes the 'phonemes' attribute and it has a valid value, use that value to provide phoneme information (visually, aurally, or otherwise). Otherwise the matching moniker referenced element's 'monikerphonemes' attribute determines the moniker's phonetic pronunciation. For abbreviations ('iabbr', 'acronym', and 'abbr'), UAs must also use the 'pronounceas' attribute to determine whether to pronounce the full form or the abbreviated form and, as a fallback when phonemes are unavailable, whether to pronounce the abbreviated form as characters or as a single word.
    If the moniker element or Unicode word includes soft hyphens (&shy; U+00AD), use those characters to determine hyphenation line breaks. Otherwise the matching moniker referenced element's 'string' attribute determines the moniker's hyphenation.
    If the moniker element includes a 'define' attribute with a valid value, that value determines the definition for the moniker. Otherwise the matching moniker referenced element's associated definition determines the moniker's definition (note that this can be a 'dfn' element's chained moniker referenced element's definition element contents where, for example, the following chain of moniker elements occurs: 't' => 'dfn', as a variant => 'define').
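
    The property lookup rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: elements are modelled as plain dictionaries (a hypothetical shape, not a DOM interface from this draft), any non-empty value is treated as valid, and the referenced element's definition is assumed to be available under a 'definition' key.

```python
SOFT_HYPHEN = "\u00ad"


def hyphenation_points(text):
    """Return offsets, counted in the text with soft hyphens removed,
    at which a hyphenation line break is permitted."""
    points = []
    visible_length = 0
    for ch in text:
        if ch == SOFT_HYPHEN:
            points.append(visible_length)
        else:
            visible_length += 1
    return points


def resolve_properties(element, referenced=None):
    """Resolve type, phonemes, hyphenation and definition for a moniker.

    Both arguments are plain dicts standing in for elements (hypothetical
    shape). The element's own values win; otherwise the matched moniker
    referenced element supplies them.
    """
    referenced = referenced or {}
    content = element.get("content", "")
    # Soft hyphens in the moniker itself take priority over the 'string' attribute.
    hyphen_source = content if SOFT_HYPHEN in content else referenced.get("string", "")
    return {
        "type": element.get("type") or referenced.get("type"),
        "phonemes": element.get("phonemes") or referenced.get("monikerphonemes"),
        "hyphenation": hyphenation_points(hyphen_source),
        "definition": element.get("define") or referenced.get("definition"),
    }


referenced_entry = {
    "string": "mon\u00adi\u00adker",
    "type": "term",
    "monikerphonemes": "example-phonemes",
    "definition": "a name or nickname",
}
resolved = resolve_properties({"content": "moniker"}, referenced_entry)
```

    An element that carries its own 'define' or 'phonemes' value would override the corresponding entry from the referenced element.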

    For abbreviated forms marked up within 'abbr', 'iabbr', or 'acronym' elements, the full form corresponding to the abbreviated form is provided by the first of the following that is available:

    the element's 'fullform' attribute value, if that is a valid value;
    otherwise the element whose ID matches the IDREF value of the 'full' attribute, if available;
    otherwise the moniker referenced element's associated full form;
    otherwise the value of the element's 'title' attribute, if that has a valid value.

    Note that for legacy UAs, the value of the 'title' attribute will always be used instead.
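
    The fallback order for full forms might be sketched as below. Elements are again modelled as plain dictionaries, which is an assumption made purely for illustration; the 'content' key and the placement of the referenced element's full form under 'fullform' are hypothetical, and validity checking is reduced to "non-empty".

```python
def resolve_full_form(element, elements_by_id, referenced=None):
    """Return the full form for an abbreviated form, or None if nothing applies.

    element:        dict of the abbreviation element's attributes
    elements_by_id: dict mapping ID values to element dicts with a 'content' key
    referenced:     the matched moniker referenced element, if any
    """
    # 1. The element's own 'fullform' attribute.
    if element.get("fullform"):
        return element["fullform"]
    # 2. The element pointed to by the IDREF in the 'full' attribute.
    target = elements_by_id.get(element.get("full"))
    if target and target.get("content"):
        return target["content"]
    # 3. The full form associated with the matched moniker referenced element.
    if referenced and referenced.get("fullform"):
        return referenced["fullform"]
    # 4. The legacy 'title' attribute.
    return element.get("title") or None


elements_by_id = {"w3c": {"content": "World Wide Web Consortium"}}
full_form = resolve_full_form({"full": "w3c", "title": "ignored"}, elements_by_id)
```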

    Moniker referenced elements are the 'dfn', 'dd', and 'define' elements from within the document or from any linked document whose 'rel' attribute's value is 'glossary' or 'expansions', or any element within the current document whose ID value matches the IDRef of a moniker referencing element's 'full' or 'monikerfor' attribute. Once the linked documents are retrieved (those linked by the document's 'link' elements whose 'rel' attribute contains either 'glossary' or 'expansions'), moniker referencing elements are matched to moniker referenced elements according to the following algorithm (or any algorithm producing identical results).

    Matching monikers (either the contents of moniker referencing elements or Unicode words) to moniker referenced elements:

    • if the moniker referencing element has a valid 'monikerfor' IDRef, the moniker referenced element is the element whose ID matches the 'monikerfor' IDREF
    • otherwise, starting from one of the following Unicode strings, dubbed STRING:
      • the 'variantof' attribute value on a moniker referencing element, when the value is a valid value,
      • the contents of the moniker referencing element, or
      • any Unicode word within the document,
      search from the current element in reverse document order and match the first moniker referenced element (from within the document or any of the linked moniker documents) whose 'string' attribute's value equals STRING using Unicode complex case insensitivity, where:
      • comparisons must be disregarded as a match when the STRING is not the sole contents of a moniker referencing element and the moniker referenced element's boolean 'applytoall' attribute is 'false'; and
      • comparisons must be disregarded as a match when the candidate matching moniker referenced element's boolean 'casesensitive' attribute is 'true' and STRING does not match the 'string' attribute value in a case-sensitive comparison
    • if the beginning of the document is reached and the complete linked documents also yield no moniker referenced element match, the moniker has no match and any moniker properties are drawn from the element itself or other supporting protocols (such as an implementation dictionary)
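
    One possible reading of the matching steps is sketched below in Python. Several points are assumptions rather than statements of this draft: elements are plain dictionaries, each referenced element carries a numeric 'position' giving its document order, linked documents are searched in the order given once the current document is exhausted, Python's str.casefold() stands in for "Unicode complex case insensitivity", and a 'variantof' value is treated like sole element contents.

```python
def find_referenced(string, position, is_sole_content, local_defs, linked_defs=()):
    """Return the first matching moniker referenced element, or None.

    local_defs:  referenced elements of the current document, each a dict with
                 'position', 'string', 'applytoall' and 'casesensitive' keys
    linked_defs: referenced elements from linked moniker documents
    """
    def matches(candidate):
        if candidate["string"].casefold() != string.casefold():
            return False
        # Bare Unicode words only match definitions flagged 'applytoall'.
        if not is_sole_content and not candidate.get("applytoall", False):
            return False
        # Case-sensitive definitions additionally require an exact match.
        if candidate.get("casesensitive", False) and candidate["string"] != string:
            return False
        return True

    # Walk backwards from the current position (reverse document order).
    preceding = [d for d in local_defs if d["position"] <= position]
    for candidate in sorted(preceding, key=lambda d: d["position"], reverse=True):
        if matches(candidate):
            return candidate
    for candidate in linked_defs:
        if matches(candidate):
            return candidate
    return None


def match_moniker(element, position, elements_by_id, local_defs, linked_defs=()):
    """Match a moniker referencing element: 'monikerfor' first, then string search."""
    target_id = element.get("monikerfor")
    if target_id in elements_by_id:
        return elements_by_id[target_id]
    string = element.get("variantof") or element["content"]
    return find_referenced(string, position, True, local_defs, linked_defs)


local_defs = [
    {"id": "a", "position": 1, "string": "Lead", "applytoall": True, "casesensitive": False},
    {"id": "b", "position": 5, "string": "lead", "applytoall": False, "casesensitive": True},
]
marked_up = find_referenced("lead", 10, True, local_defs)
bare_word = find_referenced("lead", 10, False, local_defs)
```

    Here the marked-up moniker resolves to the nearest preceding definition ('b'), while the bare word skips it because 'applytoall' is false and falls back to the earlier definition ('a').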

    String comparisons must be done in a diacritic sensitive manner and without performing any Unicode normalization on the strings. Therefore, authors must ensure that strings intended to match share the same diacritics and the same character composition for those diacritics and other composed characters.
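
    The practical consequence for authors can be seen in a short Python sketch. The helper name is hypothetical, and str.casefold() is assumed as the case-insensitive comparison; it performs case folding only and applies no Unicode normalization.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single precomposed code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent


def moniker_strings_equal(a, b):
    """Case-insensitive, diacritic-sensitive comparison with no normalization."""
    return a.casefold() == b.casefold()


# The two spellings render identically but do not match as moniker strings.
same_without_normalization = moniker_strings_equal(composed, decomposed)

# An author can avoid the mismatch by normalizing source text before publishing.
normalized = unicodedata.normalize("NFC", decomposed)
same_after_author_normalizes = moniker_strings_equal(composed, normalized)
```

    Because the UA performs no normalization, keeping every 'string' value and every moniker occurrence in the same composition form is the author's responsibility.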

    DOM Interface

    Presentation Consideration

    UAs should not provide distinguishing styling to the moniker elements (beyond any differentiating styling already provided for the 'var' and 'dfn' elements). Authors may elect to style particular monikers in distinguishing ways, for example marking all or some specific classes of 'dfn' elements as bold to indicate the introduction of new terms.

    A separate HTML4All proposal will document newly proposed CSS features to present monikers in different media independent and media specific ways. An HTML4All community authored XSLT might also be useful for presenting monikers within documents. In particular moniker presentation features might include:

    • generation of lists from monikers such as:
      • glossary
      • abbreviation expansion (full form) glossary
      • index of terms
      • index of authorities
    • styling of generated lists as pseudo-elements with sub-pseudo-elements
      • indexed item (the entire record of an index or glossary)
      • indexed key (the moniker whether in a glossary or an index)
      • indexed values (the list of associated pointers)
      • indexed value (the individual value such as the hypertext reference or page number for the indexed key or the specific definition for the reference)
    • A CSS function to extract the phonemes for an element whether from the 'monikerphonemes' of a matching moniker referenced element or the 'phonemes' attribute of the element itself. This might be used to visually display the phonemes in the user's environmental default phoneme glyphs or to facilitate proper HTML pronunciation through CSS implementation.
    • An XSLT (including an implementation provided XSLT) for interactive media might provide a transform which alters each moniker by adding a unique ID (when none already exists) and creating a list item for each unique moniker containing a list of links to each moniker use (e.g., using numbered section headings) within the document.

    Authoring

    When to use monikers

    Whenever an author can reasonably expect a user's user agent to provide the necessary moniker supporting metadata, there is no need to use monikers. In general, the use of a common dictionary word should not require the use of a moniker. For such dictionary words, the UA should have easy access to hyphenation, definition, pronunciation and type information. The one exception to this rule is when using monikers to differentiate identical homographs that have different pronunciations (or any other moniker information), such as “lead” or “wind”. For homographs, authors can use a single moniker referenced element to provide pronunciation information for the most common homograph of a set of identical homographs and, for instances of others in the set, include them within a moniker referencing element (such as a 'hom' element).

    Whenever an author uses a term in a different manner than the dictionary definition or coins a neologism, the author should use a moniker referencing element and include the definition within a moniker referenced element. Similarly, authors using variables should use moniker referencing to provide users easy access to variable specifications. Using the 'var' element, authors should also provide pronunciation information to enable speech synthesizers to properly pronounce the variable name. Without this information, the speech synthesizer might aurally render variable names in an awkward or even incomprehensible manner.

    When to use moniker referencing elements

    Since moniker referencing works with Unicode words even without markup, authors can take advantage of monikers simply by using the moniker referenced elements ('define', 'dfn', and 'dd'). This means authors can use the moniker referenced elements now and they will degrade gracefully in legacy browsers (some workaround may be needed for 'define', but the others can be used as authors already use them, simply adding the moniker attributes, and users enjoy moniker features in updated browsers).

    However, there are situations when authors should use the moniker referencing elements ('var', 't', 'pn', 'hom', 'icon', 'abbr', 'iabbr', and 'dfn'). First, if a moniker is a phrase containing word breaks, enclosing it within markup ensures browsers and other UAs match the phrase to the moniker referenced properties correctly.

    Secondly, wrapping monikers within moniker referencing elements permits the use of the 'title' attribute to support legacy browsers that do not yet support general monikers. For example, an author can use the 'var' element to wrap all monikers for legacy support and include the 'title' attribute with defining information.

    The use of a moniker referencing element also allows authors to apply media-specific CSS styling to monikers. For example, the 'icon' element is intended for authors to provide CSS replacement generated content where the element is replaced by an iconic image that expresses the abbreviated contents of the element and the elaborated details of the matching moniker referenced element.

    Likewise, with the 'dfn' element, authors can apply bold or colored styling or change the speech stress for the initial introduction of neologisms, jargon and other technical terms. Wrapping all terms within the 't', 'pn', or 'var' element allows easy extraction by XSLT, CSS, or DOM to produce glossaries, term or authority indexes.
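
    To illustrate the kind of extraction this enables, the Python sketch below counts the uses of each moniker wrapped in a 't', 'pn', or 'var' element. It uses the standard library's html.parser as a stand-in for XSLT or DOM processing; the class and function names are hypothetical, and nested moniker elements are ignored for simplicity.

```python
from collections import defaultdict
from html.parser import HTMLParser

MONIKER_TAGS = {"t", "pn", "var"}


class MonikerIndexer(HTMLParser):
    """Counts uses of each moniker marked up with 't', 'pn' or 'var'."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(int)
        self._open_tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Only the outermost moniker element is tracked (simplifying assumption).
        if tag in MONIKER_TAGS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            moniker = "".join(self._buffer).strip()
            if moniker:
                self.index[(tag, moniker)] += 1
            self._open_tag = None


def build_moniker_index(html_text):
    """Return a dict mapping (element name, moniker text) to its number of uses."""
    indexer = MonikerIndexer()
    indexer.feed(html_text)
    indexer.close()
    return dict(indexer.index)


sample = "<p>The <t>infoset</t> binds <var>x</var>; see <t>infoset</t> and <pn>Ada</pn>.</p>"
moniker_index = build_moniker_index(sample)
```

    The resulting mapping could feed a generated glossary, term index, or index of authorities.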

    Other authoring issues
    • re-specifying a moniker within the same document (the most recent moniker referenced specification prevails)
    • using type attribute
    • providing hyphenation information
    • providing phonetic information
    • providing definitions, details and identifying and background information for 'pn' monikers
    • including moniker variants
    • using variables and variable definitions within a document


    Ruby Module

    XForms Module

    The XForms module is defined by the W3C XForms recommendations. However, HTML 4.1 also adds an 'xinput' element that is not a void self-closing element. This 'xinput' element behaves the same as the XForms 'input' element in every way except for the improved parsing of the 'xinput' element. The XForms attributes are global and so can be added to any element. The 'xinput' element, like the XForms and HTML 'input' elements, serves as an element accepting plain text user-editable content, is generally presented with a border to indicate it is an editable text field, and allows no line breaks. It may be advisable to add something to CSS3 to clearly match the presentational aspects of any element to the 'input' element's presentational characteristics.

    XFrames Draft

    XFrames Module

    XFrames and legacy HTML Frames have much in common. XFrames attempts to address some of the early criticisms of the frames idiom. However, many of those criticisms were largely implementation deficiencies and not criticisms related directly to the document vocabulary of frames themselves. By far the most important advance XFrames makes over legacy frames is the introduction of a new URI fragment syntax that enables referencing a specific state of an entire frames arrangement. This new URI syntax allows the bookmarking of a precisely saved frame document state, which could work equally well for XFrames or legacy HTML frames.

    http://example.org/home.xframes#frames(id1=uri1,id2=uri2,...).
    

    Aside from this URI syntax, there are very few differences between XFrames and HTML Frames and really no capability enhancements.
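
    As a rough sketch of how a UA or tool might read the frame state out of such a URI, the following Python function splits the '#frames(...)' fragment into a mapping from frame identifier to URI. The function name is hypothetical, and the sketch assumes that commas and parentheses inside the individual URIs have been escaped; it does not attempt to implement the full XFrames fragment grammar.

```python
from urllib.parse import urlsplit


def parse_frames_fragment(url):
    """Return {frame id: uri} for a URI ending in #frames(id1=uri1,id2=uri2)."""
    fragment = urlsplit(url).fragment
    prefix, suffix = "frames(", ")"
    if not (fragment.startswith(prefix) and fragment.endswith(suffix)):
        return {}
    body = fragment[len(prefix):-len(suffix)]
    state = {}
    for pair in body.split(","):
        frame_id, separator, uri = pair.partition("=")
        # Skip malformed entries that lack an id or an '=' sign.
        if separator and frame_id:
            state[frame_id] = uri
    return state


frame_state = parse_frames_fragment(
    "http://example.org/home.xframes#frames(nav=menu.html,main=chapter2.html)"
)
```

    The same mapping could be applied to legacy HTML frames by matching each identifier against a frame's 'name' attribute.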

    Comparison of HTML Frames and XFrames, with a recommendation for each mechanism:

    Mechanism: host document
      HTML Frames: Hosted in HTML in place of 'body' element
      XFrames: Self-hosting in a 'frames' root element with its own 'head', 'title', and 'style' elements, or presumably hosted in a compound HTML hosting document with the XFrames default prefix namespace declaration (e.g., "f:frame" and "f:group")
      Recommendation: n/a

    Mechanism: structural element
      HTML Frames: 'frameset' element
      XFrames: 'group' element
      Recommendation: Make these synonymous element names

    Mechanism: structural attribute(s)
      HTML Frames: 'rows' and 'cols' attributes
      XFrames: 'compose' attribute ('vertical' | 'horizontal' | 'single' | 'free') and child element count, which also supports potential tab-like interfaces or floating window-like frames through a declarative mechanism
      Recommendation: Adopt XFrames attributes as override for HTML 'cols' and 'rows' attributes

    Mechanism: browsing context identifier
      HTML Frames: name
      XFrames: xml:id
      Recommendation: Specify priority and allow either attribute

    Mechanism: alternate/fallback content
      HTML Frames: 'noframes' element
      XFrames: No dedicated alternate content mechanism, though DOM mutation could be used or the Alternative Content Module
      Recommendation: none

    Mechanism: accessibility enhancements
      HTML Frames: 'longdesc' attribute
      XFrames: None, though possibly considered unnecessary by the XFrames editors because the frame embedded content is textual in nature or can provide its own text equivalents. ARIA could also provide this functionality now.
      Recommendation: Use ARIA

    Mechanism: presentational features or hints
      HTML Frames: 'frameborder', 'marginwidth' attribute, 'marginheight' attribute, 'noresize' attribute, 'scrolling'
      XFrames: Relies on CSS for presentational mechanisms (CSS3 resize property included)
      Recommendation: Use CSS, though some of these attributes might also be useful in markup as hints for default CSS styling.

    XFrames sought to address some criticisms of HTML Frames. In particular:

    • The [back] button works unintuitively in many cases. [largely a problem that has been addressed in the latest UAs and requires no changes to the specification structurally]
    • You cannot bookmark a collection of documents in a frameset, or send someone a reference to the collection. [addressed through the URI syntax newly introduced by XFrames]
    • If you do a [reload], the result may be different to what you had. [largely a problem that has been addressed in the latest UAs and requires no changes to the specification structurally]
    • [page up] and [page down] are often hard to do. [true in any multiple scroll-view application and not really a shortcoming of a capability to declaratively create such applications over the web]
    • You can get trapped in a frameset. [due to authoring or implementation errors]
    • Searching finds HTML pages, not Framed pages, so search results usually give you pages without the navigation context that they were intended to be in. [largely a problem that has been addressed in the latest UAs and requires no changes to the specification structurally]
    • Since you can't content negotiate, noframes markup is necessary for user agents that don't support frames. However, almost no one produces noframes content, and so it ruins Web searches, since search engines are examples of user agents that do not support frames. [largely a problem that has been addressed in the latest UAs and requires no changes to the specification structurally]
    • There are security problems caused by the fact that it is not visible to the user when different frames come from different sources. [this is an issue that could be addressed through UA norms and not necessarily a problem with the frames declarative structure]

    Of those issues, then, little needs to change in the syntax and semantics other than the URI enhancements introduced by XFrames. Accompanying this change, it might be helpful to add new UA and possibly authoring norms to improve the state of frames authoring and implementation.

    Auxiliary Modules

    Editing Elements Module

    Includes the elements: 'ins', 'del' and 'moved'

    This module also includes the newly introduced 'redact' element.

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    ins Common (especially the editing attributes) ( PCDATA | Flow )* anyElement HTMLElement
    del Common (especially the editing attributes) ( PCDATA | Flow )* anyElement HTMLElement
    moved Common (especially the editing attributes) ( PCDATA | Flow )* anyElement HTMLElement
    redact Common, redactedby, block, chars, cols, lines VOID anyElement HTMLElement Indicates a portion of the document is missing due to redaction. In other words, an earlier version or a different version of the document exists with the content that is missing from this instance. UAs, authors, or users may use CSS generated content to generate a representative or garbled set of glyphs to replace the redacted content.

    Attributes

    Attribute name Type Default DOM Other UA behavior Details
    'redact' element attributes
    redactedby URIRep W (redactedBy) Indicates a URI representing the principal responsible for redaction (such as 'ldap:www.example.com/principals/eliott_ness' or 'mailto:eliott_ness@example.com')
    block xsd:boolean W Indicates the redaction replaces a block of characters
    chars Integer W Indicates an approximate count of characters or grapheme clusters that have been redacted. If 'chars' alone is set, it determines the size of the redact element in the absence of overriding CSS declarations.
    cols Integer W If the value of both cols and lines is a valid integer, these values are used to provide dimensions for the redact element, unless overridden by CSS. The cols value provides a width in CSS em units.
    lines Integer 1 W For inline redaction, authors may provide the count of characters (grapheme clusters and other baseline advancing characters) redacted. If chars equals null, then implementation stylesheets should provide some default width for redact elements (e.g., something like ‘width: 12em;’). If the value of both cols and lines is a valid integer, these values are used to provide dimensions for the redact element, unless overridden by author or user CSS.
    If the 'block' attribute is true, then:
    • if chars and not cols has a valid integer, the height (H) is: (chars / the element’s CSS em width) × 2.5 = H CSS ex units; and the width (W) is determined by CSS inheritance
    • if cols and lines are both valid integers, the height (H) is: lines × 2.5 = H CSS ex units; and the width (W) is: cols = W CSS em units
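
    The two block-sizing rules can be checked with a small Python sketch. The function name and the 'em_width' parameter (the element's width expressed in em units) are hypothetical, and checking 'cols' and 'lines' before 'chars' is an assumption about precedence that the draft does not state.

```python
def block_redact_size(chars=None, cols=None, lines=None, em_width=None):
    """Return (height in ex units, width in em units) for a block redaction.

    A width of None means the width is left to CSS inheritance.
    (None, None) means neither rule applies.
    """
    if isinstance(cols, int) and isinstance(lines, int):
        # Height is lines x 2.5 ex; width is cols em.
        return (lines * 2.5, cols)
    if isinstance(chars, int) and em_width:
        # Height is (chars / em width) x 2.5 ex; width is inherited.
        return ((chars / em_width) * 2.5, None)
    return (None, None)


explicit_size = block_redact_size(cols=40, lines=3)
estimated_size = block_redact_size(chars=120, em_width=40)
```

    With 40 columns and 3 lines the block is 7.5ex tall and 40em wide; 120 redacted characters in a 40em-wide element likewise work out to three lines, or 7.5ex.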

    Embedding Elements Module

    Since HTML 4.1 moves toward globalized embedding attributes, the use of dedicated embedding elements becomes optional. Any element in HTML 4.1 can embed image, video, or audio content. The contents of such elements serve as the text equivalent fallback for the content. In other more complex situations or for legacy reasons, authors may want to make use of the 'object', 'iframe', 'img', 'audio', 'video' and 'canvas' elements.

    Embedding Module

    Image Map Elements Module

    This module covers the use of image map elements for legacy implementation support, instead of the global image map attributes available on any element. Image/area map elements also allow the same area maps to be reused across different elements with the 'usemap' attribute.

    This module provides some legacy elements to define image maps on embedded images for support of legacy UAs that do not yet support the global image map attributes.

    Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    map Common, name(NMTOKEN) area+ | ( Text )* Structural | Text HTMLMapElement
    area Common, name(NMTOKEN) EMPTY 'map' | Structural | Text HTMLAreaElement

    DOM Interface

    HTMLMapElement

    Legacy interface (usable for authoring purposes but not recommended for implementor strategy where interfaces should be included from the global attributes module):

    interface HTMLMapElement : HTMLElement {
      attribute DOMString name;
      readonly attribute HTMLCollection areas;
      readonly attribute HTMLCollection images;
    };

    HTMLAreaElement

    Legacy interface (usable for authoring purposes but not recommended for implementor strategy where interfaces should be included from the global attributes module):

    interface HTMLAreaElement : HTMLElement {
      attribute DOMString shape;
      attribute DOMString coords;
      attribute DOMString alt;
      attribute DOMString href;
      attribute DOMString target;
      attribute DOMString ping;
      attribute DOMString rel;
      readonly attribute DOMTokenList relList;
      attribute DOMString media;
      attribute DOMString hreflang;
      attribute DOMString type;
    };


    UA Processing

    • UAs should append &mapname=areaname to the query portion of the image map
    • areas should be focusable for each associated image along with:
      • applied CSS
      • 'area' element 'title' attribute view on hover (if that is the UAs presentation of title attributes)


    Presentation Considerations

    • Introduce a CSS3 Area Map Module, in which the CSS box model and other properties are applied broadly to non-rectangular boxes. In other words, they are applied not only to rectangles, but also to circles, polygons, and other non-rectangular shape-generating elements.

    Authoring Considerations

    Presentational Module

    This module includes elements used for OCR to HTML and other machine creation of HTML content or for HTML legacy and other presentational formats to HTML conversion.

    Elements

    Element name Attributes Content Model Context Model DOM Interface Specialized UA Behavior Details
    strong Common ( PCDATA | Text )* Text HTMLElement Strong is not technically a presentationally related keyword, however the motivation for including strong within the HTML vocabulary is largely a device dependent and presentational one. With low resolution output devices, it is more difficult to visually distinguish italics from plain text than it is to distinguish bold text from plain text. Therefore, though there is no common convention in typography to distinguish emphasized text from strongly emphasized text, the 'strong' element was introduced more for emphasized phrases in documents intended for low resolution devices. With the introduction of CSS media queries, authors are encouraged to use the 'em' element for emphasis and use CSS media queries to make the presentation of emphasized text either bold, italic or otherwise according to the pixel resolution of a UA in a device independent fashion.
    b Common ( PCDATA | Text )* Text HTMLElement
    i Common ( PCDATA | Text )* Text HTMLElement
    strike Common ( PCDATA | Text )* Text HTMLElement
    u Common ( PCDATA | Text )* Text HTMLElement
    sub Common ( PCDATA | Text )* Text HTMLElement
    sup Common ( PCDATA | Text )* Text HTMLElement

    UA Processing

    Presentation Considerations

    • Implementations must present these elements in the prescribed manner.
    • Authors should not change the presentation of these presentational elements since their semantics is their presentation. For example, 'b' means bold so to change it to something else will lead to confusion for subsequent authors and users.

    Authoring Considerations

    Legacy Frames Module

    Use of frames in the HTML namespace. [need to test using XFrames for the application/xhtml+xml content-type on common browsers].

    See also the section above on XFrames and the comparison between XFrames and HTML frames.

    Legacy Forms Module

    Using the legacy user interface elements without the XForms attributes and XForms processing model.

    Legacy Keyboard Focus Attributes Module

    This module covers the 'tabindex' and 'accesskey' attributes. Implementations must conform to the HTML 4.1 focus algorithm. Authors are, however, discouraged from using 'tabindex' and 'accesskey' except when targeting UAs that do not fully support HTML 4.1. See the Keyboard Focus Attributes Module for more details.

    Specific User-agent Guides

    Conformance Checkers (and Validators)

    Rendering Engines

    Interactive (Browser) User-agents

    Assistive User-agents

    Visual Editors

    Source Editors

    Search Agents

    Email Readers

    Email Composers

    Summary Lists

    List of Elements

    List of Attributes

    List of Events

    List of DOM Methods and Attributes

    List of Interactive UA Algorithms

    Schema Definitions

    HTML DTD

    HTML XSD Schema

    HTML RelaxNG Schema

    Notes

    1. Some properties in the aural stylesheets from CSS2 are not necessarily speech related, but are more broadly aural properties. Implementations are not required to implement these non-speech-related aural properties, and should consider the CSS3 speech module in differentiating between speech-related properties and other aural properties.

    Key

    DOM

    W = a writable (read/write as opposed to readonly) DOM attribute which takes or returns a DOMString and whose name is following in parentheses (unless identical to the name of the element attribute)
    R = content attribute participates in a readonly DOM attribute (see the details column for more details)
    = content attribute participates in other DOM accessor methods (see the details column for more details)

    Other UA behavior

    The "Other UA behavior" column indicates that full support for this attribute requires capabilities beyond DOM and standard CSS capabilities.

    Y = This attribute requires special behavior beyond DOM and CSS implementation (for example, the 'href' attribute requires the implementation of link activation, which is not included in DOM or CSS)