FHISO Citation Elements: Bindings for RDFa

This is an exploratory draft of a standard documenting the proposed usage of the FHISO Citation Elements standard in RDFa. This document is not an FHISO standard and is not endorsed by the FHISO membership. It may be updated, replaced or obsoleted by other documents at any time.

In particular, some examples in this draft use citation elements that are not even included in the draft Citation Element Vocabulary. These elements are very likely to be changed as the vocabulary progresses.

FHISO's suite of Citation Elements standards provides an extensible framework and vocabulary for encoding all the data about a genealogical source that might reasonably be included in a formatted citation to that source.

This information is represented as a sequence of citation elements, logically self-contained pieces of information about a source. This document defines a means by which citation elements may be identified and tagged within an XML or HTML formatted citation, allowing a computer to extract them in a systematic manner. The tagging of citation elements is done using a standard set of HTML attributes known as RDFa attributes, which can also be used in XML languages besides HTML.

Other documents in the suite of Citation Elements standards are as follows:

Not all of these documents are yet at the stage of having a first public draft.

Introduction

Conventions used

Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].

An application is conformant with this standard if and only if it follows all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.

Adding requirements or prohibitions is disallowed so as to preserve interoperability between applications: data generated by one conformant application must always be acceptable to another conformant application, regardless of what additional standards each may conform to.

This standard depends on the Citation Elements: General Concepts standard [CEV Concepts]. To be conformant with this standard, an application must also be conformant with [CEV Concepts]. Some words and phrases defined in that standard are used here without further definition.

Readers are advised to read at least the introduction to [CEV Concepts] before reading this standard.

Indented text in coloured boxes, such as the preceding paragraph, does not form a normative part of this standard, and is labelled as either an example or a note.

Editorial notes, such as this, are used to record outstanding issues, or points where there is not yet consensus; they will be resolved and removed for the final standard. Examples and notes will be retained in the standard.

RDFa attributes

The tagging of citation elements in formatted citations is done using a standard set of HTML attributes known as RDFa attributes which are defined in [RDFa Core]. Compliance with this FHISO standard does not require full RDFa compliance: support for the full [RDFa Core] is optional, and RDFa features other than those for which support is required in this standard should not be used when compatibility between implementations is desirable.

The specification of [RDFa Core] assumes a good working knowledge of the RDF graph model. A more accessible introduction to RDFa can be found in the [RDFa Primer], but FHISO's use of RDFa attributes here is limited and this standard is designed to be used without any knowledge of RDFa or RDF. An application parsing RDFa attributes according to this specification does not need a full RDFa parser, far less to support the full RDF graph model.

These attributes may be used in HTML or any XML-based markup language, but for the purpose of tagging citation elements in formatted citations it is recommended that they be used in XHTML. The language they are used in is referred to here as the host language.

Applications wishing to implement a fully-compliant RDFa parser for HTML will find the formal specification on the use of RDFa in HTML in two standards, [HTML+RDFa] and [XHTML+RDFa].

In the simplest case, the citation element name (which is an IRI) can be put in a property attribute on an XML or HTML element, and the citation element value is the text contents of the element. The particular type of element on which the attributes are placed is not relevant.

A simplified formatted citation to Settipani's book Les ancêtres de Charlemagne might be marked up as the following HTML fragment:

<p>Settipani, Christian.  <i>Les ancêtres de Charlemagne</i>.</p>

The title of the book can be tagged by adding a property attribute to the existing <i> element. As written above, no element contains just the author's name, as the <p> element also encloses the title; however, the author's name can be wrapped in a <span> element and the property attribute added to that. HTML's <span> element has no defined meaning of its own, but exists to provide a place for attributes such as this.

<p><span property="http://terms.fhiso.org/sources/authorName"
  >Settipani, Christian</span>. 
  <i property="http://terms.fhiso.org/sources/title">Les ancêtres 
    de Charlemagne</i>.</p>

An HTML renderer will correctly format this while ignoring the two property attributes, but an application that conforms to this standard will extract these two citation elements from this HTML:

authorName: "Settipani, Christian"
title: "Les ancêtres de Charlemagne"
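The simplest case above can be sketched in Python using only the standard library's html.parser module. This is an illustrative sketch rather than a conformant parser: the names PropertyExtractor and extract_properties are hypothetical, the collapsing of runs of whitespace is an assumption inferred from the extracted values shown above, and the sketch ignores shorthand IRIs, lists of IRIs and the source-type rules introduced later in this document.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements carrying a property attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._open = []    # stack of open elements: [tag, property, text chunks]
        self._found = []   # elements with a property attribute, in document order

    def handle_starttag(self, tag, attrs):
        entry = [tag, dict(attrs).get("property"), []]
        self._open.append(entry)
        if entry[1] is not None:
            self._found.append(entry)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, together with
        # anything left open inside it (for example void elements like <br>).
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every open element that has a property attribute.
        for entry in self._open:
            if entry[1] is not None:
                entry[2].append(data)

    def results(self):
        # Assumption: runs of whitespace are collapsed to a single space.
        return [(prop, " ".join("".join(chunks).split()))
                for _tag, prop, chunks in self._found]


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.results()
```

Feeding the tagged fragment above to extract_properties yields one pair per property attribute, in document order.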

Index of attributes used

This standard makes use of the following attributes:

Motivation and limitations

In this standard, unless otherwise stated, the term HTML refers to any backwards-compatible version of HTML, and XHTML refers to any version of HTML that is also well-formed XML.

This definition of HTML includes HTML 4.01, XHTML 1.0, XHTML 1.1, HTML5 and HTML 5.1. For the last two, it includes both their XML and non-XML forms. It will include future editions of HTML5 too, assuming they retain backwards compatibility. This definition of XHTML includes not just the standards that are named XHTML, but also the XML forms of HTML5 and later.

The use of HTML, or a subset of HTML, is often permitted in genealogy applications to allow users to add formatting to text in various contexts. It is recommended that applications which allow users to edit or manually lay out formatted citations should permit the use of some HTML elements in them.

[CEV Concepts] recommends that if high quality formatted citations are required, users should be allowed to fine-tune the presentation by hand because it is not anticipated that an application will always do a perfect job. Many citation styles use italics and some use bold, underlining or other text-level formatting when formatting certain citation elements. In order to allow the user to fine-tune the use of such formatting, the user should be allowed to edit the formatted citation as HTML.

If an application automatically generates an HTML formatted citation from a citation element set, it should add RDFa attributes in such a manner that another application conformant with this standard will be able to extract the citation elements again. This should not be an application's principal means of serialising a citation element set: applications should prefer a format that serialises the citation element set directly rather than after converting it to a formatted citation.

The use of RDFa attributes is not the recommended way of serialising citation element sets primarily because it requires creating a formatted citation. Doing this to a reasonable standard is non-trivial, and results in a particular language and style being favoured. This standard is provided for situations when a formatted citation is desired or required anyway. For example, an enormous amount of genealogical research has been published online and includes formatted citations. If they are tagged according to this standard, these formatted citations can be copied and pasted into a genealogy application which can convert them back to a citation element set.

Shorthand IRIs

The [CEV Concepts] standard makes heavy use of IRIs as identifiers, as does RDFa. In particular, the datatype, property and typeof attributes contain IRIs.

The datatype attribute shall contain a single IRI. The property and typeof attributes shall contain a list of IRIs separated by whitespace. Leading and trailing whitespace is discarded.

A common reason why multiple IRIs might be present is when two IRIs exist with similar meanings and the creator of the citation wishes to use both for compatibility.

<i property="http://terms.fhiso.org/sources/title
             http://purl.org/dc/terms/title">Les ancêtres de
  Charlemagne</i>

Here two alternative IRIs are used to tag the title, presumably because the citation's creator anticipated it being processed by applications that support [Dublin Core] metadata as well as FHISO's Citation Elements standards. A parser conforming to this standard will treat both IRIs as valid and create two citation elements, both with the same citation element value. However, if the Dublin Core IRI is not known to the application, it will likely be ignored.
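As a minimal sketch of the list handling described above, the following Python function splits a property attribute on whitespace and pairs each IRI with the same value. The function name elements_from_property is hypothetical, and str.split() is used here as an approximation: it discards leading and trailing whitespace as required, but it also splits on some non-ASCII whitespace characters that a stricter parser might not treat as separators.

```python
def elements_from_property(property_value, element_value):
    """Return one (name, value) pair per IRI listed in a property attribute."""
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading and trailing whitespace.
    return [(iri, element_value) for iri in property_value.split()]
```

Applied to the two-IRI property attribute in the example above, this produces two pairs that share one value.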

In the uses described by this standard the property attribute will always contain a citation element name, and the datatype attribute will always contain a class name. The typeof attribute will contain an IRI that allows this standard's use of RDFa to be distinguished from any other uses also present in the document.

RDFa provides two separate mechanisms for abbreviating the IRIs in these attributes: by setting a local default vocabulary, and by using prefixes to create compact URI expressions (CURIEs). Applications processing formatted citations in accordance with this standard must support both of these mechanisms. Expansion of terms using the local default vocabulary shall be done before the expansion of CURIEs. An application must behave as if all datatype, property and typeof attributes have been expanded before continuing to process the data.

Default vocabularies

A term in RDFa is an XML NCName that also permits slash (U+002F) as a non-leading character. It matches the term production given in §7.4.3 of [RDFa Core].

This production is as follows:

term     ::=  NCNameStartChar termChar*
termChar ::=  ( NameChar - ':' ) | '/'

The definitions of NameChar and NCNameStartChar are found in [XML] and [XML Names] respectively.

When a datatype, property or typeof attribute contains a term, it shall be converted to an IRI by prepending the local default vocabulary if one exists. The local default vocabulary is an IRI which is specified using a vocab attribute. It applies to the element where it is specified and to all elements in its content unless overridden with another vocab attribute.

Terms look similar to relative IRIs and this process is similar to resolving relative IRIs against a base IRI, but the process of applying a local default vocabulary is simpler as the two strings are simply concatenated without understanding the structure of the IRI.

Markup generators should ensure that a vocab is present if terms are being used when compatibility between implementations is desirable. When these attributes are used in languages other than HTML, the definition of that language may provide a default vocabulary that applies in the event that no vocab attribute is found; HTML provides no such default.

If no local default vocabulary was found, a parser may use an initial context as described in §9 of [RDFa Core] to resolve the term to an IRI; if not, or if it was not found in the initial context, the term shall be ignored. When an initial context is used, it must be the standard one for the host language: implementations must not define their own initial context.

<p><span property="authorName">Settipani, Christian</span>. 
  <i vocab="http://terms.fhiso.org/sources/"
     property="title">Les ancêtres de Charlemagne</i>.</p>

In this fragment, both property attributes contain a term. The title term is converted to the IRI of FHISO's title citation element:

http://terms.fhiso.org/sources/title

In considering the authorName term, a parser looks for a vocab attribute on the <span> or the enclosing <p> element. No such attribute exists, and the RDFa attributes are being used in HTML which provides no default vocabulary.

The parser may consider the standard initial context too, and if it is a full RDFa parser it must. As the host language is HTML, the initial context is defined in [HTML5+RDFa Context]. At the present time this only includes mappings for describedBy, license and role. These are to be matched case-sensitively, or failing that case-insensitively, but the authorName term used in this example clearly does not match.

Regardless of whether the application considered the initial context, the authorName term cannot be resolved to an IRI and is therefore ignored.
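The term production and the concatenation rule can be sketched in Python as follows. The names TERM_RE, is_term and expand_term are hypothetical. The character ranges are transcribed from the NameStartChar and NameChar productions of [XML], with the colon removed as [XML Names] requires, and the sketch takes the option of not consulting an initial context, so a term with no local default vocabulary is simply ignored.

```python
import re

# NCNameStartChar: the XML NameStartChar production without ':'.
_START = (r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
# NameChar without ':'; the slash is added separately for termChar.
_CHAR = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
TERM_RE = re.compile("[" + _START + "][" + _CHAR + "/]*")


def is_term(token):
    """True if the token matches the RDFa term production."""
    return TERM_RE.fullmatch(token) is not None


def expand_term(term, vocab):
    """Prepend the local default vocabulary, or return None to ignore the term.

    vocab is the IRI from the nearest enclosing vocab attribute, or None.
    No initial context is consulted in this sketch.
    """
    if not is_term(term) or vocab is None:
        return None
    # Plain string concatenation: the structure of the IRI is not examined.
    return vocab + term
```

For the fragment above, expand_term("title", "http://terms.fhiso.org/sources/") gives the title IRI, while expand_term("authorName", None) returns None because no vocab attribute is in scope.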

If use of the initial context is changed to be required for CURIEs, below, it should be changed here too.

Compact URI Expressions (CURIEs)

A CURIE comprises two components, a prefix and a reference, separated by a colon (U+003A). It matches the curie production given in §6 of [RDFa Core].

This production is defined as follows:

curie       ::=   ( prefix? ':' )? reference
prefix      ::=   NCName
reference   ::=   ( ipath-absolute | ipath-rootless | ipath-empty ) 
                       ( '?' iquery )?  ( '#' ifragment )?

The definition of NCName is found in [XML Names]. The various productions referenced in the definition of reference are defined in [RFC 3987]. None of these ipath productions match a string beginning "//", therefore IRIs of the form http://... never match the curie syntax production. There is a conflict with certain other, less-used IRI schemes: for example, mailto:user@example.com does match the syntax. However, this only results in this IRI being treated as a CURIE if mailto is defined as a CURIE prefix. The RDFa working group considered the risk of this to be minimal.

Although this syntax definition allows the omission of both prefix and the colon, in practice there is no situation in RDFa where both can be omitted and the result still parsed as a CURIE. A parser conforming to this standard may safely treat the colon as mandatory.

When a datatype, property or typeof attribute contains something that is syntactically a CURIE, the parser should look up its prefix to see whether a prefix mapping (which is an IRI) has been defined. This look-up occurs case-insensitively.

If the prefix has been omitted and the CURIE begins with a colon, parsers may ignore the CURIE and must not fall back to treating it as an IRI; if it is not ignored, the prefix mapping must be:

http://www.w3.org/1999/xhtml/vocab#

This vocabulary contains little of use in marking up formatted citations.

When the prefix is present, a parser must try to look it up in the local prefix mappings. These are set using prefix attributes. This attribute must contain an even number of whitespace separated tokens: the first and every subsequent odd token must be an NCName followed by a colon; the second and every subsequent even token must be an IRI. The NCName is the prefix and the IRI is its prefix mapping. The mapping applies to the element where it is specified and to all elements in its content unless overridden.

The following is an example of a well-formed prefix attribute.

<div prefix="cev: http://terms.fhiso.org/sources/
             dc:  http://purl.org/dc/elements/1.1/">
  <i prefix="dc:  http://purl.org/dc/terms/"
     property="cev:title dc:title">Les ancêtres de Charlemagne</i>
</div>

The prefix attribute on the <div> defines two local prefix mappings, one for the cev prefix, the other for the dc prefix. The dc local prefix mapping is overridden by the prefix attribute on the <i> element; the cev local prefix mapping has not been overridden and remains in operation.
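The parsing and inheritance of prefix attributes can be sketched in Python as below. The function name parse_prefix_attribute is hypothetical. The sketch stores prefixes in lower case so that later look-ups can be case-insensitive, silently skips a trailing unpaired token and any odd token that lacks its colon, and does not check that each prefix is a valid NCName; how a real parser should treat such malformed input is an assumption here.

```python
def parse_prefix_attribute(value):
    """Return a dict mapping each prefix to its prefix mapping (an IRI)."""
    tokens = value.split()
    mappings = {}
    # Tokens alternate between "prefix:" and the IRI it maps to.
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if len(name) > 1 and name.endswith(":"):
            mappings[name[:-1].lower()] = iri
    return mappings


def inherit(parent_mappings, value):
    """Combine the mappings in scope with those of a nested prefix attribute."""
    # Mappings declared on the inner element override those inherited.
    return {**parent_mappings, **parse_prefix_attribute(value)}
```

For the example above, the mappings in scope on the <i> element would be computed by calling inherit with the mappings parsed from the <div> and the value of the inner prefix attribute.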

The prefix consisting of a single underscore character (U+005F) has special meaning in §7.4.5 of [RDFa Core] for referencing blank nodes. It must not be used in CURIEs other than for that purpose. Support for blank nodes is optional in this standard. Applications that do not support blank nodes must ignore CURIEs with a prefix consisting of a single underscore.

In determining the local prefix mappings, a parser may also use XML namespace declarations as defined in §7.5, item 3 of [RDFa Core]. This is not required even in full RDFa parsers and is deprecated; it is not recommended by this standard.

If the prefix was not found in the local prefix mappings, a parser may use an initial context as described in §9 of [RDFa Core] to determine the prefix mapping. When an initial context is used, it must be the standard one for the language on which the RDFa tags are used: implementations must not define their own initial context.

It may be worth making this required rather than optional as the initial context for HTML contains prefix mappings for several potentially useful vocabularies including Dublin Core and PROV. It is unlikely to add much complexity to the parser or this specification.

If a prefix mapping is found, the CURIE is converted to an IRI by prepending the prefix mapping to the reference part of the CURIE.

The two CURIEs in the previous example expand to these IRIs:

http://terms.fhiso.org/sources/title
http://purl.org/dc/terms/title

If no prefix mapping is found, the CURIE shall be treated as an IRI if it is syntactically valid as one or ignored otherwise. If this results in an IRI with an unknown scheme, the parser may ignore it; parsers must not ignore the http, https or urn schemes.

Virtually all CURIEs are syntactically valid IRIs since prefix:reference is a valid IRI, despite having an unknown scheme. The option of ignoring IRIs with unknown schemes is introduced because this standard makes the use of an initial context optional. CURIEs with prefixes that would be resolved via the initial context in a full RDFa parser may therefore be left unresolved by a parser conforming to this standard. Almost invariably they will have an unknown scheme when reinterpreted as an IRI and can therefore be dropped. Full RDFa parsers must use initial contexts and therefore must not ignore IRIs with unknown schemes.

If support for initial contexts becomes required, the ability to ignore unknown schemes should probably be dropped.
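The CURIE expansion and fallback rules of this section can be summarised in a short Python sketch. The names KEPT_SCHEMES and expand_curie are hypothetical. The sketch makes the permitted minimal choices: it treats the colon as mandatory, ignores CURIEs with an empty prefix, does not support blank nodes, uses no initial context, and ignores IRIs whose scheme is not http, https or urn. It does not validate the reference against the [RFC 3987] productions beyond checking for a leading "//".

```python
KEPT_SCHEMES = {"http", "https", "urn"}  # schemes a parser must not ignore


def expand_curie(token, prefixes):
    """Expand a CURIE or IRI token; return None if the token is ignored.

    prefixes maps lower-case prefixes to prefix mappings (IRIs).
    """
    prefix, colon, reference = token.partition(":")
    if not colon:
        return None  # no colon: not treated as a CURIE in this sketch
    if prefix in ("", "_"):
        return None  # empty prefix ignored; blank nodes unsupported
    if not reference.startswith("//"):
        # A reference beginning "//" cannot match the curie production.
        mapping = prefixes.get(prefix.lower())
        if mapping is not None:
            return mapping + reference
    # No prefix mapping found: fall back to treating the token as an IRI,
    # keeping it only if its scheme is one that must not be ignored.
    return token if prefix.lower() in KEPT_SCHEMES else None
```

With the mappings in scope on the <i> element of the earlier example, cev:title and dc:title expand to the two IRIs listed above, while a CURIE such as foaf:name, whose prefix is undefined, is dropped.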

Locating citation elements

In general a document will contain more than just a single formatted citation. Other parts of the document may also contain RDFa attributes for entirely different reasons, and even if the only use of RDFa is for tagging citation elements it is important not to mix the citation elements from one formatted citation with those of another.

Citation elements are identified using property attributes. However a property attribute shall only be interpreted as representing a citation element if:

Source-type elements

A source-type element is any element that has a typeof attribute whose value once shorthand IRIs have been expanded includes the IRI:

http://terms.fhiso.org/sources/Source

HTML or XML content is only considered to be part of a formatted citation if it is a source-type element or is contained within one.

The following example contains two entirely unrelated uses of RDFa attributes:

<p vocab="http://terms.fhiso.org/sources/" typeof="Source">
  <span property="authorName">Settipani</span>, <i>Ibid.</i></p>
<div vocab="http://creativecommons.org/ns#">Released under a 
  <a href="http://creativecommons.org/licenses/by/3.0/"
     property="license">Creative Commons License</a>.</div>

The typeof attribute of the <p> element has a value that expands to the required IRI. This marks the <p> element as a source-type element, and its contents as a formatted citation. This contains just one property attribute, so a parser will find just one citation element: an authorName one with value "Settipani".

The license property is not contained in a source-type element and therefore does not denote a citation element. It is a use of RDFa that is outside the scope of this standard. This is as well: Settipani's book is not licensed under a Creative Commons License, though a page discussing it may well be.
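A minimal Python sketch of this rule follows, using xml.etree.ElementTree and therefore assuming the input is well-formed XHTML wrapped in a single root element. The names expand and citation_elements are hypothetical. To stay short, the sketch expands terms against the vocab attribute only, passes through any token containing a colon unchanged, takes the element's text content as the value, and does not implement source-exclusion elements or nested source-type elements.

```python
import xml.etree.ElementTree as ET

SOURCE = "http://terms.fhiso.org/sources/Source"


def expand(token, vocab):
    """Simplified expansion: full IRIs pass through, terms use the vocab."""
    if ":" in token:
        return token  # CURIE handling is omitted from this sketch
    return vocab + token if vocab else None


def citation_elements(root):
    """Return (name, value) pairs for properties inside source-type elements."""
    results = []

    def walk(elem, vocab, in_source):
        vocab = elem.get("vocab", vocab)
        types = [expand(t, vocab) for t in elem.get("typeof", "").split()]
        in_source = in_source or SOURCE in types
        if in_source:
            for token in elem.get("property", "").split():
                iri = expand(token, vocab)
                if iri is not None:
                    text = " ".join("".join(elem.itertext()).split())
                    results.append((iri, text))
        for child in elem:
            walk(child, vocab, in_source)

    walk(root, None, False)
    return results
```

Parsing the example above, wrapped in a <body> element, with ET.fromstring and passing the result to citation_elements returns only the authorName element; the license property is skipped because it lies outside any source-type element.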

An external mechanism may be used to designate the entirety of an HTML document or fragment as a source-type element.

A non-HTML syntax might embed fragments of HTML to represent individual formatted citations. It would likely designate each fragment to be a source-type element, in which case no typeof attribute is required.

Source-exclusion elements

The concept of a source-exclusion element is necessary to prevent a parser from misinterpreting property attributes that are part of more complex RDFa constructs which this standard does not require to be supported. Future FHISO standards may make use of some of these RDFa constructs and this restriction also allows for forwards compatibility.

An application that supports only those RDFa features for which support is required by this standard must consider an element to be a source-exclusion element if:

The circumstances in which the source-type element is itself excluded need further consideration, giving particular attention to the processing sequence in §7.5 of [RDFa Core].

The following example includes a more complex use of RDFa attributes, beyond what this standard requires to be understood.

<p prefix="foaf: http://xmlns.com/foaf/0.1/"
   vocab="http://terms.fhiso.org/sources/" typeof="Source">
  <span rel="foaf:maker">
    <span property="foaf:name">Settipani</span></span>,
  <i property="title">Les ancêtres de Charlemagne</i>.
</p>

The <p> element is a source-type element due to the typeof="Source" attribute, and the formatted citation is the string "Settipani, Les ancêtres de Charlemagne."

The <p> element contains one source-exclusion element: the outer <span> element due to its rel attribute. Parsers are not expected to understand the meaning of the rel attribute, just to note its presence. As the inner <span> element is contained within this source-exclusion element, the property="foaf:name" attribute must not be treated as tagging a citation element.

The property attribute on the <i> element is not located within a source-exclusion element, and therefore it does denote a citation element. This is the only citation element in this example.

These rules allow source-type elements to nest.

<p vocab="http://terms.fhiso.org/sources/" typeof="Source">
  <span property="authorName">Settipani</span>; citing  
  <span rel="cites" typeof="Source"><i property="title">Vita 
    Sancti Arnulfi</i></span>.</p>

The <p> and second <span> elements are both source-type elements. The former contains the formatted citation "Settipani; citing Vita Sancti Arnulfi", while the latter contains the formatted citation "Vita Sancti Arnulfi". The second <span> element is also a source-exclusion element of the <p> source-type element, meaning the title property is only a citation element of the nested <span> source-type element, and not also of the enclosing <p> source-type element. The enclosing <p> source-type element only has one citation element: the authorName.

This behaviour is intentional, and is how layered citations are expected to be implemented. The details have yet to be finalised.
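The interaction between source-type and source-exclusion elements in the two preceding examples can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a definitive algorithm: the name sources_in is hypothetical, and EXCLUSION_ATTRS contains only rel and resource, the two attributes that this document's examples and notes associate with source-exclusion elements, which may not be the complete set of conditions. Property names are left unexpanded for brevity, and a property on the boundary element itself is attributed to the inner source, which is a simplification.

```python
import xml.etree.ElementTree as ET

SOURCE = "http://terms.fhiso.org/sources/Source"
EXCLUSION_ATTRS = ("rel", "resource")  # assumption: not necessarily complete


def sources_in(root):
    """Return [(tag, [property names])], one entry per source-type element."""
    found = []

    def walk(elem, vocab, active):
        vocab = elem.get("vocab", vocab)
        if any(name in elem.attrib for name in EXCLUSION_ATTRS):
            active = []  # enclosing sources no longer see properties here
        types = [t if ":" in t else (vocab or "") + t
                 for t in elem.get("typeof", "").split()]
        if SOURCE in types:
            entry = (elem.tag, [])
            found.append(entry)
            active = active + [entry]
        for name in elem.get("property", "").split():
            for entry in active:
                entry[1].append(name)
        for child in elem:
            walk(child, vocab, active)

    walk(root, None, [])
    return found
```

For the nested example above this reports authorName for the <p> element and title for the inner <span> element. For the earlier foaf example it reports only title for the <p> element.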

Applications which support a larger part of RDFa may treat fewer elements as source-exclusion elements. If so, they must ensure that RDFa constructs are only treated as citation elements when they produce RDF triples whose subject has the following RDF type:

http://terms.fhiso.org/sources/Source

In addition, applications supporting a larger part of RDFa may discard triples where the object is an RDF blank node.

This standard is designed to allow implementers to parse those RDFa constructs used without having to consider how they map to RDF. The preceding text is only relevant if an implementer wishes to make greater use of the RDF features underlying RDFa.

Parsing citation elements

As defined in the [CEV Concepts] standard, a citation element consists of three components:

Once a parser has identified the property attributes that are tagging citation elements, it shall determine each component of each citation element as described in the following sub-sections.

The property attributes shall be considered in the order they appear in the document.

The detailed specification in §7.5 of [RDFa Core] requires that property attributes are processed and used to generate RDF triples in document order. However, the [RDFa Core] processing model requires these triples to be added to an RDF graph, which is not required to preserve the order of triples. Nevertheless, most current RDFa processors do output properties in document order. Implementations using an RDFa parser to implement this specification should verify that the document order of properties can be determined.

For the purpose of this section, the current element refers to the element that has the property attribute which tags the current citation element.

Layer identifiers

This draft does not yet address how the layer identifier is set. Possibly with named blank nodes?

Citation element names

The citation element name shall be the value of the property attribute, once shorthand IRIs have been expanded. If the property attribute contains more than one IRI, each shall be used as the citation element name of a separate citation element with the same layer identifier and citation element value.

Citation element values

In parsing a citation element, an application shall determine its current property value. This is used to construct its citation element value. The citation element value is a translation set when the citation element is translatable and a string otherwise. To decide this, an application shall determine whether the element is translatable. If the citation element was found (or assumed by default) to be translatable, the application shall also determine the language tag. The rules for determining the current property value, its translatability and its language tag are given in the sections below.

If the citation element was found (or assumed by default) to be translatable, a new translation set shall be constructed to serve as the citation element value. It shall comprise a single string, which shall be the current property value, and shall be tagged with the language tag. If the citation element was found not to be translatable, its citation element value shall be a string which shall be the current property value.
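The construction just described is small enough to show as a Python sketch. The function name citation_element_value is hypothetical, and representing a translation set as a list of (language tag, string) pairs is an assumption made for illustration; [CEV Concepts] defines the abstract structure, not this representation.

```python
def citation_element_value(current_property_value, translatable, language_tag=None):
    """Build a citation element value from the current property value."""
    if translatable:
        # A new translation set holding a single language-tagged string.
        return [(language_tag, current_property_value)]
    # Non-translatable elements have a plain string as their value.
    return current_property_value
```

A translatable title found in an element whose language tag is fr would therefore yield [("fr", "Les ancêtres de Charlemagne")], whereas a non-translatable publication date would yield the bare string.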

These rules are illustrated by example in the sections below.

RDFa, as used in this standard, is a list-flattening format. This means it does not naturally provide a means of keeping the translation sets of each citation element separate because it has no means of distinguishing multi-valued elements from translatable elements. Applications must therefore assume every property attribute refers to a separate citation element.

It would be possible to define a usage of RDFa that was not a list-flattening format. After careful consideration it was decided not to do this on the grounds that it would make the RDFa usage excessively verbose and contrary to standard RDFa idioms, so much so that it would likely compromise the uptake of this standard.

The following RDFa markup will be misinterpreted by a parser conforming to this specification.

<p lang="en-GB" typeof="Source">
  <span property="authorName" 
        content="Lansdowne, Marquess of">Lord Lansdowne</span> and
  <span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
  (<span property="authorName" lang="ja">林 董</span>),
  <i property="title">The Anglo-Japanese Treaty</i>,
  <span property="publicationDate">1902</span>.
</p>

The Anglo-Japanese Treaty was (at least nominally) authored by two people: the Marquess of Lansdowne and Count Hayashi Tadasu whose name is written in kanji as 林 董. A conformant application will see three authorNames and make each into a separate citation element, when in fact the desired behaviour is for "林 董" to be part of the same translation set as "Hayashi Tadasu".

Applications are required to use the translatedElement mechanism defined in §3.4.1 of [CEV Concepts] to disambiguate these cases.

The RDFa markup from the previous example can be fixed by using a translatedElement to encode the second form of Hayashi's name. At its simplest, this alters the two <span> elements referring to Hayashi to read:

  <span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
  (<span property="translatedElement" lang="ja">林 董</span>)

However, [CEV Concepts] recommends that the first string in the translation set should be the untranslated, and ideally untransliterated form of the citation element. Undoubtedly it is the Latin form that is the transliteration, and therefore these elements are the wrong way round. While this is only a recommendation, applications should try to follow it; this can be achieved as follows:

  <span property="authorName" lang="ja" content="林 董" />
  <span property="translatedElement" 
        lang="ja-Latn">Hayashi Tadasu</span> (林 董)

This use of the content attribute is discussed below. It provides a value for the citation element while hiding the value from an HTML renderer.

Current property value

The current property value is a string which will be used to create the citation element value. It is determined based on the RDFa attributes present on the current element as follows.

The use of the term current property value in this standard coincides with its definition in [RDFa Core].

If the current element has a content attribute, and either has no datatype attribute, or its datatype attribute is empty or has a value (after expanding shorthand IRIs) other than either of the following IRIs, then the current property value shall be the value of the content attribute.

http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML

These two IRIs have special treatment in RDFa. This standard excludes them for completeness, but it is not anticipated that they will arise in practice.

The purpose of the content attribute is to allow the citation element value to be something that is not rendered or otherwise used in HTML. This is particularly important when the citation element is required to have a value in a format that is different to how the element is formatted.

<span property="http://terms.fhiso.org/sources/publicationDate"
      content="2017-05-22">May 22nd, 2017</span>

In this case, the use of a content attribute is necessary because the publicationDate citation element value must be a date in the prescribed date format based on [ISO 8601]: it must not be a date like "May 22nd, 2017".

It would be desirable to add support for the resource attribute here. Before adding it, it is necessary to establish how safe it is to remove resource from the list of attributes that make a source-exclusion element.

Otherwise, if the host language is HTML and the current element has a datetime attribute, the current property value shall be the value of the datetime attribute.

Consider adding support for the <time> element, even without a datetime attribute, if the host language is HTML.

Otherwise, in HTML or in other XML languages that support an href attribute, if the current element has an href attribute and no datatype attribute, the current property value shall be the value of the href attribute, which shall be an IRI.

Otherwise, in HTML or in other XML languages that support a src attribute, if the current element has a src attribute and no datatype attribute, the current property value shall be the value of the src attribute, which shall be an IRI.

The [HTML+RDFa] standard does not change which HTML elements can have a datetime, href or src attribute. At present, the datetime attribute is only permitted on a <time> element; most href attributes in HTML are found on <a> elements; most src attributes are on elements that display some form of media, particularly <img> and, in HTML5, <video> and <audio>.

When an href or src attribute links to an online source, it can be tagged as a citation element.

<div vocab="http://terms.fhiso.org/sources/" typeof="Source">
  <a href="http://discovery.nationalarchives.gov.uk/"
     property="accessURL"><span property="title">Discovery</span></a>
  (online catalogue)
</div>

This example has two citation elements:

accessURL: http://discovery.nationalarchives.gov.uk/
title: "Discovery"

The fact that the second property attribute is on a child element of the element containing the first property attribute is irrelevant and does not signify any additional connection between the title and the accessURL over and above their usual relationship.

Otherwise, the current property value shall be formed by concatenating, in document order, all the text contained in the current element and in each of its descendant elements.

This definition allows citation elements to nest, which can be useful when tagging full titles and short versions of them.

<i property="title"><span property="shortTitle">The visitations 
  of Kent</span>, taken in the years 1530–1 by Thomas Benolte, 
  Clarenceux, and 1574 by Robert Cooke, Clarenceux.</i>

The shortTitle property takes the value "The visitations of Kent", while the title property takes the value "The visitations of Kent, taken in the years ..." by concatenating the text in the nested <span> element with the text directly in the <i> element.
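
The order of precedence described in this section can be illustrated with a short Python sketch. It is not part of this standard: the function name current_property_value, its parameters and the origin labels it returns are hypothetical, the caller is assumed to have already expanded any shorthand IRI in the datatype attribute, and the sketch does not check whether the host language actually supports href or src.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# The two datatype IRIs that RDFa treats specially.
SPECIAL_DATATYPES = {RDF_NS + "XMLLiteral", RDF_NS + "HTML"}


def current_property_value(element, datatype=None, host_is_html=True):
    """Return (value, origin) for an element carrying a property attribute.

    datatype is the expanded datatype IRI, "" if the attribute is empty,
    or None if the element has no datatype attribute (hypothetical API).
    """
    # 1. content attribute, unless the datatype is one of the special IRIs.
    if "content" in element.attrib and datatype not in SPECIAL_DATATYPES:
        return element.attrib["content"], "content"
    # 2. datetime attribute, in HTML only.
    if host_is_html and "datetime" in element.attrib:
        return element.attrib["datetime"], "datetime"
    # 3. href, then src, but only when there is no datatype attribute.
    if datatype is None:
        for name in ("href", "src"):
            if name in element.attrib:
                return element.attrib[name], name
    # 4. Fall back to the concatenated text in document order.
    return "".join(element.itertext()), "text"


date = ET.fromstring(
    '<span property="http://terms.fhiso.org/sources/publicationDate" '
    'content="2017-05-22">May 22nd, 2017</span>'
)
print(current_property_value(date))  # ('2017-05-22', 'content')
```

Returning the origin alongside the value is a design choice of this sketch: later steps, such as determining translatability, need to know whether the value came from an href, src or datetime attribute.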

Translatability

A conformant parser must determine the translatability of a citation element as follows.

If an application has access to the definition of the citation element, it must use its translatability as given in the definition.

This is expected to be the normal case, as applications are expected to ship with definitions included for those citation elements their users are likely to use commonly.

Otherwise, an application may use one or more discovery mechanisms to attempt to obtain a machine-readable definition of the citation element, and if successful should use the translatability from that definition.

The [CEV Concepts] standard does not currently define a discovery mechanism. This is likely to be the subject of a future FHISO standard.

Otherwise, if the current element has a non-empty datatype attribute, then the citation element shall be considered not to be translatable. The value of the datatype attribute (once shorthand IRIs have been expanded) should be the range of the citation element. A datatype attribute must not be present on a citation element which is translatable; otherwise the use of a datatype attribute is recommended for citation elements that are not well-known.

Suppose a vendor defines a citation element called reviewDate which contains an [ISO 8601] date. This third-party element may perhaps not be well known, so an RDFa author may mark up its use with a datatype attribute:

<span prefix="vendor: http://example.com/sources/
              xsd:    http://www.w3.org/2001/XMLSchema#"
      property="vendor:reviewDate" datatype="xsd:date" 
      content="2000-10-08" />

By using a datatype attribute, the RDFa author is not only ensuring the application processing the data knows the citation element is not translatable, but is also telling the application that the citation element value is a date.

Otherwise, if the host language is HTML and the current property value was found in a datetime attribute or was the content of a <time> element, an application may examine the current property value, and if it is syntactically valid as one of the following datatypes from [XSD Pt2], it may determine the citation element not to be translatable:

http://www.w3.org/2001/XMLSchema#date
http://www.w3.org/2001/XMLSchema#time
http://www.w3.org/2001/XMLSchema#dateTime
http://www.w3.org/2001/XMLSchema#duration
http://www.w3.org/2001/XMLSchema#gYear
http://www.w3.org/2001/XMLSchema#gYearMonth
This rule exists for compatibility with a full HTML+RDFa parser; implementation of this rule is otherwise not recommended. Document authors should not rely on this behaviour, and should instead add a datatype attribute.

An application that implements this rule will generate a citation element value containing the string 2000-10-08 from the following markup:

<time property="vendor:reviewDate">2000-10-08</time>

Had a different HTML element been used, say a <span>, and assuming this third-party element was unfamiliar to the parser, a parser not implementing this rule would have generated a translation set from this HTML element.
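
An application choosing to implement this optional rule needs some way to test whether a value is syntactically valid as one of the listed datatypes. The following Python sketch approximates that test with regular expressions. It is illustrative only: the function name infer_time_datatype is hypothetical, and the patterns are deliberate simplifications of the lexical forms in [XSD Pt2] (for example, they do not reject 30 February or a year of 0000, and they do not accept the time 24:00:00).

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified building blocks; these approximate the XSD lexical forms.
_TZ = r"(?:Z|[+-](?:0\d|1[0-3]):[0-5]\d|[+-]14:00)?"
_YEAR = r"-?\d{4,}"
_MONTH = r"(?:0[1-9]|1[0-2])"
_DAY = r"(?:0[1-9]|[12]\d|3[01])"
_TIME = r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?"
_DURATION = (
    r"-?P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"
)

# Listed in the same order as the datatypes in the rule.
_PATTERNS = [
    ("date", _YEAR + "-" + _MONTH + "-" + _DAY + _TZ),
    ("time", _TIME + _TZ),
    ("dateTime", _YEAR + "-" + _MONTH + "-" + _DAY + "T" + _TIME + _TZ),
    ("duration", _DURATION),
    ("gYear", _YEAR + _TZ),
    ("gYearMonth", _YEAR + "-" + _MONTH + _TZ),
]
LEXICAL_FORMS = [(XSD + name, re.compile(p)) for name, p in _PATTERNS]


def infer_time_datatype(value):
    """Return the IRI of the first datatype whose pattern matches, or None."""
    for iri, pattern in LEXICAL_FORMS:
        if pattern.fullmatch(value):
            return iri
    return None


print(infer_time_datatype("2000-10-08"))
```

For the value 2000-10-08 from the markup above, the sketch reports the xsd:date IRI, so an application using it could treat the citation element as not translatable. A value such as "May 22nd, 2017" matches none of the patterns and the rule would not apply.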

Otherwise, if the current property value was found in a src or href attribute, then the citation element shall be considered not to be translatable.

Otherwise, the application must assume the citation element is translatable and make its citation element value a translation set.

This is so that the current language tag is not lost, as it would be if the default were a string.
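
The sequence of rules in this section can be summarised as a single decision function. The Python sketch below is one possible arrangement and not a normative algorithm: the function is_translatable, its parameters, and the table of known definitions are hypothetical, the translatability values in that table are assumed purely for illustration, and the optional discovery step is omitted because no discovery mechanism is yet defined.

```python
def is_translatable(property_iri, known, datatype, origin, time_like=False):
    """Decide whether a citation element is translatable.

    known     -- dict mapping citation element IRIs to their defined
                 translatability (True or False); hypothetical structure
    datatype  -- expanded datatype IRI, "" if empty, None if absent
    origin    -- where the value was found: "content", "datetime",
                 "href", "src" or "text" (labels assumed by this sketch)
    time_like -- True only if the application implements the optional
                 HTML rule and the value passed its syntactic check
    """
    # 1. A known definition always wins.
    if property_iri in known:
        return known[property_iri]
    # 2. (Discovery mechanisms would be tried here; omitted.)
    # 3. A non-empty datatype attribute means not translatable.
    if datatype:
        return False
    # 4. Optional HTML compatibility rule for date and time values.
    if time_like:
        return False
    # 5. Values taken from href or src are IRIs, so not translatable.
    if origin in ("href", "src"):
        return False
    # 6. Default: translatable, so the language tag is not lost.
    return True


# Translatability values below are assumed for illustration only.
KNOWN = {"http://terms.fhiso.org/sources/title": True}
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

print(is_translatable("http://example.com/sources/reviewDate",
                      KNOWN, XSD_DATE, "content"))  # False
```

In the reviewDate example above, the element is unknown to the application, so the datatype attribute is what causes it to be treated as not translatable.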

The language tag

The language tag of the citation element shall be the value of the xml:lang or lang attribute on the current element or, failing that, on the nearest ancestor element of the current element that has such an attribute. If both attributes are present on the same element, the xml:lang attribute takes precedence.

This standard does not change when the xml:lang and lang attributes may be used on an HTML element. In particular, the xml:lang attribute is only allowed in XHTML documents.

<p vocab="http://terms.fhiso.org/sources/" typeof="Source" lang="en">
  <span property="authorName"
        content="Settipani, Christian">Christian Settipani</span>, 
  <i property="title" lang="fr">Les ancêtres de Charlemagne</i>, 
  <span property="edition" content="2">2nd ed.</span> 
</p>

This formatted citation is correctly tagged with the language tag en denoting English. This is because, even though the book's title is French, the citation as a whole is in English. Had the citation been written in French, the edition would have been written "2ᵉ éd" rather than "2nd ed".

This example contains three citation elements. The authorName and edition citation elements both inherit the en language tag. In the case of authorName this may or may not be what was intended: the author is French but his name would not normally be altered in translation to English. The explicit language tag is necessary on the title citation element, as the title is clearly French.

If there is no applicable xml:lang or lang attribute, an external mechanism may be used to supply the language tag.

In a document fetched via HTTP, a Content-Language header may provide the default language tag for the whole document.
If the formatted citation is a fragment of XHTML in a different XML language, the value of any xml:lang attributes in the host XML will be inherited by the XHTML as defined in §2.12 of [XML].

When these attributes are used in languages other than HTML, the definition of that language may provide a default language tag that applies in the event that no such attribute is found.

FHISO does not recommend the use of a default language tag when it gives privileged status to one language. If technical considerations require a default language tag, a neutral language tag such as und (defined in [ISO 639-2] to represent an undetermined language) should be used.

If no applicable xml:lang or lang attribute was found, no value was supplied through an external mechanism and no default applies, or if the provided language tag is an empty string, the citation element has no language tag.
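
The lookup described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function language_tag and its external and default parameters are hypothetical names, the markup is assumed to be well-formed XML so that it can be read with the standard library's ElementTree, and a parent map is built by hand because ElementTree elements do not record their parents.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_tag(element, parents, external=None, default=None):
    """Return the language tag for element, or None if it has none.

    parents  -- dict mapping each element to its parent element
    external -- tag supplied by an external mechanism, if any
    default  -- default tag defined by the host language, if any
    """
    node = element
    while node is not None:
        # xml:lang takes precedence over lang on the same element.
        for name in (XML_LANG, "lang"):
            if name in node.attrib:
                # An empty string means there is no language tag.
                return node.attrib[name] or None
        node = parents.get(node)
    tag = external if external is not None else default
    return tag or None


root = ET.fromstring(
    '<p vocab="http://terms.fhiso.org/sources/" typeof="Source" lang="en">'
    '<span property="authorName" content="Settipani, Christian">'
    'Christian Settipani</span>, '
    '<i property="title" lang="fr">Les ancêtres de Charlemagne</i>, '
    '<span property="edition" content="2">2nd ed.</span></p>'
)
parents = {child: parent for parent in root.iter() for child in parent}

print(language_tag(root.find("i"), parents))     # fr
print(language_tag(root.find("span"), parents))  # en
```

As in the example earlier in this section, the title carries its own fr tag while the authorName and edition elements inherit en from the enclosing paragraph.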

Synchronising citation elements

When an application has both a formatted citation tagged with RDFa attributes per this standard and a citation element set for the same citation, the two will typically have much content in common. This introduces the possibility that the data in the two places becomes unsynchronised. This section discusses ways of avoiding this.

In general, applications should consider information from the citation element set to have precedence over information extracted from a formatted citation.

If an application allows the manual editing of formatted citations tagged with RDFa attributes per this standard, it should take steps to prevent such edits from causing the citation element values that a conformant application would extract from the formatted citation to differ from the citation element values in the citation element set.

This document does not prescribe a particular mechanism for ensuring this, but most strategies will involve parsing the RDFa attributes before and after the edit and identifying any citation elements whose values have changed. An application might ask the user whether the change should be propagated back to the original citation element set. If the change is not to be propagated back to the citation element set, the application might delete the property attribute so the changed data is no longer recognised as a citation element, or insert a content attribute containing the correct data per §4.3.1.

Suppose an application generates the following formatted citation.

<p><span property="http://terms.fhiso.org/sources/authorName"
  >Settipani, Christian</span>. 
  <i property="http://terms.fhiso.org/sources/title">Les ancêtres 
    de Charlemagne</i>.</p>

If a user edits this HTML to replace Les ancêtres de Charlemagne with Ibid., the application should then take steps to ensure a future parser does not believe the source literally has the title Ibid. In this case, clearly the change should not be propagated back to the citation element set as the source isn't titled Ibid., and the user would presumably decline if offered this option. An application might delete the property attribute so Ibid. is not understood to be a title, or insert a content attribute containing the real title as follows:

<p><span property="http://terms.fhiso.org/sources/authorName"
  >Settipani, Christian</span>. 
  <i property="http://terms.fhiso.org/sources/title"
     content="Les ancêtres de Charlemagne">Ibid.</i></p>
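
One way an application might detect such a change is to extract the citation elements before and after the edit and compare the results. The Python sketch below illustrates the idea only. The functions extract and changed_elements are hypothetical, the extraction is greatly simplified (it honours only the content attribute and text content, and ignores shorthand IRIs, language tags and the other rules of this standard), and the markup is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SOURCES = "http://terms.fhiso.org/sources/"


def extract(fragment):
    """Simplified extraction: a list of (property, value) pairs."""
    root = ET.fromstring(fragment)
    pairs = []
    for el in root.iter():
        if "property" in el.attrib:
            # Prefer the content attribute, else the concatenated text.
            value = el.attrib.get("content", "".join(el.itertext()))
            pairs.append((el.attrib["property"], value))
    return pairs


def changed_elements(before, after):
    """Return (removed, added) lists of (property, value) pairs."""
    old, new = Counter(extract(before)), Counter(extract(after))
    return sorted((old - new).elements()), sorted((new - old).elements())


AUTHOR = ('<span property="' + SOURCES + 'authorName">'
          'Settipani, Christian</span>. ')
original = ('<p>' + AUTHOR + '<i property="' + SOURCES + 'title">'
            'Les ancêtres de Charlemagne</i>.</p>')
edited = ('<p>' + AUTHOR + '<i property="' + SOURCES + 'title">'
          'Ibid.</i></p>')
repaired = ('<p>' + AUTHOR + '<i property="' + SOURCES + 'title" '
            'content="Les ancêtres de Charlemagne">Ibid.</i></p>')

print(changed_elements(original, edited))    # title changed to "Ibid."
print(changed_elements(original, repaired))  # ([], [])
```

Comparing multisets of pairs rather than positions means the check still works if the edit reorders elements, and an empty result for the repaired markup confirms that inserting the content attribute restores the extracted values.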

If an application stores formatted citations tagged with RDFa attributes as per this standard, it should take steps to ensure that changes to the underlying citation element set propagate to the formatted citation.

An application doing this would parse the formatted citation per this standard, locate the part of the HTML or XML that contains the old citation element value and overwrite it with the new value. For citation elements that are multi-valued elements, the application needs to know both the old and the new citation element value so that it knows which value is being updated; for other elements it is not necessary to know the old value.

Longer example

This example gives a full HTML document of the sort a genealogist might publish online. In a paragraph of narrative text it gives some brief details of King Edward II's birth and parents. Although brief, this information is properly sourced to three published books with the citations formatted according to the Chicago Manual of Style. Each of these formatted citations has been marked up with RDFa attributes as described in this standard. The document includes several other instances of RDFa attributes that will not be detected as citation elements by a conformant parser.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title property="dc:title">Edward II</title>
    <meta property="dc:creator" content="FHISO, Inc." />
    <style>
      p { max-width: 720px; }
      .notes p, .note { font-size: smaller; }
      .fnref { vertical-align: super; font-size: smaller; }
      .fnref::before { content: '['; }
      .fnref::after { content: ']'; }
    </style>
  </head>

  <body>
    <h1>Edward II</h1>
 
    <p>
      Edward II was the fourth son of Edward I and his first wife,
      Eleanor of Castile.<a class="fnref" href="#fn1">1</a>
      He was born in Caernarfon Castle in North Wales on 
      25 April 1284, less than a year after Edward I had conquered 
      the region, and as a result is sometimes called 
      Edward of Caernarfon.<a class="fnref" href="#fn2">2</a>
      His father was the King of England, and had also 
      inherited Gascony in south-western France, 
      which he held as the feudal vassal of the King of France, 
      and the Lordship of Ireland.<a class="fnref" href="#fn3">3</a>
    </p>

    <div vocab="http://terms.fhiso.org/sources/" class="notes">
      <h2>References</h2>

      <p typeof="Source" id="fn1">
        <span property="authorName">Roy Martin Haines</span>, 
        <i property="title">King Edward II: His Life, his Reign and 
          its Aftermath, 1284–1330</i> 
        (<span property="publicationPlace">Montreal, Canada 
           &amp; Kingston, Canada</span>: 
         <span property="publisher">McGill-Queen’s 
           University Press</span>, 
         <span property="publicationDate">2003</span>), 
        <span property="page" content="3">3</span>.
      </p>

      <p typeof="Source" id="fn2">
        <span property="authorName">Seymour Phillips</span>, 
        <i property="title">Edward II</i> 
        (<span property="publicationPlace">New Haven, US 
           &amp; London, UK</span>: 
         <span property="publisher">Yale University Press</span>, 
         <span property="publicationDate">2011</span>), 
        <span property="page" content="33, 36">33 &amp; 36</span>.
      </p>

      <p typeof="Source" id="fn3">
        <span property="authorName">Michael Prestwich</span>, 
        <i property="title">Edward I</i> 
        (<span property="publicationPlace">Berkeley, US 
           &amp; Los Angeles, US</span>: 
         <span property="publisher">University of California 
           Press</span>, 
         <span property="publicationDate">1988</span>), 
        <span property="page" content="13-14">13–14</span>.
      </p>
    </div>

    <hr/>
    <p class="note">This file is an example of an HTML document 
      containing formatted citations marked up with RDFa attributes
      per the FHISO draft standard 
      <a href="http://tech.fhiso.org/drafts/cev-rdfa-bindings"
        >Citation Elements: Bindings for RDFa</a>.</p>

    <p vocab="http://creativecommons.org/ns#"
       class="note">Content copied from 
      <a href="https://en.wikipedia.org/wiki/Edward_II_of_England"
         property="dc:source">Wikipedia</a> and released under a 
      <a href="http://creativecommons.org/licenses/by-sa/3.0/"
         property="license">Creative Commons License</a>.</p>
  </body>
</html>

References

Normative references

[CEV Concepts]
FHISO (Family History Information Standards Organisation). Citation Elements: General Concepts. Exploratory draft of standard. See http://tech.fhiso.org/drafts/cev-concepts.
[RDFa Core]
W3C (World Wide Web Consortium). RDFa Core 1.1. W3C Recommendation, 3rd ed., 2015. See http://www.w3.org/TR/rdfa-core.
[RFC 2119]
IETF (Internet Engineering Task Force). RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. BCP 14. Scott Bradner, 1997. See http://tools.ietf.org/html/rfc2119.
[XML]
W3C (World Wide Web Consortium). Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation, 26 Nov 2008. See https://www.w3.org/TR/REC-xml/.

Other references

[CEV ELF]
FHISO (Family History Information Standards Organisation). Citation Elements: Bindings for ELF. Early draft of standard.
[CEV GEDCOM X]
FHISO (Family History Information Standards Organisation). Citation Elements: Bindings for GEDCOM X. Early draft of standard.
[Dublin Core]
Dublin Core Metadata Initiative. Dublin Core metadata element set. Dublin Core recommendation, version 1.1, 1999. See http://dublincore.org/documents/dcmi-terms/.
[HTML+RDFa]
W3C (World Wide Web Consortium). HTML+RDFa 1.1. W3C Recommendation, 2nd ed., 2015. See http://www.w3.org/TR/html-rdfa.
[HTML5+RDFa Context]
W3C (World Wide Web Consortium). HTML5+RDFa Initial Context. Last updated 9 Dec 2011. See http://www.w3.org/2011/rdfa-context/html-rdfa-1.1.
[ISO 639-2]
ISO (International Organization for Standardization). ISO 639-2:1998. Codes for the representation of names of languages — Part 2: Alpha-3 code. 1998. (See http://www.loc.gov/standards/iso639-2/.)
[ISO 8601]
ISO (International Organization for Standardization). ISO 8601:2004. Data elements and interchange formats — Information interchange — Representation of dates and times. 2004.
[RDFa Primer]
W3C (World Wide Web Consortium). RDFa 1.1 Primer. W3C Recommendation, 3rd ed., 2015. See http://www.w3.org/TR/rdfa-primer.
[XHTML+RDFa]
W3C (World Wide Web Consortium). XHTML+RDFa 1.1. W3C Recommendation, 3rd ed., 2015. See http://www.w3.org/TR/xhtml-rdfa.
[XML Names]
W3C (World Wide Web Consortium). Namespaces in XML 1.0 (Third Edition). W3C Recommendation, 8 Dec 2009. See https://www.w3.org/TR/REC-xml-names/.