Exert (XML assertion grammars)

Exert is an extension of RelaxNG that supports improved modularity, drawing on object-oriented programming languages for inspiration.

Well-formed XML documents can be modeled as a tree of elements, attributes and text nodes. There are a few other, less common node types, but these are ignored for the purposes of this account. Here is a simple example:

<book isbn="0-19-860045-3">
  <title>The Oxford Pocket English Dictionary</title>
  <publisher>Clarendon Press</publisher>
  <date>1996</date>
</book>

where book, title, publisher and date are elements and isbn is an attribute. Such XML documents can be described by regular tree expressions that specify sequences and choices of node types. RelaxNG is one formalism for representing such regular tree expressions. The above document may be described in RelaxNG as follows:

<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="isbn">
    <text/>
  </attribute>
  <element name="title">
    <text/>
  </element>
  <element name="publisher">
    <text/>
  </element>
  <element name="date">
    <text/>
  </element>
</element>

This can be flattened out into separate definitions, e.g.

<grammar xmlns="http://relaxng.org/ns/structure/1.0">

  <start>
    <ref name='book'/>
  </start>

  <define name="book">
    <element name="book">
      <attribute name="isbn">
        <text/>
      </attribute>
      <ref name="title"/>
      <ref name="publisher"/>
      <ref name="date"/>
    </element>
  </define>

  <define name="title">
    <element name="title">
      <text/>
    </element>
  </define>

  <define name="publisher">
    <element name="publisher">
      <text/>
    </element>
  </define>

  <define name="date">
    <element name="date">
      <text/>
    </element>
  </define>

</grammar> 

You can further place such definitions in separate files to make it easier to maintain larger schemas. There are also constructs for sequences with one or more items, choices, and many other features. See the RelaxNG tutorial for more examples.
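
As a rough sketch of those constructs, the grammar below reworks the book example using standard RelaxNG elements (include, oneOrMore, choice and optional). The file name book-parts.rng and the author and imprint elements are assumptions made purely for illustration; they do not appear in the example above.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">

  <!-- Hypothetical separate file assumed to hold the title,
       publisher and date definitions shown earlier -->
  <include href="book-parts.rng"/>

  <start>
    <ref name="book"/>
  </start>

  <define name="book">
    <element name="book">
      <attribute name="isbn">
        <text/>
      </attribute>
      <ref name="title"/>
      <!-- a sequence of one or more authors (hypothetical element) -->
      <oneOrMore>
        <element name="author">
          <text/>
        </element>
      </oneOrMore>
      <!-- either a publisher or an imprint (hypothetical element) -->
      <choice>
        <ref name="publisher"/>
        <element name="imprint">
          <text/>
        </element>
      </choice>
      <!-- the date may be omitted -->
      <optional>
        <ref name="date"/>
      </optional>
    </element>
  </define>

</grammar>
```

The included file would itself be a grammar whose definitions are merged into this one.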

Extending existing definitions

Let's say you have an element with a 'name' attribute where the permitted content for the element depends on the value of that attribute. RelaxNG allows you to express this with a 'choice' element, e.g.

<define name='event'>
  <element name='event'>
    <attribute name='target'>
      <text/>
    </attribute>
    <choice>
      <ref name='load'/>
      <ref name='unload'/>
      <ref name='click'/>
    </choice>
  </element>
</define>

<define name='load'>
  <attribute name='name'>
    <value>load</value>
  </attribute>
</define>

<define name='unload'>
  <attribute name='name'>
    <value>unload</value>
  </attribute>
</define>

<define name='click'>
  <attribute name='name'>
    <value>click</value>
  </attribute>
  <attribute name='x'>
    <data type="integer"/>
  </attribute>
  <attribute name='y'>
    <data type="integer"/>
  </attribute>
</define>

where the event element always has name and target attributes, but if name is 'click' it also has x and y attributes for the location clicked.
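
To make the effect of the choice concrete, here are some sample instances and how the definitions above would treat them. The events wrapper element and the attribute values are assumptions, used only so that the samples form a single well-formed document.

```xml
<!-- 'events' is a hypothetical wrapper, not part of the schema above -->
<events>
  <!-- accepted: load and unload need only name and target -->
  <event name="load" target="window"/>
  <event name="unload" target="window"/>

  <!-- accepted: click also carries integer x and y -->
  <event name="click" target="submit-button" x="120" y="48"/>

  <!-- rejected: click requires both x and y -->
  <event name="click" target="submit-button"/>

  <!-- rejected: x and y are only permitted when name is click -->
  <event name="load" target="window" x="0" y="0"/>
</events>
```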

It would be desirable to start with the definition of the event element on its own, e.g.

<define name='event'>
  <element name='event'>
    <attribute name='name'>
      <text/>
    </attribute>
    <attribute name='target'>
      <text/>
    </attribute>
  </element>
</define>

and to then extend it with other definitions, e.g.

<define name='click' extends='event'>
  <attribute name='name'>
    <value>click</value>
  </attribute>
  <attribute name='x'>
    <data type="integer"/>
  </attribute>
  <attribute name='y'>
    <data type="integer"/>
  </attribute>
</define>

which asserts that if the event element has a name attribute with the value 'click' then it must also have the attributes x and y. This overrides the definition of the 'name' attribute in the original definition. It could also have overridden the element's name and its content model. Note that this is an extension that isn't part of the RelaxNG standard.
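
One plausible reading of this assertion, written out in plain RelaxNG, is sketched below. This account does not say how an Exert processor actually handles the extends attribute, so the expansion is an assumption rather than a defined translation. The datatypeLibrary attribute is standard RelaxNG and is needed for the integer type.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <ref name="event"/>
  </start>

  <define name="event">
    <element name="event">
      <attribute name="target">
        <text/>
      </attribute>
      <choice>
        <!-- contributed by the 'click' extension -->
        <group>
          <attribute name="name">
            <value>click</value>
          </attribute>
          <attribute name="x">
            <data type="integer"/>
          </attribute>
          <attribute name="y">
            <data type="integer"/>
          </attribute>
        </group>
        <!-- the base definition: any name other than 'click' -->
        <attribute name="name">
          <data type="string">
            <except>
              <value>click</value>
            </except>
          </data>
        </attribute>
      </choice>
    </element>
  </define>

</grammar>
```

The except clause is what keeps a click event without x and y from slipping through the base definition.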

Because definitions refer to what they extend, an existing schema in one file can be composed with extensions in other files without editing the existing schema file to refer to the new extensions. This makes for greater modularity of specifications and is the motivation for introducing yet another language for defining XML schemas.
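
A minimal sketch of such an extension file follows. The file name keypress.xrt, the keypress event and its key attribute are all hypothetical, and this account does not specify an Exert namespace or how a processor is told which files to combine, so both are left out.

```xml
<!-- keypress.xrt (hypothetical file name): adds a new event type.
     The file that defines 'event' is assumed to exist elsewhere
     and is not edited. -->
<grammar>
  <define name='keypress' extends='event'>
    <attribute name='name'>
      <value>keypress</value>
    </attribute>
    <!-- hypothetical attribute carried only by keypress events -->
    <attribute name='key'>
      <text/>
    </attribute>
  </define>
</grammar>
```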

Here is an example that names a choice and later defines what the choices can be.

<define name='Pizza'>
  <element name='pizza'>
    <oneOrMore>
      <choice name='topping'/>
    </oneOrMore>
  </element>
</define>

<define name='Cheese' extends='topping'>
  <element name='cheese'/>
</define>

<define name='Sausage' extends='topping'>
  <element name='sausage'/>
</define>

Here the content model for <pizza> is defined as one or more <cheese/> or <sausage/> elements. The 'choice' element with a name attribute names a set of choices that are defined elsewhere. In principle you can also have some locally defined choices, e.g.
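
For comparison, the combined content model could be written in plain RelaxNG roughly as follows. This is a sketch of the intended result rather than a defined translation, and it assumes that the cheese and sausage elements are meant to be empty.

```xml
<element name="pizza" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <!-- the named choice 'topping', with both extensions filled in -->
    <choice>
      <element name="cheese">
        <empty/>
      </element>
      <element name="sausage">
        <empty/>
      </element>
    </choice>
  </oneOrMore>
</element>
```

Under this reading a pizza element containing cheese, sausage and cheese again would be accepted, while a pizza element with no children would not.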

<define ...>
  ...
  <choice name='colors'>
    <value>black</value>
    <value>white</value>
  </choice>
  ...
</define>

<define name='Fuchsia' extends='colors'>
  <value>fuchsia</value>
</define>

where the set of colors is now black, white and fuchsia.

If a derived type doesn't permit an attribute defined in a base type, it can be undefined, e.g.

<define name='foo' extends='event'>
  <undefine name='target'/>
</define>

This only applies to attributes since the element name and content model can be easily overridden, assuming that the default content model is empty. To ensure that a derived type has an empty content model you would use <empty/>.
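
The sketch below combines the two ideas: a derived type that drops an inherited attribute and insists on empty content. The ping event is hypothetical, and placing <empty/> directly under <define> is an assumption that mirrors how the click example lists its attributes; this account does not spell out where the content model goes.

```xml
<define name='ping' extends='event'>
  <!-- hypothetical event type, selected by the value of 'name' -->
  <attribute name='name'>
    <value>ping</value>
  </attribute>
  <!-- the inherited 'target' attribute is not permitted here -->
  <undefine name='target'/>
  <!-- explicitly require an empty content model -->
  <empty/>
</define>
```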

All of the other features of RelaxNG apply as is. Definitions referred to by the <ref/> element are treated as macro substitutions and can be overridden by derived types. You can thus use either a top-down or a bottom-up approach to definitions, as appropriate.

Dave Raggett, 3rd September 2006

Volantis Email: [email protected], phone/fax: +44 1225 866 240 mobile: +44 7917 839 038 (GSM)