This document shows an example run of an experimental schema annotation system. None of this document, nor the implementation, nor the schema annotation conventions are endorsed by the membership of W3C.
There is, of course, a large amount of data available in non-RDF XML that would be useful to the semantic web. One can create XSLT templates to convert this data to an RDF/XML idiom, but that is tedious, error-prone, and not tied to the specification of the XML grammar. RDAL is an annotation convention for expressing semantic actions for productions described in an XML grammar. This implementation uses RelaxNG Compact Syntax as the annotation language and calls functions in a library for expressing RDF triples in the N-Triples syntax.
Note: RDAL is not confined to making RDF statements; that is merely the test scenario. The XQuery in the annotations may call other APIs besides RdfXS.
We have some colloquial XML document that expresses some information for a human resources department. This includes employee names and addresses and department affiliations.
<per:Personel xmlns:per="http://example.com/Personel"
              xmlns:addr="http://example.com/Address">
  <per:Person per:ID="bsmith">
    <per:given>Bob</per:given>
    <per:family>Smith</per:family>
    <per:email>[email protected]</per:email>
    <per:addr per:href="#bsmith_addr"/>
  </per:Person>
  ...
  <per:Departement>
    <per:name>R-n-D</per:name>
    <per:manager per:href="#bsmith"/>
    ...
    <per:location per:href="#rnd_addr"/>
  </per:Departement>
  <addr:Address per:ID="rnd_addr">1 king street</addr:Address>
  <addr:Address per:ID="bsmith_addr">123 elm street</addr:Address>
  ...
</per:Personel>
We have a graph in mind and want some RDF triples out of this document, like:
<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/givenname> "Bob" .
<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/familyname> "Smith" .
<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/email> <mailto:[email protected]> .
<http://localhost/#bsmith> <http://example.com/HR#addr> <http://www.w3.org/2004/02/03-rdal/HR.xml#bsmith_addr> .
The following RelaxNG Compact Syntax schema for the HR document shows the RDAL annotations added to it.
# default namespace = "http://example.com/Personel"
namespace per = "http://example.com/Personel"
namespace addr = "http://example.com/Address"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace a = "http://www.w3.org/2002/12/26-XMLgrammer2RDFdb/annot#"
namespace foaf = "http://xmlns.com/foaf/0.1/"
namespace r = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

start = doc
doc = Personel

# Function declarations.
>> a:prototype ["declare function addTriple($predicate, $subject, $object) \x{a}" ~
                "declare function addTriple_lit($predicate, $subject, $object) \x{a}" ~
                "declare function addTriple_ref($predicate, $subject, $object, $baseUri) \x{a}" ~
                "declare function error($hint, $expected)"]

# All the predicates are here for easy maintenance.
>> a:globals ["declare global $baseURI:='-- need base URI parameter --' \x{a}" ~
              "declare global $foaf:='http://xmlns.com/foaf/0.1/' \x{a}" ~
              "declare global $hr:='http://example.com/HR#' \x{a}" ~
              "declare global $addr:='http://example.com/Addr#' \x{a}" ~
              "declare global $Given:=concat('<', $foaf, 'givenname>') \x{a}" ~
              "declare global $Family:=concat('<', $foaf, 'familyname>') \x{a}" ~
              "declare global $Email:=concat('<', $foaf, 'email>') \x{a}" ~
              "declare global $Addr:=concat('<', $hr, 'addr>') \x{a}" ~
              "declare global $Manager:=concat('<', $hr, 'manager>') \x{a}" ~
              "declare global $Grunt:=concat('<', $hr, 'grunt>') \x{a}" ~
              "declare global $Location:=concat('<', $hr, 'location>') \x{a}" ~
              "declare global $StreetAddr:=concat('<', $addr, 'streetAddr>')"]

Personel = element per:Personel { PersonelElts+ }
PersonelElts = Person | Department | Address

Person = element per:Person {
    # The subject of the following triples comes from the @per:ID.
    [a:assignment["let $subject:=concat('<', $baseURI, '#', @per:ID, '>')"]]attribute per:ID { xsd:NMTOKEN },
    element per:given { [a:action["addTriple_lit($Given, $subject, text())"]]Name },
    element per:family { [a:action["addTriple_lit($Family, $subject, text())"]]Name },
    element per:email { [a:action["addTriple($Email, $subject, concat('<mailto:', text(), '>'))"]]text },
    element per:addr { [a:action["addTriple_ref($Addr, $subject, @per:href, $baseURI)"]]attribute per:href { Ref } }
}

Department = element per:Departement {
    [a:assignment["let $subject:=concat('<', $baseURI, '#', per:name/text(), '>')"]]element per:name { text },
    element per:manager { [a:action["addTriple_ref($Manager, $subject, @per:href, $baseURI)"]]attribute per:href { Ref } },
    element per:grunt { [a:action["addTriple_ref($Grunt, $subject, @per:href, $baseURI)"]]attribute per:href { Ref } }+,
    element per:location { [a:action["addTriple_ref($Location, $subject, @per:href, $baseURI)"]]attribute per:href { Ref } }
}

Address = element addr:Address { attribute per:ID { xsd:NMTOKEN }, text }
>> a:action ["addTriple_lit($StreetAddr, concat('<', $baseURI, '#', @per:ID, '>'), text())"]

Ref = string # text # xsd:NCName
Name = text
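The three helpers named in a:prototype come from the library, and their implementation is not shown in this document. As a rough illustration of the behaviour the annotations appear to rely on, here is a minimal Python sketch. The Python function names, the literal escaping, and the use of urljoin to resolve a per:href against the base URI are all assumptions made for this example, not the library's actual code.

```python
from urllib.parse import urljoin


def add_triple(predicate: str, subject: str, obj: str) -> str:
    """Format one N-Triples line; all three terms are already serialized."""
    return f"{subject} {predicate} {obj} ."


def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def add_triple_lit(predicate: str, subject: str, text: str) -> str:
    """Object is plain text, emitted as a quoted literal (assumed behaviour)."""
    return add_triple(predicate, subject, f'"{escape_literal(text)}"')


def add_triple_ref(predicate: str, subject: str, href: str, base_uri: str) -> str:
    """Object is a reference such as '#bsmith_addr', resolved against base_uri
    (assumed behaviour)."""
    return add_triple(predicate, subject, f"<{urljoin(base_uri, href)}>")


if __name__ == "__main__":
    base = "http://www.w3.org/2004/02/03-rdal/HR.xml"
    subject = f"<{base}#bsmith>"
    print(add_triple_lit("<http://xmlns.com/foaf/0.1/givenname>", subject, "Bob"))
    print(add_triple_ref("<http://example.com/HR#addr>", subject, "#bsmith_addr", base))
```

The subject term is built the same way as in the a:assignment annotations: the base URI, a '#', and the per:ID value, wrapped in angle brackets.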
Take a RelaxNG schema, add some RDAL annotations, and run rngSerializer -m xsl to get an XSLT stylesheet. Run that stylesheet on an instance document to get N-Triples.
./rngSerializer -m xsl ../test/rng/HR-rdal.rnc -l RdfXStoNTriple.xsl> HR.xsl
xsltproc --stringparam baseURI http://www.w3.org/2004/02/03-rdal/HR.xml HR.xsl ../test/rng/HR.xml > HR.ntriple
To get a graph image:
./rngSerializer -m xsl ../test/rng/HR-rdal.rnc -l RdfXStoDot.xsl> HRdot.xsl
xsltproc --stringparam baseURI http://www.w3.org/2004/02/03-rdal/HR.xml HRdot.xsl ../test/rng/HR.xml > HR-img.dot
# Impose some namespaces.
perl -pi -e "s|http://example.com/HR#|hr:|g" HR-img.dot
perl -pi -e "s|http://example.com/Addr#|addr:|g" HR-img.dot
perl -pi -e "s|http://xmlns.com/foaf/0.1/|foaf:|g" HR-img.dot
perl -pi -e "s|http://www.w3.org/2004/02/03-rdal/HR.xml#|doc:|g" HR-img.dot
dot -T png -o HR-img.png HR-img.dot
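The four perl substitutions above replace long namespace URIs with short prefixes so that the node and edge labels in the graph stay readable. For readers who prefer a single pass, the same idea can be sketched in Python. The function name shorten and the PREFIXES table are illustrative choices for this example; only the URI-to-prefix pairs are taken from the commands above.

```python
# URI-to-prefix pairs copied from the perl commands; applied in this order.
PREFIXES = {
    "http://example.com/HR#": "hr:",
    "http://example.com/Addr#": "addr:",
    "http://xmlns.com/foaf/0.1/": "foaf:",
    "http://www.w3.org/2004/02/03-rdal/HR.xml#": "doc:",
}


def shorten(text: str, prefixes: dict = PREFIXES) -> str:
    """Replace every occurrence of each namespace URI with its prefix."""
    for uri, prefix in prefixes.items():
        text = text.replace(uri, prefix)
    return text


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python shorten.py HR-img.dot > HR-short.dot
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(shorten(handle.read()))
```

Unlike perl -pi, this sketch writes to standard output rather than editing the file in place.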
rngSerializer parses the RNC into a SchemaValidationCompileTree. It then calls toXsl on the objects in this tree. These generate a set of XSLT templates that traverse the entire grammar, validating the input document against the grammar. Another component, the Rdal handler, gets callbacks for everything in the rdal namespace. It adds output text to the templates. Basically, rngSerializer is a small tool that connects the output of a RelaxNG parser to the Rdal handler and prints the output.
Given a schema with no annotations, rngSerializer will yield a stylesheet that validates the input document, but produces no output. This is comparable to the step of hand-generating XSLT minus output text.
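The division of labour described above (a tree of schema objects that each know how to serialize themselves, plus a handler that is called back for annotations) can be pictured with a toy model. The following Python sketch is not the rngSerializer code: the class names ElementNode, Annotation, and Handler, the method on_annotation, and the "validate"/"emit" output lines are all hypothetical stand-ins for the real tree objects and the generated XSLT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    kind: str  # e.g. "assignment" or "action"
    body: str  # the annotation text taken from the schema


class Handler:
    """Stand-in for the Rdal handler: turns each annotation into output text."""

    def on_annotation(self, annotation: Annotation, out: List[str]) -> None:
        out.append(f"emit {annotation.kind}: {annotation.body}")


@dataclass
class ElementNode:
    name: str
    annotations: List[Annotation] = field(default_factory=list)
    children: List["ElementNode"] = field(default_factory=list)

    def to_xsl(self, handler: Handler, out: List[str]) -> None:
        # Every node contributes validation code, annotated or not.
        out.append(f"validate {self.name}")
        # Annotations are passed to the handler, which adds the output text.
        for annotation in self.annotations:
            handler.on_annotation(annotation, out)
        for child in self.children:
            child.to_xsl(handler, out)


def serialize(root: ElementNode) -> List[str]:
    """Walk the tree once and collect everything that was generated."""
    out: List[str] = []
    root.to_xsl(Handler(), out)
    return out
```

In this model a tree without annotations yields only "validate" lines, which corresponds to the validating stylesheet that produces no output.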
RelaxNG describes the production for an element or attribute in terms of a name class defining the allowable names for the element or attribute, and a pattern defining the content model. The content model is a set of elements, attributes, text, and other productions, wrapped in logic describing sequence and mutual exclusion (and, in the case of attributes and text, allowed values). Let's take the simple case of a root R containing two empty elements A and B:
start = element R { element A { empty }, element B { empty } }
The order can be enforced with XSLT:
<!-- root: . A,B<END> -->
<xsl:call-template name="AB_A" select="/">
  <xsl:with-param name="__INDEX" select="'0'"/>
</xsl:call-template>

<xsl:template name="AB_A">
  <xsl:param name="__INDEX"/>
  <xsl:for-each select="*[$__INDEX]">
    <xsl:choose>
      <xsl:when test="self::A">
        <!-- A: . <END> -->
        <xsl:call-template name="A_END">
          <xsl:with-param name="__INDEX" select="'0'"/>
        </xsl:call-template>
        <!-- root: A, . B<END> -->
        <xsl:call-template name="AB_B" select="..">
          <xsl:with-param name="__INDEX" select="$__INDEX+1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="error"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

<xsl:template name="AB_B">
  <xsl:param name="__INDEX"/>
  <xsl:for-each select="*[$__INDEX]">
    <xsl:choose>
      <xsl:when test="self::B">
        <!-- B: . <END> -->
        <xsl:call-template name="_END">
          <xsl:with-param name="__INDEX" select="'0'"/>
        </xsl:call-template>
        <!-- root: A,B . <END> -->
        <xsl:call-template name="_END" select="..">
          <xsl:with-param name="__INDEX" select="$__INDEX+1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="error"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

<xsl:template name="_END">
  <xsl:param name="__INDEX"/>
  <!-- ... . <END> -->
  <xsl:for-each select="*[$__INDEX]">
    <xsl:call-template name="error"/>
  </xsl:for-each>
</xsl:template>
Template names (AB_A, AB_B, ...) are helpful for the observer, but will need to be made unique to prevent name collisions in some grammars (even in XML DTDs, two elements may have the same content model).
I think this step will allow XSLT to represent the DFA coming from a RelaxNG schema and thus offer complete validation. This will also make the semantic actions dispatch provably reliable.
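To make the DFA view concrete, the content model "A, B" of the root R can be written as a transition table and walked over the children of the root. The Python sketch below is only an illustration of that idea, not output of rngSerializer. The state numbering, the table layout, and the function name children_match are assumptions for this example, and the sketch checks only the order of the child elements, not that A and B are themselves empty.

```python
import xml.etree.ElementTree as ET

# DFA for the content model "A, B":
#   state 0 --A--> state 1 --B--> state 2 (accepting, i.e. <END>)
TRANSITIONS = {
    0: {"A": 1},
    1: {"B": 2},
}
ACCEPTING = {2}


def children_match(parent: ET.Element,
                   transitions: dict = TRANSITIONS,
                   accepting: set = ACCEPTING) -> bool:
    """Return True if the child element names of parent are accepted by the DFA."""
    state = 0
    for child in parent:
        # A missing transition corresponds to the "error" template being called.
        state = transitions.get(state, {}).get(child.tag)
        if state is None:
            return False
    # Running out of children is only valid in an accepting state.
    return state in accepting


if __name__ == "__main__":
    for doc in ("<R><A/><B/></R>", "<R><B/><A/></R>", "<R><A/></R>"):
        print(doc, children_match(ET.fromstring(doc)))
```

Each named template in the XSLT above plays the role of one state in this table, and the __INDEX parameter plays the role of the position in the list of children.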
$Id: Overview.html,v 1.14 2004/06/25 09:37:40 eric Exp $
Eric Prud'hommeaux