This document describes an XSLT stylesheet that transforms application/rdf+xml into a series of RDF database API calls. It also describes a schema annotation system for generating that XSLT, as well as other grammar-defined applications. This research is used in RDAL.
This document records some experiments in using XSLT to parse RDF. It is not endorsed by the W3C membership.
This basic RDF API has four syntactically differentiated datatypes (no constructors for them now):
and five functions:
where the parameters $predicate, $subject, $object should all be interned from the database atom dictionary.
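As a minimal sketch of what interning from an atom dictionary could look like, assuming hypothetical `URI`, `BNode` and `Literal` term classes (the API's real datatypes are not reproduced here): each distinct term is mapped once to a small integer, and `add_triple`, standing in for the document's addTriple with the same predicate, subject, object parameter order, stores only those integers.

```python
from dataclasses import dataclass

# Hypothetical term types; not the API's actual datatypes.
@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str

class AtomDictionary:
    """Interns terms: each distinct term gets exactly one small integer atom."""

    def __init__(self):
        self._atoms = {}   # term -> atom
        self._terms = []   # atom -> term

    def intern(self, term):
        atom = self._atoms.get(term)
        if atom is None:
            # First time we see this term: allocate the next atom.
            atom = len(self._terms)
            self._atoms[term] = atom
            self._terms.append(term)
        return atom

    def term(self, atom):
        return self._terms[atom]

class TripleStore:
    def __init__(self):
        self.atoms = AtomDictionary()
        self.triples = set()

    def add_triple(self, predicate, subject, obj):
        # Stand-in for addTriple($predicate, $subject, $object); all arguments are atoms.
        self.triples.add((predicate, subject, obj))

store = TripleStore()
p = store.atoms.intern(URI("http://example.org/stuff/1.0/hasFruit"))
s = store.atoms.intern(URI("http://example.org/basket"))
o = store.atoms.intern(URI("http://example.org/banana"))
store.add_triple(p, s, o)
```

Because the dictionary key is the whole term object, a URI and a blank node with the same spelling still receive different atoms.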
It seems practical to add the following constructors:
XSLT variables cannot be reassigned, so setting a variable conditionally requires an awkward construction: the template is split into segments, and all of the state is passed as parameters into the next segment. In a terse syntax, this looks roughly like:
# Call typedNode with a predicate and a subject.
call-template typedNode_0(predicate="p1", subject="s1")

# Template for typedNode production.
template typedNode_0 (predicate, subject)
  if (@r:about)
    call-template typedNode_1(predicate, subject, object=uri(@r:about))
  else if (@r:ID)
    call-template typedNode_1(predicate, subject, object=uri(@r:ID, baseUri))
  else if (@r:nodeID)
    call-template typedNode_1(predicate, subject, object=bnode(@r:nodeID))
  else
    call-template typedNode_1(predicate, subject, object=bnode(generate-id(.)))

# Chained typedNode production with object variable set.
template typedNode_1 (predicate, subject, object)
  # Continue the typedNode template with object set.
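To make the control flow concrete, here is a Python transliteration of the two segments. Python could simply assign a local variable; the point is to mirror the XSLT shape, where each branch of typedNode_0 ends in a call that carries every piece of state into typedNode_1. Attributes are assumed to be a dict keyed in ElementTree's `{namespace}local` notation, and the tuple encoding of uri/bnode values and the `emit` callback are assumptions made for illustration.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "{%s}about" % RDF
ID = "{%s}ID" % RDF
NODE_ID = "{%s}nodeID" % RDF
_fresh = itertools.count(1)  # stand-in for generate-id(.)

def typed_node_0(attrs, predicate, subject, base_uri, emit):
    """Segment 0: no assignment; each branch calls segment 1 with all state."""
    if ABOUT in attrs:
        return typed_node_1(attrs, predicate, subject,
                            ("uri", attrs[ABOUT]), base_uri, emit)
    elif ID in attrs:
        return typed_node_1(attrs, predicate, subject,
                            ("uri", base_uri + "#" + attrs[ID]), base_uri, emit)
    elif NODE_ID in attrs:
        return typed_node_1(attrs, predicate, subject,
                            ("bnode", attrs[NODE_ID]), base_uri, emit)
    else:
        return typed_node_1(attrs, predicate, subject,
                            ("bnode", "g%d" % next(_fresh)), base_uri, emit)

def typed_node_1(attrs, predicate, subject, obj, base_uri, emit):
    """Segment 1: the rest of the production, with the object now bound.
    As an example it just records the arc (predicate, subject, object)."""
    emit((predicate, subject, obj))

out = []
typed_node_0({ABOUT: "http://example.org/banana"}, "p1", "s1",
             "http://example.org/doc", out.append)
```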
Making uri, bnode and literal real templates would require a version of each template for every possible set of parameters passed to the next template. Yeah, right. Perhaps it will be easy to implement them as a sort of macro that gets expanded when the XSLT is written.
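One way to read the macro idea, sketched in Python under several assumptions: a hypothetical generator `expand_object_macro` writes the typedNode_0 branches out as a real `xsl:choose`, each branch ending in an `xsl:call-template` that re-passes every state parameter plus `object`. The XPath expressions standing in for uri() and bnode() (a `$baseUri` parameter, and a `_:` string prefix for blank nodes) are placeholders, and the generated fragment assumes the enclosing stylesheet declares the `r` prefix and `$baseUri`.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical macro table: (test, XPath expression that "expands" uri()/bnode()).
OBJECT_BRANCHES = [
    ("@r:about", "@r:about"),
    ("@r:ID", "concat($baseUri, '#', @r:ID)"),
    ("@r:nodeID", "concat('_:', @r:nodeID)"),
]
FALLBACK = "concat('_:', generate-id(.))"  # blank node when no identifying attribute

def expand_object_macro(template_name, state_params):
    """Return an xsl:choose that calls template_name once per branch,
    passing every state parameter plus the computed object."""
    choose = ET.Element("{%s}choose" % XSL)

    def call(parent, object_expr):
        ct = ET.SubElement(parent, "{%s}call-template" % XSL, name=template_name)
        for param in state_params:
            ET.SubElement(ct, "{%s}with-param" % XSL, name=param, select="$" + param)
        ET.SubElement(ct, "{%s}with-param" % XSL, name="object", select=object_expr)

    for test, expr in OBJECT_BRANCHES:
        call(ET.SubElement(choose, "{%s}when" % XSL, test=test), expr)
    call(ET.SubElement(choose, "{%s}otherwise" % XSL), FALLBACK)
    return choose

ET.register_namespace("xsl", XSL)
print(ET.tostring(expand_object_macro("typedNode_1", ["predicate", "subject"]),
                  encoding="unicode"))
```

Every extra piece of state adds one xsl:with-param to every branch, which is tolerable only because a program, not a person, writes it.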
This language is called XQ-like because its syntax is similar to XQuery's, and it even shares some semantics, such as variable assignment and XPath node access. It is intended to use a very small subset of XQuery. That subset currently excludes access to parent and child nodes (apart from the attribute nodes of the current element) so that the SAX event handlers stay simple: a SAX parser delivers an element's attributes together with its start-element event. It will be possible to write handlers that track state for access to XPath nodes during other events, but that seemed like work, so I punted. (Some of this has to be done to distinguish productions that differ in their nested elements.)
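A minimal Python `xml.sax` sketch of why the attribute-only restriction keeps handlers simple: with namespace processing enabled, `startElementNS` receives the element's attributes in the same call, so deciding which node an `rdf:Description` denotes needs no state beyond a counter for fresh blank nodes. Only `rdf:about`, `rdf:nodeID` and the generated-id fallback are handled (`rdf:ID` is left out because it needs a base URI); the `describe` helper and the tuple encoding are illustrative assumptions.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

class DescriptionHandler(xml.sax.ContentHandler):
    """Decides which node each rdf:Description denotes inside a single
    startElementNS event, looking only at that element's attributes."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._fresh = 0

    def startElementNS(self, name, qname, attrs):
        if name != (RDF, "Description"):
            return
        # With namespace processing on, attribute keys are (namespace, local) pairs.
        if (RDF, "about") in attrs:
            self.nodes.append(("uri", attrs[(RDF, "about")]))
        elif (RDF, "nodeID") in attrs:
            self.nodes.append(("bnode", attrs[(RDF, "nodeID")]))
        else:
            # Stand-in for generate-id(.): a fresh blank node label.
            self._fresh += 1
            self.nodes.append(("bnode", "b%d" % self._fresh))

def describe(xml_bytes):
    handler = DescriptionHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.nodes
```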
Generating the collection template is going to be hard. I fear it. I rue the day.
RDF is unordered by default. The parseType="Collection" attribute is used to specify ordered, closed (exhaustively enumerated) collections in RDF. For example,
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <ex:hasFruit rdf:parseType="Collection">
      <rdf:Description rdf:about="http://example.org/banana"/>
      <rdf:Description rdf:about="http://example.org/apple"/>
      <rdf:Description rdf:about="http://example.org/pear"/>
    </ex:hasFruit>
  </rdf:Description>
</rdf:RDF>
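states that the basket has exactly the members (banana, apple, pear), in that order. This is represented by a chain of blank nodes: each has an rdf:first arc to one member and an rdf:rest arc to the next blank node, and the last rdf:rest arc points to rdf:nil. This nil node indicates the end of the list (and keeps anyone from adding to the closed list).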
The tricky part is adding the arc to the nil node as the rdf:rest of the last element. RDFXMLtoRdfXS currently uses the test ./*[$index+1] to find the last element in a collection, and has some conditional code to stitch the earlier elements together. This breaks the easy mapping to SAX handlers: the test looks ahead at a following sibling, which a SAX handler has not yet seen when it processes the current element. I guess this will require some of the state-tracking handlers alluded to in XQ-like.
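A sketch of such a state-tracking handler in Python's `xml.sax`, under simplifying assumptions (one non-nested collection at a time, and both the members and the collection's parent node identified only by `rdf:about`): instead of looking ahead, it remembers the last list cell and writes that cell's `rdf:rest` arc only when the next member's start tag arrives, or writes `rdf:rest rdf:nil` when the collection's end tag arrives. Triples are (predicate, subject, object) tuples to match addTriple's parameter order; the `_:lN` labels and the `collection_triples` helper are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

class CollectionHandler(xml.sax.ContentHandler):
    """Emits (predicate, subject, object) triples for rdf:parseType="Collection",
    deferring each rdf:rest arc until the next member's start tag or the
    collection's end tag, so no lookahead is needed."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._depth = 0
        self._subjects = {}  # depth -> rdf:about of the node element at that depth
        self._coll = None    # [depth, subject, predicate, last list cell or None]
        self._fresh = 0

    def _new_cell(self):
        self._fresh += 1
        return "_:l%d" % self._fresh

    def startElementNS(self, name, qname, attrs):
        self._depth += 1
        about = attrs.get((RDF, "about"))  # simplification: rdf:about only
        coll = self._coll
        if coll is not None and self._depth == coll[0] + 1:
            _, subject, predicate, prev = coll
            cell = self._new_cell()
            if prev is None:  # first member: the property arc points at the list head
                self.triples.append((predicate, subject, cell))
            else:             # a new member proves prev was not the last one
                self.triples.append((RDF + "rest", prev, cell))
            self.triples.append((RDF + "first", cell, about))
            coll[3] = cell
        if attrs.get((RDF, "parseType")) == "Collection":
            predicate = (name[0] or "") + name[1]
            self._coll = [self._depth, self._subjects[self._depth - 1], predicate, None]
        elif about is not None:
            self._subjects[self._depth] = about

    def endElementNS(self, name, qname):
        coll = self._coll
        if coll is not None and self._depth == coll[0]:
            _, subject, predicate, prev = coll
            if prev is None:  # empty collection
                self.triples.append((predicate, subject, RDF + "nil"))
            else:             # the end tag proves prev was the last member
                self.triples.append((RDF + "rest", prev, RDF + "nil"))
            self._coll = None
        self._depth -= 1

def collection_triples(xml_bytes):
    handler = CollectionHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.triples
```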
The hand-coded stylesheet uses a recursive template to walk through children (members of the collection):
if (@parseType = 'Collection')
  addTriple(predicate, subject, bnode(.))
  collection_r(subject, 1)

collection_r(subject, index)
  for-each select="./*[$index]"
    typedNode_0(r:first, subject)
    if (./*[$index+1])
      addTriple(r:rest, subject, bnode(.))
      collection_r(bnode(.), index+1)
    else
      addTriple(r:rest, subject, r:nil)
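For comparison, a Python rendering of the same recursion over an ElementTree element (an illustrative sketch, not the stylesheet itself): since the whole tree is in memory, the `./*[$index+1]` lookahead becomes a simple length check. Unlike the pseudocode, it passes the current list cell down explicitly, uses 0-based indexes, and assumes members carry `rdf:about`; `walk_collection` and the `_:cN` labels are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def walk_collection(prop_elem, predicate, subject):
    """Return (predicate, subject, object) triples for a parseType="Collection"
    property element, recursing once per member like collection_r."""
    fresh = itertools.count(1)
    members = list(prop_elem)  # child elements only
    triples = []

    def collection_r(cell, index):
        member = members[index]
        triples.append((RDF + "first", cell, member.get("{%s}about" % RDF)))
        if index + 1 < len(members):  # the ./*[$index+1] test
            nxt = "_:c%d" % next(fresh)
            triples.append((RDF + "rest", cell, nxt))
            collection_r(nxt, index + 1)
        else:
            triples.append((RDF + "rest", cell, RDF + "nil"))

    if not members:  # empty collection: the property points straight at rdf:nil
        triples.append((predicate, subject, RDF + "nil"))
    else:
        head = "_:c%d" % next(fresh)
        triples.append((predicate, subject, head))
        collection_r(head, 0)
    return triples
```

The recursion depth equals the collection length, so very long collections would hit Python's recursion limit; a loop avoids that.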
So far, I've only tested the hand-generated XSLT on a few RDF tests:
name | input | output | problems |
---|---|---|---|
kitchen sink test | test.rdf | test.ntriple | needs XSLT for c14n |
attribute | testAttr.rdf | testAttr.ntriple | |
literal | literal.rdf | literal.ntriple | |
The current machine-generated XSLT is much more rigorous, though not actually functional. Features:
The bulk of the work is in the RDFXMLtoRdfXS.xsl stylesheet, so versions track its CVS version:
This work started with rdfToDB.xsl, which is no longer being maintained: