SPAT — SPARQL Annotations

Work in progress.
$Revision: 1.10 $ of $Date: 2007/02/10 02:26:46 $
Author:
Eric Prud'hommeaux, W3C <[email protected]>
related technologies: GRDDL · RDAL · SPDL

Abstract

This document describes SPAT, the SPARQL Annotations language, an extension of the SPARQL Triple Pattern. SPAT extends a part of the SPARQL Query Language to include XPaths, in order to facilitate the flow of data between the semantic web and XML.

Status of this Document

This is the work of the author — it is not endorsed by the W3C members.

1. Introduction

Developers of semantic web applications are often confronted with a need to access data in XML. Because this data follows a radically different model, it has been difficult to query using the RDF query language SPARQL [SPARQL]. For the same reason, data from the semantic web is tedious to place in XML documents, generally requiring construction of an XSLT stylesheet to transform the data from either RDF/XML or SPARQL XML Results [RESULTS] to the desired XML schema.

SPARQL Annotations offer a solution: they enable simple associations between RDF terms and information in an XML document by adding XPaths [XPATH] to SPARQL triple patterns. SPARQL Annotations are evaluated in the context of an XML document, allowing the direct transfer of data between that document and the RDF triples described by the triple pattern.

SPARQL Annotations can be viewed as an enabling technology. Any functionality is derived from the applications that use them. In this document, we will assume a simple XSLT-like application with inputs of an RDF graph BooksGraph, and either an XML document BooksDoc or a W3C XML Schema BooksSchema describing some class of documents. A second XML Schema, BooksSchema-SPAT, illustrates SPARQL Annotations in a schema file, and how that affects namespace and XPath resolution.

1.1 Document Conventions

The following table defines the namespace prefixes used in this document and describes their purposes.

prefix  description                                 namespace
spat    Namespace for SPARQL Annotations            http://www.w3.org/2007/01/SPAT/ns
xb      Example namespace for XML Books documents   http://example.org/schemas/XML/Books/
rb      Example Books namespace in RDF              http://example.org/schemas/RDF/Books#

RDF graphs are represented in Turtle, the Terse RDF Triple Language [TURTLE].

2. Example

In order to illustrate SPARQL Annotations, consider an example:

[ rb:name      xpath("/xb:BookList/xb:Book/xb:Title") ;
  rb:publisher xpath("/xb:BookList/xb:Book/xb:Publisher") ] .

This example includes two triple patterns, associating the objects of the triples with information accessible by the XPaths /xb:BookList/xb:Book/xb:Title and /xb:BookList/xb:Book/xb:Publisher. These XPaths match elements in a document described by this schema:

<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified"
 targetNamespace="http://example.org/schemas/XML/Books/"
 xmlns:xb="http://example.org/schemas/XML/Books/">
  <xs:element name="BookList">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="xb:Book"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Publisher" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
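
To make the pairing of predicates and XPaths concrete, the following Python sketch pulls the (predicate, XPath) pairs out of the annotation shown at the start of this section. It is only an illustration: this draft does not define a parsing API, the helper name predicate_xpath_pairs is hypothetical, and the regular expression covers just the annotation shape used in this example, not the full SPARQL triple pattern grammar.

```python
import re

# Hypothetical helper for illustration; the SPAT draft defines no parsing API.
# Matches a prefixed name followed by xpath("...") as used in this document.
PAIR = re.compile(r'([A-Za-z_][\w.-]*:[\w.-]+)\s+xpath\("([^"]*)"\)')


def predicate_xpath_pairs(annotation: str) -> list[tuple[str, str]]:
    """Return the (predicate, xpath) pairs found in a SPAT annotation."""
    return PAIR.findall(annotation)


ANNOTATION = '''
[ rb:name      xpath("/xb:BookList/xb:Book/xb:Title") ;
  rb:publisher xpath("/xb:BookList/xb:Book/xb:Publisher") ] .
'''

pairs = predicate_xpath_pairs(ANNOTATION)
for predicate, xpath in pairs:
    print(predicate, "->", xpath)
```

A real processor would presumably reuse a SPARQL parser rather than a regular expression; the sketch only shows which two pieces of information each triple pattern carries.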

3. Namespace and XPath Resolution

The resolution of XPaths and namespaces [NAMESPACE] is defined by the enabling technology. In our example application, namespaces come from xmlns declarations in the XML Schema document S, and XPaths are resolved against the documents described by S, including the instance document D. For instance, the simple application described in this document resolves the SPAT XPaths against the root of the given XML document (BooksDoc). If it is passed an XML Schema with embedded SPARQL Annotations, it resolves them against the notional node described by that location in the XML Schema. For example, SPARQL Annotations can be added to the Book element described by the above XML Schema:

  <xs:element name="Book"
   spat:SPAT="[ rb:name      xpath(&quot;xb:Title&quot;) ;
                rb:publisher xpath(&quot;xb:Publisher&quot;) ] .">

The namespaces in this SPARQL Annotation are resolved against the xmlns declarations already in the schema, plus the following declarations, added to bind the prefixes that the annotation introduces:

 xmlns:spat="http://www.w3.org/2007/01/SPAT/ns"
 xmlns:rb="http://example.org/schemas/RDF/Books#"

The XPaths are resolved against the corresponding location in the documents described by this schema. Thus, xb:Title is resolved relative to the xb:Book element.
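
The effect of the context node can be sketched in Python with the standard library's ElementTree module. This is an approximation under stated assumptions: ElementTree implements only a small XPath subset (this draft cites XPath 2.0), the NAMESPACES mapping stands in for the xmlns declarations of the schema, and the one-book instance document is a shortened sample made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Prefix bindings, standing in for the xmlns declarations in the schema.
NAMESPACES = {"xb": "http://example.org/schemas/XML/Books/"}

# Shortened sample instance document (an assumption for this sketch).
BOOKS_DOC = """\
<xb:BookList xmlns:xb="http://example.org/schemas/XML/Books/">
  <xb:Book>
    <xb:Title>Hare-brained XML</xb:Title>
    <xb:Publisher>Carrion Brothers</xb:Publisher>
  </xb:Book>
</xb:BookList>
"""

root = ET.fromstring(BOOKS_DOC)

# The annotation sits on the Book element, so each xb:Book is a context node
# and the relative path xb:Title is evaluated from there.
titles = [
    book.findtext("xb:Title", namespaces=NAMESPACES)
    for book in root.findall("xb:Book", NAMESPACES)
]
print(titles)

# Evaluated from the document root instead, the same relative path finds
# nothing, because xb:Title is not a child of xb:BookList.
from_root = root.findtext("xb:Title", namespaces=NAMESPACES)
print(from_root)
```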

4. Transferring XML Information to RDF

A simple SPARQL Annotation can be used to extract data from an XML document and construct triples from a triple pattern. The example SPARQL Annotations can be resolved against a document BooksDoc, conforming to the above schema, to produce the node sets xpath("xb:Title") => {"Hare-brained XML", "Semweb Chicanery"} and xpath("xb:Publisher") => {"Carrion Brothers", "Carrion Brothers"}. When these values are substituted into the triple patterns, we get:

[ rb:name      "Hare-brained XML" ;
  rb:publisher "Carrion Brothers" ] .

[ rb:name      "Semweb Chicanery" ;
  rb:publisher "Carrion Brothers" ] .
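
The substitution step can be sketched as follows in Python. The sketch assumes that values are grouped per xb:Book context node, which is one plausible reading of the example rather than a rule stated in this draft. The function name to_turtle and the list form of the annotation are hypothetical, and ElementTree's limited XPath subset stands in for a full XPath processor.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"xb": "http://example.org/schemas/XML/Books/"}

# Predicate/XPath pairs from the annotation on the Book element.
ANNOTATION = [("rb:name", "xb:Title"), ("rb:publisher", "xb:Publisher")]

BOOKS_DOC = """\
<xb:BookList xmlns:xb="http://example.org/schemas/XML/Books/">
  <xb:Book>
    <xb:Title>Hare-brained XML</xb:Title>
    <xb:Publisher>Carrion Brothers</xb:Publisher>
  </xb:Book>
  <xb:Book>
    <xb:Title>Semweb Chicanery</xb:Title>
    <xb:Publisher>Carrion Brothers</xb:Publisher>
  </xb:Book>
</xb:BookList>
"""


def to_turtle(doc: str) -> str:
    """Emit one Turtle blank node per xb:Book (hypothetical helper)."""
    root = ET.fromstring(doc)
    blocks = []
    for book in root.findall("xb:Book", NAMESPACES):
        lines = []
        for predicate, xpath in ANNOTATION:
            value = book.findtext(xpath, namespaces=NAMESPACES)
            if value is None:
                continue  # no matching node, so no triple for this predicate
            # Escape backslashes and quotes for a Turtle string literal.
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{predicate} "{value}"')
        blocks.append("[ " + " ;\n  ".join(lines) + " ] .")
    return "\n\n".join(blocks)


turtle = to_turtle(BOOKS_DOC)
print(turtle)
```

The output omits the @prefix declaration for rb, which a complete Turtle document would need.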

5. Transferring RDF Information to XML

SPARQL Annotations can be used in conjunction with other knowledge of document structure, such as an XML Schema, to enable a processor to build a document with that structure and populate it with data from an RDF graph. In this example, the graph is BooksGraph:

@prefix rb: <http://example.org/schemas/RDF/Books#> .

[ rb:name      "Hare-brained XML" ;
  rb:publisher "Carrion Brothers" ] .

[ rb:name      "Semweb Chicanery" ;
  rb:publisher "Carrion Brothers" ] .

The XML Schema describes a document with the root <BookList>. The annotation on the element Book provides the necessary information to create a series of two <Book> elements. The document resulting from creating the maximum number of <Book> elements is:

<xb:BookList xmlns:xb="http://example.org/schemas/XML/Books/">
  <xb:Book>
    <xb:Title>Hare-brained XML</xb:Title>
    <xb:Publisher>Carrion Brothers</xb:Publisher>
  </xb:Book>
  <xb:Book>
    <xb:Title>Semweb Chicanery</xb:Title>
    <xb:Publisher>Carrion Brothers</xb:Publisher>
  </xb:Book>
</xb:BookList>
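
The reverse direction can be sketched in the same way. In this Python illustration, a list of dictionaries stands in for BooksGraph (an assumption; a real processor would match the triple patterns against an RDF graph), the element order is copied by hand from the xs:sequence in the schema, and build_book_list is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XB = "http://example.org/schemas/XML/Books/"
ET.register_namespace("xb", XB)  # serialize with the xb prefix

# Stand-in for BooksGraph: one dictionary per blank node in the graph.
BOOKS_GRAPH = [
    {"rb:name": "Hare-brained XML", "rb:publisher": "Carrion Brothers"},
    {"rb:name": "Semweb Chicanery", "rb:publisher": "Carrion Brothers"},
]

# Predicate/element pairs, in the order given by the schema's xs:sequence.
ANNOTATION = [("rb:name", "Title"), ("rb:publisher", "Publisher")]


def build_book_list(graph):
    """Create one xb:Book per graph node (hypothetical helper)."""
    root = ET.Element(f"{{{XB}}}BookList")
    for node in graph:
        book = ET.SubElement(root, f"{{{XB}}}Book")
        for predicate, local_name in ANNOTATION:
            child = ET.SubElement(book, f"{{{XB}}}{local_name}")
            child.text = node[predicate]
    return root


root = build_book_list(BOOKS_GRAPH)
ET.indent(root)  # pretty-print; requires Python 3.9 or later
print(ET.tostring(root, encoding="unicode"))
```

Here the number of Book elements is simply the number of nodes in the stand-in graph, which corresponds to creating the maximum number of Book elements that the data supports.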

A. References

[SPARQL]
SPARQL Query Language for RDF, E. Prud'hommeaux, A. Seaborne, Editors. World Wide Web Consortium. 19 April 2005. Work in progress. This version is http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050419/. The latest version of SPARQL Query Language for RDF is available at http://www.w3.org/TR/rdf-sparql-query/.
[XPATH]
XML Path Language (XPath) 2.0, Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 23 Jan 2007. This version is http://www.w3.org/TR/2007/REC-xpath20-20070123/. The latest version is available at http://www.w3.org/TR/xpath20/.
[TURTLE]
Turtle - Terse RDF Triple Language, Dave Beckett.
[RESULTS]
SPARQL Query Results XML Format, D. Beckett, Editor, W3C Working Draft (work in progress), 27 May 2005, http://www.w3.org/TR/2005/WD-rdf-sparql-XMLres-20050527/. Latest version available at http://www.w3.org/TR/rdf-sparql-XMLres/.
[NAMESPACE]
Namespaces in XML 1.1, T. Bray, A. Layman, D. Hollander, R. Tobin, Editors, W3C Recommendation, 4 February 2004, http://www.w3.org/TR/2004/REC-xml-names11-20040204. Latest version available at http://www.w3.org/TR/xml-names11/.