Copyright © 2004 W3C® ( MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, and document use rules apply.
This document describes a SPARQL extension to optimize federated queries.
This document records experiments by the author. It is not endorsed by the W3C Team or Membership. It is hoped that the work described here will be pertinent to the life sciences work pursued by W3C.
SPARQL queries are not confined by data source boundaries. Queries over distributed data often entail querying one source and using the acquired information to constrain queries of the next source. Without extension, SPARQL engines express this acquired knowledge by rewriting the federated query with bindings produced by earlier queries. This requires the query to be issued repeatedly, once for each putative solution. SPARQLfed bundles an intermediate result set with a SPARQL query, allowing the remote engine to locally join its data against the current constraints.
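The remote-side join mentioned above can be illustrated with a minimal Python sketch. It assumes the remote engine has already matched its own graph pattern and holds the solutions as dictionaries; the function name `join_with_bindings` and the sample data are hypothetical and are not taken from any SPARQLfed implementation. The sketch only filters (a semi-join); a real engine would also have to handle variables left unbound.

```python
def join_with_bindings(solutions, variables, bindings):
    """Keep each local solution that agrees with at least one shipped tuple.

    solutions: list of dicts mapping variable name -> value (remote matches)
    variables: variable names listed in the shipped result set
    bindings:  iterable of value tuples, one value per listed variable
    """
    allowed = {tuple(b) for b in bindings}
    return [s for s in solutions
            if tuple(s.get(v) for v in variables) in allowed]


# Hypothetical remote solutions and a shipped intermediate result set.
remote = [
    {"p": "p1", "name": "name1", "motif": "m1"},
    {"p": "p9", "name": "name9", "motif": "m9"},
]
shipped = [("name1",), ("name2",), ("name3",)]
print(join_with_bindings(remote, ["name"], shipped))
```

Only the solution whose `name` appears in the shipped tuples survives, so the remote engine returns just the rows the federating engine can actually use.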
As an example, consider the five data sources listed in Case Study: FeDeRate for Drug Research. The initial GRAPH query
# Get a name and a chemical from the (SQL) MicroArray database.
GRAPH db:MicroArray.prop {
?g ma:name ?name .
?g ma:expression "up" .
?g ma:experiment ?kinase .
?kinase ma:against ?agin .
?agin cs:chemical ?chemical }
is dispatched on the MicroArray database, producing an intermediate result set:
g | name | kinase | agin | chemical |
---|---|---|---|---|
g1 | name1 | kinase1 | agin1 | chemical1 |
g2 | name2 | kinase2 | agin2 | chemical2 |
g3 | name3 | kinase2 | agin2 | chemical3 |
The next GRAPH query
# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
?p ma:name ?name . # bound to ?ma.ma:name
?p up:motif ?motif .
?p up:pathway "apoptosis" }
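is constrained by the variable ?name (though it could easily be constrained by more of the variables introduced in the previous query). The SPARQL engine may either dispatch the query unconstrained and join the results locally, or dispatch it repeatedly, once for each value (name1, name2, name3) substituted for ?name: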
# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
?p ma:name "name1" . # bound to ?ma.ma:name
?p up:motif ?motif .
?p up:pathway "apoptosis" }
Either approach faces inefficiencies, and the former can be disastrously avaricious, retrieving all of the remote data in a database.
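The cost of the per-value substitution strategy is easy to see in a small Python sketch. The helper names `quote_literal` and `rewrite_per_value` are hypothetical, all values are assumed to be plain string literals, and the substitution is a naive textual replacement rather than a rewrite of a parsed query.

```python
UNIPROT_QUERY = """GRAPH db:Uniprot.rdf {
?p ma:name ?name .
?p up:motif ?motif .
?p up:pathway "apoptosis" }"""


def quote_literal(value):
    """Render a Python string as a double-quoted literal (assumed plain string)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rewrite_per_value(query, variable, values):
    """Return one rewritten query per value, replacing ?variable textually."""
    return [query.replace("?" + variable, quote_literal(v)) for v in values]


queries = rewrite_per_value(UNIPROT_QUERY, "name", ["name1", "name2", "name3"])
print(len(queries))  # number of remote round trips required
print(queries[0])
```

The number of queries grows linearly with the size of the intermediate result set, and each one costs a separate round trip to the remote source.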
The SPARQLfed extension modifies the SPARQL grammar, adding a BindingClause to the WhereClause:
WhereClause ::= ("WHERE")? GroupGraphPattern (BindingClause)?
BindingClause ::= "BINDINGS" (Var)+ "{" (Binding)* "}"
Binding ::= "(" (VarOrTerm)+ ")"
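To make the BindingClause production concrete, here is a minimal Python sketch of a recognizer for it. It is an illustration only: the tokenizer accepts a simplified subset of VarOrTerm (variables, double-quoted literals, IRIs in angle brackets, and prefixed names), it does not handle `#` comments, and the names `tokenize` and `parse_binding_clause` are hypothetical. As in the grammar, nothing here requires a Binding to have as many terms as there are variables.

```python
import re

# Simplified token set; an assumption of this sketch, not the full SPARQL lexer.
TOKEN = re.compile(r'''
    \s*(?:
      (?P<var>[?$][A-Za-z_][A-Za-z0-9_]*)
    | (?P<string>"(?:[^"\\]|\\.)*")
    | (?P<iri><[^<>\s]*>)
    | (?P<punct>[{}()])
    | (?P<word>[A-Za-z_][A-Za-z0-9_.:-]*)
    )''', re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, failing on unrecognized input."""
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected input at offset %d" % pos)
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_binding_clause(text):
    """Parse: "BINDINGS" (Var)+ "{" (Binding)* "}" and return (variables, rows)."""
    tokens = tokenize(text)
    i = 0

    def expect(kind, value):
        nonlocal i
        if i >= len(tokens) or tokens[i][0] != kind or tokens[i][1].upper() != value:
            raise ValueError("expected %r at token %d" % (value, i))
        i += 1

    expect("word", "BINDINGS")
    variables = []
    while i < len(tokens) and tokens[i][0] == "var":      # (Var)+
        variables.append(tokens[i][1])
        i += 1
    if not variables:
        raise ValueError("at least one variable is required")
    expect("punct", "{")
    rows = []
    while i < len(tokens) and tokens[i] == ("punct", "("):  # (Binding)*
        i += 1
        row = []
        while i < len(tokens) and tokens[i][0] != "punct":  # (VarOrTerm)+
            row.append(tokens[i][1])
            i += 1
        if not row:
            raise ValueError("a Binding needs at least one term")
        expect("punct", ")")
        rows.append(tuple(row))
    expect("punct", "}")
    if i != len(tokens):
        raise ValueError("trailing input after BindingClause")
    return variables, rows


print(parse_binding_clause('BINDINGS ?name { ("name1") ("name2") ("name3") }'))
```

The recognizer returns the variable list and the tuples as written, which is the shape a remote engine would need before joining against its own data.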
This enables the query engine to dispatch the above federation in one query:
# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
?p ma:name ?name . # bound to ?ma.ma:name
?p up:motif ?motif .
?p up:pathway "apoptosis" }
BINDINGS ?name {
("name1")
("name2")
("name3") }
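A federating engine would have to generate such a clause from its intermediate result set. The following Python sketch shows one way this might be done; the function names are hypothetical, every value is assumed to be a plain string literal, and dropping duplicate tuples before shipping is a choice made in this sketch rather than something the grammar requires.

```python
def quote_literal(value):
    """Render a Python string as a double-quoted literal (assumed plain string)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bindings_clause(rows, variables):
    """Build a BINDINGS clause for the chosen variables of an intermediate result.

    rows:      list of dicts mapping variable name -> value
    variables: names of the variables to ship to the remote engine
    """
    seen, tuples = set(), []
    for row in rows:
        key = tuple(row[v] for v in variables)
        if key not in seen:  # ship each distinct tuple once, in first-seen order
            seen.add(key)
            tuples.append(key)
    head = "BINDINGS " + " ".join("?" + v for v in variables) + " {"
    body = "\n".join(
        "(" + " ".join(quote_literal(x) for x in t) + ")" for t in tuples
    )
    return head + "\n" + body + " }"


# Intermediate result set from the MicroArray query.
intermediate = [
    {"g": "g1", "name": "name1", "kinase": "kinase1", "agin": "agin1", "chemical": "chemical1"},
    {"g": "g2", "name": "name2", "kinase": "kinase2", "agin": "agin2", "chemical": "chemical2"},
    {"g": "g3", "name": "name3", "kinase": "kinase2", "agin": "agin2", "chemical": "chemical3"},
]
print(bindings_clause(intermediate, ["name"]))
```

Shipping only the variables the remote pattern actually shares keeps the clause small; the federating engine can join the returned rows back against the full intermediate result locally.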
SPARQLfed has been implemented in FeDeRate, and an implementation is underway in the SPASQL MySQL port.