① SPASQL Example · ② SPASQL-MySQL · ③ SPASQL XTech Paper · ③ XTech Slides · ⑤ SPASQL CVS Tree
This is an experiment to add SPARQL support to MySQL. It's based on query translation work currently implemented in OO perl. An example of that translation is describe in a write-up on of an interface to WWW 2005 conference data. If anyone is interested in participating, please contact <[email protected]>.
The results of a SPARQL select are largely expressable in an SQL result set. It makes a preverse kind of sense to make SPARQL queries available to existing MySQL clients by re-using the MySQL wire protocol.
mysqld offers two interfaces, a named pipe (probably something equivilent on Windows, if mysqld runs on Windows), and a socket interface. msyql clients speak a simple line protocol that consists of a one-byte command and parms. In the case of a query, the command is known by the enum COM_QUERY
(3) and looks like (in gdb escaping):
p packet $16 = 0x8b45fe0 "\003SELECT * FROM __Holds__ INNER JOIN __Nodes__ ON __Holds__.p=__Nodes__.id LIMIT 2"
The MySQL command line client, mysql
, passes commands that it does not recognize as queries. For instance,
mysql> SPARQL:SELECT * FROM __Holds__ INNER JOIN __Nodes__ ON __Holds__.p=__Nodes__.id LIMIT 2;
got to the server as
\003SPARQL:SELECT * FROM __Holds__ INNER JOIN __Nodes__ ON __Holds__.p=__Nodes__.id LIMIT 2
The command stack for a successfully parsed query
#1 0x0814acdd in handle_select (thd=0x8b44ff0, lex=0x8b4e0b0, result=0x8b4e2e8) at sql_select.cc:183 #2 0x08129c7d in mysql_execute_command () at sql_parse.cc:1509 #3 0x0812e879 in mysql_parse (thd=0x8b44ff0, inBuf=0x8b45120 "\001", length=146034684) at sql_parse.cc:3080 #4 0x081289c8 in dispatch_command (command=COM_QUERY, thd=0x8b44ff0, packet=0x8b45fe1 "SELECT * FROM __Holds__ INNER JOIN __Nodes__ ON __Holds__.p=__Nodes__.id LIMIT 4", packet_length=146034672) at sql_parse.cc:1113
presents several potential places to intercept SPARQL queries. Options:
\015PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?mbox WHERE { ?a foaf:knows ?b. ?b foaf:mbox ?mbox }
Clean, but no SPARQL queries will be typable in the client mysql
without hacking it as well.
\003SPARQL:PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?mbox WHERE { ?a foaf:knows ?b.
?b foaf:mbox ?mbox }
Allows mysql
users to type queries. Also available to any other client libraries that pass unknown strings as SQL queries.
Seems a little profane.
10.2.3 Adding a New Native Function has some pointers for adding functionality to MySQL. The impedance mismatch between SPARQL and SQL will require more effort by the parser/post-processor than the current parser code in sql_yacc.yy . The query construction will build an internal datastructure similar to that of SqlDB.pm . It will transform that into a set of calls to something like
void join_on(Field *f1, Field *f2) { Item *exp = new Item_func_eq(new Item_field(NullS, f1.tableName, f1.fieldName), new Item_field(NullS, f2.tableName, f2.fieldName)) add_join_on(f2.tableName, exp)
SQL queries don't care about foreign keys or your entity relationship diagram or any of that. It is no more natural to join Person.address ON Address.primaryKey than to join Person.shoeSize ON Address.streetNumber. (Relational calculus cares about domains, but SQL doesn't reflect that, as far as I've seen.) SQL treats foreign keys as integrity constraints at modification time. Consequentially, many databases do not have any hints about how tables/fields are linked.
RDF encourages us to model relationships conceptually. Joe's address is not 21853, but instead the data in the tuple with the primary key 21853. Query tranlation leans heavily on a table of foreign keys. They are detected by structures like ?x p ?KEY . ?KEY p2 ?z
where ?KEY
is the object of one arc and the subject of another. SPASQL introduces a new primary key Item to MySQL which resolves it's field reference after the schema is loaded for the tables.
Unlike SQL LEFT OUTER JOINS, SPARQL's OPIONALs are potentially complex graph patterns, with multiple joins, references to new variables and previously bound variables, and FILTER constraints. The general recipe for this is to keep track of the variables (and therefor, the table attributes) introduced by an OPTIONAL graph pattern. The WHERE constraints need to enforce the FILTER conmstraints only if all the introduced variables are NOT NULL. Despite the fact that the relational engine may have been able to find some bindings for optional variabls, we can only report them if all the variables in the pattern were bound. For example
SELECT ?orderDate ?postCodePos ?districtPos WHERE { ?order <Orders.orderDate> ?date . OPTIONAL { ?order <Orders.shippingAddress> ?address . ?address <Addresses.pattern> ?inf . ?inf <AddressPattern.postCodePosn> ?postCodePos . ?inf <AddressPattern.districtPosn> ?districtPos FILTER ( ?postCodePos != "N/A" && ?districtPos != "N/A" ) } }
translates to
SELECT ?orderDate, if(?address IS NULL || ?inf IS NULL || ?postCodePos IS NULL || ?districtPos IS NULL, NULL, ?postCodePos ) AS ?postCodePos, if(?address IS NULL || ?inf IS NULL || ?postCodePos IS NULL || ?districtPos IS NULL, NULL, ?districtPos ) AS ?districtPos WHERE ( ?address IS NULL || ?inf IS NULL || ?postCodePos IS NULL || ?districtPos || (?postCodePos != "N/A" && ?districtPos != "N/A") )
The remaining problem is that one may get duplicate rows like
?orderDate | ?postCodePos | ?districtPos |
---|---|---|
2002-09-08 00:00:00 | NULL | NULL |
2002-09-08 00:00:00 | NULL | NULL |
if there were, say, two AddressPatterns @@need backward pattern to illustrate properly?@@ for a given address, but some other binding in the OPTIONAL fails.
An example that nests UNIONs and OPTIONALs is essential for working out the GUT (grand unified theory). This example shows what parts will pass through shipping and recieving:
SELECT ?part ?descr ?given ?family ?posn ?rma ?billCity ?billState ?shipCity ?shipState WHERE { { ?order <Orders.product> ?prod . ?order <Orders.customer> ?cust . OPTIONAL { ?order <Orders.shippingAddress> ?shipAddr . ?shipAddr <Addresses.city> ?shipCity . ?shipAddr <Addresses.state> ?shipState } } UNION { ?order <Returns.product> ?prod . ?order <Returns.customer> ?cust . ?order <Returns.RMAnumber> ?rma } ?prod <Products.partNo> ?part . ?prod <Products.description> ?descr . { ?cust <Customers.id> ?num . ?cust <Customers.givenName> ?given . ?cust <Customers.familyName> ?family . ?cust <Customers.billingAddress> ?addr } UNION { ?cust <Employees.id> ?num . ?cust <Employees.givenName> ?given . ?cust <Employees.familyName> ?family . ?cust <Employees.position> ?posn } ?cust <Employees.homeAddress> ?addr . ?addr <Addresses.city> ?billCity . ?addr <Addresses.state> ?billState }
or, factored into predicates that can map to one of two tables:
SELECT ?part ?descr ?given ?family ?posn ?rma ?billCity ?billState ?shipCity ?shipState WHERE { ?order <Orders-Returns.product> ?prod . ?order <Orders-Returns.customer> ?cust . { OPTIONAL { ?order <Orders.shippingAddress> ?shipAddr . ?shipAddr <Addresses.city> ?shipCity . ?shipAddr <Addresses.state> ?shipState } } UNION { ?order <Returns.RMAnumber> ?rma } ?prod <Products.partNo> ?part . ?prod <Products.description> ?descr . ?cust <Customers-Employees.id> ?num . ?cust <Customers-Employees.givenName> ?given . ?cust <Customers-Employees.familyName> ?family . { ?cust <Customers.billingAddress> ?addr } UNION { ?cust <Employees.position> ?posn } ?cust <Employees.homeAddress> ?addr . ?addr <Addresses.city> ?billCity . ?addr <Addresses.state> ?billState }
These correspond to:
SELECT Products_0.partNo AS part, Products_0.description AS descr, U1.given, U1.family, U1.posn,U0.rma, Addresses_1.city AS billCity, Addresses_1.state AS billState, U0.shipCity, U0.shipState FROM → ( → → SELECT¿↔→ → Orders_0.customer AS cust, Orders_0.product AS prod, NULL AS rma, IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.city) AS shipCity, IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.state) AS shipState FROM Orders AS Orders_0 LEFT OUTER JOIN Addresses AS Addresses_0 ON Orders_0.shippingAddress=Addresses_0.id WHERE Orders_0.customer IS NOT NULL AND Orders_0.product IS NOT NULL UNION ALL → → SELECT Returns_0.customer AS cust, Returns_0.product AS prod, Returns_0.RMAnumber AS rma, NULL AS shipCity, NULL AS shipState → FROM Returns AS Returns_0 WHERE Returns_0.customer IS NOT NULL AND Returns_0.product IS NOT NULL AND Returns_0.RMAnumber IS NOT NULL ) → → → → → → → AS U0 → → INNER JOIN Products AS Products_0 ON U0.prod=Products_0.id INNER JOIN ( SELECT Customers_0.id AS num, Customers_0.givenName AS given, Customers_0.familyName AS family, NULL AS posn, Customers_0.billingAddress AS addr FROM Customers AS Customers_0 WHERE Customers_0.id IS NOT NULL AND Customers_0.givenName IS NOT NULL AND Customers_0.familyName IS NOT NULL AND Customers_0.billingAddress IS NOT NULL UNION ALL SELECT Employees_0.id AS num, Employees_0.givenName AS given, Employees_0.familyName AS family, Employees_0.position AS posn, Employees_0.homeAddress AS addr FROM Employees AS Employees_0 WHERE Employees_0.id IS NOT NULL AND Employees_0.givenName IS NOT NULL AND Employees_0.familyName IS NOT NULL AND Employees_0.position IS NOT NULL AND Employees_0.homeAddress IS NOT NULL ) AS U1 ON U1.num=U0.cust INNER JOIN Addresses AS Addresses_1 ON U1.addr=Addresses_1.id WHERE Products_0.partNo IS NOT NULL AND Products_0.description IS NOT NULL AND Addresses_1.city IS NOT NULL AND Addresses_1.state IS NOT NULL;
part | descr | given | family | posn | rma | billCity | billState | shipCity | shipState |
---|---|---|---|---|---|---|---|---|---|
G1013 | pool | Biff | Thompson | EdgeCity | AV | ||||
G0max5 | skateboard | Chip | Thompson | EdgeCity | AV | ||||
G0jwl13 | nose ring | Chip | Thompson | EdgeCity | AV | ||||
G0jpl56 | other ring | Chip | Thompson | EdgeCity | AV | EdgeCity | AV | ||
G1grl5 | grill | Willie | Wonka | mad scientist | 湘南台 | 神奈川 | |||
G0rk5 | rebelious music | Chip | Thompson | fickle2 | EdgeCity | AV |
The SPARQL language supports named graphs
which identify the source of information. SPARQL queries can use named graphs to allow the query to direct portions of the data collection/unification to occur at any data source which supports a query protocol. These sources may be SPARQL endpoints or other databases that support SPARQL or some similarly expressive language. The FeDeRate for Drug Research demonstrates such a query in the pharmaceutical domain.
Practically, the engine doing the federation needs to know what language to use to speak to the remote data source. If the remote is a SPARQL endpoing (something that speaks the SPARQL Protocol, the protocol will be HTTP. If it's another MySQL database, the federating engine can use either SQL or SPARQL over the MySQL protocol.
One way to federate queries is to ask the federated query one time for each row in the current working set, substituting in all the bound variables in each row. This process is described in FeDeRate for Drug Research. This process works for SQL, SPARQL and XQuery (untried, but theoretically true) remote query services.
I have no eartly idea how to implement this in MySQL. Fortunately, the two methods described below allow all of the relevent bindings to be shipped to the remote query service at once.
The SPARQLfed query language allows one to bundle the relevent variables from the result set with a query. Instead of the client asking a query one time per binding, the server can parse the query either construct a temporary table with the passed variable bindings, or iterate over the substitutions locally. Either way, it is quicker than having the client ask the question over and over again.
SQLfed is like SPARQLfed except that the bindings are specifically passed as a temporary table creation, inserts into that table, and a join against that table.
CREATE TABLE T23123 (name VARCHAR(255), planNo INTEGER); INSERT INTO T23123 (name, planNo) VALUES ("bob", 23), ("sue", 18), ("trina", 23); SELECT Person_0.email FROM Person AS Person_0 INNER JOIN HealthPlan AS HealthPlan_0 ON HealthPlan_0.planID=Person_0.healthPlan INNER JOIN T23123 ON T23123.name=Person_0.email AND HealthPlan_0.planID=T23123.planNo; DROP TABLE T23123;
This requires table creation, insert (, select) and table drop privileges; pretty much the keys to the castle.
Apparently, Brian Miezejewski has some code that may be helpful here. Pinging him now.
For a first exercise, the query:
SELECT orderDate FROM Orders WHERE id=2186;
corresponds to:
SELECT ?d WHERE { <Orders.id=2186> <Orders.orderDate> ?d };
It has been handy to set a break point in sql_yacc.cc at switch (yyn)
in order to see all the functions invoked by grammatical actions (see Query Construction) .
I basically copied the Yacker-generated parser into the source tree and added some links to it from mysql_select.
@@@
The sparql directory has all the pieces needed to patch and re-build a MySQL with SPARQL support. One can build from the debian source:
apt-get source mysql-server
apt-get build-dep mysql-server
--with-debug
in debian/rules
(mind the trailing '\'s)mkdir tmp4151 && (tar cf - mysql-dfsg-5.0-5.0.41/ | (cd tmp4151 && tar xf -)) && mv tmp4151/mysql-dfsg-5.0-5.0.41/ mysql-dfsg-5.0-5.0.41-spasql && rmdir tmp4151/
mysql-dfsg-5.0-5.0.41-spasql/
if you did the above step)
cd mysql-dfsg-5.0-5.0.41/
patch -p1 < ~/checkouts/dev.w3.org/2006/spasql/mysql5/mysql+sparql-server-5.0.41.patchSee A short introduction to GNU diff and patch for an explanation of
patch
.fakeroot debian/rules binary
Each time you make a mistake, you can rm configure-stamp
to force autoconf to run again.
or just install a pre-built binary:
MySQL 5.0
is the current distribution branch. The SPASQL-MySQL-5.0 patches statically link in the sparql parser. Queries are made with any MySQL 5.x client like so:
SPARQL: SELECT * WHERE { ?order <Orders.orderDate> ?date }
MySQL 5.1
is the development branch. There are patches for a dynamically-loadable parser, a fixup to use the TCP port defined in configure
invocation, and a patch to use the /etc directory underneath your configured prefix: (dynParser, portFixup, etcFixup)MySQL 5.2
is next intended distribution branch. (dynParser, portFixup, etcFixup)The free bitkeeper client bkf
is described on this page on Installing from the Development Source Tree.
bkf clone bk://mysql.bkbits.net/mysql-5.2 mysql-5.2
wget or browse and Save As a package in the MySQL 5.1 Downloads page
cd mysql-5.2/(or whichever version you obtained)
BUILD/autorun.sh(if it exists, you need it. if not, you don't.) I need to run this twice!
Install whichever patches you've chosen above (you need at least dynParser).
patch -p 0 < dynParser-20070903.1.patch patch -p 0 < portFixup-20070903.1.patch patch -p 0 < etcFixup-20070903.1.patch
./configure --prefix=/usr/local/SPASQL \ --with-unix-socket-path=/usr/local/SPASQL/var/run/mysql.sock \ --localstatedir=/usr/local/SPASQL/var/lib/mysql \ --with-tcp-port=3307 --with-server-suffix=-dynParser \ '--with-comment=with dynamically loadable parsers' --enable-shared \ --enable-static --enable-thread-safe-client --enable-assembler \ --enable-local-infile --with-big-tables --with-mysqld-user=mysql \ --with-libwrap --with-vio --with-ssl --without-docs --with-bench \ --without-readline --with-extra-charsets=all --with-innodb --with-isam \ --with-archive-storage-engine --with-csv-storage-engine \ --with-federated-storage-engine --with-blackhole-storage-engine \ --with-ndbcluster --with-ndb-shm --without-ndb-sci --without-ndb-test \ --with-embedded-server --with-embedded-privilege-control --with-debug \ --with-ndb-docs \ CC=gcc 'CFLAGS=-DBIG_JOINS=1 -O0' CXX=g++ 'CXXFLAGS=-DBIG_JOINS=1 -felide-constructors -fno-rtti -O0' \ CFLAGS='-g -DDBUG_ON -DSAFE_MUTEX -DBIG_JOINS=1 -O0 ' \ CXXFLAGS='-g -DDBUG_ON -DSAFE_MUTEX -DBIG_JOINS=1 -felide-constructors -fno-rtti -O0 -fno-implicit-templates -fno-exceptions -fno-rtti'
Note that this installs your MySQL installation in /usr/local/SPASQL, defaulting to use port 3307. This takes about 5 minutes and lots of output, like this sample output from MySQL 5.2
Building MySQL from the bitkeeper source requires compilation of sql/sql_yacc.yy
, which is problematic with older versions of bison
(even the one in debian's unstable distribution!). You can avoid this problem by copying the appropriate versions of sql_yacc.h
and sql_yacc.cc
: (5.2: sql_yacc.h, sql_yacc.cc) (5.1: sql_yacc.h, sql_yacc.cc) into the sql
directory.
wget -O sql/sql_yacc.h http://dev.w3.org/cvsweb/~checkout~/2006/spasql/mysql5/sql_yacc-5.2.h?rev=HEAD\&content-type=text/x-c%2B%2Bhdr wget -O sql/sql_yacc.cc http://dev.w3.org/cvsweb/~checkout~/2006/spasql/mysql5/sql_yacc-5.2.cc?rev=HEAD\&content-type=text/x-c%2B%2Bhdr
Build and install the the code in /usr/local/SPASQL.
make && make install
This takes around an hour and should give output like this sample output from MySQL 5.2
Check out the SPASQL sources and build libmysql.so:
cvs -d :pserver:[email protected]:/sources/public checkout -d sparql 2006/spasql/mysql5/sparql cd sparql make libsparql.so
Make the directories not created in the make install
:
cd /usr/local/SPASQL/ && mkdir -p var/lib/mysql && mkdir -p var/run
Set up the auth tables
./bin/mysql_install_db --basedir /usr/local/SPASQL --datadir /usr/local/SPASQL/var/lib/mysql
Start mysqld in the background (or in another window):
./libexec/mysqld ↦
Get some data to play with:
wget -O makeOrderTracking.sql http://dev.w3.org/cvsweb/~checkout~/perl/modules/W3C/Rdf/test/makeOrderTracking.sql?rev=HEAD ./bin/mysqladmin -u root create OrderTracking ./bin/mysql -u root OrderTracking < makeOrderTracking.sql
Load the SPARQL parser and make a query:
./bin/mysql -u root OrderTracking SET GLOBAL dynamic_parser2='/tmp/mysql-5.2/sparql/libsparql.so'; parser2: SELECT ?d ?name WHERE { ?o?d . ?o ?c . ?c ?name };
You should see four results to that last query.
The dynamically loadable parser provides a very nice way to develop and update the SPASQL/MySQL implementation (or any other language) without compiling MySQL. It extends the client/server protocol with flags for 3 user parsers in addition to the regular SQL parser. The user and set and query the dynamic parser through the system variables dymanic_parser1
, dymanic_parser2
, dymanic_parser3
. Parsers are loaded by setting the dynamic_parserN
global variable.
SET GLOBAL dynamic_parser2='libsparql.so' Query OK, 0 rows affected (0.00 sec) mysql> show variables like '%parse%'; +-----------------+--------------+ | Variable_name | Value | +-----------------+--------------+ | dynamic_parser1 | | | dynamic_parser2 | libsparql.so | | dynamic_parser3 | | +-----------------+--------------+ 3 rows in set (2.61 sec)
The mysql client has been extended to spot queries starting with "parser1", "parser2" or "parser3" and pass them to the server with the appropriate request identifier.
The dynamic parsers are derived from the class Dynamic_parser
, defined in include/dyn_parser.h
. Each library object has an initialize
function which takes a THD pointer and a pointer to the string to be parsed. (In this implementation, the parser may modify or truncate the string; sql_parse simply deletes it when the parser is done.) initialize
returns a parser object. mysqld
calls the bool parse()
function and continues throgh the rest of the code path as it would have after calling parse_sql
.
API extensions seem likely, however, they do not affect the stability of the rest of MySQL.
$Log: Overview.html,v $ Revision 1.24 2008/02/18 15:37:23 eric + Next Steps Revision 1.23 2008/02/15 03:22:14 eric ~ found 5.2 patches Revision 1.22 2007/09/05 20:01:51 eric ~ reflect updates to SPASQL/MySQL impl (5.1, 5.2) Revision 1.21 2007/06/19 23:06:58 eric + (most of?) Query Federation plan Revision 1.20 2007/06/19 22:12:33 eric + W3C icon Revision 1.19 2007/06/19 22:08:16 eric ~ navigation bar Revision 1.18 2007/05/25 20:18:57 eric + more specific patch instructions Revision 1.17 2006/05/27 13:51:12 eric + started Building on Debian section Revision 1.16 2006/05/15 10:47:54 eric + subsections for the parts flow example Revision 1.15 2006/05/14 19:23:35 eric fix city and state Revision 1.14 2006/05/06 23:20:49 eric ~ fixed parts flow SQL Revision 1.13 2006/05/06 00:47:38 eric added states to the parts flow example Revision 1.12 2006/05/06 00:40:09 eric + details from SQL -> RDF model mapping Revision 1.11 2006/05/01 21:29:24 eric + parts flow example