This document describes the release of non-private WWW2005 data via an RDF SPARQL query. The query is transformed to an SQL query over a normalized conventional database containing the WWW2005 conference data.
This documents experiments by the author. It is not endorsed by the W3C Team or Membership.
There are many benefits to making relational data available on the semantic web. There is a wealth of relational data; database schemas help keep this data accurate semantically consistent; regular access by SQL helps keep the data in gor form for machine processing. This data is the key to much of our infrastructure. Providing access to this data will encourage the adoption of semantic web tools for infrastructure support.
Relational data may easily be copied into RDF by script. The approach taken here was to declaratively describe the WWW2005 conference data and use FeDeRate to transform SPARQL (an RDF query language) queries into SQL queries. Declarative database descriptions are easier to maintain and share amongst applications. FeDeRate also uses these descriptions to provide update access to the authoritative relational data.
Much of our infrastructure depends on machine-processable data in relational databases. In a sense, that some data is in a relational database shows that it was valuable enough to maeke it available to machines. Making this data available on the semantic web encourages a smooth adoption of the semantic web tools into our daily lives.
This demo shows how RDF queries can be translated to SQL queries over a conventional relational database. The query presents an intuitive, schema independent interface to the data. The generated SQL shows that for many practical queries, a simple RDF query mapping to SQL produces queries that are as good as anything hand-coded.
A more terse presentation of this material is available in a WWW2005 talk ([Talk]).
Given non-private conference data availabe in RDF, the user can make the a SPARQL query (HTML demo interface) and learn which talks are presented by "Bing Liu":
name | roomDesc | start | end |
---|---|---|---|
"Visualizing and Comparing Opinions on the Web" | "Room 303" | 10:30 | 12:00 |
"Web Data Extraction Based on Partial Tree Alignmen" | "Room 303" | 10:50 | 12:20 |
PREFIX w5: <http://www.w3.org/2004/10/18-RDF-WWW2005/#> PREFIX xs: <http://www.w3.org/2001/XMLSchema#> SELECT ?name ?roomDesc ?start ?end WHERE { ?author w5:authorName "Bing Liu" . ?author w5:wrote ?paper . ?paper w5:paperName ?name . ?paper w5:inSession ?session . ?session w5:Session_room ?room . ?room w5:Room_description ?roomDesc . ?program w5:Program_session ?session . ?program w5:start ?start . ?program w5:end ?end }
The above query uses a machine-readable description of the database to create the following SQL query:
SELECT STRAIGHT_JOIN Author_0._id AS author__id, Paper_0._id AS paper__id, Paper_0.name AS name_name, Session_0._id AS session__id, Room_0._id AS room__id, Room_0.description AS roomDesc_description, Program_0._id AS program__id, Program_0.start1 AS start_start1, Program_0.end1 AS end_end1 FROM Author AS Author_0 INNER JOIN Paper AS Paper_0 ON Author_0.paper=Paper_0._id INNER JOIN Session AS Session_0 ON Paper_0.session=Session_0._id INNER JOIN Room AS Room_0 ON Session_0.room=Room_0._id INNER JOIN Program AS Program_0 ON Program_0.session=Session_0._id WHERE Author_0.name="Bing Liu" GROUP BY author__id,paper__id,name_name,session__id,room__id,roomDesc_description,program__id,start_start1,end_end1
SQL queries generated from SPARQL queries can be seen by enabling the HTML rendering option processing messages
(see the SQL for the above query).
Most of the database description comes from a properties file describing access to the database and some mappings of RDF properties to database fields:
user: xxx password: yyy database: WWW2005 queryHost: localhost schemaNS: http://www.w3.org/2004/10/18-RDF-WWW2005/# predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#wrote => Author.paper predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#authorName => Author.name predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#paperName => Paper.name predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#paperEmail => Paper.email predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#inSession => Paper.session predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#start => Program.start1 predicateMap: http://www.w3.org/2004/10/18-RDF-WWW2005/#end => Program.end1 auth.group.Paper.email: registered
The top section includes database access information. The schemaNS
specifies the base URI for the row and property identifiers to provide this database with global identifiers. predicateMap
s provide aliases for these properties. They hide the actual field names from the query writer. This is handly because often databases evlove with some rather arcane field names (such as Program.start1
above). The auth
directive is described below in .
This data should be represented in RDF instead of a Java properties file. At that point, it can merge with other system data and the predicateMaps will be simple rules. (This is already accomplished by interface rules, but the predicateMap
s are easier to explain and debug.) Note that wrote
is a reciporical property to dc:creator
. One need only consider how one would like that data to appear in the semantic web to set up predicate map
s.
The only data that was in the relation data that was not available on the website was the email addres contacts for associated with each paper. No query may divulge or even confirm these email addresses. The directive
auth.group.Paper.email: registered
prevents FeDeRate from processing any query involving the Paper.email
field for any user who is not in the registered
group. This group is currently empty, except for the author, but is intended populated by all the authors of the conference papers. Properties could be defined (and code implemented) to make sure that only the author could see the email
field. This will be most useful when the properties file is in RDF.
The predicateMap
s described above are a simple form of interface rule. Interface rules can hide database structure as well as field names. To date, the only work with interface rules was done on the W3C ACLs database. The rules were expressed in Algae2. Here is an example:
interface (sqlDB:W3Cacls API (?sub acl:memberOf ?super) impl (#?g1 sqlDB:idInclusions.id ?sub . ?g1 sqlDB:idInclusions.groupId ?super)) interface (sqlDB:W3Cacls API (?a acl:accessor ?principal) impl (?a sqlDB:acls.id ?principal))
This section uses terms from the SPARQL query language ([SPARQL Query]).
Apart from interface rules, query translation involves a linear iteration over the triple patterns in the graph pattern, linear tree transformations, many lookups in small (order of the number of variables in the graph pattern) hash tables, and linear query serialization. In short, the query preparation is a very small amount of work.
Result processing is linear with the number of results. It involves some URI-generation, and hash lookups. By far, the largest overhead comes from re-serializing the data.
The work to make a relational database available to RDF is very small. One need only provide a properties file defining the database access parameters (host, database, name, password) and add auth
parameters to protect confidential information. For this minimal investment, the publisher offers a new kind of access to the relational data. This may be used to join with other data in other databases.
Additional modeling work in the form of predicateMap
s or interface rules is rewarded by a more intuitive interface to the data. Again, the investment is small compared to the potential benefits. It is the author's hope that this work will encourage the reader to expose more data on the semantic web.
Here are some ideas for future conferences:
The data used for the above query was from a relational database storing the WWW2005 conference data. The schema of this data is described by example with excerpts from the tables involved in the above query:
SELECT * FROM Author WHERE name="Bing Liu";
_id | paper | type | name | org |
---|---|---|---|---|
194 | 516 | author | Bing Liu | University of Illinois at Chicago |
204 | 536 | author | Bing Liu | University of Illinois at Chicago |
SELECT * FROM Paper WHERE _id IN (516,536);
_id | name | name1 | name2 | session | |
---|---|---|---|---|---|
516 | Web Data Extraction Based on Partial Tree Alignmen | Liu | Bing | [email protected] | 5 |
536 | Visualizing and Comparing Opinions on the Web | Liu | Bing | [email protected] | 23 |
SELECT * FROM Session WHERE _id IN (5,23);
_id | name | room | type | title |
---|---|---|---|---|
5 | PP03 | 8 | paper | Data Extraction |
23 | PP12 | 8 | paper | Text Analysis and Extraction |
SELECT * FROM Room WHERE _id IN (8);
_id | _key | name | description |
---|---|---|---|
8 | 303 | 303 | Room 303 |
SELECT * FROM Program WHERE session IN (5,23);
_id | date | start | end | session | start1 | end1 |
---|---|---|---|---|---|---|
5 | 11 | 10:50:00 | 12:20:00 | 5 | 2005-05-11 10:50:00 | 2005-05-11 12:20:00 |
23 | 12 | 10:30:00 | 12:00:00 | 23 | 2005-05-11 10:30:00 | 2005-05-11 12:00:00 |