Workshop Report: Linked Enterprise Data Patterns

Data-driven Applications on the Web

6-7 December 2011, Cambridge, MA
Hosted by W3C/MIT

Executive Summary

Tim Berners-Lee defined "Linked Data" as four principles that provide a very simple framework for publishing data on the World Wide Web. This has led to a remarkable evolution in the domain of publicly available data, commonly known as "Linked Open Data". However, as organizations try to follow these principles in developing a large array of applications, especially outside the domain of Linked Open Data, they face significant challenges that Tim's four principles do not address. This workshop provided a way for the community to meet and discuss some of these challenges and what the W3C might do to address them.

The presentations covered many topics, ranging from the benefits a set of additional conventions would bring to specific technical issues, such as dealing with the reality that URLs sometimes change, the need for a more robust security model, and specific gaps in the current set of standards.

The participants quickly agreed that the current lack of a formal definition of "Linked Data" is detrimental to its uptake in the industry and that the W3C should do something about it. After discussing the possible paths the W3C could take, the participants unanimously concluded that the W3C should create a Working Group to produce a Recommendation that formally defines a "Linked Data Platform". This is expected to be an enumeration of specifications which constitute Linked Data, with some small additional specifications to cover specific functionality such as pagination, if necessary.

IBM indicated that they would submit a specification to the W3C to kick-start the WG and invited parties interested in helping with this in some way to contact them (specifically: Arnaud Le Hors <[email protected]>).

A public mailing list [email protected] (archive) was created to provide a forum to develop a charter for the WG to be proposed. In doing so it will be important to analyze any possible overlaps with the new Government Linked Data Working Group and suggest mitigating measures if appropriate.

Workshop Overview

Format

18 submissions were received, out of which 16 were selected for presentation at the workshop. Unfortunately two persons couldn't make it. 28 people attended the workshop.

The presentations were organized in four general themes:

potential/requirements/APIs
name resolution/resource replication
bridging data formats
dbs, ACLs, $$$

The first day and most of the morning of the second day were allocated to presentations and Q&A, while the last part of the second morning and the afternoon were scheduled for break-out and plenary discussions.

Workshop Day 1

Eric Prud'hommeaux started the day (minutes) with a short presentation on Linked Data and the intent of the workshop. Tim Berners-Lee followed up with a short speech during which he explained that his five star system is really about Linked Open Data and to make that clear the W3C has updated its five-star mug. He finished by stating that he believes that identifying what patterns we use and which ones have not been used due to lack of tools will help broader adoption of Linked Data.

The presentation sessions got kicked off with Martin Nally (minutes) who presented a proposal to better define Linked Data by adding several principles to Tim's set of four and advocating for the definition of some standard way of dealing with common functionality such as containers and pagination of RDF resources. Martin's presentation generated a lot of discussion, primarily in support of Martin's position.

The second presentation by Martynas Jusevicius and Julius Seporaitis (minutes) focused on the way specific Semantic Web technologies are used in the Graphity platform and the need for best practices around the use of RDF and XSLT.

Benjamin Heitmann (minutes) discussed issues regarding noisy data, model mismatch (OO, graph, relational), embedding RDF in web pages, ACLs (FOAF+SSL, RDF Push) and the need for implementation guidelines, libraries, and factories.

Bradley Allen then presented (minutes) on how Linked Data Content Management Systems can provide device-independent homogeneous access to multimedia, fine-grained addressing, shared ontologies, and stressed the need for best practices, globalization/localization, intuitive serializations and deployment design patterns. He advocated for the definition of RDF stores requirements including infrastructure supporting publication (named graphs, prov, access), validators, choosing e.g. APIs vs. SPARQL.

The afternoon started with David Wood doing his best at communicating the points made by Dominik Tomaszuk (minutes), absent at the meeting, and was followed by David Wood discussing a diverted URI pattern (minutes) allowing mirroring and caching of remote linked data by indirections based on a URI mapping.
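As a rough illustration of what indirection based on a URI mapping could look like, the sketch below rewrites a remote URI to a mirrored one using a longest-prefix lookup. It is not taken from the presentation: the function name `divert`, the `MAPPING` table, and all example URIs are hypothetical, and the report does not say how the actual pattern resolves or stores its mappings.

```python
from typing import Dict, Optional


def divert(uri: str, mapping: Dict[str, str]) -> str:
    """Return the mirrored URI for `uri`, or `uri` unchanged if no prefix matches.

    Hypothetical helper: the mapping format is an assumption made for this sketch.
    """
    best: Optional[str] = None
    # Prefer the longest matching prefix so that more specific rules win.
    for prefix in mapping:
        if uri.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return uri  # no mirror known; fall back to the original location
    return mapping[best] + uri[len(best):]


# Hypothetical mapping from remote URI prefixes to local mirror prefixes.
MAPPING = {
    "http://remote.example.org/data/": "http://mirror.example.com/cache/remote/",
    "http://remote.example.org/data/people/": "http://mirror.example.com/cache/people/",
}

if __name__ == "__main__":
    print(divert("http://remote.example.org/data/people/alice", MAPPING))
    print(divert("http://other.example.net/x", MAPPING))
```

A client or proxy could apply such a lookup before dereferencing a URI, so that data is fetched from the mirror while the original URI remains the identifier used in the data itself.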

John Arwe (minutes) talked about the fact that URLs cannot all be "cool" and do change in reality for a variety of reasons, pointing out the current lack of guidance on how to deal with the issues this creates.

Ora Lassila (minutes) talked about conflations of identity and location, versioning of data and identity, and lack of identity stability, pointing out that the existing confusion hinders interoperability.

Cornelia Davis (minutes) kicked off the last theme of the day with a presentation on a pragmatic approach to introducing semantic web technologies in the enterprise which relies on adopting linked data principles within legacy frameworks such as ATOM and XML.

Steve Battle (minutes) then presented a template language to simplify the generation of rich web pages containing metadata in RDFa and Matthew Philips concluded the first day of presentations with a discussion on issues related to federated linked library data including latency as well as license differences for different resources.

Workshop Day 2

Day 2 started with a presentation from David Booth (minutes) on an RDF pipeline providing a framework to generate and update a unified RDF representation of data on the edge.

Eric Prud'hommeaux (minutes) discussed the need for finer access controls and presented a possible solution.

Lee Feigenbaum (minutes) explained how data is segmented in Anzo, highlighting the importance of named graphs and their extensive use of graph names to control access.
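The general idea of using graph names to control access can be sketched as a filter over quads, where each statement carries the name of the graph it belongs to and permissions are attached to graph names rather than to individual statements. This is an illustrative assumption only, not a description of how Anzo works: the `Quad` layout, the `visible_quads` function, and the `ACL` table are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (subject, predicate, object, graph name); a simplified stand-in for RDF quads.
Quad = Tuple[str, str, str, str]


def visible_quads(quads: Iterable[Quad], user: str, acl: Dict[str, Set[str]]) -> List[Quad]:
    """Keep only the quads whose graph name lists `user` as a permitted reader.

    Graphs that are missing from `acl` are treated as unreadable (deny by default).
    """
    return [q for q in quads if user in acl.get(q[3], set())]


# Hypothetical data: two statements stored in two different named graphs.
QUADS: List[Quad] = [
    ("ex:alice", "ex:salary", "100", "ex:graph/hr"),
    ("ex:alice", "ex:name", "Alice", "ex:graph/public"),
]

# Hypothetical access control list: graph name -> users allowed to read it.
ACL: Dict[str, Set[str]] = {
    "ex:graph/hr": {"hr-admin"},
    "ex:graph/public": {"hr-admin", "guest"},
}

if __name__ == "__main__":
    for quad in visible_quads(QUADS, "guest", ACL):
        print(quad)
```

Attaching permissions to graph names keeps the access check to a single lookup per statement, at the cost of requiring data to be segmented into graphs along the same lines as the access policy.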

Finally, David Schaengold (minutes) concluded the presentation section of the workshop with a talk on Revelytix and their use of RIF and several extensions they would like to be included in the standard.

The relatively small audience made it possible to have open discussions throughout the presentations and people largely took advantage of that, engaging in lively discussions throughout the first day and a half. As time went by, areas of specific interest and topics for potential future work were captured on a whiteboard.

The participants then tried to identify which of these topics the breakouts should focus on and who would participate in each of them (minutes). However, it eventually became clear that this operating mode didn't really appeal to the majority and the group decided to proceed as a general plenary instead.

The plenary discussion (minutes) quickly converged on the necessity to address the lack of clear definition of Linked Data and the challenge this creates for those who want to adopt or promote Linked Data.

During the discussions that took place around the presentations given at the workshop several documents were pointed to as possible answers to issues being brought up. The nature of these documents varied greatly, from whitepapers and educational material from external sources to W3C notes and specifications at different stages of development. This actually demonstrated the challenges adopters face today: too many options and lack of an authoritative reference.

The participants agreed that the W3C should develop such a reference.

On the other hand, it took a lot of discussion to work out what type of document the W3C should produce (standard/recommendation, white paper/note, etc.) and in what manner (community group, working group, etc.).

Results

Below are the topics that were on the white board at the end of the presentations:

Best Practices: LD Profile, LD Service Levels, Relative URIs
minimal HTTP+RDF server
Patterns:
- POST to update collections
- pagination
- meta-data-only
- PATCH
validation/constraints
access control
RDF/JSON
RDFizers
GRDDL stylesheet repo

Eventually, the participants unanimously agreed that the W3C should create a Working Group to produce a W3C Recommendation which defines a Linked Data Platform -- something that solves IBM Rational's use case [presented at the workshop by Martin Nally]. [The expectation is for] this to be an enumeration of specs which constitute linked data, with some small additional specs to cover things like pagination, if necessary.

Besides the need for the above recommendation, discussions during the presentations indicated a widely perceived need for outreach and tutorial material, though ultimately the group did not identify any specific follow-up action in this regard. It was noted that a clearer definition of Linked Data should make education and promotion easier and alleviate, at least to some degree, the need for additional educational material.

Participants also discussed the possibility of creating a Community Group as a forum to discuss use cases and requirements. This was, however, left for the Working Group to consider as one possible way to proceed.

Recommended Next Steps

IBM will work on developing a submission that will provide the Working Group with a starting point. Parties interested in having some role in this are invited to contact Arnaud Le Hors.

A WG charter will be developed to be put before the W3C management and membership. In doing so it will be important to analyze any possible overlaps with the new Government Linked Data Working Group and suggest mitigating measures if appropriate.

The public mailing list [email protected] will serve as a forum for discussion of a draft charter.

Acknowledgements

The Linked Enterprise Data Patterns workshop was hosted by W3C/MIT, which provided facilities, and IBM, which provided refreshments and meals for both days of the workshop. The organizers and W3C are grateful for the help of Amy van der Hiel and Eric Prud'hommeaux, who made all this possible by organizing the workshop logistics.

The workshop was co-chaired by Arnaud Le Hors, Ashok Malhotra, and David Wood, assisted by Eric Prud'hommeaux. Thanks to all the participants for their dedication and for volunteering their time to further Linked Data.
