Matching entities across data sources using different identifiers and formats is a pervasive issue on the web. This group revolves around developing a web API that data providers can expose, which eases the reconciliation of third-party data to their own identifiers. OpenRefine's reconciliation API is used as a starting point. Our goals are to document this existing API, share our experiences and lessons learnt from it, propose an improved protocol with a view to promoting it as a standard, and build tooling around it. A description of the existing protocol can be found here: https://reconciliation-api.github.io/specs/latest/
Note: Community Groups are proposed and run by the community. Although W3C hosts these
conversations, the groups do not necessarily represent the views of the W3C Membership or staff.
We have surely missed some more: if so, let us know on the mailing list or during our monthly meetings. May 2024 bring more of those exciting developments!
We are happy to announce that we have released version 0.2 of the specifications. This version adds a range of mostly backwards-compatible features to the original API used by OpenRefine (which was released as version 0.1). Here are the most noticeable changes:
Services can require authentication using a range of methods, taken from the OpenAPI specifications;
Exposing a type hierarchy has been made possible;
Reconciliation candidates can expose individual reconciliation features, for cases where the global matching score is not precise enough;
Reconciling without supplying entity names (using properties only) is now possible.
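To make the last two points more concrete, here is a minimal Python sketch of a client that builds a batch of name-less queries and then ranks candidates by one individual feature rather than by the global score. The field names (`queries`, `properties`, `pid`, `v`, `features`, `id`, `value`) follow our reading of the specifications and should be checked against them; the function names, the property identifier `P214`, the type `Q5` and the feature identifier `name_similarity` are hypothetical and only serve the illustration.

```python
import json
from urllib.parse import urlencode


def build_query_batch(rows, type_id=None, limit=3):
    """Build a batch keyed q0, q1, ... from rows of {property id: value}.

    No "query" (entity name) field is set: each query relies on
    properties only, which is the case described in the last list item.
    """
    batch = {}
    for index, row in enumerate(rows):
        query = {
            "properties": [{"pid": pid, "v": value} for pid, value in row.items()],
            "limit": limit,
        }
        if type_id is not None:
            query["type"] = type_id
        batch[f"q{index}"] = query
    return batch


def encode_form(batch):
    """Serialize the batch as a form-encoded 'queries' parameter."""
    return urlencode({"queries": json.dumps(batch)})


def feature_value(candidate, feature_id, default=0.0):
    """Return the value of one feature of a candidate, or a default if absent."""
    for feature in candidate.get("features", []):
        if feature.get("id") == feature_id:
            return feature.get("value", default)
    return default


def best_by_feature(candidates, feature_id):
    """Pick the candidate with the highest value for the given feature."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: feature_value(c, feature_id))


# Hypothetical data: one row identified by a property value only.
batch = build_query_batch([{"P214": "12345"}], type_id="Q5")
form_body = encode_form(batch)

# Hypothetical candidates, shaped like the entries of a service's result list.
candidates = [
    {"id": "a", "score": 90, "features": [{"id": "name_similarity", "value": 0.2}]},
    {"id": "b", "score": 80, "features": [{"id": "name_similarity", "value": 0.9}]},
]
chosen = best_by_feature(candidates, "name_similarity")
```

In this made-up example, candidate `a` has the higher global score but `b` wins on the chosen feature, which is exactly the kind of situation where exposing individual features helps a client make a better decision.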
After this release, our intention is to rework the structure of the API to make it more compliant with the REST principles. This will result in various incompatible changes but should make it easier to implement reconciliation services in modern web frameworks. As always, feel free to join the discussion on our mailing list, GitHub and in our monthly video meetings.
The Ontotext team has published not just one, but three reconciliation endpoints for subsets of Wikidata: for people, organizations and locations. The endpoints are much faster than other endpoints based on the Wikidata API, thanks to their own indexing of those subsets in Elasticsearch. They explain the architecture of their services in a presentation at the Knowledge Graph Forum.
Brick is a uniform metadata schema for buildings. The goal of the project is to represent subsystems in a building, independently of their vendors, providing a standard for building management systems. It also offers a reconciliation service for its vocabulary.
The OpenRefine team is building a reconciliation service for Wikimedia Commons. The goal is to help the transition of the platform from text-based metadata to structured data built on top of Wikibase. This is accompanied by work on OpenRefine itself to generalize its Wikibase integration.
This hopefully illustrates the diversity of use cases and stakeholders around our protocol. Many other services can be found on the reconciliation test bench. If yours is missing, register it now!
I’ve recently had the opportunity to briefly present our Community Group and what we do in a lightning talk at SWIB20, this year’s iteration of the annual (and this year digital) Semantic Web in Libraries conference (slides, video).
OpenRefine, and in particular its reconciliation feature, are widely used in the library world, where authority files are an established part of traditional cataloging workflows. Early reconciliation data sources for library use cases include FAST, VIAF, and VIVO.
Our Open Infrastructure team at hbz is offering a reconciliation service for the Integrated Authority File (GND). The GND is the main authority file in the German-speaking library field. It contains persons and corporations, subject headings, geographical entities, events, and works. With our reconciliation service, we’re building a bridge from a traditional library dataset to new applications within and outside the library domain, e.g. in the (German-speaking) digital humanities. This complements the general development of the GND in recent years, especially within the GND4C project, of opening up organizational structures, processes, data models, and tooling of the GND to other cultural heritage institutions like archives and museums.
Besides services, the library world is also the source of new clients that interact with services using the reconciliation API. Two of the known clients are from the library domain: AlmaRefine and Cocoda. Managing, identifying, and connecting entities is at the very core of librarianship, making it an ideal field for the goals of our Community Group.
Therefore, I’m very happy to join Antonin as co-chair of our group. I’m looking forward to helping advance and promote our goal of a common protocol for data matching on the Web, both in the library field and beyond.
The reconciliation test bench developed by our Community Group gives an overview of the API features supported by reconciliation endpoints available online. It also lets developers try out their service interactively, helping them improve reconciliation quality and user experience.
Today, lobid announced that their GND reconciliation endpoint now implements the Suggest API, which helps users select entities, properties and types from OpenRefine’s user interface. They report that the test bench was used to plan and test this improvement. We hope this will encourage other services to implement more aspects of the API.
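For readers who have not used the Suggest API, here is a rough Python sketch of what a client does with it: it reads the suggest settings from a service manifest, builds the URL for a prefix typed by the user, and turns the response into labels for an autocomplete widget. The manifest fields (`suggest`, `service_url`, `service_path`), the `prefix` parameter and the `result` list follow our reading of the specifications; the manifest, the URL `https://example.org/reconcile`, the sample response and the function names are invented for the illustration and do not describe lobid's actual service.

```python
from urllib.parse import urlencode


def suggest_url(manifest, kind, prefix):
    """Build a suggest URL for 'entity', 'property' or 'type' from a manifest."""
    if kind not in ("entity", "property", "type"):
        raise ValueError(f"unknown suggest kind: {kind}")
    settings = manifest["suggest"][kind]
    query = urlencode({"prefix": prefix})
    return f"{settings['service_url']}{settings['service_path']}?{query}"


def suggestion_labels(response):
    """Format each suggestion as 'name (id)' for display in a dropdown."""
    return [f"{item['name']} ({item['id']})" for item in response.get("result", [])]


# Hypothetical manifest excerpt advertising entity suggestions.
manifest = {
    "suggest": {
        "entity": {
            "service_url": "https://example.org/reconcile",
            "service_path": "/suggest/entity",
        }
    }
}

url = suggest_url(manifest, "entity", "Goe")

# Hypothetical response, as a service might return it for the prefix above.
sample_response = {"result": [{"id": "x1", "name": "Goethe", "description": "writer"}]}
labels = suggestion_labels(sample_response)
```

Reading the path from the manifest rather than hard-coding it keeps the client independent of how each service lays out its URLs.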
If you want to get involved with improving the test bench, head over to its GitHub repository.
We have started to map the existing environment around entity reconciliation on the Web. Our goal is to get a complete picture of all the data providers, clients, protocols, tools and other resources which are relevant to our community group.
This effort is happening on GitHub: the reconciliation-api/census repository hosts it as a collection of markdown files, which are exposed as a website at https://reconciliation-api.github.io/census/. If you are aware of anything even remotely related to entity matching on the Web, please add it there.
Our charter is still not final – feel free to tweak it. And if you want to get involved in running the group, it would be great to have more chairs.
Matching entities across data sources using different identifiers and formats is a pervasive issue on the web.
This group revolves around developing a web API that data providers can expose, which eases the reconciliation of third-party data to their own identifiers. OpenRefine’s reconciliation API is used as a starting point. Our goals are to document this existing API, share our experiences and lessons learnt from it, propose an improved protocol with a view to promoting it as a standard, and build tooling around it.
A description of the existing protocol can be found here:
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API
This is a community initiative. This group was originally proposed on 2019-06-08 by Antonin Delpeuch. The following people supported its creation: Antonin Delpeuch, Ettore Rizza, Owen Stephens, Juliane Schneider, Ethan Gruber, Thad Guidry, Christina Harlow, Markus Mandalka. W3C’s hosting of this group does not imply endorsement of the activities.