Proposing two new SW Interest Group Task Forces
One of the exciting events of the past few months was the joint announcement of schema.org by three major search engine providers (Google, Yahoo, and Microsoft). It was a major step in recognizing that structured data, embedded in Web pages or otherwise, has a huge role to play on the Web. Put another way: structured data on Web sites is now definitely mainstream.
The role of the schema.org site is twofold. First, it defines a family of vocabularies that search engines "understand"; although these vocabularies are still evolving, they reflect the areas that search engines consider most important for average Web pages. Second, independently of the vocabularies, schema.org also defines the syntax that search engines understand, i.e., how the vocabularies should be embedded in an HTML page. At the moment, schema.org emphasizes the use of microdata.
As with all such important events, the announcement of the schema.org site has generated lots of discussion in the blogosphere, on different mailing lists, on Twitter, and so on. The discussion has crystallized around two technically different sets of issues:
- What is the evolution path of the schema.org vocabularies, and how do they relate to the vocabulary developments around the world that have already brought us widely used vocabularies such as Dublin Core, GoodRelations, FOAF, vCard, and the various microformat vocabularies?
- What is the role of RDFa and microformats for search engines; would search providers also accept RDFa 1.1 or microformats as an alternative encoding of structured data? This also raises the more general issue of how microdata and RDFa relate to one another as W3C specifications, and to microformats, independently of the specific vocabularies.
These issues will be discussed at the upcoming schema.org workshop in Mountain View, CA, on 21 September. They are also within the scope of the Semantic Web Interest Group (SWIG). Accordingly, following a variety of discussions, I am proposing two new SWIG Task Forces to discuss these issues and flesh out solutions. Note that this is also related to a TAG request from June. Assuming the proposals are approved, the two Task Forces will be:
- Web Schemas Task Force, to be chaired by R.V. Guha (Google), concentrating on general vocabulary-related discussions. The Task Force's focus should be on collaboration around vocabularies, mappings between them, and syntax-neutral vocabulary design and tooling. Issues like the convergence of various vocabulary schemas, use cases, tools and techniques, and the documentation of mappings and equivalences between schemas should all be in scope for this Task Force.
- HTML Data Task Force, to be chaired by Jeni Tennison, should conduct a technical analysis of the relationship between RDFa and microdata and of how data expressed in the different formats can be combined by consumers. This Task Force may propose modifications, in the form of bug reports and change proposals on the microdata and/or RDFa specifications, where these would help users easily translate between the two syntaxes or use them together. The Task Force should also work on a general approach for mapping microdata to RDF, as well as for mapping RDFa to microdata JSON.
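To make the microdata-to-RDF question a little more concrete, here is a minimal Python sketch of what such a mapping might look like. It is only an illustration, not the Task Force's mapping: it starts from the microdata JSON shape (`items`, `type`, `id`, `properties`) and assumes that short property names are turned into predicate URIs by appending them to the namespace of the item's first type, as schema.org does. The function names and the blank-node labels are hypothetical.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vocab_namespace(type_uri):
    # Assumption: the vocabulary namespace is the type URI up to and
    # including its last '/' or '#', e.g. "http://schema.org/".
    cut = max(type_uri.rfind("/"), type_uri.rfind("#"))
    return type_uri[: cut + 1]


def item_to_triples(item, triples, bnodes):
    # Use the item's global identifier if present, otherwise a fresh blank node.
    subject = item.get("id") or f"_:b{next(bnodes)}"
    types = item.get("type", [])
    for t in types:
        triples.append((subject, RDF_TYPE, t))
    ns = vocab_namespace(types[0]) if types else ""
    for name, values in item.get("properties", {}).items():
        # Simplification: names containing ':' are treated as absolute URIs;
        # short names are resolved against the vocabulary namespace.
        predicate = name if ":" in name else ns + name
        for value in values:
            # Nested items become their own subjects; other values are kept
            # as plain strings (literals and URIs are not distinguished here).
            if isinstance(value, dict):
                obj = item_to_triples(value, triples, bnodes)
            else:
                obj = value
            triples.append((subject, predicate, obj))
    return subject


def microdata_json_to_triples(doc):
    triples = []
    bnodes = count()
    for item in doc.get("items", []):
        item_to_triples(item, triples, bnodes)
    return triples


# Hypothetical example: a schema.org Person with a nested PostalAddress.
example = {
    "items": [{
        "type": ["http://schema.org/Person"],
        "properties": {
            "name": ["Jane Doe"],
            "address": [{
                "type": ["http://schema.org/PostalAddress"],
                "properties": {"addressLocality": ["Mountain View"]},
            }],
        },
    }]
}

triples = microdata_json_to_triples(example)
for s, p, o in triples:
    print(s, p, o)
```

The interesting design question is exactly the one the sketch glosses over: how a short microdata property name becomes a predicate URI. If that rule yields the same predicates as the equivalent RDFa markup (here, `http://schema.org/name`), consumers get the same RDF regardless of which syntax the page uses.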
Both Task Forces will be public: anyone may join the respective mailing lists or follow the discussions via the public archives.
Everybody is welcome!
The “HTML Data Task Force” link points to the same page as the “Web Schemas Task Force” one.
Thanks, Vasiliy, for the report; the "HTML Data Task Force" link has been updated.
Indeed, thanks Vasiliy, and also thanks to Coralie for changing it... This is what happens when a blog post goes out very early in the morning (which was the case...)
Based on my understanding, Microsoft, Google, and Yahoo are all members of the W3C. I have also taken a quick look at the syntax and vocabulary of schema.org's structured data format.
schema.org's format is very similar to RDFa. Why don't these "members" of the W3C simply support RDFa instead of renaming all the characteristics of RDFa? Is RDFa still a viable option?
I understand that vocabularies can benefit from being decentralized, but is it really necessary to create a "new" syntax when a standardized one is available to do the same job?
In my opinion, this is why the vision of the Semantic Web will not be realized. If the big players in the game, particularly those who are supposed members of the standards organization responsible for bringing that vision to reality, are not supporting the works of the W3C, what is this truly communicating to those who are anxiously waiting to help reach full standardization and semantics of the web?
My main reaction is to the last bit of the description of the HTML Data task force. It is very premature to define a mapping of microdata to RDF/a and doing so prematurely could positively damage the ability to determine meaning on the web. So I would strike "The Task Force should also work on a general approach for the mapping of microdata to RDF, as well as the mapping of RDFa to microdata JSON."
Hi Alan: first of all, there is no mapping of microdata to RDFa, only to RDF. But I must admit I am not sure why defining such a mapping would damage meaning on the Web. Remember that this is a Task Force, i.e., any mapping that the Task Force may define is not a Recommendation. In other words, though it may become the basis for one, it will have to undergo further "testing" by the public in general. On the other hand, if such a mapping is available and is compatible with the RDF that a similar page using RDFa would yield, this ensures interoperability between the two syntaxes, which is a really important goal.
By the way, there are some more details on the workshop in a blog post that has just been published.