TPAC/2011/Semantic Syntaxes

Semantic Syntaxes

  • Proposer: Tantek Çelik
  • Discussion Leader: Ben Adida
  • Type of session: discussion

At the recent schema.org workshop, there was quite a bit of discussion of what syntax to use for adding semantic information to HTML documents from among: microdata, microformats, RDFa.

Ben Adida presented on the evolution of RDFa 1.1 and RDFa 1.1 Lite, and noted how RDFa has based many simplifications on microformats' syntax.

microdata itself has been evolving since it was first proposed, based on use-cases provided by RDFa proponents.

microformats has also been evolving with microformats 2, and most recently is proposing to use the "itemref" innovation of microdata over the previous "include-pattern".

It was clear from the discussion in the room that multiple syntaxes are actively co-evolving and learning from/with each other.

If you're interested in semantic syntaxes (microdata, microformats 2.0, RDFa) this session is for you. Topics:

  • How are syntaxes evolving?
  • What features are syntaxes borrowing from each other?
  • Is there a common (JSON?) data model that syntaxes are converging on?

notes

Scribe: Fabien Gandon (FabGandon), on channel #semsyn of irc://irc.w3.org:6665

FabGandon changed topic to : Semantics and Syntax what syntax to use for adding semantic information to HTML http://www.w3.org/wiki/TPAC2011/Semantic_Syntaxes
FabGandon: Ben Adida opening the session with the history of microformats, the small “s” semantic web.
FabGandon: ... vcard, events, places ...
tantek: please capture notes persistently on the wiki page also: http://www.w3.org/wiki/TPAC2011/Semantic_Syntaxes#notes
FabGandon: ... at the time at Creative Commons, and microformats looked great
FabGandon: ... Ben commenting on http://en.wikipedia.org/wiki/Microformat
FabGandon: ... microformats didn't look like the right solution because remixing was not a use case
FabGandon: ... RDFa was a reaction to this
FabGandon: ... many people think RDFa is only for XHTML
FabGandon: ... (showing RDFa 1.1 Lite markup example)
FabGandon: ... http://manu.sporny.org/2011/rdfa-lite/
FabGandon: ... RDFa 1.1 : profiles, vocab, etc.
JeniT: RDFa 1.1 Lite W3C Editors Draft is at http://www.w3.org/2010/02/rdfa/sources/rdfa-lite/Overview-src.html
FabGandon: ... SearchMonkey 2006 Yahoo product using RDFa to customize search results http://developer.yahoo.com/searchmonkey/
FabGandon: ... really nice pipeline
FabGandon: tantek: Yahoo never actually deployed such an approach in their main search engine
FabGandon: ... only used in Search Monkey
KevinMarks: "energetic discussions with Ian" is my new band name
FabGandon: Ben Adida: Microdata focusing on search engine by Ian Hickson http://www.w3.org/TR/microdata/
FabGandon: ... another syntax where you can plug any vocabulary
tantek: Hixie defined a syntax in microdata, and a licensing vocabulary and also a set of sample vocabularies: vcard in microdata, vevent (from iCalendar)
FabGandon: ... RDFa makes it easy to mix different vocs, Microdata simplifies the way the page is adorned with property-value pairs
FabGandon: ... http://schema.org/ provides a list of vocs using microdata see http://schema.org/docs/schemas.html
FabGandon: ... Tantek made the point at the Schema.org workshop that multiple syntaxes is a good thing.
KevinMarks: also http://www.data-vocabulary.org/ was hixie's original schema home for microdata
FabGandon: Tantek: Timeline: microformats -> RDFa 1.0 -> Microdata -> RDFa 1.1 -> Microformats 2.0 each one building on the return on experience of the previous ones
FabGandon: ... microformat 2.0 http://microformats.org/wiki/microformats-2
FabGandon: ... a surprising piece of feedback is that even microformats are sometimes too complex a syntax
FabGandon: ... microformat 2 providing a list of optimizations to simplify markup (e.g. root classes)
FabGandon: ... avoid hierarchy mechanisms in the mark-up
FabGandon: ... avoid class name collisions by separating the syntax from the vocabularies
FabGandon: ... website updates by different people tend to lead to loss of markup
KevinMarks: that happened to me with Google Profiles - someone stripped out my hCard classes :(
FabGandon: ... so use prefix class names (scribe was lost here)
FabGandon: ... example of microformat 2 : <a class="h-card" href="http://benward.me">Ben Ward</a>
KevinMarks: "h-*" for root class names; "p-*" for simple (text) properties; "u-*" for URL properties, e.g. "u-url", "u-photo", "u-logo"; "dt-*" for datetime properties; "e-*" for properties where the entire contained element hierarchy is the value
FabGandon: Tantek: examples of prefixes h- for root class names, p- for properties, u- for URL properties, dt- for datetime properties, e- for properties, etc.
hober: Can I use different prefixes for the same property in different instances of the same format?
KevinMarks: http://microformats.org/wiki/microformats-2#naming_conventions_for_generic_parsing
KevinMarks: in principle, yes
FabGandon: Noah : question about the simplification of the hierarchy mechanisms.
FabGandon: Tantek: the vocs with the least hierarchy are the ones that got most adopted.
KevinMarks: we forgot the other semantic markup <script> with JSON in...
tantek: and most reliably adopted
tantek: KevinMarks - that's invisible
FabGandon: Ben Adida: RDFa tries to reuse the RDF stack as much as possible.
tantek: and duplicated (violates DRY)
KevinMarks: yes
KevinMarks: we should discuss though, and OGP way too
KevinMarks: to explain why/what
FabGandon: Gregg Kellogg: mapping of microdata to RDF
KevinMarks: also mapping of microformats to RDF via GRDDL
FabGandon: ... the mapping was removed because there is no right answer identified for now.
FabGandon: ... one proposal is to come up with a form of registry of mapping
FabGandon: Phil Archer: working with European commission on developing new vocs
tantek: ... great for microformats and microdata to end up with the same JSON
FabGandon: ... are there use cases to tell me when to use microformat vs. RDFa vs microdata ?
tantek: http://www.w3.org/wiki/Html-data-tf
KevinMarks: Monica should explain OGP
JeniT: PhilA, working on it: http://www.w3.org/wiki/Choosing_an_HTML_Data_Format
FabGandon: Ben Adida: RDFa strongest when you need to mix vocs from different sources and/or when you need the RDF stack
FabGandon: Tantek: the focus on syntax is not important during dev, the important question is the voc.
FabGandon: ... let's move the hard questions to the vocabulary which is the most important for communication
FabGandon: ... if we do our job the syntax won't be the problem.
FabGandon: Monica: at Socialcast we use several syntaxes
KevinMarks: Monica Wilkinson: OGP puts data in the <head> - violates DRY on purpose to avoid designers 'breaking things'
FabGandon: Ian: the syntax also depends on who is the consumer
FabGandon: Key: the problem is also that many people don't know enough about these technologies to even see the syntax problem.
FabGandon: Alex Russell: the problem is also that they don't perceive the added value to do that
FabGandon: ... one way to address that is through web components
FabGandon: ... encapsulate the UI value and the data value at the same time
FabGandon: Ian: microdata is also useful in drag and drop actions for instance
KevinMarks: the rel- microformats have become part of HTML5, per Alex's point
FabGandon: Hadley Beeman: how far are we in separating the syntax and vocs ?
FabGandon: Ben Adida: RDFa did that from the start
KevinMarks: "people didn't know about it" is not a feature
FabGandon: Tantek: if everyone does his own voc you end up with Babel
FabGandon: TimBL: there will always be a small number of small vocs that are extremely widely used, and then a long tail
FabGandon: (no way I can capture TimBL hyper-speech)
tantek: Alex Russell and TimBL having a healthy discussion about vocabularies.
KevinMarks: timbl: vocabularies have a fractal nature - we should not build just for the big head or long tail of vocabularies
KevinMarks: timbl: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
KevinMarks: timbl: if I put the data on many websites I should be able to reconstitute the database table without loss
FabGandon: Alex Russell: more and more data are ending up in javascript and we need to get that back into declarative format.
FabGandon: ... meaning drifts with the updates of the system
FabGandon: ... I have hopes for slang not for fixed vocs.
FabGandon: Tantek: the immediate UI experience is the best way to have high quality data
KevinMarks: tantek: first person benefits are the greatest path to high data quality. Add to addressbook link meant that data was much better
FabGandon: ... schema.org diverges from existing vocs and that’s a mistake.
FabGandon: Noah: couldn't we use the validators to promote common practices
FabGandon: ... alert people on what is going on
FabGandon: Eric Franzon: what is the state of tool dev? plugins, etc.
FabGandon: Ben Adida: several CMSs include support, e.g. microformats in WordPress, RDFa in Drupal, etc.
FabGandon: Ben Adida: partial reuse should be supported, it's better than nothing.
KevinMarks: look at the #semsyn tag on twitter too FabGandon
FabGandon: Gregg Kellogg: one of the problems is the lack of indirections
FabGandon: Ben Adida: Web dev and Vocab dev are two different communities.
FabGandon: TimBL: RDF engines should be able to do the follow your nose on the voc mapping.
FabGandon: ... validator may be too far away but browsers have the ability to show the sources
FabGandon: ... this could be where we could have the "view data" and be able to correct any problem before copy-paste
FabGandon: Alex: who's putting the data in the page in the first place.
gkellogg_: In RDFa, @vocab allows for a form of indirection
FabGandon: ... think of it in evolutionary terms,
tantek: Steve Zilles on web developers vs. scripts putting markup on the web.
FabGandon: Steve Zilles: web devs are not the only ones to put the data in the pages, scripts do too
FabGandon: Tantek: there is always a human, a human created and maintained the script.
FabGandon: Steve Zilles: yes but it comes from a DB with a schema.
tantek: Our experience (back at Technorati) was that even data from databases (DB) rots over time. Up to 30-40% of RSS/Atom feeds were broken / inconsistent with the *visible* HTML pages.
tantek: Databases / scripts are not long term.
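
Illustration (not part of the session log): the microformats 2 prefix conventions noted above (h-, p-, u-, dt-, e-) are what make a generic parser possible, since a class name can be classified without knowing the vocabulary. The following is a minimal sketch of that idea only, assuming a plain class attribute string as input. The names PREFIX_KINDS and classify_class_names are hypothetical, and the real parsing rules on the microformats wiki are stricter than this.

```python
# Prefix-to-kind table, taken from the naming conventions quoted in the notes.
# The labels on the right are this sketch's own wording, not spec terms.
PREFIX_KINDS = {
    "h": "root",
    "p": "text property",
    "u": "URL property",
    "dt": "datetime property",
    "e": "embedded markup property",
}


def classify_class_names(class_attr):
    """Label each prefixed name found in a class attribute value.

    Returns a list of (class_name, kind, vocabulary_name) tuples.
    Names without a recognised prefix are skipped.
    """
    results = []
    for name in class_attr.split():
        # Split on the first hyphen only: "dt-start" -> ("dt", "-", "start").
        prefix, sep, rest = name.partition("-")
        if sep and rest and prefix in PREFIX_KINDS:
            results.append((name, PREFIX_KINDS[prefix], rest))
    return results


if __name__ == "__main__":
    for entry in classify_class_names("h-card u-url p-name nav-item"):
        print(entry)
```

In this sketch an unprefixed or unknown-prefix name such as "nav-item" is simply ignored, which reflects the point made in the session that the syntax is kept separate from the vocabularies.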


Twitter Archive Dump

fabien_gandon	02/11/2011	19:22	breakout session #semsyn at #w3c #tpac what syntax to use to add semantic information to HTML http://t.co/cOLcNHwy
JeniT	02/11/2011	19:23	@fabien_gandon Are you going to live-tweet? #semsyn?
kevinmarks	02/11/2011	19:24	#tpac #semsyn @benadida is explaining #microformats history - the lower case semantic web http://t.co/OK2uwNaV
fabien_gandon	02/11/2011	19:26	@JeniT on irc.w3.org:6665 channel #semsyn
kevinmarks	02/11/2011	19:28	#tpac #semsyn claims @benadida remixing fields from other schemas was not a #microformats goal
hadleybeeman	02/11/2011	19:32	RDFa lite 1.1 - W3C Editor's Draft 30 October 2011, via @jeniT http://t.co/FKCteMBR #linkeddata #semsyn #TPAC
kevinmarks	02/11/2011	19:35	energetic discussions with Ian is my new band name #tpac #semsyn
eyeonprofit	02/11/2011	19:36	RT @kevinmarks: "energetic discussions with Ian" is my new band name #tpac #semsyn
bsletten	02/11/2011	19:40	RT @kevinmarks: #tpac #semsyn @benadida where #microformats, RDFa, microdata agree is on using the actual contents of the page as data (the DRY principle)
kevinmarks	02/11/2011	19:40	#tpac #semsyn @benadida where #microformats, RDFa, microdata agree is on using the actual contents of the page as data (the DRY principle)
kevinmarks	02/11/2011	19:43	#tpac #semsyn @t #microformats RDFa and microdata have all been devloped int he open, which shows that open specification works
kevinmarks	02/11/2011	19:44	#tpac #semsyn @t now explaining the http://t.co/T8obHriv - now simpler and more coherent. washes brighter.
ciberch	02/11/2011	19:44	RT @kevinmarks: #tpac #semsyn @t #microformats RDFa and microdata have all been devloped int he open, which shows that open specification works
kevinmarks	02/11/2011	19:46	#tpac #semsyn @t: every social networking site has a name, photo and URL per person, so we can assume p-name u-url and u-photo for h-card
kevinmarks	02/11/2011	19:47	#tpac #semsyn @t: the more complex and hierarchical the syntax is, the more it reduces data quality (per Guha)
kevinmarks	02/11/2011	19:48	#tpac #semsyn @t there was no way to write a generic #microformats parser - with http://t.co/T8obHriv this is possible
kevinmarks	02/11/2011	20:01	#tpac #semsyn @benadida RDFa is at its best when you want to mix already-existing vocabularies without seeking consensus or need RDF stack
kevinmarks	02/11/2011	20:02	#tpac #semsyn @t the right thing to do is develop an open vocabulary first, then worry about the syntactic mapping to #microformats et al
kevinmarks	02/11/2011	20:06	#tpac #semsyn @t the vocabulary is about agreement; people stripping out code is a syntax issue
kevinmarks	02/11/2011	20:08	#tpac #semsyn Alex Russell:we get to a point where the search engine pipeline and the end-user are seeing different things on the page
kevinmarks	02/11/2011	20:08	#tpac #semsyn Alex Russell: when you mark up with #microfromats et al you aren't directly addressing the primary user of your page
kevinmarks	02/11/2011	20:10	#tpac #semsyn @slightlylate: we should treat these syntaxes as things that should be in HTML eventually and become first class
kevinmarks	02/11/2011	20:10	#tpac #semsyn @slightlylate: data we mark up is probabalistically semantic - not first-person semantic
kevinmarks	02/11/2011	20:14	#tpac #semsyn @timberners_lee vocabularies have a fractal nature - we should not build just for the big head or long tail of vocabularies
kevinmarks	02/11/2011	20:15	#tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
kevinmarks	02/11/2011	20:15	#tpac #semsyn @timberners_lee: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
kevinmarks	02/11/2011	20:16	#tpac #semsyn @timberners_lee: if I put the data on many websites I should be able to reconstitute the database table without loss
LogicalB0T	02/11/2011	20:16	Fascinating. RT @kevinmarks - #tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
kevinmarks	02/11/2011	20:18	#tpac #semsyn @slightlylate: I see more and more data in JSON on the web, and if we want a declarative form people make a second version
kevinmarks	02/11/2011	20:19	#tpac #semsyn @timberners_lee: data cleanliness is always a problem
kevinmarks	02/11/2011	20:20	#tpac #semsyn @slightlylate: meaning drifts over time - we're not going to get there by defining ontologies ahead of time
kevinmarks	02/11/2011	20:21	#tpac #semsyn @t: first person benefits are the greatest path to high data quality. Add to addressbook link meant that data was much better
kevinmarks	02/11/2011	20:22	#tpac #semsyn @t: if you're making up semantics for the sake of it, it will rot. 'you might someday look nicer in a search engine' !enough
kevinmarks	02/11/2011	20:23	#tpac #semsyn @t: RFC 6350 - vcard4 drew on Portable Contacts, hCard experience. http://t.co/K9aXzf9R Person ignored this
MartijnLinssen	02/11/2011	20:24	@kevinmarks With all due disrespect, W3C is a tech-fest run by nerds. We need business standards #tpac #semsyn
ciberch	02/11/2011	20:24	What are the main use cases for #semsyn (micro formats, microdata, RDFa), stream publishing ? ala #facebook
kevinmarks	02/11/2011	20:24	#tpac #semsyn @t: http://t.co/K9aXzf9R diverged from every existing vocabulary arbitrarily. and made things worse.
ciberch	02/11/2011	20:25	Or HTML APIs ? #semsyn
kevinmarks	02/11/2011	20:29	#tpac #semsyn @ciberch: having HTML APIs that make sense of the data on the page will drive this (see http://t.co/CKSXwVML )
tonyfish	02/11/2011	20:30	RT @kevinmarks: #tpac #semsyn @timberners_lee: data cleanliness is always a problem
tonyfish	02/11/2011	20:30	RT @kevinmarks: #tpac #semsyn @timberners_lee: if I put the data on many websites I should be able to reconstitute the database table without loss
tonyfish	02/11/2011	20:30	RT @kevinmarks: #tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
tonyfish	02/11/2011	20:31	RT @kevinmarks: #tpac #semsyn @timberners_lee: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
kevinmarks	02/11/2011	20:32	#tpac #semsyn @t: as soon as you say indirection or subclass, you've lost most web developers @benadida: save pain for vocab developers
kevinmarks	02/11/2011	20:33	#tpac #semsyn @timberners_lee: I like what python does - from foaf import date - can bring in namespace pieces from elsewhere
kevinmarks	02/11/2011	20:34	#tpac #semsyn @timberners_lee: just as a browser has view source - we should have view data too
kevinmarks	02/11/2011	20:34	#tpac #semsyn http://t.co/wKzNTpjI enables bringing in a vocabulary to define keys
kevinmarks	02/11/2011	20:35	#tpac #semsyn @slightlylate: you view source on something to work out how it was done and borrow it for your own site.
ciberch	02/11/2011	20:39	@kevinmarks yup ideally we will move away from js apis that build iframes to pull html markup for a widget #semsyn
hadleybeeman	02/11/2011	20:41	Very good session on semantic syntaxes: RDFa, microformats and microdata run by @benadida & @t. #semsyn #TPAC

related