The mission of this community group is to establish a draft standard for an RDF-based representation of the HTML vocabulary. With the HTML vocabulary in RDF, any type of HTML document can be meaningfully represented, generated and validated using nothing but standard semantic technologies, without any vendor lock-in. In addition, full provenance can be provided for a generated HTML document, as every atom of the document can be described and semantically enriched, ex ante (RDF) and ex post (RDFa). For instance, the originating algorithm that calculates a certain budget amount in a governmental HTML document can be linked to the table cell containing that very value. HTML documents have a wide variety of uses, and so does the HTML vocabulary. It can be used to generate 100% correct HTML or XHTML and to validate the result. It can also be used to model the front end of a website or application, whereas the logic behind the front end can be captured in SHACL Advanced Features, making for a full semantic representation and execution of digital infrastructure. An HTML document can be generated in full compliance with laws and regulations, as these norms can be linked and applied while using the HTML vocabulary. With full provenance, an HTML document can help battle fake news and show in real time how certain sensitive data in the document (privacy, security) was derived.
The community group will come up with a 0.1 draft specification. This will be input for a future working group within W3C. The community group can make use of the currently available draft specification as developed by the Dutch Ministry of Finance in a working prototype for the Dutch governmental budget cycle. By starting this community group, the Dutch Ministry wants to contribute to an open source based digital infrastructure.
Note: Community Groups are proposed and run by the community. Although W3C hosts these
conversations, the groups do not necessarily represent the views of the W3C Membership or staff.
Dear members of the HTML Vocabulary Community Group,
We are pleased to invite you to our next meeting, scheduled for Wednesday, 11th December, from 19:00–21:00 CET (Central European Time, UTC+1; 13:00–15:00 EST).
Agenda:
Presentation of the Draft Report
We will present a draft report of the RDF-based HTML vocabulary, so that you can give feedback on it. The current status can be found in the GitHub repository. In the Specification folder, you will find not only the Turtle serialization of the vocabulary but also an HTML file containing the current version of the draft report. The vocabulary itself is ready and functioning! This is demonstrated by two things:
(1) the use of the latest version of the HTML vocabulary within the Dutch Ministry of Finance to generate the budget tables for the formal budget bills;
(2) the draft report that is generated by the HTML vocabulary itself, through the open source tool OntoReSpec.
We are still finishing some details of the draft report, such as a diagram of the vocabulary and some paragraphs on its structure, but most of the report is done. Both the vocabulary and the report are open for your feedback. Comments, criticisms and compliments are all welcome 🙂
Discussion on a Showcase Event
We will explore organizing a showcase event for a broader audience. This open-invite event aims to share the draft report and vocabulary, raising awareness of the existence of the vocabulary, its meaning and methodology. In addition, we aim to gather further valuable feedback.
How to proceed from here
We will also discuss how to proceed from this point. What do we still want and need to achieve as a community group? How can we start a formal W3C working group trajectory? Which open points of the vocabulary are there to address within the current community group and which ones in a W3C working group?
Questions and comments
Please mark your calendars! A separate calendar invite will be sent. We look forward to your participation and insights as we progress with this important work. Only with your presence can we take the next step in helping organisations and individuals get a better grasp of their information products.
Best regards,
Flores Bakker
HTML Vocabulary Community Group Chair
Exciting news on our end: a first version of a draft report for the community group regarding the HTML vocabulary is now available on GitHub. It specifies the HTML vocabulary in a readable format, using the ReSpec standard. Here one can see a preview of the draft report for the W3C community group:
In this document, one can read the specification of classes, properties, shapes and the like to get an understanding of the HTML vocabulary.
The document was actually generated using OntoReSpec, an open source tool based on semantic web technology. Here comes the twist: OntoReSpec uses the HTML vocabulary to generate the very HTML document we are looking at. The HTML vocabulary proves that it works by generating its own specification.
The HTML vocabulary has changed somewhat since the last time we sent out an update. This is not so much a change of course as a fine-tuning to align as closely as possible with the Living Standard of HTML itself.
Consider for example the algorithm to serialize HTML fragments, modeled through the SHACL node shape shp:HTMLFragmentSerializationAlgorithm. This is the engine of the vocabulary that does all the work.
It now calls several SPARQL functions in order to serialize an HTML document from its leaf nodes up to the root element and the document containing it. This allowed us to remove four unnecessary node shapes and to improve the efficiency and readability of the vocabulary.
The current version of the HTML vocabulary will now undergo acceptance testing within the Dutch Ministry of Finance; in addition, it has already been proven to work in an OntoReSpec implementation, as mentioned above.
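The leaf-to-root idea can be illustrated outside of SHACL and SPARQL. The following is only a minimal Python sketch of bottom-up serialization and not the algorithm in shp:HTMLFragmentSerializationAlgorithm: the classes Text and Element, the functions serialize and serialize_document, and the short list of void elements are all hypothetical stand-ins chosen for this illustration.

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical in-memory stand-ins for RDF nodes; not terms of the vocabulary.
@dataclass
class Text:
    value: str

@dataclass
class Element:
    tag: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# Illustrative subset of elements that have no end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

def serialize(node):
    """Serialize the children first, then wrap the result in the parent's tags."""
    if isinstance(node, Text):
        return escape(node.value, quote=False)
    attrs = "".join(
        f' {name}="{escape(value)}"' for name, value in node.attributes.items()
    )
    if node.tag in VOID_ELEMENTS:
        return f"<{node.tag}{attrs}>"
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"

def serialize_document(root):
    """Wrap the serialized root element in the containing document."""
    return "<!DOCTYPE html>" + serialize(root)

# A tiny budget-table-like document, built from the leaf upwards.
cell = Element("td", {"class": "amount"}, [Text("1 < 2")])
table = Element("table", {}, [Element("tr", {}, [cell])])
page = Element("html", {}, [Element("body", {}, [table])])
result = serialize_document(page)
print(result)
```

Each call returns the finished string for its own subtree, so a parent only has to concatenate what its children hand back. That is the same direction of travel as described above, from the leaf nodes up to the root element and the document.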
The concepts and methodology behind the HTML vocabulary, basically a form of abstract syntax tree, are generic and can be applied outside the HTML domain. We came up with an XML vocabulary with which XML dialects can be modeled. See, for instance, the draft versions of OntoSVG and OntoArchimate. Going even further, we can now also apply this to domains such as English, Python and SPARQL itself. For now these are just inspiring attempts, but they can be developed more seriously in the future, just as we did with the HTML vocabulary.
Are we done yet with the HTML vocabulary? No. There are still some issues, although minor ones. We still want to fine-tune the documentation and enrich some definitions, perhaps improving OntoReSpec along the way as well. The ReSpec document contains two oddities at this moment:
(1) an unnecessary warning about 10 duplicate definitions, caused by classes and properties whose names are identical except for an upper case letter. Think of html:abbr (property) versus html:Abbr (class). ReSpec does not like this. I have raised an issue on the GitHub for ReSpec, as I do not know how to handle this in ReSpec;
(2) the “Latest editor’s draft:” link in the document refers to a non-existent GitHub page, and I do not know where this link comes from. I have added this to another issue on the GitHub for ReSpec.
Finally, and most importantly, we wish to add a rudimentary validation model of the structure of an HTML document.
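To see which terms would trigger the duplicate-definition warning, one can group names that differ only in letter case. The following Python sketch is merely an illustration and says nothing about how ReSpec detects duplicates internally; the function case_collisions is hypothetical, and apart from html:abbr and html:Abbr the sample terms are made up for the example.

```python
from collections import defaultdict

def case_collisions(terms):
    """Return groups of terms that are identical apart from letter case."""
    groups = defaultdict(set)
    for term in terms:
        groups[term.lower()].add(term)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

# Only html:abbr / html:Abbr is taken from the text; the rest is hypothetical.
terms = ["html:abbr", "html:Abbr", "html:table", "html:Table", "html:href"]
collisions = case_collisions(terms)
for group in collisions:
    print(" vs ".join(group))
```

Such a list could be attached to the ReSpec issue to show exactly which class and property pairs are affected.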
All in all it looks like we are going to deliver our community group draft report by the end of the year. We can organize a meeting around that time and see what we as a group still want to do with the specification and how and when we can move forward to a formal working group trajectory.
We’ll keep you updated. In the meantime, feel free to read the specification, both the HTML document and the Turtle file, and let us know what you think. Are you satisfied with the current state of the HTML vocabulary? Do you have specific wishes to improve the ReSpec documentation? Critical complaints and constructive criticisms are welcome, as are charming compliments 🙂
Kind regards,
Flores Bakker
Chair of the HTML vocabulary community group
Enterprise Architect @ Dutch Ministry of Finance
Welcome to the third session of the semantic HTML vocabulary community group! It has been a long time, but not without interesting developments, and we look forward to sharing these with you. We are also going to ask you to get involved by simply using the vocabulary and related tooling, and sharing your experiences with us. Eternal glory upon those who have done so before this meeting 😉
1. Opening of the meeting
2. Welcome by the chair
3. Introduction of new members
4. Housekeeping (secretary/notes)
5. Recent developments
   * Tooling (RDF2HTML and HTML2RDF)
   * HTML vocabulary in use at the Dutch Ministry of Finance
   * Updated vocabulary (HTML attributes, logic, labeling, general hygiene)
   * OntoReSpec, OntoManchester, OntoMermaid
6. Your experience in using the vocabulary and tooling
7. Draft report for the community group
8. Planning/roadmap
9. Questions and suggestions
10. Closing of the meeting
The past year has gone by without any news… despite there being many developments. My apologies for that! It was mainly because those very developments took all my time. On GitHub you will find the latest commits, including new tools such as RDF2HTML and HTML2RDF. The former serializes an RDF model of an HTML document into HTML, and the latter parses any HTML document and expresses it in RDF.
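To give a feel for the HTML-to-RDF direction, here is a minimal Python sketch that walks an HTML fragment with the standard library parser and collects triple-like tuples. It is not the HTML2RDF tool and does not use the actual vocabulary: the class TripleCollector, the ex: prefix and the predicates ex:hasChild and ex:text are hypothetical, and the order of child nodes is not recorded.

```python
from html.parser import HTMLParser

# Illustrative subset of elements that have no end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

class TripleCollector(HTMLParser):
    """Collects (subject, predicate, object) tuples while parsing HTML."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.stack = []    # identifiers of the currently open elements
        self.counter = 0

    def _new_node(self):
        self.counter += 1
        return f"_:n{self.counter}"

    def handle_starttag(self, tag, attrs):
        node = self._new_node()
        # Hypothetical class naming: element name with a leading capital.
        self.triples.append((node, "rdf:type", f"ex:{tag.capitalize()}"))
        if self.stack:
            self.triples.append((self.stack[-1], "ex:hasChild", node))
        for name, value in attrs:
            self.triples.append((node, f"ex:{name}", value))
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip() and self.stack:
            self.triples.append((self.stack[-1], "ex:text", data.strip()))

collector = TripleCollector()
collector.feed('<table><tr><td class="amount">42</td></tr></table>')
collector.close()
triples = collector.triples
for triple in triples:
    print(triple)
```

A real conversion would emit proper RDF terms and preserve document order; the sketch only shows how the nesting of elements maps onto parent-child statements.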
Second, the HTML vocabulary has now been officially used in the budget tables of the Ministry of Finance of the Netherlands, as of the 19th of September 2023, when the tables were presented to the parliament of the Netherlands. That is quite a start for a vocabulary!
Third, the model has been updated, mostly with metadata, with the occasional small adjustment in the logic here and there, as we had to find a way to work around some bugs in pySHACL and RDFLib without hurting our standard vocabulary too much. It now works with those tools as well, although this has to be tested and improved much more… here lies a task for you 🙂 In addition, I still want to add more standard HTML attributes based on the Living Standard (many have already been added), rename the defined HTML elements (skos:prefLabel) so that their labels no longer contain the pesky ‘<’ and ‘>’ characters, as these too often lead to unintended HTML rendering in other applications that handle our vocabulary, and do some general hygiene in layout and labeling.
Fourth, this year saw the birth of an early version of OntoReSpec, the ontology specification generator according to the ReSpec standard. You offer your ontology and the tool creates an HTML document in which your ontology is nicely presented. OntoReSpec is based on the semantic HTML vocabulary and is its second use case. There is no more need to laboriously write documentation by hand to describe your ontology; instead, the documentation is generated from the ontology itself. OntoReSpec is still in development as of now.
Fifth, related to OntoReSpec, I also had to come up with a Manchester Syntax repository in order to represent OWL ontologies in an accessible language instead of very technical OWL terms. This can be used to explain your model in simpler terms.
And last but not least, also related to OntoReSpec, there is the Mermaid repository for expressing OWL ontologies in the Mermaid diagram language. Feel free to play around.
This is a community initiative. This group was originally proposed on 2022-06-03 by Flores Bakker. The following people supported its creation: Flores Bakker, Gregg Kellogg, Paola Di Maio, Wouter Beek, Ruben Taelman. W3C’s hosting of this group does not imply endorsement of the activities.