Shex/Comparison
Obsolete - please see the ShEx GitHub wiki
A comparison of the semantics of shapes with RELAX NG, Java class definitions and C structure definitions.
To make a few things clear, we start with two comparisons.
When comparing XML documents to RDF, there are three main differences.
- XML documents always contain ordered lists of children, whereas RDF documents are unordered.
- Parent-to-child relationships are always 1 to 1..N, whereas RDF objects can reference each other 0..N to 1..N. An XML document can always be presented as a tree.
- Children are not referenced via a property, whereas in RDF they are. Note that this is a very important difference, which has many implications. (This is why you need a schema to interpret an XML document.)
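The first difference can be illustrated with a small sketch. It assumes an RDF graph may be modelled as a plain Python set of (subject, predicate, object) tuples; the `ex:` names and the sample data are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# XML: the children of an element form an ordered list.
doc_a = ET.fromstring("<person><name>Ann</name><age>30</age></person>")
doc_b = ET.fromstring("<person><age>30</age><name>Ann</name></person>")
order_a = [child.tag for child in doc_a]
order_b = [child.tag for child in doc_b]
xml_same = order_a == order_b  # the two documents differ in child order

# RDF (modelled here as a set of triples): statements have no order,
# and every value is reached through a named property.
graph_a = {("ex:ann", "ex:name", "Ann"), ("ex:ann", "ex:age", "30")}
graph_b = {("ex:ann", "ex:age", "30"), ("ex:ann", "ex:name", "Ann")}
rdf_same = graph_a == graph_b  # the two graphs are the same set

print(xml_same, rdf_same)
```

Swapping the two children gives a different XML child list, while the two triple sets compare equal.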
Comparing SHEX with RELAX NG, Java class definitions and C structure definitions gives the following comparison table.
Aspect | SHEX | RELAX NG | Java | C |
1) Name | resourceshape | element | class | struct/union |
2) Define 'rules/groups' | yes | yes | no | no, but there is an option to use unions |
3) Reference rule by name | yes | yes | - | you can reference unions |
4) Have sequence of groups | yes | yes (1, ?, +, *) | - | special construct (a size field must be defined first) |
5) Extending classes | currently not | no | yes | you can inline a base structure |
6) Inlining | only rules | only rules | 'no', only by extending | you can choose between a reference (pointer) and inlining |
7) Reference | only resource shape | only elements | default | you can choose between a reference (pointer) and inlining |
8) Parent-child relationship | 0..N -> 1..N | 1 -> 1..N | 0..N -> 1..N | 0..N -> 1..N (inlining possible) |
9) Properties/children ordered | no | yes | yes | yes |
1) Term used for the concept 'class'.
2) Can we define rules that describe the shape with which an instance of a 'class' must comply?
3) Can we reference a rule by name, so that we can reuse and 'extend' it?
4) Can we state that we expect a not yet predefined number of occurrences of a certain rule within a shape? For example ( ex:reproducedBy @<EmployeeShape>, ex:reproducedOn xsd:dateTime )* In RELAX NG it is possible to choose between exactly 1 (1), 0 or 1 (?), 1 or more (+) and 0 or more (*).
5) Can we make an extension of an existing class?
6) Can we tell the system to inline a certain group/rule, without making a separate object and referencing it?
7) Can we tell the system to make a new object from the rule/group and reference it?
8) What kind of relationship is there from the parent to the child? In XML, child elements are always contained in only one parent, so you will always have 1 -> 1..N relationships.
9) Are the child properties and references ordered in some way? When SHEX describes an RDF database, they are not ordered.
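The four occurrence markers from point 4 can be read as (minimum, maximum) bounds on how often a rule or group may occur. The following is only a sketch of that reading; the names `CARDINALITIES` and `count_ok` are hypothetical and not taken from SHEX or RELAX NG.

```python
# Occurrence markers as (minimum, maximum) bounds; None means "no upper bound".
CARDINALITIES = {
    "1": (1, 1),     # exactly one
    "?": (0, 1),     # zero or one
    "+": (1, None),  # one or more
    "*": (0, None),  # zero or more
}

def count_ok(count, marker):
    """Return True if `count` occurrences are allowed by `marker`."""
    low, high = CARDINALITIES[marker]
    return count >= low and (high is None or count <= high)

# A group marked with * may be absent; the same group marked with + may not.
print(count_ok(0, "*"), count_ok(0, "+"))
```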
When comparing SHEX and RELAX NG to Java, the main distinction is the following: SHEX and RELAX NG have definitions of rules and rule groups, whereas Java and C do not, but Java has a clear definition of OO class semantics. C sits somewhere in the middle.
So looking at this, I can make a distinction between the definitions of the shapes/rules and the class semantics. Shapes define the structure of a graph in the same way as a database schema defines the structure of an SQL database, whereas the class semantics give meaning to what the objects represent. I think it is good to keep these two things separated from each other, so that we can define shapes without having to define a class with a full semantic meaning attached to it.
To understand an unknown RDF database, you must have the shape structure to be able to query it, whereas the semantics are handy to have. When integrating it with another source, both the shape structure and the semantics become a must-have.
The shape definitions are needed for:
- Understanding the structure of an RDF resource, so that it becomes possible to create queries for an unknown resource.
- Defining rules with which an RDF resource should comply when it is given as input to an application.
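As a rough illustration of the second use, an application could check its input against a shape before accepting it. The sketch below is not the SHEX validation algorithm; it assumes a graph is a set of (subject, predicate, object) tuples and a shape is a hypothetical mapping from predicate to (minimum, maximum) occurrences. The shape, the `conforms` helper and the sample data are all invented for this example.

```python
# Hypothetical shape: predicate -> (minimum, maximum); None means unbounded.
EMPLOYEE_SHAPE = {
    "ex:name": (1, 1),
    "ex:email": (0, None),
}

def conforms(graph, node, shape):
    """Return True if `node` has an allowed number of values for every predicate in `shape`."""
    for predicate, (low, high) in shape.items():
        count = sum(1 for s, p, _ in graph if s == node and p == predicate)
        if count < low or (high is not None and count > high):
            return False
    return True

graph = {
    ("ex:ann", "ex:name", "Ann"),
    ("ex:ann", "ex:email", "ann@example.org"),
    ("ex:bob", "ex:email", "bob@example.org"),  # ex:bob has no ex:name
}

ann_ok = conforms(graph, "ex:ann", EMPLOYEE_SHAPE)
bob_ok = conforms(graph, "ex:bob", EMPLOYEE_SHAPE)
print(ann_ok, bob_ok)
```

Note that the check only looks at structure: it says nothing about what ex:name means, which is the separation argued for above.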
The semantics are needed for:
- Finding the meaning of the properties and classes used within an RDF resource
- (Automatically) integrating two different sources with each other
- Smart combining/integration of two or more documents that contain the same semantic information but are encoded in different structures.
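A very small sketch of such an integration, assuming the semantics tell us which properties of two sources mean the same thing. The `a:` and `b:` property names, the `SAME_MEANING` mapping and the `align` helper are hypothetical; a real integration would derive the mapping from the vocabularies involved.

```python
# Hypothetical knowledge from the semantics: properties of source A
# that mean the same as properties of source B.
SAME_MEANING = {"a:fullName": "b:name", "a:mail": "b:email"}

def align(graph, mapping):
    """Rewrite predicates according to `mapping`; unmapped predicates are kept as-is."""
    return {(s, mapping.get(p, p), o) for s, p, o in graph}

source_a = {("ex:ann", "a:fullName", "Ann"), ("ex:ann", "a:mail", "ann@example.org")}
source_b = {("ex:ann", "b:name", "Ann"), ("ex:ann", "b:phone", "555-0100")}

# After alignment the duplicate name statement collapses into one triple.
merged = align(source_a, SAME_MEANING) | source_b
print(sorted(merged))
```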