ShEx/Obsolete/ShEx
Obsolete - please see the ShEx github wiki
ShEx, or Shape Expressions (intro), is a language for expressing constraints on RDF graphs.
It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles as well as logical connectives for disjunction and polymorphism.
It is intended to:
- validate RDF documents.
- communicate expected graph patterns for interfaces.
- generate user interface forms and interface code.
- compile to SPARQL queries (except for cyclic grammars).
A W3C ShEx Demo validates data against a schema, compiles SPARQL queries for the schema and generates an RDF representation.
Syntax
The ShEx syntax is modeled after RelaxNG Compact Syntax (RNC):
<IssueShape> {                       # An Issue shape
  :state ( :unassigned :assigned ),  # has a state with 2 possible values
  :reportedBy @<UserShape>,          # is reported by a user
  :reportedOn xsd:date,              # is reported on a date
  ( :reproducedBy @<UserShape>       # can optionally have 2 properties
  , :reproducedOn xsd:date           # reproducedBy/On
  )?,
  :related @<IssueShape>*            # is related to several other issues
}

<UserShape> {                        # A user shape can have either
  ( foaf:name xsd:string             # name or
  | foaf:givenName xsd:string+ ,     # several given names and
    foaf:familyName xsd:string       # family name
  ),
  foaf:mbox shex:IRI ?               # mbox Optional, any IRI
}
The previous example can be tested here (using RDFShape) and here (with Eric's fancy demo)
A ShEx schema can be written in two syntaxes: SHEXc (the ShEx compact format) and SHEX/RDF.
Semantics
ShEx (and RNC) are designed to be familiar to users of BNF and regular expressions. The conspicuous difference is that regular expressions correlate an ordered pattern of atomic characters and logical operators against an ordered sequence of characters, whereas Shape Expressions correlate an ordered pattern of pairs of predicate and object classes (called NameClass and ValueClass) and logical operators against an unordered set of arcs in a graph. The logical operators in Shape Expressions (grouping, conjunction, disjunction and cardinality constraints) are defined to behave as closely as possible to their counterparts in regular expressions and grammar languages like BNF.
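The contrast with regular expressions can be illustrated with a minimal Python sketch. It is not taken from any ShEx implementation: the function name `arc_rule_matches`, the tuple encoding of triples and the sample data are assumptions made for illustration. It counts the arcs leaving a focus node whose predicate satisfies a NameClass and whose object satisfies a ValueClass, then applies a cardinality bound. Because the input is treated as a set, the order of the triples has no effect on the result.

```python
from typing import Callable, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), an illustrative encoding


def arc_rule_matches(
    triples: Iterable[Triple],
    focus: str,
    name_class: Callable[[str], bool],
    value_class: Callable[[str], bool],
    min_count: int = 1,
    max_count: Optional[int] = 1,  # None means "no upper bound"
) -> bool:
    """Check one ArcRule against the arcs leaving `focus` (simplified sketch)."""
    count = sum(
        1
        for s, p, o in set(triples)  # a graph is a set: order and duplicates do not matter
        if s == focus and name_class(p) and value_class(o)
    )
    return count >= min_count and (max_count is None or count <= max_count)


graph = [
    ("ex:bob", "foaf:givenName", "Robert"),
    ("ex:bob", "foaf:familyName", "Smith"),
    ("ex:bob", "foaf:givenName", "Bob"),
]


def is_given_name(predicate: str) -> bool:
    return predicate == "foaf:givenName"


def any_value(obj: str) -> bool:
    # Datatype checking (e.g. xsd:string) is left out of this sketch.
    return True


# foaf:givenName xsd:string+  (one or more)
plus_ok = arc_rule_matches(graph, "ex:bob", is_given_name, any_value, 1, None)
# foaf:givenName xsd:string?  (zero or one)
opt_ok = arc_rule_matches(graph, "ex:bob", is_given_name, any_value, 0, 1)
# Reversing the input does not change the outcome.
reversed_ok = arc_rule_matches(list(reversed(graph)), "ex:bob", is_given_name, any_value, 1, None)
```

The sketch checks a single rule in isolation; a full validator would also have to combine rules and decide which arcs each rule is allowed to consume.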
Recursive shapes (like <IssueShape>) are problematic for Shape Expressions. The meaning of such shapes is open to question: the semantics for Shape Expressions does not handle them well, and evaluation can go into infinite loops, be non-deterministic, or even be paradoxical.
For more details and test cases, see [1].
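The infinite-loop problem can be made concrete with a small Python sketch. This is one common way to guard a recursive check, not the behaviour defined by ShEx or used by the implementations listed on this page. The mini-schema encoding, the `conforms` function and the sample data are hypothetical. A node/shape pair that is already being checked further up the call stack is assumed to hold, which stops the recursion on cyclic data.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical mini-schema: shape name -> list of (predicate, referenced shape or None).
# A rule with a reference behaves like `@<Shape>*`; a rule without one requires at least one arc.
Schema = Dict[str, List[Tuple[str, Optional[str]]]]


def conforms(
    node: str,
    shape: str,
    graph: List[Triple],
    schema: Schema,
    in_progress: Optional[Set[Tuple[str, str]]] = None,
) -> bool:
    if in_progress is None:
        in_progress = set()
    key = (node, shape)
    if key in in_progress:
        # Already being checked higher in the call stack:
        # assume success instead of recursing forever.
        return True
    in_progress.add(key)
    try:
        for predicate, ref in schema[shape]:
            objects = [o for s, p, o in graph if s == node and p == predicate]
            if ref is None:
                if not objects:
                    return False  # required arc is missing
            elif not all(conforms(o, ref, graph, schema, in_progress) for o in objects):
                return False
        return True
    finally:
        in_progress.discard(key)


schema: Schema = {"IssueShape": [(":state", None), (":related", "IssueShape")]}

# Two issues that are related to each other: a cycle in the data.
cyclic = [
    (":i1", ":state", ":assigned"),
    (":i1", ":related", ":i2"),
    (":i2", ":state", ":unassigned"),
    (":i2", ":related", ":i1"),
]
cyclic_ok = conforms(":i1", "IssueShape", cyclic, schema)

# :i3 has no :state, so every issue that reaches it fails.
broken = cyclic + [(":i2", ":related", ":i3")]
broken_ok = conforms(":i1", "IssueShape", broken, schema)
```

Assuming success for pairs in progress terminates the check, but it does not resolve the paradoxes mentioned above, which arise once a language can also express negation over recursive references.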
SHEXc Language Summary
feature | example | description |
Matching a Predicate to a NameClass | ||
NameTerm | ex:state | The predicate of any matching triple is the same as the NameTerm IRI. |
NameStem | ex:~ | The predicate of any matching triple starts with the IRI. |
NameAny | . - rdf:type - ex:~ | A matching triple has any predicate except those NameTerms or NameStems excluded by the '-' operator. |
Matching an Object to a ValueClass | ||
ValueType | xsd:dateTime | The object of any matching triple is the same as the ValueType IRI. |
ValueSet | (ex:unassigned ex:assigned) | The object of any matching triple is one of the terms listed in the ValueSet. |
ValueStem | ex:~ | The object of any matching triple starts with the IRI. |
ValueAny | | A matching triple has any object except those terms or stems excluded by the '-' operator. |
ValueReference | @<UserShape> | The object of a matching triple is an IRI or blank node and that node is the subject of triples matching the referenced shape expression. |
Rule Types | ||
ArcRule | foaf:givenName xsd:string+ | A matching triple matches the NameTerm and the ValueTerm. Cardinality constraints apply. |
AndRule | foaf:givenName xsd:string,
foaf:familyName xsd:string |
Each conjunct matches the input graph. |
OrRule | foaf:givenName xsd:string
| foaf:name xsd:string |
Exactly one disjunct matches the input graph. |
GroupRule | (ex:reproducedBy @<EmployeeShape>,
ex:reproducedOn xsd:dateTime) |
A matching triple matches the enclosed rule (here an AndRule). Cardinality constraints apply. |
Cardinality | ||
? | foaf:givenName xsd:string? | rule must match 0 or 1 times. |
+ | foaf:givenName xsd:string+ | rule must match 1 or more times. |
* | foaf:givenName xsd:string* | rule must match 0 or more times. |
{m} | foaf:givenName xsd:string{3} | rule must match m times. |
{m,n} | foaf:givenName xsd:string{3,5} | rule must match at least m times and no more than n times. |
Cardinality constraints may appear after an ArcRule. A '?' may also appear after a GroupRule to indicate that it is optional. Any AndRule nested immediately inside the GroupRule must have every rule match or no rule match. | ||
Rule Inclusions | ||
&RuleName | & <PersonShape> | Include the referenced rule in place of the include directive. |
Rule Inclusions may appear before a shape definition or inside of a definition. Before a shape definition, they signify the inclusion of the referenced rule ("included rule") at the beginning of the one being defined, as well as asserting that ValueReferences to the included rule accept the defined shape as well. | ||
Semantic Actions | ||
%lang{ code %} | %js{ return _.o.lex > report.lex; %}
%sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %} |
Invoke semantic actions when a rule is satisfied. |
Semantic Actions may appear after an ArcRule, a GroupRule or a named Shape Expression. When used with validation, they are invoked only on valid pairs of a triple and a rule. Their use for interface validation is currently undefined. |
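The three NameClass rows in the table above can be sketched in Python as follows. The class names mirror the table, but the code itself, the `matches` method and the prefix expansions `EX` and `FOAF` are assumptions made for this illustration, not part of any ShEx implementation. The same pattern would apply to ValueStem and ValueAny on the object side.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Prefix expansions; the ex: namespace is hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


@dataclass(frozen=True)
class NameTerm:
    iri: str

    def matches(self, predicate: str) -> bool:
        # The predicate must be exactly the NameTerm IRI.
        return predicate == self.iri


@dataclass(frozen=True)
class NameStem:
    stem: str

    def matches(self, predicate: str) -> bool:
        # The predicate must start with the stem IRI.
        return predicate.startswith(self.stem)


@dataclass(frozen=True)
class NameAny:
    exclusions: Tuple[Union[NameTerm, NameStem], ...] = ()

    def matches(self, predicate: str) -> bool:
        # Any predicate is accepted unless an excluded term or stem matches it.
        return not any(e.matches(predicate) for e in self.exclusions)


# Equivalent of the table example:  . - rdf:type - ex:~
name_any = NameAny((NameTerm(RDF + "type"), NameStem(EX)))

accepts_foaf_name = name_any.matches(FOAF + "name")
accepts_rdf_type = name_any.matches(RDF + "type")
accepts_ex_state = name_any.matches(EX + "state")
```

Here `accepts_foaf_name` is true, while the two excluded predicates are rejected.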
SHEX/RDF format
The page ShEx/RDF serialization defines a SHEX/RDF schema that validates itself (work in progress).
Formal definitions
The ShEx semantics are explained in several documents, each describing a different formalism:
- ShEx Primer - introduction to ShEx with links to editable examples.
- Denotational Semantics (compare to Relax NG Semantics)
- Regular Bag Expressions
- Z Notation
- ShEx/OperationalSemantics Operational semantics inspired by Relax NG Semantics.
Implementations
The following implementations of Shape Expressions currently exist:
Fancy ShEx Demo
- Formats: SHEXc
- Language: JavaScript based
- Algorithm: State based
- Developer: Eric Prud'hommeaux
Live Demo Examples:
- ShEx Demo - test data against a schema, generate SPARQL and Resource Shape for the schema.
- GenX Demo - use ShEx semantic actions to translate RDF to XML.
- multiple inheritance example - demo ShEx's polymorphism
JSShexTest
- Formats: SHEX/RDF and SHEXc (partly)
- Language: JavaScript based
- Algorithm: State based
- Developer: Jesse van Dam
- Working version: [2].
- Source code [3] (uses a local web server that can be started with ruby and accessed via localhost:4567).
For the code that implements the validation process, see [4] (easy to read). A further description can be found at ValidationCode.
RDFShape
- Syntax: SHEXc (ShEx compact syntax) with some extensions like regex
- Semantics: Open/Closed view of shapes
- Developer: Jose Emilio Labra Gayo
- Algorithm: Regular expression derivatives
- Programming language: Scala
- Extra features: Online RDF validator based on Shexcala
Shexcala
- Syntax: ShEx compact syntax
- Semantics: Closed and Open view of shapes based on Iovka's proposal
- Developer: Jose Emilio Labra Gayo
- Algorithm: Regular expression derivatives and backtracking (by selection)
- Programming language: Scala
- Extra features: Negation, Reverse arcs, language tags, regexps
Haws
- Syntax: Abstract syntax
- Semantics: Closed shapes based on operational semantics
- Developer: Jose Emilio Labra Gayo
- Algorithm: Backtracking
- Programming language: Haskell
Test cases
A test script created by Eric Prud'hommeaux, which uses simplified semantics to test the matching logic, can be found at [5].
The SHEX test suite is defined in a standardized format that can be found here [6], and the official set of test cases can be found here [7].
SHEX/RDF-based test cases, which are still (todo) included only in Jesse van Dam's scripts, can be found here [8].
Examples
A separate page contains some simple examples using ShEx.
The following examples employ ShEx:
- Uniprot_SHEX_schema (outdated)
- LandPortal documentation using ShEx
- Web Index Data Portal documentation using ShEx
- FDA Renal Transplantation Ontology
Publications about ShEx
- Complexity and Expressiveness of ShEx for RDF, In International Conference on Database Theory (ICDT) 2015. With S. Staworko, J. E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, and H. Solbrig. PDF
- Towards an RDF validation language based on Regular Expression derivatives, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Slawek Staworko and Harold Solbrig. PDF, Slides
- Shape Expressions: An RDF validation and transformation language, Eric Prud'hommeaux, Jose Emilio Labra Gayo, Harold Solbrig, 10th International Conference on Semantic Systems, Sept. 2014, Leipzig, Germany, PDF, Slides
- Validating and Describing Linked Data Portals using RDF Shape Expressions, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Harold Solbrig, 1st Workshop on Linked Data Quality, Sept. 2014, Leipzig, Germany, PDF, Slides
Proposed Features
UNIQUE
Proposed for 1.1
A UNIQUE constraint takes an optional scope (FOCUS|GRAPH, default: FOCUS) and one or more predicates, e.g.:
<T> {
  :fname LITERAL,
  :lname LITERAL,
  :title LITERAL+,
  :homepage IRI
  UNIQUE(GRAPH, :fname, :lname)
  UNIQUE(LANGTAG(:title))
  UNIQUE(GRAPH, :homepage)
}
UNIQUEs can appear arbitrarily nested in expressions:
<PersonShape> {
    foaf:givenName ., foaf:familyName UNIQUE(foaf:given, foaf:family)
  | foaf:name . UNIQUE(foaf:name)
}
UNIQUEs scoped to the FOCUSNODE can be dispatched immediately. Those scoped to the GRAPH or DATASET must have their values noted and associated with the UNIQUE constraint, noting any possible conflicts during insertion.
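The bookkeeping described for GRAPH-scoped constraints can be sketched in Python as below. This is an illustration only: the class `UniqueIndex`, its `note` method and the sample values are hypothetical, and the sketch assumes that each listed predicate has exactly one value per focus node. Each value combination is noted on insertion, and a conflict is recorded when a different focus node already owns the same combination.

```python
from typing import Dict, List, Sequence, Tuple


class UniqueIndex:
    """Tracks value combinations seen for one UNIQUE(GRAPH, ...) constraint (sketch)."""

    def __init__(self, predicates: Sequence[str]) -> None:
        self.predicates = tuple(predicates)
        self._owner: Dict[Tuple[str, ...], str] = {}  # value combination -> first focus node
        self.conflicts: List[Tuple[str, str, Tuple[str, ...]]] = []

    def note(self, focus: str, values: Dict[str, str]) -> bool:
        """Record the values of `focus`; return False if they clash with another node."""
        key = tuple(values[p] for p in self.predicates)
        owner = self._owner.setdefault(key, focus)
        if owner != focus:
            self.conflicts.append((owner, focus, key))
            return False
        return True


# UNIQUE(GRAPH, :fname, :lname)
index = UniqueIndex([":fname", ":lname"])
first = index.note("<s1>", {":fname": "Bob", ":lname": "Smith"})
second = index.note("<s2>", {":fname": "Bob", ":lname": "Smith"})  # same pair, other node
third = index.note("<s3>", {":fname": "Bob", ":lname": "Jones"})   # differs in :lname
```

After these calls, `index.conflicts` holds one entry naming `<s1>` and `<s2>`. A FOCUS-scoped constraint would not need such an index, since it can be decided from the arcs of a single node.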
Shortcomings
It's possible we'd want uniques that span shapes, e.g. if the following data were permissible:
{ <s1> :code "1234"; :dept [ :code "5678" ] .
  <s2> :code "1234"; :dept [ :code "8765" ] }
but this were not:
{ <s1> :code "1234"; :dept [ :code "5678" ] .
  <s2> :code "1234"; :dept [ :code "5678" ] }
There's no way to stipulate uniqueness across repeated properties, e.g. if we wanted to make sure that creators from a.example were unique in the graph but creators from b.example were not:
schema:
<S> {
  :creator PATTERN "^http://a\\.example/",
  :creator PATTERN "^http://b\\.example/"
}
failing data:
{ <s1> :creator <http://a.example/1> ; :creator <http://b.example/2> .
  <s2> :creator <http://a.example/3> ; :creator <http://b.example/2> . }
Alternative Syntax - on shape
The UNIQUE constraints could go on the shape.
<T>
  UNIQUE(GRAPH, :fname, :lname)
  UNIQUE(LANGTAG(:title))
  UNIQUE(GRAPH, :homepage)
{
  :fname LITERAL,
  :lname LITERAL,
  :title LITERAL+,
  :homepage IRI
}
This makes evaluation of predicates in disjuncts weird, e.g.
<PersonShape>
  UNIQUE(foaf:given, foaf:family)
  UNIQUE(foaf:name)
{
    foaf:givenName ., foaf:familyName
  | foaf:name .
}
where the semantics for enforcing UNIQUE(foaf:given, foaf:family) over the data
{ <s> foaf:name "Bob Smith". }
are a bit weird and "NULL-y".
GRAPH constraints
Proposed for 1.1
Use Cases
Enable validation outside of a single named graph:
Dataset:
Default graph: { <s> :lookInGraph <G1> }
<G1>: { <s> :p2 :o2 }
Schema:
<S> { :lookInGraph GRAPH{ @<GShape> } }
<GShape> { :p2 . }
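One possible reading of this use case is sketched below in Python. The proposal does not spell out its semantics, so the sketch rests on loudly stated assumptions: the object of :lookInGraph names a graph in the dataset, the same focus node is then checked inside that graph, and `:p2 .` is simplified to "at least one :p2 arc". The dictionary encoding of the dataset and the function `look_in_graph_ok` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Illustrative dataset encoding: graph name -> triples; None keys the default graph.
Dataset = Dict[Optional[str], Set[Triple]]


def look_in_graph_ok(
    dataset: Dataset,
    focus: str,
    link: str = ":lookInGraph",
    required: str = ":p2",
) -> bool:
    """Follow `link` from the default graph and check `focus` in the named graph."""
    default_graph = dataset.get(None, set())
    targets = [o for s, p, o in default_graph if s == focus and p == link]
    if len(targets) != 1:
        return False  # assumption: exactly one graph is referenced
    named_graph = dataset.get(targets[0], set())
    # Assumption: <GShape> { :p2 . } is reduced to "focus has at least one :p2 arc".
    return any(s == focus and p == required for s, p, _ in named_graph)


dataset: Dataset = {
    None: {("<s>", ":lookInGraph", "<G1>")},
    "<G1>": {("<s>", ":p2", ":o2")},
}
dataset_ok = look_in_graph_ok(dataset, "<s>")

# The same default graph, but <G1> says nothing about <s>.
empty_named: Dataset = {None: {("<s>", ":lookInGraph", "<G1>")}, "<G1>": set()}
empty_ok = look_in_graph_ok(empty_named, "<s>")
```

With the dataset from the use case the check succeeds, and it fails when the named graph contains no :p2 arc for the focus node.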
Discussion
See here for the currently ongoing discussions
See Discussion SHEX format for a list of other discussion topics
See here for a comparison between ShEx and OWL