KIT – The Research University in the Helmholtz Association www.kit.edu
Validation Framework
for RDF-based Constraint Languages
M.Sc. (TUM) Thomas Hartmann
Professor Dr. York Sure-Vetter
Professor Dr. Kai Eckert (Stuttgart Media University)
Professor Dr. Rudi Studer
Professor Dr. Andreas Geyer-Schulz
Disputation, 08.07.2016
2
enthusiasm for SW technologies
problem statement
3
common need for RDF Validation
problem statement
4
common needs of data practitioners
2013: W3C RDF Validation Workshop
2014: 2 international working groups on RDF validation
constraint languages
SPARQL Query Language for RDF
SPARQL Inferencing Notation (SPIN)
Web Ontology Language (OWL)
Shape Expressions (ShEx)
Resource Shapes (ReSh)
Description Set Profiles (DSP)
Shapes Constraint Language (SHACL)
none of these languages meets all requirements
RDF validation as research field
problem statement
W3C RDF Data Shapes Working Group
DCMI RDF Application Profiles Task Group
5
Resource Description Framework (RDF)
problem statement
6
constraints of running example (slides 6 to 10)
problem statement
11
provide a basis for continued research
RDF validation
development of constraint languages
further development of constraint languages based on
commonly approved requirements
incorporate the findings into the working groups
thesis objectives
12
5 research questions
13
Which types of research data and related metadata
are not yet representable in RDF and
how to adequately model them
to be able to validate RDF data
against constraints extractable from these vocabularies?
research question 1
RQ1
IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46
LDOW (WWW 2013)
SemStats (ISWC 2013)
DC 2012
ESWC 2011 (Poster)
DDI Moving Forward Project
RDF Vocabularies Working Group
14
How to directly validate XML data
on semantically rich OWL axioms
using common RDF validation tools
when XML Schemas, adequately representing particular domains,
have already been designed?
research question 2
RQ2
IJMSO, 8(3)
ISWC 2012
ICITST 2011
OCAS (ISWC 2011)
15
research question 3
16
http://purl.org/net/rdf-validation
DC 2014
RQ3
21
Which types of constraints
must be expressible by constraint languages to meet
all collaboratively and comprehensively identified requirements
to formulate constraints and validate RDF data?
research question 3
RQ3
22
a constraint is instantiated from a constraint type
each constraint type corresponds to a requirement
81 constraint types
types of constraints on RDF data
RQ3
23
research question 4
24
minimum qualified cardinality restrictions (R-75)
ShEx:
:Book { :author @:Person{1, } }
ReSh:
:Book a rs:ResourceShape ; rs:property [
    rs:propertyDefinition :author ;
    rs:valueShape :Person ;
    rs:occurs rs:One-or-many ; ] .
SHACL:
:BookShape
    a sh:Shape ;
    sh:scopeClass :Book ;
    sh:property [
        sh:predicate :author ;
        sh:valueShape :PersonShape ;
        sh:minCount 1 ; ] .
:PersonShape
    a sh:Shape ;
    sh:scopeClass :Person .
RQ4
25
minimum qualified cardinality restrictions (R-75)
SPARQL and SPIN:
CONSTRUCT { [ a spin:ConstraintViolation ... . ] } WHERE {
    ?subject
        a ?C1 ;
        ?predicate ?object .
    BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
    BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger )
        AS ?cardinality ) .
    FILTER ( ?cardinality < ?minimumCardinality ) .
    FILTER ( ?minimumCardinality = 1 ) .
    FILTER ( ?C1 = :Book ) .
    FILTER ( ?C2 = :Person ) .
    FILTER ( ?predicate = :author ) . }
SELECT ( COUNT ( ?arg1 ) AS ?c )
WHERE { ?arg1 ?arg2 ?object . ?object a ?arg3 . }
RQ4
26
minimum qualified cardinality restrictions (R-75)
OWL:
:Book rdfs:subClassOf
    [ a owl:Restriction ;
      owl:minQualifiedCardinality 1 ;
      owl:onProperty :author ;
      owl:onClass :Person ] .
DSP:
[ dsp:resourceClass :Book ; dsp:statementTemplate [
    dsp:minOccur 1 ;
    dsp:property :author ;
    dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] .
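The constraint written in the notations above (every :Book needs at least one :author that is a :Person) can also be sketched procedurally. The following Python is only an illustration under stated assumptions: triples are plain tuples, the resources `:HuckleberryFinn`, `:MarkTwain`, `:Untitled` and `:SomePublisher` are made up, and the helper `min_qualified_cardinality_violations` is hypothetical. It is not the SPIN implementation used in the thesis.

```python
# Sketch: closed-world check of a minimum qualified cardinality restriction
# over an in-memory set of (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"

triples = {
    (":HuckleberryFinn", RDF_TYPE, ":Book"),
    (":HuckleberryFinn", ":author", ":MarkTwain"),
    (":MarkTwain", RDF_TYPE, ":Person"),
    (":Untitled", RDF_TYPE, ":Book"),
    (":Untitled", ":author", ":SomePublisher"),  # value is not typed as :Person
}


def instances_of(cls, data):
    """All subjects explicitly typed as cls."""
    return {s for (s, p, o) in data if p == RDF_TYPE and o == cls}


def min_qualified_cardinality_violations(data, on_class, prop, value_class, minimum):
    """Return (subject, count) pairs for instances of on_class that have fewer
    than `minimum` values of `prop` that are instances of `value_class`."""
    qualified = instances_of(value_class, data)
    violations = []
    for subject in sorted(instances_of(on_class, data)):
        count = sum(
            1 for (s, p, o) in data
            if s == subject and p == prop and o in qualified
        )
        if count < minimum:
            violations.append((subject, count))
    return violations


result = min_qualified_cardinality_violations(
    triples, ":Book", ":author", ":Person", 1
)
print(result)
```

Only `:Untitled` is reported, because its single author value is not an explicit instance of `:Person`.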
RQ4
27
high-level constraint languages either
lack an implementation or
are based on different implementations
How to consistently validate RDF data
against constraints of any constraint type
expressed in any RDF-based constraint language?
research question 4-1
RQ4
28
validation environment
constraint language implementation (SPIN mapping):
:MinimumQualifiedCardinalityRestrictions
a spin:ConstructTemplate ;
spin:body [ ...
CONSTRUCT { ... }
WHERE { ... } ... ] .
RQ4
29
validation process
RQ4
30
validation results (slides 30 to 36)
RQ4
37
full implementations for
all OWL 2 and DSP language constructs
all constraint types expressible in OWL 2 and DSP
major constraint types representable by ShEx and ReSh
RDF serialization for DSP
validation environment
http://purl.org/net/rdfval-demo
RQ4
39
constraints and constraint language constructs
must be representable in RDF
constraint languages and supported constraint types
must be expressible in SPARQL
limitations
RQ4
40
How to represent constraints of any constraint type and
how to reduce the representation of
constraints of any constraint type
to the absolute minimum?
research question 4-2
RQ4
language   % of constraint types   (number out of 81)
DSP         17.3                   (14)
ReSh        25.9                   (21)
ShEx        29.6                   (24)
SHACL       51.9                   (42)
OWL 2       67.9                   (55)
SPARQL     100.0                   (81)
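The percentages above are consistent with the counts in parentheses read as shares of the 81 constraint types. A quick Python check, assuming that reading (the dictionary name `counts` is just for illustration):

```python
# Recompute each language's share of the 81 constraint types from its count.
TOTAL_CONSTRAINT_TYPES = 81

counts = {
    "DSP": 14,
    "ReSh": 21,
    "ShEx": 24,
    "SHACL": 42,
    "OWL 2": 55,
    "SPARQL": 81,
}

shares = {
    language: round(100 * n / TOTAL_CONSTRAINT_TYPES, 1)
    for language, n in counts.items()
}

for language, share in shares.items():
    print(f"{language:7s} {share:5.1f} ({counts[language]})")
```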
41
intermediate abstraction layer
based on formal logics
makes any constraint type expressible
enables straightforward mappings from high-level constraint languages
reduces the representation of constraints to the absolute minimum
validation framework
for RDF-based constraint languages
RQ4
42
conceptual model
DC 2015
RQ4
74%
26%
43
RQ4
simple constraints
44
different validation results (slides 44 to 49)
RQ4
50
How to ensure for any constraint type that
RDF data is consistently validated against
semantically equivalent constraints of the same constraint type
across RDF-based constraint languages?
framework is solely based on the abstract definitions of constraint types
just 1 SPIN mapping for each constraint type
research question 4-3
RQ4
51
semantically equivalent constraints
RQ4
52
How to ensure for any constraint type that
semantically equivalent constraints of the same constraint type
can be transformed
from one RDF-based constraint language to another?
gc = m_α(c_α)
c_β = m'_β(gc)
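The two mappings can be read as a two-step transformation: a constraint in language α is first mapped to a generic representation gc, which is then mapped into language β. The sketch below illustrates that idea in Python under loud assumptions: the dictionary layouts, the function names `m_owl` and `m_inv_shacl`, and the key names of the generic representation are all hypothetical stand-ins, not the framework's actual vocabulary or a real OWL or SHACL parser.

```python
# gc = m_alpha(c_alpha): language-specific constraint -> generic constraint
# c_beta = m'_beta(gc):  generic constraint -> constraint in another language


def m_owl(c_owl):
    """Hypothetical m_alpha for an OWL-like minimum qualified cardinality restriction."""
    return {
        "constraint_type": "minimum qualified cardinality restrictions",
        "context_class": c_owl["rdfs:subClassOf_subject"],
        "property": c_owl["owl:onProperty"],
        "value_class": c_owl["owl:onClass"],
        "minimum": c_owl["owl:minQualifiedCardinality"],
    }


def m_inv_shacl(gc):
    """Hypothetical m'_beta producing a SHACL-like shape (property names as on the slides)."""
    return {
        "shape": gc["context_class"] + "Shape",
        "sh:scopeClass": gc["context_class"],
        "sh:property": {
            "sh:predicate": gc["property"],
            "sh:valueShape": gc["value_class"] + "Shape",
            "sh:minCount": gc["minimum"],
        },
    }


c_owl = {
    "rdfs:subClassOf_subject": ":Book",
    "owl:onProperty": ":author",
    "owl:onClass": ":Person",
    "owl:minQualifiedCardinality": 1,
}

gc = m_owl(c_owl)
c_shacl = m_inv_shacl(gc)
print(c_shacl)
```

With one generic representation in the middle, each language needs only a mapping to and from it rather than a pairwise mapping to every other language.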
RQ4
research question 4-4
53
What is the role reasoning plays in practical data validation and
for which constraint types may reasoning be performed
prior to validation to enhance data quality?
research question 5
RQ5
SEMANTiCS 2015
54
collected, classified, and implemented 115 constraints
from vocabularies or domain experts
on 3 common vocabularies
well-established (QB, SKOS)
under development (DDI-RDF)
evaluation
IJSC, 10(2)
ICSC 2016
33 SPARQL endpoints
55
future work: validation database and framework
maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions
future work
http://purl.org/net/rdf-validation
56
future work: combine framework with SHACL
derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types
future work
W3C RDF Data Shapes Working Group
DCMI RDF Application Profiles Task Group
57
summary of main contributions
development of 3 RDF vocabularies
direct validation of XML using common RDF validation tools
publication of 81 constraint types
validation framework for RDF-based constraint languages
role of reasoning for RDF validation
THANK YOU!
58
acknowledgements, publications, research data
30 publications
6 journal articles, 9 conference articles, 3 workshop articles,
2 specifications, 10 technical reports
first author of all but one of the journal, conference, and workshop articles
research data and results
KIT research data repository: http://dx.doi.org/10.5445/BWDD/11
GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis
4 international working groups
DCMI RDF Application Profiles Task Group
part of the editorial board
RDF Vocabularies Working Group
editor for DDI-RDF and PHDD
W3C RDF Data Shapes Working Group
DDI Moving Forward Project
THANK YOU!
59
appendix
60
publications: journal articles
1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of
Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing,
10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc
2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation
Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4
3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for
Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4
4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly,
38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4
5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the
Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4
6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies
based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on
Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266.
http://www.inderscience.com/info/inarticle.php?artid=57760
Please note that in 2015, my last name changed from Bosch to Hartmann.
61
publications: articles in conference proceedings
1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using
Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International
Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE.
http://www.ieee-icsc.com/
2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint
Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata
Applications (DC 2015) São Paulo, Brazil.
http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368
3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In
Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40).
Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867
4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In
Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014)
Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257
5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as
Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and
Metadata Applications (DC 2014) Austin, Texas, USA.
http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270
62
publications: articles in conference proceedings
6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked
Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI
International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia.
http://dcpapers.dublincore.org/pubs/article/view/3654
7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain
Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J.
Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of
Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg.
http://dx.doi.org/10.1007/978-3-642-35173-0_34
8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically
Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and
Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab
Emirates. http://edas.info/web/icitst2011/program.html
9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation
Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session
Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html
63
publications: articles in workshop proceedings
1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A
Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on
Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013),
volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/
2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level
Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on
Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney,
Australia. http://semstats.github.io/2013/proceedings
3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on
XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011),
10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany.
http://ceur-ws.org/Vol-809/
64
publications: specifications
1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A
Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data.
DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery
2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI
Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
65
publications: technical reports
1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis
Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062
2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery
Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02
3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A.,
Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata
Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements
4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A.,
Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI
Draft, Dublin Core Metadata Initiative (DCMI).
http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable
5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and
Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933.
http://arxiv.org/abs/1501.03933
66
publications: technical reports
6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data
Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research
Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479
7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets
on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository
(CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478
8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation
Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470
9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies
Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences,
Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/
10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L.,
Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas,
W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification.
DDI Working Paper Series
67
research questions
1. Which types of research data and related metadata are not yet representable in RDF and how
to adequately model them to be able to validate RDF data against constraints extractable
from these vocabularies?
2. How to directly validate XML data on semantically rich OWL axioms using common RDF
validation tools when XML Schemas, adequately representing particular domains, have
already been designed?
3. Which types of constraints must be expressible by constraint languages to meet all
collaboratively and comprehensively identified requirements to formulate constraints and
validate RDF data?
4. How to ensure for any constraint type that (1) RDF data is consistently validated against
semantically equivalent constraints of the same constraint type across RDF-based constraint
languages and (2) semantically equivalent constraints of the same constraint type can be
transformed from one RDF-based constraint language to another?
5. What is the role reasoning plays in practical data validation and for which constraint types
may reasoning be performed prior to validation to enhance data quality?
appendix
68
summary of contributions
1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in
RDF and (2) to validate RDF data against constraints extractable from these vocabularies
2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms
extracted from XML Schemas properly describing certain domains
3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly
and extensively identified requirements to formulate constraints and validate RDF data against constraints
4.1 Consistent validation across RDF-based constraint languages
4.2 Minimal representation of constraints of any type
4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of
the same constraint type across RDF-based constraint languages
4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be
transformed from one RDF-based constraint language to another
5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint
type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in
terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend
on different underlying semantics
6. Evaluation of the Usability of Constraint Types for Assessing RDF Data Quality
appendix
69
summary of limitations
1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way
2. Constraints of supported constraint types and constraint language constructs must be representable in RDF
3. Constraint languages and supported constraint types must be expressible in SPARQL
4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies
appendix
70
research question 1
72
development of 3 RDF vocabularies:
1. DDI-RDF Discovery Vocabulary (DDI-RDF)
to describe unit-record data
2. Physical Data Description (PHDD)
to describe data in tabular format and its physical properties
3. The SKOS Extension for Statistics (XKOS)
to describe the structure and textual properties of
formal statistical classifications
to describe relations between classifications and concepts
and among concepts
contribution
RQ1
73
research question 2
74
XML, XML Schema (XSD)
RDF, Web Ontology Language (OWL)
XML Schemas > OWL ontologies
time-consuming work designing domain ontologies from scratch by hand
reuse information contained in XML Schemas
designing OWL domain ontologies
RQ2
76
sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties
semantically rich OWL axioms
<library>
<book year="February 1890">
<author>
<name>Arthur Conan Doyle</name>
</author>
<title>The Sign of the Four</title>
</book>
</library>
Title ⊑ ∀ value.string
Year ⊑ ∀ value.integer
RQ2
77
transformations based on formal logics
OWL axioms extracted out of XML Schemas
explicitly
implicitly
formally underpin transformations
to formally define and model semantics in a semantically correct way
complete extraction of XML Schemas' structural information
XML can directly be validated against semantically rich OWL axioms
any XML Schema is convertible to OWL
minimized effort designing OWL domain ontologies
contributions
IJMSO, 8(3)
RQ2
78
ISWC 2012
ICITST 2011
OCAS (ISWC 2011)
RQ2
79
step 1 of the approach
executed generic test cases created out of the XML Schema meta-model
transformed XML Schemas of 6 XML standards
step 2 of the approach
specified SWRL rules for 3 OWL domain ontologies
verified hypothesis
determined effort for traditional manual approach
estimated effort for semi-automatic approach
DDI-RDF serves as OWL domain ontology
Hypothesis: delivering high-quality domain ontologies
by reusing information from already existing XML Schemas takes much less effort and time than
creating domain ontologies completely manually and from the ground up.
evaluation
IJMSO, 8(3)
RQ2
80
research question 5
82
What is the role reasoning plays in practical data validation?
research question 5-1
RQ5
83
reasoning may resolve violations
Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)
→ Person(Mark-Twain)
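The effect can be sketched in a few lines of Python: validated as-is, the data violates the constraint that authors of books must be persons; once the universal restriction is applied as an inference rule first, the violation disappears. This is a minimal illustration that assumes tuple-based triples; the function names `violations` and `infer_all_values_from` are hypothetical.

```python
RDF_TYPE = "rdf:type"

data = {
    ("Huckleberry-Finn", RDF_TYPE, "Book"),
    ("Huckleberry-Finn", "author", "Mark-Twain"),
}


def typed_as(cls, triples):
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}


def violations(triples):
    """Authors of books that are not explicitly typed as Person."""
    books = typed_as("Book", triples)
    persons = typed_as("Person", triples)
    return sorted(
        o for (s, p, o) in triples
        if s in books and p == "author" and o not in persons
    )


def infer_all_values_from(triples):
    """Apply Book ⊑ ∀ author.Person as a rule: authors of books are persons."""
    books = typed_as("Book", triples)
    inferred = {
        (o, RDF_TYPE, "Person")
        for (s, p, o) in triples
        if s in books and p == "author"
    }
    return triples | inferred


before = violations(data)
after = violations(infer_all_values_from(data))
print(before, after)
```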
RQ5
84
reasoning may cause violations
Publication ⊑ ∃ publisher.Publisher
Book(Huckleberry-Finn)
Book ⊑ Publication
RQ5
85
reasoning resolves redundancy
Publication ⊑ ∃ publicationDate . xsd:date
Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication
RQ5
86
For which constraint types may reasoning be performed
prior to validation to enhance data quality?
research question 5-2
RQ5
87
> 2/5 of constraint types
property domains (R-25):
constraint types with reasoning
∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland, Lewis-Carroll)
→ rdf:type(Alices-Adventures-In-Wonderland, Publication)
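A property domain used as an inference rule can be sketched as follows. The Python below is only an illustration, assuming tuple-based triples; the `domains` table and the function `apply_domains` are hypothetical names.

```python
RDF_TYPE = "rdf:type"

# ∃ author.⊤ ⊑ Publication: anything with an author is a Publication.
domains = {"author": "Publication"}

data = {
    ("Alices-Adventures-In-Wonderland", "author", "Lewis-Carroll"),
}


def apply_domains(triples, domain_of):
    """Add rdf:type statements entailed by the property domains."""
    inferred = {
        (s, RDF_TYPE, domain_of[p])
        for (s, p, o) in triples
        if p in domain_of
    }
    return triples | inferred


def untyped_subjects(triples, domain_of):
    """Subjects using a property without being typed as its domain class."""
    return sorted(
        s for (s, p, o) in triples
        if p in domain_of and (s, RDF_TYPE, domain_of[p]) not in triples
    )


without_reasoning = untyped_subjects(data, domains)
with_reasoning = untyped_subjects(apply_domains(data, domains), domains)
print(without_reasoning, with_reasoning)
```

Without reasoning the missing class assertion is reported; with reasoning performed first, the type is entailed and nothing is reported.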
RQ5
88
constraint types without reasoning
< 3/5 of constraint types
literal pattern matching (R-44):
ISBN a rdfs:Datatype ;
    owl:equivalentClass [ a rdfs:Datatype ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions
            ([ xsd:pattern "^\\d{9}[\\d|X]$" ])] .
Book ⊑ ∀ identifier.ISBN
RQ5
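Literal pattern matching is a purely syntactic check on each literal, which is why no reasoning is involved. A small Python sketch using the standard `re` module shows the pattern from the slide with its backslashes in place; the function name `is_isbn_literal` and the sample strings are made up for illustration.

```python
import re

# Pattern from the slide: nine digits followed by a digit or "X".
# Note: inside a character class "|" is a literal character, so the
# class [\d|X] also accepts "|" as the last character.
ISBN_PATTERN = re.compile(r"^\d{9}[\d|X]$")


def is_isbn_literal(value):
    """True if the whole string matches the slide's ISBN pattern."""
    return ISBN_PATTERN.fullmatch(value) is not None


samples = ["123456789X", "1234567890", "12345678X", "123456789|"]
checked = {s: is_isbn_literal(s) for s in samples}
print(checked)
```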
89
For which constraint types do validation results differ depending on whether
(1) the CWA or the OWA and
(2) the UNA or the nUNA is assumed?
CWA dependent: 56.8%
UNA dependent: 66.6%
research question 5-3
RQ5
90
56.8% of constraint types
minimum qualified cardinality restrictions (R-75):
CWA dependent constraint types
RQ5
Book ⊑ ∃ title.⊤
91
disjoint classes (R-7):
CWA independent constraint types
RQ5
Book ⊓ JournalArticle ⊑ ⊥
92
66.6% of constraint types
functional properties (R-57/65):
UNA dependent constraint types
RQ5
funct(title)
title(The-Adventures-of-Huckleberry-Finn,
"The Adventures of Huckleberry Finn")
title(The-Adventures-of-Huckleberry-Finn,
"Die Abenteuer des Huckleberry Finn")
93
literal value comparison (R-43):
UNA independent constraint types
RQ5
birthDate(Albert-Einstein, "1955-04-18")
deathDate(Albert-Einstein, "1879-03-14")
birthDate(Albert_Einstein, "1879-03-14")
deathDate(Albert_Einstein, "1955-04-18")
owl:sameAs(Albert-Einstein,
Albert_Einstein)
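One way to read this example: the check "birth date before death date" reports a violation whether or not the two names are treated as the same individual. The Python sketch below illustrates that reading only; merging via `owl:sameAs` is modelled by a simple hypothetical `canonical` function, and ISO date strings are compared lexicographically.

```python
facts = [
    ("birthDate", "Albert-Einstein", "1955-04-18"),
    ("deathDate", "Albert-Einstein", "1879-03-14"),
    ("birthDate", "Albert_Einstein", "1879-03-14"),
    ("deathDate", "Albert_Einstein", "1955-04-18"),
]
same_as = [("Albert-Einstein", "Albert_Einstein")]


def canonical(name, merge):
    """Map a name to one representative when sameAs links are applied."""
    if merge:
        for first, second in same_as:
            if name == second:
                return first
    return name


def has_violation(merge):
    """True if some individual has a birth date later than a death date."""
    births, deaths = {}, {}
    for prop, who, value in facts:
        target = births if prop == "birthDate" else deaths
        target.setdefault(canonical(who, merge), set()).add(value)
    return any(
        b > d
        for who in births
        for b in births[who]
        for d in deaths.get(who, ())
    )


separate_individuals = has_violation(merge=False)
merged_individuals = has_violation(merge=True)
print(separate_individuals, merged_individuals)
```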
94
evaluation
96
classification of constraint types
RDFS/OWL based
constraint language based
SPARQL based
classification of constraints
informational
warning
error
evaluation
classification
97
RDFS/OWL based
evaluation
classification of constraint types
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :author ;
owl:allValuesFrom :Person ] .
98
constraint language based
evaluation
classification of constraint types
:Publication {
( :isbn xsd:string, :title xsd:string )
|
( :issn xsd:string, :title xsd:string )}
99
SPARQL based
evaluation
classification of constraint types
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:Concept ] .
FILTER NOT EXISTS {
?concept ?p ?o .
FILTER ( ?p IN (
skos:related,
skos:relatedMatch,
skos:broader, ... ) ) . } }
100
evaluation
finding 1
C (constraints), CV (constraint violations); values in %
            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8
101
evaluation
finding 2
C (constraints), CV (constraint violations); values in %
            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8
102
evaluation
finding 3
C (constraints), CV (constraint violations); values in %
            C      CV
Info        42.3   31.3
Warning     18.7   62.7
Error       39.0    6.1
103
future work
104
future work: RQ1
publication of RDF vocabularies
DDI Alliance specifications
W3C recommendation for DDI-RDF
DDI-Lifecycle MD (Model-Driven)
new requirements based on experiences with DDI-RDF
international working group: DDI Moving Forward Project
individual contributions
formalize conceptual model (using UML 2)
conceptualize and implement diverse model serializations (e.g., RDFS/OWL)
future work
105
aligning PHDD and CSV on the Web
overlap in the description of tabular data in CSV format
broader scope of PHDD
description of tabular data with fixed record length
description of tabular data with multiple records per case
evaluation for use in DDI-Lifecycle MD
future work: RQ1
future work
106
future work: RQ2
bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model based unidirectional transformations
from XSD models into OWL models
enable validation of any data against constraints extractable from models of
any meta-model using common RDF validation tools
future work

More Related Content

What's hot

Lodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured TextLodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured TextIsabelle Augenstein
 
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSAlphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSJenn Riley
 
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKANandrea huang
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Vu Semantic Web Meeting 20091123
Vu Semantic Web Meeting 20091123Vu Semantic Web Meeting 20091123
Vu Semantic Web Meeting 20091123Rinke Hoekstra
 


Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

  • 1. KIT – The Research University in the Helmholtz Association (www.kit.edu). Validation Framework for RDF-based Constraint Languages. M.Sc. (TUM) Thomas Hartmann. Professor Dr. York Sure-Vetter; Professor Dr. Kai Eckert (Stuttgart Media University); Professor Dr. Rudi Studer; Professor Dr. Andreas Geyer-Schulz. Disputation, 08.07.2016
  • 2. enthusiasm for SW technologies [problem statement]
  • 3. common need for RDF Validation [problem statement]
  • 4. 4 RDF validation as research field (problem statement): common needs of data practitioners; 2013: W3C RDF Validation Workshop; 2014: 2 international working groups on RDF validation (W3C RDF Data Shapes Working Group, DCMI RDF Application Profiles Task Group); constraint languages: SPARQL Query Language for RDF, SPARQL Inferencing Notation (SPIN), Web Ontology Language (OWL), Shape Expressions (ShEx), Resource Shapes (ReSh), Description Set Profiles (DSP), Shapes Constraint Language (SHACL); none of these languages meets all requirements
  • 5. 5 Resource Description Framework (RDF) problem statement
  • 6. 6 constraints of running example problem statement
  • 7. 7 constraints of running example problem statement
  • 8. 8 constraints of running example problem statement
  • 9. 9 constraints of running example problem statement
  • 10. 10 constraints of running example problem statement
  • 11. 11 provide a basis for continued research RDF validation development of constraint languages further development of constraint languages based on commonly approved requirements incorporate the findings into the working groups thesis objectives thesis objectives
  • 13. 13 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  • 14. 14 How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  • 17. 17 RQ3
  • 18. 18 RQ3
  • 19. 19 RQ3
  • 20. 20 RQ3
  • 21. 21 Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? research question 3 RQ3
  • 22. 22 types of constraints on RDF data (RQ3): a constraint is instantiated from a constraint type; each constraint type corresponds to a requirement; 81 constraint types
  • 24. 24 minimum qualified cardinality restrictions (R-75) RQ4
      ShEx: :Book { :author @:Person{1, } }
      ReSh: :Book a rs:ResourceShape ; rs:property [ rs:propertyDefinition :author ; rs:valueShape :Person ; rs:occurs rs:One-or-many ; ] .
      SHACL: :BookShape a sh:Shape ; sh:scopeClass :Book ; sh:property [ sh:predicate :author ; sh:valueShape :PersonShape ; sh:minCount 1 ; ] . :PersonShape a sh:Shape ; sh:scopeClass :Person .
  • 25. 25 minimum qualified cardinality restrictions (R-75) RQ4
      SPARQL and SPIN: CONSTRUCT { [ a spin:ConstraintViolation ... . ] } WHERE { ?subject a ?C1 ; ?predicate ?object . BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ). BIND( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) . FILTER ( ?cardinality < ?minimumCardinality ) . FILTER ( ?minimumCardinality = 1 ) . FILTER ( ?C1 = :Book ) . FILTER ( ?C2 = :Person ) . FILTER ( ?predicate = :author ) . }
      SELECT ( COUNT ( ?arg1 ) AS ?c ) WHERE { ?arg1 ?arg2 ?object . ?object a ?arg3 . }
  • 26. 26 minimum qualified cardinality restrictions (R-75) RQ4
      OWL: :Book rdfs:subClassOf [ a owl:Restriction ; owl:minQualifiedCardinality 1 ; owl:onProperty :author ; owl:onClass :Person ] .
      DSP: [ dsp:resourceClass :Book ; dsp:statementTemplate [ dsp:minOccur 1 ; dsp:property :author ; dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] .
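
The same constraint (every :Book needs at least one :author that is a :Person) is expressed in each language above. What a validator has to check can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: triples are modelled as tuples, the function name and the sample data are hypothetical, and the check uses a closed-world reading (an author without an explicit :Person type does not count).

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
RDF_TYPE = "rdf:type"


def min_qualified_cardinality_violations(triples, source_class, prop, value_class, minimum):
    """Return subjects of source_class that have fewer than `minimum`
    values for `prop` that are explicitly typed as `value_class`."""
    # Collect the asserted classes of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    violations = []
    for subject, classes in sorted(types.items()):
        if source_class not in classes:
            continue
        # Count only values whose asserted type matches the qualifying class.
        qualified = {
            o
            for s, p, o in triples
            if s == subject and p == prop and value_class in types.get(o, set())
        }
        if len(qualified) < minimum:
            violations.append(subject)
    return violations


data = [
    (":TheSignOfTheFour", RDF_TYPE, ":Book"),
    (":TheSignOfTheFour", ":author", ":ArthurConanDoyle"),
    (":ArthurConanDoyle", RDF_TYPE, ":Person"),
    (":UntitledDraft", RDF_TYPE, ":Book"),
    (":UntitledDraft", ":author", ":SomeOrganisation"),  # not typed as :Person
]

print(min_qualified_cardinality_violations(data, ":Book", ":author", ":Person", 1))
```

Here only `:UntitledDraft` is reported, because its single author is not typed as `:Person`.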
  • 27. 27 high-level constraint languages either lack an implementation or are based on different implementations How to consistently validate RDF data against constraints of any constraint type expressed in any RDF-based constraint language? research question 4-1 RQ4
  • 28. 28 validation environment constraint language implementation (SPIN mapping): :MinimumQualifiedCardinalityRestrictions a spin:ConstructTemplate ; spin:body [ ... CONSTRUCT { ... } WHERE { ... } ... ] . RQ4
  • 37. 37 full implementations for all OWL 2 and DSP language constructs all constraint types expressible in OWL 2 and DSP major constraint types representable by ShEx and ReSh RDF serialization for DSP validation environment http://purl.org/net/rdfval-demo RQ4
  • 39. 39 constraints and constraint language constructs must be representable in RDF constraint languages and supported constraint types must be expressible in SPARQL limitations RQ4
  • 40. 40 How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type to the absolute minimum? research question 4-2 RQ4; DSP: 17.3 (14); ReSh: 25.9 (21); ShEx: 29.6 (24); SHACL: 51.9 (42); OWL 2: 67.9 (55); SPARQL: 100.0 (81)
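
The figures on this slide are consistent with reading each parenthesised number as a count of constraint types out of the 81 identified ones, and the leading number as the corresponding percentage. That reading is an assumption, since the slide does not label the columns. A short check of the arithmetic:

```python
TOTAL_CONSTRAINT_TYPES = 81

# Counts as given in parentheses on the slide.
expressible = {"DSP": 14, "ReSh": 21, "ShEx": 24, "SHACL": 42, "OWL 2": 55, "SPARQL": 81}

# Share of the 81 constraint types, rounded to one decimal place.
shares = {
    language: round(100 * count / TOTAL_CONSTRAINT_TYPES, 1)
    for language, count in expressible.items()
}

for language, share in shares.items():
    print(f"{language}: {share}% ({expressible[language]} of {TOTAL_CONSTRAINT_TYPES})")
```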
  • 41. 41 validation framework for RDF-based constraint languages (RQ4): intermediate abstraction layer based on formal logics; makes it possible to express any constraint type; enables straightforward mappings from high-level constraint languages; reduces the representation of constraints to the absolute minimum
  • 50. 50 How to ensure for any constraint type that RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages? framework is solely based on the abstract definitions of constraint types just 1 SPIN mapping for each constraint type research question 4-3 RQ4
  • 52. 52 How to ensure for any constraint type that semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? gc = mα (cα) cβ = m'β (gc) RQ4 research question 4-4
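
The two mappings on this slide, gc = mα(cα) and cβ = m'β(gc), can be illustrated with a small Python sketch: a constraint written in language α is first mapped to a generic, language-independent representation and then mapped on to language β. Everything below is hypothetical and heavily simplified: the `GenericConstraint` class, the flat dictionaries standing in for OWL and DSP constraints, and the function names are assumptions for illustration, not the framework's actual vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericConstraint:
    """Language-independent representation gc of a single constraint."""
    constraint_type: str
    context_class: str
    prop: str
    value_class: str
    minimum: int


def from_owl(owl_constraint):
    """m_alpha: map a (flattened) OWL restriction to the generic form."""
    return GenericConstraint(
        constraint_type="minimum qualified cardinality restrictions",
        context_class=owl_constraint["subClass"],
        prop=owl_constraint["owl:onProperty"],
        value_class=owl_constraint["owl:onClass"],
        minimum=owl_constraint["owl:minQualifiedCardinality"],
    )


def to_dsp(gc):
    """m'_beta: map the generic form to a (flattened) DSP description template."""
    return {
        "dsp:resourceClass": gc.context_class,
        "dsp:property": gc.prop,
        "dsp:valueClass": gc.value_class,
        "dsp:minOccur": gc.minimum,
    }


owl_constraint = {
    "subClass": ":Book",
    "owl:onProperty": ":author",
    "owl:onClass": ":Person",
    "owl:minQualifiedCardinality": 1,
}

gc = from_owl(owl_constraint)   # gc = m_alpha(c_alpha)
dsp_constraint = to_dsp(gc)     # c_beta = m'_beta(gc)
print(dsp_constraint)
```

With n languages, going through one generic representation needs a mapping to and from it per language rather than one mapping per pair of languages.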
  • 53. 53 What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  • 54. 54 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  • 55. 55 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  • 56. 56 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
  • 57. 57 summary of main contributions development of 3 RDF vocabularies direct validation of XML using common RDF validation tools publication of 81 constraint types validation framework for RDF-based constraint languages role of reasoning for RDF validation THANK YOU!
  • 58. 58 acknowledgements, publications, research data 30 publications: 6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports; first author of all (except 1) journal articles, conference articles, and workshop articles research data and results: KIT research data repository: http://dx.doi.org/10.5445/BWDD/11 GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis 4 international working groups: DCMI RDF Application Profiles Task Group (part of the editorial board), RDF Vocabularies Working Group (editor for DDI-RDF and PHDD), W3C RDF Data Shapes Working Group, DDI Moving Forward Project THANK YOU!
  • 60. 60 publications: journal articles 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc 2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4 3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4 4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4 5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4 6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760 Please note that in 2015, my last name changed from Bosch to Hartmann.
  • 61. 61 publications: articles in conference proceedings 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/ 2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368 3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867 4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257 5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270 Please note that in 2015, my last name changed from Bosch to Hartmann.
  • 62. 62 publications: articles in conference proceedings 6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654 7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34 8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html 9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html Please note that in 2015, my last name changed from Bosch to Hartmann.
  • 63. 63 publications: articles in workshop proceedings Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/ 2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings 3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/
  • 64. 64 publications: specifications Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery 2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
  • 65. 65 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062 2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02 3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements 4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable 5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933
  • 66. 66 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479 7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478 8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470 9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/ 10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series
  • 67. 67 research questions 1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? 2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? 3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? 4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? 5. What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? appendix
  • 68. 68 summary of contributions 1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies 2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains 3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints 4.1 Consistent validation across RDF-based constraint languages 4.2 Minimal representation of constraints of any type 4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages 4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another 5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics 6. Evaluation of the usability of constraint types for assessing RDF data quality appendix
  • 69. 69 summary of limitations 1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way 2. Constraints of supported constraint types and constraint language constructs must be representable in RDF 3. Constraint languages and supported constraint types must be expressible in SPARQL 4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies appendix
  • 71. 71 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  • 72. 72 development of 3 RDF vocabularies: 1. DDI-RDF Discovery Vocabulary (DDI-RDF) to describe unit-record data 2. Physical Data Description (PHDD) to describe data in tabular format and its physical properties 3. The SKOS Extension for Statistics (XKOS) to describe the structure and textual properties of formal statistical classifications to describe relations between classifications and concepts and among concepts contribution RQ1
  • 74. 74 XML, XML Schema (XSD) RDF, Web Ontology Language (OWL) XML Schemas > OWL ontologies time-consuming work designing domain ontologies from scratch by hand reuse information contained in XML Schemas designing OWL domain ontologies RQ2
  • 75. 75 How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  • 76. 76 semantically rich OWL axioms (RQ2): sub-class relationships; OWL hasValue restrictions on data properties; OWL universal restrictions on object properties
      XML: <library> <book year="February 1890"> <author> <name>Arthur Conan Doyle</name> </author> <title>The Sign of the Four</title> </book> </library>
      axioms: Title ⊑ ∀ value.string; Year ⊑ ∀ value.integer
  • 77. 77 contributions (IJMSO, 8(3), RQ2): transformations based on formal logics; OWL axioms extracted from XML Schemas, explicitly and implicitly; transformations are formally underpinned to define and model semantics in a semantically correct way; complete extraction of XML Schemas' structural information; XML can directly be validated against semantically rich OWL axioms; any XML Schema is convertible to OWL; minimized effort in designing OWL domain ontologies
  • 78. 78 ISWC 2012 ICITST 2011 OCAS (ISWC 2011) RQ2
  • 79. 79 1. step of approach executed generic test cases created out of the XML Schema meta-model transformed XML Schemas of 6 XML standards 2. step of approach specified SWRL rules for 3 OWL domain ontologies verified hypothesis determined effort for traditional manual approach estimated effort for semi-automatic approach DDI-RDF serves as OWL domain ontology The effort and the time needed to deliver high quality domain ontologies from scratch by reusing information of already existing XML Schemas is much less than creating domain ontologies completely manually and from the ground up. evaluation IJMSO, 8(3) RQ2
  • 81. 81 What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  • 82. 82 What is the role reasoning plays in practical data validation? research question 5-1 RQ5
  • 83. 83 reasoning may resolve violations Book ⊑ ∀ author.Person Book(Huckleberry-Finn) author(Huckleberry-Finn, Mark-Twain) → Person(Mark-Twain) RQ5
  • 84. 84 reasoning may cause violations Publication ⊑ ∃ publisher.Publisher Book(Huckleberry-Finn) Book ⊑ Publication RQ5
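
The example on this slide can be replayed in a small Python sketch: without reasoning, Huckleberry-Finn is only known to be a Book, so the constraint that every Publication needs a publisher does not apply; after subclass reasoning it is also a Publication and the missing publisher becomes a violation. The function names and the dictionary-based data model are assumptions made for illustration, and each class is assumed to have at most one direct superclass.

```python
def infer_superclasses(types, subclass_of):
    """Return a copy of `types` extended with all classes implied by
    the subclass axioms in `subclass_of` (class -> direct superclass)."""
    inferred = {individual: set(classes) for individual, classes in types.items()}
    changed = True
    while changed:  # repeat until no new type assertion can be derived
        changed = False
        for classes in inferred.values():
            for cls in list(classes):
                superclass = subclass_of.get(cls)
                if superclass and superclass not in classes:
                    classes.add(superclass)
                    changed = True
    return inferred


def missing_publisher(types, publishers):
    """Publications without any publisher violate Publication ⊑ ∃ publisher.Publisher."""
    return sorted(
        individual
        for individual, classes in types.items()
        if "Publication" in classes and not publishers.get(individual)
    )


types = {"Huckleberry-Finn": {"Book"}}
subclass_of = {"Book": "Publication"}
publishers = {}  # no publisher asserted for any individual

without_reasoning = missing_publisher(types, publishers)
with_reasoning = missing_publisher(infer_superclasses(types, subclass_of), publishers)

print("violations without reasoning:", without_reasoning)
print("violations with reasoning:", with_reasoning)
```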
  • 85. 85 reasoning solves redundancy Publication ⊑ ∃ publicationDate . xsd:date Book ⊑ Publication Conference-Proceeding ⊑ Publication Journal-Article ⊑ Publication RQ5
  • 86. 86 For which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5-2 RQ5
  • 87. 87 > 2/5 of constraint types property domains (R-25): constraint types with reasoning ∃ author.⊤ ⊑ Publication author(Alices-Adventures-In-Wonderland, Lewis-Carroll) → rdf:type(Alices-Adventures-In-Wonderland, Publication) RQ5
  • 88. 88 constraint types without reasoning (RQ5): < 3/5 of constraint types; literal pattern matching (R-44): ISBN a rdfs:Datatype ; owl:equivalentClass [ a rdfs:Datatype ; owl:onDatatype xsd:string ; owl:withRestrictions ([ xsd:pattern "^\d{9}[\d|X]$" ])] . Book ⊑ ∀ identifier.ISBN
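
Literal pattern matching needs no reasoning: each literal is checked against the pattern on its own. A minimal Python sketch, reading the slide's pattern as the regular expression `^\d{9}[\d|X]$`; the function name and test values are hypothetical.

```python
import re

# Nine digits followed by a final digit or "X". Note that inside a character
# class "|" is a literal, so the pattern as written also accepts "|" as the
# last character.
ISBN_PATTERN = re.compile(r"^\d{9}[\d|X]$")


def is_valid_isbn_literal(value):
    """Check a string literal against the ISBN datatype's pattern facet."""
    return ISBN_PATTERN.fullmatch(value) is not None


for candidate in ["0306406152", "030640615X", "03064061", "030640615|"]:
    print(candidate, is_valid_isbn_literal(candidate))
```

The check is purely syntactic; it does not verify the ISBN check digit.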
  • 89. 89 For which constraint types validation results differ (1) if the CWA or the OWA and (2) if the UNA or the nUNA is assumed? CWA dependent: 56.8% UNA dependent: 66.6% research question 5-3 RQ5
  • 90. 90 56.8% of constraint types minimum qualified cardinality restrictions (R-75): CWA dependent constraint types RQ5 Book ⊑ ∃ title.⊤
  • 91. 91 disjoint classes (R-7): CWA independent constraint types RQ5 Book ⊓ JournalArticle ⊑ ⊥
  • 92. 92 66.6% of constraint types functional properties (R-57/65): UNA dependent constraint types RQ5 funct(title) title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn") title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")
  • 93. 93 literal value comparison (R-43): UNA independent constraint types RQ5 birthDate(Albert-Einstein, "1955-04-18") deathDate(Albert-Einstein, "1879-03-14") birthDate(Albert_Einstein, "1879-03-14") deathDate(Albert_Einstein, "1955-04-18") owl:sameAs(Albert-Einstein, Albert_Einstein)
  • 95. 95 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  • 96. 96 classification of constraint types RDFS/OWL based constraint language based SPARQL based classification of constraints informational warning error evaluation classification
  • 97. 97 RDFS/OWL based evaluation classification of constraint types :Publication rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :author ; owl:allValuesFrom :Person ] .
  • 98. 98 constraint language based evaluation classification of constraint types :Publication { ( :isbn xsd:string, :title xsd:string ) | ( :issn xsd:string, :title xsd:string )}
  • 99. 99 SPARQL based evaluation classification of constraint types SELECT ?concept WHERE { ?concept a [ rdfs:subClassOf* skos:Concept ] . FILTER NOT EXISTS { ?concept ?p ?o . FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) . } }
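
The query on this slide selects concepts that have none of the listed relations to other concepts. The same idea in plain Python, as a sketch only: triples are tuples, the function name is hypothetical, `concept_classes` stands in for the `rdfs:subClassOf*` closure of `skos:Concept`, and the property set contains just the three properties visible on the slide (the slide elides the rest of the list).

```python
RDF_TYPE = "rdf:type"

# Only the properties visible on the slide; the slide's list continues with "...".
RELATION_PROPERTIES = {"skos:related", "skos:relatedMatch", "skos:broader"}


def isolated_concepts(triples, concept_classes):
    """Return concepts that are the subject of none of the relation properties."""
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o in concept_classes}
    linked = {s for s, p, o in triples if p in RELATION_PROPERTIES}
    return sorted(concepts - linked)


triples = [
    (":Employment", RDF_TYPE, "skos:Concept"),
    (":Employment", "skos:broader", ":Labour"),
    (":Labour", RDF_TYPE, "skos:Concept"),
    (":Labour", "skos:prefLabel", "Labour"),
]

print(isolated_concepts(triples, {"skos:Concept"}))
```

In this sample, `:Labour` is reported because it only carries a label and no relation to another concept.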
  • 100. 100 evaluation finding 1; C (constraints), CV (constraint violations), values in %: SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  • 101. 101 evaluation finding 2; C (constraints), CV (constraint violations), values in %: SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  • 102. 102 evaluation finding 3; C (constraints), CV (constraint violations), values in %: Info: C 42.3, CV 31.3; Warning: C 18.7, CV 62.7; Error: C 39.0, CV 6.1
  • 104. 104 future work: RQ1 publication of RDF vocabularies DDI Alliance specifications W3C recommendation for DDI-RDF DDI-Lifecycle MD (Model-Driven) new requirements based on experiences with DDI-RDF international working group: DDI Moving Forward Project individual contributions formalize conceptual model (using UML 2) conceptualize and implement diverse model serializations (e.g., RDFS/OWL) future work
  • 105. 105 aligning PHDD and CSV on the WEB overlap in the description of tabular data in CSV format broader scope of PHDD description of tabular data with fixed record length description of tabular data with multiple records per case evaluation for use in DDI-Lifecycle MD future work: RQ1 future work
  • 106. 106 future work: RQ2 bidirectional transformations from models of any meta-model to OWL generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models enable to validate any data against constraints extractable from models of any meta-model using common RDF validation tools future work
  • 107. 107 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  • 108. 108 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group