Creative Commons CC BY 3.0:
allowed to share & remix
(also commercial)
but must attribute
Frank van Harmelen
The empirical turn in Knowledge Representation
Contributions from many people
in the KR&R group over many years.
And thanks to NWO for
a 750k€ TOP grant for this
KR in the pre-empirical era
Handbook of Knowledge Representation
(1000 pages, ToC alone is 14 pages)
• propositional logic & satisfiability solvers
• first order logic & resolution
• description logic
• constraint (logic) programming
• nonmonotonic reasoning
• belief revision
• qualitative reasoning
• model-based diagnosis
• Bayesian networks
• temporal logic
• spatial reasoning
• epistemic logic
• deontic logic
• situation calculus
• default logic
• event calculus
• ……
KR metrics
in the pre-empirical era
KR = logic
• Show small examples
• Prove properties (expressivity, complexity)
• Give algorithms (sound, complete)
KR = engineering
• Build applications
• Show high performance
• Show low engineering costs
BUT AN EXPERIMENT
IN THE PAST 10 YEARS
MADE IT POSSIBLE
TO DO SOMETHING VERY DIFFERENT:
OBSERVE HOW
KNOWLEDGE REPRESENTATIONS BEHAVE
AT VERY LARGE SCALE
Rest of the talk
• Which KRs were part of the experiment?
• How much of it was there to observe?
• How did we manage to observe it?
• What did we learn from observing it?
Which KRs?
RDF (for non-logicians)
RDF (for logicians)
• Ground binary predicates: 𝑃(𝑂1, 𝑂2)
• Limited existential variables: ∃𝑥: 𝑃(𝐶1, 𝑥) ∧ 𝑃(𝐶2, 𝑥)
• Type is a unary predicate: 𝑇𝑖(𝑥)
• Subtypes: ∀𝑥: 𝑇1(𝑥) → 𝑇2(𝑥)
• Type restrictions: ∀𝑥, 𝑦: 𝑃(𝑥, 𝑦) → 𝑇1(𝑥) ∧ 𝑇2(𝑦)
• Equality: 𝑂1 = 𝑂2
• Extensions to DL:
– Disjointness of types
– Cardinality restrictions (0,1)
– always decidable: sub-FOL.
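The logical reading above can be illustrated with a minimal Python sketch. Facts are ground triples, and the subtype and type-restriction axioms are applied by forward chaining until nothing new is derived. The predicate name `type` and the example terms (`aspirin`, `treats`, `Analgesic`, `Drug`, `Symptom`) are hypothetical illustrations, not data from the slides, and the sketch covers only the two rule shapes listed above.

```python
# Facts are ground binary predicates P(O1, O2), stored as (subject, predicate, object).
TYPE = "type"  # hypothetical name for the typing predicate T(x)

def rdfs_closure(facts, subtypes, domains, ranges):
    """Forward-chain subtype and type-restriction rules to a fixpoint.

    subtypes: set of (T1, T2), meaning  forall x: T1(x) -> T2(x)
    domains:  dict P -> T1, ranges: dict P -> T2, together meaning
              forall x, y: P(x, y) -> T1(x) and T2(y)
    """
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closed:
            if p == TYPE:
                # Subtype rule: T1(x) -> T2(x)
                new.update((s, TYPE, t2) for t1, t2 in subtypes if t1 == o)
            else:
                # Type restrictions on the two arguments of P
                if p in domains:
                    new.add((s, TYPE, domains[p]))
                if p in ranges:
                    new.add((o, TYPE, ranges[p]))
        if not new <= closed:
            closed |= new
            changed = True
    return closed

facts = {("aspirin", "treats", "headache")}
subtypes = {("Analgesic", "Drug")}
domains = {"treats": "Analgesic"}
ranges = {"treats": "Symptom"}
closure = rdfs_closure(facts, subtypes, domains, ranges)
```

Because every rule only adds type facts over a finite set of terms and types, the loop always terminates, which mirrors the decidability remark in the list above.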
RDF deduction
OWL Semantics
How much is there to observe?
± 45-100 billion facts
1 fact
How big is 100 billion
Denny Vrandečić – AIFB, Universität Karlsruhe ≈ 1 fact per web-page
100 billion golfballs ≈ Jupiter
[Figure: a single fact [<x> IsOfType <T>], where x and T have different owners & locations; example term: <analgesic>]
BTW: How did it get so big?
On the Web,
anybody can say anything about anything
How did you manage to observe it?
LOD Laundromat
Beek & Rietveld et al. 2014,
LOD laundromat: a uniform way of
publishing other people's dirty data
http://lodlaundromat.org/pdf/lodlaundry.pdf
HDT
Fernández & Martínez-Prieto &
Gutiérrez, 2013, Binary RDF
representation for publication and
exchange (HDT)
LDF
Verborgh & Vander Sande et al.
2014, Web-Scale Querying through
Linked Data Fragments
LOD-a-lot
http://lod-a-lot.lod.labs.vu.nl/
Surprisingly efficient
1 file
28,362,198,927 unique triples
>650K data documents
524 GB of disk space
16 GB of RAM
Only €305,- hardware cost
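To make "surprisingly efficient" concrete, the figures on this slide can be turned into a rough storage cost per triple. This is a back-of-the-envelope sketch: it assumes 1 GB means 10^9 bytes and treats ">650K" as a lower bound of 650,000 documents.

```python
# Figures taken from the slide; 1 GB is assumed to mean 10**9 bytes.
TRIPLES = 28_362_198_927
DISK_BYTES = 524 * 10**9
DOCUMENTS = 650_000  # the slide says ">650K", so this is a lower bound

bytes_per_triple = DISK_BYTES / TRIPLES
triples_per_document = TRIPLES / DOCUMENTS  # upper bound on the average

print(f"{bytes_per_triple:.1f} bytes per triple")
print(f"at most {triples_per_document:,.0f} triples per document on average")
```

Under these assumptions the file needs somewhat under 20 bytes per triple, far less than the three IRIs of a triple would occupy as plain text.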
Meta-Data for a lot of LOD
http://www.semantic-web-journal.net/content/meta-data-lot-lod-2
Statistics (boring)
triples 28,362,198,927
subjects 3,214,347,198
predicates 1,168,932
objects 3,178,409,386
literals 5.3B
Re-use is fairly high… or not…
Analysing
Logical identity
Joe Raad Wouter Beek
ESWC2018, under submission
Identity clusters
LOD-a-lot File
http://lod-a-lot.lod.labs.vu.nl
[Fernández 2017]
558 million owl:sameAs statements (309 million distinct terms)
≈ 4 hours
1. Extracting all owl:sameAs statements on the LOD
HDT File
(4.5 GB)
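The extraction step can be sketched as a filter over triples. The slides describe extracting from the LOD-a-lot HDT file; the sketch below instead scans N-Triples text lines, which is an assumption made to keep the example free of external libraries. The function name `same_as_pairs` and the `example.org` IRIs are hypothetical.

```python
OWL_SAMEAS = "<http://www.w3.org/2002/07/owl#sameAs>"

def same_as_pairs(lines):
    """Yield (subject, object) for every owl:sameAs statement.

    Simplified N-Triples handling: only IRI-to-IRI statements of the form
    '<s> <p> <o> .' are recognised; anything else is skipped.
    """
    for line in lines:
        parts = line.strip().split()
        if len(parts) == 4 and parts[1] == OWL_SAMEAS and parts[3] == ".":
            s, o = parts[0], parts[2]
            if s.startswith("<") and o.startswith("<"):
                yield s.strip("<>"), o.strip("<>")

sample = [
    "<http://example.org/x> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/y> .",
    "<http://example.org/z> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/y> .",
    '<http://example.org/x> <http://www.w3.org/2000/01/rdf-schema#label> "x" .',
]
pairs = list(same_as_pairs(sample))
```

Writing the filter as a generator means the input is processed line by line, so it never has to fit in memory at once.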
Identity Closure 1
Identity Closure 2
Identity Closure 89 387 082…
- The largest Identity Closure contains 177 794 terms
(contains all the countries in the world, Albert Einstein, the empty string, etc.)
- The smallest Identity Closure contains 2 terms
x owl:sameAs y
z owl:sameAs y
Identity Closure x y z
2. Generating the Identity Closure
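The x, y, z example amounts to computing connected components over owl:sameAs links. One common way to do this is a union-find structure, sketched below; the slides do not say which algorithm was actually used, so this is an assumption, and the function name `identity_closures` is hypothetical.

```python
from collections import defaultdict

def identity_closures(pairs):
    """Group terms into identity closures (connected components of owl:sameAs)."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        root = t
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited term directly at the root.
        while parent[t] != root:
            parent[t], t = root, parent[t]
        return root

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for t in parent:
        groups[find(t)].add(t)
    # A term that is only owl:sameAs itself forms no closure of interest.
    return [g for g in groups.values() if len(g) > 1]

closures = identity_closures([("x", "y"), ("z", "y"), ("a", "b")])
```

Here `x owl:sameAs y` and `z owl:sameAs y` end up in one closure {x, y, z}, while the unrelated pair forms its own closure of the minimum size 2.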
Identity Closure « Cities »
3. Detecting Communities (using the Louvain Algorithm)
This network (i.e. identity closure) has a community structure: it can be grouped into
sets of nodes, each of which is densely connected internally.
Goal: find (and later evaluate) the most “suspicious” identity links (i.e. the links
between different communities)
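Once communities are known, flagging suspicious links is a simple filter. The sketch below takes the community assignment as given (in the talk it comes from the Louvain algorithm, which is not implemented here) and returns the links that cross community boundaries. The node names and community numbers are a hypothetical toy example.

```python
def suspicious_links(links, community_of):
    """Return the identity links whose two ends lie in different communities."""
    return [(a, b) for a, b in links if community_of[a] != community_of[b]]

# Hypothetical toy closure: two dense groups joined by a single link.
links = [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
    ("q1", "q2"),
    ("p3", "q1"),  # the only link that crosses communities
]
community_of = {"p1": 0, "p2": 0, "p3": 0, "q1": 3, "q2": 3}
flagged = suspicious_links(links, community_of)
```

Only the flagged links then need manual or automated evaluation, which is a far smaller set than all links in the closure.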
4. Application: debugging identity statements
Identity closure
containing the term
“dbpedia.org/page/Barack_Obama”
This Identity Closure contains 388 terms
(i.e. 387 distinct terms are owl:sameAs this term)
95 communities detected
largest community = 99 terms
Communities 0 and 3 are connected by 2 links
Community 0
1. dbpedia.org/resource/B_hussein_obama
2. dbpedia.org/resource/Barack_H_Obama,_Jr
3. dbpedia.org/resource/Barak_hussein_obama
4. dbpedia.org/resource/President_Barack
5. dbpedia.org/resource/Senator_Barack_Obama
6. dbpedia.org/resource/Obama
…
99. dbpedia.org/resource/Hussein_Obama
Community 3
1. dbpedia.org/resource/Presidency_of_Barack_Obama
2. dbpedia.org/resource/Barack_Obama_Administration
3. dbpedia.org/resource/Barack_Obama_Cabinet
4. dbpedia.org/resource/Obama_White_House
5. dbpedia.org/resource/Obama_regime
6. dbpedia.org/resource/America_under_Obama
…
52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama
Symbols or words?
Steven de Rooij Peter Bloem Wouter Beek (ISWC 2016)
http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
Symbol names are supposed to be meaningless
[Figure: graph with nodes Aspirin, headache, analgesic, pain, symptom and drug, and two edges labelled "treats"]
Measure mutual information content
between string and semantics of a symbol
E(x) = efficient encoding of x
Mutual information content
M(x, y) = E(x) + E(y) – E(x, y)
Take x = symbol name of x as a string
Take 𝑦1 = {types of x} ≈ semantics of x
Take 𝑦2 = {properties of x} ≈ semantics of x
Calculate M(x, 𝑦1) and M(x, 𝑦2) for all symbols
in 600k datasets
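The measure can be sketched with an off-the-shelf compressor standing in for the "efficient encoding" E. Using zlib, and encoding the pair (x, y) by simple concatenation, are assumptions of this sketch and not necessarily the encoders used in the ISWC 2016 paper. The helper `noise` only produces deterministic toy input.

```python
import random
import zlib

def E(data: bytes) -> int:
    """Encoded length in bytes; zlib stands in for an 'efficient encoding'."""
    return len(zlib.compress(data, 9))

def M(x: bytes, y: bytes) -> int:
    """Mutual information estimate M(x, y) = E(x) + E(y) - E(x, y)."""
    return E(x) + E(y) - E(x + y)

def noise(seed: int, n: int = 400) -> bytes:
    """Deterministic pseudo-random lowercase text, used as toy input."""
    rng = random.Random(seed)
    return "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=n)).encode()

x, y = noise(1), noise(2)
related = M(x, x)    # second argument fully determined by the first
unrelated = M(x, y)  # independently generated strings
```

If symbol names were truly meaningless, M(name, types) would look like the `unrelated` case; a value closer to the `related` case suggests that the name carries information about the semantics.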
But variables do encode meaning!
Fraction of datasets with redundancy for types/predicates
at significance level > 0.99
BTW, this is 600,000 datapoints (RDF docs)
Very different
network structures
for different predicates
Tobias Kuhn Wouter Beek
http://ceur-ws.org/Vol-1946/paper-05.pdf
skos:exactMatch
foaf:knows
osspr:contains
Geopolitics:hasborderWith
Summary
&
So what…
• We now have larger KBs than ever before
• We now have the instruments
to observe and analyse these very large KBs
• We can use these insights for better tools:
– query & inference
– publish & maintain
– visualise & explain
– …
But my secret hope is that this will help us
to understand the patterns of knowledge:
AI as a computational theory of knowledge

Unveiling the Cannabis Plant’s Potential
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UG
 

The Empirical Turn in Knowledge Representation

  • 1. Creative Commons CC BY 3.0: allowed to share & remix (also commercial) but must attribute Frank van Harmelen The empirical turn in Knowledge Representation Contributions from many people in the KR&R group over many years. And thanks to NWO for a 750k€ TOP grant for this
  • 2. KR in the pre-empirical era
  • 3. Handbook of Knowledge Representation (1000 pages, ToC alone is 14 pages) • propositional logic & satisfiability solvers • first order logic & resolution • description logic • constraint (logic) programming • nonmonotonic reasoning • belief revision • qualitative reasoning • model-based diagnosis • bayesian networks • temporal logic • spatial reasoning • epistemic logic • deontic logic • situation calculus • default logic • event calculus • ……
  • 4. KR metrics in the pre-empirical era KR = logic • Show small examples • Prove properties (expressivity, complexity) • Give algorithms (sound, complete) KR = engineering • Build applications • Show high performance • Show low engineering costs
  • 5. BUT AN EXPERIMENT IN THE PAST 10 YEARS MADE IT POSSIBLE TO DO SOMETHING VERY DIFFERENT: OBSERVE HOW KNOWLEDGE REPRESENTATIONS BEHAVE AT VERY LARGE SCALE
  • 7. Rest of the talk • Which KR’s were part of the experiment? • How much of it was there to observe? • How did we manage to observe it? • What did we learn from observing it?
  • 10. RDF (for logicians) • Ground binary predicate: $P(O_1, O_2)$ • Limited existential variables: $\exists x: P(C_1, x) \land P(C_2, x)$ • Type is a unary predicate: $T_i(x)$ • Subtypes: $\forall x: T_1(x) \rightarrow T_2(x)$ • Type restrictions: $\forall x, y: P(x, y) \rightarrow T_1(x) \land T_2(y)$ • Equality: $O_1 = O_2$ • Extensions to DL: – Disjointness of types – Cardinality restrictions (0,1) – always decidable: sub-FOL.
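The subtype and type-restriction rules on this slide can be illustrated with a small forward-chaining sketch. This is only an illustration under stated assumptions, not the RDF or OWL semantics itself: the predicate names `type`, `subTypeOf`, `domain` and `range` are simplified stand-ins, and the example facts are hypothetical.

```python
def deductive_closure(triples):
    """Apply the subtype and type-restriction rules until nothing new follows.

    Assumed rules (simplified stand-ins, not the full RDF semantics):
      T1(x) and T1 subTypeOf T2  =>  T2(x)
      P(x, y) and P domain T1    =>  T1(x)
      P(x, y) and P range T2     =>  T2(y)
    """
    facts = set(triples)
    while True:
        subtypes = {(s, o) for (s, p, o) in facts if p == "subTypeOf"}
        domains = {(s, o) for (s, p, o) in facts if p == "domain"}
        ranges = {(s, o) for (s, p, o) in facts if p == "range"}
        derived = set()
        for (s, p, o) in facts:
            if p == "type":
                # Subtype rule: every instance of T1 is an instance of T2.
                derived |= {(s, "type", t2) for (t1, t2) in subtypes if t1 == o}
            # Type restrictions on the first and second argument of P.
            derived |= {(s, "type", t) for (q, t) in domains if q == p}
            derived |= {(o, "type", t) for (q, t) in ranges if q == p}
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical example facts.
example = deductive_closure({
    ("aspirin", "type", "analgesic"),
    ("analgesic", "subTypeOf", "drug"),
    ("aspirin", "treats", "headache"),
    ("treats", "range", "symptom"),
})
```

The loop terminates because each pass only adds triples built from terms that already occur in the input.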
  • 13. How much is there to observe?
  • 15. How big is 100 billion? • 1 fact
  • 16. Denny Vrandečić – AIFB, Universität Karlsruhe • ≈ 1 fact per web-page • 100 billion golfballs ≈ Jupiter
  • 17. x T [<x> IsOfType <T>] different owners & locations < analgesic > BTW: How did it get so big? On the Web, anybody can say anything about anything
  • 18. BTW: How did it get so big? On the Web, anybody can say anything about anything x T R
  • 19. How did you manage to observe it?
  • 22. LOD Laundromat: Beek & Rietveld et al. 2014, LOD laundromat: a uniform way of publishing other people's dirty data, http://lodlaundromat.org/pdf/lodlaundry.pdf • HDT: Fernández & Martínez-Prieto & Gutiérrez, 2013, Binary RDF representation for publication and exchange (HDT) • LDF: Verborgh & Vander Sande et al. 2014, Web-Scale Querying through Linked Data Fragments
  • 24. Surprisingly efficient • 1 file • 28,362,198,927 unique triples • >650K data documents • 524 GB of disk space • 16 GB of RAM • Only €305,- hardware cost • Meta-Data for a lot of LOD: http://www.semantic-web-journal.net/content/meta-data-lot-lod-2
  • 25. Statistics (boring) • triples: 28,362,198,927 • subjects: 3,214,347,198 • predicates: 1,168,932 • objects: 3,178,409,386 • literals: 5.3B
  • 26. Re-use is fairly high… or not…
  • 27. Analysing logical identity • Joe Raad, Wouter Beek • ESWC2018, under submission
  • 28. Identity clusters • 1. Extracting all owl:sameAs statements on the LOD • LOD-a-lot file: http://lod-a-lot.lod.labs.vu.nl [Fernández 2017] • ≈ 4 hours • HDT file (4.5 GB) • 558 million owl:sameAs statements (309 million distinct terms)
  • 29. 2. Generating the Identity Closure • HDT file (4.5 GB) • Identity Closure 1, Identity Closure 2, Identity Closure 89 387 082… • The largest identity closure contains 177 794 terms (it contains all the countries in the world, Albert Einstein, « empty string », etc.) • The smallest identity closure contains 2 terms • Example: x owl:sameAs y, z owl:sameAs y gives the identity closure {x, y, z}
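The closure step on this slide (x owl:sameAs y and z owl:sameAs y yield one closure containing x, y and z) can be sketched with a union-find structure. This is a minimal in-memory sketch and an assumption about one way to compute it; the slides do not say which algorithm was used on the HDT file, and the function name `identity_closures` is hypothetical.

```python
from collections import defaultdict


def identity_closures(same_as_pairs):
    """Group terms into identity closures from (x, y) owl:sameAs pairs.

    Uses union-find with path compression; returns a list of sets,
    keeping only closures with at least 2 terms.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        root = term
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited term directly at the root.
        while parent[term] != root:
            parent[term], term = root, parent[term]
        return root

    for x, y in same_as_pairs:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return [group for group in groups.values() if len(group) >= 2]


closures = identity_closures([("x", "y"), ("z", "y"), ("a", "b")])
```

Treating owl:sameAs as symmetric and transitive is what makes the closures equivalence classes, so the direction of each pair does not matter here.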
  • 31. 3. Detecting Communities (using the Louvain Algorithm) • Identity Closure « Cities » • This network (i.e. identity closure) has a community structure: it can be grouped into different sets of nodes, with each set of nodes being densely connected internally. • Goal: find (and later evaluate) the most “suspicious” identity links (i.e. the links between different communities)
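Once communities have been assigned, selecting the “suspicious” links is a simple filter over the edges. The sketch below assumes the community assignment is already available (for example from a Louvain implementation, which is not reproduced here); the function name, the edges and the community numbers are hypothetical.

```python
def suspicious_links(edges, community_of):
    """Return the identity links whose endpoints lie in different communities.

    edges: iterable of (term, term) owl:sameAs links inside one identity closure.
    community_of: dict mapping each term to its community id (assumed to be
    computed beforehand by a community detection algorithm).
    """
    return [(u, v) for (u, v) in edges if community_of[u] != community_of[v]]


# Hypothetical links and community assignment.
links = [
    ("Obama", "Senator_Barack_Obama"),
    ("Obama", "Obama_White_House"),
    ("Obama_White_House", "Presidency_of_Barack_Obama"),
]
communities = {
    "Obama": 0,
    "Senator_Barack_Obama": 0,
    "Obama_White_House": 3,
    "Presidency_of_Barack_Obama": 3,
}
flagged = suspicious_links(links, communities)
```

Links inside a community are left alone; only the cross-community ones are handed on for evaluation.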
  • 32. 4. Application: debugging identity statements • Identity closure containing the term “dbpedia.org/page/Barack_Obama” • This identity closure contains 388 terms (i.e. 387 distinct terms are owl:sameAs this term) • 95 communities detected • largest community = 99 terms
  • 33. 4. Application: debugging identity statements • comm 0 and comm 3: 2 links • Community 0: 1. dbpedia.org/resource/B_hussein_obama 2. dbpedia.org/resource/Barack_H_Obama,_Jr 3. dbpedia.org/resource/Barak_hussein_obama 4. dbpedia.org/resource/President_Barack 5. dbpedia.org/resource/Senator_Barack_Obama 6. dbpedia.org/resource/Obama … 99. dbpedia.org/resource/Hussein_Obama • Community 3: 1. dbpedia.org/resource/Presidency_of_Barack_Obama 2. dbpedia.org/resource/Barack_Obama_Administration 3. dbpedia.org/resource/Barack_Obama_Cabinet 4. dbpedia.org/resource/Obama_White_House 5. dbpedia.org/resource/Obama_regime 6. dbpedia.org/resource/America_under_Obama … 52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama
  • 34. Symbols or words? • Steven de Rooij, Peter Bloem, Wouter Beek (ISWC 2016) • http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
  • 35. Symbols or words? • Symbol names are supposed to be meaningless • [Diagram labels: Aspirin, headache, analgesic, pain, symptom, drug, treats, treats]
  • 36. Measure mutual information content between the string and the semantics of a symbol • $E(x)$ = efficient encoding of $x$ • Mutual information content: $M(x, y) = E(x) + E(y) - E(x, y)$ • Take $x$ = symbol name of x as a string • Take $y_1$ = {types of x} ≈ semantics of x • Take $y_2$ = {properties of x} ≈ semantics of x • Calculate $M(x, y_1)$ and $M(x, y_2)$ for all symbols in 600k datasets
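The formula M(x, y) = E(x) + E(y) – E(x, y) can be approximated with any general-purpose compressor standing in for the efficient encoding E. The sketch below uses zlib and measures compressed length in bytes; this is an assumption made for illustration, since the slide does not say which encoder was used, and the function names are hypothetical.

```python
import zlib


def encoded_length(data: bytes) -> int:
    """E(x): length in bytes of a compressed encoding of x (zlib as a stand-in)."""
    return len(zlib.compress(data, 9))


def mutual_information(x: bytes, y: bytes) -> int:
    """M(x, y) = E(x) + E(y) - E(x, y), with E(x, y) taken over the concatenation."""
    return encoded_length(x) + encoded_length(y) - encoded_length(x + y)


# Synthetic inputs: `related` repeats `name` exactly, `unrelated` shares no
# substring of three or more bytes with it.
name = bytes(range(256))
related = name
unrelated = bytes(reversed(range(256)))

shared = mutual_information(name, related)
independent = mutual_information(name, unrelated)
```

With these inputs `shared` comes out much larger than `independent`, because the compressor can reuse `name` when encoding the pair. On short strings such as individual symbol names, the fixed overhead of a general-purpose compressor dominates, so a real measurement would need a more careful encoding.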
  • 37. But variables do encode meaning! • Fraction of datasets with redundancy for types/predicates at significance level > 0.99 • BTW, this is 600,000 datapoints (RDF docs)
  • 38. Very different network structures for different predicates • Tobias Kuhn, Wouter Beek • http://ceur-ws.org/Vol-1946/paper-05.pdf
  • 44. • We now have larger KB’s than ever before • We now have the instruments to observe and analyse these very large KB’s • We can use these insights for better tools: – query & inference – publish & maintain – visualise & explain – …
  • 45. But my secret hope is that this will help us to understand the patterns of knowledge: AI as a computational theory of knowledge