SlideShare a Scribd company logo
1 of 14
Download to read offline
Knowledge Graph
Futures
Paul Groth | @pgroth | pgroth.com | indelab.org

ESWC 2020 Panel on Knowledge Graphs
Source:

Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor.
2019. Industry-scale Knowledge Graphs: Lessons and Challenges. Queue 17, 2, pages 20
(April 2019), 28 pages. DOI: https://doi.org/10.1145/3329781.3332266
The KG Assumption
Source:

Stephen H. Bach et al. 2019. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. In
Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA,
362-375. DOI: https://doi.org/10.1145/3299869.3314036
Simplicity & Openness
ML + KGs
Grappling with softness
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2129–2137
Marseille, 11–16 May 2020
c European Language Resources Association (ELRA), licensed under CC-BY-NC
2129
Towards Entity Spaces
Marieke van Erp⇤
, Paul Groth†
⇤
KNAW Humanities Cluster - DHLab, Amsterdam, NL
†
University of Amsterdam, Amsterdam, NL
marieke.van.erp@dh.huc.knaw.nl, p.groth@uva.nl
Abstract
Entities are a central element of knowledge bases and are important input to many knowledge-centric tasks including text analysis.
For example, they allow us to find documents relevant to a specific entity irrespective of the underlying syntactic expression within a
document. However, the entities that are commonly represented in knowledge bases are often a simplification of what is truly being
referred to in text. For example, in a knowledge base, we may have an entity for Germany as a country but not for the more fuzzy concept
of Germany that covers notions of German Population, German Drivers, and the German Government. Inspired by recent advances in
contextual word embeddings, we introduce the concept of entity spaces - specific representations of a set of associated entities with
near-identity. Thus, these entity spaces provide a handle to an amorphous grouping of entities. We developed a proof-of-concept for
English showing how, through the introduction of entity spaces in the form of disambiguation pages, the recall of entity linking can be
improved.
Keywords: entity, identity, knowledge representation, entity linking
1. Introduction
Entities are a central element for knowledge bases and
text analysis tasks (Balog, 2018). However, the way in
which entities are represented in knowledge bases and
how subsequent tools use these representations are a sim-
plification of the complexity of many entities. For ex-
ample, the entity Germany in Wikidata as represented
by wikidata:Q183 focuses on its properties as a location
and geopolitical entity due to its membership as an in-
stance of sovereign state, country, federal state, repub-
lic, social state, legal state, and administrative territorial
entity. Similarly, in DBpedia (version 2016-10), Ger-
many is represented as entity of type populated place and
some subtypes such as yago:WikicatFederalCountries and
yago:WikicatMemberStatesOfTheEuropeanUnion.1
However, when the term Germany is used in text, it can
take on many meanings that all have ‘something to do’ with
Germany as it is represented in knowledge bases, but are all
not quite the same:
(1) Germany imported 47,600 sheep from Britain last
year, nearly half of total imports.
(2) German July car registrations up 14.2 pct yr / yr.
(3) Australia last won the Davis Cup in 1986, but they
were beaten finalists against Germany three years
ago under Fraser’s guidance.
In Example (1), Germany refers partly to the location, but
a location usually cannot take on an active role, such that
the entity ‘importing’ the sheep is most likely a referent to
the German meat industry. Germany in Example (2), refers
to the German population buying and registering more cars
than a year before. Finally, in Example (3), Germany refers
to the German Davis Cup team from 1993 (the news article
is from 1996). In the AIDA-YAGO dataset, this entity is
tagged as dbp:Germany Davis Cup team but this presents
1
Germany also has rdf:type dbo:Person but we assume this
is a glitch.
us with another layer of identities, namely that every year,
or every couple of years, the German Davis cup team con-
sists of different players. In 1993, the German Davis cup
team consisted of Michael Stich and Marc-Kevin Goellner,
in 1996 of David Prinosil and Hendrik Dreekmann and at
the time of writing this article in 2019 of Alexander Zverev
and Philipp Kohlschreiber. Both MAG (Moussallem et al.,
2017) and DBpedia spotlight (Daiber et al., 2013a) annotate
Australia and Germany in Example (3) as dbp:Australia
and dbp:Germany respectively. While both the annotations
and automatic linkages are close to the identity of the en-
tity in resolving these referents to dbp:Germany, we argue
this is an underspecification and highlights a larger problem
with identity representation in knowledge bases.
Collapsing of identities has been a frequent topic within
Semantic Web discourse. However, most discussions have
focused on issues with owl:sameAs links (McCusker and
McGuinness, 2010; Raad et al., 2018). However, the prob-
lem of simplified entity representations (e.g. the collaps-
ing of identities) also occurs before the creation of such
owl:sameAs links. Specifically, with the fact that most
knowledge bases represent a single or limited number of
an entity’s facets. In this paper, we analyse the extent of
the problem by connecting Semantic Web representations
of identity to linguistic representations of entities, namely
coreference and near-identity. To overcome this identity
problem, we argue for the introduction of explicit represen-
tations of near-identity within knowledge bases. We term
these explicit representations - entity spaces. We illustrate
how the introduction of entity spaces can boost the perfor-
mance of state-of-the-art entity linking pipelines.
Our contributions are: 1) the definition of entity spaces; 2)
a prototype showing the use of entity spaces over multiple
entity linking pipelines; and 3) experiments on 13 English
entity linking datasets showing the impact of a more toler-
ant approach to entity linking made possible through entity
spaces.
Our code and experimental results are available via https:
//github.com/MvanErp/entity-spaces.
Content
Universal
schema
Surface form
relations
Structured
relations
Factorization
model
Matrix
Construction
Open
Information
Extraction
Entity
Resolution
Matrix
Factorization
Knowledge
graph
Curation
Predicted
relations
Matrix
Completion
Taxonomy
Triple
Extraction
Concept
Resolution
14M
SD articles
475 M
triples
3.3 million
relations
49 M
relations
~15k ->
1M
entries
Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel
“Applying Universal Schemas for Domain Specific Ontology Expansion”
5th Workshop on Automated Knowledge Base Construction (AKBC) 2016
KG = Complicated Pipelines
Humans & Machines
knowledgescientist.org
Knowledge Engineering Revisited
• Knowledge graphs are built ad-hoc 

• 100s of components (extractors, scrapers, quality,
scoring,  user feedback, ….)

• Unique for each organization

• Existing knowledge engineering theory does not apply:

• Assumes small scale

• Assumes slow change

• People-centric

• Expressive representations 

• an updated theory and methods for knowledge
engineering designed for the demands of modern
knowledge graphs
Conclusion
• The change in frame to knowledge graphs has opened up thinking

• Research challenge #1 - the ramifications of language models

• Research challenge #2 - knowledge engineering theory revisited
Paul Groth | @pgroth | pgroth.com | indelab.org

Thanks to Daniel Daza, Thiviyan Thanapalsingam, Marieke van Erp and 

Frank van Harmelen

We’re
hiring!

More Related Content

What's hot

Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chainPaul Groth
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataRinke Hoekstra
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Rinke Hoekstra
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationRinke Hoekstra
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Sören Auer
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphSören Auer
 
20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructureStefan Gradmann
 
20130711 linked datascholarship_madrid
20130711 linked datascholarship_madrid20130711 linked datascholarship_madrid
20130711 linked datascholarship_madridStefan Gradmann
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madridStefan Gradmann
 
Self adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofSelf adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofNurfadhlina Mohd Sharef
 
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge GraphIntegrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge GraphJennifer D'Souza
 

What's hot (20)

Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
 
20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure
 
20130711 linked datascholarship_madrid
20130711 linked datascholarship_madrid20130711 linked datascholarship_madrid
20130711 linked datascholarship_madrid
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madrid
 
Self adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofSelf adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation of
 
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge GraphIntegrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
 

Similar to Knowledge Graph Futures

Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Text mining and its association analysis.pdf
Text mining and its association analysis.pdfText mining and its association analysis.pdf
Text mining and its association analysis.pdfKyusonLim
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaGiorgia Lodi
 
Reconciling Event-Based Knowledge through RDF2VEC
Reconciling Event-Based Knowledge through RDF2VECReconciling Event-Based Knowledge through RDF2VEC
Reconciling Event-Based Knowledge through RDF2VECMehwish Alam
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Oscar Corcho
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
 
Google Kernel Function
Google Kernel FunctionGoogle Kernel Function
Google Kernel FunctionBeibei Yang
 
PhD defense : Multi-points of view semantic enrichment of folksonomies
PhD defense : Multi-points of view semantic enrichment of folksonomiesPhD defense : Multi-points of view semantic enrichment of folksonomies
PhD defense : Multi-points of view semantic enrichment of folksonomiesFreddy Limpens
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked DataOscar Corcho
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docxPPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docxChantellPantoja184
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Fernando de Assis Rodrigues
 
Character-based Neural Embeddings for Tweet Clustering
Character-based  Neural Embeddings for Tweet ClusteringCharacter-based  Neural Embeddings for Tweet Clustering
Character-based Neural Embeddings for Tweet ClusteringSvitlana Vakulenko
 
rworldmap: A New R package for Mapping Global Data
rworldmap: A New R package for Mapping Global Datarworldmap: A New R package for Mapping Global Data
rworldmap: A New R package for Mapping Global DataDr. Volkan OBAN
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic webTony Dobaj
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 

Similar to Knowledge Graph Futures (20)

Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Text mining and its association analysis.pdf
Text mining and its association analysis.pdfText mining and its association analysis.pdf
Text mining and its association analysis.pdf
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Reconciling Event-Based Knowledge through RDF2VEC
Reconciling Event-Based Knowledge through RDF2VECReconciling Event-Based Knowledge through RDF2VEC
Reconciling Event-Based Knowledge through RDF2VEC
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 
Google Kernel Function
Google Kernel FunctionGoogle Kernel Function
Google Kernel Function
 
PhD defense : Multi-points of view semantic enrichment of folksonomies
PhD defense : Multi-points of view semantic enrichment of folksonomiesPhD defense : Multi-points of view semantic enrichment of folksonomies
PhD defense : Multi-points of view semantic enrichment of folksonomies
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked Data
 
Topical_Facets
Topical_FacetsTopical_Facets
Topical_Facets
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docxPPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...
 
Character-based Neural Embeddings for Tweet Clustering
Character-based  Neural Embeddings for Tweet ClusteringCharacter-based  Neural Embeddings for Tweet Clustering
Character-based Neural Embeddings for Tweet Clustering
 
rworldmap: A New R package for Mapping Global Data
rworldmap: A New R package for Mapping Global Datarworldmap: A New R package for Mapping Global Data
rworldmap: A New R package for Mapping Global Data
 
AI Science
AI Science AI Science
AI Science
 
The technical case for a semantic web
The technical case for a semantic webThe technical case for a semantic web
The technical case for a semantic web
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 

More from Paul Groth

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationPaul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Paul Groth
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersPaul Groth
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CapturePaul Groth
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 

More from Paul Groth (15)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchers
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance Capture
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Knowledge Graph Futures

  • 1. Knowledge Graph Futures Paul Groth | @pgroth | pgroth.com | indelab.org ESWC 2020 Panel on Knowledge Graphs
  • 2. Source: Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor. 2019. Industry-scale Knowledge Graphs: Lessons and Challenges. Queue 17, 2, pages 20 (April 2019), 28 pages. DOI: https://doi.org/10.1145/3329781.3332266
  • 3.
  • 4. The KG Assumption Source: Stephen H. Bach et al. 2019. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 362-375. DOI: https://doi.org/10.1145/3299869.3314036
  • 6.
  • 8.
  • 9. Grappling with softness Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2129–2137 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC 2129 Towards Entity Spaces Marieke van Erp⇤ , Paul Groth† ⇤ KNAW Humanities Cluster - DHLab, Amsterdam, NL † University of Amsterdam, Amsterdam, NL marieke.van.erp@dh.huc.knaw.nl, p.groth@uva.nl Abstract Entities are a central element of knowledge bases and are important input to many knowledge-centric tasks including text analysis. For example, they allow us to find documents relevant to a specific entity irrespective of the underlying syntactic expression within a document. However, the entities that are commonly represented in knowledge bases are often a simplification of what is truly being referred to in text. For example, in a knowledge base, we may have an entity for Germany as a country but not for the more fuzzy concept of Germany that covers notions of German Population, German Drivers, and the German Government. Inspired by recent advances in contextual word embeddings, we introduce the concept of entity spaces - specific representations of a set of associated entities with near-identity. Thus, these entity spaces provide a handle to an amorphous grouping of entities. We developed a proof-of-concept for English showing how, through the introduction of entity spaces in the form of disambiguation pages, the recall of entity linking can be improved. Keywords: entity, identity, knowledge representation, entity linking 1. Introduction Entities are a central element for knowledge bases and text analysis tasks (Balog, 2018). However, the way in which entities are represented in knowledge bases and how subsequent tools use these representations are a sim- plification of the complexity of many entities. For ex- ample, the entity Germany in Wikidata as represented by wikidata:Q183 focuses on its properties as a location and geopolitical entity due to its membership as an in- stance of sovereign state, country, federal state, repub- lic, social state, legal state, and administrative territorial entity. Similarly, in DBpedia (version 2016-10), Ger- many is represented as entity of type populated place and some subtypes such as yago:WikicatFederalCountries and yago:WikicatMemberStatesOfTheEuropeanUnion.1 However, when the term Germany is used in text, it can take on many meanings that all have ‘something to do’ with Germany as it is represented in knowledge bases, but are all not quite the same: (1) Germany imported 47,600 sheep from Britain last year, nearly half of total imports. (2) German July car registrations up 14.2 pct yr / yr. (3) Australia last won the Davis Cup in 1986, but they were beaten finalists against Germany three years ago under Fraser’s guidance. In Example (1), Germany refers partly to the location, but a location usually cannot take on an active role, such that the entity ‘importing’ the sheep is most likely a referent to the German meat industry. Germany in Example (2), refers to the German population buying and registering more cars than a year before. Finally, in Example (3), Germany refers to the German Davis Cup team from 1993 (the news article is from 1996). In the AIDA-YAGO dataset, this entity is tagged as dbp:Germany Davis Cup team but this presents 1 Germany also has rdf:type dbo:Person but we assume this is a glitch. us with another layer of identities, namely that every year, or every couple of years, the German Davis cup team con- sists of different players. In 1993, the German Davis cup team consisted of Michael Stich and Marc-Kevin Goellner, in 1996 of David Prinosil and Hendrik Dreekmann and at the time of writing this article in 2019 of Alexander Zverev and Philipp Kohlschreiber. Both MAG (Moussallem et al., 2017) and DBpedia spotlight (Daiber et al., 2013a) annotate Australia and Germany in Example (3) as dbp:Australia and dbp:Germany respectively. While both the annotations and automatic linkages are close to the identity of the en- tity in resolving these referents to dbp:Germany, we argue this is an underspecification and highlights a larger problem with identity representation in knowledge bases. Collapsing of identities has been a frequent topic within Semantic Web discourse. However, most discussions have focused on issues with owl:sameAs links (McCusker and McGuinness, 2010; Raad et al., 2018). However, the prob- lem of simplified entity representations (e.g. the collaps- ing of identities) also occurs before the creation of such owl:sameAs links. Specifically, with the fact that most knowledge bases represent a single or limited number of an entity’s facets. In this paper, we analyse the extent of the problem by connecting Semantic Web representations of identity to linguistic representations of entities, namely coreference and near-identity. To overcome this identity problem, we argue for the introduction of explicit represen- tations of near-identity within knowledge bases. We term these explicit representations - entity spaces. We illustrate how the introduction of entity spaces can boost the perfor- mance of state-of-the-art entity linking pipelines. Our contributions are: 1) the definition of entity spaces; 2) a prototype showing the use of entity spaces over multiple entity linking pipelines; and 3) experiments on 13 English entity linking datasets showing the impact of a more toler- ant approach to entity linking made possible through entity spaces. Our code and experimental results are available via https: //github.com/MvanErp/entity-spaces.
  • 10. Content Universal schema Surface form relations Structured relations Factorization model Matrix Construction Open Information Extraction Entity Resolution Matrix Factorization Knowledge graph Curation Predicted relations Matrix Completion Taxonomy Triple Extraction Concept Resolution 14M SD articles 475 M triples 3.3 million relations 49 M relations ~15k -> 1M entries Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel “Applying Universal Schemas for Domain Specific Ontology Expansion” 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 KG = Complicated Pipelines
  • 13. Knowledge Engineering Revisited • Knowledge graphs are built ad-hoc • 100s of components (extractors, scrapers, quality, scoring,  user feedback, ….) • Unique for each organization • Existing knowledge engineering theory does not apply: • Assumes small scale • Assumes slow change • People-centric • Expressive representations • an updated theory and methods for knowledge engineering designed for the demands of modern knowledge graphs
  • 14. Conclusion • The change in frame to knowledge graphs has opened up thinking • Research challenge #1 - the ramifications of language models • Research challenge #2 - knowledge engineering theory revisited Paul Groth | @pgroth | pgroth.com | indelab.org Thanks to Daniel Daza, Thiviyan Thanapalsingam, Marieke van Erp and Frank van Harmelen We’re hiring!