7/2/19 Heiko Paulheim 1
Machine Learning & Embeddings
for Large Knowledge Graphs
Heiko Paulheim
7/2/19 Heiko Paulheim 3
Crossing the Bridge from the Other Side
• There are plenty of established ML and DM toolkits...
– Weka
– RapidMiner
– scikit-learn
– R
• ...implementing all your favorite algorithms...
– Naive Bayes
– Random Forests
– SVMs
– (Deep) Neural Networks
– ...
• ...but they all work on feature vectors, not graphs!
7/2/19 Heiko Paulheim 4
Typical Tasks
• Knowledge Graph Internal
– Type prediction
– Link prediction
– Link validation
• Knowledge Graph External
– i.e., using the KG as background knowledge in some other task
– e.g., content-based recommender systems
– e.g., predictive modeling
● Who is the next Nobel Prize winner?
Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics.
Scientific Programming, 2014
Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. eBay Tech Blog, 2019
7/2/19 Heiko Paulheim 5
Example: Knowledge Graph Internal
• Type prediction
– Many instances in KGs are not typed or have very abstract types
– e.g., many actors are just typed as persons
• Classic approach
– Exploit ontology
– Shown to be rather sensitive to noise
• Example: ontology-based typing of Germany in DBpedia
– Airport, Award, Building, City, Country, Ethnic Group, Genre, Language, Military Conflict, Mountain, Mountain Range, Person Function, Place, Populated Place, Race, Route of Transportation, Settlement, Stadium, Wine Region
Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013
Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with
Graph and Latent Features. IJAIT, 2017
7/2/19 Heiko Paulheim 6
Example: Knowledge Graph Internal
• Alternative: learn model for type prediction
– Train classifier to predict types (binary or hierarchical)
– More noise tolerant
Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
7/2/19 Heiko Paulheim 7
Example: Knowledge Graph External
• Example machine learning task: predicting book sales
ISBN            City       Sold
3-2347-3427-1   Darmstadt  124
3-43784-324-2   Mannheim   493
3-145-34587-0   Roßdorf    14
...

ISBN            City       Population  ...  Genre   Publisher     ...  Sold
3-2347-3427-1   Darmstadt  144402      ...  Crime   Bloody Books  ...  124
3-43784-324-2   Mannheim   291458      ...  Crime   Guns Ltd.     ...  493
3-145-34587-0   Roßdorf    12019       ...  Travel  Up&Away       ...  14
...
→ Crime novels sell better in larger cities
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
7/2/19 Heiko Paulheim 8
Example: The FeGeLOD Framework
Input:
ISBN: 3-2347-3427-1, City: Darmstadt, # sold: 124
→ Named Entity Recognition
ISBN: 3-2347-3427-1, City: Darmstadt, # sold: 124, City_URI: http://dbpedia.org/resource/Darmstadt
→ Feature Generation
ISBN: 3-2347-3427-1, City: Darmstadt, # sold: 124, City_URI: http://dbpedia.org/resource/Darmstadt, City_URI_dbpedia-owl:populationTotal: 141471, City_URI_...: ...
→ Feature Selection
ISBN: 3-2347-3427-1, City: Darmstadt, # sold: 124, City_URI: http://dbpedia.org/resource/Darmstadt, City_URI_dbpedia-owl:populationTotal: 141471
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
7/2/19 Heiko Paulheim 9
The FeGeLOD Framework
• Entity Recognition
– Simple approach: guess DBpedia URIs
– Hit rate >95% for cities and countries (by English name)
• Feature Generation
– augmenting the dataset with additional attributes from KG
• Feature Selection
– Filter noise: drop features where >95% of the values are unknown, identical, or (for nominals) all different
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
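The entity recognition and feature selection steps above can be sketched roughly as follows. This is a minimal illustration, not the FeGeLOD implementation: the function names `guess_dbpedia_uri` and `keep_feature` are hypothetical, the URI is only guessed from the label (no lookup is performed), and the way the 95% threshold is applied is an assumed reading of the slide.

```python
from collections import Counter

def guess_dbpedia_uri(label):
    """Guess a DBpedia resource URI from an English label (no lookup is done)."""
    return "http://dbpedia.org/resource/" + label.strip().replace(" ", "_")

def keep_feature(values, nominal=False, threshold=0.95):
    """Return False for features that look like noise under the assumed heuristics."""
    n = len(values)
    known = [v for v in values if v is not None]
    if n == 0 or (n - len(known)) / n > threshold:
        return False                      # mostly unknown values
    most_common_count = Counter(known).most_common(1)[0][1]
    if most_common_count / n > threshold:
        return False                      # (almost) always the same value
    if nominal and len(set(known)) / n > threshold:
        return False                      # nominal feature, (almost) all values differ
    return True

uri = guess_dbpedia_uri("Darmstadt")
```

A generated column would then be kept only if `keep_feature` returns `True` for its values across all records.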
7/2/19 Heiko Paulheim 11
Propositionalization
• Bridge Problem: Knowledge Graphs
vs. ML algorithms expecting Feature Vectors
→ wanted: a transformation from nodes to sets of features
• Basic strategies:
– literal values (e.g., population) are used directly
– instance types become binary features
– relations are counted (absolute, relative, TF-IDF)
– combinations of relations and object types are counted
(absolute, relative, TF-IDF)
– ...
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
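As a rough sketch of the basic strategies listed above (literals used directly, types as binary features, absolute relation counts), the following turns the outgoing triples of one entity into a flat feature dictionary. The function name `propositionalize`, the `ex:` identifiers, and the toy triples are assumptions made for illustration; relative and TF-IDF weighting are left out.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

def propositionalize(entity, triples):
    """Turn one entity's outgoing triples into a flat feature dictionary."""
    features = {}
    relation_counts = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if p == RDF_TYPE:
            features[f"type={o}"] = 1      # instance types become binary features
        elif isinstance(o, (int, float)):
            features[p] = o                # literal values are used directly
        else:
            relation_counts[p] += 1        # relations are counted (absolute)
    for p, n in relation_counts.items():
        features[f"count({p})"] = n
    return features

# Toy triples (hypothetical, not taken from DBpedia).
triples = [
    ("ex:Darmstadt", RDF_TYPE, "ex:City"),
    ("ex:Darmstadt", "ex:population", 144402),
    ("ex:Darmstadt", "ex:twinTown", "ex:TownA"),
    ("ex:Darmstadt", "ex:twinTown", "ex:TownB"),
]
features = propositionalize("ex:Darmstadt", triples)
```

The resulting dictionary can be handed to any toolkit that expects feature vectors once the keys are mapped to columns.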
7/2/19 Heiko Paulheim 12
Propositionalization ctd.
• Observations
– Even simple features (e.g., add all numbers and types) can help on many problems
– More sophisticated features often bring additional improvements
● Combinations of relations and individuals, e.g., movies directed by Steven Spielberg
● Combinations of relations and types, e.g., movies directed by Oscar-winning directors
● …
– But
● The search space is enormous!
● "Generate first, filter later" does not scale well
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
7/2/19 Heiko Paulheim 13
From Naive Propositionalization to
Knowledge Graph Embeddings
• Reconsidering the previous examples:
– We want to predict some attribute of a KG entity
● e.g., types
● e.g., sales figures of books
– ...given the entity’s vector representation
• How do we get a “good” vector representation for an entity?
– ...and: what is “good” in the first place?
7/2/19 Heiko Paulheim 14
From Naive Propositionalization to
Knowledge Graph Embeddings
• How do we get a “good” vector representation for an entity?
– ...and: what is “good” in the first place?
• “good” for machine learning means separable
– similar entities are close together
– different entities are further away
https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
7/2/19 Heiko Paulheim 15
A Brief Excursion to word2vec
• A vector space model for words
• Introduced in 2013
• Each word becomes a vector
– similar words are close
– relations are preserved
– vector arithmetic is possible
https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
7/2/19 Heiko Paulheim 16
A Brief Excursion to word2vec
• Assumption:
– Similar words appear in similar contexts
{Bush,Obama,Trump} was elected president of the United States
United States president {Bush,Obama,Trump} announced…
…
• Idea
– Train a network that can predict a word from its context (CBOW)
or the context from a word (Skip Gram)
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
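To make the Skip-gram idea concrete, the sketch below only generates the (word, context word) training pairs from a sentence; the network that is trained on these pairs is not shown. The function name `skipgram_pairs` and the window size are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "Obama was elected president".split()
pairs = skipgram_pairs(sentence, window=1)
```

Replacing "Obama" with "Bush" or "Trump" yields the same context words, which is why such words end up with similar vectors.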
7/2/19 Heiko Paulheim 17
A Brief Excursion to word2vec
• Skip Gram: train a neural network with one hidden layer
• Use output values at hidden layer as vector representation
• Observation:
– Bush, Obama, Trump will activate similar context words
– i.e., their output weights at the projection layer have to be similar
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
7/2/19 Heiko Paulheim 18
From word2vec to RDF2vec
• Word2vec operates on sentences, i.e., sequences of words
• Idea of RDF2vec
– First extract “sentences” from a graph
– Then train embedding using word2vec
• “Sentences” are extracted by performing random graph walks:
Year Zero → artist → Nine Inch Nails → member → Trent Reznor
• Experiments
– RDF2vec can be trained on large KGs (DBpedia, Wikidata)
– 300-500 dimensional vectors outperform other propositionalization strategies
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
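The walk extraction step could look roughly like the sketch below, which uses the slide's example triples. It is a simplified illustration with hypothetical helper names (`build_adjacency`, `random_walk`), not the RDF2vec reference implementation; the resulting token sequences would then be passed to a word2vec implementation as "sentences".

```python
import random
from collections import defaultdict

def build_adjacency(triples):
    """Index outgoing (predicate, object) edges by subject."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    return out_edges

def random_walk(out_edges, start, depth, rng):
    """Follow up to `depth` random outgoing edges, recording predicates and nodes."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = out_edges.get(node)
        if not edges:
            break                          # dead end: no outgoing edges
        p, o = rng.choice(edges)
        walk += [p, o]
        node = o
    return walk

triples = [
    ("Year_Zero", "artist", "Nine_Inch_Nails"),
    ("Nine_Inch_Nails", "member", "Trent_Reznor"),
]
out_edges = build_adjacency(triples)
walk = random_walk(out_edges, "Year_Zero", depth=2, rng=random.Random(0))
```

In practice many walks are started from every entity, so that each entity appears in a variety of contexts.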
7/2/19 Heiko Paulheim 19
From word2vec to RDF2vec
• RDF2vec example
– similar instances form clusters
– direction of relations is stable
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
7/2/19 Heiko Paulheim 20
From word2vec to RDF2vec
• RecSys example: using proximity in latent RDF2vec feature space
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
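A minimal sketch of recommending by proximity in the embedding space: items closest to a liked item are suggested first. Cosine similarity is an assumed choice of proximity measure, and the two-dimensional vectors are hand-picked toy values, not trained RDF2vec embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def recommend(liked, vectors, k=1):
    """Return the k items whose vectors are most similar to the liked item."""
    candidates = [name for name in vectors if name != liked]
    candidates.sort(key=lambda name: cosine(vectors[liked], vectors[name]), reverse=True)
    return candidates[:k]

# Toy vectors (hypothetical).
vectors = {
    "movie_a": (1.0, 0.1),
    "movie_b": (0.9, 0.2),
    "movie_c": (-0.2, 1.0),
}
top = recommend("movie_a", vectors)
```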
7/2/19 Heiko Paulheim 21
Extensions of RDF2vec
• Maybe random walks are not such a good idea
– They may give too much weight to less-known entities and facts
● Strategies:
– Prefer edges with more frequent predicates
– Prefer nodes with higher indegree
– Prefer nodes with higher PageRank
– …
– They may cover less-known entities and facts too little
● Strategies:
– The opposite of all of the above strategies
• Bottom line of experimental evaluation:
– Not one strategy fits all
Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
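One way to picture a biased walk is to replace the uniform choice of the next edge with a weighted one. The sketch below weights edges by predicate frequency, or by its inverse for the opposite strategy. The weighting formula and the names `edge_weights` and `biased_step` are assumptions for illustration and are not taken from the cited paper.

```python
import random
from collections import Counter

def edge_weights(edges, predicate_freq, prefer_frequent=True):
    """Weight each (predicate, object) edge by predicate frequency or its inverse."""
    weights = []
    for p, _o in edges:
        f = predicate_freq[p]
        weights.append(f if prefer_frequent else 1.0 / f)
    return weights

def biased_step(edges, predicate_freq, rng, prefer_frequent=True):
    """Pick the next edge of a walk with probability proportional to its weight."""
    weights = edge_weights(edges, predicate_freq, prefer_frequent)
    return rng.choices(edges, weights=weights, k=1)[0]

# Toy graph (hypothetical).
triples = [
    ("a", "type", "X"), ("b", "type", "X"), ("c", "type", "Y"),
    ("a", "rareLink", "b"),
]
predicate_freq = Counter(p for _s, p, _o in triples)
edges_of_a = [(p, o) for s, p, o in triples if s == "a"]
next_edge = biased_step(edges_of_a, predicate_freq, random.Random(1))
```

Node-based biases such as indegree or PageRank would follow the same pattern, with the weight computed from the target node instead of the predicate.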
7/2/19 Heiko Paulheim 22
Other Word Embedding Methods
• GloVe (Global Vectors for Word Representation)
• Computes embeddings out of co-occurrence statistics
– Using matrix factorization
• Has been applied to random RDF walks as well
• Experimental evaluation:
– In some cases, RDFGloVe outperforms RDF2vec
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-glove.html
Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
7/2/19 Heiko Paulheim 23
Other Word Embedding Methods
• There is a lot of promising stuff not yet tried
– e.g., biasing walks based on human factors
– e.g., more recent word embedding methods such as ELMo and BERT
https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
7/2/19 Heiko Paulheim 24
TransE and its Descendants
• In RDF2vec, relation preservation is a by-product
• TransE: direct modeling
– Formulates RDF embedding as an optimization problem
– Find a mapping of entities and relations to R^n so that
● across all triples <s,p,o>, Σ ||s + p - o|| is minimized
● existing triples obtain a smaller error than non-existing ones
Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete
Repositories. WI 2016
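The TransE objective can be illustrated with a scoring function and a margin-based ranking loss, as used in the TransE paper. This is only a sketch: the Euclidean norm is one possible choice, the two-dimensional embeddings are hand-picked rather than trained, and no optimization step is shown.

```python
import math

def transe_score(s, p, o):
    """Distance ||s + p - o|| (Euclidean norm); lower means more plausible."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))

def margin_loss(positive, negative, margin=1.0):
    """Hinge loss that is zero once the corrupted triple scores worse by `margin`."""
    return max(0.0, margin + transe_score(*positive) - transe_score(*negative))

# Toy 2-d embeddings (hand-picked for illustration, not trained).
emb = {
    "Miami": (0.0, 0.0),
    "Florida": (1.0, 0.0),
    "Ohio": (0.0, 3.0),
    "partOf": (1.0, 0.0),
}
pos = (emb["Miami"], emb["partOf"], emb["Florida"])  # existing triple
neg = (emb["Miami"], emb["partOf"], emb["Ohio"])     # corrupted object
loss = margin_loss(pos, neg)
```

Training would adjust the vectors by gradient descent until existing triples score lower than corrupted ones by at least the margin.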
7/2/19 Heiko Paulheim 25
Limitations of TransE
• Symmetric properties
– we have to minimize ||Barack + spouse - Michelle|| and ||Michelle + spouse - Barack|| simultaneously
– ideally, Barack + spouse = Michelle and Michelle + spouse = Barack
● Michelle and Barack become infinitely close
● spouse becomes 0 vector
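The collapse for symmetric properties follows from substituting one ideal equation into the other. A short derivation, writing b, m, and p for the vectors of Barack, Michelle, and spouse (the symbol names are chosen here for brevity):

```latex
\begin{align*}
\mathbf{b} + \mathbf{p} &= \mathbf{m} \\
\mathbf{m} + \mathbf{p} &= \mathbf{b} \\
\Rightarrow\quad (\mathbf{b} + \mathbf{p}) + \mathbf{p} &= \mathbf{b} \\
\Rightarrow\quad 2\,\mathbf{p} &= \mathbf{0} \\
\Rightarrow\quad \mathbf{p} = \mathbf{0} \quad &\text{and} \quad \mathbf{m} = \mathbf{b}
\end{align*}
```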
7/2/19 Heiko Paulheim 26
Limitations of TransE
• Transitive Properties
– we have to minimize ||Miami + partOf - Florida|| and ||Florida + partOf - USA||, but also ||Miami + partOf - USA||
– ideally, Miami + partOf = Florida, Florida + partOf = USA, Miami + partOf = USA
● Again: all three become infinitely close
● partOf becomes 0 vector
7/2/19 Heiko Paulheim 27
Limitations of TransE
• Many-to-one properties
– we have to minimize ||New York + partOf - USA||, ||Florida + partOf - USA||, ||Ohio + partOf - USA||, …
– ideally, New York + partOf = USA, Florida + partOf = USA, Ohio + partOf = USA
● all the subjects become infinitely close
7/2/19 Heiko Paulheim 28
Limitations of TransE
• Reflexive properties
– we have to minimize ||Tom + knows - Tom||
– ideally, Tom + knows = Tom
● knows becomes 0 vector
7/2/19 Heiko Paulheim 29
Limitations of TransE
• Numerous variants of TransE have been proposed to overcome limitations (e.g., TransH, TransR, TransD, …)
• Plus: embedding approaches based on tensor factorization etc.
(Figure: embedding approaches shown on the slide: TransE, RDF2Vec, HolE, DistMult, RESCAL, NTN, TransR, TransH, TransD, KG2E, ComplEx)
7/2/19 Heiko Paulheim 31
Are we Driving on the Wrong Side of the Road?
• Original ideas:
– Assign meaning to data
– Allow for machine inference
– Explain inference results to the user
Berners-Lee et al: The Semantic Web. Scientific American, May 2001
7/2/19 Heiko Paulheim 32
Running Example: Recommender Systems
• Content-based recommender systems backed by Semantic Web data
– (today: knowledge graphs)
• Advantages
– use rich background information about recommended items (for free)
– justifications can be generated (e.g., you like movies by that director)
https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
7/2/19 Heiko Paulheim 33
The 2009 Semantic Web Layer Cake
7/2/19 Heiko Paulheim 34
The 2019 Semantic Web Layer Cake
Embeddings
7/2/19 Heiko Paulheim 35
Towards Semantic Vector Space Embeddings
(Figure labels: cartoon, superhero)
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
7/2/19 Heiko Paulheim 36
The Holy Grail
• Combine semantics and embeddings
– e.g., directly create meaningful dimensions
– e.g., learn interpretation of dimensions a posteriori
– ...
7/2/19 Heiko Paulheim 37
A New Design Space
(Figure: design space spanned by quantitative performance and semantic interpretability)
7/2/19 Heiko Paulheim 38
Software to Check Out
• http://openke.thunlp.org/
– Implements many embedding approaches
– Pre-trained vectors available, e.g., for Wikidata
7/2/19 Heiko Paulheim 39
Software to Check Out
• Loading RDF in Python: https://github.com/RDFLib/rdflib
7/2/19 Heiko Paulheim 40
RapidMiner Linked Open Data Extension
Caution: works only up to RapidMiner 6 (RM6)!
7/2/19 Heiko Paulheim 41
References (1)
• Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 28-37.
• Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating
embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795).
• Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF
graph embeddings. In WIMS (p. 21). ACM.
• Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
• Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using
hierarchical multilabel classification with graph and latent features. IJAIT, 26(02).
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
• Paulheim, H., & Fürnkranz, J. (2012). Unsupervised generation of data mining features from
linked open data. In WIMS (p. 31). ACM.
• Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic
web conference (pp. 510-525). Springer, Berlin, Heidelberg.
7/2/19 Heiko Paulheim 42
References (2)
• Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical
distributions. IJSWIS, 10(2), 63-86.
• Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation
methods. Semantic web, 8(3), 489-508.
• Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track)
• Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018).
Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
• Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating
features from linked open data. Linked Data for Knowledge Discovery, 6.
• Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web
Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151.
• Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A
comprehensive survey. Web semantics, 36, 1-22.
• Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In
International Semantic Web Conference (pp. 498-514). Springer, Cham.
• Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph
embeddings and their applications. Semantic Web, 10(4), 1-32.
More Related Content

What's hot

Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

What's hot (20)

Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
LogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology MatchingLogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology Matching
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Cnn
CnnCnn
Cnn
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
 
Representation learning on graphs
Representation learning on graphsRepresentation learning on graphs
Representation learning on graphs
 
딥러닝의 기본
딥러닝의 기본딥러닝의 기본
딥러닝의 기본
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Machine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsMachine Learning Ml Overview Algorithms Use Cases And Applications
Machine Learning Ml Overview Algorithms Use Cases And Applications
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020
 

Similar to Machine Learning & Embeddings for Large Knowledge Graphs

Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
Shenghui Wang
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
Gezim Sejdiu
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Ioan Toma
 

Similar to Machine Learning & Embeddings for Large Knowledge Graphs (20)

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data Mining
 
BDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the massesBDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the masses
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkThe Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
Data-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCDData-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCD
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
 

More from Heiko Paulheim

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 

Machine Learning & Embeddings for Large Knowledge Graphs

  • 1. 7/2/19 Heiko Paulheim 1 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim
  • 2. 7/2/19 Heiko Paulheim 2 Crossing the Bridge from the Other Side
  • 3. 7/2/19 Heiko Paulheim 3 Crossing the Bridge from the Other Side • There are plenty of established ML and DM toolkits... – Weka – RapidMiner – scikit-learn – R • ...implementing all your favorite algorithms... – Naive Bayes – Random Forests – SVMs – (Deep) Neural Networks – ... • ...but they all work on feature vectors, not graphs!
  • 4. 7/2/19 Heiko Paulheim 4 Typical Tasks • Knowledge Graph Internal – Type prediction – Link prediction – Link validation • Knowledge Graph External – i.e., using the KG as background knowledge in some other task – e.g., content-based recommender systems – e.g., predictive modeling ● who is the next nobel prize winner? Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics. Scientific Programming, 2014 Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. ebay tech blog, 2019
  • 5. 7/2/19 Heiko Paulheim 5 Example: Knowledge Graph Internal • Type prediction – Many instances in KGs are not typed or have very abstract types – e.g., many actors are just typed as persons • Classic approach – Exploit ontology – Shown to be rather sensitive to noise • Example: ontology-based typing of Germany in DBpedia – Airport, Award, Building, City, Country, Ethnic Group, Genre, Language, Military Conflict, Mountain, Mountain Range, Person Function, Place, Populated Place, Race, Route of Transportation, Settlement, Stadium, Wine Region Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013 Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with Graph and Latent Features. IJAIT, 2017
  • 6. 7/2/19 Heiko Paulheim 6 Example: Knowledge Graph Internal • Alternative: learn model for type prediction – Train classifier to predict types (binary or hierarchical) – More noise tolerant Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
  • 7. 7/2/19 Heiko Paulheim 7 Example: Knowledge Graph External • Example machine learning task: predicting book sales
    Original data:
    ISBN           | City      | Sold
    3-2347-3427-1  | Darmstadt | 124
    3-43784-324-2  | Mannheim  | 493
    3-145-34587-0  | Roßdorf   | 14
    ...
    Data augmented with attributes from the knowledge graph:
    ISBN           | City      | Population | ... | Genre  | Publisher    | ... | Sold
    3-2347-3427-1  | Darmstadt | 144402     | ... | Crime  | Bloody Books | ... | 124
    3-43784-324-2  | Mannheim  | 291458     | ... | Crime  | Guns Ltd.    | ... | 493
    3-145-34587-0  | Roßdorf   | 12019      | ... | Travel | Up&Away      | ... | 14
    ...
    → Crime novels sell better in larger cities
    Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 8. 7/2/19 Heiko Paulheim 8 Example: The FeGeLOD Framework
    Input: ISBN 3-2347-3427-1, City Darmstadt, #sold 124
    → Named Entity Recognition: ISBN 3-2347-3427-1, City Darmstadt, #sold 124, City_URI http://dbpedia.org/resource/Darmstadt
    → Feature Generation: ISBN 3-2347-3427-1, City Darmstadt, #sold 124, City_URI http://dbpedia.org/resource/Darmstadt, City_URI_dbpedia-owl:populationTotal 141471, City_URI_... ...
    → Feature Selection: ISBN 3-2347-3427-1, City Darmstadt, #sold 124, City_URI http://dbpedia.org/resource/Darmstadt, City_URI_dbpedia-owl:populationTotal 141471
    Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 9. 7/2/19 Heiko Paulheim 9 The FeGeLOD Framework • Entity Recognition – Simple approach: guess DBpedia URIs – Hit rate >95% for cities and countries (by English name) • Feature Generation – augmenting the dataset with additional attributes from KG • Feature Selection – Filter noise: >95% unknown, identical, or different nominals Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 10. 7/2/19 Heiko Paulheim 10 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features ? Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 11. 7/2/19 Heiko Paulheim 11 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features • Basic strategies: – literal values (e.g., population) are used directly – instance types become binary features – relations are counted (absolute, relative, TF-IDF) – combinations of relations and object types are counted (absolute, relative, TF-IDF) – ... Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 12. 7/2/19 Heiko Paulheim 12 Propositionalization ctd. • Observations – Even simple features (e.g., add all numbers and types) can help on many problems – More sophisticated features often bring additional improvements ● Combinations of relations and individuals – e.g., movies directed by Steven Spielberg ● Combinations of relations and types – e.g., movies directed by Oscar-winning directors ● … – But ● The search space is enormous! ● Generate first, filter later does not scale well Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
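The basic propositionalization strategies listed above can be sketched in a few lines of Python. This is only an illustration on a hypothetical toy graph: the triples, the feature names, and the function `propositionalize` are assumptions made for this example and are not taken from the FeGeLOD implementation.

```python
from collections import Counter

# Toy triples (hypothetical data, for illustration only).
TRIPLES = [
    ("Darmstadt", "rdf:type", "City"),
    ("Darmstadt", "rdf:type", "Place"),
    ("Darmstadt", "populationTotal", 141471),
    ("Darmstadt", "twinTown", "Alkmaar"),
    ("Darmstadt", "twinTown", "Graz"),
    ("Alkmaar", "rdf:type", "City"),
    ("Graz", "rdf:type", "City"),
]


def propositionalize(entity, triples):
    """Turn one node into a flat feature dict using the basic strategies."""
    # Collect the types of every subject so relation/type combinations can be counted.
    types = {s: set() for s, _, _ in triples}
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    features = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if p == "rdf:type":
            features[f"type={o}"] = 1             # instance types become binary features
        elif isinstance(o, (int, float)):
            features[p] = o                       # literal values are used directly
        else:
            features[f"count({p})"] += 1          # relations are counted (absolute)
            for t in types.get(o, ()):
                features[f"count({p},{t})"] += 1  # relation + object type combinations
    return dict(features)


features = propositionalize("Darmstadt", TRIPLES)
```

Even on this tiny graph the number of features grows with every relation/type combination, which hints at why generating everything first and filtering later scales poorly.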
  • 13. 7/2/19 Heiko Paulheim 13 From Naive Propositionalization to Knowledge Graph Embeddings • Reconsidering the previous examples: – We want to predict some attribute of a KG entity ● e.g., types ● e.g., sales figures of books – ...given the entity’s vector representation • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place?
  • 14. 7/2/19 Heiko Paulheim 14 From Naive Propositionalization to Knowledge Graph Embeddings • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place? • “good” for machine learning means separable – similar entities are close together – different entities are further away https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
  • 15. 7/2/19 Heiko Paulheim 15 A Brief Excursion to word2vec • A vector space model for words • Introduced in 2013 • Each word becomes a vector – similar words are close – relations are preserved – vector arithmetic is possible https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
  • 16. 7/2/19 Heiko Paulheim 16 A Brief Excursion to word2vec • Assumption: – Similar words appear in similar contexts {Bush,Obama,Trump} was elected president of the United States United States president {Bush,Obama,Trump} announced… … • Idea – Train a network that can predict a word from its context (CBOW) or the context from a word (Skip Gram) Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
  • 17. 7/2/19 Heiko Paulheim 17 A Brief Excursion to word2vec • Skip Gram: train a neural network with one hidden layer • Use output values at hidden layer as vector representation • Observation: – Bush, Obama, Trump will activate similar context words – i.e., their output weights at the projection layer have to be similar Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
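To make the Skip-Gram idea more concrete, the following sketch generates the (word, context word) training pairs that such a model would be trained on. It covers only the pair extraction, not the neural network itself; the function name `skipgram_pairs` and the window size are assumptions chosen for this illustration.

```python
def skipgram_pairs(tokens, window=1):
    """Return (target, context) pairs for every token within the given window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


pairs = skipgram_pairs(["Obama", "was", "elected", "president"])
```

Replacing "Obama" with "Bush" or "Trump" in the sentence yields the same context words, which is why these words end up with similar vectors.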
  • 18. 7/2/19 Heiko Paulheim 18 From word2vec to RDF2vec • Word2vec operates on sentences, i.e., sequences of words • Idea of RDF2vec – First extract “sentences” from a graph – Then train embedding using word2vec • “Sentences” are extracted by performing random graph walks: Year Zero → artist → Nine Inch Nails → member → Trent Reznor • Experiments – RDF2vec can be trained on large KGs (DBpedia, Wikidata) – 300-500 dimensional vectors outperform other propositionalization strategies Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
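The walk extraction step can be sketched as follows, using the three nodes from the slide as a toy graph. This is a minimal sketch rather than the RDF2vec reference implementation: the adjacency-list layout, the function names, and the parameters `walks_per_node` and `depth` are assumptions for illustration. The resulting token sequences would then be passed to a word2vec trainer, which is not shown here.

```python
import random

# Toy graph as adjacency lists: node -> [(predicate, neighbour), ...].
GRAPH = {
    "Year Zero": [("artist", "Nine Inch Nails")],
    "Nine Inch Nails": [("member", "Trent Reznor")],
    "Trent Reznor": [],
}


def random_walk(graph, start, depth, rng):
    """Return one walk as a token sequence: node, predicate, node, ..."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:  # dead end: stop the walk early
            break
        predicate, node = rng.choice(edges)
        walk += [predicate, node]
    return walk


def extract_sentences(graph, walks_per_node, depth, seed=0):
    """Start the requested number of walks from every node in the graph."""
    rng = random.Random(seed)
    return [
        random_walk(graph, start, depth, rng)
        for start in graph
        for _ in range(walks_per_node)
    ]


sentences = extract_sentences(GRAPH, walks_per_node=1, depth=2)
```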
  • 19. 7/2/19 Heiko Paulheim 19 From word2vec to RDF2vec • RDF2vec example – similar instances form clusters – direction of relations is stable Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  • 20. 7/2/19 Heiko Paulheim 20 From word2vec to RDF2vec • RecSys example: using proximity in latent RDF2vec feature space Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  • 21. 7/2/19 Heiko Paulheim 21 Extensions of RDF2vec • Maybe random walks are not such a good idea – They may give too much weight to less-known entities and facts ● Strategies: – Prefer edges with more frequent predicates – Prefer nodes with higher indegree – Prefer nodes with higher PageRank – … – They may cover less-known entities and facts too little ● Strategies: – The opposite of all of the above strategies • Bottom line of experimental evaluation: – Not one strategy fits all Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
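One of the biasing strategies, preferring nodes with higher indegree (or the opposite), could look roughly like this. The sketch is not the weighting scheme from Cochez et al.; the toy edges, the function names, and the use of raw indegree as the sampling weight are assumptions made for illustration.

```python
import random
from collections import Counter

# Toy edge list (hypothetical): "Hub" has indegree 3, "Rare" has indegree 1.
EDGES = [
    ("A", "p", "Hub"),
    ("B", "p", "Hub"),
    ("C", "p", "Hub"),
    ("A", "q", "Rare"),
]


def indegrees(edges):
    """Count incoming edges per node."""
    return Counter(o for _, _, o in edges)


def edge_weights(out_edges, indeg, prefer_popular=True):
    """Weight each outgoing edge by the indegree of its target, or by its inverse."""
    return [indeg[o] if prefer_popular else 1.0 / indeg[o] for _, o in out_edges]


def biased_step(node, edges, indeg, rng, prefer_popular=True):
    """Sample one outgoing (predicate, target) pair according to the edge weights."""
    out = [(p, o) for s, p, o in edges if s == node]
    if not out:
        return None
    weights = edge_weights(out, indeg, prefer_popular)
    return rng.choices(out, weights=weights, k=1)[0]


indeg = indegrees(EDGES)
step = biased_step("A", EDGES, indeg, random.Random(0))
```

Setting `prefer_popular=False` flips the bias towards less-known nodes, which corresponds to "the opposite" strategies on the slide.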
  • 22. 7/2/19 Heiko Paulheim 22 Other Word Embedding Methods • GloVe (Global Vectors for Word Representation) • Computes embeddings out of co-occurrence statistics – Using matrix factorization • Has been applied to random RDF walks as well • Experimental evaluation: – In some cases, RDFGloVe outperforms RDF2vec https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-glove.html Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
  • 23. 7/2/19 Heiko Paulheim 23 Other Word Embedding Methods • There is a lot of promising stuff not yet tried – e.g., biasing walks based on human factors – e.g., more recent word embedding methods such as ELMo and BERT https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
  • 24. 7/2/19 Heiko Paulheim 24 TransE and its Descendants • In RDF2vec, relation preservation is a by-product • TransE: direct modeling – Formulates RDF embedding as an optimization problem – Find mapping of entities and relations to R^n so that ● across all triples <s,p,o> Σ ||s+p-o|| is minimized ● try to obtain a smaller error for existing triples than for non-existing ones Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013. Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
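The two ingredients named on this slide, the distance ||s+p-o|| and the requirement that existing triples score better than non-existing ones, can be sketched as below. The margin-based ranking loss is the usual way to express the second requirement; the margin value, the toy vectors, and the function names are assumptions for this illustration, and no training loop is shown.

```python
import math


def score(s, p, o):
    """TransE distance ||s + p - o|| (L2 norm); lower means more plausible."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))


def margin_loss(positive, negative, margin=1.0):
    """Zero once the existing triple beats the corrupted one by at least the margin."""
    return max(0.0, margin + score(*positive) - score(*negative))


# Hypothetical 2-dimensional embeddings.
s, p, o = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]  # existing triple: s + p == o
o_corrupt = [4.0, 5.0]                        # corrupted object for a non-existing triple

loss = margin_loss((s, p, o), (s, p, o_corrupt))
```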
  • 25. 7/2/19 Heiko Paulheim 25 Limitations of TransE • Symmetric properties – we have to minimize ||Barack + spouse – Michelle|| and ||Michelle + spouse – Barack|| simultaneously – ideally, Barack + spouse = Michelle and Michelle + spouse = Barack ● Michelle and Barack become infinitely close ● spouse becomes 0 vector Michelle Barack
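The collapse for symmetric properties follows directly from the two ideal equations. Writing b for Barack, m for Michelle, and r for spouse (symbols introduced here only for brevity), substituting the first equation into the second gives:

```latex
\begin{align}
\mathbf{b} + \mathbf{r} &= \mathbf{m} \\
\mathbf{m} + \mathbf{r} &= \mathbf{b} \\
\Rightarrow\; (\mathbf{b} + \mathbf{r}) + \mathbf{r} &= \mathbf{b}
\;\Rightarrow\; 2\mathbf{r} = \mathbf{0}
\;\Rightarrow\; \mathbf{r} = \mathbf{0},\quad \mathbf{m} = \mathbf{b}
\end{align}
```

So a perfect fit is only possible if the relation vector vanishes and both entities share the same point.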
  • 26. 7/2/19 Heiko Paulheim 26 Limitations of TransE • Transitive Properties – we have to minimize ||Miami + partOf – Florida|| and ||Florida + partOf – USA||, but also ||Miami + partOf – USA|| – ideally, Miami + partOf = Florida, Florida + partOf = USA, Miami + partOf = USA ● Again: all three become infinitely close ● partOf becomes 0 vector Florida Miami USA
  • 27. 7/2/19 Heiko Paulheim 27 Limitations of TransE • One to many properties – we have to minimize ||New York + partOf – USA||, ||Florida + partOf – USA||, ||Ohio + partOf – USA||, … – ideally, NewYork + partOf = USA, Florida + partOf = USA, Ohio + partOf = USA ● all the subjects become infinitely close Florida USA New York Ohio
  • 28. 7/2/19 Heiko Paulheim 28 Limitations of TransE • Reflexive properties – we have to minimize ||Tom + knows - Tom|| – ideally, Tom + knows = Tom ● Knows becomes 0 vector Tom
  • 29. 7/2/19 Heiko Paulheim 29 Limitations of TransE • Numerous variants of TransE have been proposed to overcome limitations (e.g., TransH, TransR, TransD, …) • Plus: embedding approaches based on tensor factorization etc. [figure labels: TransE, RDF2Vec, HolE, DistMult, RESCAL, NTN, TransR, TransH, TransD, KG2E, ComplEx]
  • 30. 7/2/19 Heiko Paulheim 30 Are we Driving on the Wrong Side of the Road?
  • 31. 7/2/19 Heiko Paulheim 31 Are we Driving on the Wrong Side of the Road? • Original ideas: – Assign meaning to data – Allow for machine inference – Explain inference results to the user Berners-Lee et al: The Semantic Web. Scientific American, May 2001
  • 32. 7/2/19 Heiko Paulheim 32 Running Example: Recommender Systems • Content based recommender systems backed by Semantic Web data – (today: knowledge graphs) • Advantages – use rich background information about recommended items (for free) – justifications can be generated (e.g., you like movies by that director) https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
  • 33. 7/2/19 Heiko Paulheim 33 The 2009 Semantic Web Layer Cake
  • 34. 7/2/19 Heiko Paulheim 34 The 2019 Semantic Web Layer Cake Embeddings
  • 35. 7/2/19 Heiko Paulheim 35 Towards Semantic Vector Space Embeddings [figure labels: cartoon, superhero] Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  • 36. 7/2/19 Heiko Paulheim 36 The Holy Grail • Combine semantics and embeddings – e.g., directly create meaningful dimensions – e.g., learn interpretation of dimensions a posteriori – ...
  • 37. 7/2/19 Heiko Paulheim 37 A New Design Space [figure labels: quantitative performance, semantic interpretability]
  • 38. 7/2/19 Heiko Paulheim 38 Software to Check Out • http://openke.thunlp.org/ – Implements many embedding approaches – Pre-trained vectors available, e.g., for Wikidata
  • 39. 7/2/19 Heiko Paulheim 39 Software to Check Out • Loading RDF in Python: https://github.com/RDFLib/rdflib
  • 40. 7/2/19 Heiko Paulheim 40 RapidMiner Linked Open Data Extension caution: works only until RM6! :-(
  • 41. 7/2/19 Heiko Paulheim 41 References (1) • Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5), 28-37. • Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795). • Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF graph embeddings. In WIMS (p. 21). ACM. • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. • Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using hierarchical multilabel classification with graph and latent features. IJAIT, 26(02). • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • Paulheim, H., & Fürnkranz, J. (2012). Unsupervised generation of data mining features from linked open data. In WIMS (p. 31). ACM. • Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic web conference (pp. 510-525). Springer, Berlin, Heidelberg.
  • 42. 7/2/19 Heiko Paulheim 42 References (2) • Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical distributions. IJSWIS, 10(2), 63-86. • Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web, 8(3), 489-508. • Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track) • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. • Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating features from linked open data. Linked Data for Knowledge Discovery, 6. • Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151. • Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A comprehensive survey. Web semantics, 36, 1-22. • Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In International Semantic Web Conference (pp. 498-514). Springer, Cham. • Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph embeddings and their applications. Semantic Web, 10(4), 1-32.
  • 43. 7/2/19 Heiko Paulheim 43 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim