Towards a unified framework for distributed data management
across the Semantic Web
Silvia Giannini
(Supervisor: Prof. Eugenio Di Sciascio)
Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI),
Politecnico di Bari, Bari, Italy
s.giannini@deemail.poliba.it
8th ICCL Summer School Workshop (ICCL 2013)
Semantic Web - Ontology Languages and Their Use
Dresden, Germany | 26 August, 2013
The scenario RDF clustering Proposal Preliminary Results Conclusions
Outline
1 The scenario
2 RDF clustering
Motivations
State of Art
3 Proposal
4 Preliminary Results
5 Conclusions
Silvia Giannini RDF data clustering
The Linking Open Data (LOD) project
A global Uniform Resource Identifier for each entity on the web (URIs)
A standardized access mechanism (HTTP URIs)
A machine-readable, open and standardized data format (RDF)
A mechanism for linking different data sources (RDF-links)
Relationship Links
Identity Links
Vocabulary Links
The Linking Open Data (LOD) project
As of September 2011
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
Legend (dataset domains): Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content
RDF: the big picture
[Figure: DBpedia [1] extract. dbpedia:Dresden --dbpedia-owl:country--> dbpedia:Germany; dbpedia:Dresden --dbpedia-owl:areaTotal--> 328.8]
Graph-structured knowledge representation (data-model)
Resource: concrete or abstract entity of the real world, identified by a dereferenceable URI
Description: representation of properties or relationships among resources
Framework: combination of web-based protocols and formal semantics
Facts in triple form: subject - predicate - object
http://dbpedia.org/resource/Dresden http://dbpedia.org/property/country http://dbpedia.org/resource/Germany .
[1] http://dbpedia.org
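As a minimal sketch of the triple form above, the DBpedia extract can be held as plain Python tuples. The prefixed names mirror the figure; the helper `objects` is a hypothetical name introduced only for illustration and is not part of any RDF library.

```python
# Each fact is a (subject, predicate, object) tuple, as in the DBpedia extract.
triples = [
    ("dbpedia:Dresden", "dbpedia-owl:country", "dbpedia:Germany"),
    ("dbpedia:Dresden", "dbpedia-owl:areaTotal", 328.8),
]

def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is a stated fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(triples, "dbpedia:Dresden", "dbpedia-owl:country"))
```

Resources and literals are both plain Python values here; a real RDF store would distinguish URIs, blank nodes and typed literals.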
RDF: the big picture
[Figure: DBpedia extract in two layers. RDF data model: dbpedia:Dresden --dbpedia-owl:country--> dbpedia:Germany; dbpedia:Dresden --dbpedia-owl:areaTotal--> 328.8. RDF Schema: classes dbpedia-owl:PopulatedPlace and dbpedia-owl:Country and the property type owl:ObjectProperty, connected to the data layer through rdf:type, rdfs:domain and rdfs:range]
RDF Schema: explicit semantics of content and links
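One way to see what "explicit semantics" buys is the standard RDFS entailment for rdfs:domain and rdfs:range: a stated triple lets a reasoner derive rdf:type facts for its subject and object. The sketch below assumes the figure declares dbpedia-owl:PopulatedPlace as the domain and dbpedia-owl:Country as the range of dbpedia-owl:country; `infer_types` is a hypothetical helper, not a library call.

```python
# Assumed schema declarations, read off the figure.
schema = {
    "dbpedia-owl:country": {
        "rdfs:domain": "dbpedia-owl:PopulatedPlace",
        "rdfs:range": "dbpedia-owl:Country",
    }
}
data = [("dbpedia:Dresden", "dbpedia-owl:country", "dbpedia:Germany")]

def infer_types(data, schema):
    """Derive rdf:type triples from rdfs:domain / rdfs:range declarations."""
    inferred = set()
    for s, p, o in data:
        declarations = schema.get(p, {})
        if "rdfs:domain" in declarations:   # the subject belongs to the domain class
            inferred.add((s, "rdf:type", declarations["rdfs:domain"]))
        if "rdfs:range" in declarations:    # the object belongs to the range class
            inferred.add((o, "rdf:type", declarations["rdfs:range"]))
    return inferred

for triple in sorted(infer_types(data, schema)):
    print(triple)
```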
Motivations
RDF Data Management Challenges
LOD cloud statistics: 31 billion facts and 500 million links, as of October 2011
How to efficiently:
Develop services on the top of the RDF data-model for
browsing data;
query answering;
supporting expressive search (approximate matching);
Speed up data access and query response times over distributed machines
CLUSTERING
Contributions
Clustering semantic web resources (RDF graphs)
Discovering homogeneous groups of resources
Summarizing the original graph content in a meaningful way
Revealing possible hierarchies of clusters
Identifying a concept description or discriminating features for each cluster
State of Art
What is a cluster: data-based approach
A set of resources with large intra-cluster similarity
and large inter-cluster dissimilarity
Data clustering methods
pairwise distance metric
agglomerative
partitional (K-Means)
- Number or size of clusters to be set
The RDF data model is not suited to the application of traditional data-clustering techniques
over real-life RDF datasets!
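The limitation listed for data clustering methods (the number of clusters has to be set beforehand) can be made concrete with a minimal K-Means sketch. Everything here is illustrative: the 2-D points stand in for feature vectors that would first have to be extracted from RDF instances, and seeding the centroids with the first `k` points is an assumption made to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain K-Means on 2-D points; k must be fixed by the caller in advance."""
    centroids = list(points[:k])  # assumption: the first k points seed the centroids
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared Euclidean distance)
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return clusters

points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
print(kmeans(points, 2))
```

Calling `kmeans(points, 1)` or `kmeans(points, 3)` returns a different grouping of the same data, which is exactly the parameter dependence the slide points at.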
What is a cluster: graph-based approach
A set of resources with large intra-cluster similarity
and large inter-cluster dissimilarity
Graph clustering methods
vertex connectivity
neighborhood similarity
spectral analysis of the adjacency matrix
- Number or size of clusters to be set
http://sydney.edu.au/engineering/it/~shhong/img/cluster1.png
RDF clustering: literature
Instance extraction
Subgraph relevant for a resource representation (SPARQL [2] DESCRIBE query)
1 Immediate Properties
+ simple, quick
- loss of information
2 Concise Bounded Description (CBD)
+ better body of knowledge
- domain dependent (use of blank nodes)
3 Depth Limited Crawling
+ stable over input data, with a well-limited subgraph
- find a tradeoff between size and information content (data dependent)
G.A. Grimnes, P. Edwards, and A. Preece. Instance based clustering of semantic web resources. The Semantic Web: Research and Applications. Springer Berlin Heidelberg, 2008. 303-317.
[2] http://www.w3.org/TR/rdf-sparql-query/
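The three extraction strategies can be sketched over a list of (subject, predicate, object) tuples. This is a simplified illustration, not the implementation from the cited paper: blank nodes are assumed to be strings starting with `_:`, the CBD variant ignores reified statements, the crawl follows outgoing links only, and the sample triples are partly invented for the example.

```python
def immediate_properties(triples, resource):
    """Strategy 1: only the triples whose subject is the resource."""
    return [t for t in triples if t[0] == resource]

def is_blank(node):
    # assumption: blank nodes are strings written with the "_:" prefix
    return isinstance(node, str) and node.startswith("_:")

def concise_bounded_description(triples, resource):
    """Strategy 2 (simplified CBD): expand objects only when they are blank nodes."""
    result, frontier, seen = [], [resource], {resource}
    while frontier:
        node = frontier.pop()
        for t in immediate_properties(triples, node):
            result.append(t)
            if is_blank(t[2]) and t[2] not in seen:
                seen.add(t[2])
                frontier.append(t[2])
    return result

def depth_limited_crawl(triples, resource, depth):
    """Strategy 3: follow outgoing links up to `depth` hops from the resource."""
    result, frontier, seen = [], [resource], {resource}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for t in immediate_properties(triples, node):
                result.append(t)
                if t[2] not in seen:
                    seen.add(t[2])
                    next_frontier.append(t[2])
        frontier = next_frontier
    return result

# Illustrative data; the last triple is an assumption added for the example.
triples = [
    ("ex:Article1", "dc:creator", "_:x1"),
    ("_:x1", "foaf:name", "Adamanta Schlitt"),
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("ex:Journal1", "rdf:type", "bench:Journal"),
]
print(len(immediate_properties(triples, "ex:Article1")))
print(len(concise_bounded_description(triples, "ex:Article1")))
print(len(depth_limited_crawl(triples, "ex:Article1", 2)))
```

The growing result sizes reflect the tradeoff listed above: each strategy recovers more context about the resource at the price of a larger subgraph.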
Instance distance computation
Comparing two RDF graphs with the resources as root nodes
1 feature-vector based
mappings: (feature → shortest path; value → set of reachable nodes)
similarity measure: e.g., Dice coefficient
2 graph based
conceptual similarity: overlapping of nodes
relational similarity: overlapping of edges
3 ontology based [3] (well-defined ontology and conforming instance data)
taxonomy similarity: semantic distance between metadata in a concept hierarchy
relation similarity: similarity of the instances related to the two considered resources
attribute similarity: similarity of attribute values (numeric, literal, etc.)
Determine the appropriate number of clusters
[3] A. Maedche and V. Zacharias. Clustering ontology-based metadata in the semantic web. Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 2002. 348-360.
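As a small worked example of an overlap-based measure, the Dice coefficient of two sets A and B is 2|A ∩ B| / (|A| + |B|). The sketch applies it to the node sets of two extracted instances; the node names are illustrative, and returning 1.0 for two empty sets is a convention chosen here, not something the slides prescribe.

```python
def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two collections, seen as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty descriptions are identical
    return 2 * len(a & b) / (len(a) + len(b))

# Node sets of two hypothetical instance subgraphs.
nodes_article1 = {"ex:Article1", "ex:Paul_Erdoes", "ex:Journal1"}
nodes_article2 = {"ex:Article2", "ex:Paul_Erdoes", "ex:Journal1"}

print(dice(nodes_article1, nodes_article2))
```

The same function can be applied to sets of edges to obtain the relational (edge overlap) counterpart of the conceptual (node overlap) similarity.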
Requirements
Ideal clustering of graph-structured data:
cohesive intra-cluster structure
homogeneous intra-cluster properties
Parameter free algorithm:
number and size of partitions extracted from data
How do community detection algorithms behave over RDF(S) graphs?
Community Discovery Algorithms
Graph mining techniques for extracting knowledge from large graphs
Exploit native graph features (topology) of the RDF model
Why:
If two sets of entities are strongly related, they exhibit more connections
than other sets of entities
Benefits:
+ Automatically discover the number and size of modules
+ Can handle uncertainty in clustering (overlapping communities)
+ Faster than data-clustering inspired techniques (no instance extraction)
What is a community
A subgraph of a network whose nodes are more tightly connected with each
other than with nodes outside the subgraph.
Similarity : cohesion degree of subsets of vertices
- No overlapping capabilities
$C = \{C_1, \dots, C_n\}$, $C_i \cap C_j = \emptyset \;\; \forall i, j \in \{1, \dots, n\},\ i \neq j$
In labeled graphs (like RDF graphs), each link models only one specific relation
Overlapping Communities Analysis
From Node to Link Perspective
Community : A set of nodes with more external than internal connections, i.e.,
a set of closely interrelated links.
Benefits:
+ Captures multiple memberships between nodes
+ Unifies hierarchical and overlapping clustering
It is always possible to move from a link partition $P = \{P_1, \dots, P_m\}$,
$P_i \cap P_j = \emptyset \;\; \forall i, j \in \{1, \dots, m\},\ i \neq j$, to $m$ node clusters, with possible
overlapping.
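The move from a link partition to node clusters can be sketched in a few lines: each part of the partition is replaced by the set of nodes its links touch. The function name `node_clusters` and the toy graph (two triangles sharing node `c`) are assumptions made for illustration.

```python
def node_clusters(link_partition):
    """Map each set of links (u, v) to the set of nodes those links touch."""
    return [{node for link in part for node in link} for part in link_partition]

# Two disjoint sets of links: every link belongs to exactly one part.
P = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("c", "d"), ("d", "e"), ("c", "e")},
]
clusters = node_clusters(P)
print(clusters[0] & clusters[1])  # nodes shared by both clusters
```

Although the link sets are disjoint, node `c` ends up in both node clusters, which is how a link partition yields overlapping communities.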
Datasets
SP2Bench [4]: A SPARQL Performance Benchmark
data generator for creating arbitrarily large DBLP-like RDF documents
mirrors key characteristics and social-world distributions of the original DBLP dataset
publicly available
[4] M. Schmidt, et al. SP2Bench: SPARQL performance benchmark. Semantic Web Information Management. Springer Berlin Heidelberg, 2010. 371-393.
Node communities
SP2Bench: 720 triples
[Figure: node communities detected on the dataset; visible node labels include Paul_Erdoes, Article and Person]
V.D. Blondel, et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008.
Tool: Gephi (https://gephi.org)
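The method of Blondel et al. greedily optimizes modularity, a score that compares the fraction of edges inside communities with the fraction expected at random. The sketch below only evaluates modularity for a given node partition of an undirected, unweighted graph; it is not the optimization procedure itself, and the toy graph is an assumption made for illustration.

```python
def modularity(edges, communities):
    """Modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ] of a node partition."""
    m = len(edges)
    community_of = {n: c for c, nodes in enumerate(communities) for n in nodes}
    internal = [0] * len(communities)  # L_c: edges with both endpoints in community c
    degree = [0] * len(communities)    # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = [{"a", "b", "c"}, {"d", "e", "f"}]
print(modularity(edges, communities))
```

Putting all nodes in a single community gives a score of zero, so the two-triangle split scores higher, as expected.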
Link Communities
Given an undirected graph $G = (V, E)$, the set of neighbors of node $i$ is
$N_i = \{ j \in V \mid e_{ij} \in E \}$.
Similarity [5]: $S(e_{ik}, e_{jk}) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$
Link Dendrogram: hierarchical agglomerative algorithm
Optimization of partition density: the cut level optimizes link density inside communities
$D_P = \frac{2}{M} \sum_c m_c \, \frac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$
[5] Y.Y. Ahn, J.P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature 466.7307 (2010): 761-764.
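Both quantities can be computed directly from an edge list. In the partition density of Ahn et al., M is the total number of links, m_c the number of links in community c, and n_c the number of nodes those links touch. The sketch follows the neighbor definition given on this slide (a node is not counted among its own neighbors); the function names and toy graphs are assumptions made for illustration.

```python
def neighbors(edges, i):
    """N_i = { j in V | e_ij in E } for an undirected edge list."""
    return {v for u, v in edges if u == i} | {u for u, v in edges if v == i}

def link_similarity(edges, i, j):
    """S(e_ik, e_jk): overlap of the neighborhoods of the two non-shared endpoints."""
    n_i, n_j = neighbors(edges, i), neighbors(edges, j)
    return len(n_i & n_j) / len(n_i | n_j)

def partition_density(link_partition):
    """D_P = (2 / M) * sum_c m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))."""
    total_links = sum(len(part) for part in link_partition)  # M
    weighted = 0.0
    for part in link_partition:
        m_c = len(part)
        n_c = len({node for link in part for node in link})
        if n_c > 2:  # a community with only two nodes contributes zero
            weighted += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2 * weighted / total_links

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(link_similarity(edges, "a", "b"))  # links e_ac and e_bc share node c

two_triangles = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("c", "d"), ("d", "e"), ("c", "e")},
]
print(partition_density(two_triangles))
```

A partition made only of cliques reaches the maximum density of 1, while tree-like communities contribute 0, which is what makes the density usable as a criterion for cutting the dendrogram.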
RDF clustering [6]
[Figure: clusters obtained on an SP2Bench extract. Visible labels include Article1, Article2, Article3, Article13, Article20, Journal1, Paul_Erdoes, the blank nodes _:x1, _:x2, _:x3, the classes foaf:Person and bench:Article, and the predicates dc:creator, dc:title, foaf:name, swrc:pages, swrc:journal and rdf:type]
Type 1 cluster (a). Signature: subject
Type 2 cluster (b). Signature: (predicate, object)
Type 3 cluster (c). Signature: {(predicate_1, object_1), ... (predicate_n, object_n)}
Different background colours reveal the hierarchy of clusters
Legend: replicated nodes revealing overlapping clusters; links belonging to other clusters
[6] S. Giannini, RDF Data Clustering. Springer Berlin Heidelberg, 2013. BIS 2013 Workshop, LNBIP 160: 220-231.
RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
Cluster of type 2.
Aggregation of resources (fixed predicate - fixed object)
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
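The two basic cluster types correspond to two ways of grouping triples by a shared signature. The sketch below only illustrates that grouping; it is not the community detection procedure that produces the clusters in these slides, and the function names and sample triples are assumptions.

```python
from collections import defaultdict

def type1_clusters(triples):
    """Group triples by subject: one group per described resource."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return dict(groups)

def type2_clusters(triples):
    """Group triples by (predicate, object): resources sharing a property value."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[(p, o)].append((s, p, o))
    return dict(groups)

# Illustrative triples in the style of the SP2Bench data.
triples = [
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("ex:Article2", "swrc:journal", "ex:Journal1"),
    ("ex:Article1", "dc:creator", "ex:Paul_Erdoes"),
]
print(sorted(type1_clusters(triples)))
print(sorted(type2_clusters(triples)))
```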
Example of a cluster of type 1 (fixed subject):
ex:Article15 swrc:pages 139
ex:Article15 dc:title equalled bewitchment cheaters
ex:Article15 dc:creator ex:node17r3ptqpmx16
ex:Article15 rdfs:seeAlso http://www.skeins.tld/sandwiching/bewitchment.html
ex:Article15 foaf:homepage http://www.sandwiching.tld/cheaters/ried.html
Example of a cluster of type 2 (fixed predicate - fixed object):
ex:Article9 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article7 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article10 swrc:journal http://localhost/publications/journals/Journal1/1945
Example of a mixed-type cluster (a set of clusters of type 1 or, equivalently, of type 2):
ex:Article8 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article8 rdf:type http://localhost/vocabulary/bench/Article
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article5 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article5 rdf:type http://localhost/vocabulary/bench/Article
ex:Article5 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article4 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article4 rdf:type http://localhost/vocabulary/bench/Article
ex:Article4 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article3 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article3 rdf:type http://localhost/vocabulary/bench/Article
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article2 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article2 rdf:type http://localhost/vocabulary/bench/Article
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article1 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article1 rdf:type http://localhost/vocabulary/bench/Article
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1942
Advantages and Emerging issues
Tests over datasets of 266, 720, and 5362 triples
Number of clusters obtained: 53, 277, and 3437, respectively
+ Good behaviour in presence of blank nodes
http://localhost/vocabulary/bench/PhDThesis rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Www rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Book rdfs:subClassOf foaf:Document
_:node17rocfnblx296 rdf:_3 misc:UnknownDocument_c
_:node17rocfnblx296 rdf:_2 misc:UnknownDocument_b
_:node17rocfnblx296 rdf:_1 misc:UnknownDocument_a
misc:UnknownDocument_c rdf:type foaf:Document
misc:UnknownDocument_b rdf:type foaf:Document
misc:UnknownDocument_a rdf:type foaf:Document
http://localhost/vocabulary/bench/MastersThesis rdfs:subClassOf foaf:Document
- A post-processing phase is needed (links replication)
If Paul Erdoes is a Person included in a type 2 cluster with signature (rdf:type -
prefix:Person), this property will not appear in the type 1 cluster describing the
resource Paul_Erdoes
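One possible reading of the link replication step is sketched below: every triple about a resource that ended up in another cluster is copied into the type 1 cluster describing that resource. The slides do not define this phase in detail, so the function `replicate_links`, the data layout and the sample triples are all assumptions.

```python
def replicate_links(type1, other_clusters):
    """Copy into each type 1 cluster the triples about its subject found elsewhere."""
    completed = {subject: list(ts) for subject, ts in type1.items()}
    for cluster in other_clusters:
        for s, p, o in cluster:
            if s in completed and (s, p, o) not in completed[s]:
                completed[s].append((s, p, o))
    return completed

# Hypothetical clustering result: the rdf:type link of Paul_Erdoes sits in a type 2 cluster.
type1 = {"ex:Paul_Erdoes": [("ex:Paul_Erdoes", "foaf:name", "Paul Erdoes")]}
type2 = [[
    ("ex:Paul_Erdoes", "rdf:type", "foaf:Person"),
    ("ex:Adamanta_Schlitt", "rdf:type", "foaf:Person"),
]]
completed = replicate_links(type1, type2)
print(completed["ex:Paul_Erdoes"])
```

The original clusters are left untouched; the replicated link appears in both places, at the cost of some redundancy.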
Conclusions and Future Works
Community detection algorithms are a promising candidate for:
semantic web resources clustering
instances extraction from RDF graphs
Ongoing and future works:
A more comprehensive experimental evaluation on different datasets
Analysis of cut threshold
Better definition of post-processing phase
Comparison with existing approaches
Combination of (1) graph clustering techniques, and (2) reasoning services
1 Identify communities of closely related resources
2 Extract a semantic description of them
Experimentation of property-driven clustering
Dynamics and evolution of clusters
Silvia Giannini RDF data clustering

More Related Content

What's hot

Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaGezim Sejdiu
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Ana Roxin
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?andrea huang
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...andrea huang
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validationandimou
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Towards a comprehensive call ontology for research 2.0
Towards a comprehensive call ontology for research 2.0Towards a comprehensive call ontology for research 2.0
Towards a comprehensive call ontology for research 2.0Vladimir Tomberg
 
LDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionLDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionSören Auer
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 

What's hot (17)

Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
 
Providing Linked Data
Providing Linked DataProviding Linked Data
Providing Linked Data
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Towards a comprehensive call ontology for research 2.0
Towards a comprehensive call ontology for research 2.0Towards a comprehensive call ontology for research 2.0
Towards a comprehensive call ontology for research 2.0
 
LDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionLDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and Discussion
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Semantic web an overview and projects
Semantic web   an  overview and projectsSemantic web   an  overview and projects
Semantic web an overview and projects
 

Viewers also liked

Machine Learning Techniques for the Semantic Web
Machine Learning Techniques for the Semantic WebMachine Learning Techniques for the Semantic Web
Machine Learning Techniques for the Semantic Webpauldix
 
Linked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceLinked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceBarry Norton
 
Pair of linear equation in two variables
Pair of linear equation in two variablesPair of linear equation in two variables
Pair of linear equation in two variablesdayalan chotto
 
Herramientas para el mantenimiento del pc
Herramientas para el mantenimiento del pcHerramientas para el mantenimiento del pc
Herramientas para el mantenimiento del pcangyjohannagt
 
2015 Cadillac ATS e-brochure
2015 Cadillac ATS e-brochure2015 Cadillac ATS e-brochure
2015 Cadillac ATS e-brochureDoug Caywood
 
Scripted Interface Redesign Preview
Scripted Interface Redesign PreviewScripted Interface Redesign Preview
Scripted Interface Redesign PreviewScripted.com
 
Andrew MCSA40BH
Andrew MCSA40BHAndrew MCSA40BH
Andrew MCSA40BHsavomir
 
Visiting Lecturer Reports
Visiting Lecturer ReportsVisiting Lecturer Reports
Visiting Lecturer ReportsMarie Cutler
 
3430_3440Brochure_Final_PRINT
3430_3440Brochure_Final_PRINT3430_3440Brochure_Final_PRINT
3430_3440Brochure_Final_PRINTNeil Hunt
 
системы создания и публикации презентаций
системы создания и публикации презентацийсистемы создания и публикации презентаций
системы создания и публикации презентацийkatrindakatrin
 
Corporate Criminal Compliance in Spain
Corporate Criminal Compliance in SpainCorporate Criminal Compliance in Spain
Corporate Criminal Compliance in SpainTAG Alliances
 

RDF data clustering

  • 1. Towards a unified framework for distributed data management across the Semantic Web. Silvia Giannini (Supervisor: Prof. Eugenio Di Sciascio), Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI), Politecnico di Bari, Bari, Italy, s.giannini@deemail.poliba.it. 8th ICCL Summer School Workshop (ICCL 2013), Semantic Web - Ontology Languages and Their Use, Dresden, Germany | 26 August, 2013
  • 2. The scenario RDF clustering Proposal Preliminary Results Conclusions Outline 1 The scenario 2 RDF clustering Motivations State of Art 3 Proposal 4 Preliminary Results 5 Conclusions Silvia Giannini RDF data clustering
  • 4. The Linking Open Data (LOD) project. A global Uniform Resource Identifier (URI) for each entity on the web; a standardized access mechanism (HTTP URIs); a machine-readable, open and standardized data format (RDF); a mechanism for linking different data sources (RDF links): relationship links, identity links, vocabulary links.
  • 5. The Linking Open Data (LOD) project. [Figure: Linking Open Data cloud diagram as of September 2011, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]
  • 6. RDF: the big picture. DBpedia (http://dbpedia.org) extract: dbpedia:Dresden dbpedia-owl:country dbpedia:Germany; dbpedia:Dresden dbpedia-owl:areaTotal 328.8. Graph-structured knowledge representation (data model). Resource: a concrete or abstract entity of the real world, identified by a dereferenceable URI. Description: a representation of properties of resources or of relationships among them. Framework: a combination of web-based protocols and formal semantics. Facts are stated in triple form (subject, predicate, object): http://dbpedia.org/resource/Dresden http://dbpedia.org/property/country http://dbpedia.org/resource/Germany.
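As a minimal sketch of the triple form described on this slide, the DBpedia extract can be held as plain Python tuples and indexed by subject. The expansion of the `dbpedia-owl:` prefix to `http://dbpedia.org/ontology/` and the helper name `outgoing` are assumptions made for illustration; the slides do not prescribe any particular representation.

```python
from collections import defaultdict

# Namespace prefixes; the dbpedia-owl expansion is an assumption of this sketch.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Each fact is one (subject, predicate, object) triple.
triples = [
    (DBR + "Dresden", DBO + "country", DBR + "Germany"),
    (DBR + "Dresden", DBO + "areaTotal", 328.8),
]


def outgoing(triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return dict(index)


graph = outgoing(triples)
for predicate, obj in graph[DBR + "Dresden"]:
    print(predicate, obj)
```

Indexing by subject is only one possible layout; it is convenient here because the instance extraction strategies discussed later follow outgoing edges from a root resource.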
  • 7. RDF: the big picture. [Figure: the DBpedia extract (dbpedia:Dresden, dbpedia-owl:country, dbpedia:Germany, dbpedia-owl:areaTotal 328.8) at the RDF data model level, connected through rdf:type, rdfs:domain and rdfs:range edges to the RDF Schema level (dbpedia-owl:PopulatedPlace, dbpedia-owl:Country, owl:ObjectProperty).] RDF Schema: explicit semantics of content and links.
  • 9. Motivations: RDF data management challenges. LOD cloud statistics: 31 billion facts and 500 million links as of October 2011. How to efficiently: develop services on top of the RDF data model for browsing data, answering queries, and supporting expressive search (approximate matching); speed up data access and query response times over distributed machines. Approach: clustering.
  • 10. Motivations: contributions. Clustering semantic web resources (RDF graphs): discovering homogeneous groups of resources; summarizing the original graph content in a meaningful way; revealing possible hierarchies of clusters; identifying a concept description or discriminating features for each cluster.
  • 11. State of Art. What is a cluster: data-based approach. A set of resources with large intra-cluster similarity and large inter-cluster dissimilarity. Data clustering methods: pairwise distance metric; agglomerative; partitional (K-Means). Drawback: the number or size of clusters has to be set.
  • 14. State of Art. What is a cluster: data-based approach (conclusion). The RDF data model is not suited to applying traditional data-clustering techniques over real-life RDF datasets.
  • 15. State of Art. What is a cluster: graph-based approach. A set of resources with large intra-cluster similarity and large inter-cluster dissimilarity. Graph clustering methods: vertex connectivity; neighborhood similarity; spectral analysis of the adjacency matrix. Drawback: the number or size of clusters has to be set. (Image: http://sydney.edu.au/engineering/it/~shhong/img/cluster1.png)
  • 16. State of Art. RDF clustering: literature. Instance extraction: selecting the subgraph relevant for representing a resource (SPARQL DESCRIBE query, http://www.w3.org/TR/rdf-sparql-query/). 1. Immediate Properties: (+) simple, quick; (-) loss of information. 2. Concise Bounded Description (CBD): (+) better body of knowledge; (-) domain dependent (use of blank nodes). 3. Depth Limited Crawling: (+) stable over input data, with a well-bounded subgraph; (-) requires finding a tradeoff between size and information content (data dependent). Source: G.A. Grimnes, P. Edwards, and A. Preece. Instance based clustering of semantic web resources. The Semantic Web: Research and Applications. Springer Berlin Heidelberg, 2008. 303-317.
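The third extraction strategy above can be sketched as a breadth-first walk that follows outgoing edges from the root resource up to a fixed number of hops. This is an illustrative sketch, not the implementation from Grimnes et al.: the function name `extract_instance`, the `ex:` identifiers, and the `ex:capital` fact are hypothetical. With `max_depth=1` the result coincides with the Immediate Properties strategy.

```python
from collections import deque


def extract_instance(triples, root, max_depth):
    """Collect the triples reachable from `root` along outgoing edges,
    expanding only nodes that are fewer than `max_depth` hops away."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    selected = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand this node
        for p, o in by_subject.get(node, []):
            selected.append((node, p, o))
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return selected


# Hypothetical toy graph used only for illustration.
toy = [
    ("ex:Dresden", "ex:country", "ex:Germany"),
    ("ex:Dresden", "ex:areaTotal", "328.8"),
    ("ex:Germany", "ex:capital", "ex:Berlin"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
]

immediate = extract_instance(toy, "ex:Dresden", 1)
two_hops = extract_instance(toy, "ex:Dresden", 2)
```

The size versus information tradeoff noted on the slide is visible even here: each extra hop pulls in more triples, and on a densely linked dataset the extracted subgraph can grow quickly.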
  • 19. State of Art. RDF clustering: literature. Instance distance computation: comparing two RDF graphs that have the resources as root nodes. 1. Feature-vector based. Mapping: feature → shortest path, value → set of reachable nodes. Similarity measure: e.g., Dice coefficient. 2. Graph based. Conceptual similarity: overlapping of nodes. Relational similarity: overlapping of edges. 3. Ontology based (requires a well-defined ontology and conforming instance data). Taxonomy similarity: semantic distance between metadata in a concept hierarchy. Relation similarity: similarity of the instances related to the two considered resources. Attribute similarity: similarity of attribute values (numeric, literal, etc.). Determine the appropriate number of clusters. Source for the ontology-based approach: A. Maedche and V. Zacharias. Clustering ontology-based metadata in the semantic web. Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 2002. 348-360.
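For the feature-vector approach, the Dice coefficient of two feature sets A and B is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to two resources whose features are (path, reachable node) pairs; the feature values are hypothetical, and returning 1.0 for two empty sets is a convention chosen for this sketch rather than something stated in the slides.

```python
def dice(a, b):
    """Dice coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty feature sets are identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical features: (path from the root resource, node reached).
dresden = {("country", "Germany"), ("type", "City")}
leipzig = {("country", "Germany"), ("type", "City"), ("state", "Saxony")}

similarity = dice(dresden, leipzig)  # 2 * 2 / (2 + 3)
distance = 1 - similarity            # usable as a pairwise distance
```

A distance derived this way can feed any of the data clustering methods listed earlier, which still leaves the number of clusters to be chosen.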
  • 20. Outline: 1 The scenario; 2 RDF clustering; 3 Proposal; 4 Preliminary Results; 5 Conclusions
  • 21. Requirements. Ideal clustering of graph-structured data: cohesive intra-cluster structure; homogeneous intra-cluster properties. Parameter-free algorithm: number and size of partitions extracted from data.
  • 22. How do community detection algorithms behave over RDF(S) graphs? Community Discovery Algorithms: graph mining techniques for extracting knowledge from large graphs, exploiting native graph features (topology) of the RDF model. Why: if two sets of entities are strongly related, they exhibit more connections than other sets of entities.
      Benefits:
      + Automatically discover the number and size of modules
      + Can handle uncertainty in clustering (overlapping communities)
      + Faster than data-clustering inspired techniques (no instance extraction)
  • 23. What is a community: a subgraph of a network whose nodes are more tightly connected with each other than with nodes outside the subgraph. Similarity: cohesion degree of subsets of vertices.
      - No overlapping capabilities: $C = \{C_1, \dots, C_n\}$, $C_i \cap C_j = \emptyset \;\; \forall i, j \in \{1, \dots, n\},\ i \neq j$
      In labeled graphs (like RDF graphs), each link models only one specific relation → Overlapping Communities Analysis
  • 24. From Node to Link Perspective. Community: a set of nodes with more external than internal connections, i.e., a set of closely interrelated links.
      Benefits:
      + Captures multiple memberships between nodes
      + Unifies hierarchical and overlapping clustering
      It is always possible to move from a link partition $P = \{P_1, \dots, P_m\}$, $P_i \cap P_j = \emptyset \;\; \forall i, j \in \{1, \dots, m\},\ i \neq j$, to $m$ node clusters, with possible overlapping.
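The move from a link partition to overlapping node clusters can be sketched as follows: each block of links induces the set of nodes its links touch, and a node incident to links from different blocks ends up in several clusters. The toy graph (two triangles sharing node `c`) is an assumption made for the example.

```python
def node_clusters(link_partition):
    """Map each block of links to the set of nodes touched by its links."""
    return [{n for u, v in block for n in (u, v)} for block in link_partition]


# Hypothetical link partition: the two blocks are disjoint as sets of links.
link_partition = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("c", "d"), ("d", "e"), ("c", "e")},
]

clusters = node_clusters(link_partition)
overlap = clusters[0] & clusters[1]  # nodes that belong to both clusters
```

The link blocks do not intersect, yet node `c` appears in both node clusters, which is how a disjoint link partition yields overlapping communities.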
  • 25. Datasets. SP2Bench [4]: A SPARQL Performance Benchmark. Data generator for the creation of arbitrarily large DBLP-like RDF documents; mirrors key characteristics and social-world distributions of the original DBLP dataset; publicly available.
      [4] M. Schmidt, et al. SP2Bench: SPARQL performance benchmark. Semantic Web Information Management. Springer Berlin Heidelberg, 2010. 371-393.
  • 26. Node communities. SP2Bench: 720 triples. [Figure: node communities detected on the graph; visible node labels: Paul_Erdoes, Article, Person.]
      V.D. Blondel, et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008.
      Tool: Gephi (https://gephi.org)
  • 27. Link Communities. Given an undirected graph $G = (V, E)$, the set of neighbors of node $i$ is $N_i = \{ j \in V \mid e_{ij} \in E \}$.
      Similarity [5]: $S(e_{ik}, e_{jk}) = \dfrac{|N_i \cap N_j|}{|N_i \cup N_j|}$
      Link dendrogram: hierarchical agglomerative algorithm
      Optimization of partition density: the cut level optimizes link density inside communities,
      $D_P = \dfrac{2}{M} \sum_c m_c \, \dfrac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$,
      where $M$ is the total number of links, and $m_c$ and $n_c$ are the numbers of links and nodes in community $c$.
      [5] Y.Y. Ahn, J.P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature 466.7307 (2010): 761-764.
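The two formulas on this slide can be evaluated on a small graph as a sanity check. The sketch follows the slide's definition of the neighbor set (node i itself is not included; a variant that includes it gives different values). The toy edges, and the convention that a community with only two nodes contributes zero, are assumptions made for the example.

```python
def neighbors(edges, i):
    """N_i: nodes joined to i by an edge of the undirected graph."""
    return {v for u, v in edges if u == i} | {u for u, v in edges if v == i}


def link_similarity(edges, i, j):
    """S(e_ik, e_jk) = |N_i ∩ N_j| / |N_i ∪ N_j| for two links sharing node k."""
    ni, nj = neighbors(edges, i), neighbors(edges, j)
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def partition_density(link_partition):
    """D_P = (2/M) * sum_c m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))."""
    total_links = sum(len(block) for block in link_partition)
    acc = 0.0
    for block in link_partition:
        m_c = len(block)
        n_c = len({n for edge in block for n in edge})
        if n_c > 2:  # assumed convention: a single-link community contributes 0
            acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2 * acc / total_links


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
s = link_similarity(edges, "a", "b")  # links e_ac and e_bc share node c

two_triangles = [
    {("a", "b"), ("b", "c"), ("a", "c")},
    {("c", "d"), ("d", "e"), ("c", "e")},
]
path_only = [{("a", "b"), ("b", "c")}]
```

A partition made only of cliques reaches the maximum density of 1, while a tree-like community (as many links as a spanning tree needs) scores 0; the dendrogram is cut at the level that maximizes this value.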
  • 28. Outline: 1 The scenario; 2 RDF clustering; 3 Proposal; 4 Preliminary Results; 5 Conclusions
  • 29. RDF clustering [6]. [Figure: three kinds of clusters found in the RDF graph. Visible labels include Article1, Article2, Article3, Article13, Article20, Journal1, Paul_Erdoes, bench:Article, foaf:Person, blank nodes _:x1, _:x2, _:x3, and the properties dc:creator, dc:title, foaf:name, swrc:pages, swrc:journal, rdf:type.]
      TYPE 1 CLUSTER (a). Signature: subject
      TYPE 2 CLUSTER (b). Signature: (predicate, object)
      TYPE 3 CLUSTER (c). Signature: {(predicate_1, object_1), ..., (predicate_n, object_n)}
      Figure legend: different background colours reveal the hierarchy of clusters; replicated nodes reveal overlapping clusters; some links belong to other clusters.
      [6] S. Giannini. RDF Data Clustering. Springer Berlin Heidelberg, 2013. BIS 2013 Workshop, LNBIP 160: 220-231.
  • 30. RDF clustering.
      Cluster of type 1. Instance extraction (fixed subject)
      Cluster of type 2. Aggregation of resources (fixed predicate - fixed object)
      Mixed-type clusters. Set of clusters of type 1. (or equivalently, of type 2.)
  • 31. RDF clustering.
      Cluster of type 1. Instance extraction (fixed subject)
        ex:Article15 swrc:pages 139
        ex:Article15 dc:title equalled bewitchment cheaters
        ex:Article15 dc:creator ex:node17r3ptqpmx16
        ex:Article15 rdfs:seeAlso http://www.skeins.tld/sandwiching/bewitchment.html
        ex:Article15 foaf:homepage http://www.sandwiching.tld/cheaters/ried.html
      Cluster of type 2. Aggregation of resources (predicate - object)
      Mixed-type clusters. Set of clusters of type 1. (or equivalently, of type 2.)
  • 32. RDF clustering.
      Cluster of type 1. Instance extraction (fixed subject)
      Cluster of type 2. Aggregation of resources (fixed predicate - fixed object)
        ex:Article9 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article7 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1945
        ex:Article10 swrc:journal http://localhost/publications/journals/Journal1/1945
      Mixed-type clusters. Set of clusters of type 1. (or equivalently, of type 2.)
  • 33. RDF clustering.
      Cluster of type 1. Instance extraction (fixed subject)
      Cluster of type 2. Aggregation of resources (fixed predicate - fixed object)
      Mixed-type clusters. Set of clusters of type 1. (or equivalently, of type 2.)
        ex:Article8 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article8 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1942
        ex:Article5 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article5 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article5 swrc:journal http://localhost/publications/journals/Journal1/1942
        ex:Article4 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article4 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article4 swrc:journal http://localhost/publications/journals/Journal1/1942
        ex:Article3 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article3 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1942
        ex:Article2 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article2 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1942
        ex:Article1 dc:creator http://localhost/persons/Paul_Erdoes
        ex:Article1 rdf:type http://localhost/vocabulary/bench/Article
        ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1942
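To make the three signatures concrete, the sketch below groups a handful of triples by subject (type 1), by (predicate, object) pair (type 2), and computes the (predicate, object) pairs shared by every subject of a cluster (the signature of a mixed-type cluster). It only illustrates what the signatures mean: in the slides the clusters come out of link community detection, not from explicit grouping, and the sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples in the style of the SP2Bench examples.
TRIPLES = [
    ("ex:Article1", "dc:creator", "ex:Paul_Erdoes"),
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("ex:Article2", "dc:creator", "ex:Paul_Erdoes"),
    ("ex:Article2", "swrc:journal", "ex:Journal1"),
    ("ex:Article2", "swrc:pages", '"140"'),
]


def group_by_subject(triples):
    """Type 1 signature: all triples sharing a fixed subject."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return dict(groups)


def group_by_predicate_object(triples):
    """Type 2 signature: all triples sharing a fixed (predicate, object) pair."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[(p, o)].append((s, p, o))
    return dict(groups)


def shared_signature(cluster):
    """Mixed-type signature: (predicate, object) pairs common to every subject."""
    po_sets = [{(p, o) for _, p, o in ts} for ts in group_by_subject(cluster).values()]
    return set.intersection(*po_sets) if po_sets else set()


type1 = group_by_subject(TRIPLES)
type2 = group_by_predicate_object(TRIPLES)
mixed = shared_signature(TRIPLES)
```

Here `mixed` contains the creator and journal pairs that both articles share, mirroring the example cluster shown on the slide.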
  • 34. Advantages and Emerging Issues. Tests over datasets of 266, 720, and 5362 triples. Number of obtained clusters: 53, 277, and 3437, respectively.
      + Good behaviour in presence of blank nodes
        http://localhost/vocabulary/bench/PhDThesis rdfs:subClassOf foaf:Document
        http://localhost/vocabulary/bench/Www rdfs:subClassOf foaf:Document
        http://localhost/vocabulary/bench/Book rdfs:subClassOf foaf:Document
        _:node17rocfnblx296 rdf:_3 misc:UnknownDocument_c
        _:node17rocfnblx296 rdf:_2 misc:UnknownDocument_b
        _:node17rocfnblx296 rdf:_1 misc:UnknownDocument_a
        misc:UnknownDocument_c rdf:type foaf:Document
        misc:UnknownDocument_b rdf:type foaf:Document
        misc:UnknownDocument_a rdf:type foaf:Document
        http://localhost/vocabulary/bench/MastersThesis rdfs:subClassOf foaf:Document
      - A post-processing phase is needed (links replication): if Paul Erdoes is a Person included in a type 2. cluster with signature (rdf:type - prefix:Person), this property will not appear in the cluster of type 1. describing the resource Paul_Erdoes
  • 35. Outline: 1 The scenario; 2 RDF clustering; 3 Proposal; 4 Preliminary Results; 5 Conclusions
  • 36. Conclusions and Future Work. Community detection algorithms are a promising candidate for: semantic web resources clustering; instance extraction from RDF graphs.
      Ongoing and future work:
      A more comprehensive experimental evaluation on different datasets
      Analysis of cut threshold
      Better definition of post-processing phase
      Comparison with existing approaches
      Combination of (1) graph clustering techniques and (2) reasoning services: 1 identify communities of closely related resources; 2 extract a semantic description of them
      Experimentation of property-driven clustering
      Dynamics and evolution of clusters