SlideShare a Scribd company logo
1 of 126
Download to read offline
Web Data Management in RDF Age 
M. Tamer Ozsu 
University of Waterloo 
David R. Cheriton School of Computer Science 
PKU/2014-08-28 1
Acknowledgements 
This presentation draws upon collaborative research and 
discussions with the following colleagues (in alphabetical order) 
Gunes Aluc, University of Waterloo 
Khuzaima Daudjee, University of Waterloo 
Olaf Hartig, University of Waterloo 
Lei Chen, Hong Kong University of Science  Technology 
Lei Zou, Peking University 
PKU/2014-08-28 2
Web Data Management 
A long term research interest in the DB community 
2000 2004 
2011 2011 
PKU/2014-08-28 3
Interest Due to Properties of Web Data 
Lack of a schema 
Data is at best semi-structured 
Missing data, additional attributes, similar data but not 
identical 
Volatility 
Changes frequently 
May conform to one schema now, but not later 
Scale 
Does it make sense to talk about a schema for Web? 
How do you capture everything? 
Querying diculty 
What is the user language? 
What are the primitives? 
Arent search engines or metasearch engines sucient? 
PKU/2014-08-28 4
More Recent Approaches to Web Querying 
Fusion Tables 
Users contribute data in spreadsheet, CVS, KML format 
Possible joins between multiple data sets 
Extensive visualization 
PKU/2014-08-28 8
More Recent Approaches to Web Querying 
Fusion Tables 
Users contribute data in spreadsheet, CVS, KML format 
Possible joins between multiple data sets 
Extensive visualization 
XML 
Data exchange language 
Primarily tree based structure 
list title=MOVIES 
film 
titleThe Shining/title 
directorStanley Kubrick/director 
actorJack Nicholson/actor 
/film 
film 
titleSpartacus/title 
directorStanley Kubrick/director 
/film 
film 
titleThe Passenger/title 
actorJack Nicholson/actor 
/film 
... 
/list 
root
lm 
title 
The Shining 
director 
Stanley Kubrick
lm 
actor 
... 
Jack Nicholson
lm 
title 
The Passenger 
actor 
Jack Nicholson 
PKU/2014-08-28 8
More Recent Approaches to Web Querying 
Fusion Tables 
Users contribute data in spreadsheet, CVS, KML format 
Possible joins between multiple data sets 
Extensive visualization 
XML 
Data exchange language 
Primarily tree based structure 
RDF (Resource Description Framework)  SPARQL 
W3C recommendation 
Simple, self-descriptive model 
Building block of semantic web  Linked Open Data (LOD) 
PKU/2014-08-28 8
RDF and Semantic Web 
RDF is a language for the conceptual modeling of information 
about resources (web resources in our context) 
A building block of semantic web 
Facilitates exchange of information 
Search engine results can be more focused and structured 
Facilitates data integration (mashes) 
Machine understandable 
Understand the information on the web and the 
interrelationships among them 
PKU/2014-08-28 9
RDF Uses 
Yago and DBpedia extract facts from Wikipedia  represent 
as RDF ! structural queries 
Communities build RDF data 
E.g., biologists: Bio2RDF and Uniprot RDF 
Web data integration 
Linked Open Data Cloud 
. . . 
PKU/2014-08-28 10
RDF Data Volumes . . . 
. . . are growing { and fast 
Linked data cloud currently consists of 325 datasets with 
25B triples 
Size almost doubling every year 
PKU/2014-08-28 11
RDF Data Volumes . . . 
. . . are growing { and fast 
Linked data cloud currently consists of 325 datasets with 
25B triples 
Size almost doubling every year 
As of March 2009 
LinkedCT 
Reactome 
Taxonomy 
KEGG 
GeneID 
PubMed 
Pfam 
UniProt 
OMIM 
PDB 
BBC 
Later + 
TOTP 
riese 
Symbol 
ChEBI 
Daily 
Med 
Disea-some 
CAS 
HGNC 
Inter 
Pro 
Drug 
Bank 
UniParc 
UniRef 
ProDom 
PROSITE 
Gene 
Ontology 
Homolo 
Gene 
Pub 
Chem 
MGI 
UniSTS 
GEO 
Species 
Jamendo 
BBC 
Programm 
es 
Music-brainz 
Magna-tune 
Surge 
Radio 
MySpace 
Wrapper 
Audio- 
Scrobbler 
Linked 
MDB 
BBC 
John 
Peel 
BBC 
Playcount 
Data 
Gov- 
Track 
US 
Census 
Data 
Geo-names 
lingvoj 
World 
Fact-book 
Euro-stat 
IRIT 
Toulouse 
SW 
Conference 
Corpus 
RDF Book 
Mashup 
Project 
Guten-berg 
DBLP 
Hannover 
DBLP 
Berlin 
LAAS-CNRS 
Buda-pest 
BME 
IEEE 
IBM 
Resex 
Pisa 
New-castle 
RAE 
2001 
CiteSeer 
ACM 
DBLP 
RKB 
Explorer 
eprints 
LIBRIS 
Semantic 
Web.org Eurécom 
ECS 
South-ampton 
SIOC Revyu 
Sites 
Doap-space 
Flickr 
exporter 
FOAF 
profiles 
flickr 
wrappr 
Crunch 
Base 
Sem- 
Web- 
Central 
Open- 
Guides 
Wiki-company 
QDOS 
Pub 
Guide 
Open 
Calais 
RDF 
ohloh 
W3C 
WordNet 
Open 
Cyc 
UMBEL 
Yago 
DBpedia 
Freebase 
Virtuoso 
Sponger 
March '09: 
89 datasets 
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. 
http://lod-cloud.net/ 
PKU/2014-08-28 11
RDF Data Volumes . . . 
. . . are growing { and fast 
Linked data cloud currently consists of 325 datasets with 
25B triples 
Size almost doubling every year 
New-castle 
User-generated content 
As of September 2010 
Audio-scrobbler 
(DBTune) 
Music 
Brainz 
(zitgist) 
P20 
YAGO 
Chronic-ling 
America 
World 
Fact-book 
(FUB) 
Geo 
Names 
Moseley 
Folk 
WordNet 
(VUA) 
WordNet 
(W3C) 
VIVO UF 
VIVO 
Indiana 
VIVO 
Cornell 
VIAF 
URI 
Burner 
Sussex 
Reading 
Lists 
Plymouth 
Reading 
Lists 
UMBEL 
UK Post-codes 
legislation 
.gov.uk 
Uberblic 
UB 
Mann-heim 
TWC LOGD 
GTAA 
BBC 
Program 
mes 
Twarql 
transport 
data.gov 
.uk 
totl.net 
Tele-graphis 
TCM 
Gene 
DIT 
Taxon 
Concept 
The Open 
Library 
(Talis) 
t4gm 
Surge 
Radio 
RAMEAU 
SH 
STW 
statistics 
data.gov 
.uk 
St. 
Andrews 
Resource 
Lists 
ECS 
South-ampton 
EPrints 
Semantic 
Crunch 
Base 
semantic 
web.org 
Semantic 
XBRL 
SW 
Dog 
Food 
rdfabout 
US SEC 
Wiki 
UN/ 
LOCODE 
Ulm 
ECS 
(RKB 
Explorer) 
Roma 
RISKS 
RESEX 
RAE2001 
Pisa 
OS 
OAI 
NSF 
LAAS 
KISTI 
JISC 
IRIT 
IEEE 
IBM 
Eurécom 
ERA 
ePrints 
dotAC 
DEPLOY 
DBLP 
(RKB 
Explorer) 
Course-ware 
CORDIS 
CiteSeer 
Budapest 
ACM 
riese 
Revyu 
research 
data.gov 
.uk 
reference 
data.gov 
.uk 
Recht-spraak. 
nl 
RDF 
ohloh 
Last.FM 
(rdfize) 
RDF 
Book 
Mashup 
PSH 
lingvoj 
Product 
DB 
Poké-pédia 
PBAC 
Ord-nance 
Survey 
Openly 
Local 
The Open 
Library 
Open 
Cyc 
Open 
Calais 
OpenEI 
New 
York 
Times 
NTU 
Resource 
Lists 
NDL 
subjects 
MARC 
Codes 
List 
Man-chester 
Reading 
Lists 
Lotico 
The 
London 
Gazette 
LOIUS 
lobid 
Resources 
lobid 
Organi-sations 
Linked 
MDB 
Linked 
LCCN 
Linked 
GeoData 
Linked 
CT 
Linked 
Open 
Numbers 
LIBRIS 
Lexvo 
LCSH 
DBLP 
(L3S) 
Linked 
Sensor Data 
(Kno.e.sis) 
Good-win 
Family 
Jamendo 
iServe 
NSZL 
Catalog 
GovTrack 
GESIS 
Geo 
Species 
Geo 
Linked 
Data 
(es) 
STITCH 
Project 
Guten-berg 
(FUB) 
SIDER 
Medi 
Care 
Euro-stat 
(FUB) 
Drug 
Bank 
Disea-some 
DBLP 
(FU 
Berlin) 
Daily 
Med 
Freebase 
flickr 
wrappr 
Fishes 
of Texas 
FanHubz 
Event- 
Media 
EUTC 
Produc-tions 
Eurostat 
EUNIS 
ESD 
stan-dards 
Popula-tion 
(En- 
AKTing) 
NHS 
(EnAKTing) 
Mortality 
(En- 
AKTing) 
Energy 
(En- 
AKTing) 
CO2 
(En- 
AKTing) 
education 
data.gov 
.uk 
ECS 
South-ampton 
Gem. 
Norm-datei 
data 
dcs 
MySpace 
(DBTune) 
Music 
Brainz 
(DBTune) 
Magna-tune 
John 
Peel 
(DB 
Tune) 
classical 
(DB 
Tune) 
Last.fm 
Artists 
(DBTune) 
DB 
Tropes 
dbpedia 
lite 
DBpedia 
Pokedex 
Airports 
NASA 
(Data 
Incu-bator) 
Music 
Brainz 
(Data 
Incubator) 
Discogs 
(Data In-cubator) 
Climbing 
Linked Data 
for Intervals 
Cornetto 
Chem2 
Bio2RDF 
biz. 
data. 
gov.uk 
UniRef 
UniSTS 
Uni 
Path-way 
Taxo-nomy 
UniParc 
UniProt 
SGD 
Reactome 
PubMed 
Pub 
Chem 
Pfam PDB 
PRO-SITE 
ProDom 
OMIM 
OBO 
MGI 
KEGG 
Reaction 
KEGG 
Drug 
KEGG 
Pathway 
KEGG 
Glycan 
KEGG 
Enzyme 
KEGG 
Cpd 
InterPro 
Homolo 
Gene 
HGNC 
Gene 
Ontology 
GeneID 
Gen 
Bank 
ChEBI 
CAS 
Affy-metrix 
BibBase 
BBC 
Wildlife 
Finder 
BBC 
Music 
rdfabout 
US Census 
Media 
Geographic 
Publications 
Government 
Cross-domain 
Life sciences 
September '10: 
203 datasets 
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. 
http://lod-cloud.net/ 
PKU/2014-08-28 11
RDF Data Volumes . . . 
. . . are growing { and fast 
Linked data cloud currently consists of 325 datasets with 
25B triples 
Size almost doubling every year 
RESEX 
IBM 
User-generated content 
As of September 2011 
Audio 
Scrobbler 
(DBTune) 
Music 
Brainz 
(zitgist) 
P20 
Turismo 
de 
Zaragoza 
yovisto 
Yahoo! 
Geo 
Planet 
World 
Fact-book 
Moseley 
Folk 
YAGO 
El 
Viajero 
Tourism 
BBC 
Program 
mes BBC 
Geo 
Names 
WordNet 
(VUA) 
WordNet 
(W3C) 
VIVO UF 
Calames 
VIVO 
Indiana 
VIVO 
Cornell 
VIAF 
URI 
Burner 
Sussex 
Reading 
Lists 
Plymouth 
Reading 
Lists 
Source Code 
Ecosystem 
Linked Data 
UniProt 
PubMed 
UniRef 
UMBEL 
UK Post-codes 
legislation 
data.gov.uk 
Uberblic 
UB 
Mann-heim 
TWC LOGD 
Twarql 
transport 
data.gov. 
uk 
Traffic 
Scotland 
theses. 
fr 
Thesau-rus 
W 
totl.net 
Tele-graphis 
Semantic 
Tweet 
TCM 
Gene 
DIT 
Taxon 
Concept 
Open 
Library 
(Talis) 
tags2con 
delicious 
t4gm 
info 
Swedish 
Open 
Cultural 
Heritage 
Surge 
Radio 
Sudoc 
STW 
RAMEAU 
SH 
statistics 
data.gov. 
uk 
St. 
Andrews 
Resource 
Lists 
ECS 
South-ampton 
EPrints 
SSW 
Thesaur 
us 
Linked 
User 
Feedback 
gnoss 
Greek 
DBpedia 
Smart 
Link 
Slideshare 
2RDF 
semantic 
web.org 
GovTrack 
Semantic 
XBRL 
SW 
Dog 
Food 
US SEC 
(rdfabout) 
Sears 
Scotland 
Pupils  
Exams 
Scotland 
Geo-graphy 
Scholaro-meter 
WordNet 
(RKB 
Explorer) 
Wiki 
UN/ 
LOCODE 
Ulm 
ECS 
(RKB 
Explorer) 
Roma 
RISKS 
RAE2001 
Pisa 
OS 
OAI 
NSF 
New-castle 
LAAS 
KISTI 
JISC 
IRIT 
IEEE 
Eurécom 
ERA 
ePrints dotAC 
DEPLOY 
DBLP 
(RKB 
Explorer) 
Crime 
Reports 
UK 
Course-ware 
CORDIS 
(RKB 
Explorer) 
CiteSeer 
Budapest 
ACM 
riese 
Revyu 
research 
data.gov. 
Ren. uk 
Energy 
Genera-tors 
reference 
data.gov. 
uk 
Recht-spraak. 
nl 
RDF 
ohloh 
Last.FM 
(rdfize) 
RDF 
Book 
Mashup 
Rådata 
nå! 
PSH 
Product 
Types 
Ontology 
Product 
DB 
PBAC 
Poké-pédia 
patents 
data.go 
v.uk 
Ox 
Points 
Ord-nance 
Survey 
Openly 
Local 
Open 
Library 
Open 
Cyc 
Open 
Corpo-rates 
Open 
Calais 
OpenEI 
Open 
Election 
Data 
Project 
Open 
Data 
Thesau-rus 
Ontos 
News 
Portal 
OGOLOD 
Ocean 
Drilling 
Codices 
Janus 
AMP 
New 
York 
Times 
NVD 
ntnusc 
NTU 
Resource 
Lists 
Norwe-gian 
MeSH 
NDL 
subjects 
ndlna 
my 
Experi-ment 
Italian 
Museums 
medu-cator 
MARC 
Codes 
List 
Man-chester 
Reading 
Lists 
Lotico 
Weather 
Stations 
London 
Gazette 
LOIUS 
Linked 
Open 
Colors 
lobid 
Resources 
lobid 
Organi-sations 
LEM 
Linked 
MDB 
LinkedL 
CCN 
Linked 
GeoData 
LinkedCT 
LOV 
Linked 
Open 
Numbers 
LODE 
Eurostat 
(Ontology 
Central) 
Linked 
EDGAR 
(Ontology 
Central) 
Linked 
Crunch-base 
lingvoj 
Lichfield 
Spen-ding 
LIBRIS 
Lexvo 
LCSH 
DBLP 
(L3S) 
Linked 
Sensor Data 
(Kno.e.sis) 
Klapp-stuhl-club 
Good-win 
Family 
National 
Radio-activity 
JP 
Jamendo 
(DBtune) 
Italian 
public 
schools 
ISTAT 
Immi-gration 
iServe 
IdRef 
Sudoc 
NSZL 
Catalog 
Hellenic 
PD 
Hellenic 
FBD 
Piedmont 
Accomo-dations 
GovWILD 
Google 
Art 
wrapper 
GESIS 
GeoWord 
Net 
Geo 
Species 
Geo 
Linked 
Data 
GEMET 
GTAA 
STITCH 
SIDER 
Project 
Guten-berg 
Medi 
Care 
Euro-stat 
(FUB) 
EURES 
Drug 
Bank 
Disea-some 
DBLP 
(FU 
Berlin) 
Daily 
Med 
CORDIS 
(FUB) 
Freebase 
flickr 
wrappr 
Fishes 
of Texas 
Finnish 
Munici-palities 
ChEMBL 
FanHubz 
Event 
Media 
EUTC 
Produc-tions 
Eurostat 
Europeana 
EUNIS 
EU 
Insti-tutions 
ESD 
stan-dards 
EARTh 
Enipedia 
Popula-tion 
(En- 
AKTing) 
NHS 
(En- 
AKTing) Mortality 
(En- 
AKTing) 
Energy 
(En- 
AKTing) 
Crime 
(En- 
AKTing) 
CO2 
Emission 
(En- 
AKTing) 
EEA 
SISVU 
educatio 
n.data.g 
ov.uk 
ECS 
South-ampton 
ECCO-TCP 
GND 
Didactal 
ia 
DDC Deutsche 
Bio-graphie 
data 
dcs 
Music 
Brainz 
(DBTune) 
Magna-tune 
John 
Peel 
(DBTune) 
Classical 
(DB 
Tune) 
Last.FM 
artists 
(DBTune) 
DB 
Tropes 
Portu-guese 
DBpedia 
dbpedia 
lite 
DBpedia 
data-open-ac- 
uk 
SMC 
Journals 
Pokedex 
Airports 
NASA 
(Data 
Incu-bator) 
Music 
Brainz 
(Data 
Incubator) 
Metoffice 
Weather 
Forecasts 
Discogs 
(Data 
Incubator) 
Climbing 
data.gov.uk 
intervals 
Data 
Gov.ie 
data 
bnf.fr 
Cornetto 
reegle 
Chronic-ling 
America 
Chem2 
Bio2RDF 
business 
data.gov. 
uk 
Bricklink 
Brazilian 
Poli-ticians 
BNB 
UniSTS 
UniPath 
way 
UniParc 
Taxono 
my 
UniProt 
(Bio2RDF) 
SGD 
Reactome 
Pub 
Chem 
PRO-SITE 
ProDom 
Pfam 
PDB 
OMIM 
MGI 
KEGG 
Reaction 
KEGG 
Pathway 
KEGG 
Glycan 
KEGG 
Enzyme 
KEGG 
Drug 
KEGG 
Com-pound 
InterPro 
Homolo 
Gene 
HGNC 
Gene 
Ontology 
GeneID 
Affy-metrix 
bible 
ontology 
BibBase 
FTS 
BBC 
Wildlife 
Finder 
Music 
Alpine 
Ski 
Austria 
LOCAH 
Amster-dam 
Museum 
AGROV 
OC 
AEMET 
US Census 
(rdfabout) 
Media 
Geographic 
Publications 
Government 
Cross-domain 
Life sciences 
September '11: 
295 datasets, 25B 
triples 
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. 
http://lod-cloud.net/ 
PKU/2014-08-28 11
RDF Data Volumes . . . 
. . . are growing { and fast 
Linked data cloud currently consists of 325 datasets with 
25B triples 
Size almost doubling every year 
April '14: 
1091 datasets, ??? 
triples 
Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked 
Data Best Practices in Dierent Topical Domains. In Proc. ISWC, 2014. 
PKU/2014-08-28 11
Closer Look 
PKU/2014-08-28 12
Globally Distributed Network of Data 
PKU/2014-08-28 13
Three Approaches 
Data warehousing 
Consolidate data in a repository and query it 
SPARQL federation 
Leverage query services provided by data publishers 
Live Linked Data querying 
Navigate through LOD by looking up URIs at query execution 
time 
PKU/2014-08-28 14
Outline 
1 LOD and RDF Introduction 
2 Data Warehousing Approach 
Relational Approaches 
Graph-Based Approaches 
3 SPARQL Federation Approach 
Distributed RDF Processing 
SPARQL Endpoint Federation 
4 Live Querying Approach 
Traversal-based approaches 
Index-based approaches 
Hybrid approaches 
5 Conclusions 
PKU/2014-08-28 15
Outline 
1 LOD and RDF Introduction 
2 Data Warehousing Approach 
Relational Approaches 
Graph-Based Approaches 
3 SPARQL Federation Approach 
Distributed RDF Processing 
SPARQL Endpoint Federation 
4 Live Querying Approach 
Traversal-based approaches 
Index-based approaches 
Hybrid approaches 
5 Conclusions 
PKU/2014-08-28 16
Traditional Hypertext-based Web Access 
IMDb World 
Book 
Data exposed 
to the Web 
via HTML 
PKU/2014-08-28 17
Linked Data Publishing Principles 
(http://...linkedmdb.../Shining,releaseDate, 23 May 1980) 
(http://...linkedmdb.../Shining,
lmLocation, http://cia.../UK) 
(http://...linkedmdb.../29704,actedIn, http://...linkedmdb.../Shining) 
IMDb World 
Book 
... 
(http://cia.../UK, hasPopulation, 63230000) 
... 
Shining 
UK 
Data model: RDF 
Global identi
er: URI 
Access mechanism: HTTP 
Connection: data links 
PKU/2014-08-28 18
RDF Introduction 
Everything is an uniquely named 
resource 
http://data.linkedmdb.org/resource/actor/JN29704 
PKU/2014-08-28 19
RDF Introduction 
Everything is an uniquely named 
resource 
Pre
xes can be used to shorten the 
names 
xmlns:y=http://data.linkedmdb.org/resource/actor/ 
y:JN29704 
PKU/2014-08-28 19
RDF Introduction 
Everything is an uniquely named 
resource 
Pre
xes can be used to shorten the 
names 
Properties of resources can be 
de
ned 
xmlns:y=http://data.linkedmdb.org/resource/actor/ 
y:JN29704 
y:JN29704:hasName Jack Nicholson 
y:JN29704:BornOnDate 1937-04-22 
PKU/2014-08-28 19
RDF Introduction 
Everything is an uniquely named 
resource 
Pre
xes can be used to shorten the 
names 
Properties of resources can be 
de
ned 
Relationships with other resources 
can be de
ned 
xmlns:y=http://data.linkedmdb.org/resource/actor/ 
y:JN29704 
y:JN29704:hasName Jack Nicholson 
y:JN29704:BornOnDate 1937-04-22 
JN29704:movieActor 
y:TS2014 
y:TS2014:title The Shining 
y:TS2014:releaseDate 1980-05-23 
PKU/2014-08-28 19
RDF Introduction 
Everything is an uniquely named 
resource 
Pre
xes can be used to shorten the 
names 
Properties of resources can be 
de
ned 
Relationships with other resources 
can be de
ned 
Resource descriptions can be 
contributed by dierent 
people/groups and can be located 
anywhere in the web 
Integrated web database 
xmlns:y=http://data.linkedmdb.org/resource/actor/ 
y:JN29704 
y:JN29704:hasName Jack Nicholson 
y:JN29704:BornOnDate 1937-04-22 
JN29704:movieActor 
y:TS2014 
y:TS2014:title The Shining 
y:TS2014:releaseDate 1980-05-23 
PKU/2014-08-28 19
RDF Data Model 
Triple: Subject, Predicate (Property), 
Object (s; p; o) 
Subject: the entity that is described 
(URI or blank node) 
Predicate: a feature of the entity (URI) 
Object: value of the feature (URI, 
blank node or literal) 
(s; p; o) 2 (U [ B)  U  (U [ B [ L) 
Set of RDF triples is called an RDF graph 
U 
Predicate 
Subject Object 
U B U B L 
U: set of URIs 
B: set of blank nodes 
L: set of literals 
Subject Predicate Object 
http://...imdb.../
lm/2014 rdfs:label The Shining 
http://...imdb.../
lm/2014 movie:releaseDate 1980-05-23 
http://...imdb.../29704 movie:actor name Jack Nicholson 
: : : : : : : : : 
PKU/2014-08-28 20
RDF Example Instance 
Pre
xes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/ 
bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/ 
lexvo=http://lexvo.org/id/;wp=http://en.wikipedia.org/wiki/ 
Subject Predicate Object 
mdb:
lm/2014 rdfs:label The Shining 
mdb:
lm/2014 movie:initial release date 1980-05-23' 
mdb:
lm/2014 movie:director mdb:director/8476 
mdb:
lm/2014 movie:actor mdb:actor/29704 
mdb:
lm/2014 movie:actor mdb: actor/30013 
mdb:
lm/2014 movie:music contributor mdb: music contributor/4110 
mdb:
lm/2014 foaf:based near geo:2635167 
mdb:
lm/2014 movie:relatedBook bm:0743424425 
mdb:
lm/2014 movie:language lexvo:iso639-3/eng 
mdb:director/8476 movie:director name Stanley Kubrick 
mdb:
lm/2685 movie:director mdb:director/8476 
mdb:
lm/2685 rdfs:label A Clockwork Orange 
mdb:
lm/424 movie:director mdb:director/8476 
mdb:
lm/424 rdfs:label Spartacus 
mdb:actor/29704 movie:actor name Jack Nicholson 
mdb:
lm/1267 movie:actor mdb:actor/29704 
mdb:
lm/1267 rdfs:label The Last Tycoon 
mdb:
lm/3418 movie:actor mdb:actor/29704 
mdb:
lm/3418 rdfs:label The Passenger 
geo:2635167 gn:name United Kingdom 
geo:2635167 gn:population 62348447 
geo:2635167 gn:wikipediaArticle wp:United Kingdom 
bm:books/0743424425 dc:creator bm:persons/Stephen+King 
bm:books/0743424425 rev:rating 4.7 
bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer 
lexvo:iso639-3/eng rdfs:label English 
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA 
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn 
URI Literal 
URI 
PKU/2014-08-28 21
RDF Graph 
United Kingdom 
gn:name 
The Passenger 
refs:label 
62348447 
gn:population 
mdb:
lm/2014 
movie:initial release date 
1980-05-23 
bm:oers/0743424425amazonOer 
The Shining 
refs:label 
bm:books/0743424425 
4.7 
rev:rating 
geo:2635167 
The Last Tycoon 
refs:label 
movie:actor movie:actor 
mdb:actor/29704 
movie:actor name 
Jack Nicholson 
mdb:
lm/3418 
mdb:
lm/1267 
mdb:director/8476 
Stanley Kubrick 
movie:director name 
mdb:
lm/2685 
refs:label 
A Clockwork Orange 
mdb:
lm/424 
refs:label 
Spartacus 
mdb:actor/30013 
movie:relatedBook 
scam:hasOer 
foaf:based near 
movie:actor 
movie:director 
movie:actor 
movie:director movie:director 
PKU/2014-08-28 22
Linked Data Model [Hartig, 2012] 
Web Document 
Given a countably in
nite set D (documents), a Web of Linked 
Data is a tuple W = (D; adoc; data) where: 
I D  D, 
I adoc is a partial mapping from URIs to D, and 
I data is a total mapping from D to
nite sets of RDF triples. 
PKU/2014-08-28 23
Linked Data Model [Hartig, 2012] 
Web Document 
Given a countably in
nite set D (documents), a Web of Linked 
Data is a tuple W = (D; adoc; data) where: 
I D  D, 
I adoc is a partial mapping from URIs to D, and 
I data is a total mapping from D to
nite sets of RDF triples. 
Web of Linked Data 
A Web of Linked Data W = (D; adoc; data) 
contains a data link from document d 2 D to 
document d0 2 D if there exists a URI u such 
that: 
I u is mentioned in an RDF triple 
t 2 data(d), and 
I d0 = adoc(u). 
PKU/2014-08-28 23
RDF Query Model { SPARQL 
Query Model - SPARQL Protocol and RDF Query Language 
Given U (set of URIs), L (set of literals), and V (set of 
variables), a SPARQL expression is de
ned recursively: 
an atomic triple pattern, which is an element of 
(U [ V)  (U [ V)  (U [ V [ L) 
?x rdfs:label The Shining 
P FILTER R, where P is a graph pattern expression and R is a 
built-in SPARQL condition (i.e., analogous to a SQL predicate) 
?x rev:rating ?p FILTER(?p  3.0) 
P1 AND/OPT/UNION P2, where P1 and P2 are graph 
pattern expressions 
Example: 
SELECT ?name 
WHERE f 
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . 
?d movie : d i r e c t o r n ame  S t a n l e y Kubr i ck  . 
?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . 
FILTER(? r  4 . 0 ) 
g 
PKU/2014-08-28 24
SPARQL Queries 
SELECT ?name 
WHERE f 
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . 
?d movie : d i r e c t o r n ame  S t a n l e y Kubr i ck  . 
?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . 
FILTER(? r  4 . 0 ) 
g 
FILTER(?r  4.0) 
?m ?d 
movie:director 
?name 
rdfs:label 
?b 
movie:relatedBook 
Stanley Kubrick 
movie:director name 
?r 
rev:rating 
PKU/2014-08-28 25
Outline 
1 LOD and RDF Introduction 
2 Data Warehousing Approach 
Relational Approaches 
Graph-Based Approaches 
3 SPARQL Federation Approach 
Distributed RDF Processing 
SPARQL Endpoint Federation 
4 Live Querying Approach 
Traversal-based approaches 
Index-based approaches 
Hybrid approaches 
5 Conclusions 
PKU/2014-08-28 26
Nave Triple Store Design 
SELECT ?name 
WHERE f 
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . 
?d movie : d i r e c t o r n ame  S t a n l e y Kubr i ck  . 
?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . 
FILTER(? r  4 . 0 ) 
g 
Subject Property Object 
mdb:
lm/2014 rdfs:label The Shining 
mdb:
lm/2014 movie:initial release date 1980-05-23 
mdb:
lm/2014 movie:director mdb:director/8476 
mdb:
lm/2014 movie:actor mdb:actor/29704 
mdb:
lm/2014 movie:actor mdb: actor/30013 
mdb:
lm/2014 movie:music contributor mdb: music contributor/4110 
mdb:
lm/2014 foaf:based near geo:2635167 
mdb:
lm/2014 movie:relatedBook bm:0743424425 
mdb:
lm/2014 movie:language lexvo:iso639-3/eng 
mdb:director/8476 movie:director name Stanley Kubrick 
mdb:
lm/2685 movie:director mdb:director/8476 
mdb:
lm/2685 rdfs:label A Clockwork Orange 
mdb:
lm/424 movie:director mdb:director/8476 
mdb:
lm/424 rdfs:label Spartacus 
mdb:actor/29704 movie:actor name Jack Nicholson 
mdb:
lm/1267 movie:actor mdb:actor/29704 
mdb:
lm/1267 rdfs:label The Last Tycoon 
mdb:
lm/3418 movie:actor mdb:actor/29704 
mdb:
lm/3418 rdfs:label The Passenger 
geo:2635167 gn:name United Kingdom 
geo:2635167 gn:population 62348447 
geo:2635167 gn:wikipediaArticle wp:United Kingdom 
bm:books/0743424425 dc:creator bm:persons/Stephen+King 
bm:books/0743424425 rev:rating 4.7 
bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer 
lexvo:iso639-3/eng rdfs:label English 
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA 
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn 
PKU/2014-08-28 27
Nave Triple Store Design 
SELECT ?name 
WHERE f 
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . 
?d movie : d i r e c t o r n ame  S t a n l e y Kubr i ck  . 
?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . 
FILTER(? r  4 . 0 ) 
g 
Subject Property Object 
mdb:
lm/2014 rdfs:label The Shining 
mdb:
lm/2014 movie:initial release date 1980-05-23 
mdb:
lm/2014 movie:director mdb:director/8476 
mdb:
lm/2014 movie:actor mdb:actor/29704 
mdb:
lm/2014 movie:actor mdb: actor/30013 
mdb:
lm/2014 movie:music contributor mdb: music contributor/4110 
mdb:
lm/2014 foaf:based near geo:2635167 
mdb:
lm/2014 movie:relatedBook bm:0743424425 
mdb:
lm/2014 movie:language lexvo:iso639-3/eng 
mdb:director/8476 movie:director name Stanley Kubrick 
mdb:
lm/2685 movie:director mdb:director/8476 
mdb:
lm/2685 rdfs:label A Clockwork Orange 
mdb:
lm/424 movie:director mdb:director/8476 
mdb:
lm/424 rdfs:label Spartacus 
mdb:actor/29704 movie:actor name Jack Nicholson 
mdb:
lm/1267 movie:actor mdb:actor/29704 
mdb:
lm/1267 rdfs:label The Last Tycoon 
mdb:
lm/3418 movie:actor mdb:actor/29704 
mdb:
lm/3418 rdfs:label The Passenger 
geo:2635167 gn:name United Kingdom 
geo:2635167 gn:population 62348447 
geo:2635167 gn:wikipediaArticle wp:United Kingdom 
bm:books/0743424425 dc:creator bm:persons/Stephen+King 
bm:books/0743424425 rev:rating 4.7 
bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer 
lexvo:iso639-3/eng rdfs:label English 
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA 
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn 
SELECT T1 . o b j e c t 
FROM T as T1 , T as T2 , T as T3 , 
T as T4 , T as T5 
WHERE T1 . p= r d f s : l a b e l  
AND T2 . p=movie : r e l a t e dBo o k  
AND T3 . p=movie : d i r e c t o r  
AND T4 . p= r e v : r a t i n g  
AND T5 . p=movie : d i r e c t o r n ame  
AND T1 . s=T2 . s 
AND T1 . s=T3 . s 
AND T2 . o=T4 . s 
AND T3 . o=T5 . s 
AND T4 . o  4 . 0 
AND T5 . o= S t a n l e y Kubr i ck  
PKU/2014-08-28 27
Nave Triple Store Design 
SELECT ?name 
WHERE f 
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . 
?d movie : d i r e c t o r n ame  S t a n l e y Kubr i ck  . 
?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . 
FILTER(? r  4 . 0 ) 
g 
Subject Property Object 
mdb:
lm/2014 rdfs:label The Shining 
mdb:
lm/2014 movie:initial release date 1980-05-23 
mdb:
lm/2014 movie:director mdb:director/8476 
mdb:
lm/2014 movie:actor mdb:actor/29704 
mdb:
lm/2014 movie:actor mdb: actor/30013 
mdb:
lm/2014 movie:music contributor mdb: music contributor/4110 
mdb:
lm/2014 foaf:based near geo:2635167 
mdb:
lm/2014 movie:relatedBook bm:0743424425 
mdb:
lm/2014 movie:language lexvo:iso639-3/eng 
mdb:director/8476 movie:director name Stanley Kubrick 
mdb:
lm/2685 movie:director mdb:director/8476 
mdb:
lm/2685 rdfs:label A Clockwork Orange 
mdb:

More Related Content

What's hot

Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
When the Web of Linked Data Arrives
When the Web of Linked Data ArrivesWhen the Web of Linked Data Arrives
When the Web of Linked Data ArrivesRichard Wallis
 
An Introduction to RDF and the Web of Data
An Introduction to RDF and the Web of DataAn Introduction to RDF and the Web of Data
An Introduction to RDF and the Web of DataOlaf Hartig
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessdatacite
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedJakob .
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...Alison Hitchens
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumRobert Sanderson
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgAI4BD GmbH
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in LibrariesCarl Hess
 
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Ig Bittencourt
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)Alison Hitchens
 
Linked open data and libraries
Linked open data and librariesLinked open data and libraries
Linked open data and librariesAlison Hitchens
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataBoris Villazón-Terrazas
 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebGerard de Melo
 
Open Knowledge Foundation Edinburgh meet-up #3
Open Knowledge Foundation Edinburgh meet-up #3Open Knowledge Foundation Edinburgh meet-up #3
Open Knowledge Foundation Edinburgh meet-up #3Gill Hamilton
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesMarin Dimitrov
 

What's hot (20)

Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & Museums
 
SWT Lecture Session 8 - Rules
SWT Lecture Session 8 - RulesSWT Lecture Session 8 - Rules
SWT Lecture Session 8 - Rules
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
When the Web of Linked Data Arrives
When the Web of Linked Data ArrivesWhen the Web of Linked Data Arrives
When the Web of Linked Data Arrives
 
An Introduction to RDF and the Web of Data
An Introduction to RDF and the Web of DataAn Introduction to RDF and the Web of Data
An Introduction to RDF and the Web of Data
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information access
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall Forum
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St Petersburg
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
Developing Linked Data and Semantic Web-based Applications (Expotec 2015)
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)
 
Linked open data and libraries
Linked open data and librariesLinked open data and libraries
Linked open data and libraries
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked Data
 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the Web
 
Open Knowledge Foundation Edinburgh meet-up #3
Open Knowledge Foundation Edinburgh meet-up #3Open Knowledge Foundation Edinburgh meet-up #3
Open Knowledge Foundation Edinburgh meet-up #3
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic Repositories
 

Similar to Web Data Management with RDF

Approximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of DataApproximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of DataKathrin Dentler
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Jonathan Yu
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeJonathan Yu
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Oscar Corcho
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesXiaogang (Marshall) Ma
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudDhaval Thakker
 
Open library data and embrace the world library linked data
Open library data and embrace the world library linked dataOpen library data and embrace the world library linked data
Open library data and embrace the world library linked data皓仁 柯
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 
Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Emanuele Della Valle
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
 
BHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainerBHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainerHeimo Rainer
 
TERN Facility Portals - Stuart Phinn
TERN Facility Portals - Stuart PhinnTERN Facility Portals - Stuart Phinn
TERN Facility Portals - Stuart PhinnTERN Australia
 
What is DataCite-screenshots
What is DataCite-screenshotsWhat is DataCite-screenshots
What is DataCite-screenshotsdatacite
 

Similar to Web Data Management with RDF (20)

Approximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of DataApproximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of Data
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscape
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental Sciences
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 
Open library data and embrace the world library linked data
Open library data and embrace the world library linked dataOpen library data and embrace the world library linked data
Open library data and embrace the world library linked data
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 
Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18Stream Reasoning: Where we got so far. Oxford 2010.1.18
Stream Reasoning: Where we got so far. Oxford 2010.1.18
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
E research overview gahegan bioinformatics workshop 2010
E research overview gahegan bioinformatics workshop 2010E research overview gahegan bioinformatics workshop 2010
E research overview gahegan bioinformatics workshop 2010
 
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
 
BHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainerBHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainer
 
TERN Facility Portals - Stuart Phinn
TERN Facility Portals - Stuart PhinnTERN Facility Portals - Stuart Phinn
TERN Facility Portals - Stuart Phinn
 
What is DataCite-screenshots
What is DataCite-screenshotsWhat is DataCite-screenshots
What is DataCite-screenshots
 
Perx and TechXtra
Perx and TechXtraPerx and TechXtra
Perx and TechXtra
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Web Data Management with RDF

  • 1. Web Data Management in RDF Age M. Tamer Ozsu University of Waterloo David R. Cheriton School of Computer Science PKU/2014-08-28 1
  • 2. Acknowledgements This presentation draws upon collaborative research and discussions with the following colleagues (in alphabetical order) Gunes Aluc, University of Waterloo Khuzaima Daudjee, University of Waterloo Olaf Hartig, University of Waterloo Lei Chen, Hong Kong University of Science Technology Lei Zou, Peking University PKU/2014-08-28 2
  • 3. Web Data Management A long term research interest in the DB community 2000 2004 2011 2011 PKU/2014-08-28 3
  • 4. Interest Due to Properties of Web Data Lack of a schema Data is at best semi-structured Missing data, additional attributes, similar data but not identical Volatility Changes frequently May conform to one schema now, but not later Scale Does it make sense to talk about a schema for Web? How do you capture everything? Querying diculty What is the user language? What are the primitives? Arent search engines or metasearch engines sucient? PKU/2014-08-28 4
  • 5. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization PKU/2014-08-28 8
  • 6. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization XML Data exchange language Primarily tree based structure list title=MOVIES film titleThe Shining/title directorStanley Kubrick/director actorJack Nicholson/actor /film film titleSpartacus/title directorStanley Kubrick/director /film film titleThe Passenger/title actorJack Nicholson/actor /film ... /list root
  • 7. lm title The Shining director Stanley Kubrick
  • 8. lm actor ... Jack Nicholson
  • 9. lm title The Passenger actor Jack Nicholson PKU/2014-08-28 8
  • 10. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization XML Data exchange language Primarily tree based structure RDF (Resource Description Framework) SPARQL W3C recommendation Simple, self-descriptive model Building block of semantic web Linked Open Data (LOD) PKU/2014-08-28 8
  • 11. RDF and Semantic Web RDF is a language for the conceptual modeling of information about resources (web resources in our context) A building block of semantic web Facilitates exchange of information Search engine results can be more focused and structured Facilitates data integration (mashes) Machine understandable Understand the information on the web and the interrelationships among them PKU/2014-08-28 9
  • 12. RDF Uses Yago and DBpedia extract facts from Wikipedia represent as RDF ! structural queries Communities build RDF data E.g., biologists: Bio2RDF and Uniprot RDF Web data integration Linked Open Data Cloud . . . PKU/2014-08-28 10
  • 13. RDF Data Volumes . . . . . . are growing { and fast Linked data cloud currently consists of 325 datasets with 25B triples Size almost doubling every year PKU/2014-08-28 11
  • 14. RDF Data Volumes . . . . . . are growing { and fast Linked data cloud currently consists of 325 datasets with 25B triples Size almost doubling every year As of March 2009 LinkedCT Reactome Taxonomy KEGG GeneID PubMed Pfam UniProt OMIM PDB BBC Later + TOTP riese Symbol ChEBI Daily Med Disea-some CAS HGNC Inter Pro Drug Bank UniParc UniRef ProDom PROSITE Gene Ontology Homolo Gene Pub Chem MGI UniSTS GEO Species Jamendo BBC Programm es Music-brainz Magna-tune Surge Radio MySpace Wrapper Audio- Scrobbler Linked MDB BBC John Peel BBC Playcount Data Gov- Track US Census Data Geo-names lingvoj World Fact-book Euro-stat IRIT Toulouse SW Conference Corpus RDF Book Mashup Project Guten-berg DBLP Hannover DBLP Berlin LAAS-CNRS Buda-pest BME IEEE IBM Resex Pisa New-castle RAE 2001 CiteSeer ACM DBLP RKB Explorer eprints LIBRIS Semantic Web.org Eurécom ECS South-ampton SIOC Revyu Sites Doap-space Flickr exporter FOAF profiles flickr wrappr Crunch Base Sem- Web- Central Open- Guides Wiki-company QDOS Pub Guide Open Calais RDF ohloh W3C WordNet Open Cyc UMBEL Yago DBpedia Freebase Virtuoso Sponger March '09: 89 datasets Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ PKU/2014-08-28 11
  • 15. RDF Data Volumes . . . . . . are growing { and fast Linked data cloud currently consists of 325 datasets with 25B triples Size almost doubling every year New-castle User-generated content As of September 2010 Audio-scrobbler (DBTune) Music Brainz (zitgist) P20 YAGO Chronic-ling America World Fact-book (FUB) Geo Names Moseley Folk WordNet (VUA) WordNet (W3C) VIVO UF VIVO Indiana VIVO Cornell VIAF URI Burner Sussex Reading Lists Plymouth Reading Lists UMBEL UK Post-codes legislation .gov.uk Uberblic UB Mann-heim TWC LOGD GTAA BBC Program mes Twarql transport data.gov .uk totl.net Tele-graphis TCM Gene DIT Taxon Concept The Open Library (Talis) t4gm Surge Radio RAMEAU SH STW statistics data.gov .uk St. Andrews Resource Lists ECS South-ampton EPrints Semantic Crunch Base semantic web.org Semantic XBRL SW Dog Food rdfabout US SEC Wiki UN/ LOCODE Ulm ECS (RKB Explorer) Roma RISKS RESEX RAE2001 Pisa OS OAI NSF LAAS KISTI JISC IRIT IEEE IBM Eurécom ERA ePrints dotAC DEPLOY DBLP (RKB Explorer) Course-ware CORDIS CiteSeer Budapest ACM riese Revyu research data.gov .uk reference data.gov .uk Recht-spraak. nl RDF ohloh Last.FM (rdfize) RDF Book Mashup PSH lingvoj Product DB Poké-pédia PBAC Ord-nance Survey Openly Local The Open Library Open Cyc Open Calais OpenEI New York Times NTU Resource Lists NDL subjects MARC Codes List Man-chester Reading Lists Lotico The London Gazette LOIUS lobid Resources lobid Organi-sations Linked MDB Linked LCCN Linked GeoData Linked CT Linked Open Numbers LIBRIS Lexvo LCSH DBLP (L3S) Linked Sensor Data (Kno.e.sis) Good-win Family Jamendo iServe NSZL Catalog GovTrack GESIS Geo Species Geo Linked Data (es) STITCH Project Guten-berg (FUB) SIDER Medi Care Euro-stat (FUB) Drug Bank Disea-some DBLP (FU Berlin) Daily Med Freebase flickr wrappr Fishes of Texas FanHubz Event- Media EUTC Produc-tions Eurostat EUNIS ESD stan-dards Popula-tion (En- AKTing) NHS (EnAKTing) Mortality (En- AKTing) Energy (En- AKTing) CO2 (En- AKTing) education data.gov .uk ECS South-ampton Gem. Norm-datei data dcs MySpace (DBTune) Music Brainz (DBTune) Magna-tune John Peel (DB Tune) classical (DB Tune) Last.fm Artists (DBTune) DB Tropes dbpedia lite DBpedia Pokedex Airports NASA (Data Incu-bator) Music Brainz (Data Incubator) Discogs (Data In-cubator) Climbing Linked Data for Intervals Cornetto Chem2 Bio2RDF biz. data. gov.uk UniRef UniSTS Uni Path-way Taxo-nomy UniParc UniProt SGD Reactome PubMed Pub Chem Pfam PDB PRO-SITE ProDom OMIM OBO MGI KEGG Reaction KEGG Drug KEGG Pathway KEGG Glycan KEGG Enzyme KEGG Cpd InterPro Homolo Gene HGNC Gene Ontology GeneID Gen Bank ChEBI CAS Affy-metrix BibBase BBC Wildlife Finder BBC Music rdfabout US Census Media Geographic Publications Government Cross-domain Life sciences September '10: 203 datasets Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ PKU/2014-08-28 11
  • 16. RDF Data Volumes . . . . . . are growing { and fast Linked data cloud currently consists of 325 datasets with 25B triples Size almost doubling every year RESEX IBM User-generated content As of September 2011 Audio Scrobbler (DBTune) Music Brainz (zitgist) P20 Turismo de Zaragoza yovisto Yahoo! Geo Planet World Fact-book Moseley Folk YAGO El Viajero Tourism BBC Program mes BBC Geo Names WordNet (VUA) WordNet (W3C) VIVO UF Calames VIVO Indiana VIVO Cornell VIAF URI Burner Sussex Reading Lists Plymouth Reading Lists Source Code Ecosystem Linked Data UniProt PubMed UniRef UMBEL UK Post-codes legislation data.gov.uk Uberblic UB Mann-heim TWC LOGD Twarql transport data.gov. uk Traffic Scotland theses. fr Thesau-rus W totl.net Tele-graphis Semantic Tweet TCM Gene DIT Taxon Concept Open Library (Talis) tags2con delicious t4gm info Swedish Open Cultural Heritage Surge Radio Sudoc STW RAMEAU SH statistics data.gov. uk St. Andrews Resource Lists ECS South-ampton EPrints SSW Thesaur us Linked User Feedback gnoss Greek DBpedia Smart Link Slideshare 2RDF semantic web.org GovTrack Semantic XBRL SW Dog Food US SEC (rdfabout) Sears Scotland Pupils Exams Scotland Geo-graphy Scholaro-meter WordNet (RKB Explorer) Wiki UN/ LOCODE Ulm ECS (RKB Explorer) Roma RISKS RAE2001 Pisa OS OAI NSF New-castle LAAS KISTI JISC IRIT IEEE Eurécom ERA ePrints dotAC DEPLOY DBLP (RKB Explorer) Crime Reports UK Course-ware CORDIS (RKB Explorer) CiteSeer Budapest ACM riese Revyu research data.gov. Ren. uk Energy Genera-tors reference data.gov. uk Recht-spraak. nl RDF ohloh Last.FM (rdfize) RDF Book Mashup Rådata nå! PSH Product Types Ontology Product DB PBAC Poké-pédia patents data.go v.uk Ox Points Ord-nance Survey Openly Local Open Library Open Cyc Open Corpo-rates Open Calais OpenEI Open Election Data Project Open Data Thesau-rus Ontos News Portal OGOLOD Ocean Drilling Codices Janus AMP New York Times NVD ntnusc NTU Resource Lists Norwe-gian MeSH NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico Weather Stations London Gazette LOIUS Linked Open Colors lobid Resources lobid Organi-sations LEM Linked MDB LinkedL CCN Linked GeoData LinkedCT LOV Linked Open Numbers LODE Eurostat (Ontology Central) Linked EDGAR (Ontology Central) Linked Crunch-base lingvoj Lichfield Spen-ding LIBRIS Lexvo LCSH DBLP (L3S) Linked Sensor Data (Kno.e.sis) Klapp-stuhl-club Good-win Family National Radio-activity JP Jamendo (DBtune) Italian public schools ISTAT Immi-gration iServe IdRef Sudoc NSZL Catalog Hellenic PD Hellenic FBD Piedmont Accomo-dations GovWILD Google Art wrapper GESIS GeoWord Net Geo Species Geo Linked Data GEMET GTAA STITCH SIDER Project Guten-berg Medi Care Euro-stat (FUB) EURES Drug Bank Disea-some DBLP (FU Berlin) Daily Med CORDIS (FUB) Freebase flickr wrappr Fishes of Texas Finnish Munici-palities ChEMBL FanHubz Event Media EUTC Produc-tions Eurostat Europeana EUNIS EU Insti-tutions ESD stan-dards EARTh Enipedia Popula-tion (En- AKTing) NHS (En- AKTing) Mortality (En- AKTing) Energy (En- AKTing) Crime (En- AKTing) CO2 Emission (En- AKTing) EEA SISVU educatio n.data.g ov.uk ECS South-ampton ECCO-TCP GND Didactal ia DDC Deutsche Bio-graphie data dcs Music Brainz (DBTune) Magna-tune John Peel (DBTune) Classical (DB Tune) Last.FM artists (DBTune) DB Tropes Portu-guese DBpedia dbpedia lite DBpedia data-open-ac- uk SMC Journals Pokedex Airports NASA (Data Incu-bator) Music Brainz (Data Incubator) Metoffice Weather Forecasts Discogs (Data Incubator) Climbing data.gov.uk intervals Data Gov.ie data bnf.fr Cornetto reegle Chronic-ling America Chem2 Bio2RDF business data.gov. uk Bricklink Brazilian Poli-ticians BNB UniSTS UniPath way UniParc Taxono my UniProt (Bio2RDF) SGD Reactome Pub Chem PRO-SITE ProDom Pfam PDB OMIM MGI KEGG Reaction KEGG Pathway KEGG Glycan KEGG Enzyme KEGG Drug KEGG Com-pound InterPro Homolo Gene HGNC Gene Ontology GeneID Affy-metrix bible ontology BibBase FTS BBC Wildlife Finder Music Alpine Ski Austria LOCAH Amster-dam Museum AGROV OC AEMET US Census (rdfabout) Media Geographic Publications Government Cross-domain Life sciences September '11: 295 datasets, 25B triples Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ PKU/2014-08-28 11
  • 17. RDF Data Volumes . . . . . . are growing { and fast Linked data cloud currently consists of 325 datasets with 25B triples Size almost doubling every year April '14: 1091 datasets, ??? triples Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked Data Best Practices in Dierent Topical Domains. In Proc. ISWC, 2014. PKU/2014-08-28 11
  • 19. Globally Distributed Network of Data PKU/2014-08-28 13
  • 20. Three Approaches Data warehousing Consolidate data in a repository and query it SPARQL federation Leverage query services provided by data publishers Live Linked Data querying Navigate through LOD by looking up URIs at query execution time PKU/2014-08-28 14
  • 21. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 15
  • 22. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 16
  • 23. Traditional Hypertext-based Web Access IMDb World Book Data exposed to the Web via HTML PKU/2014-08-28 17
  • 24. Linked Data Publishing Principles (http://...linkedmdb.../Shining,releaseDate, 23 May 1980) (http://...linkedmdb.../Shining,
  • 25. lmLocation, http://cia.../UK) (http://...linkedmdb.../29704,actedIn, http://...linkedmdb.../Shining) IMDb World Book ... (http://cia.../UK, hasPopulation, 63230000) ... Shining UK Data model: RDF Global identi
  • 26. er: URI Access mechanism: HTTP Connection: data links PKU/2014-08-28 18
  • 27. RDF Introduction Everything is an uniquely named resource http://data.linkedmdb.org/resource/actor/JN29704 PKU/2014-08-28 19
  • 28. RDF Introduction Everything is an uniquely named resource Pre
  • 29. xes can be used to shorten the names xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 PKU/2014-08-28 19
  • 30. RDF Introduction Everything is an uniquely named resource Pre
  • 31. xes can be used to shorten the names Properties of resources can be de
  • 32. ned xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName Jack Nicholson y:JN29704:BornOnDate 1937-04-22 PKU/2014-08-28 19
  • 33. RDF Introduction Everything is an uniquely named resource Pre
  • 34. xes can be used to shorten the names Properties of resources can be de
  • 35. ned Relationships with other resources can be de
  • 36. ned xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName Jack Nicholson y:JN29704:BornOnDate 1937-04-22 JN29704:movieActor y:TS2014 y:TS2014:title The Shining y:TS2014:releaseDate 1980-05-23 PKU/2014-08-28 19
  • 37. RDF Introduction Everything is an uniquely named resource Pre
  • 38. xes can be used to shorten the names Properties of resources can be de
  • 39. ned Relationships with other resources can be de
  • 40. ned Resource descriptions can be contributed by dierent people/groups and can be located anywhere in the web Integrated web database xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName Jack Nicholson y:JN29704:BornOnDate 1937-04-22 JN29704:movieActor y:TS2014 y:TS2014:title The Shining y:TS2014:releaseDate 1980-05-23 PKU/2014-08-28 19
  • 41. RDF Data Model Triple: Subject, Predicate (Property), Object (s; p; o) Subject: the entity that is described (URI or blank node) Predicate: a feature of the entity (URI) Object: value of the feature (URI, blank node or literal) (s; p; o) 2 (U [ B) U (U [ B [ L) Set of RDF triples is called an RDF graph U Predicate Subject Object U B U B L U: set of URIs B: set of blank nodes L: set of literals Subject Predicate Object http://...imdb.../
  • 42. lm/2014 rdfs:label The Shining http://...imdb.../
  • 43. lm/2014 movie:releaseDate 1980-05-23 http://...imdb.../29704 movie:actor name Jack Nicholson : : : : : : : : : PKU/2014-08-28 20
  • 45. xes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/ bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/ lexvo=http://lexvo.org/id/;wp=http://en.wikipedia.org/wiki/ Subject Predicate Object mdb:
  • 46. lm/2014 rdfs:label The Shining mdb:
  • 47. lm/2014 movie:initial release date 1980-05-23' mdb:
  • 50. lm/2014 movie:actor mdb: actor/30013 mdb:
  • 51. lm/2014 movie:music contributor mdb: music contributor/4110 mdb:
  • 52. lm/2014 foaf:based near geo:2635167 mdb:
  • 54. lm/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name Stanley Kubrick mdb:
  • 56. lm/2685 rdfs:label A Clockwork Orange mdb:
  • 58. lm/424 rdfs:label Spartacus mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 60. lm/1267 rdfs:label The Last Tycoon mdb:
  • 62. lm/3418 rdfs:label The Passenger geo:2635167 gn:name United Kingdom geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer lexvo:iso639-3/eng rdfs:label English lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn URI Literal URI PKU/2014-08-28 21
  • 63. RDF Graph United Kingdom gn:name The Passenger refs:label 62348447 gn:population mdb:
  • 64. lm/2014 movie:initial release date 1980-05-23 bm:oers/0743424425amazonOer The Shining refs:label bm:books/0743424425 4.7 rev:rating geo:2635167 The Last Tycoon refs:label movie:actor movie:actor mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 66. lm/1267 mdb:director/8476 Stanley Kubrick movie:director name mdb:
  • 67. lm/2685 refs:label A Clockwork Orange mdb:
  • 68. lm/424 refs:label Spartacus mdb:actor/30013 movie:relatedBook scam:hasOer foaf:based near movie:actor movie:director movie:actor movie:director movie:director PKU/2014-08-28 22
  • 69. Linked Data Model [Hartig, 2012] Web Document Given a countably in
  • 70. nite set D (documents), a Web of Linked Data is a tuple W = (D; adoc; data) where: I D D, I adoc is a partial mapping from URIs to D, and I data is a total mapping from D to
  • 71. nite sets of RDF triples. PKU/2014-08-28 23
  • 72. Linked Data Model [Hartig, 2012] Web Document Given a countably in
  • 73. nite set D (documents), a Web of Linked Data is a tuple W = (D; adoc; data) where: I D D, I adoc is a partial mapping from URIs to D, and I data is a total mapping from D to
  • 74. nite sets of RDF triples. Web of Linked Data A Web of Linked Data W = (D; adoc; data) contains a data link from document d 2 D to document d0 2 D if there exists a URI u such that: I u is mentioned in an RDF triple t 2 data(d), and I d0 = adoc(u). PKU/2014-08-28 23
  • 75. RDF Query Model { SPARQL Query Model - SPARQL Protocol and RDF Query Language Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is de
  • 76. ned recursively: an atomic triple pattern, which is an element of (U [ V) (U [ V) (U [ V [ L) ?x rdfs:label The Shining P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate) ?x rev:rating ?p FILTER(?p 3.0) P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions Example: SELECT ?name WHERE f ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : d i r e c t o r n ame S t a n l e y Kubr i ck . ?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . FILTER(? r 4 . 0 ) g PKU/2014-08-28 24
  • 77. SPARQL Queries SELECT ?name WHERE f ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : d i r e c t o r n ame S t a n l e y Kubr i ck . ?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . FILTER(? r 4 . 0 ) g FILTER(?r 4.0) ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook Stanley Kubrick movie:director name ?r rev:rating PKU/2014-08-28 25
  • 78. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 26
  • 79. Nave Triple Store Design SELECT ?name WHERE f ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : d i r e c t o r n ame S t a n l e y Kubr i ck . ?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . FILTER(? r 4 . 0 ) g Subject Property Object mdb:
  • 80. lm/2014 rdfs:label The Shining mdb:
  • 81. lm/2014 movie:initial release date 1980-05-23 mdb:
  • 84. lm/2014 movie:actor mdb: actor/30013 mdb:
  • 85. lm/2014 movie:music contributor mdb: music contributor/4110 mdb:
  • 86. lm/2014 foaf:based near geo:2635167 mdb:
  • 88. lm/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name Stanley Kubrick mdb:
  • 90. lm/2685 rdfs:label A Clockwork Orange mdb:
  • 92. lm/424 rdfs:label Spartacus mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 94. lm/1267 rdfs:label The Last Tycoon mdb:
  • 96. lm/3418 rdfs:label The Passenger geo:2635167 gn:name United Kingdom geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer lexvo:iso639-3/eng rdfs:label English lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn PKU/2014-08-28 27
  • 97. Nave Triple Store Design SELECT ?name WHERE f ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : d i r e c t o r n ame S t a n l e y Kubr i ck . ?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . FILTER(? r 4 . 0 ) g Subject Property Object mdb:
  • 98. lm/2014 rdfs:label The Shining mdb:
  • 99. lm/2014 movie:initial release date 1980-05-23 mdb:
  • 102. lm/2014 movie:actor mdb: actor/30013 mdb:
  • 103. lm/2014 movie:music contributor mdb: music contributor/4110 mdb:
  • 104. lm/2014 foaf:based near geo:2635167 mdb:
  • 106. lm/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name Stanley Kubrick mdb:
  • 108. lm/2685 rdfs:label A Clockwork Orange mdb:
  • 110. lm/424 rdfs:label Spartacus mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 112. lm/1267 rdfs:label The Last Tycoon mdb:
  • 114. lm/3418 rdfs:label The Passenger geo:2635167 gn:name United Kingdom geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer lexvo:iso639-3/eng rdfs:label English lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn SELECT T1 . o b j e c t FROM T as T1 , T as T2 , T as T3 , T as T4 , T as T5 WHERE T1 . p= r d f s : l a b e l AND T2 . p=movie : r e l a t e dBo o k AND T3 . p=movie : d i r e c t o r AND T4 . p= r e v : r a t i n g AND T5 . p=movie : d i r e c t o r n ame AND T1 . s=T2 . s AND T1 . s=T3 . s AND T2 . o=T4 . s AND T3 . o=T5 . s AND T4 . o 4 . 0 AND T5 . o= S t a n l e y Kubr i ck PKU/2014-08-28 27
  • 115. Nave Triple Store Design SELECT ?name WHERE f ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : d i r e c t o r n ame S t a n l e y Kubr i ck . ?m movie : r e l a t e dBo o k ?b . ?b r e v : r a t i n g ? r . FILTER(? r 4 . 0 ) g Subject Property Object mdb:
  • 116. lm/2014 rdfs:label The Shining mdb:
  • 117. lm/2014 movie:initial release date 1980-05-23 mdb:
  • 120. lm/2014 movie:actor mdb: actor/30013 mdb:
  • 121. lm/2014 movie:music contributor mdb: music contributor/4110 mdb:
  • 122. lm/2014 foaf:based near geo:2635167 mdb:
  • 124. lm/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name Stanley Kubrick mdb:
  • 126. lm/2685 rdfs:label A Clockwork Orange mdb:
  • 128. lm/424 rdfs:label Spartacus mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 130. lm/1267 rdfs:label The Last Tycoon mdb:
  • 132. lm/3418 rdfs:label The Passenger geo:2635167 gn:name United Kingdom geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOer bm:oers/0743424425amazonOer lexvo:iso639-3/eng rdfs:label English lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn Easy to implement but too many self-joins! SELECT T1 . o b j e c t FROM T as T1 , T as T2 , T as T3 , T as T4 , T as T5 WHERE T1 . p= r d f s : l a b e l AND T2 . p=movie : r e l a t e dBo o k AND T3 . p=movie : d i r e c t o r AND T4 . p= r e v : r a t i n g AND T5 . p=movie : d i r e c t o r n ame AND T1 . s=T2 . s AND T1 . s=T3 . s AND T2 . o=T4 . s AND T3 . o=T5 . s AND T4 . o 4 . 0 AND T5 . o= S t a n l e y Kubr i ck PKU/2014-08-28 27
  • 133. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Original triple table Subject Property Object mdb:
  • 134. lm/2014 rdfs:label The Shining mdb:
  • 135. lm/2014 movie:initial release date 1980-05-23 mdb:director/8476 movie:director name Stanley Kubrick mdb:
  • 136. lm/2685 movie:director mdb:director/8476 Encoded triple table Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 Mapping table ID Value 0 mdb:
  • 137. lm/2014 1 rdfs:label 2 The Shining 3 movie:initial release date 4 1980-05-23 5 mdb:director/8476 6 movie:director name 7 Stanley Kubrick 8 mdb:
  • 138. lm/2685 9 movie:director PKU/2014-08-28 28
  • 139. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Triples are indexed in a clustered B+ tree in lexicographic order Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... B+ tree Easy querying through mapping table PKU/2014-08-28 28
  • 140. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Triples are indexed in a clustered B+ tree in lexicographic order Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... B+ tree Easy querying through mapping table PKU/2014-08-28 28
  • 141. Exhaustive Indexing{Query Execution Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Join order is easy due to extensive indexing Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb:
  • 142. lm/2014 1 rdfs:label 2 The Shining 3 movie:initial release date 4 1980-05-23 5 mdb:director/8476 6 movie:director name 7 Stanley Kubrick 8 mdb:
  • 143. lm/2685 9 movie:director PKU/2014-08-28 29
  • 144. Exhaustive Indexing{Query Execution Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Join order is easy due to extensive indexing Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb:
  • 145. lm/2014 1 rdfs:label 2 The Shining 3 movie:initial release date 4 1980-05-23 5 mdb:director/8476 6 movie:director name 7 Stanley Kubrick 8 mdb:
  • 146. lm/2685 9 movie:director Advantages I Eliminates some of the joins { they become range queries I Merge join is easy and fast PKU/2014-08-28 29
  • 147. Exhaustive Indexing{Query Execution Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Join order is easy due to extensive indexing Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb:
  • 148. lm/2014 1 rdfs:label 2 The Shining 3 movie:initial release date 4 1980-05-23 5 mdb:director/8476 6 movie:director name 7 Stanley Kubrick 8 mdb:
  • 149. lm/2685 9 movie:director Advantages I Eliminates some of the joins { they become range queries I Merge join is easy and fast Disadvantages I Space usage PKU/2014-08-28 29
  • 150. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:
  • 151. lm/2014 rdfs:label The Shining mdb:
  • 154. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : Subject refs:label movie:director mob:
  • 155. lm/2014 The Shining mob:director/8476 mob:
  • 156. lm/2685 The Clockwork Orange mob:director/8476 Subject movie:actor name mdb:actor Jack Nicholson PKU/2014-08-28 30
  • 157. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:
  • 158. lm/2014 rdfs:label The Shining mdb:
  • 161. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : Subject refs:label movie:director mob:
  • 162. lm/2014 The Shining mob:director/8476 mob:
  • 163. lm/2685 The Clockwork Orange mob:director/8476 Subject movie:actor name mdb:actor Jack Nicholson Advantages I Fewer joins I If the data is structured, we have a relational system { similar to normalized relations PKU/2014-08-28 30
  • 164. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:
  • 165. lm/2014 rdfs:label The Shining mdb:
  • 168. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : Subject refs:label movie:director mob:
  • 169. lm/2014 The Shining mob:director/8476 mob:
  • 170. lm/2685 The Clockwork Orange mob:director/8476 Subject movie:actor name mdb:actor Jack Nicholson Advantages I Fewer joins I If the data is structured, we have a relational system { similar to normalized relations Disadvantages I Potentially a lot of NULLs I Clustering is not trivial I Multi-valued properties are complicated PKU/2014-08-28 30
  • 171. Binary Tables Grouping by properties: For each property, build a two-column table, containing both subject and object, ordered by subjects [Abadi et al., 2007, 2009] Also called vertical partitioned tables n two column tables (n is the number of unique properties in the data) Subject Property Object mdb:
  • 172. lm/2014 rdfs:label The Shining mdb:
  • 175. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : movie:director Subject Object mdb:
  • 177. lm/2685 mdb:director/8476 refs:label Subject Object mob:
  • 179. lm/2685 The Clockwork Orange movie:actor name Subject Object mdb:actor/29704 Jack Nicholson PKU/2014-08-28 31
  • 180. Binary Tables Grouping by properties: For each property, build a two-column table, containing both subject and object, ordered by subjects [Abadi et al., 2007, 2009] Also called vertical partitioned tables n two column tables (n is the number of unique properties in the data) Subject Property Object mdb:
  • 181. lm/2014 rdfs:label The Shining mdb:
  • 184. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : movie:director Subject Object mdb:
  • 186. lm/2685 mdb:director/8476 refs:label Subject Object mob:
  • 188. lm/2685 The Clockwork Orange movie:actor name Subject Object mdb:actor/29704 Jack Nicholson Advantages I Supports multi-valued properties I No NULLs I No clustering I Read only needed attributes (i.e. less I/O) I Good performance for subject-subject joins PKU/2014-08-28 31
  • 189. Binary Tables Grouping by properties: For each property, build a two-column table, containing both subject and object, ordered by subjects [Abadi et al., 2007, 2009] Also called vertical partitioned tables n two column tables (n is the number of unique properties in the data) Subject Property Object mdb:
  • 190. lm/2014 rdfs:label The Shining mdb:
  • 193. lm/2685 rdfs:label A Clockwork Orange mdb:actor/29704 movie:actor name Jack Nicholson : : : : : : : : : movie:director Subject Object mdb:
  • 195. lm/2685 mdb:director/8476 refs:label Subject Object mob:
  • 197. lm/2685 The Clockwork Orange movie:actor name Subject Object mdb:actor/29704 Jack Nicholson Advantages I Supports multi-valued properties I No NULLs I No clustering I Read only needed attributes (i.e. less I/O) I Good performance for subject-subject joins Disadvantages I Not useful for subject-object joins I Expensive inserts PKU/2014-08-28 31
  • 198. Graph-based Approach Answering SPARQL query subgraph matching gStore [Zou et al., 2011, 2014] FILTER(?r 4.0) ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook Stanley Kubrick movie:director name ?r rev:rating Subgraph Matching United Kingdom gn:name The Passenger refs:label 62348447 gn:population mdb:
  • 199. lm/2014 movie:initial release date 1980-05-23 bm:oers/0743424425amazonOer The Shining refs:label bm:books/0743424425 4.7 rev:rating geo:2635167 The Last Tycoon refs:label movie:actor movie:actor mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 201. lm/1267 mdb:director/8476 Stanley Kubrick movie:director name mdb:
  • 202. lm/2685 refs:label A Clockwork Orange mdb:
  • 203. lm/424 refs:label Spartacus mdb:actor/30013 movie:relatedBook scam:hasOer foaf:based near movie:actor movie:director movie:actor movie:director movie:director PKU/2014-08-28 32
  • 204. Graph-based Approach Answering SPARQL query subgraph matching gStore [Zou et al., 2011, 2014] FILTER(?r 4.0) ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook Stanley Kubrick movie:director name ?r rev:rating Subgraph Matching United Kingdom gn:name The Passenger refs:label 62348447 gn:population mdb:
  • 205. lm/2014 movie:initial release date 1980-05-23 bm:oers/0743424425amazonOer The Shining refs:label bm:books/0743424425 4.7 rev:rating geo:2635167 The Last Tycoon refs:label movie:actor movie:actor mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 207. lm/1267 mdb:director/8476 Stanley Kubrick movie:director name mdb:
  • 208. lm/2685 refs:label A Clockwork Orange mdb:
  • 209. lm/424 refs:label Spartacus mdb:actor/30013 movie:relatedBook scam:hasOer foaf:based near movie:actor movie:director movie:actor movie:director movie:director Advantages I Maintains the graph structure I Full set of queries can be handled PKU/2014-08-28 32
  • 210. Graph-based Approach Answering SPARQL query subgraph matching gStore [Zou et al., 2011, 2014] FILTER(?r 4.0) ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook Stanley Kubrick movie:director name ?r rev:rating Subgraph Matching United Kingdom gn:name The Passenger refs:label 62348447 gn:population mdb:
  • 211. lm/2014 movie:initial release date 1980-05-23 bm:oers/0743424425amazonOer The Shining refs:label bm:books/0743424425 4.7 rev:rating geo:2635167 The Last Tycoon refs:label movie:actor movie:actor mdb:actor/29704 movie:actor name Jack Nicholson mdb:
  • 213. lm/1267 mdb:director/8476 Stanley Kubrick movie:director name mdb:
  • 214. lm/2685 refs:label A Clockwork Orange mdb:
  • 215. lm/424 refs:label Spartacus mdb:actor/30013 movie:relatedBook scam:hasOer foaf:based near movie:actor movie:director movie:actor movie:director movie:director Advantages I Maintains the graph structure I Full set of queries can be handled Disadvantages I Graph pattern matching is expensive PKU/2014-08-28 32
  • 216. gStore General Approach: Work directly on the RDF graph and the SPARQL query graph Use a signature-based encoding of each entity and class vertex to speed up matching Filter-and-evaluate Use a false positive algorithm to prune nodes and obtain a set of candidates; then do more detailed evaluation on those Use an index (VS-tree) over the data signature graph (has light maintenance load) for ecient pruning PKU/2014-08-28 33
  • 217. 1. Encode Q and G to Get Signature Graphs Query signature graph Q 00010 0100 0000 1000 0000 0000 0100 10000 Data signature graph G 0010 1000 0100 0001 00001 1000 0001 00010 0000 0100 10000 0000 1000 10000 0000 0010 10000 0000 1001 00100 1001 1000 01000 0001 0001 01000 0100 1000 01000 0001 0100 01000 PKU/2014-08-28 34
  • 218. 2. Filter-and-Evaluate Query signature graph Q 00010 0100 0000 1000 0000 0000 0100 10000 Data signature graph G 0010 1000 0100 0001 00001 1000 0001 00010 0000 0100 10000 0000 1000 10000 0000 0010 10000 0000 1001 00100 1001 1000 01000 0001 0001 01000 0100 1000 01000 0001 0100 01000 Find matches of Q over signature graph G Verify each match in RDF graph G PKU/2014-08-28 35
  • 219. How to Generate Candidate List Two step process: 1. For each node of Q get lists of nodes in G that include that node. 2. Do a multi-way join to get the candidate list PKU/2014-08-28 36
  • 220. How to Generate Candidate List Two step process: 1. For each node of Q get lists of nodes in G that include that node. 2. Do a multi-way join to get the candidate list Alternatives: PKU/2014-08-28 36
  • 221. How to Generate Candidate List Two step process: 1. For each node of Q get lists of nodes in G that include that node. 2. Do a multi-way join to get the candidate list Alternatives: Sequential scan of G Both steps are inecient PKU/2014-08-28 36
  • 222. How to Generate Candidate List Two step process: 1. For each node of Q get lists of nodes in G that include that node. 2. Do a multi-way join to get the candidate list Alternatives: Sequential scan of G Both steps are inecient Use S-trees Height-balanced tree over signatures Run an inclusion query for each node of Q and get lists of nodes in G that include that node. Given query signature q and a set of data signatures S,
  • 223. nd all data signatures si 2 S where qsi = q Does not support second step { expensive PKU/2014-08-28 36
  • 224. How to Generate Candidate List Two step process: 1. For each node of Q get lists of nodes in G that include that node. 2. Do a multi-way join to get the candidate list Alternatives: Sequential scan of G Both steps are inecient Use S-trees Height-balanced tree over signatures Run an inclusion query for each node of Q and get lists of nodes in G that include that node. Given query signature q and a set of data signatures S,
  • 225. nd all data signatures si 2 S where qsi = q Does not support second step { expensive VS-tree (and VS-tree) Multi-resolution summary graph based on S-tree Supports both steps eciently Grouping by vertices PKU/2014-08-28 36
  • 226. S-tree Solution 1111 1111 00010 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 10000 PKU/2014-08-28 37
  • 227. S-tree Solution 1111 1111 00010 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 10000 PKU/2014-08-28 37
  • 228. S-tree Solution 10000 002 1111 1111 00010 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 011 PKU/2014-08-28 37
  • 229. S-tree Solution 10000 002 1111 1111 00010 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 011 003 008 PKU/2014-08-28 37
  • 230. S-tree Solution 10000 002 1111 1111 00010 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 011 003 008 004 009 PKU/2014-08-28 37
  • 231. S-tree Solution 10000 002 1111 1111 00010 004 009 on on 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 011 003 008 PKU/2014-08-28 37
  • 232. S-tree Solution 10000 002 1111 1111 00010 004 009 on on 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 0100 1000 0000 1001 1001 1000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 0100 0000 1000 0000 0000 0100 011 003 008 Possibly large join space! PKU/2014-08-28 37
  • 233. VS-tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 Super edge PKU/2014-08-28 38
  • 234. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 PKU/2014-08-28 39
  • 235. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 PKU/2014-08-28 39
  • 236. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 d3 2 d3 3 00010 10000 d3 3 d3 4 d3 1 d3 4 G3 01000 PKU/2014-08-28 39
  • 237. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 d3 2 d3 3 00010 10000 d3 3 d3 4 d3 1 d3 4 G3 01000 003 008 002 011 004 009 on on PKU/2014-08-28 39
  • 238. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 d3 2 d3 3 00010 10000 d3 3 d3 4 d3 1 d3 4 G3 01000 003 008 002 011 004 009 on on PKU/2014-08-28 39
  • 239. Pruning with VS-Tree 1111 1111 10001 01100 0110 1111 1101 1101 10000 0000 1110 0110 1001 1100 1001 1001 1101 G1 G2 0000 1000 001 0010 1000 0000 0100 0000 0010 1000 0001 0100 0001 008 01000 0100 1000 0000 1001 1001 1000 01000 0001 0001 0001 0100 005 004 006 002 003 007 011 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 11101 10010 10000 00001 01100 00010 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 00010 0100 0000 1000 0000 0000 0100 10000 d3 2 d3 3 00010 10000 d3 3 d3 4 d3 1 d3 4 G3 01000 003 008 002 011 004 009 on on Reduced join space! PKU/2014-08-28 39
  • 240. Adaptivity to Workload Web applications that are supported by RDF data management systems are far more varied than conventional relational applications Data that are being handled are far more heterogeneous SPARQL is far more exible in how triple patterns (i.e., the atomic query unit) can be combined An experiment [Aluc et al., 2014] RDF-3X VOS (6.1) VOS (7.1) MonetDB 4Store % queries for which tested system is fastest 20.9 0.0 22.6 56.5 0.0 Total workload exe- cution time (hours) 27.1 20.9 20.8 38.6 72.2 Mean (per query) execution time (sec- onds) 7.8 6.0 6.0 11.1 20.7 PKU/2014-08-28 40
  • 241. Adaptivity to Workload Web applications that are supported by RDF data management systems are far more varied than conventional relational applications Data that are being handled are far more heterogeneous SPARQL is far more exible in how triple patterns (i.e., the atomic query unit) can be combined An experiment [Aluc et al., 2014] Summary of Experiments RDF-3X VOS (6.1) VOS (7.1) MonetDB 4Store % I queries No single for which system is a sole 20.9 winner across 0.0 all queries 22.6 56.5 0.0 tested I system is No single system is the sole loser across all queries, either fastest I There can be 2{5 orders of magnitude dierence in the performance (i.e., query Total workload exe- cution time (hours) 27.1 20.9 20.8 38.6 72.2 execution time) between the best and the worst system for a given query I The winner in one query may timeout in another I Performance dierence widens as dataset size increases Mean (per query) execution time (sec- onds) 7.8 6.0 6.0 11.1 20.7 PKU/2014-08-28 40
  • 242. Group-by-Query Approach hasPost Tamer Post23571 Olaf taggedIn UWaterloo worksAt hasPost taggedIn hasPost Tamer Post23 Bob likes UWaterloo worksAt Post2 hasPost favourites hasPost Tamer Post23 Bob taggedIn UWaterloo worksAt Post2 PKU/2014-08-28 41
  • 244. xed size, (b) contain same set of attributes 1. Workload time analysis Type-A, robust Type-C, robust Type-A, adaptable Type-B, adaptable Type-B, adaptable Type-B, adaptable Type-B, adaptable Type-C, adaptable PKU/2014-08-28 42
  • 246. xed size, (b) contain same set of attributes 1. Workload time analysis 2. Updating the physical layout Cache Storage System Hash Function evict @t1 function adapts Hash Function @tk PKU/2014-08-28 42
  • 248. xed size, (b) contain same set of attributes 1. Workload time analysis 2. Updating the physical layout 3. Partial indexing Index { { { { { { { { { { Cache Storage System Hash Function evict @t1 function adapts Hash Function @tk SPARQL Query Engine PKU/2014-08-28 42
  • 249. chameleon-db Prototype system [Aluc et al., 2013] 35,000 lines of code in C++ and growing Structural Index ... Vertex Index Spill Index Storage System Cluster Index Storage Advisor Query Engine Plan Generation Evaluation PKU/2014-08-28 43
  • 250. Some Open Problems Scalability of the solutions to very large datasets Maintenance of auxiliary data structures in dynamic environments Adaptive systems to handle varying and time-changing workloads Uncertain RDF data processing Keyword search over RDF data Query processing over incomplete RDF data PKU/2014-08-28 44
  • 251. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 45
  • 252. Remember the Environment Distributed environment Some of the data sites can process SPARQL queries { SPARQL endpoints Not all data sites can process queries PKU/2014-08-28 46
  • 253. Remember the Environment Distributed environment Some of the data sites can process SPARQL queries { SPARQL endpoints Not all data sites can process queries Alternatives PKU/2014-08-28 46
  • 254. Remember the Environment Distributed environment Some of the data sites can process SPARQL queries { SPARQL endpoints Not all data sites can process queries Alternatives Data re-distribution + query decomposition PKU/2014-08-28 46
  • 255. Remember the Environment Distributed environment Some of the data sites can process SPARQL queries { SPARQL endpoints Not all data sites can process queries Alternatives Data re-distribution + query decomposition SPARQL federation: just process at SPARQL endpoints PKU/2014-08-28 46
  • 256. Remember the Environment Distributed environment Some of the data sites can process SPARQL queries { SPARQL endpoints Not all data sites can process queries Alternatives Data re-distribution + query decomposition SPARQL federation: just process at SPARQL endpoints Live querying (see next section) PKU/2014-08-28 46
  • 257. Distributed RDF Processing [Kaoudi and Manolescu, 2014] Data partitioning approaches RDF data warehouse is partitioned and distributed RDF data D = fD1; : : : ;Dng Allocate each Di to a site PKU/2014-08-28 47
  • 258. Distributed RDF Processing [Kaoudi and Manolescu, 2014] Data partitioning approaches RDF data warehouse is partitioned and distributed RDF data D = fD1; : : : ;Dng Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) PKU/2014-08-28 47
  • 259. Distributed RDF Processing [Kaoudi and Manolescu, 2014] Data partitioning approaches RDF data warehouse is partitioned and distributed RDF data D = fD1; : : : ;Dng Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) SPARQL query decomposed Q = fQ1; : : : ;Qkg Distributed execution of fQ1; : : : ;Qkg over fD1; : : : ;Dng PKU/2014-08-28 47
  • 260. Distributed RDF Processing [Kaoudi and Manolescu, 2014] Data partitioning approaches RDF data warehouse is partitioned and distributed RDF data D = fD1; : : : ;Dng Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) SPARQL query decomposed Q = fQ1; : : : ;Qkg Distributed execution of fQ1; : : : ;Qkg over fD1; : : : ;Dng I High performance I Great for parallelizing centralized RDF data I May not be possible to re-partition and re-allocate Web data (i.e., LOD) PKU/2014-08-28 47
  • 261. Distributed RDF Processing { 2 Data summary-based approaches Build summaries (index) for the distributed RDF datasets (e.g., [Atre et al., 2010; Prasser et al., 2012]) PKU/2014-08-28 48
  • 262. Distributed RDF Processing { 2 Data summary-based approaches Build summaries (index) for the distributed RDF datasets (e.g., [Atre et al., 2010; Prasser et al., 2012]) SPARQL query Q = fQ1; : : : ;Qkg Distributed execution of fQ1; : : : ;Qkg using the data summary PKU/2014-08-28 48
  • 263. Distributed RDF Processing { 2 Data summary-based approaches Build summaries (index) for the distributed RDF datasets (e.g., [Atre et al., 2010; Prasser et al., 2012]) SPARQL query Q = fQ1; : : : ;Qkg Distributed execution of fQ1; : : : ;Qkg using the data summary I No data re-partitioning and re-allocation I Have to scan the data at each site I Index over distributed data with maintenance concerns PKU/2014-08-28 48
  • 264. SPARQL Endpoint Federation Consider only the SPARQL endpoints for query execution No data re-partitioning/re-distribution Consider D = D1 [ D2 [ : : : [ Dn; Di : SPARQL endpoint PKU/2014-08-28 49
  • 265. SPARQL Endpoint Federation Consider only the SPARQL endpoints for query execution No data re-partitioning/re-distribution Consider D = D1 [ D2 [ : : : [ Dn; Di : SPARQL endpoint Alternatives SPARQL query decomposed Q = fQ1; : : : ;Qkg and executed over fD1; : : : ;Dng { DARQ, FedX [Schwarte et al., 2011], SPLENDID [Gorlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Partial query evaluation { Distributed gStore [Peng et al., 2014] PKU/2014-08-28 49
  • 266. SPARQL Endpoint Federation Consider only the SPARQL endpoints for query execution No data re-partitioning/re-distribution Consider D = D1 [ D2 [ : : : [ Dn; Di : SPARQL endpoint Alternatives SPARQL query decomposed Q = fQ1; : : : ;Qkg and executed over fD1; : : : ;Dng { DARQ, FedX [Schwarte et al., 2011], SPLENDID [Gorlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Partial query evaluation { Distributed gStore [Peng et al., 2014] Partial evaluation I Given function f (s; d) and part of its input s, perform f 's computation that only depends on s to get f 0(d) I Compute f 0(d) when d becomes available I Applied to, e.g., XML [Buneman et al., 2006] PKU/2014-08-28 49
  • 267. Distributed SPARQL Using Partial Query Evaluation Two steps: 1. Evaluate a query at each site to
  • 268. nd local matches Query is the function and each Di is the known input Inner match or local partial match D1 D2 D3 D4 PKU/2014-08-28 50
  • 269. Distributed SPARQL Using Partial Query Evaluation Two steps: 1. Evaluate a query at each site to
  • 270. nd local matches Query is the function and each Di is the known input Inner match or local partial match 2. Assemble the partial matches to get
  • 271. nal result Crossing match Centralized assembly Distributed assembly D1 D2 D3 D4 Crossing match PKU/2014-08-28 50
  • 272. Some Open Problems Handling data at non-SPARQL endpoint sites Modi
  • 273. cation to SPARQL endpoints (for partial query evaluation) Heterogeneous use of vocabularies (use of ontologies) PKU/2014-08-28 51
  • 274. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 52
  • 275. Live Query Processing Not all data resides at SPARQL endpoints Freshness of access to data important Potentially countably in
  • 276. nite data sources Live querying On-line execution Only rely on linked data principles Alternatives Traversal-based approaches Index-based approaches Hybrid approaches PKU/2014-08-28 53
  • 277. SPARQL Query Semantics in Live Querying Full-web semantics Scope of evaluating a SPARQL expression is all Linked Data Query result completeness cannot be guaranteed by any (terminating) execution PKU/2014-08-28 54
  • 278. SPARQL Query Semantics in Live Querying Full-web semantics Scope of evaluating a SPARQL expression is all Linked Data Query result completeness cannot be guaranteed by any (terminating) execution Reachability-based query semantics Query consists of a SPARQL expression, a set of seed URIs S, and a reachability condition c Scope: all data along paths of data links that satisfy the condition Computationally feasible PKU/2014-08-28 54
  • 279. Traversal Approaches Discover relevant URIs recursively by traversing (speci
  • 280. c) data links at query execution runtime [Hartig, 2013; Ladwig and Tran, 2011] Implements reachability-based query semantics Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result PKU/2014-08-28 55
  • 281. Traversal Approaches Discover relevant URIs recursively by traversing (speci
  • 282. c) data links at query execution runtime [Hartig, 2013; Ladwig and Tran, 2011] Implements reachability-based query semantics Advantages Easy to implement. No data structure to maintain. Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result PKU/2014-08-28 55
  • 283. Traversal Approaches Discover relevant URIs recursively by traversing (speci
  • 284. c) data links at query execution runtime [Hartig, 2013; Ladwig and Tran, 2011] Implements reachability-based query semantics Advantages Easy to implement. No data structure to maintain. Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result Disadvantages Possibilities for parallelized data retrieval are limited Repeated data retrieval introduces signi
  • 285. cant query latency. PKU/2014-08-28 55
  • 286. Traversal Optimization Dynamic query execution [Hartig and Ozsu, 2014] Data Retrieval ...lookup queue... Output PKU/2014-08-28 56
  • 287. Traversal Optimization Dynamic query execution [Hartig and Ozsu, 2014] Prioritization of URIs { a number of alternatives Non-adaptive Adaptive, Local processing aware Adaptive, Local processing agnostic Intermediate solution driven Solution-aware graph-based Hybrid graph-based Purely graph-based PKU/2014-08-28 56
  • 288. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Dierent index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Key: tp Entry: furi1; uri2; ; uring GET urii PKU/2014-08-28 57
  • 289. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Dierent index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Advantages Data retrieval can be fully parallelized Reduces the impact of data retrieval on query execution time Key: tp Entry: furi1; uri2; ; uring GET urii PKU/2014-08-28 57
  • 290. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Dierent index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Advantages Data retrieval can be fully parallelized Reduces the impact of data retrieval on query execution time Key: tp Entry: furi1; uri2; ; uring Disadvantages Querying can only start after index GET construction urii Depends on what has been selected for the index Freshness may be an issue Index maintenance PKU/2014-08-28 57
  • 291. Hybrid Approach Perform a traversal-based execution using a prioritized list of URIs to look up [Ladwig and Tran, 2010] Initial seed from the pre-populated index Non-seed URIs are ranked by a function based on information in the index New discovered URIs that are not in the index are ranked according to number of referring documents PKU/2014-08-28 58
  • 292. Some Open Problems Optimize queries by using statistics collected during earlier query executions Heterogeneous use of vocabularies (use of ontologies) Combine SPARQL federation to leverage SPARQL endpoint functionality PKU/2014-08-28 59
  • 293. Outline 1 LOD and RDF Introduction 2 Data Warehousing Approach Relational Approaches Graph-Based Approaches 3 SPARQL Federation Approach Distributed RDF Processing SPARQL Endpoint Federation 4 Live Querying Approach Traversal-based approaches Index-based approaches Hybrid approaches 5 Conclusions PKU/2014-08-28 60
  • 294. Conclusions RDF and Linked Object Data seem to have considerable promise for Web data management 2014 2011 PKU/2014-08-28 61
  • 295. Conclusions RDF and Linked Object Data seem to have considerable promise for Web data management More work needs to be done Query semantics Adaptive system design Optimizations { both in data warehousing and distributed environments Live querying requires signi
  • 296. cant thought to reduce latency 2014 2011 PKU/2014-08-28 61
  • 297. Conclusions What I did not talk about: Not much on general distributed/parallel processing Not much on SPARQL semantics Nothing about RDFS { no schema stu Nothing about entailment regimes 0 ) no reasoning PKU/2014-08-28 62
  • 298. Thank you! Research supported by PKU/2014-08-28 63
  • 299. References I Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. (2009). SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J., 18(2):385{406. Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. (2007). Scalable semantic web data management using vertical partitioning. In Proc. 33rd Int. Conf. on Very Large Data Bases, pages 411{422. Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. (1997). The Lorel query language for semistructured data. Int. J. Digit. Libr., 1(1):68{88. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., and Ruckhaus, E. (2011). ANAPSID: an adaptive query processing engine for SPARQL endpoints. In Proc. 10th Int. Semantic Web Conf., pages 18{34. Aluc, G., Hartig, O., Ozsu, M. T., and Daudjee, K. (2014). Diversi
  • 300. ed stress testing of RDF data management systems. In Proc. 13th Int. Semantic Web Conf. Forthcoming. Aluc, G., Ozsu, M. T., Daudjee, K., and Hartig, O. (2013). chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10, University of Waterloo. PKU/2014-08-28 64
  • 301. References II Arocena, G. and Mendelzon, A. (1998). Weboql: Restructuring documents, databases and webs. In Proc. 14th Int. Conf. on Data Engineering, pages 24{33. Atre, M., Chaoji, V., Zaki, M. J., and Hendler, J. A. (2010). Matrix bit loaded: A scalable lightweight join query processor for rdf data. In Proc. 19th Int. World Wide Web Conf., pages 41{50. Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., and Bhattacharjee, B. (2013). Building an ecient RDF store over a relational database. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 121{132. Buneman, P., Cong, G., Fan, W., and Kementsietsidis, A. (2006). Using partial evaluation in distributed query evaluation. In Proc. 32nd Int. Conf. on Very Large Data Bases, pages 211{222. Buneman, P., Davidson, S., Hillebrand, G. G., and Suciu, D. (1996). A query language and optimization techniques for unstructured data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 505{516. Fernandez, M., Florescu, D., and Levy, A. (1997). A query language for a web-site management system. ACM SIGMOD Rec., 26(3):4{11. PKU/2014-08-28 65
  • 302. References III Gorlitz, O. and Staab, S. (2011). SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In Proc. 2nd Int. Workshop on Consuming Linked Data. Gurajada, S., Seufert, S., Miliaraki, I., and Theobald, M. (2014). TriAD: A distributed shared-nothing RDF engine based on asynchronous message passing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 289{300. Hartig, O. (2012). SPARQL for a web of linked data: Semantics and computability. In Proc. 9th Extended Semantic Web Conf., pages 8{23. Hartig, O. (2013). SQUIN: a traversal based query execution system for the web of linked data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1081{1084. Hartig, O. and Ozsu, M. T. (2014). Optimizing response time of traversal-based query optimization. In preparation. Huang, J., Abadi, D. J., and Ren, K. (2011). Scalable SPARQL querying of large RDF graphs. Proc. VLDB Endowment, 4(11):1123{1134. PKU/2014-08-28 66
  • 303. References IV Husain, M. F., McGlothlin, J., Masud, M. M., Khan, L. R., and Thuraisingham, B. (2011). Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. and Data Eng., 23(9):1312{1327. Kaoudi, Z. and Manolescu, I. (2014). RDF in the clouds: A survey. VLDB J. Forthcoming. Konopnicki, D. and Shmueli, O. (1995). W3QS: A query system for the World Wide Web. In Proc. 21th Int. Conf. on Very Large Data Bases, pages 54{65. Ladwig, G. and Tran, T. (2010). Linked data query processing strategies. In Proc. 9th Int. Semantic Web Conf., pages 453{469. Ladwig, G. and Tran, T. (2011). SIHJoin: Querying remote and local linked data. In Proc. 8th Extended Semantic Web Conf., pages 139{153. Lakshmanan, L. V. S., Sadri, F., and Subramanian, I. N. (1996). A declarative language for querying and restructuring the Web. In Proc. 6th Int. Workshop on Research Issues on Data Eng., pages 12{21. Lee, K. and Liu, L. (2013). Scaling queries over big rdf graphs with semantic hash partitioning. Proc. VLDB Endowment, 6(14):1894{1905. PKU/2014-08-28 67
  • 304. References V Mendelzon, A. O., Mihaila, G. A., and Milo, T. (1997). Querying the World Wide Web. Int. J. Digit. Libr., 1(1):54{67. Neumann, T. and Weikum, G. (2008). RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endowment, 1(1):647{659. Neumann, T. and Weikum, G. (2009). The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91{113. Papakonstantinou, Y., Garcia-Molina, H., and Widom, J. (1995). Object exchange across heterogeneous information sources. In Proc. 11th Int. Conf. on Data Engineering, pages 251{260. Peng, P., Zou, L., Ozsu, M. T., Chen, L., and Zhao, D. (2014). Processing SPARQL queries over linked data { a distributed graph-based approach. In submitted for publication. Prasser, F., Kemper, A., and Kuhn, K. A. (2012). Ecient distributed query processing for autonomous rdf databases. In Proc. 15th Int. Conf. on Extending Database Technology, pages 372{383. Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011). Fedx: A federation layer for distributed query processing on linked open data. In Proc. 8th Extended Semantic Web Conf., pages 481{486. PKU/2014-08-28 68
  • 305. References VI Umbrich, J., Hose, K., Karnstedt, M., Harth, A., and Polleres, A. (2011). Comparing data summaries for processing live queries over linked data. World Wide Web J., 14(5-6):495{544. Weiss, C., Karras, P., and Bernstein, A. (2008). Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endowment, 1(1):1008{1019. Wilkinson, K. (2006). Jena property table implementation. Technical Report HPL-2006-140, HP Laboratories Palo Alto. Zhang, X., Chen, L., Tong, Y., and Wang, M. (2013). EAGRE: Towards scalable I/O ecient SPARQL query evaluation on the cloud. In Proc. 29th Int. Conf. on Data Engineering, pages 565{576. Zou, L., Mo, J., Chen, L., Ozsu, M. T., and Zhao, D. (2011). gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endowment, 4(8):482{493. Zou, L., Ozsu, M. T., Chen, L., Shen, X., Huang, R., and Zhao, D. (2014). gStore: A graph-based SPARQL query engine. VLDB J., 23(4):565{590. PKU/2014-08-28 69