The Linked Data tutorial I presented at Semtech 2012.
http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&proposalid=4724
Linked Data tutorial at Semtech 2012
1. June 4, 2012
Linked Data
Juan F. Sequeda – Daniel P. Miranker
Capsenta
Semantic Tech & Business Conference 2012
www.capsenta.com 1
2. Outline
Part 1: Introduction to Linked Data
Part 2: Linked Data Principles
Part 3: Linked Data Architectures
Part 4: Linked Enterprise Data
3. Part 1:
Introduction to
Linked Data
4. The Web is a Data Shredder
Structured Data → Unstructured Data
Thanks Martin Hepp
5. The Web of Documents
[Figure: Crawler, Search Engine, Search]
6. What would we like?
Make it easy for computers/software to find
THINGS
Do you SEARCH or do you
FIND?
7. Search for
Football Players who went to the University
of Texas at Austin, played for the Dallas
Cowboys as Cornerback
14. Guess how I FOUND out?
15. On a Semantic Web
Besides publishing documents on the web
which computers can’t understand easily
Let’s publish on the web something that
computers can understand
DATA
16. The Semantic Web is a
web of data
The current web is a
web of documents
17. But wait… doesn’t the
web already have data?
18. Current Data on the Web
Relational Databases
APIs
XML
CSV
XLS
…
Can’t computers and applications already
consume that data on the web?
19. Yes! But it is all in different
formats and data
models!
20. This makes it hard to
integrate data
21. The data in different
data sources aren’t linked
22. For example, how do I
state that the Juan
Sequeda on Facebook is
the same as the Juan
Sequeda on Twitter?
23. Or if I create a mashup
from different services, I
have to learn different
APIs and I get different
formats of data back
25. Wouldn’t it be great if we
had a standard way of
publishing data on the
Web?
26. We have a standardized
way of publishing
documents on the
web, right?
HTML
27. Then why can’t we have
a standard way of
publishing data on the
Web?
28. Good question! And the
answer is YES. There is!
RDF
29. Resource Description Framework
(RDF)
Data Model = a way to model data
e.g. relational databases use the relational data model
RDF is a graph data model
41. Databases back up documents
THINGS have PROPERTIES:
A book has a title, an author, …
Isbn | Title | Author | PublisherID | ReleaseDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …
PublisherID | PublisherName
1 | O’Reilly Media
… | …
This is a THING:
A book titled “Programming the Semantic Web” by Toby Segaran, …
42. Let’s represent the data in RDF
[Figure: the Book and Publisher tables from slide 41 next to the same data drawn as a graph]
book  title  “Programming the Semantic Web”
book  author  “Toby Segaran”
book  isbn  “978-0-596-15381-6”
book  publisher  Publisher
Publisher  name  “O’Reilly”
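A minimal Python sketch of the same step, assuming plain strings for the node and property names (URIs come in the next slides): each column value becomes a triple, and the PublisherID foreign key becomes an edge between two nodes. The names book_row, publisher_rows, and rows_to_triples are illustrative only.

```python
# Sketch: turn the Book and Publisher rows into (subject, predicate, object)
# triples, i.e. the labelled edges of a graph. The node names "book" and
# "publisher1" are illustrative placeholders, not URIs.

book_row = {
    "isbn": "978-0-596-15381-6",
    "title": "Programming the Semantic Web",
    "author": "Toby Segaran",
    "publisher_id": 1,
}
publisher_rows = {1: {"name": "O'Reilly Media"}}


def rows_to_triples(book, publishers):
    """Return the book row and its publisher as a set of triples."""
    book_node = "book"
    publisher_node = f"publisher{book['publisher_id']}"
    return {
        (book_node, "isbn", book["isbn"]),
        (book_node, "title", book["title"]),
        (book_node, "author", book["author"]),
        # The PublisherID foreign key becomes an edge between two nodes.
        (book_node, "publisher", publisher_node),
        (publisher_node, "name", publishers[book["publisher_id"]]["name"]),
    }


if __name__ == "__main__":
    for subject, predicate, obj in sorted(rows_to_triples(book_row, publisher_rows)):
        print(subject, predicate, obj)
```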
43. Remember that we are
on the web
Everything on the web is identified by
a URI
44. And now let’s link the data to other data
http://…/isbn978  title  “Programming the Semantic Web”
http://…/isbn978  author  “Toby Segaran”
http://…/isbn978  isbn  “978-0-596-15381-6”
http://…/isbn978  publisher  http://…/publisher1
http://…/publisher1  name  “O’Reilly”
45. And now consider the data from
Revyu.com
http://…/isbn978  hasReview  http://…/review1
http://…/review1  description  “Awesome Book”
http://…/review1  hasReviewer  http://…/reviewer
http://…/reviewer  name  “Juan Sequeda”
46. Let’s start to link data
[Figure: the Revyu.com review graph and the book graph, joined by an owl:sameAs link between their two http://…/isbn978 URIs]
47. Juan Sequeda publishes data too
http://juansequeda.com/id  livesIn  http://dbpedia.org/Austin
http://juansequeda.com/id  name  “Juan Sequeda”
48. Let’s link more data
[Figure: the Revyu.com review graph and Juan Sequeda’s own data, joined by a new link:]
http://…/reviewer  sameAs  http://juansequeda.com/id
49. And more
[Figure: the review graph, the book graph (with its publisher http://…/publisher1, name “O’Reilly”), and Juan Sequeda’s data combined into one graph through owl:sameAs links]
50. Data on the Web that is
in RDF and is linked to
other RDF data is
LINKED DATA
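To see why owl:sameAs links matter when combining data sets, here is a small Python sketch, assuming hypothetical example.org URIs in place of the elided http://…/ ones. It treats sameAs-linked URIs as a single node (a union-find over the links) and merges the two descriptions. This is one simple consolidation strategy, not how any particular triplestore handles owl:sameAs.

```python
# Sketch with hypothetical example.org URIs: owl:sameAs links let a consumer
# treat two URIs as one node and merge the descriptions published for each.

SAME_AS = "owl:sameAs"

reviews = {
    ("http://example.org/revyu/isbn978", "hasReview", "http://example.org/revyu/review1"),
    ("http://example.org/revyu/review1", "description", "Awesome Book"),
}
books = {
    ("http://example.org/books/isbn978", "title", "Programming the Semantic Web"),
    ("http://example.org/books/isbn978", "author", "Toby Segaran"),
}
links = {
    ("http://example.org/revyu/isbn978", SAME_AS, "http://example.org/books/isbn978"),
}


def canonical_map(triples):
    """Union-find over owl:sameAs links; returns a URI -> representative function."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                # Deterministic choice: the lexicographically smaller URI wins.
                parent[max(rs, ro)] = min(rs, ro)
    return find


def merge(*datasets):
    """Union the datasets and rewrite every term to its sameAs representative."""
    all_triples = set().union(*datasets)
    find = canonical_map(all_triples)
    return {(find(s), p, find(o)) for s, p, o in all_triples if p != SAME_AS}


if __name__ == "__main__":
    for triple in sorted(merge(reviews, books, links)):
        print(triple)
```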
51. Linked Data makes the
web appear as
ONE
GIANT
HUGE
GLOBAL
DATABASE!
52. I can query a database
with SQL. Is there a way
to query Linked Data with
a query language?
53. Yes! There is actually a
standardized language for
that:
SPARQL
54. FIND all the reviews on
the book “Programming
the Semantic Web” by
people who live in Austin
56. SELECT ?review ?comment
WHERE {
  isbn:978 ex:hasReview ?review .
  ?review ex:description ?comment .
  ?review ex:hasReviewer ?person .
  ?person ex:livesIn dbpedia:Austin .
}
[Figure: the combined review, book, and person graph from slide 49]
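A rough Python sketch of what a SPARQL engine does with the WHERE clause: it searches for variable bindings that make every triple pattern match some triple in the data. Terms are treated as opaque strings, the data is a hypothetical hand-written subset of the graph using its livesIn predicate, and it assumes the reviewer’s owl:sameAs link has already been resolved so the reviewer node carries ex:livesIn directly. A real engine would need reasoning or an extra pattern for that step.

```python
# Sketch of SPARQL basic graph pattern matching over a set of triples.
# "?x" marks a variable; everything else must match exactly.
# Assumption: sameAs is pre-resolved, so ex:reviewer has ex:livesIn itself.

DATA = {
    ("isbn:978", "ex:hasReview", "ex:review1"),
    ("ex:review1", "ex:description", "Awesome Book"),
    ("ex:review1", "ex:hasReviewer", "ex:reviewer"),
    ("ex:reviewer", "ex:livesIn", "dbpedia:Austin"),
    ("isbn:978", "ex:title", "Programming the Semantic Web"),
}

QUERY = [
    ("isbn:978", "ex:hasReview", "?review"),
    ("?review", "ex:description", "?comment"),
    ("?review", "ex:hasReviewer", "?person"),
    ("?person", "ex:livesIn", "dbpedia:Austin"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, data, binding=None):
    """Yield every binding that satisfies all patterns (depth-first join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from evaluate(rest, data, extended)


def select(variables, patterns, data):
    """Project the solutions onto the SELECT variables."""
    return [tuple(b[v] for v in variables) for b in evaluate(patterns, data)]


if __name__ == "__main__":
    print(select(["?review", "?comment"], QUERY, DATA))
```

Dropping the livesIn triple makes the query return no rows, which is exactly why the link from the reviewer to Juan’s own data is needed.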
57. This looks cool, but let’s
be realistic. What is the
incentive to publish
Linked Data on the Web?
58. What was your incentive
to publish an HTML page
in 1990?
59. 1) Share data in documents
2) Because your neighbor was doing it
… later on …
3) Marketing, Advertising, …, SEO
60. So why should we publish
Linked Data in 2012?
61. 1) Share data as data
2) Because your neighbor is doing it
… later on …
3) Marketing, Advertising, …, SEO
62. Linked Data Publishers
US and UK Government
BBC
NY Times
Best Buy
Sears
Kmart
Overstock
… too many more to name
75. September 2011
Linking Open Data
cloud diagram, by
Richard Cyganiak and
Anja Jentzsch.
http://lod-cloud.net/
76. YOU GET THE PICTURE
IT’S BIG and getting
BIGGER and
BIGGER
77. Part 2:
Linked Data Principles
78. Linked Data is a set of best practices to
publish and interlink data on the web
79. Linked Data Principles
1. Use URIs as names for
things
2. Use HTTP URIs so that
people can look up
(dereference) those
names.
3. When someone looks up a
URI, provide useful
information.
4. Include links to other URIs
so that they can discover
more things.
80. 1. Use URIs as names for things
81. 1) Use URIs as names for things
http://juansequeda.com/foaf.rdf#me  http://xmlns.com/foaf/0.1/based_near  http://dbpedia.org/resource/Austin,_Texas
http://juansequeda.com/foaf.rdf#me  http://xmlns.com/foaf/0.1/knows  http://www.w3.org/People/Berners-Lee/card#i
82. 2. Use HTTP URIs so that people
can look up (dereference)
those names.
83. 2) Use HTTP URIs
An HTTP client can look up the URI using the HTTP
protocol and retrieve a description
http://dbpedia.org/resource/Austin,_Texas
91. Identifies the abstract concept of “the city of Austin, Texas”:
http://dbpedia.org/resource/Austin,_Texas
Accept: text/html → http://dbpedia.org/page/Austin,_Texas
Identifies an HTML document that describes “the city of Austin, Texas”
Accept: application/rdf+xml → http://dbpedia.org/data/Austin,_Texas.xml
Identifies an RDF document that describes “the city of Austin, Texas”
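The negotiation step can be sketched in Python, assuming a simplified Accept parser (media types ranked by q-value, no wildcard handling) and the two document URLs from the slide. The 303 status follows the 303-redirect pattern described in the W3C Cool URIs note; this is an illustration, not DBpedia’s actual server code.

```python
# Sketch: the URI of the city is never served directly; the Accept header
# decides which document URL the client is redirected to.

RESOURCE = "http://dbpedia.org/resource/Austin,_Texas"
DOCUMENTS = {
    "text/html": "http://dbpedia.org/page/Austin,_Texas",
    "application/rdf+xml": "http://dbpedia.org/data/Austin,_Texas.xml",
}


def parse_accept(header):
    """Return media types from an Accept header, highest q-value first
    (ties keep the order in which they were listed)."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        quality = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        ranked.append((-quality, position, fields[0]))
    return [media for _, _, media in sorted(ranked)]


def negotiate(accept_header, default="text/html"):
    """Return (status, Location) for a GET on RESOURCE."""
    for media_type in parse_accept(accept_header):
        if media_type in DOCUMENTS:
            return 303, DOCUMENTS[media_type]
    return 303, DOCUMENTS[default]


if __name__ == "__main__":
    print(negotiate("application/rdf+xml"))
    print(negotiate("text/html;q=0.5, application/rdf+xml"))
```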
92. Minting HTTP URIs
If you own the domain name and run a web
server at that location, mint URIs in this
namespace
I own the domain capsenta.com
I run the web server at http://capsenta.com
I can mint URIs in this namespace
http://capsenta.com/person/Juan-Sequeda
93. Cool URIs http://www.w3.org/TR/cooluris/
Don’t misuse a namespace that you don’t own
http://www.imdb.com/title
Avoid implementation details
http://capsenta.com/person.php?id=123&format=rdf
Use Natural Keys
http://capsenta.com/person/123
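A small Python sketch of these rules, assuming a hypothetical mint_uri helper and an OWNED_DOMAINS set: it refuses namespaces you don’t control and builds a readable path from a key instead of exposing a script name or query string.

```python
import re
from urllib.parse import quote, urlparse

# Hypothetical: the domains we control (capsenta.com, as on the slide).
OWNED_DOMAINS = {"capsenta.com"}


def mint_uri(base, key):
    """Mint a URI under `base` from a readable key, with no file extension,
    query string, or other implementation detail in the path."""
    host = urlparse(base).hostname
    if host not in OWNED_DOMAINS:
        raise ValueError(f"not minting URIs in a namespace we don't own: {host}")
    slug = re.sub(r"\s+", "-", key.strip())  # "Juan Sequeda" -> "Juan-Sequeda"
    return base.rstrip("/") + "/" + quote(slug)


if __name__ == "__main__":
    print(mint_uri("http://capsenta.com/person", "Juan Sequeda"))
```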
94. 3. When someone looks up a
URI, provide useful
information.
95. 3) Provide useful information
How do we provide useful information in
document form on the web? HTML
How do we provide useful information in data
form on the web? RDF
96. What to publish?
Literal Triples
<http://dbpedia.org/resource/Austin,_Texas>
<http://xmlns.com/foaf/0.1/name>
“City of Austin”
Outgoing Link Triples
<http://dbpedia.org/resource/Austin,_Texas>
<http://www.w3.org/2002/07/owl#sameAs>
<http://rdf.freebase.com/ns/m/0vzm>
Incoming Link Triples
<http://dbpedia.org/resource/Dakota_Johnson>
<http://dbpedia.org/ontology/birthPlace>
<http://dbpedia.org/resource/Austin,_Texas>
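One way to read the three kinds of triples is by where the described resource appears. This Python sketch, assuming a simple Literal wrapper to tell literal values apart from URIs, groups the triples above relative to the Austin URI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a literal value so it is not mistaken for a URI."""
    value: str


AUSTIN = "http://dbpedia.org/resource/Austin,_Texas"

TRIPLES = [
    (AUSTIN, "http://xmlns.com/foaf/0.1/name", Literal("City of Austin")),
    (AUSTIN, "http://www.w3.org/2002/07/owl#sameAs", "http://rdf.freebase.com/ns/m/0vzm"),
    ("http://dbpedia.org/resource/Dakota_Johnson",
     "http://dbpedia.org/ontology/birthPlace", AUSTIN),
]


def describe(resource, triples):
    """Group triples into literal, outgoing-link, and incoming-link triples."""
    groups = {"literal": [], "outgoing": [], "incoming": []}
    for s, p, o in triples:
        if s == resource and isinstance(o, Literal):
            groups["literal"].append((s, p, o))
        elif s == resource:
            groups["outgoing"].append((s, p, o))
        elif o == resource:
            groups["incoming"].append((s, p, o))
    return groups


if __name__ == "__main__":
    for kind, found in describe(AUSTIN, TRIPLES).items():
        print(kind, len(found))
```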
97. What to publish?
Description of the data set
Semantic Sitemaps
voiD (Vocabulary of Interlinked Datasets)
Provenance Metadata
Licensing Information
98. Vocabularies (or Schemas or
Ontologies)
Create your own using
RDFS / OWL / SKOS
Reuse vocabularies
Dublin Core: metadata attributes
Friend of a Friend (FOAF): persons and relationships
Semantically Interlinked Online Communities (SIOC): describing
users, posts, blogs, etc
Description of a Project (DOAP)
Music Ontology
Programmes Ontology: TV and radio programs
Good Relations: describing products and services
Review Vocabulary
Basic Geo (WGS84) Vocabulary
99. 4. Include links to other URIs so
that they can discover more
things.
100. 4) Include links to other things
Set external RDF links into other data sources on
the Web
Subject of the triple is in the namespace of one data
set
Object of the triple is a URI in the namespace of
another data set
Connect siloed data islands
Enable discovery
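The subject/object namespace test above is easy to state in code. This Python sketch assumes a hypothetical table of dataset namespaces and flags a triple as an external link only when its subject and object fall in two different datasets.

```python
# Hypothetical mapping from dataset name to the namespace its URIs live in.
DATASETS = {
    "juansequeda.com": "http://juansequeda.com/",
    "DBpedia": "http://dbpedia.org/resource/",
    "Freebase": "http://rdf.freebase.com/ns/",
}


def dataset_of(uri):
    """Return the name of the dataset whose namespace contains uri, or None."""
    for name, namespace in DATASETS.items():
        if uri.startswith(namespace):
            return name
    return None


def is_external_link(triple):
    """True when subject and object belong to two different known datasets."""
    subject, _, obj = triple
    source, target = dataset_of(subject), dataset_of(obj)
    return source is not None and target is not None and source != target


TRIPLES = [
    ("http://juansequeda.com/foaf.rdf#me",
     "http://xmlns.com/foaf/0.1/based_near",
     "http://dbpedia.org/resource/Austin,_Texas"),
    ("http://dbpedia.org/resource/Austin,_Texas",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://rdf.freebase.com/ns/m/0vzm"),
    # Both ends in DBpedia: a link inside one data set, not between data sets.
    ("http://dbpedia.org/resource/Dakota_Johnson",
     "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Austin,_Texas"),
]

if __name__ == "__main__":
    for triple in TRIPLES:
        print(is_external_link(triple), triple[1])
```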
101. 4) Include links to other things
Relationship Link Triples
<http://juansequeda.com/foaf.rdf#me>
<http://xmlns.com/foaf/0.1/based_near>
<http://dbpedia.org/resource/Austin,_Texas>
Identity Link Triples
<http://dbpedia.org/resource/Austin,_Texas>
<http://www.w3.org/2002/07/owl#sameAs>
<http://rdf.freebase.com/ns/m/0vzm>
Vocabulary Link Triples
<http://capsenta.com/vocab/name>
<http://www.w3.org/2002/07/owl#equivalentProperty>
<http://xmlns.com/foaf/0.1/name>
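Vocabulary links pay off on the consumer side. Assuming only direct, one-hop owl:equivalentProperty statements need handling (no full OWL reasoning), this Python sketch rewrites predicates into a preferred vocabulary, so a query written against foaf:name also finds data published with http://capsenta.com/vocab/name. The person URI reuses the example minted on slide 92; the name value is illustrative.

```python
EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
FOAF = "http://xmlns.com/foaf/0.1/"

DATA = [
    # Vocabulary link triple from the slide.
    ("http://capsenta.com/vocab/name", EQUIVALENT_PROPERTY, FOAF + "name"),
    # Illustrative instance data that uses the local property.
    ("http://capsenta.com/person/Juan-Sequeda", "http://capsenta.com/vocab/name", "Juan Sequeda"),
]


def rewrite_predicates(triples, preferred_ns):
    """Replace each predicate that has a one-hop owl:equivalentProperty link
    to a term in preferred_ns with that term; drop the link triples themselves."""
    mapping = {}
    for s, p, o in triples:
        if p == EQUIVALENT_PROPERTY:
            # equivalentProperty is symmetric, so accept the link in either direction.
            if o.startswith(preferred_ns):
                mapping[s] = o
            elif s.startswith(preferred_ns):
                mapping[o] = s
    return [(s, mapping.get(p, p), o) for s, p, o in triples if p != EQUIVALENT_PROPERTY]


if __name__ == "__main__":
    print(rewrite_predicates(DATA, FOAF))
```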
102. Which linking predicate
should you choose?
Depends on your domain
Is it widely used?
owl:sameAs
foaf:knows
foaf:based_near
…
If you create your own, relate it to a widely
used predicate
103. Part 3:
Linked Data
Architectures
104. Static RDF Files
Small amount of data (personal FOAF file)
Use RDF/XML serialization
Save as .rdf file and upload it to your server
http://www.capsenta.com/company.rdf
http://www.capsenta.com/company.rdf#this
Configure MIME types
AddType application/rdf+xml .rdf
Make RDF discoverable from HTML
<link rel="alternate" type="application/rdf+xml" href="company.rdf">
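For a small static file, the Python standard library is enough to generate RDF/XML. This sketch builds a one-resource document with xml.etree.ElementTree; the FOAF properties and the values passed in are illustrative assumptions, and the printed output is what would be saved as company.rdf and served with the MIME type configured above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Use the conventional rdf: and foaf: prefixes in the serialized output.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def company_rdfxml(subject, name, homepage):
    """Serialize one resource with a literal name and a homepage link as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    ET.SubElement(description, f"{{{FOAF}}}name").text = name
    ET.SubElement(description, f"{{{FOAF}}}homepage", {f"{{{RDF}}}resource": homepage})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values; save this output as company.rdf on the web server.
    print(company_rdfxml("http://www.capsenta.com/company.rdf#this",
                         "Capsenta", "http://www.capsenta.com/"))
```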
105. RDF in HTML (RDFa)
Another syntax for RDF
Useful if you have template HTML pages
Drupal 7 will do this out of the box
107. RDB2RDF
Upcoming W3C RDB2RDF Standards
R2RML: mapping language
Direct Mapping: default automatic mapping
Two Approaches
Dynamic (SPARQL to SQL)
ETL (Dump RDB to RDF)
Ultrawrap
Supports W3C standard and more
SPARQL as fast as SQL
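The idea behind a direct mapping fits in a few lines of Python: each row becomes a resource and each column a property. The IRI pattern (Table/key=value for rows, Table#column for properties) loosely follows the W3C Direct Mapping; this simplified version assumes a single-column primary key, skips NULLs, and omits the foreign-key reference triples the specification also generates. The base URI is hypothetical.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(base, table, pk, rows):
    """Map the rows of one table to triples (simplified; no foreign-key links)."""
    table_iri = base + quote(table)
    triples = []
    for row in rows:
        subject = f"{table_iri}/{quote(pk)}={quote(str(row[pk]))}"
        # Each row is typed with a class named after its table.
        triples.append((subject, RDF_TYPE, table_iri))
        for column, value in row.items():
            if value is None:
                continue  # NULL values produce no triple
            triples.append((subject, f"{table_iri}#{quote(column)}", value))
    return triples


if __name__ == "__main__":
    books = [{"Isbn": "978-0-596-15381-6", "Title": "Programming the Semantic Web",
              "Author": "Toby Segaran", "PublisherID": 1, "ReleaseDate": "July 2009"}]
    for triple in direct_map("http://capsenta.com/db/", "Book", "Isbn", books):
        print(triple)
```

The ETL approach materializes these triples into a triplestore; the dynamic approach instead translates SPARQL over this virtual graph into SQL on the fly.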
108. Unstructured to RDF
[Figure: Unstructured data → Entity Extractor → Triplestore]
109. Semi-structured to RDF
[Figure: Semi-structured data → XML2RDF, XLS2RDF, CSV2RDF → Triplestore]
110. RDB to RDF
[Figure: Relational Database → CMS with RDFa, Semantic Wiki, or RDB2RDF (SPARQL to SQL); Relational Database → RDB2RDF ETL → Triplestore]
111. Creating Linked Data
[Figure: creating Linked Data, organized in layers (Type of Data, Data Preparation, Data Storage, Linked Data Publication). Unstructured data goes through an Entity Extractor and semi-structured data through XML2RDF, XLS2RDF, or CSV2RDF into a Triplestore, published via a Linked Data Web Server Interface. Structured data in an RDB is published via RDB2RDF (i.e. Ultrawrap), a CMS with RDFa, or a Semantic Wiki; a data source with an API is published via a Custom Linked Data Wrapper.]
Thanks Heath and Bizer
112. Consuming Linked Data
[Figure: layered architecture, from top to bottom: Application; Schema Mapping, Record Linkage, Provenance Tracking; Data Access; Linked Data; Creating Linked Data]
114. Record Linkage
Different URIs that identify the same thing
Create owl:sameAs links between them
Manual lookup: Sindice
(Semi-)automatic: SILK
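To illustrate the automatic side, here is a naive Python sketch that compares labels with difflib’s SequenceMatcher and emits owl:sameAs links above a threshold. The URIs, labels, and the 0.9 threshold are hypothetical, and this is not how SILK works internally. It also shows why generated links need review: two different people can share a name.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def link_records(source, target, threshold=0.9):
    """Compare every pair of (uri -> label) entries and emit owl:sameAs links
    for pairs whose label similarity reaches the threshold."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


SOURCE = {"http://juansequeda.com/id": "Juan Sequeda"}
TARGET = {  # hypothetical Revyu-style reviewer URIs and labels
    "http://example.org/revyu/reviewer": "juan  sequeda",
    "http://example.org/revyu/reviewer2": "Toby Segaran",
}

if __name__ == "__main__":
    print(link_records(SOURCE, TARGET))
```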
115. Provenance
Keep track of where the data comes from
Quality
Trust
Named Graphs
SPARQL GRAPH clause
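Named graphs can be pictured as triples tagged with a fourth element saying where they came from. This Python sketch assumes hypothetical example.org graph names; restricting the graph in its match method plays a role similar to SPARQL’s GRAPH clause.

```python
from collections import defaultdict


class QuadStore:
    """Triples grouped by graph name: the graph records where each triple came from."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load(self, graph, triples):
        self.graphs[graph].update(triples)

    def match(self, s=None, p=None, o=None, graph=None):
        """Yield (s, p, o, graph) quads; None acts as a wildcard, and passing
        a graph restricts the search to that source only."""
        names = [graph] if graph is not None else list(self.graphs)
        for name in names:
            for triple in self.graphs.get(name, set()):
                if all(want is None or want == have for want, have in zip((s, p, o), triple)):
                    yield (*triple, name)


STORE = QuadStore()
STORE.load("http://example.org/graphs/revyu",
           {("http://example.org/revyu/review1", "description", "Awesome Book")})
STORE.load("http://example.org/graphs/juansequeda",
           {("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin")})

if __name__ == "__main__":
    # Which source says where Juan lives?
    print(list(STORE.match(p="livesIn")))
```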
116. Centralized
[Figure: Application → SPARQL → Triplestore, loaded from Creating Linked Data]
117. Centralized
Advantages
Include the datasets that you need
Complex queries and high performance
Reasoning
Drawbacks
Depends on RDF dumps or crawling
Effort to set up the centralized triplestore
Queried data may be out of date
119. Federated
Advantages
Include the datasets that you need
Queried data is up to date
Drawbacks
Requires the existence of a SPARQL endpoint
Effort to set up the federator
120. Linked Traversal
[Figure: Application → SPARQL → Linked Traversal Query Engine → Linked Data, published from a Relational Database via RDB2RDF and from a Triplestore]
121. Linked Traversal
Advantages
No need to know the data sources in advance
Does not depend on the existence of SPARQL
endpoints or RDF dumps
Queried data is up to date
Drawbacks
Query execution time is slow
Unsuitable for some queries
Results may be incomplete
Still a research topic
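The core loop of link traversal can be sketched in Python, assuming an in-memory dictionary stands in for dereferencing URIs over HTTP (the example.org URIs are hypothetical). A lookup budget bounds the crawl, which is one reason results may be incomplete.

```python
from collections import deque

# Hypothetical stand-in for the Web: dereferencing a URI returns the triples
# published at that URI. A real engine would issue HTTP GET requests.
WEB = {
    "http://example.org/revyu/review1": [
        ("http://example.org/revyu/review1", "hasReviewer", "http://example.org/revyu/reviewer"),
    ],
    "http://example.org/revyu/reviewer": [
        ("http://example.org/revyu/reviewer", "owl:sameAs", "http://juansequeda.com/id"),
    ],
    "http://juansequeda.com/id": [
        ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
        ("http://juansequeda.com/id", "name", "Juan Sequeda"),
    ],
}


def dereference(uri):
    """Unknown or unreachable URIs yield nothing."""
    return WEB.get(uri, [])


def traverse(seed, max_lookups=10):
    """Breadth-first: dereference URIs, collect triples, follow newly seen URIs."""
    seen, queue, triples = {seed}, deque([seed]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            for term in (s, o):
                if term.startswith("http") and term not in seen:
                    seen.add(term)
                    queue.append(term)
    return triples


if __name__ == "__main__":
    for triple in traverse("http://example.org/revyu/review1"):
        print(triple)
```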
122. Applications
Linked Data Browsers
http://browse.semanticweb.org/
Linked Data (Semantic Web) Search Engines
Falcons, SWSE, VisiNav, Sindice, Sigma, Swoogle, Watson
Search Engines
Google, Bing, Yahoo!
Faceted Browsers
http://dbpedia.neofonie.de/browse/
123. Domain Specific Applications
BBC World Cup
Seevl.net
Linked Life Data
Government apps
124. Part 4:
Linked Enterprise Data
125. Use
Linked Data Principles
internally
Consume
Linked (Open) Data
Publish
Linked (Open) Data
126. Linked Enterprise Data
Linked Data can be used as an architectural
style for integrating data in the Enterprise
1. Standard Data Access Mechanism: HTTP
2. Standard Address & Identifier Scheme: URI
3. Standard Data Model: RDF
127. Linked Enterprise Data
From information creation to information sharing
Produce and consume data specific to your
needs, but also produce it in a way that it can
be connected to other data in the enterprise
Distributed but connected!
Data that you create may benefit others!
Share it!
128. Benefits of RDF/Linked Data
RDF (graphs) is a least common denominator
Text, CSV, XML, XLS, RDB to RDF
Imagine modeling a social network in XML
Dynamic and Flexible
Adding a column to a table in my RDBMS takes 6
months to authorize!
With RDF, simply add the triple!
Incremental
129. Benefits of RDF/Linked Data
Power of the URI and Links
Universal Identifier
Create a “foreign key” to a table that I have no
control of
Scalability in months, not only seconds
“More can be done with less and faster”
“Cooperation without coordination”
130. What’s next?
W3C Linked Data Platform Working Group
http://www.w3.org/2012/ldp/charter
Linked Data Basic Profile 1.0
http://www.w3.org/Submission/ldbp/
132. Linked Data Checklist
Does your data link to other data sets?
Do you provide provenance metadata?
Do you provide licensing metadata?
Do you reuse common vocabularies?
Do you map proprietary vocabulary terms to
common vocabularies?
Do you provide other access methods?
Thanks Heath & Bizer
133. Acknowledgements
RiBS Lab – UT Austin
Olaf Hartig – Humboldt University Berlin
Patrick Sinclair – BBC
Jamie Taylor – Google
Tom Heath & Chris Bizer. Linked Data: Evolving the
Web into a Global Data Space
David Wood (Ed.). Linking Enterprise Data
134. Thanks!
Juan F. Sequeda Daniel P. Miranker
juan@capsenta.com miranker@capsenta.com
@juansequeda
www.capsenta.com