Slides of my talk given at Sharif University of Technology, Tehran
Time: Tuesday, August 30, 2016, 15:00-17:00
Location: Kharazmi Hall, 4th Floor, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
more info: http://knowdiff.net/visiting-lecturer-program-254/#more-872
Towards a Linked Open Data Infrastructure for Science, Technology & Innovation Studies
1. Towards a Linked Open Data Infrastructure for Science, Technology & Innovation Studies
Ali Khalili, PhD
Department of Computer Science/Artificial Intelligence
Knowledge Representation & Reasoning Research Group
2. Outline
• Linked (Open) Data
• RISIS Project
• Semantically Mapping Science (SMS) Platform
• Workflow
• Use Cases
• Adaptive Functional Urban Areas (FUAs) to Study Innovative Activities
• Gendered Dimensions in Grant Selection
5. Linked (Open) Data
• A set of best practices for publishing data on the Web.
• Follows 4 simple principles:
• Use URIs as names (identifiers) for conceptual things.
• Use HTTP URIs so that users can look up (dereference) those names.
• When someone looks up a URI, provide useful information, using the open standards.
• Include links to other URIs, so that users can discover more things.
https://www.ted.com/talks/tim_berners_lee_on_the_next_web
11. 5 ★ Open Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
★★★★ use Linked Data format (URIs to identify things, RDF to represent data)
★★★★★ link your data to other people’s data to provide context
http://5stardata.info/
26. Linked Open Data: Examples
• Give me a list of capital cities in Europe with a population of more than 500,000
• Who are the mayors of central European towns located above 1000 m elevation?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
• …
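As a rough illustration of how the first question could be phrased against a Linked Data endpoint, the sketch below builds a SPARQL query string in Python. The vocabulary terms (dbo:capital, dbo:populationTotal, dbc:Countries_in_Europe) are assumptions about how DBpedia models this information, and the function name is hypothetical; the query is only constructed here, not sent anywhere.

```python
def capital_cities_query(min_population: int) -> str:
    """Build a SPARQL query for European capitals above a population threshold.

    Assumption: DBpedia-style terms (dbo:capital, dbo:populationTotal,
    dbc:Countries_in_Europe); adjust them to the dataset actually queried.
    """
    if min_population < 0:
        raise ValueError("min_population must be non-negative")
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?city ?population WHERE {{
  ?country dct:subject dbc:Countries_in_Europe ;
           dbo:capital ?city .
  ?city dbo:populationTotal ?population .
  FILTER (?population > {min_population})
}}
ORDER BY DESC(?population)
""".strip()


query = capital_cities_query(500_000)
print(query)
```

The threshold is the only parameter; the other example questions would need different graph patterns but follow the same shape.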
41. Functional Urban Areas (FUAs)
• defined by OECD in collaboration with EC/Eurostat
• consider factors beyond the predefined city boundaries to better reflect the economic geography of where people live and work
- population
- area
- GDP
- environment (CO2 emissions and air pollution)
- labour market (employment and unemployment growth)
- innovation (patent intensity)
- urban form and territorial organization
OECD Metropolitan eXplorer: http://measuringurban.oecd.org
51. Problem: how to map an address to its FUA?
• Address: Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam
• FUA (from the OECD FUAs List): Amsterdam (NL002)
- Geocode to LAU (municipality)
- Shapefiles for FUAs or LAUs?
52. Linked (Open) Data Lifecycle
• Interlinking
• Enrichment
• Quality Analysis
• Evolution
• Exploration
• Extraction
• Storage/Querying
• Authoring
http://stack.linkeddata.org/
55. Linked Open Data Lifecycle: Exploration
• Search
• Browse
• Visualize
56. Search for Linked Data
http://lov.okfn.org/
57. Search for Linked Data
http://schema.org/
http://bl.ocks.org/danbri/1c121ea8bd2189cf411c
58. Search for Linked Data
Data hub: http://datahub.io
search for data, register published datasets, create and manage groups of datasets…
59. Search for Linked Data
http://lotus.lodlaundromat.org
60. Search for Linked Data
Example:
• OpenStreetMap (OSM)
• Database of Global Administrative Areas (GADM)
• Flickr Shapefiles Dataset
• Published Shapefiles for Individual Countries
• Published Geospatial RDF Datasets
61. OpenStreetMap (OSM)
• https://www.openstreetmap.org
• built by a community of mappers that contribute and
maintain data about roads, trails, cafés, railway
stations, and much more, all over the world.
• Administrative Boundaries
• Level 1: super-national administrations e.g. European Union.
• Level 2: country borders based on the political entities listed on
the ISO 3166 standard.
• Level 3 to 11: subnational borders such as "state", "province", "region" and "district".
• Data Access
• Nominatim Web API for querying OSM
• The Overpass API for fetching specific OSM data
• Planet.osm Data (over 617GB uncompressed!)
62. OSM: Nominatim Web API
• a tool to search OSM data by name and address and to
generate synthetic addresses of OSM points (reverse
geocoding)
• Several companies provide hosted instances of the Nominatim query API, e.g. MapQuest Open Initiative, PickPoint or the OpenCage Geocoder
• API documentation
• Example usage:
• http://nominatim.openstreetmap.org/search.php?q=amsterdam&polygon=1&country=Netherlands&format=json&addressdetails=1
• MapQuest API
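The example request above can be assembled programmatically. The sketch below only builds the URL with the Python standard library, using exactly the parameters shown on the slide; the function name is hypothetical, no request is sent, and the current Nominatim documentation may name or combine these parameters differently.

```python
from urllib.parse import urlencode

# Endpoint as written on the slide.
NOMINATIM_SEARCH = "http://nominatim.openstreetmap.org/search.php"


def build_search_url(place: str, country: str) -> str:
    """Return a search URL with the query parameters used in the slide example."""
    params = {
        "q": place,             # free-text place name
        "polygon": 1,           # ask for the outline of the place
        "country": country,
        "format": "json",
        "addressdetails": 1,    # include a breakdown of the address
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


url = build_search_url("amsterdam", "Netherlands")
print(url)
```

Using urlencode keeps place names with spaces or non-ASCII characters correctly escaped.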
64. GADM (Global Administrative Areas)
• http://www.gadm.org
• GADM is developed by the University of California, Berkeley Museum of Vertebrate Zoology, the International Rice Research Institute and the University of California, Davis, with contributions from many others.
• uses other existing sources: http://www.gadm.org/links
• Administrative Boundaries
• Level 0: countries.
• Level 1 to 5: lower level subdivisions such as provinces, departments,
counties, etc. depending on the size and availability of data for the
underlying country.
• Data Access
• data is available globally and for each individual country, in different formats: GeoPackage, R SpatialPolygonsDataFrame, ESRI file geodatabase, Google Earth
65. Flickr geo-tagged pictures
• Data from 190M geo-tagged photos on Flickr
• new smartphones have not only a camera but also the ability to capture location information.
• all the geotagged photos associated with a particular place are plotted to generate a mostly accurate contour of that place (something more fine-grained than a bounding box!).
• Where On Earth (WOE) IDs
• correspond to the hierarchy of places where a photo was taken: from country (level 1), region (level 2), county (level 3) and locality (level 4) to neighborhood (level 5).
• for a given WOE entity, the approximate shape of that place is inferred.
• shapes in GeoJSON format
• view shapes at http://polymaps.org/ex/flickr.html
• download at http://www.flickr.com/services/shapefiles/2.0.1/
• more info: http://code.flickr.net/2012/10/24/2273/
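To make the idea of inferring a shape from photo locations concrete, here is a minimal sketch that computes a convex hull of (longitude, latitude) points with Andrew's monotone chain algorithm. The convex hull is a swapped-in simplification chosen for brevity: the slides do not say which method Flickr used, and a convex hull cannot follow concave outlines. The sample points are made up.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a left (counter-clockwise) turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical photo locations: four corners plus two points that are not corners.
photos = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
outline = convex_hull(photos)
print(outline)
```

Interior points and points lying on an edge are dropped, so the outline contains only the corner locations.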
66. Published Shapefiles for Individual Countries
• Local administrative offices or geo-related research centres might provide shapefiles specific to a country.
• E.g. for the Netherlands, shapefiles are provided by Centraal Bureau voor de Statistiek (CBS)
• Data collection needs to be done by a group of people in contact with the geo-related organizations in each country.
• Current status
68. Published Geospatial RDF Datasets
• http://linkedgeodata.org and http://geoknow.eu
• a large spatial knowledge base (>400 million geo elements) which has been derived from OpenStreetMap.
• provides unique URIs and has mappings to DBpedia.
• GeoVocab.org
• GADM-RDF: Global Administrative Areas
• NUTS-RDF: EU's Nomenclature of Territorial Units for Statistics
Outdated! No Shapefiles!
72. Linked Open Data Lifecycle: Extraction (DBpedia)
Persian DBpedia?
73. Persian DBpedia (mapping Wiki)
74. Extraction from Semi-structured sources
• Ad-hoc: DBpedia extraction framework
• Generic: OpenRefine
77. Extraction from Unstructured sources
NLP, Text mining, Annotation
…After leaving Apple, Jobs took a few of its members with him to found NeXT, a computer platform development company based in Redwood City, specializing in state-of-the-art computers for higher-education and business markets. In addition, Jobs helped to initiate the development of the visual effects industry when he funded the spinout of the computer graphics division of George Lucas's company Lucasfilm in 1986. The new company, Pixar, would eventually produce the first fully computer-animated film, Toy Story…
• Named Entity Recognition
• Relation Extraction (e.g. foundedBy)
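The entity recognition step can be illustrated with a very small dictionary (gazetteer) lookup. This is only a sketch of the annotation idea: the gazetteer entries and their URIs are illustrative assumptions, the function name is hypothetical, and real annotators also handle disambiguation and unseen names, which this does not attempt. Relation extraction is not covered.

```python
import re

# Illustrative surface forms mapped to assumed DBpedia-style URIs.
GAZETTEER = {
    "Apple": "http://dbpedia.org/resource/Apple_Inc.",
    "Jobs": "http://dbpedia.org/resource/Steve_Jobs",
    "NeXT": "http://dbpedia.org/resource/NeXT",
    "Pixar": "http://dbpedia.org/resource/Pixar",
    "Lucasfilm": "http://dbpedia.org/resource/Lucasfilm",
}


def annotate(text, gazetteer):
    """Return (start_offset, surface_form, uri) for each whole-word match."""
    matches = []
    for surface, uri in gazetteer.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), surface, uri))
    return sorted(matches)


text = "After leaving Apple, Jobs took a few of its members with him to found NeXT."
annotations = annotate(text, GAZETTEER)
for start, surface, uri in annotations:
    print(start, surface, uri)
```

Each annotation ties a span of text to a URI, which is the raw material a later relation extraction step would work on.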
78. Named Entity Recognition
http://spotlight.dbpedia.org
http://bioportal.bioontology.org/annotator
79. from Structured sources: Triplification
• Relational Database to RDF
R2RML: RDB to RDF Mapping Language
http://www.w3.org/TR/r2rml/
• D2R Server: Accessing databases with SPARQL &
as Linked Data
http://d2rq.org/
• Sparqlify
defining RDF views on relational databases
http://sparqlify.org/
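The basic idea behind triplification can be sketched by hand: each row becomes a subject URI and each column becomes a predicate. The code below is not R2RML, D2R Server or Sparqlify; it is a hand-rolled illustration using an in-memory SQLite table, with a hypothetical http://example.org/ namespace, a made-up table and a hypothetical helper name.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for generated URIs


def rows_to_triples(conn, table, key_column):
    """Map each row to triples: subject from the key, one predicate per column."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; NULLs yield no triple
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, f"<{BASE}{table}#{column}>", f'"{literal}"'))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO university VALUES (1, 'Vrije Universiteit Amsterdam', 'Amsterdam')")
triples = rows_to_triples(conn, "university", "id")
for s, p, o in triples:
    print(s, p, o, ".")
```

A mapping language such as R2RML expresses the same row-to-subject and column-to-predicate decisions declaratively instead of in code.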
80. DATA EXTRACTION & CONVERSION
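[Figure: data conversion pipeline. Input formats: OSM XML, PBF, ESRI shapes. Tools: osmosis, osmtogeojson, mapshaper, triplify. Intermediate format: GeoJSON. Also shown: mapping configurations and enrichment functions.]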
81. DATA EXTRACTION & CONVERSION
Metadata about different levels provided by OSM
http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative
83. Relational Databases vs. Triple Stores
• A relational database's (e.g. MySQL, PostgreSQL, Oracle) natural representation is a collection of interlinked tables.
• A triple store's (e.g. Sesame, AllegroGraph) or graph database's (e.g. Neo4j) natural representation is a multi-relational network, or graph.
* Triple Store: it is called a triple store because in RDF, facts are represented in the form of a triple (Subject-Predicate-Object).
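A toy in-memory version of this data model may help: facts are stored as (subject, predicate, object) tuples and queried with patterns in which any position can be left open. This is a minimal sketch with made-up ex: identifiers and a hypothetical match function, not how any particular triple store is implemented.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Made-up facts using a hypothetical ex: prefix.
triples = [
    ("ex:VU", "rdf:type", "ex:University"),
    ("ex:VU", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands"),
]

print(match(triples, s="ex:VU"))            # everything known about ex:VU
print(match(triples, p="ex:capitalOf"))     # every capital relation
```

New kinds of facts need no schema change: they are simply more tuples in the same collection.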
84. Existing Triple Stores
• Native triple stores
4Store, AllegroGraph, BigData, Jena TDB, Sesame,
Stardog, OWLIM and uRiKa
• RDBMS-backed triple stores
Jena SDB, IBM DB2 and OpenLink Virtuoso
• NoSQL triplestores
CumulusRDF
85. DATA STORAGE & QUERYING
Virtuoso Geo Spatial
Geometry as SMS internal representation for Geo-data in RDF
86. SPARQL – SQL for the Linked Data
What can be done with SPARQL that can't be done with SQL?
• SPARQL queries are considerably better aligned with users’ mental
models of a domain.
90. SPARQL – SQL for the Linked Data
• SPARQL allows the conceptual data model to be fully explored
through queries.
- example:workPhone rdfs:subPropertyOf example:phone
- example:cellPhone rdfs:subPropertyOf example:phone
- example:homePhone rdfs:subPropertyOf example:phone
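The three subproperty statements above mean that a question about example:phone can also be answered from the more specific properties. The sketch below imitates that inference in plain Python; the person, the phone numbers and the helper names are made up, and a real SPARQL engine would do this through entailment or a property path rather than this hand-written loop.

```python
# The subproperty statements from the slide, as child -> parent.
SUBPROPERTY_OF = {
    "example:workPhone": "example:phone",
    "example:cellPhone": "example:phone",
    "example:homePhone": "example:phone",
}


def is_subproperty(prop, target):
    """Follow rdfs:subPropertyOf links upward from prop until target is found."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)
        prop = SUBPROPERTY_OF.get(prop)
    return False


def values(triples, subject, prop):
    """All objects reachable via prop or any of its subproperties."""
    return sorted(o for s, p, o in triples if s == subject and is_subproperty(p, prop))


# Made-up data for illustration only.
data = [
    ("example:alice", "example:workPhone", "+31-20-0000001"),
    ("example:alice", "example:cellPhone", "+31-6-00000002"),
    ("example:alice", "example:email", "alice@example.org"),
]

print(values(data, "example:alice", "example:phone"))
```

Asking for example:phone returns both numbers even though no triple uses that property directly, while the email address is left out.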
91. SPARQL – SQL for the Linked Data
• Queries that have to traverse a chain of connections are
particularly complex in SQL while very simple in SPARQL.
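What such a chain query does can be shown on a handful of triples: start at one node and follow a sequence of predicates, which in SQL would typically take one join per step. The identifiers below are made up and the function is a hypothetical helper, not part of any SPARQL engine.

```python
def follow_chain(triples, start, predicates):
    """Follow a sequence of predicates from start; return the set of end nodes."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier


# Made-up facts using a hypothetical ex: prefix.
triples = [
    ("ex:alice", "ex:worksAt", "ex:VU"),
    ("ex:VU", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Amsterdam", "ex:partOf", "ex:Netherlands"),
    ("ex:bob", "ex:worksAt", "ex:Sharif"),
]

# "In which country does alice work?" as a three-step chain.
print(follow_chain(triples, "ex:alice", ["ex:worksAt", "ex:locatedIn", "ex:partOf"]))
```

In SPARQL 1.1 the same chain can be written as a single property path, ex:worksAt/ex:locatedIn/ex:partOf.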
93. SPARQL – SQL for the Linked Data
• In addition to SELECT, INSERT and DELETE, SPARQL supports
ASK queries.
• SPARQL includes syntax (i.e. SERVICE) to call two or more data
sources within a single query.
• …
94. SPARQL Query Interface
http://yasgui.org/
96. Interlinking
• The degree to which entities that represent the same
concepts are linked to each other.
• “Connecting things that are somehow related”
• Methods
• Automatic, Semi-automatic, Manual
• Universal, Domain-specific
<http://dbpedia.org/resource/VU_University_Amsterdam> owl:sameAs <https://www.wikidata.org/entity/Q1065414>
97. Interlinking Methods
• Ontology Matching
• establish links between ontologies underlying two
data sources.
• Instance Matching (Link Discovery)
• discover links between instances contained in two
data sources.
98. DATA LINKAGE
- Query on metadata about the
administrative boundaries
- Find the alignment between levels
in different datasets
99. DATA LINKAGE
- Use the possible mappings between datasets at different levels.
- Check the overlap of areas at similar levels and, for the matching areas, apply string matching to make sure that they refer to the same administrative boundary.
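The two checks described above (area overlap, then name matching) can be sketched as follows. Bounding boxes stand in for real boundary polygons to keep the example short, the similarity measure is Python's difflib ratio, and the thresholds, coordinates and function names are illustrative assumptions rather than the values or code used in the SMS platform.

```python
from difflib import SequenceMatcher


def box_area(box):
    """Area of a (min_x, min_y, max_x, max_y) bounding box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def bbox_overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min(box_area(a), box_area(b))
    return (width * height) / smaller if smaller > 0 else 0.0


def name_similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_boundary(area_a, area_b, min_overlap=0.8, min_similarity=0.8):
    """Link two areas only if they overlap enough and their names agree."""
    if bbox_overlap_ratio(area_a["bbox"], area_b["bbox"]) < min_overlap:
        return False
    return name_similarity(area_a["name"], area_b["name"]) >= min_similarity


# Illustrative records, not real boundary data.
osm_area = {"name": "Amsterdam", "bbox": (4.7, 52.3, 5.1, 52.4)}
gadm_area = {"name": "amsterdam", "bbox": (4.72, 52.31, 5.08, 52.43)}
other_area = {"name": "Rotterdam", "bbox": (4.3, 51.8, 4.6, 52.0)}

print(same_boundary(osm_area, gadm_area))
print(same_boundary(osm_area, other_area))
```

Checking geometry first keeps the string comparison from linking same-named places in different locations.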
109. Use Cases
(Research) and innovation subsidies for organizations and companies in the Netherlands
[Figure panels: People, Hybrid, OECD FUAs, Businesses]
112. Use Cases
Universities + Companies + Projects + Boundaries
Datasets: RVO-NL, DBpedia, Leiden-Ranking, ETER, OrgRef, Cordis, OpenStreetMap, GADM, Flickr, OECD FUAs, Grid, CBS-NL, Eurostat
• Properties of container administrative boundaries
• Collaboration between Universities and Companies
• Properties of Universities and Companies
113. Summary of the Use Case
Address → (geocode) → Coordinates → Administrative Boundaries → FUA
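The step from coordinates to a boundary is a point-in-polygon test. Below is a minimal sketch using the ray casting method; the polygon is a made-up rectangle standing in for the NL002 boundary, the point is only a rough (longitude, latitude) guess for the address used earlier, and the function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: toggle on every polygon edge crossed by a ray to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_fua(point, fua_polygons):
    """Return the code of the first FUA whose polygon contains the point."""
    for code, polygon in fua_polygons.items():
        if point_in_polygon(point, polygon):
            return code
    return None


# Made-up rectangle in (lon, lat); a real FUA boundary would come from shapefiles.
fua_polygons = {"NL002": [(4.6, 52.2), (5.2, 52.2), (5.2, 52.5), (4.6, 52.5)]}

print(find_fua((4.87, 52.33), fua_polygons))  # rough guess for the geocoded address
print(find_fua((6.0, 52.0), fua_polygons))
```

With real data the same lookup would run against LAU or FUA polygons, usually through a spatial index or the spatial functions of the triple store.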
116. References
• http://slidewiki.org/deck/11936_semantic-data-web-lecture-series
• Introduction to linked data and its lifecycle on the web
• http://euclid-project.eu/
• http://videolectures.net/wims2011_auer_interlinked/
• https://vimeo.com/76257120
• http://www.slideshare.net/slidarko/evolving-the-web-into-a-giant-global-database-3880018
• http://www.dataversity.net/introduction-to-triplestores/
• http://www.topquadrant.com/2014/05/05/comparing-sparql-with-sql/