In this tutorial we present the life cycle of linked geospatial data, focusing on two important steps: publishing geospatial data as RDF graphs and interlinking those graphs with each other. Given the proliferation of geospatial information on the Web, many kinds of geospatial data are now becoming available as linked datasets (e.g., Google and Bing maps, user-generated geospatial content, public sector information published as open data etc.). The topic of the tutorial is related to all core research areas of the Semantic Web (e.g., semantic information extraction, transformation of data into RDF graphs, interlinking linked data etc.), since existing core techniques often need to be reconsidered when dealing with geospatial information. Thus, it is timely to train Semantic Web researchers, especially those in the early stages of their careers, on the state of the art in this area and to invite them to contribute to it.
In this tutorial we give a comprehensive background on data models, query languages and implemented systems for linked geospatial data, and we discuss recent approaches to publishing and interlinking geospatial data. The tutorial is complemented by a hands-on session that will familiarize the audience with state-of-the-art tools for publishing and interlinking geospatial information.
http://event.cwi.nl/eswc2015-geo/
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
1. Publishing and Interlinking
Linked Geospatial Data
Tutorial in Conjunction with the
12th Extended Semantic Web Conference
2. Tutorial organization
9:00-9:15 Introduction
9:15-10:30 Background in geospatial data modeling, representing
geospatial information in the Semantic Web, and querying linked
geospatial data.
10:30-11:00 coffee break
11:00-12:00 Publishing geospatial information as RDF graphs
12:00-12:30 Discovering Spatial and Temporal Links among RDF graphs
12:30-14:00 Lunch break
14:00-14:30 Discovering Spatial and Temporal Links among RDF graphs
14:30-15:30 Hands-on session: Publishing geospatial information as RDF
graphs
15:30-16:00 coffee break
16:00-17:00 Hands-on session: Discovering Spatial and Temporal Links
among RDF graphs
17:00-17:10 Conclusions
3. Part 1:
Background in geospatial data
modeling
ESWC 2015 Tutorial
Publishing and Interlinking Linked Geospatial Data
Dept. of Informatics and Telecommunications
National and Kapodistrian University of Athens
4. ESWC 2015 Tutorial 2
Outline
• Basic GIS concepts and terminology
• Representing geometries
• Representing topological information
• Geospatial data standards
5. ESWC 2015 Tutorial 3
Basic GIS Concepts and
Terminology
• Theme: the information corresponding to a particular domain
that we want to model. A theme is a set of geographic
features.
• Example: the countries of Europe
6. ESWC 2015 Tutorial 4
Basic GIS Concepts (cont’d)
• Geographic feature or geographic object: a domain entity
that can have various attributes that describe spatial and non-
spatial characteristics.
• Example: the country Greece with attributes
• Population
• Flag
• Capital
• Geographical area
• Coastline
• Bordering countries
7. ESWC 2015 Tutorial 5
Basic GIS Concepts (cont’d)
• Geographic features can be atomic or complex.
• Example: According to the Kallikratis administrative reform of
2010, Greece consists of:
• 13 regions (e.g., Crete)
• Each region consists of regional units (e.g., Heraklion)
• Each regional unit consists of municipalities (e.g.,
Dimos Chersonisou)
• …
8. ESWC 2015 Tutorial 6
Basic GIS Concepts (cont’d)
• The spatial characteristics of a feature can involve:
• Geometric information (location in the underlying
geographic space, shape etc.)
• Topological information (containment, adjacency etc.).
Municipalities of the regional unit of
Heraklion:
1. Dimos Irakliou
2. Dimos Archanon-Asterousion
3. Dimos Viannou
4. Dimos Gortynas
5. Dimos Maleviziou
6. Dimos Minoa Pediadas
7. Dimos Festou
8. Dimos Chersonisou
9. ESWC 2015 Tutorial 7
Geometric Information
• Geometric information can be captured by using geometric primitives
(points, lines, polygons, etc.) to approximate the spatial attributes of
the real world feature that we want to model.
• Geometries are associated with a coordinate reference system which
describes the coordinate space in which the geometry is defined.
10. ESWC 2015 Tutorial 8
Encoding Geometries: Vector
Representation
• In this encoding objects in space are represented using points as
primitives as follows:
• A point is represented by a tuple of coordinates.
• A line segment is represented by a pair with its beginning
and ending point.
• More complex objects such as arbitrary lines, curves,
surfaces etc. are built recursively by the basic primitives
using constructs such as lists, sets etc.
• This is the approach used in all GIS and other popular
systems today. It has also been standardized by various
international bodies.
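A minimal Python sketch of the vector representation described above. The class names Segment, LineString and Polygon are illustrative choices, not taken from any standard; the area method uses the shoelace formula only to show that derived quantities follow from the point lists.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A point is a tuple of coordinates (2D here for simplicity).
Point = Tuple[float, float]


@dataclass(frozen=True)
class Segment:
    """A line segment: a pair with its beginning and ending point."""
    start: Point
    end: Point


@dataclass(frozen=True)
class LineString:
    """An arbitrary line, built from a list of points."""
    points: List[Point]

    def segments(self) -> List[Segment]:
        # Consecutive points define the segments of the line.
        return [Segment(a, b) for a, b in zip(self.points, self.points[1:])]


@dataclass(frozen=True)
class Polygon:
    """A polygon given by a closed ring (first point equals last point)."""
    ring: List[Point]

    def area(self) -> float:
        # Shoelace formula over consecutive ring vertices.
        total = 0.0
        for (x1, y1), (x2, y2) in zip(self.ring, self.ring[1:]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0
```

For example, Polygon([(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]).area() evaluates the area of a square with side 2.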
12. ESWC 2015 Tutorial 10
Encoding Geometries: Constraint
Representation
• In this case objects in space are represented by quantifier free
formulas in a constraint language (e.g., linear constraints).
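[Formula and figure not recoverable from the transcript: a Boolean combination of linear constraints over x and y that defines a planar region.]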
13. ESWC 2015 Tutorial 11
Constraint Databases
• The constraint representation of spatial data was the focus of
much work in databases, logic programming and AI after the
paper by Kanellakis, Kuper and Revesz (PODS, 1991).
• The approach was very fruitful theoretically but was not adopted
in practice.
14. ESWC 2015 Tutorial 12
Topological Information
• Topological information is inherently qualitative and it is
expressed in terms of topological relations (e.g., containment,
adjacency, overlap etc.).
• Topological information can be derived from geometric
information or it might be captured by asserting explicitly the
topological relations between features.
15. ESWC 2015 Tutorial 13
Topological Relations
• The study of topological relations has produced
a lot of interesting results by researchers in:
• GIS
• Spatial databases
• Artificial Intelligence (qualitative reasoning
and knowledge representation)
16. ESWC 2015 Tutorial 14
DE-9IM
• The dimensionally extended 9-intersection model
(DE-9IM) of Clementini and Di Felice.
• It is based on the point-set topology of R².
• It deals with simple, closed and connected
geometries (areas, lines, points).
• It is an extension of earlier approaches: the 4-
intersection (4IM) and 9-intersection (9IM)
models by Egenhofer and colleagues.
17. ESWC 2015 Tutorial 15
Topological Relations in DE-9IM
• It captures topological relationships between two
geometries a and b in R² by considering the
dimensions of the intersections of the
boundaries, interiors and exteriors of the two
geometries:
• The dimension can be 2, 1, 0 and -1 (dimension of
the empty set).
18. ESWC 2015 Tutorial 16
Example
I(C) B(C) E(C)
I(A) -1 -1 2
B(A) -1 -1 1
E(A) 2 1 2
A
C
19. ESWC 2015 Tutorial 17
Topological Relations in DE-9IM
• The following five named relationships between two different
geometries can be distinguished: disjoint, touches, crosses,
within and overlaps.
• The named relationships have a reasonably intuitive meaning
for users. They are jointly exhaustive and pairwise disjoint
(JEPD).
• The model can also be defined using an appropriate calculus of
geometries that uses these 5 binary relations and boundary
operators.
20. ESWC 2015 Tutorial 18
Example: A disjoint C
I(C) B(C) E(C)
I(A) F F *
B(A) F F *
E(A) * * *
A
C
Notation:
• T = { 0, 1, 2 }
• F = -1
• * = don’t care = { -1, 0, 1, 2 }
21. ESWC 2015 Tutorial 19
Example: A within C
I(C) B(C) E(C)
I(A) T * F
B(A) * * F
E(A) * * *
C
A
Notation equivalent to 3x3
matrix:
• String of 9 characters
representing the above matrix in
row major order.
• In this case: T*F**F***
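The pattern notation above can be checked mechanically. The following Python sketch, with hypothetical helper names matrix_to_string and matches, encodes a DE-9IM matrix in row-major order and tests it against a pattern using the T, F and * conventions from the slides. It is an illustration of the notation, not the implementation used by any particular system.

```python
def matrix_to_string(rows):
    """Encode a 3x3 matrix of dimensions (-1, 0, 1, 2) in row-major order.

    The empty set (dimension -1) is written as 'F'.
    """
    return "".join("F" if d == -1 else str(d) for row in rows for d in row)


def matches(matrix: str, pattern: str) -> bool:
    """Test a 9-character DE-9IM string against a 9-character pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("both arguments must have exactly 9 characters")
    for m, p in zip(matrix, pattern):
        if p == "*":            # don't care
            continue
        if p == "T":            # any non-empty intersection: 0, 1 or 2
            if m not in "012":
                return False
        elif p != m:            # 'F' or an explicit dimension must match
            return False
    return True


# Matrix of the earlier example with the two regions A and C.
A_C = matrix_to_string([[-1, -1, 2], [-1, -1, 1], [2, 1, 2]])

DISJOINT = "FF*FF****"   # pattern of "A disjoint C"
WITHIN = "T*F**F***"     # pattern of "A within C"
```

With these definitions, matches(A_C, DISJOINT) holds and matches(A_C, WITHIN) does not, in agreement with the example matrices.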
23. ESWC 2015 Tutorial 21
The Region Connection Calculus (RCC)
• The primitives of the calculus are spatial regions. These are
non-empty, regular closed subsets of a topological space.
• The calculus is based on a single binary predicate C that
formalizes the “connectedness” relation.
• C(a,b) is true when the closure of a is connected to the
closure of b i.e., they have at least one point in common.
• It is axiomatized using first order logic.
24. ESWC 2015 Tutorial 22
RCC-8
• This is a set of eight JEPD binary relations that can
be defined in terms of predicate C.
25. ESWC 2015 Tutorial 23
RCC-5
• The RCC-5 subset has also been studied. The
granularity here is coarser. The boundary of a region is
not taken into consideration:
• No distinction between DC and EC; both are called DR.
• No distinction between TPP and NTPP; both are called
PP.
• RCC-8 and RCC-5 relations can also be defined
using point-set topology, and there are very close
connections to the models of Egenhofer and others.
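The coarsening from RCC-8 to RCC-5 can be written as a simple lookup. In the Python sketch below, DC, EC, TPP, NTPP, DR and PP come from the slide; the remaining names (PO, EQ, TPPi, NTPPi, PPi) are the usual RCC relation names and are filled in as an assumption, since the slide listing them is not part of this transcript.

```python
# RCC-8 relation -> RCC-5 relation (boundaries are ignored in RCC-5).
RCC8_TO_RCC5 = {
    "DC": "DR",      # disconnected          -> discrete
    "EC": "DR",      # externally connected  -> discrete
    "PO": "PO",      # partially overlapping
    "EQ": "EQ",      # equal
    "TPP": "PP",     # tangential proper part      -> proper part
    "NTPP": "PP",    # non-tangential proper part  -> proper part
    "TPPi": "PPi",   # inverse of TPP              -> inverse proper part
    "NTPPi": "PPi",  # inverse of NTPP             -> inverse proper part
}


def coarsen(rcc8_relation: str) -> str:
    """Return the RCC-5 relation that subsumes the given RCC-8 relation."""
    return RCC8_TO_RCC5[rcc8_relation]
```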
26. ESWC 2015 Tutorial 24
More Qualitative Spatial Relations
• Orientation/Cardinal directions (left of, right of,
north of, south of, northeast of etc.)
• Distance (close to, far from etc.). This information
can also be quantitative.
27. ESWC 2015 Tutorial 25
Coordinate Systems
• Coordinate: one of n scalar values that determines the position
of a point in an n-dimensional space.
• Coordinate system: a set of mathematical rules for specifying
how coordinates are to be assigned to points.
• Example: the Cartesian coordinate system
28. ESWC 2015 Tutorial 26
Coordinate Reference Systems
• Coordinate reference system: a coordinate system
that is related to an object (e.g., the Earth, a planar
projection of the Earth, a three dimensional
mathematical space such as R3) through a datum
which specifies its origin, scale, and orientation.
• The term spatial reference system is also used.
29. ESWC 2015 Tutorial 27
Geographic Coordinate Reference Systems
• These are 3-dimensional coordinate systems that utilize latitude
(φ), longitude (λ) , and optionally geodetic height (i.e.,
elevation), to capture geographic locations on Earth.
30. ESWC 2015 Tutorial 28
The World Geodetic System
• The World Geodetic System (WGS) is the best-known
geographic coordinate reference system and its latest revision is
WGS84.
• Applications: cartography, geodesy, navigation (GPS), etc.
31. ESWC 2015 Tutorial 29
Projected Coordinate Reference
Systems
• Projected coordinate reference systems: they transform the
3-dimensional approximation of the Earth into a 2-dimensional
surface (distortions!)
• Example: the Universal Transverse Mercator (UTM) system
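As a small worked example for UTM: the system divides the Earth into 60 longitude zones of 6 degrees each, numbered from 180°W eastwards. The Python sketch below computes the standard zone number for a longitude; the function names are hypothetical, and the special zone boundaries around Norway and Svalbard are deliberately ignored.

```python
import math


def utm_zone(lon: float) -> int:
    """Return the standard UTM zone number (1..60) for a longitude in degrees.

    Simplification: the exceptional zones around Norway and Svalbard
    are not handled.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    if lon == 180.0:
        return 60  # the antimeridian is assigned to the last zone
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def central_meridian(zone: int) -> float:
    """Longitude (in degrees) of the central meridian of a UTM zone."""
    return zone * 6.0 - 183.0
```

For a longitude of about 23.7°E (roughly that of Athens) the function returns zone 34, whose central meridian is 21°E.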
32. ESWC 2015 Tutorial 30
Coordinate Reference Systems
(cont’d)
• There are well-known ways to translate between coordinate
reference systems.
• See the list of coordinate reference systems of the
European Petroleum Survey Group: http://www.epsg-registry.org/
33. ESWC 2015 Tutorial 31
Geospatial Data Standards
• The Open Geospatial Consortium (OGC) and the
International Organization for Standardization (ISO) have
developed many geospatial data standards that are in wide use
today. In this tutorial we will cover:
• Well-Known Text
• Geography Markup Language
• OpenGIS Simple Features Access
34. ESWC 2015 Tutorial 32
Well-Known Text (WKT)
• WKT is an OGC and ISO standard for representing geometries,
coordinate reference systems, and transformations between
coordinate reference systems.
• WKT is specified in OpenGIS Simple Feature Access - Part 1:
Common Architecture standard which is the same as the ISO 19125-1
standard. Download from
http://portal.opengeospatial.org/files/?artifact_id=25355 .
• This standard concentrates on simple features: features with all
spatial attributes described piecewise by a straight line or a
planar interpolation between sets of points.
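To make the WKT encoding of simple features concrete, here is a minimal Python sketch that serializes a point and a polygon with a single ring. The helper names point_wkt and polygon_wkt are hypothetical; only the POINT and POLYGON keywords and the coordinate layout come from the standard.

```python
def point_wkt(x, y) -> str:
    """Serialize a 2D point as WKT."""
    return f"POINT({x} {y})"


def polygon_wkt(ring) -> str:
    """Serialize a polygon with one exterior ring as WKT.

    The ring is closed automatically if the first and last
    points differ, because WKT rings must be closed.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"
```

For instance, polygon_wkt([(0, 0), (1, 0), (1, 1)]) closes the ring and yields a triangle.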
37. ESWC 2015 Tutorial 35
Geography Markup Language
(GML)
• GML is an XML-based encoding standard for the
representation of geospatial data.
• GML provides XML schemas for defining a variety of concepts:
geographic features, geometry, coordinate reference
systems, topology, time and units of measurement.
• GML profiles are subsets of GML that target particular
applications.
• Examples: Point Profile, GML Simple Features Profile etc.
40. ESWC 2015 Tutorial 38
OpenGIS Simple Features Access
• OGC has also specified a standard for the storage, retrieval,
query and update of sets of simple features using
relational DBMS and SQL.
• This standard is “OpenGIS Simple Feature Access - Part 2: SQL
Option” and it is the same as the ISO 19125-2 standard. Download from
http://portal.opengeospatial.org/files/?artifact_id=25354.
• Related standard: ISO 13249 SQL/MM - Part 3.
41. ESWC 2015 Tutorial 39
OpenGIS Simple Features Access
(cont’d)
• The standard covers two implementation options: (i) using only
the SQL predefined data types and (ii) using SQL with
geometry types.
• SQL with geometry types:
• We use the WKT geometry class hierarchy presented earlier
to define new geometric data types for SQL
• We define new SQL functions on those types.
42. ESWC 2015 Tutorial 40
SQL with Geometry Types -
Functions
• Functions that request or check properties of a geometry:
• ST_Dimension(A:Geometry):Integer
• ST_GeometryType(A:Geometry):Character Varying
• ST_AsText(A:Geometry): Character Large Object
• ST_AsBinary(A:Geometry): Binary Large Object
• ST_SRID(A:Geometry): Integer
• ST_IsEmpty(A:Geometry): Boolean
• ST_IsSimple(A:Geometry): Boolean
43. ESWC 2015 Tutorial 41
SQL with Geometry Types –
Functions (cont’d)
• Functions that test topological relations between two geometries
using the DE-9IM:
• ST_Equals(A:Geometry, B:Geometry):Boolean
• ST_Disjoint(A:Geometry, B:Geometry):Boolean
• ST_Intersects(A:Geometry, B:Geometry):Boolean
• ST_Touches(A:Geometry, B:Geometry):Boolean
• ST_Crosses(A:Geometry, B:Geometry):Boolean
• ST_Within(A:Geometry, B:Geometry):Boolean
• ST_Contains(A:Geometry, B:Geometry):Boolean
• ST_Overlaps(A:Geometry, B:Geometry):Boolean
• ST_Relate(A:Geometry, B:Geometry, Matrix: Char(9)):Boolean
44. ESWC 2015 Tutorial 42
DE-9IM Relation Definitions
• A equals B can also be
defined by the pattern
TFFFTFFFT.
• A intersects B is the
negation of A disjoint B
• A contains B is equivalent
to B within A
45. ESWC 2015 Tutorial 43
SQL with Geometry Types –
Functions (cont’d)
• Functions for constructing new geometries out of existing
ones:
• ST_Boundary(A:Geometry):Geometry
• ST_Envelope(A:Geometry):Geometry
• ST_Intersection(A:Geometry, B:Geometry):Geometry
• ST_Union(A:Geometry, B:Geometry):Geometry
• ST_Difference(A:Geometry, B:Geometry):Geometry
• ST_SymDifference(A:Geometry, B:Geometry):Geometry
• ST_Buffer(A:Geometry, distance:Double):Geometry
46. ESWC 2015 Tutorial 44
Geospatial Relational DBMS
• The OpenGIS Simple Features Access standard is used today
in all relational DBMS with a geospatial extension.
• The abstract data type mechanism of the DBMS allows
the representation of all kinds of geospatial data types
supported by the standard.
• The query language (SQL) offers the functions of the
standard for querying data of these types.
47. Readings
• The book Geographic Information Systems and Science is a nice introduction to GIS. See:
http://eu.wiley.com/WileyCDA/WileyTitle/productCd-EHEP001475.html
• The following papers present the DE-9IM model:
E. Clementini, P. Di Felice and P. van Oosterom. A Small Set of Formal Topological
Relationships Suitable for End-User Interaction. SSD 1993: 277-295.
http://link.springer.com/chapter/10.1007%2F3-540-56869-7_16
E. Clementini and P. Di Felice. A Comparison of Methods for Representing Topological
Relationships. Information Sciences 80 (1994), pp. 1-34.
http://www.sciencedirect.com/science/article/pii/106901159400033X
• The paper below surveys a lot of interesting results on the RCC calculus:
J. Renz and B. Nebel. Qualitative Spatial Reasoning using Constraint
Calculi. In: M. Aiello, I. Pratt-Hartmann and J. van Benthem (eds.),
Handbook of Spatial Logics, pp. 161–215, 2007, Springer.
http://users.cecs.anu.edu.au/~jrenz/papers/renz-nebel-los.pdf
• The two OGC standards mentioned in the slides.
48. Part 2:
Spatial and Temporal Data in RDF:
stRDF/stSPARQL and GeoSPARQL
ESWC 2015 Tutorial
Publishing and Interlinking Linked Geospatial Data
Dept. of Informatics and Telecommunications
National and Kapodistrian University of Athens
49. ESWC 2015 Tutorial 2
Common Approach
• The two proposals (stRDF/stSPARQL and
GeoSPARQL) offer constructs for:
o Developing ontologies for spatial
and temporal data.
o Encoding spatial and temporal
data that use these ontologies in
RDF.
o Extending SPARQL to query spatial
and temporal data.
51. ESWC 2015 Tutorial 4
The data model stRDF
An extension of RDF for the representation of
geospatial information that changes over
time.
Geospatial dimension:
Spatial data types are introduced.
Geospatial information is represented using
spatial literals of these datatypes.
OGC standards WKT and GML are used for
the serialization of spatial literals.
Temporal dimension (later)
Proposed independently and around the same time
as GeoSPARQL (starting with an ESWC 2010 paper
by Koubarakis and Kyzirakos).
[ Kyzirakos, Karpathiotakis
& Koubarakis 2012 ]
62. ESWC 2015 Tutorial 15
stSPARQL: Geospatial SPARQL 1.1
We define a SPARQL extension function for each function
defined in the OpenGIS Simple Features Access standard
Basic functions
Get a property of a geometry
xsd:int strdf:dimension(strdf:geometry A)
xsd:string strdf:geometryType(strdf:geometry A)
xsd:int strdf:srid(strdf:geometry A)
Get the desired representation of a geometry
xsd:string strdf:asText(strdf:geometry A)
xsd:string strdf:asGML(strdf:geometry A)
Test whether a certain condition holds
xsd:boolean strdf:isEmpty(strdf:geometry A)
xsd:boolean strdf:isSimple(strdf:geometry A)
63. ESWC 2015 Tutorial 16
stSPARQL: Geospatial SPARQL 1.1
Functions for testing topological spatial
relationships
OGC Simple Features Access
xsd:boolean strdf:equals(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:disjoint(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:intersects(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:touches(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:crosses(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:within(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:contains(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:overlaps(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:relate(strdf:geometry A, strdf:geometry B,
xsd:string intersectionPatternMatrix)
Egenhofer
RCC-8
64. ESWC 2015 Tutorial 17
stSPARQL: Geospatial SPARQL 1.1
Spatial analysis functions
Construct new geometric objects from existing geometric objects
strdf:geometry strdf:boundary(strdf:geometry A)
strdf:geometry strdf:envelope(strdf:geometry A)
strdf:geometry strdf:convexHull(strdf:geometry A)
strdf:geometry strdf:intersection(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:union(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:difference(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:symDifference(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:buffer(strdf:geometry A, xsd:double distance, xsd:anyURI units)
Spatial metric functions
xsd:float strdf:distance(strdf:geometry A, strdf:geometry B, xsd:anyURI units)
xsd:float strdf:area(strdf:geometry A)
Spatial aggregate functions
strdf:geometry strdf:union(set of strdf:geometry A)
strdf:geometry strdf:intersection(set of strdf:geometry A)
strdf:geometry strdf:extent(set of strdf:geometry A)
65. ESWC 2015 Tutorial 18
stSPARQL: Geospatial SPARQL 1.1
Select clause
Construction of new geometries (e.g., strdf:buffer(?geo, 0.1, uom:metre))
Spatial aggregate functions (e.g., strdf:union(?geo))
Metric functions (e.g., strdf:area(?geo))
Filter clause
Functions for testing topological spatial relationships between spatial terms (e.g.,
strdf:contains(?G1, strdf:union(?G2, ?G3)))
Numeric expressions involving spatial metric functions
(e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2)+1)
Boolean combinations
Having clause
Boolean expressions involving spatial aggregate functions and spatial metric
functions or functions testing for topological relationships between spatial terms
(e.g., strdf:area(strdf:union(?geo))>1)
66. ESWC 2015 Tutorial 19
stSPARQL: An example (1/3)
Return the names of local communities that have
been affected by fires. The FILTER uses the spatial function strdf:overlaps.
SELECT ?name
WHERE {
  ?comm rdf:type gag:LocalCommunity;
        gag:name ?name;
        gag:hasGeometry ?commGeo .
  ?ba rdf:type noa:BurntArea;
      noa:hasGeometry ?baGeo .
  FILTER(strdf:overlaps(?commGeo,?baGeo))
}
67. ESWC 2015 Tutorial 20
stSPARQL: An example (2/3)
Find all burnt forests near local communities. The FILTER uses the
spatial functions strdf:intersects and strdf:distance.
SELECT ?ba ?baGeom
WHERE {
  ?r rdf:type clc:Region;
     clc:hasGeometry ?rGeom;
     clc:hasCorineLandUse ?f.
  ?f rdfs:subClassOf clc:Forest.
  ?c rdf:type gag:LocalCommunity;
     gag:hasGeometry ?cGeom.
  ?ba rdf:type noa:BurntArea;
      noa:hasGeometry ?baGeom.
  FILTER( strdf:intersects(?rGeom,?baGeom) &&
          strdf:distance(?baGeom,?cGeom,uom:metre) < 200)
}
68. ESWC 2015 Tutorial 21
stSPARQL: An example (3/3)
Compute the parts of burnt areas that lie in
coniferous forests. Here strdf:intersection is a spatial function and
strdf:union is used as a spatial aggregate.
SELECT ?burntArea
       (strdf:intersection(?baGeom,
                           strdf:union(?fGeom))
        AS ?burntForest)
WHERE {
  ?burntArea rdf:type noa:BurntArea;
             noa:hasGeometry ?baGeom.
  ?forest rdf:type clc:Region;
          clc:hasLandCover clc:ConiferousForest;
          clc:hasGeometry ?fGeom.
  FILTER(strdf:intersects(?baGeom,?fGeom))
}
GROUP BY ?burntArea ?baGeom
69. ESWC 2015 Tutorial
Time dimensions in Linked Data
User-defined time: A time value (literal) with no special
semantics.
Valid time: The time when a fact (represented by a triple) is true
in the modeled reality.
Transaction time: The time when the triple is current in the
database.
70. ESWC 2015 Tutorial
The time dimension of stRDF: The valid
time of triples
The following extensions are introduced in stRDF:
• Timeline: the (discrete) value space of the datatype xsd:dateTime of
XML-Schema
• Two kinds of time primitives are supported: time instants and time periods.
• A time instant is an element of the time line.
• A time period is an expression of the form [B, E) or [B, E] or (B, E] or (B, E) where B and E
are time instants called the beginning and ending time of the period.
• The new datatype strdf:period is introduced.
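The four forms of time period listed above differ only in whether each endpoint belongs to the period. The following Python sketch models this with a hypothetical Period class; it illustrates the definitions and is not how strdf:period is implemented in any system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Period:
    """A time period with beginning B and ending E.

    The two flags select among [B, E), [B, E], (B, E] and (B, E).
    The default corresponds to [B, E).
    """
    begin: datetime
    end: datetime
    begin_closed: bool = True
    end_closed: bool = False

    def contains(self, instant: datetime) -> bool:
        """Return True if the time instant lies inside the period."""
        if self.begin_closed:
            after_begin = instant >= self.begin
        else:
            after_begin = instant > self.begin
        if self.end_closed:
            before_end = instant <= self.end
        else:
            before_end = instant < self.end
        return after_begin and before_end
```

With the default flags, the beginning instant belongs to the period and the ending instant does not.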
[Figure: datatype hierarchy with the nodes rdfs:Literal, strdf:geometry, strdf:WKT, strdf:GML and strdf:period]
71. ESWC 2015 Tutorial
The time dimension of stRDF (cont’d)
• Triples are extended to quads.
• A temporal triple (quad) is an expression of the form
s p o t.
where s p o. is an RDF triple and t is a time instant or time
period called the valid time of the triple.
• The temporal constants NOW and UC (“until changed”) are
introduced.
73. ESWC 2015 Tutorial 26
An example with valid time
Forest
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02, "UC")"^^strdf:period .
74. ESWC 2015 Tutorial 27
An example with valid time
Forest
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02, "UC")"^^strdf:period .
Burnt area
75. ESWC 2015 Tutorial 28
An example with valid time
Forest Burnt area
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02, "UC")"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02, "UC")"^^strdf:period .
76. ESWC 2015 Tutorial 29
An example with valid time
Forest Burnt area
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02, "UC")"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02,2007-08-25T11:00:00+02)"^^strdf:period .
77. ESWC 2015 Tutorial 30
An example with valid time
Forest Burnt area Agricultural area
clc:region1 clc:hasLandCover clc:AgriculturalArea
"[2009-08-25T11:00:00+02, "UC")"^^strdf:period .
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02,2009-08-25T11:00:00+02)"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02,2007-08-25T11:00:00+02)"^^strdf:period .
78. ESWC 2015 Tutorial
The time dimension of stSPARQL
The following extensions are introduced:
• Triple patterns are extended to quad patterns (the last component is a temporal
term: variable or constant)
• Temporal extension functions are introduced:
• Allen's temporal relations (e.g., strdf:after)
• Period constructors (e.g., strdf:period_intersect)
• Temporal aggregates (e.g., strdf:maximalPeriod)
79. ESWC 2015 Tutorial 32
Example Query
• Find the current land cover of all areas in the dataset
SELECT ?clc
WHERE {
  ?R rdf:type clc:Region .
  ?R clc:hasLandCover ?clc ?t1 .
  FILTER(strdf:during("NOW", ?t1))
}
Here "?R clc:hasLandCover ?clc ?t1" is a quad pattern, strdf:during is a
temporal extension function and "NOW" is a temporal constant.
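A rough Python sketch of what evaluating this query amounts to, run over the three temporal triples of the earlier valid-time example. Several things here are assumptions made for illustration: quads are plain tuples, all periods are of the form [B, E), the constant UC is modelled as None (no known end), the rdf:type pattern of the query is omitted, and NOW is passed in explicitly as an instant.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=2))  # the +02 offset used in the slides
UC = None                          # "until changed": no known ending time

# (subject, predicate, object, (begin, end)) with periods read as [begin, end)
QUADS = [
    ("clc:region1", "clc:hasLandCover", "clc:Forest",
     (datetime(2006, 8, 25, 11, tzinfo=TZ), datetime(2007, 8, 25, 11, tzinfo=TZ))),
    ("noa:ba1", "rdf:type", "noa:BurntArea",
     (datetime(2007, 8, 25, 11, tzinfo=TZ), datetime(2009, 8, 25, 11, tzinfo=TZ))),
    ("clc:region1", "clc:hasLandCover", "clc:AgriculturalArea",
     (datetime(2009, 8, 25, 11, tzinfo=TZ), UC)),
]


def during(instant, period):
    """True if the instant falls inside the period [begin, end)."""
    begin, end = period
    return begin <= instant and (end is UC or instant < end)


def land_cover_at(quads, instant):
    """Objects of clc:hasLandCover quads whose valid time contains the instant."""
    return [o for s, p, o, t in quads
            if p == "clc:hasLandCover" and during(instant, t)]
```

Calling land_cover_at(QUADS, now) with any instant after 25 August 2009 returns only clc:AgriculturalArea, which matches the intent of the query.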
81. ESWC 2015 Tutorial 34
GeoSPARQL
GeoSPARQL is an OGC standard.
Functionalities similar to stRDF/stSPARQL:
Geometries are represented using literals of spatial datatypes.
Literals are serialized using WKT and GML.
The same families of functions are offered for querying geometries.
Functionalities beyond stSPARQL:
High level ontologies inspired by GIS terminology.
Topological relations can now be asserted as well so that reasoning and querying on
them is possible.
A query rewriting mechanism.
Functionalities of stSPARQL that are not included in GeoSPARQL:
• Geospatial aggregate functions
• Temporal dimension
82. ESWC 2015 Tutorial
GeoSPARQL Components
• Core
• Topology Vocabulary Extension (parameter: relation family)
• Geometry Extension (parameters: serialization, version)
• Geometry Topology Extension (parameters: serialization, version, relation family)
• Query Rewrite Extension (parameters: serialization, version, relation family)
• RDFS Entailment Extension (parameters: serialization, version, relation family)
Parameters
• Serialization
• WKT
• GML
• Relation Family
• Simple Features
• RCC-8
• Egenhofer
83. ESWC 2015 Tutorial 36
GeoSPARQL Core
Defines two top level
classes that can be used to
organize geospatial data.
84. ESWC 2015 Tutorial 37
GeoSPARQL Geometry Extension
Provides vocabulary for asserting and querying data
about the geometric attributes of a feature.
88. ESWC 2015 Tutorial 41
Example Data
gag:Olympia
  rdf:type gag:MunicipalCommunity;
  gag:name "Ancient Olympia";
  gag:population "184"^^xsd:int;
  geo:hasGeometry ex:polygon1.
ex:polygon1
  rdf:type geo:Geometry;
  geo:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POLYGON((21.5 18.5,23.5 18.5,23.5 21,21.5 21,21.5 18.5))"^^geo:wktLiteral.
Annotations:
• geo:hasGeometry and geo:asWKT are properties from the Geometry extension.
• geo:Geometry is a class from the Geometry extension.
• geo:wktLiteral is a datatype from the Geometry extension.
• The quoted string is a geometry literal: an optional CRS URI in angle brackets followed by the WKT text.
89. ESWC 2015 Tutorial 42
Non-Topological Query Functions of the
Geometry Extension
The following non-topological query functions are also offered:
geof:distance
geof:buffer
geof:convexHull
geof:intersection
geof:union
geof:difference
geof:symDifference
geof:envelope
geof:boundary
90. ESWC 2015 Tutorial 43
GeoSPARQL Topology Vocabulary Extension
The extension is parameterized by the family of topological
relations supported.
Topological relations for simple features
The Egenhofer relations e.g., geo:ehMeet
The RCC-8 relations e.g., geo:rcc8ec
91. ESWC 2015 Tutorial
Greek Administrative Geography
gag:Olympia
  rdf:type gag:MunicipalCommunity;
  gag:name "Ancient Olympia".
gag:OlympiaMUnit
  rdf:type gag:MunicipalityUnit;
  gag:name "Municipality Unit of Ancient Olympia".
gag:OlympiaMunicipality
  rdf:type gag:Municipality;
  gag:name "Municipality of Ancient Olympia".
gag:Olympia geo:sfWithin gag:OlympiaMUnit .
gag:OlympiaMUnit geo:sfWithin gag:OlympiaMunicipality.
The last two triples assert the Simple Features topological relation geo:sfWithin.
92. ESWC 2015 Tutorial 45
GeoSPARQL: An example
SELECT ?m
WHERE {
?m rdf:type gag:MunicipalityUnit.
?m geo:sfContains gag:Olympia.
}
Find the municipality unit that contains
the community of Ancient Olympia
Simple Features
topological relation
Answer: ?m = gag:OlympiaMUnit
93. ESWC 2015 Tutorial 46
GeoSPARQL: An example
SELECT ?m
WHERE {
?m rdf:type gag:Municipality.
?m geo:sfContains gag:Olympia.
}
Find the municipality that contains the
community of Ancient Olympia
Answer?
94. ESWC 2015 Tutorial 47
Example (cont’d)
The answer to the previous query is
?m = gag:OlympiaMunicipality
GeoSPARQL does not tell you how to
compute this answer which needs
reasoning about the transitivity of
relation geo:sfContains.
Options:
• Use rules
• Use constraint-based techniques
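The "use rules" option can be sketched in a few lines of Python. The code below computes the transitive closure of the asserted geo:sfWithin triples from the Greek Administrative Geography example and reads geo:sfContains as the inverse of geo:sfWithin. Both rules are assumptions spelled out here for illustration; as noted above, GeoSPARQL itself does not prescribe them.

```python
DATA = [
    ("gag:Olympia", "rdf:type", "gag:MunicipalCommunity"),
    ("gag:OlympiaMUnit", "rdf:type", "gag:MunicipalityUnit"),
    ("gag:OlympiaMunicipality", "rdf:type", "gag:Municipality"),
    ("gag:Olympia", "geo:sfWithin", "gag:OlympiaMUnit"),
    ("gag:OlympiaMUnit", "geo:sfWithin", "gag:OlympiaMunicipality"),
]


def within_closure(triples):
    """Transitive closure of geo:sfWithin as a set of (inner, outer) pairs."""
    within = {(s, o) for s, p, o in triples if p == "geo:sfWithin"}
    changed = True
    while changed:
        changed = False
        for a, b in list(within):
            for c, d in list(within):
                # Rule: a within b and b within d implies a within d.
                if b == c and (a, d) not in within:
                    within.add((a, d))
                    changed = True
    return within


def containers(triples, feature, cls):
    """Instances of cls that contain the feature (sfContains = inverse sfWithin)."""
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return sorted(outer for inner, outer in within_closure(triples)
                  if inner == feature and (outer, cls) in types)
```

Under these rules, containers(DATA, "gag:Olympia", "gag:Municipality") finds gag:OlympiaMunicipality.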
96. ESWC 2015 Tutorial 49
Example Query
SELECT ?name
WHERE {
?comm rdf:type gag:LocalCommunity;
gag:name ?name;
geo:hasGeometry ?commGeo .
?ba rdf:type noa:BurntArea;
geo:hasGeometry ?baGeo .
FILTER(geof:sfOverlaps(?commGeo,?baGeo))
}
Geometry Topology
Extension Function
Return the names of local communities that have
been affected by fires
Geometry
Extension
Property
Geometry
Extension
Property
97. ESWC 2015 Tutorial 50
GeoSPARQL Query Rewrite Extension
Provides a collection of RIF rules that use topological extension
functions to establish the existence of topological predicates.
Example: given the RIF rule named geor:sfWithin, the
serializations of the geometries of gag:Athens and
gag:Greece named AthensWKT and GreeceWKT and the fact
that
geof:sfWithin(AthensWKT, GreeceWKT)
returns true from the computation of the two geometries, we can
derive the triple
gag:Athens geo:sfWithin gag:Greece
One possible implementation is to re-write a given SPARQL
query.
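The idea of deriving topological triples from geometries can be illustrated with a toy Python sketch. Everything concrete in it is an assumption: geometries are axis-aligned rectangles standing in for real WKT geometries, rect_within stands in for geof:sfWithin, and the coordinates are rough placeholders rather than real data.

```python
# Feature -> (min_x, min_y, max_x, max_y); placeholder boxes, not real geometries.
GEOMETRIES = {
    "gag:Athens": (23.6, 37.9, 23.8, 38.1),
    "gag:Greece": (19.3, 34.8, 29.7, 41.8),
}


def rect_within(a, b):
    """Stand-in for geof:sfWithin on rectangles: a lies inside b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return bx1 <= ax1 and by1 <= ay1 and ax2 <= bx2 and ay2 <= by2


def derive_within(geometries):
    """Derive geo:sfWithin triples from the geometries of the features."""
    return {
        (f, "geo:sfWithin", g)
        for f, geom_f in geometries.items()
        for g, geom_g in geometries.items()
        if f != g and rect_within(geom_f, geom_g)
    }
```

A query that asks for geo:sfWithin triples could then be answered from derive_within(GEOMETRIES) even though no such triple was asserted.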
99. ESWC 2015 Tutorial 52
Example
SELECT ?feature
WHERE {
?feature geo:sfWithin
geonames:OlympiaMunicipality.
}
Find all features that are inside the municipality of Ancient
Olympia
101. ESWC 2015 Tutorial 54
GeoSPARQL RDFS Entailment Extension
Specifies the RDFS entailments that follow from the class and
property hierarchies defined in the other components e.g., the
Geometry Extension.
Systems should use an implementation of RDFS entailment to
allow the derivation of new triples from those already in a graph.
102. ESWC 2015 Tutorial 55
Example
Given the triples
ex:f1 geo:hasGeometry ex:g1 .
geo:hasGeometry rdfs:domain geo:Feature.
we can infer the following triples:
ex:f1 rdf:type geo:Feature .
ex:f1 rdf:type geo:SpatialObject .
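The two inferences can be reproduced with the RDFS domain and subclass rules. The Python sketch below implements only these two rules over triples written as plain string tuples. Note that the second inferred triple also relies on the axiom geo:Feature rdfs:subClassOf geo:SpatialObject from the GeoSPARQL vocabulary, which is added to the input here.

```python
def rdfs_closure(triples):
    """Apply the RDFS domain rule and subclass rule until nothing new is derived."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            # Domain rule: (p rdfs:domain c) and (s p o) give (s rdf:type c).
            for p2, q, c in graph:
                if q == "rdfs:domain" and p2 == p:
                    new.add((s, "rdf:type", c))
            # Subclass rule: (s rdf:type c1) and (c1 rdfs:subClassOf c2)
            # give (s rdf:type c2).
            if p == "rdf:type":
                for c1, q, c2 in graph:
                    if q == "rdfs:subClassOf" and c1 == o:
                        new.add((s, "rdf:type", c2))
        if new <= graph:
            return graph
        graph |= new


GIVEN = [
    ("ex:f1", "geo:hasGeometry", "ex:g1"),
    ("geo:hasGeometry", "rdfs:domain", "geo:Feature"),
    # Axiom from the GeoSPARQL vocabulary, needed for the second inference.
    ("geo:Feature", "rdfs:subClassOf", "geo:SpatialObject"),
]
```

The closure of GIVEN contains both ex:f1 rdf:type geo:Feature and ex:f1 rdf:type geo:SpatialObject.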
103. ESWC 2015 Tutorial 56
Readings
• Material from the Strabon web site (http://strabon.di.uoa.gr).
• The following tutorial paper, which introduces the topic of linked geospatial data:
M. Koubarakis, M. Karpathiotakis, K. Kyzirakos, C. Nikolaou and M. Sioutis. Data Models and
Query Languages for Linked Geospatial Data. Reasoning Web Summer School 2012.
http://strabon.di.uoa.gr/files/survey.pdf
• The following paper which introduces stSPARQL and Strabon:
K. Kyzirakos, M. Karpathiotakis and M. Koubarakis. Strabon: A Semantic Geospatial DBMS.
11th International Semantic Web Conference (ISWC 2012). November 11-15, 2012. Boston,
USA.
http://iswc2012.semanticweb.org/sites/default/files/76490289.pdf
• The following paper which introduces the temporal features of stSPARQL and
Strabon:
K. Bereta, P. Smeros and M. Koubarakis. Representing and Querying the Valid Time of
Triples for Linked Geospatial Data. In the 10th Extended Semantic Web Conference (ESWC
2013). Montpellier, France. May 26-30, 2013.
http://www.strabon.di.uoa.gr/files/eswc2013.pdf
• The GeoSPARQL standard found at http://www.opengeospatial.org/standards/geosparql
104. ESWC 2015 Tutorial 57
Readings (cont’d)
• The following paper which introduces the RDFi framework:
Charalampos Nikolaou and Manolis Koubarakis. Incomplete Information in RDF.
In the 7th International Conference on Web Reasoning and Rule Systems (RR
2013). Mannheim, Germany. July 27-29, 2013.
http://cgi.di.uoa.gr/~koubarak/publications/rr2013.pdf
• The following paper which introduces the benchmark Geographica:
G. Garbis, K. Kyzirakos and M. Koubarakis. Geographica: A Benchmark for
Geospatial RDF Stores. In the 12th International Semantic Web Conference
(ISWC 2013). Sydney, Australia. October 21-25, 2013.
http://cgi.di.uoa.gr/~koubarak/publications/Geographica.pdf
106. Outline
Mapping relational data to RDF graphs
Mapping non-relational data to RDF graphs
Geospatial Extensions for mapping geospatial data
to RDF graphs
Implemented Systems
Demonstration
107. Mapping relational data to RDF graphs
Sitecode    Sitename     ReleaseDate   …
DE0916391   NTP S-H W    2011-01-27
DE1003301   DOGGERBANK   2011-01-27
[Figure: mapping the rows of this table to the class ProtectedArea]
Natura 2000 is an ecological network
designated under the Birds Directive and
the Habitats Directive which form the
cornerstone of the nature conservation
policy of the European Union.
http://ec.europa.eu/environment/nature/natura2000/index_en.htm
http://www.eea.europa.eu/data-and-maps/data/natura-6
108. Direct Mapping
W3C Recommendation from 2012
http://www.w3.org/TR/rdb-direct-mapping/
Relational tables are mapped to classes defined by
an RDF vocabulary.
Attributes of each table are mapped to RDF
properties that represent the relation between
subject and object resources.
Identifiers, class names, properties and instances
are generated automatically following the labels of
the input data.
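As a rough illustration of the Direct Mapping idea (a simplified sketch, not the full W3C algorithm), the Python snippet below turns each row into one rdf:type triple plus one triple per non-NULL column. The base IRI, the function name `direct_mapping` and the single-column primary key are assumptions made for this example; the row is taken from the Natura 2000 table shown earlier.

```python
from urllib.parse import quote

BASE = "http://foo.example/DB/"  # assumed base IRI, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(table, primary_key, rows):
    """Yield (subject, predicate, object) triples for each row of a table.

    Simplified: one single-column primary key, every value is treated as a
    plain string literal, and foreign keys are ignored.
    """
    for row in rows:
        key_value = quote(str(row[primary_key]))
        subject = f"{BASE}{quote(table)}/{quote(primary_key)}={key_value}"
        # The table name becomes the class of the row node.
        yield (subject, RDF_TYPE, f"{BASE}{quote(table)}")
        for column, value in row.items():
            if value is not None:  # NULL values produce no triple
                yield (subject, f"{BASE}{quote(table)}#{quote(column)}", str(value))


rows = [{"Sitecode": "DE0916391", "Sitename": "NTP S-H W", "ReleaseDate": "2011-01-27"}]
triples = list(direct_mapping("ProtectedArea", "Sitecode", rows))
```

Every identifier is derived from table and column labels, which is why the resulting vocabulary cannot be chosen by the user.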
110. The language R2RML
R2RML is a language for expressing customized
mappings from relational databases to RDF graphs
R2RML is a W3C Recommendation from 2012
http://www.w3.org/TR/r2rml/
R2RML mappings provide the user with the ability
to express the desired transformation of existing
relational data into the RDF data model, following a
structure and a target vocabulary that is chosen by
the user.
112. The language R2RML (cont’d)
A logical table can be
a relational table that is explicitly stored in the
database
an SQL view
an SQL select query
A triples map is a rule that defines how each
tuple of the logical table will be mapped to a set
of RDF triples. It consists of
a subject map
zero or more predicate-object maps.
113. The language R2RML (cont’d)
A subject map is a rule that defines how to
generate the URI that will be the subject of each
generated RDF triple.
A predicate-object map consists of predicate maps
and object maps.
A predicate map defines the RDF property to be
used to relate the subject and the object of the
generated triple.
An object map defines how to generate the object
of the triple which originates from the current row
of the logical table.
114. The language R2RML (cont’d)
Subject, predicate, object and graph maps are
term maps. A term map is a function that
generates an RDF term from a logical table.
Three types of term maps are defined:
constant-valued term maps
column-valued term maps
template-valued term maps
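The three kinds of term maps can be pictured as functions from a row to an RDF term. The following Python sketch is only an illustration under stated assumptions: the helper names (`constant_map`, `column_map`, `template_map`) and the base IRI are hypothetical, and the percent-encoding that R2RML applies to template values is omitted.

```python
import re

BASE = "http://foo.example/DB/"  # assumed base IRI, for illustration only


def constant_map(value):
    """Constant-valued term map: always returns the same term."""
    return lambda row: value


def column_map(column):
    """Column-valued term map: returns the value of one column of the row."""
    return lambda row: str(row[column])


def template_map(template):
    """Template-valued term map: fills {Column} placeholders from the row."""
    def generate(row):
        filled = re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)
        return BASE + filled
    return generate


subject = template_map("ProtectedArea/Sitecode={Sitecode}")
predicate = constant_map(BASE + "ProtectedArea#Sitename")
obj = column_map("Sitename")

row = {"Sitecode": "DE1003301", "Sitename": "DOGGERBANK"}
triple = (subject(row), predicate(row), obj(row))
```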
115. The language R2RML (cont’d)
A referencing object map allows using the
subjects of another triples map as the objects
generated by a predicate-object map.
Optionally, it has one or more join condition
properties.
[Figure: a PredicateObjectMap refers via rr:objectMap to a RefObjectMap, which refers via rr:parentTriplesMap to a TriplesMap and via rr:joinCondition (zero or more) to JoinConditions, each with an rr:child and an rr:parent column name.]
source: http://www.w3.org/TR/r2rml/#dfn-predicate-map
116. The language R2RML – Example
Table ProtectedArea:
Sitecode    Sitename     ReleaseDate   …
DE0916391   NTP S-H W    2011-01-27
DE1003301   DOGGERBANK   2011-01-27
Target: class ProtectedArea with a property Sitename of type xsd:string.
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@base <http://foo.example/DB/> .
<NaturaMapping>
    rr:logicalTable [ rr:tableName "ProtectedArea" ];
    rr:subjectMap [
        rr:template "ProtectedArea/Sitecode={Sitecode}";
        rr:class <ProtectedArea> ];
    rr:predicateObjectMap [
        rr:predicate <ProtectedArea#Sitename>;
        rr:objectMap [ rr:column "Sitename" ] ] .
<ProtectedArea/Sitecode=DE0916391> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE0916391> <ProtectedArea#Sitename> "NTP S-H W" .
<ProtectedArea/Sitecode=DE1003301> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE1003301> <ProtectedArea#Sitename> "DOGGERBANK" .
118. RDF Mapping Language (RML)
RML is a recently proposed mapping language that defines how to
map heterogeneous sources into RDF.
http://semweb.mmlab.be/rml/spec.html
RML is defined as a superset of the W3C-standard R2RML
R2RML                                   RML
Logical Table       rr:logicalTable     Logical Source          rml:logicalSource
Table Name          rr:tableName        URI                     rml:source
column              rr:column           reference               rml:reference
(SQL)                                   Reference Formulation   rml:referenceFormulation
per row iteration                       defined iterator        rml:iterator
source: http://semweb.mmlab.be/rml/RML_R2RML.html
120. RML extensions
A logical source refers to the input dataset that will
be converted to an RDF graph.
Each logical source has
a source property pointing to input data
a logical iterator that defines the iteration pattern over
the input data source
an optional reference formulation property that defines
the query language that may be used (e.g., SQL2008,
XPath, JSONPath)
An RML reference is a term map that refers to a
column name (SQL, CSV), an XML element or
attribute, or a JSON object.
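To make the roles of source, iterator and reference concrete, here is a toy Python sketch for a JSON logical source. It is not an RML processor: `iterate_source` and `reference` are hypothetical helpers, the iterator is a plain dotted path instead of full JSONPath, and the JSON document is invented for the example.

```python
import json


def iterate_source(document, iterator):
    """Toy stand-in for rml:iterator: follow a dotted path such as 'sites'.

    Real RML processors evaluate full JSONPath or XPath expressions.
    """
    node = json.loads(document)
    for key in filter(None, iterator.split(".")):
        node = node[key]
    return node


def reference(item, name):
    """Toy stand-in for rml:reference: look up one key of the current item."""
    return item[name]


# Assumed input data, playing the role of the file that rml:source points to.
doc = '{"sites": [{"Sitecode": "DE0916391", "Sitename": "NTP S-H W"}]}'
names = [reference(item, "Sitename") for item in iterate_source(doc, "sites")]
```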
122. Mapping geospatial data to RDF graphs
Geospatial data are available in formats such
as:
• ESRI shape files
• KML documents
• GeoJSON documents
• XML documents
Geospatial data may also be stored in
spatially-enabled relational databases.
123. Extending R2RML with transformation-valued
term maps
[Figure: the R2RML data model (TriplesMap with LogicalTable, SubjectMap, PredicateObjectMap, PredicateMap, ObjectMap, GraphMap, and RefObjectMap with a Join of Child and Parent; TermMap specialised into Constant, Column and Template), extended with a Function term map that has ArgumentMaps.]
124. Extending RML with transformation-valued
term maps
[Figure: the RML data model (TriplesMap with LogicalSource, which has a Source, an Iterator and a Reference Formulation; SubjectMap, PredicateObjectMap, PredicateMap, ObjectMap, GraphMap, and RefObjectMap with a Join of Child and Parent; TermMap specialised into Constant, Column and Template), extended with a Function term map that has ArgumentMaps.]
125. Transformation-valued term maps
A transformation-valued term map is a term map
that generates an RDF term by applying a SPARQL
extension function on one or more term maps.
A transformation-valued term map has
exactly one rrx:function property that defines a
SPARQL extension function that performs the desired
transformation
one rrx:argumentMap property that has as range an
rdf:List of term maps that define the arguments to
be passed to the transformation function
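Conceptually, a transformation-valued term map composes a function with a list of argument term maps. The Python sketch below illustrates only this composition; `transformation_map`, `column_map` and `point_as_wkt` are hypothetical names, and `point_as_wkt` merely stands in for a SPARQL extension function.

```python
def column_map(column):
    """Column-valued term map used as an argument map."""
    return lambda row: row[column]


def transformation_map(function, argument_maps):
    """Apply `function` to the terms generated by each argument map."""
    return lambda row: function(*(arg(row) for arg in argument_maps))


def point_as_wkt(lon, lat):
    """Hypothetical transformation function: build a WKT point."""
    return f"POINT({lon} {lat})"


# The list of argument maps plays the role of the rrx:argumentMap list.
wkt_map = transformation_map(point_as_wkt, [column_map("lon"), column_map("lat")])
literal = wkt_map({"lon": 13.58, "lat": 45.52})
```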
133. GeoTriples
Open Source software
Released under Mozilla Public Licence v2.0
Available at:
https://github.com/LinkedEOData/GeoTriples
Extends the D2RQ Platform
Extends the iMinds lab RML processor
Provides both a graphical user interface and a
command line interface
135. Automatic generation
of R2RML mappings (cont’d)
Generate two triples maps for each table that has a
geometry column.
Thematic triples map for the non-geometric information
Spatial triples map for the geometric information
The spatial triples map contains multiple
transformation functions over the input geometries
in order to generate a GeoSPARQL compliant
dataset.
[Figure: the thematic triples map NaturaArea is linked to the spatial triples map NaturaGeometry through geo:hasGeometry, using an rr:joinCondition.]
136. Automatic generation of RML mappings
for GML documents
Each geometric object is mapped to a geo:Geometry
instance
For each geometric object we generate a set of predicate
object maps that use the appropriate transformation
functions for producing a GeoSPARQL compliant dataset
Each simple element is mapped to a predicate object map
Each non simple element is mapped to a triples map
Appropriate mappings are generated for linking nested
elements
[Figure: XSD → Mapping Generator → RML mapping]
138. Discovering Spatial and Temporal Links
among RDF Graphs
Publishing and Interlinking Linked Geospatial Data
In Conjunction with the 12th Extended Semantic Web Conference
Portoroz, Slovenia, 1st June 2015
Presenter: Panayiotis Smeros
139. 01/06/2015 Discovering Spatial and Temporal Links among RDF Graphs 2
Outline
• Introduction to Entity Resolution and Link
Discovery
– Examples, Definitions, Common Problems
• Spatial Entity Resolution
• Spatial and Temporal Link Discovery
– Background and Developed Methods
– Extensions to the Silk Framework
– Hands-on
140. Entities in Real-World
Most of our knowledge about the world is based on entities
and their relations.
141. Entities in Data-World
Many names, descriptions or IDs (URIs) are used for the
same real-world entity:
Names: Portoroz, Portorož, بورتوروز, Порторож, Πορτορόζ, Portorose, Портороз, Порторожу, Portorožu
Description: Portorož (Italian: Portorose, literally "Port of
Roses") is an Adriatic-Mediterranean coastal
settlement in the Municipality of Piran in
southwestern Slovenia. Its modern development
began in the late 19th century with the appearance of
the first health resorts.
URIs:
http://www.geonames.org/3192682/portoroz.html
http://en.wikipedia.org/wiki/Portoroz
http://www.portoroz.si/en/
…
142. Content Providers
Many applications provide valuable information about each of
these entities:
• News about Portoroz
• Reviews of hotels in Portoroz
• Pictures about Portoroz
• Videos for Portoroz
• Wiki pages about Portoroz
• Social networks in Portoroz
Solution?
144. Entity Resolution
The problem of recognising that two (or more) entities in the data world
refer to the same real-world entity. [Christen, TKDE’11]
146. Entity Resolution (Example)
              DBpedia entity    GeoNames entity
name          PORTOROZ          Portorose
population    2,849             2,851
The two entities are linked with sameAs.
147. Spatial Entity Resolution (Example)
              DBpedia entity        GeoNames entity
name          PORTOROZ              Portorose
population    2,849                 2,851
location      45.51663, 13.57996    45.51661, 13.57998
The two entities are linked with sameAs.
148. Entity Resolution (Definition)
Let $S$ and $T$ be two sets of entities. We define a distance
(similarity) function $d_{\mathit{similarity}}$ and a distance (similarity)
threshold $\theta_{d_{\mathit{similarity}}}$ as follows:
\[ d_{\mathit{similarity}} : S \times T \to [0,1], \qquad \theta_{d_{\mathit{similarity}}} \in [0,1] \]
We define the set of discovered similarity links $DL_{\mathit{similarity}}$ as
follows:
\[ DL_{\mathit{similarity}} = \{ (s, \text{sameAs}, t) \mid s \in S \wedge t \in T \wedge d_{\mathit{similarity}}(s,t) < \theta_{d_{\mathit{similarity}}} \} \]
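The definition above can be instantiated in a few lines of Python. This is only a sketch under stated assumptions: entities are reduced to their names, and the distance is a normalised, case-insensitive edit distance chosen for illustration; the definition itself does not prescribe any particular distance function or threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def d_similarity(s, t):
    """Distance in [0, 1]: edit distance of the lower-cased names, normalised."""
    a, b = s.lower(), t.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def discover_links(S, T, theta):
    """Return all (s, sameAs, t) with d_similarity(s, t) < theta."""
    return {(s, "sameAs", t) for s in S for t in T if d_similarity(s, t) < theta}


links = discover_links(["PORTOROZ"], ["Portorose"], theta=0.3)
```

With the names of the running example the distance is 2/9, so the pair is linked for a threshold of 0.3 but not for 0.2.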
149. Link Discovery
Link Discovery is the fourth and the most important Linked Data
Principle.
Establish semantic relations between entities in order to enrich the
information that is known about them. [Bizer et al., IJSWIS’06]
150. Link Discovery (Definition)
Let $S$ and $T$ be two sets of entities and $R$ the set of relations
that can be discovered between entities. For a relation $r \in R$,
w.l.o.g., we define a distance function $d_r$ and a distance
threshold $\theta_{d_r}$ as follows:
\[ d_r : S \times T \to [0,1], \qquad \theta_{d_r} \in [0,1] \]
We define the set of discovered links for relation $r$ ($DL_r$) as
follows:
\[ DL_r = \{ (s, r, t) \mid s \in S \wedge t \in T \wedge d_r(s,t) < \theta_{d_r} \} \]
151. Link Discovery (Example)
[Figures: Natura (2000) - Fields; Fields - OSM Water Bodies]
153. Main Problem: Heterogeneity
• Different Data Providers create Heterogeneous
Datasets
– Example: Literal Heterogeneity (case, language, etc.),
e.g., name = PORTOROZ vs. name = Portorose
• We focus on:
– Heterogeneity in the Representation of Geospatial
Information in RDF
– Heterogeneity in the Representation of Temporal
Information in RDF
154. Heterogeneity in the Representation of
Geospatial Information in RDF
# GeoSPARQL
_:1 rdf:type geo:Geometry .
_:1 geo:asWKT
"<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10 20)"^^geo:wktLiteral .
# stRDF
_:1 rdf:type strdf:Geometry .
_:1 strdf:hasGeometry
'<gml:Point crsName="EPSG:2100"><gml:coordinates>10,20</gml:coordinates></gml:Point>'^^strdf:GML .
# W3C Basic Geo
_:1 rdf:type wgs84Geo:Point .
_:1 wgs84Geo:lat "10"^^xsd:double .
_:1 wgs84Geo:long "20"^^xsd:double .
157. Heterogeneity in the Representation of
Geospatial Information in RDF (cont’d)
The same point can be described using:
• Different Vocabularies
• Different Serializations of Geometries
• Geometries expressed in Different Coordinate
Reference Systems (CRS)
159. Heterogeneity in the Representation of
Geospatial Information in RDF (cont’d)
Geometries of the same real-world entity may also differ because of:
• Different Sampling Values
• Different Granularity
• Different Rounding Effects
163. Heterogeneity in the Representation of
Temporal Information in RDF
_:1 ex:hasBirthday "1989-09-24T11:05:00+01:00"^^xsd:dateTime .
_:1 ex:hasAffiliation ex:UoA
"[2007-10-15T00:00:00+03:00, 2013-10-15T00:00:00+04:00)"^^strdf:Period .
• Different Vocabularies
• Different Time Zones
• Time Instants and Periods
165. Spatial Entity Resolution (Example Revisited)
              DBpedia entity        GeoNames entity
name          PORTOROZ              Portorose
population    2,849                 2,851
location      45.51663, 13.57996    45.51661, 13.57998
The two entities are linked with sameAs.
166. Spatial Entity Resolution (1/4): [Sehgal et al. GIS’06]
• Location Name Similarity
– Edit, Jaccard distance
• Location Similarity
– Euclidean distance
• Location Type Similarity
– (e.g. type “river” is similar to type “stream”)
The approach combines the above similarities to compute the
overall similarity between entities
167. Spatial Entity Resolution (2/4): [Salas et al., TerraCognita’11]
• Similarity measure: Hausdorff Distance
– Intuitively, the Hausdorff Distance is the largest distance
from a point of one geometric shape to the closest point of
the other shape
• Handling Geospatial Heterogeneity
– Converts geometries to a common
vocabulary (NeoGeo)
– Assumes WGS-84 CRS
• Optimization
– Simplifies Geometries with the Ramer-Douglas-Peucker algorithm
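The intuition behind the Hausdorff distance can be sketched in Python as follows. This is an approximation under stated assumptions: geometries are given as lists of (x, y) vertices, distances are planar Euclidean, and only vertices are compared, whereas an exact computation would also consider points along edges. The function names are chosen for the example.

```python
from math import hypot


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point of B."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in B) for ax, ay in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
distance = hausdorff(A, B)
```

The double loop is quadratic in the number of vertices, which is why simplifying geometries beforehand pays off.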
168. Spatial Entity Resolution (3/4): [Vilches-Blázquez et al., AGILE’12]
• Heuristic Combination of:
– URI Similarity
– Label Similarity
• Considering the language of the labels
– Location Similarity
• Assuming the W3C Geo vocabulary
– Geometric Similarity
• Minimum Distance between two Geometries
169. Spatial Entity Resolution (4/4): [Ngonga Ngomo, ISWC’13]
• Non-Spatial Criteria
– Implemented within the LIMES framework
• Geometric Similarity
– Hausdorff Distance
– Optimizations
• Bounding Circle: Avoids useless comparisons
$\mu(s,t) = \delta(\zeta(s), \zeta(t)) - r(s) - r(t) > \theta \Rightarrow \delta(s,t) > \theta$,
where $\zeta(\cdot)$ and $r(\cdot)$ denote the centre and the radius of the bounding circle of a geometry
• Space tiling: Reduces the quadratic number of comparisons
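The bounding-circle filter can be illustrated with a short Python sketch. It assumes planar coordinates and builds an enclosing (not necessarily minimal) circle from the mean of the vertices; the function names and the way the circle is constructed are choices made for this example and are not taken from the cited work.

```python
from math import hypot


def bounding_circle(points):
    """Return (centre, radius) of a circle that encloses all points.

    The centre is the mean of the vertices and the radius is the largest
    distance to it, so the circle is enclosing but not necessarily minimal.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return (cx, cy), max(hypot(x - cx, y - cy) for x, y in points)


def can_skip(s, t, theta):
    """True if the lower bound mu(s, t) already exceeds the threshold."""
    (sx, sy), rs = bounding_circle(s)
    (tx, ty), rt = bounding_circle(t)
    mu = hypot(sx - tx, sy - ty) - rs - rt
    return mu > theta


s = [(0, 0), (2, 0)]
t = [(10, 0), (12, 0)]
skip = can_skip(s, t, theta=5)
```

Only the pairs that survive this cheap test need the expensive distance computation.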
170. Spatial Entity Resolution
• [Sehgal et al. GIS’06]
– Spatial and non-Spatial Criteria
– Only Location Similarity
• [Salas et al., TerraCognita’11]
– Only Spatial Criteria
– Complex Geometric Similarity Methods
• [Vilches-Blázquez et al., AGILE’12]
– Spatial and non-Spatial Criteria
– Simple Geometric Similarity Methods
• [Ngonga Ngomo, ISWC’13]
– Spatial and non-Spatial Criteria
– Complex Geometric Similarity Methods
– Reduced number of comparisons
172. Link Discovery (reminder)
Link Discovery is the fourth and the most important Linked Data
Principle.
Establish semantic relations between entities in order to enrich the
information that is known about them. [Bizer et al., IJSWIS’06]
173. Background on Spatial Relations (1/2)
• Dimensionally Extended 9-Intersection Model
[Clementini et al., SSD'93]
– Captures topological relations in ℝ², by considering the
dimension (dim) of the intersections involving the
interior (I), the boundary (B) and the exterior (E) of the
two geometries.
– Examples: Intersects, Equals, Touches, Disjoint,
Contains, Crosses, Covers, CoveredBy and Within
174. Background on Spatial Relations (2/2)
• Region Connection Calculus [Randell et al. KR’92]
– RCC-8: a well-known subset of RCC, which is based on
eight topological relations
– DC stands for DisConnected, EC for Externally
Connected, PO for Partially Overlapping, EQ for EQual,
TPP for Tangential Proper Part, NTPP for
Non-Tangential Proper Part, and TPPi and NTPPi are
the inverse relations of TPP and NTPP
175. Background on Temporal Relations
• Allen’s Interval Calculus [Allen, Commun. ACM’83]
– thirteen jointly exhaustive and pairwise disjoint qualitative
relations
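The thirteen relations can be enumerated by comparing interval endpoints. The Python sketch below assumes intervals given as (start, end) pairs with start < end; the function name and the relation labels are chosen for this example.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the interiors overlap without containment.
    return "overlaps" if a1 < b1 else "overlapped-by"


relation = allen_relation((1, 4), (2, 6))
```

Because the relations are jointly exhaustive and pairwise disjoint, the function returns exactly one label for every pair of proper intervals.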
176. Spatial and Temporal Relations
• We consider the previous Spatial ($R_s$) and Temporal ($R_t$)
relations as Boolean relations ($R_B$), i.e., either they hold or
they do not:
\[ R_s, R_t \subset R_B \]
• $R_B$ constitutes a special subset of $R$. The distance function
$d_r$ and the distance threshold $\theta_{d_r}$ for a relation $r \in R_B$ are
defined as follows:
\[ d_r(s,t) = \begin{cases} 0 & \text{if } r \text{ holds} \\ 1 & \text{otherwise} \end{cases}, \qquad \theta_{d_r} = 1 \]
177. Spatial and Temporal Transformations (1/2)
• CRS Transformation. The geometries of a dataset can be expressed
in a Coordinate Reference System that is more precise for the
geographic area that they describe (e.g., the GGRS87 for Greece).
This transformation converts the CRS of a geometry to the World
Geodetic System (WGS 84)
• Vocabulary Transformation. This transformation converts geometry
literals from GeoSPARQL, stRDF or W3C GEO to a common
vocabulary (GeoSPARQL)
• Serialization Transformation. This transformation converts the
geometries of a dataset to a common serialization (WKT)
• Time-Zone Transformation. This transformation converts the time
zone of a given time interval to Coordinated Universal Time (UTC)
• Period Transformation. This transformation converts a time instant to
a period with the same starting and ending point
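The two temporal transformations are easy to sketch with the Python standard library. The function names are chosen for this example, and periods are represented simply as (start, end) tuples; the timestamp is the one from the heterogeneity example.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Time-zone transformation: parse an ISO 8601 timestamp, convert to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


def to_period(instant):
    """Period transformation: an instant becomes a period [t, t]."""
    return (instant, instant)


birthday = to_utc("1989-09-24T11:05:00+01:00")
period = to_period(birthday)
```

After both transformations every temporal value is a UTC period, so a single implementation of each temporal relation suffices.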
178. Spatial and Temporal Transformations (2/2)
• Simplification Transformation. Some datasets have very complex
geometries, which makes the computation of spatial relations inefficient. This
transformation simplifies a geometry according to a given distance tolerance,
ensuring that the result is a valid geometry having the same dimension and
number of components as the input
• Envelope Transformation. This transformation computes the envelope (i.e.,
the minimum bounding rectangle) of a geometry and it is useful in cases that
we want to compute approximate spatial relations between two datasets
• Area Transformation. In some cases it is enough to compare just the areas of
two geometries to infer whether they are the same or not. This transformation
computes the area of a given geometry in square metres
• Points-To-Centroid Transformation. In crowdsourcing datasets like
OpenStreetMap, multiple users can define the position of the same placemark.
As a better approximation of the real position of this placemark we can
compute the centroid of these positions. This transformation computes the
centroid of a cluster of points
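Two of these transformations, Envelope and Points-To-Centroid, can be sketched in plain Python. The sketch assumes geometries given as lists of (x, y) vertices in planar coordinates; the function names are chosen for the example.

```python
def envelope(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b):
    """Approximate 'intersects' test on two envelopes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def points_to_centroid(points):
    """Centroid (arithmetic mean) of a cluster of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


box = envelope([(0, 0), (2, 1), (1, 3)])
centre = points_to_centroid([(0, 0), (2, 0), (1, 3)])
```

If two envelopes do not intersect, the geometries cannot intersect either, which makes the envelope a cheap first filter.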
179. Techniques for Checking the Relations
• Cartesian Product Technique (Naive)
– Performs exhaustive checks between the pairs of the entities
of datasets
– Complete
– Complexity: O(|S||T|) checks
• Blocking Technique [Isele et al., WebDB’11, Papadakis et al, TKDE’13]
– Divides the entities into blocks
– Decreases the number of checks
– Complete
– Complexity: O(|S||T|) checks (worst case), O(|L|) checks
(best case)
* |S|, |T|: number of entities in datasets S and T; |L|: number of links between datasets S and T
180. Blocking Technique for Spatial Relations
• Divide the surface of the earth
into curved rectangles (blocks)
• Adjust the area of the blocks
with a blocking factor ($bf$):
$\text{blockArea} = \frac{1}{bf^2}$ square degrees
• If the MBB of a geometry spatially intersects with a block, then
insert it in this block
• Check for a spatial relation only within each block
(independently)
• Construct the set of discovered links (𝐷𝐿 𝑟) by aggregating the
respective links that have been discovered within each block
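The steps above can be sketched in Python as follows. This is an illustration, not the Silk implementation: entities are represented only by their MBBs in degrees, blocks are square cells of side 1/bf degrees, and the relation checked is a plain MBB intersection. All names are chosen for the example.

```python
from collections import defaultdict
from math import floor


def blocks_of(mbb, bf):
    """Indices of all grid cells (side 1/bf degrees) that the MBB intersects."""
    min_x, min_y, max_x, max_y = mbb
    for i in range(floor(min_x * bf), floor(max_x * bf) + 1):
        for j in range(floor(min_y * bf), floor(max_y * bf) + 1):
            yield (i, j)


def mbb_intersects(a, b):
    """Example Boolean relation, evaluated on MBBs only."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover_links(S, T, relation_holds, bf=1):
    """S and T map entity ids to MBBs; check the relation per block only."""
    index = defaultdict(list)
    for t_id, t_mbb in T.items():
        for block in blocks_of(t_mbb, bf):
            index[block].append(t_id)
    links = set()  # a set aggregates links found in several blocks only once
    for s_id, s_mbb in S.items():
        for block in blocks_of(s_mbb, bf):
            for t_id in index.get(block, ()):
                if relation_holds(s_mbb, T[t_id]):
                    links.add((s_id, t_id))
    return links


S = {"a": (0.2, 0.2, 0.8, 0.8), "b": (5.1, 5.1, 5.2, 5.2)}
T = {"x": (0.5, 0.5, 1.5, 1.5), "y": (9.0, 9.0, 9.5, 9.5)}
links = discover_links(S, T, mbb_intersects, bf=2)
```

Two geometries that share a point always share at least one block, so no link is lost for relations that require the geometries to intersect.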
181. Blocking Technique for Temporal Relations
• Divide the time into
intervals (blocks)
• Adjust the length of the
blocks with a blocking factor ($bf$):
$\text{blockLength} = \frac{1}{bf}$ time units
• If a time period or instant temporally intersects with a block, then
insert it in this block
• Check for a temporal relation only within each block
(independently)
• Construct the set of discovered links (𝐷𝐿 𝑟) by aggregating the
respective links that have been discovered within each block
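The temporal variant is the one-dimensional analogue. In the Python sketch below, periods are (start, end) pairs of numbers in some time unit, blocks have length 1/bf, and the relation checked is a simple overlap test; these are assumptions for illustration rather than the Silk implementation.

```python
from collections import defaultdict
from math import floor


def blocks_of(period, bf):
    """Indices of the blocks (length 1/bf time units) a period intersects."""
    start, end = period
    return range(floor(start * bf), floor(end * bf) + 1)


def overlaps(p, q):
    """Example Boolean relation: the two closed periods share a time point."""
    return p[0] <= q[1] and q[0] <= p[1]


def discover_links(S, T, relation_holds, bf=1.0):
    """S and T map entity ids to periods; check the relation per block only."""
    index = defaultdict(list)
    for t_id, t_period in T.items():
        for block in blocks_of(t_period, bf):
            index[block].append(t_id)
    links = set()
    for s_id, s_period in S.items():
        for block in blocks_of(s_period, bf):
            for t_id in index.get(block, ()):
                if relation_holds(s_period, T[t_id]):
                    links.add((s_id, t_id))
    return links


# "s2" is an instant that has already been turned into a period [t, t].
S = {"s1": (0.0, 2.5), "s2": (10.0, 10.0)}
T = {"t1": (2.0, 4.0), "t2": (7.0, 8.0)}
links = discover_links(S, T, overlaps, bf=0.5)
```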
182. Blocking Technique
• Fully parallelizable with respect to the blocks
• Proven sound and complete
• 100% accurate links
• 100% precision, recall, F-measure
183. Extensions to the Silk Framework:
Spatial and Temporal Relations
184. Extensions to the Silk Framework:
Spatial and Temporal Transformations
185. Extensions to the Silk Framework
• Spatial and Temporal Extensions for Silk implemented as
Plugins
• Transparent to all the applications of Silk
– Single Machine
– MapReduce
– Workbench
186. Hands-on Silk
• Download: https://github.com/silk-framework/silk
• Workbench application pre-installed in the VM
• Discover the following links:
Source Dataset      Relation     Target Dataset
Field Boundaries    Contains     Raster Cells
OSM Water Bodies    Intersects   Natura (2000)
Natura (2000)       Within       Federal States of Germany
All the datasets will be first converted to RDF with GeoTriples!
187. References (1/3)
• [Bizer et al., IJSWIS’06]
Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. International
Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)
• [Christen, TKDE’11]
Christen, P.: A survey of indexing techniques for scalable record linkage and
deduplication. IEEE TKDE (2011)
• [Auer, RW’13]
Auer, S., Lehmann, J., Ngomo, A.C.N., Zaveri, A.: Introduction to Linked Data and Its
Lifecycle on the Web. In: Rudolph, S., Gottlob, G., Horrocks, I., van Harmelen, F. (eds.)
Reasoning Web. Lecture Notes in Computer Science, vol. 8067, pp. 1–90. Springer
(2013)
• [Salas et al., TerraCognita’11]
Salas, J., Harth, A.: Finding spatial equivalences accross multiple RDF datasets. In:
Proceedings of the Terra Cognita Workshop on Foundations, Technologies and
Applications of the Geospatial Web. pp. 114–126. Citeseer (2011)
• [Sehgal et al. GIS’06]
Sehgal, V., Getoor, L., Viechnicki, P.D.: Entity resolution in geospatial data integration. In:
Proceedings of the 14th annual ACM international symposium on Advances in
geographic information systems. pp. 83–90. ACM (2006)
188. References (2/3)
• [Vilches-Blázquez et al., AGILE’12]
Vilches-Blázquez, L.M., Saquicela, V., Corcho, O.: Interlinking geospatial information in
the web of data. In: Bridging the Geographic Information Sciences, pp. 119–139.
Springer (2012)
• [Ngonga Ngomo, ISWC’13]
Ngonga Ngomo, A.C.: Orchid - reduction-ratio-optimal computation of geo-spatial
distances for link discovery. In: Proceedings of ISWC 2013 (2013)
• [Clementini et al., SSD'93]
Clementini, E., Di Felice, P., van Oosterom, P.: A small set of formal topological
relationships suitable for end-user interaction. In: Abel, D., Chin Ooi, B. (eds.) Advances
in Spatial Databases, Lecture Notes in Computer Science, vol. 692, pp. 277–295.
Springer Berlin Heidelberg (1993), http://dx.doi.org/10.1007/3-540-56869-7_16
• [Randell et al. KR’92]
Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In:
KR. pp. 165–176 (1992)
• [Allen, Commun. ACM’83]
Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11), 832–
843 (Nov 1983)
189. References (3/3)
• [Isele et al., WebDB’11]
Isele, R., Jentzsch, A., Bizer, C.: Efficient multidimensional blocking for link discovery
without losing recall. In: WebDB. Citeseer (2011)
• [Papadakis et al, TKDE’13]
Papadakis, G., Ioannou, E., Palpanas, T., Niederée, C., Nejdl, W.: A blocking framework
for entity resolution in highly heterogeneous information spaces. Knowledge and Data
Engineering, IEEE Transactions on 25(12), 2665–2682 (2013)
202. Transforming OpenStreetMap GML
document into an RDF graph (2/4)
# cp OSM/automatic-mapping.rml.ttl OSM/altered-mapping.rml.ttl
# gedit OSM/altered-mapping.rml.ttl
203. Transforming OpenStreetMap GML
document into an RDF graph (3/4)
1. Change the class definition for the triples map
<#ogr:waterwaysogr:geometryProperty>
   a. Replace the class onto:LineStringPropertyType
      with ogc:Geometry
2. Change the predicate that will link the thematic
data with the geometric data.
   a. Find the triples map <#waterways>
   b. Replace the text onto:has_geometryProperty
      with ogc:hasGeometry
204. Transforming OpenStreetMap GML
document into an RDF graph (4/4)
# ./osmdump.sh
--
geotriples-cmd dump_rdf -rml
-o OSM/osmtriples.n3
-ns osm-namespaces.ns
OSM/altered-mapping.rml.ttl
--
# endpoint store http://localhost:8080/strabonendpoint N-Triples -t /home/leo/DEMO_ESWC15/OSM/osmtriples.n3
205. Store TalkingFields datasets to Strabon
# endpoint store http://localhost:8080/strabonendpoint N-Triples -t /home/leo/datasets/fb.n3
# endpoint store http://localhost:8080/strabonendpoint N-Triples -t /home/leo/datasets/rc.n3
206. Hands-on Silk
• Download: https://github.com/silk-framework/silk
• Workbench application pre-installed in the VM
• Discover the following links:
Source Dataset      Relation     Target Dataset
Field Boundaries    Contains     Raster Cells
OSM Water Bodies    Intersects   Natura (2000)
Natura (2000)       Within       Federal States of Germany
All the datasets will be first converted to RDF with GeoTriples!