4. Guest Star
Metadata
Data about Data (really?)
Functions
Manage Data (discovery,
selection etc)
Issues (selection of)
- What is metadata to me can be data to others
- Many standards
- Ontologies
6. Actor
The Triad
A set of 3 elements to
fully manage data
Functions
PID – persistent identifier
Metadata – discovery &
selection
DO – data of interest
<PID, metadata, DO>
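The triad can be pictured as a small record type. The sketch below is only an illustration: the class name `Triad`, the `select` helper, the PID strings, and the metadata keys are all hypothetical and do not come from any specific PID system.

```python
from dataclasses import dataclass

@dataclass
class Triad:
    """One <PID, metadata, DO> record (illustrative only)."""
    pid: str        # persistent identifier (made-up values below)
    metadata: dict  # information used for discovery & selection
    do: bytes       # the data of interest itself

def select(triads, **criteria):
    """Return the PIDs whose metadata match every key/value in criteria."""
    return [t.pid for t in triads
            if all(t.metadata.get(k) == v for k, v in criteria.items())]

records = [
    Triad("pid:example/0001", {"region": "Irpinia", "type": "seismic"}, b"..."),
    Triad("pid:example/0002", {"region": "Etna", "type": "seismic"}, b"..."),
]
print(select(records, region="Irpinia"))
```

Selection works on the metadata alone; the data object is only fetched, through its PID, once a record has been chosen.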
7. Technical support staff
Data Base
Collection of (organized)
Data
Alias
Repository, Data Center
etc.
Superpowers
- DBMS (allows definition, creation, querying, update, and administration of databases)
8. Technical support staff
APIs
Application Programming
Interface
Standard procedures or
instructions to access a
service (or function)
Alias
Web service, RESTful
service, [thin layer], etc.
Needs
- Standards for requests
- Standards for
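A standard for requests can be as simple as agreed parameter names in a query URL. The sketch below assumes a hypothetical endpoint (`repository.example.org`), hypothetical parameter names (`region`, `format`), and a made-up JSON response layout; none of these come from a real service, and no network call is made.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://repository.example.org/api/datasets"  # hypothetical endpoint

def build_request(region, fmt="json"):
    """Build a query URL using the agreed (assumed) parameter names."""
    return f"{BASE_URL}?{urlencode({'region': region, 'format': fmt})}"

def parse_response(body):
    """Extract PIDs from a response in the assumed JSON layout."""
    payload = json.loads(body)
    return [item["pid"] for item in payload["results"]]

url = build_request("Irpinia")
sample_body = '{"results": [{"pid": "pid:example/0001"}]}'  # made-up response
print(url)
print(parse_response(sample_body))
```

Because both sides agree on the parameter names and on the output layout, a client written once can talk to any repository that follows the same convention.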
9. Themes
1. Optimization of resources
2. Single point of access to several databases and services
3. Open access obligations (Berlin Declaration, DPC…)
4. Interoperation for data re-use; new multidisciplinary science
11. SCENARIOS
1. Friendship-based discovery
2. Manual discovery
3. Advanced manual discovery
4. Brokering (canonical form)
5. Metadata-driven canonical brokering
6. Metadata-driven canonical brokering with contextualization
12. #0 Friendship-based discovery
1. Data stored on USB pen drives, CDs, etc.
2. Phone calls
3. Emails
Issues
Works well in Masonic clubs
13. #1 Manual discovery
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. User discovers data
2. Repositories do not have web services
3. No metadata (or metadata embedded in the file or directory structure)
4. Manual match & mapping
Issues
Performance, efficiency, error-prone, partial datasets
14. #2 Advanced manual discovery
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. User discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Manual match & mapping
Issues
- Performance, efficiency, error-prone
- Some standardization in
[Figure: each of the three repositories exposes an API]
15. #4 Brokering (canonical form)
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Minimal match & mapping
5. Multidisciplinary (ontologies)
Issues
- Single AP
- Development and maintenance
[Figure: each repository exposes an API; the broker exposes its own API and holds metadata in canonical form]
16. #5 Metadata-driven canonical brokering
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. Broker discovers data
2. Access interfaces
3. Full metadata set
4. Advanced match & mapping
5. Multidisciplinary (ontologies)
Issues
- Single AP
- Stored graph metadata
- Huge metadata
superset
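Metadata-driven discovery means the broker searches a catalog instead of opening every data file. The sketch below assumes a tiny in-memory catalog; `CatalogEntry`, `discover`, the PIDs, the coordinates, and the bounding box are all hypothetical (the box is a rough illustration, not an authoritative outline of Irpinia).

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    pid: str          # made-up persistent identifier
    repository: str
    data_format: str
    lat: float
    lon: float

# Rough illustrative box: (lat_min, lat_max, lon_min, lon_max).
IRPINIA_BBOX = (40.7, 41.2, 14.8, 15.5)

def discover(catalog, bbox):
    """Return catalog entries whose coordinates fall inside the box."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [e for e in catalog
            if lat_min <= e.lat <= lat_max and lon_min <= e.lon <= lon_max]

catalog = [
    CatalogEntry("pid:example/A1", "repository A", "Format A", 40.9, 15.2),
    CatalogEntry("pid:example/B1", "repository B", "Format B", 37.7, 15.0),
    CatalogEntry("pid:example/C1", "repository C", "Format C", 41.0, 15.0),
]
hits = discover(catalog, IRPINIA_BBOX)
print([(e.pid, e.data_format) for e in hits])
```

Only the matching entries need to be fetched from their repositories and converted, which is where the catalog pays for the cost of storing a large metadata superset.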
[Figure: each repository exposes an API; the broker exposes its own API and uses a metadata catalog]
17. #6 Metadata-driven canonical brokering with contextualization
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. Map & match only
contextualization
metadata
2. Pointers to detailed
metadata
[Figure: each repository exposes an API; the broker exposes its own API and uses a metadata catalog]
18. #6 Metadata-driven canonical brokering with contextualization
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C)]
1. Map & match only
contextualization
metadata
2. Pointers to detailed
metadata
3. Export metadata in any
standard
3-layer metadata model
Discovery (DC) and (CKAN, eGMS)
Contextual (CERIF metadata model)
Detailed (community specific)
[Figure: each of the three repositories exposes an API]
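One way to picture the three layers is as a record whose discovery fields can be exported in a standard form, while the contextual layer only points at the detailed, community-specific metadata. This is a minimal sketch under assumptions: all class and field names are hypothetical, the contextual fields are not actual CERIF entities, and only the four Dublin Core element names used in the export (`identifier`, `title`, `creator`, `subject`) are taken from the standard.

```python
from dataclasses import dataclass

@dataclass
class DiscoveryLayer:
    title: str
    creator: str
    subject: str

@dataclass
class ContextualLayer:
    project: str        # assumed contextual field
    organisation: str   # assumed contextual field
    detailed_ref: str   # pointer to detailed, community-specific metadata

@dataclass
class MetadataRecord:
    pid: str
    discovery: DiscoveryLayer
    contextual: ContextualLayer

def export_dublin_core(record):
    """Export the discovery layer using Dublin Core element names."""
    return {
        "dc:identifier": record.pid,
        "dc:title": record.discovery.title,
        "dc:creator": record.discovery.creator,
        "dc:subject": record.discovery.subject,
    }

record = MetadataRecord(
    pid="pid:example/0001",
    discovery=DiscoveryLayer("Seismic waveforms", "Example Observatory", "seismology"),
    contextual=ContextualLayer("Example project", "Example institute",
                               "https://repository.example.org/meta/0001"),
)
print(export_dublin_core(record))
```

The broker maps and matches only the upper layers; the detailed metadata stays with the community that produced it and is reached through `detailed_ref`.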
21. Wrapping up
We need
1. Metadata describing
data
2. APIs & web services
3. Defined WS output
format
4. PID system
5. Brokering system
6. Metadata catalogue
supporting
1. Ontologies
2. Contextualization
23. #3 Metadata-driven canonical brokering
[Figure legend: data sets in Format A (repository A), Format B (repository B), and Format C (repository C); request: data from Irpinia]
1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Significant metadata set
4. Good match & mapping
Issues
- development and
maintenance
- Single AP
- “hardcoded” metadata
[Figure: each repository exposes an API; the broker exposes its own API and uses a metadata catalog]
24. #4 Metadata-driven canonical brokering
[Figure: a broker in front of data sets in any data format]
Issues
1. Predefined tools for
matching and mapping
2. Writing software:
n conversion
algorithms to canonical
form
3. Ontologies
4. Multidisciplinary
but many formats
5. Good data discovery
[Figure: data sets described in metadata format A or metadata format B; request: data from Irpinia; metadata catalog]
25. #1 Conventional brokering
[Figure: a broker in front of data sets in Format A, Format B, and Format C; request: data from Irpinia]
Issues
1. Writing software: n*(n-1) conversion algorithms
2. Does not scale in costs of development and maintenance
3. Matching and mapping
4. Works within a restricted research domain
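The n*(n-1) figure follows from needing one converter per ordered pair of formats. The sketch below simply enumerates those pairs and, for comparison, the n converters needed when every format is converted into a single canonical form; the function names are illustrative only.

```python
from itertools import permutations

def pairwise_converters(formats):
    """Every ordered (source, target) pair needs its own converter."""
    return list(permutations(formats, 2))

def canonical_converters(formats):
    """One converter per format, each targeting the canonical form."""
    return [(fmt, "canonical") for fmt in formats]

formats = ["Format A", "Format B", "Format C"]
print(len(pairwise_converters(formats)))   # n*(n-1) with n = 3
print(len(canonical_converters(formats)))  # n with n = 3
```

With three formats the difference is small (6 versus 3), but with ten formats it is 90 versus 10, which is why pairwise conversion does not scale in development and maintenance cost.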
26. #2 Brokering with canonical form
[Figure: a broker in front of data sets in Format A, Format B, and Format C; request: data from Irpinia]
Issues
1. Writing software:
n conversion
algorithms to canonical
form
2. works within a
restricted research
domain
3. matching and mapping
4. “Complex” data
discovery
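Brokering with a canonical form can be sketched as one converter per source format, all producing the same target structure. Everything here is assumed for illustration: the three source layouts, the canonical field names, and the sample values are made up and do not describe any real repository format.

```python
def from_format_a(record):
    # Hypothetical Format A: a dict with short key names.
    return {"latitude": record["lat"], "longitude": record["lon"],
            "magnitude": record["mag"]}

def from_format_b(record):
    # Hypothetical Format B: a "lat;lon;mag" text line.
    lat, lon, mag = (float(x) for x in record.split(";"))
    return {"latitude": lat, "longitude": lon, "magnitude": mag}

def from_format_c(record):
    # Hypothetical Format C: a (magnitude, longitude, latitude) tuple.
    mag, lon, lat = record
    return {"latitude": lat, "longitude": lon, "magnitude": mag}

# n formats need n converters into the canonical form.
TO_CANONICAL = {"A": from_format_a, "B": from_format_b, "C": from_format_c}

def broker(datasets):
    """Convert (format, record) pairs into the canonical form."""
    return [TO_CANONICAL[fmt](record) for fmt, record in datasets]

datasets = [
    ("A", {"lat": 40.9, "lon": 15.2, "mag": 6.9}),
    ("B", "40.9;15.2;6.9"),
    ("C", (6.9, 15.2, 40.9)),
]
canonical = broker(datasets)
print(canonical)
```

Adding a fourth format means writing one new function and one new dictionary entry; the existing converters are untouched.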
[Figure legend: canonical Format A]
27. #3 Metadata-driven simple brokering
[Figure: a broker in front of data sets in any data format]
Issues
1. Good data discovery
2. Predefined tools for
matching and mapping
3. Multidisciplinary
but many formats
4. Writing software:
n*(n-1) conversion
algorithms
5. Ontologies
[Figure: data sets described in metadata format A or metadata format B; request: data from Irpinia; METADATA]
28. #2 Metadata-driven canonical brokering
[Figure: a broker in front of data sets in any data format]
Issues
1. Predefined tools for
matching and mapping
2. Writing software:
n conversion
algorithms to canonical
form
3. Ontologies
4. Multidisciplinary
but many formats
5. Good data discovery
[Figure: data sets described in metadata format A or metadata format B; request: data from Irpinia; METADATA catalog]
Editor's Notes
Digital data only
It must make sense to us; it must be interpretable
Example: identity card
WE SHOW 4 SCENARIOS LEADING TO THE MOST EFFICIENT ARCHITECTURE
Metadata are used here
Match-mapping tools defined a priori
N^2 conversion tools
Ontologies (lat, latitude, lt) hardcoded
Many data formats, but who cares
Simpler discovery and selection
- Does not use metadata
Uses files only
We have the broker
We have the user, who requests data from an area
The broker reads all the files, extracts the information of interest, and checks whether it matches the request
It converts all the data
Conversion requires a match-mapping defined a priori
N^2 conversions
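The note on hardcoded ontologies (lat, latitude, lt) amounts to a fixed table of synonyms for each term. A minimal sketch of such a table follows; the latitude aliases are the ones in the notes, while the longitude aliases and the `normalize` helper are assumptions added for illustration.

```python
ALIASES = {
    "latitude": {"lat", "latitude", "lt"},     # synonyms listed in the notes
    "longitude": {"lon", "longitude", "lng"},  # assumed for illustration
}

# Reverse lookup: alias -> preferred term.
TERM_OF = {alias: term for term, aliases in ALIASES.items() for alias in aliases}

def normalize(record):
    """Rename known aliases to their preferred term; keep other keys as-is."""
    return {TERM_OF.get(key.lower(), key): value for key, value in record.items()}

print(normalize({"lt": 40.9, "lon": 15.2, "depth": 10}))
```

Hardcoding the table is simple, but every new synonym requires a code change, which is the maintenance cost the scenarios with a metadata catalog try to avoid.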