This presentation was given at the ISESS'15 conference at Melbourne.
Abstract. netCDF is a well-known and widely used format to exchange array-oriented scientific data such as grids and time-series. We describe a new convention for encoding netCDF based on Linked Data principles called netCDF-LD. netCDF-LD allows metadata elements, given as string values in current netCDF files, to be given as Linked Data objects. netCDF-LD allows precise semantics to be used for elements and expands the type options beyond lists of controlled terms. Using Uniform Resource Identifiers (URIs) for elements allows them to refer to other Linked Data resources for their type and descriptions. This enables improved data discovery through a generic mechanism for element type identification and adds element type expandability to new Linked Data resources as they become available. By following patterns already established for extending existing formats, netCDF-LD applications can take advantage of existing software for processing Linked Data and supporting more effective data discovery and integration across systems.
See http://link.springer.com/chapter/10.1007%2F978-3-319-15994-2_9
netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF
1. Towards linked data conventions for
delivery of environmental data using
netCDF
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox
CSIRO LWF and BODC/ Marine Institute (Galway)
25 March 2014 | ISESS
… netCDF-LD
2. Outline
1. Big data and netCDF
2. Metadata challenges
3. Linked Data, JSON-LD & CSV-on-the-web
4. netCDF-LD
5. Applications
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu2 |
3. Big data
• “90% of the world’s data has been produced over the last two
years”
• Environmental and earth observation has always been big-data…
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu3 |
4. National Soil and Landscape Grid
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu4 |
6. Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu6 |
Real-time wind data
7. netCDF
• Persistence layer
• Simple: self-describing, user
accessible “flat file”
• Java, C++, python, others
• Interfaces to the Data Model
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu7 |
File format
Software library
API
9. netCDF cont’
• Really good at handling array-oriented data – efficient subsetting
• Multi-dimensional grids
• CF conventions
climatology and
weather forecasting
domains
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu9 |
10. netCDF metadata example
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu10 |
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:standard_name =
”total_suspended_solids”;
11. Use of netCDF for Chlorophyll observations
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu11 |
12. netCDF challenges
• Encoding format and framework
• “Domain agnostic” but conventions exist for communities – e.g.
Climate and Forecasting (CF conventions)
• CF conventions limited
• Use of ‘standard_name’ attribute to denote the combination of
medium/transformation/substance/state/quantity using a particular syntax
• Assumes English
• Standard names take a while to be adopted
• Standard names variations of each other
• Units are from UDUNITS – Unidata managed
• Narrow scope of scientific domain – can’t reuse for other domains
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu12 |
13. netCDF Metadata challenges – Semantic heterogeneity
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu13 |
Enviro
Application
#1
Data
DB
Chl_MIM
Enviro
Application
#2
Data
DB
mass_conc_
chlorophyll_
In_sea_water
Enviro
Application
#3
Data
DB
mass_conc_
chlorophyll_a_
In_sea_water
14. Semantic Heterogeneity… leads to data silos
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu14 |
Enviro
Application
#1
Enviro
Application
#2
Enviro
Application
#3
Data Data Data
DB DBDB
Chl_MIM
mass_conc_
chlorophyll_
In_sea_water
mass_conc_
chlorophyll_a_
In_sea_water
X X
Meetings
15. Approaches to address netCDF metadata challenge
1. Strict naming grammars for standard_name
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu15 |
(Peckham 2014) CSDMS Standard
Naming Conventions
16. Approaches to address netCDF metadata challenge (2)
2. Break up various concerns in standard_name into multiple
attributes – ref (Yu et al. 2014)
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:scaledQuantityKind_id
= "http://environment.data.gov.au/water/quality/def/property/solids-
total_suspended" ;
Nap_MIM:unit_id =
"http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre" ;
Nap_MIM:substanceOrTaxon_id =
"http://environment.data.gov.au/water/quality/def/object/solids";
Nap_MIM:medium_id =
"http://environment.data.gov.au/water/quality/def/object/ocean"
Nap_MIM:procedure_id = "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS" ;
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu16 |
17. Harmonised Publish, Discovery, Access and Use
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu17 |
Relies on community
agreed vocabularies
Describe/Publish
the data
Query/Use data
Enviro
Application
#1
Enviro
Application
#2
Enviro
Application
#3
Data Data Data
DB DBDB
substanceOrTaxon=
http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind =
http://environment.data.gov.au/def/property/chlorophyll_
concentration
Need to communicate more consistently
Requires shared, precise, agreed semantics
18. Linked Data, JSON-LD & CSV-on-the-web
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu18 |
"LOD Cloud Diagram as of September 2011" by Anja Jentzsch
19. Linked Data
• Method of connect related data, existing vocabularies and other
semantics using web links (URIs) and a language for describing
resources (RDF)
• Applications can then
query data, follow the links
and infer new insights
from the data
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu19 |
20. JSON-LD
{
"name": "John Lennon",
"born": "1940-10-09",
"spouse":
"http://dbpedia.org/resource/Cynthia_Lennon"
}
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu20 |
JSON-LD
Decorators
21. JSON-LD and Semantic Web
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu21 |
http://dbpedia.org/re
source/John_Lennon
http://dbpedia.org/res
ource/Cynthia_Lennon
“John Lennon”
1940-10-09
22. CSV-on-the-web
• Linked data for CSV
tabular data
• Add context to
tables
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu22 |
23. Example: Domain vocab term
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu23 |
24. Example: Domain vocab term
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu24 |
...
wqp:chlorophyll_a_concentration
a skos:Concept, op:ScaledQuantityKind,
qudt:ChemistryQuantityKind ;
skos:broader wqp:chlorophyll_concentration ;
skos:prefLabel "chlorophyll a concentration"@en ;
26. netCDF-LD: Linkifying netCDF
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu26 |
‘Context’ boilerplates
27. netCDF-LD: Linkifying netCDF
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu27 |
Air Temperature
Definition
Air
Temperature
Medium
Quantity
Kind
Kelvin
UoM
eReefs Vocabularies
29. Assigning URIs to variable level attributes
z:units = "meters";
z:units_ref = "http://qudt.org/vocab/unit#Meter";
z:a =
"http://environment.data.gov.au/def/op#quantityKind";
z:dcPartOf = "http://foo.bar/linked_netCDF_example";
z:valid_range = 0., 5000.;
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu29 |
30. netCDF-LD to RDF
@prefix unit: <http://qudt.org/vocab/unit#> .
@prefix qudt: <http://qudt.org/1.1/schema/qudt#> .
@prefix op: <http://environment.data.gov.au/def/op#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
_:z qudt:unit unit:Meter;
a op:ScaledQuantityKind;
dcterms:isPartOf <http://foo.bar/linked_netCDF_example>.
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu30 |
31. Use of netCDF-LD to support Data Discovery
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu31 |
32. Data annotated with bindings to vocab URIs
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu32 |
THREDDS
THREDDS
Catalog
Domain Vocabs
(Water Quality at
environment.data.gov.au)
Quantities/ Units ontology
(QUDT)
substanceOrTaxon=
http://environment.data.gov.au
/def/object/chlorophyll
scaledQuantityKind
= http://environment.data.gov.au
/def/property/chlorophyll_concentra
tion
unit
= http://qudt.org/vocab/unit#Unitless
medium
= http://environment.data.gov.au
/def/feature/ocean
33. DBL Harvesting and End Use
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu33 |
Data Brokering
Layer
THREDDS
THREDDS
Catalog
Domain Vocabs
(Water Quality at
environment.data.gov.au)
Quantities/ Units ontology
(QUDT)
substanceOrTaxon=
http://environment.data.gov.au
/def/object/chlorophyll
scaledQuantityKind
= http://environment.data.gov.au
/def/property/chlorophyll_concentra
tion
unit
= http://qudt.org/vocab/unit#Unitless
medium
= http://environment.data.gov.au
/def/feature/ocean
End users
Client
application
chlorophyll
34. netCDF-LD benefits
• Enhanced metadata – better self-description for any domain
• Allows binding to rich semantics from existing vocabularies being
published via SKOS/OWL/RDF in multiple domains
• Linked Data approach – consistent with Semantic Web
technologies, RDF, JSON-LD, CSV-on-the-web
• Hook into toolkits and software libraries for data discovery via
Semantic Web technologies and other Linked data already being
published online
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu34 |
35. Future/Current Work
1. netCDF-LD requires adoption at community level – need to test
the proposal at OGC, Unidata, CF communities
2. Tooling for parsing netCDF-LD
3. Encoding large volumes using netCDF-LD
4. Demonstrate Data Broker concept using netCDF-LD
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu35 |
36. Summary
• netCDF is really good at big data – esp. array-oriented data
• Limitations with the current netCDF metadata structure
• *-LD pattern is emerging across multiple formats (JSON, CSV)
• Propose netCDF-LD which is an unintrusive extension on netCDF to
facilitate improved self-described datasets
• netCDF-LD has potential to enhance data discovery from
environmental dataset across to other Linked Data being
published online
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu36 |
37. LAND AND WATER
Thank you
Land and Water
Jonathan Yu
Research Software
Engineer
t +61 3 9252 6440
e jonathan.yu@csiro.au