SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDEXES). Tim Pugh, ACEAS Grand 2014
1. SPEDDEXES: An open-source, community-developed
approach to enhancing the way ‘Big Data’ is managed,
discovered and shared by ecosystem scientists
Evans, Bradley John*; Guru, Siddeswara; Allen, Stuart; Beckett, Duan; de Wit,
Roald; Duursma, Daisy; Erwin, Tim; Evans, Ben; Fuchs, David; Hodge, Jonathan; Ip,
Alex; King, Edward; Lewis, Adam; Paget, Matthew; Porter, David; Prentice, Iain
Colin; Pugh, Tim; Scarth, Peter; Sixsmith, Joshua; Sun, Yi; Trevithick, Rebecca;
Whitley, Rhys
SPatially Explicit Data Discovery, EXtraction and
Evaluation Service
2. SPatially Explicit Data Discovery,
EXtraction and Evaluation Service
SPATIALLY and temporally EXPLICIT research data
infrastructure to interrogate data streams on the National
Computational Infrastructure (NCI)
DATA DISCOVERY for TERN or any datasets which can be read
by the platform
EXTRACTION AND EVALUATION drives advances in ecosystem
science, impact assessment and land management
Community success story: a confluence of ideas from
government (federal and state), CSIRO, NCI, INTERSECT and universities
3. Ever-growing need
The Spatially Explicit Data Discovery, Extraction and Evaluation
Service (SPEDDEXES) was developed to address the ever-growing
need to better manage and access the Big Data available to
Australian ecosystem science today.
Coupled Model Intercomparison Project (CMIP) climate experiments
• CMIP-5 consists of ~23 international models
• The CMIP-5 international data repository is >2 PB
• CMIP-6 contributions are expected to be 10x the size of CMIP-5
5. New approach for Big Data
It is no longer practical, let alone affordable, to
continue data-intensive ecosystem science
in the copy-and-work paradigm; a new approach
to working with Big Data is required.
• Think about network data access, not file downloads
• Cross-disciplinary use of file formats and services
• Open-source server technology and file formats
• Work with big data in a high-performance facility
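A back-of-envelope sketch of why copy-and-work stops scaling, using the CMIP-5 and CMIP-6 volumes quoted above. The sustained 1 Gbit/s link, decimal petabytes and zero protocol overhead are assumptions for illustration, not figures from the slides.

```python
"""Back-of-envelope transfer time for the copy-and-work paradigm.

Assumptions (not from the slides): decimal units (1 PB = 1e15 bytes),
a sustained link rate in bits per second, and no protocol overhead.
"""

SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to copy size_bytes over a link running at link_bits_per_s."""
    return size_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    petabyte = 1e15
    gigabit = 1e9              # assumed sustained 1 Gbit/s link
    cmip5 = 2 * petabyte       # ">2 PB" from the slide, taken as exactly 2 PB
    cmip6 = 10 * cmip5         # "10x the size of CMIP-5"
    print(f"CMIP-5: {transfer_days(cmip5, gigabit):.0f} days")  # prints "CMIP-5: 185 days"
    print(f"CMIP-6: {transfer_days(cmip6, gigabit):.0f} days")  # prints "CMIP-6: 1852 days"
```

At that assumed rate, copying 2 PB takes roughly half a year and 20 PB roughly five years, before any analysis begins.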
6. Two key issues
The SPEDDEXES concept and tools address
two key issues.
• Firstly, create a self-describing data archive
that adheres to international standards and
community conventions.
• Secondly, encourage data providers to adopt community
standards that enable data catalogue and data
access services, for easier use,
management and sustainability.
7. SPEDDEXES architecture
Connecting data to applications through open-source
middleware services and web technologies:
1. an Open-source Project for a Network Data Access Protocol
(OPeNDAP) service
2. the Open Geospatial Consortium (OGC) web services, including the
Web Map Service (WMS) protocol
3. the THREDDS (Thematic Real-time Environmental Distributed Data
Services) Data Server (TDS), an implementation of OPeNDAP and WMS
4. an Environmental Research Division's Data Access Program
(ERDDAP) service to aggregate data sources and provide search
and data download services
5. the ZOO Web Processing Service (ZOO WPS) for server-side processing
6. a JavaScript web interface with search, visualization and subset
download functionality (a.k.a. SPEDDEXES-UI).
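As an illustration of the OGC WMS layer in this architecture, a minimal sketch that builds a WMS 1.3.0 GetMap request URL of the kind a TDS WMS endpoint accepts. The host, dataset path and layer name are hypothetical. One detail worth knowing: in WMS 1.3.0 the BBOX follows the CRS's own axis order, which for EPSG:4326 is latitude first.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=512, height=256,
                   crs="EPSG:4326", fmt="image/png", time=None):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox is (min_lat, min_lon, max_lat, max_lon) when crs is EPSG:4326,
    because WMS 1.3.0 uses the CRS axis order (latitude first for EPSG:4326).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    if time is not None:
        params["TIME"] = time  # optional time dimension, ISO 8601 string
    # urlencode percent-encodes reserved characters such as ':' ',' '/'
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical TDS host, dataset and layer; bbox roughly covers Australia
    print(wms_getmap_url("https://example.org/thredds/wms/demo.nc", "tas",
                         (-44.0, 112.0, -10.0, 154.0)))
```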
8. Seeking climatic data
ERDDAP Service
- Catalogue
- File (csv,…)
- RSS notify
- Rich user interface
NCAR Data Service
- Catalogue
- OPeNDAP
- WMS
NCI Data Service
- Catalogue
- OPeNDAP
- WMS
TERN Data Service
- Catalogue
- OPeNDAP
- WMS
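The ERDDAP service listed above serves gridded data through "griddap" URLs whose query names a variable followed by one bracketed constraint per dimension. The sketch below assumes the commonly documented form, in which actual coordinate values are wrapped in parentheses (e.g. `[(-44):(-10)]`) and constraints follow the dataset's dimension order; the server URL, dataset ID and variable name are hypothetical.

```python
from urllib.parse import quote


def griddap_url(server, dataset_id, variable, constraints, file_type="csv",
                percent_encode=True):
    """Build an ERDDAP griddap request URL (sketch).

    constraints: one (start, stop) pair per dimension, in the dataset's
    dimension order. Values are coordinate values, so each is wrapped in
    parentheses; stop=None (or stop == start) selects a single value.
    """
    parts = []
    for start, stop in constraints:
        if stop is None or stop == start:
            parts.append(f"[({start})]")
        else:
            parts.append(f"[({start}):({stop})]")
    query = variable + "".join(parts)
    if percent_encode:
        # Brackets, parentheses and colons are percent-encoded for strict servers
        query = quote(query, safe="")
    return f"{server}/griddap/{dataset_id}.{file_type}?{query}"


if __name__ == "__main__":
    # Hypothetical server and dataset: one time step over an Australian box
    print(griddap_url("https://example.org/erddap", "demoSST", "sst",
                      [("2014-01-01T00:00:00Z", None), (-44, -10), (112, 154)],
                      percent_encode=False))
```

Changing `file_type` (for example to `nc`) would request the same subset in a different format, which is how ERDDAP offers CSV and other file downloads from one dataset.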
9. THREDDS and Discovery Systems
[Figure: THREDDS services with a data server communicate with discovery systems. Components shown: metadata generator, THREDDS catalog, data server (netCDF, HDF, GRIB, …), metadata harvester, metadata repository and discovery system; labelled flows: writes, reads, searches, references.]
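The harvester step in this diagram, reading a THREDDS catalog and turning its datasets into records for a metadata repository, can be sketched with the Python standard library. The catalog below is a hypothetical, trimmed example in the THREDDS InvCatalog 1.0 namespace; the sketch deliberately ignores per-dataset `serviceName` inheritance and assumes every dataset is offered by every concrete service.

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents (catalog.xml)
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog, trimmed to the elements this sketch reads
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Demo catalog" version="1.0.1">
  <service name="all" serviceType="Compound" base="">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
  </service>
  <dataset name="Demo collection">
    <dataset name="Monthly rainfall 2014" ID="demo/rain_2014" urlPath="demo/rain_2014.nc"/>
  </dataset>
</catalog>"""


def harvest(catalog_xml, host):
    """Return one discovery record per dataset that has a urlPath."""
    root = ET.fromstring(catalog_xml)
    # Concrete services only; the Compound service just groups them
    services = {
        s.get("serviceType"): s.get("base")
        for s in root.iterfind(".//t:service", NS)
        if s.get("serviceType") != "Compound"
    }
    records = []
    for ds in root.iterfind(".//t:dataset", NS):
        url_path = ds.get("urlPath")
        if url_path is None:
            continue  # container dataset with no data of its own
        records.append({
            "id": ds.get("ID"),
            "name": ds.get("name"),
            # Simplification: every dataset is assumed available via every service
            "access": {stype: host + base + url_path
                       for stype, base in services.items()},
        })
    return records


if __name__ == "__main__":
    for rec in harvest(SAMPLE_CATALOG, "https://example.org"):
        print(rec["name"], rec["access"]["OpenDAP"])
```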
12. Trans-disciplinary science
• To publish, catalogue and access self-documented data for
enhancing trans-disciplinary, big ecosystem-data science
within interoperable data services and protocols.
Integrity of Science
• Easier access to data enhances the scientist’s workflow and
supports more accurate and repeatable science, conducted
with less effort.
Integrity of Data
• The data repository services provide data integrity, digital
object identifiers, data discovery and catalogue searches.
13. For further information:
Brad Evans
Director ~ TERN e-MAST
bradley.evans@mq.edu.au
Tim Pugh
Australian Bureau of Meteorology
Centre for Australian Weather and Climate Research
t.pugh@bom.gov.au
14. Self-describing data
The network Common Data Form (netCDF) from Unidata
(http://www.unidata.ucar.edu) is an open-source geosciences file format.
NetCDF features that support data archives:
• Portable: byte-order neutral
• Efficient: random access
• Appendable data arrays
• Metadata within the file, as global and variable attributes
Metadata conventions provide community standards for:
• self-describing metadata (Climate and Forecast, CF, conventions)
• data discovery (Unidata Attribute Convention for Data Discovery, ACDD)
• community-specific metadata (e.g. IMOS, TERN AusCover)
• http://www.auscover.org.au/userdocs/metadata
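Because the metadata lives inside the file as global and variable attributes, conformance to these conventions can be checked mechanically. A minimal sketch, assuming the attributes have already been read into plain dictionaries (for example by a netCDF library, not shown). It checks only a small illustrative subset: the four global attributes ACDD 1.3 lists as "highly recommended", a CF version in `Conventions`, and `units` plus `standard_name` on each variable, where requiring `standard_name` is this sketch's own policy (CF treats it as optional).

```python
# Illustrative subset only, not a full CF or ACDD compliance checker.
# ACDD 1.3 marks these global attributes as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")
# Policy of this sketch: each data variable should carry units and a CF
# standard_name (CF itself treats standard_name as optional).
VARIABLE_ATTRS = ("units", "standard_name")


def check_metadata(global_attrs, variables):
    """Return a list of human-readable problems (empty list means none found).

    global_attrs: dict of global attribute name -> value
    variables: dict of variable name -> dict of that variable's attributes
    """
    problems = []
    for name in ACDD_HIGHLY_RECOMMENDED:
        if not global_attrs.get(name):
            problems.append(f"global: missing {name}")
    if "CF-" not in global_attrs.get("Conventions", ""):
        problems.append("global: Conventions does not name a CF version")
    for var, attrs in variables.items():
        for name in VARIABLE_ATTRS:
            if name not in attrs:
                problems.append(f"{var}: missing {name}")
    return problems


if __name__ == "__main__":
    good_globals = {
        "title": "Demo monthly air temperature",   # hypothetical dataset
        "summary": "Hypothetical example dataset.",
        "keywords": "air temperature",
        "Conventions": "CF-1.6, ACDD-1.3",
    }
    good_vars = {"tas": {"units": "K", "standard_name": "air_temperature"}}
    print(check_metadata(good_globals, good_vars))  # prints []
```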
15. Fundamental Objective of OPeNDAP
The fundamental objective of OPeNDAP and OPeNDAP Inc. is to
facilitate internet access to scientific data.
This is done by:
• Providing a protocol (DAP) to access data over the internet,
• Hiding the format (and organization) in which the data are stored from
the user, and
• Providing subsetting (and other) capabilities for the data at the server
OPeNDAP is based on a multi-tier architecture
OPeNDAP software is open source
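The server-side subsetting described above is driven entirely by the request URL: a response suffix (`.dds`, `.das`, `.ddx`, `.asc`, `.dods`) and an optional constraint expression that projects variables and slices them with DAP2 hyperslabs `[start:stride:stop]` (zero-based, stop inclusive). A minimal sketch of a URL builder; the dataset URL is hypothetical, and some HTTP clients or servers may additionally require the brackets to be percent-encoded.

```python
VALID_RESPONSES = {"dds", "das", "ddx", "asc", "dods"}


def dap2_url(dataset_url, response="dods", variables=None):
    """Build a DAP2 request URL.

    response: suffix appended to the dataset URL (see VALID_RESPONSES).
    variables: optional mapping of variable name -> list of
        (start, stride, stop) index triples, one per dimension;
        an empty list projects the whole variable.
    """
    if response not in VALID_RESPONSES:
        raise ValueError(f"unknown DAP2 response type: {response}")
    url = f"{dataset_url}.{response}"
    if not variables:
        return url
    projections = []
    for name, slices in variables.items():
        # DAP2 hyperslab syntax: one [start:stride:stop] per dimension
        hyperslab = "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
        projections.append(name + hyperslab)
    return url + "?" + ",".join(projections)


if __name__ == "__main__":
    base = "http://example.org/opendap/file.nc"  # hypothetical dataset URL
    print(dap2_url(base, "dds"))
    print(dap2_url(base, "dods", {"var1": [(0, 1, 10)]}))
```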
16. THREDDS Data Server (TDS)
• THREDDS stands for Thematic Real-time Environmental Distributed Data Services
• Middleware to bridge the gap between data providers and data users
• The THREDDS Data Server (TDS) is a web server that provides catalog, metadata, and data
access services for scientific datasets.
• The TDS is open source, 100% Java, and runs inside the open-source Tomcat servlet
container.
Unidata’s Common Data Model
• merges the OPeNDAP, netCDF, and HDF5 data models to create a common API for
scientific data
• implemented by the NetCDF-Java library
• reads netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3, GEMPAK,
McIDAS, GINI, among others
• a pluggable framework allows other developers to add readers for their own specialized
formats
• provides standard APIs for geo-referencing coordinate systems, and specialized queries
for scientific feature types such as Grid, Point, and Radial datasets
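To illustrate the "pluggable framework" idea in the Common Data Model bullets above, a minimal Python sketch of a reader registry that identifies a file's format from its leading bytes, so new formats can be added without touching the dispatch code. This is not NetCDF-Java's API (there, readers are Java classes implementing the `IOServiceProvider` interface); the registry and function names are hypothetical. The signatures used are the standard ones (netCDF classic `CDF\x01`, 64-bit offset `CDF\x02`, HDF5 `\x89HDF\r\n\x1a\n`, GRIB `GRIB`), but real readers are more careful, e.g. HDF5 signatures can also appear at later offsets and GRIB messages may be preceded by headers.

```python
from typing import Callable, List, Optional, Tuple

# Registry of (format name, predicate on the first bytes of a file)
_READERS: List[Tuple[str, Callable[[bytes], bool]]] = []


def register(name: str):
    """Decorator that plugs a new format detector into the registry."""
    def wrap(predicate: Callable[[bytes], bool]):
        _READERS.append((name, predicate))
        return predicate
    return wrap


@register("netCDF classic / 64-bit offset")
def _is_netcdf3(head: bytes) -> bool:
    return head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02")


@register("HDF5 (includes netCDF-4)")
def _is_hdf5(head: bytes) -> bool:
    return head[:8] == b"\x89HDF\r\n\x1a\n"


@register("GRIB")
def _is_grib(head: bytes) -> bool:
    return head[:4] == b"GRIB"


def identify(head: bytes) -> Optional[str]:
    """Return the first registered format whose predicate matches, else None."""
    for name, predicate in _READERS:
        if predicate(head):
            return name
    return None


if __name__ == "__main__":
    print(identify(b"CDF\x01\x00\x00\x00\x00"))
```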
17. Spectrum of Use Cases
Application Data Representation
- OGC data model: domain specific; geospatial, 1-D, 2-D
- DAP2 data model: domain neutral; n-D, time series
- **DAP4 data model: domain neutral; new data types and data structures; streaming, compressed, chunked
- Common Data Model (CDM): domain specific
- Future data model: domain neutral??
Application Types
- Programmatic / language API: FORTRAN, C/C++, Java, Python, NetCDF, Java NetCDF
- Programmatic / tools: NetCDF, NCO, PyDAP; custom tools: OPeNDAP crawler, ocean_prep
- Interactive data viewer: IDV, Panoply, IDL, MATLAB, IPython (matplotlib), NCL, web browser (metadata)
- Interactive analysis: MATLAB, IDL, IPython, NCL; custom application: Inundation Modeller
- Web application: Live Access Server, IMOS Data Portal (WMS), custom Java servlet
Programming
- DAP2 legacy code: existing tools
- DAP2 new code: new tools
- **DAP4 programming: legacy code support
- **DAP4 programming: new data model and protocols, streaming support
- **DAP4 programming: asynchronous access modes, server-side processing
Data Access Protocol
- Metadata request: das, dds, ddx
- ASCII/binary data request: simple data representation
- DAP binary object request
- NcML data request: aggregation, virtual data sets
- **DAP4: server-side operations, async access mode, new data model, posting
Syntax
- Return data set info: file.nc.dds (human-readable), file.nc.ddx (XML)
- Return ASCII data: file.nc.asc
- Select variables: file.nc.dods?var1,var2,var3
- Subset arrays: file.nc.dods?var1[0:1:10]
- Return file translations: file.nc.netcdf (NetCDF file)
- Server-side operations: file.nc?GEOLOC()
- Async access mode: ??
Clients
- Programmatic access: tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL, …
- Interactive access: web browser (catalog), MATLAB, IDL, Python, Panoply, …
Data Library & Catalog Service
- metadata harvesting, directory listings, remote THREDDS services
Web Service
- Java servlet, Java applet
Geospatial Information Service
- OPeNDAP data service
Analysis Service
- Live Access Server
Service Capabilities
- DAP2 response: metadata, dods, ASCII / binary
- **DAP4 response: async access mode, server-side, streaming
NcML
- Aggregation service
- Virtual data set service
- Remote data access
Metadata Conversion and RDF
- Metadata definitions, translations (-> ISO), semantics, ontology
- CF->ISO, CF->WMS, CF->WCS
Layered Services
- Catalogue service
- WMS, WCS services
- Authentication
Conformance Checks
- CF metadata check
- ISO metadata check
**DAP4 features listed are my estimation and not the official specification.
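The NcML aggregation and virtual data set services listed above work from small XML documents. A minimal sketch that generates an NcML "joinExisting" aggregation, which concatenates several netCDF files along an existing dimension (here `time`) into one virtual dataset. The element names and namespace follow NcML 2.2; the file names are hypothetical, and a real deployment would often use a `scan` element over a directory instead of listing files explicitly.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Serialize NcML elements with a default namespace instead of an ns0: prefix
ET.register_namespace("", NCML_NS)


def join_existing_ncml(files, dim_name="time"):
    """Return an NcML document aggregating `files` along an existing dimension."""
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    for path in files:
        # One nested <netcdf location="..."/> per member file, in time order
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": path})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical yearly files joined into one virtual time series
    print(join_existing_ncml(["rain_2013.nc", "rain_2014.nc"]))
```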