2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, Alvaro Diez Soto
1. Eurostat
How to collect data using SDMX?
Alvaro Diez Soto & Hubertus Cloodt
European Commission, DG Eurostat, the statistical office of the European Union
SDMX Experts Meeting
17-20 October 2016, Aguascalientes, Mexico
2. Eurostat
Overview
Data exchange and collection
Data exchange scenarios
SDMX and non-SDMX architecture and tools
Planned developments
April, 2016 2SIS-CC Workshop
4. Eurostat
Data exchange and collection
Data exchange
Data exchange is the process of taking data structured under a
source schema and actually transforming it into data structured
under a target schema, so that the target data is an accurate
representation of the source data.
Data collection
Data collection is the process of gathering and measuring
information on targeted variables in an established systematic
fashion, which then enables one to answer relevant questions
and evaluate outcomes.
October, 2015 4SDMX Tools Task Force
Source Wikipedia
5. Eurostat
Data exchange and collection
Requires a framework for:
Agreed structure (statistical concepts, nomenclatures)
Data provisions
Format and timeliness
Means for data exchange
The environment used for the physical exchange is the
implementation of the framework, while technically the data
for each data collection differ in:
size, reporting periods and frequency
structure, formats and sources used
lifecycle
Source Wikipedia
6. Eurostat
Data exchange and collection
Business needs result in requirements for:
Flexibility
Standardisation
Reference architecture
Modularity
Maintainability
7. Eurostat
Data exchange and collection in Eurostat
Developed software solutions to satisfy the needs to
support:
Various formats, size, structure and sources
Different reporting taxonomies
Based on SDMX and implementing SDMX Reference
architecture
Fully-fledged systems implemented in different
technologies (client and server software)
Covering multiple data life cycle scenarios
8. Eurostat
Data exchange scenarios
Data are "pushed" based on availability and provision
agreement
Data files are prepared by the providers and submitted to the
collecting organisation based on agreed deadlines for transmission
Data are "pulled" based on end-user need
Data are made available by the providers and transferred for direct
dissemination based on end-user request
Data are "pulled" for further processing
Data are made available by the providers and transferred for further
processing and dissemination based on agreed deadlines for
transmission
9. Eurostat
Why SDMX
Covers all the scenarios, known as "push" and "pull"
Provides a reference architecture for data dissemination,
database driven and Hub based exchange
Provides specifications for structural metadata
management (SDMX Registry) and exchange of data
using Web Services
11. Eurostat
Data push (simplified view)
[Figure: data push between Data provider and Eurostat. Labels: Create files, Send, Register the reception, Collection, Production, Dissemination, Dissemination management, Data processing, Metadata repository.]
12. Eurostat
Data pull (simplified view)
[Figure: data pull between Data provider and Eurostat. Labels: Data request, Data response, Collection, Production, Dissemination, Metadata repository.]
13. Eurostat
Data pull (simplified view)
[Figure: data pull between Data provider and Eurostat. Labels: Data request and registration, Data request, Data response, Collection, Production, Dissemination, Dissemination management, Data processing, Metadata repository.]
15. Software overview
[Figure: software overview between Data provider and Eurostat. Labels: Non-SDMX local data, NSI software, SDMX Converter, SDMX RI, SDMX RI/Web service, DSWS, Census Hub, Data Hub, EDAMIS, STRUVAL, SDMX converter, EDIT, ESS-MH, Data reporting (Push), Data Pull (Pull), SDMX data Push/Pull, Data processing, Dissemination/Transmission, Metadata. Legend: SDMX tools, Other software.]
17. Eurostat
EDAMIS – Infrastructure
[Figure: EDAMIS infrastructure between Data provider and Eurostat. Labels: EWA / Statel, EWP, EWF, ftp / http, https, Statel server, EDAMIS server, EDAMIS Web data transmission, EDAMIS Management (MIS, users, datasets), Data Servers, EDAMIS environment.]
18. Eurostat
EDAMIS – EWA / Statel
[Figure: transmission via EWA / Statel, numbered steps 1 to 9. Labels: NSI, Data, secure transmission, Acknowledgement, Eurostat, EDAMIS (Monitoring, Archive, Dispatching, Notification), Production Unit, Notification.]
19. Eurostat
EDAMIS - EWP / Portal
[Figure: transmission via the Web Portal, numbered steps 1 to 7. Labels: Office, Travel, Any Place, Data, Web Portal, Eurostat, eDAMIS (Monitoring, Archive, Dispatching, Notification), Production Unit.]
21. Eurostat
SDMX based collection
To support the data pull scenarios, we need tools to
support the following processes:
Metadata creation and management (metadata related tools)
Data compliance (create data in SDMX format)
Data reporting and dissemination (data and metadata exchange
and dissemination)
22. SDMX exchange
Data modelling (metadata related tools)
Process of creating SDMX artefacts: Concepts, Codelists, DSDs, MSDs, etc.
Tools like SDMX Registries, Data Structure Wizard
Data compliance (compliance related tools)
Process of creating SDMX data from data stored in files or database, using SDMX DSDs
Tools like SDMX Reference Infrastructure or SDMX Converter
Data & metadata reporting, validation and dissemination (data & metadata reporting tools)
Process of data & metadata transfer, exchange, sharing, validation and dissemination
Tools like SDMX-RI, Hub, ESS-MH, STRUVAL
23. Eurostat
Data modelling
Data Structure Wizard (DSW)
From your desktop
Online and offline mode
Can connect to a SDMX Registry
Local storage and maintenance of artefacts
Generation of sample data and templates
SDMX 2.0 and 2.1
Metadata related tools
24. Eurostat
Data modelling
SDMX Registry
Web application
Central storage and maintenance of artefacts
Used in the data exchange and production process
SDMX 2.0 and 2.1
Supports SOAP and REST queries
Can connect to other SDMX Registries
Metadata related tools
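As a rough illustration of the REST queries mentioned on this slide, the sketch below builds an SDMX 2.1 RESTful structure query URL. The base URL and the artefact identifier `DSD_EXAMPLE` are placeholders and not actual Registry values; the path layout (resource / agencyID / resourceID / version) and the `references` and `detail` parameters follow the SDMX 2.1 REST guidelines.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://registry.example.org/rest"

def structure_query_url(resource, agency_id="all", resource_id="all",
                        version="latest", references=None, detail=None):
    """Build an SDMX 2.1 REST structure query URL."""
    path = "/".join([BASE_URL, resource, agency_id, resource_id, version])
    params = {}
    if references:
        params["references"] = references   # e.g. "children" to also get codelists
    if detail:
        params["detail"] = detail           # e.g. "allstubs" for identification only
    return path + ("?" + urlencode(params) if params else "")

# A DSD together with the artefacts it references (placeholder identifiers).
url = structure_query_url("datastructure", "ESTAT", "DSD_EXAMPLE", "1.0",
                          references="children")
print(url)
```

Omitting the optional path segments falls back to the defaults `all` / `all` / `latest`, so `structure_query_url("codelist")` asks for the latest version of every codelist.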
26. Eurostat
Data compliance
SDMX Converter
GUI (desktop and web), API, CLI and WS
Input data stored in files
Convert from/to csv, xls*, xml, gesmes
Can connect to a SDMX Registry
Support templates and batch conversions
Mapping and transcoding
SDMX 2.0 and 2.1
SOAP WS
Compliance related tools
27. Eurostat
Data compliance
SDMX Reference Infrastructure (SDMX-RI)
Input data stored in DDB
Provides mapping between the internal data and SDMX DSD
From your desktop (Mapping Assistant and Test Client)
Mapping and transcoding
Export the data in SDMX file format
SDMX 2.0 and 2.1
Developed in .NET and Java
Compliance related tools
28. Process workflow
[Figure: process workflow for data compliance at the NSI. Labels: Non-SDMX local data, NSI software, NSI development, SDMX codes, Extract files, Transform file, SDMX file, SDMX Converter, SDMX-RI, Processing for sending, EDAMIS, HUB, Dissemination/Transmission, SDMX data. Legend: NSI developed software, Eurostat tools.]
29. Eurostat
Data reporting and dissemination
SDMX Reference Infrastructure (SDMX-RI)
Exposes data stored in DDB via a Web Service
SOAP 2.0 and 2.1, REST 2.1
Export the data in SDMX file format
Hub
Single dissemination point for Census data
Data stored in the Member States (MSs)
SOAP 2.0
Export the data in SDMX file format
Data & metadata reporting tools
30. Eurostat
Data reporting and dissemination
Structural validation (STRUVAL)
Structural validation of SDMX-ML 2.0 and 2.1*
SOAP-based Web service
Creates validation reports
ESS-MH
Reference metadata reporting tool
Dynamically generated reporting structure
Supports ESMS, ESQRS and user defined reports
Data & metadata reporting tools
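To make the idea of structural validation more concrete, here is a minimal sketch of checking observations against a structure definition and producing a report. It does not reproduce STRUVAL itself: the DSD is reduced to a hypothetical dictionary of dimensions and allowed codes, and the observations are plain dictionaries rather than SDMX-ML.

```python
# Hypothetical, heavily simplified DSD: dimension id -> allowed codes.
DSD = {
    "FREQ": {"A", "Q", "M"},
    "GEO": {"BE", "MX"},
    "AGE": {"Y_LT15", "Y15-64", "Y_GE65"},
}

def validate(observations, dsd):
    """Return a list of findings; an empty list means no structural errors."""
    report = []
    for i, obs in enumerate(observations, start=1):
        # Every dimension must be present and carry a code from its codelist.
        for dim, codes in dsd.items():
            if dim not in obs:
                report.append(f"obs {i}: missing dimension {dim}")
            elif obs[dim] not in codes:
                report.append(f"obs {i}: code {obs[dim]!r} not in codelist for {dim}")
        # Anything that is neither a dimension nor the measure is unexpected.
        for key in obs:
            if key not in dsd and key != "OBS_VALUE":
                report.append(f"obs {i}: unknown component {key}")
    return report

observations = [
    {"FREQ": "A", "GEO": "BE", "AGE": "Y_LT15", "OBS_VALUE": "10"},
    {"FREQ": "A", "GEO": "XX", "OBS_VALUE": "5"},
]
for finding in validate(observations, DSD):
    print(finding)
```

The first observation passes; the second is reported twice, once for the unknown `GEO` code and once for the missing `AGE` dimension.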
31. Process workflow
[Figure: process workflow for data reporting and dissemination, from NSI to Eurostat. Labels: Non-SDMX local data, NSI software, SDMX RI, DSWS, Census Hub, Data Hub, STRUVAL, SDMX converter, EDIT, ESS-MH, SDMX data Push/Pull, Data processing, Dissemination/Transmission, Data reporting, Data dissemination. Legend: Other software, Eurostat tools.]
32. Eurostat
Pull solution
Exchange of data based on web service
Triggered by the data collector
Based on SDMX SOAP 2.0 queries
Can pull data for direct data dissemination (Census Hub)
Can pull data for further processing (Data Hub)
Uses SDMX-RI as data provider software
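The pull approach described above can be sketched as a collector that sends one query to several providers and merges the answers. This is only an illustration under stated assumptions: the two provider functions are stand-ins for NSI web services (a real hub would send SDMX queries over the network), and the names `PROVIDERS` and `pull` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for NSI web services; no network access is involved here.
def nsi_be(query):
    return [{"GEO": "BE", "AGE": query["AGE"], "OBS_VALUE": 100}]

def nsi_mx(query):
    return [{"GEO": "MX", "AGE": query["AGE"], "OBS_VALUE": 250}]

PROVIDERS = {"BE": nsi_be, "MX": nsi_mx}

def pull(query, countries):
    """Dispatch one query to each selected provider and merge the responses."""
    with ThreadPoolExecutor() as pool:
        # The collector triggers the exchange: one request per provider.
        futures = {c: pool.submit(PROVIDERS[c], query) for c in countries}
        merged = []
        for c in countries:          # collect in request order for a stable result
            merged.extend(futures[c].result())
    return merged

result = pull({"AGE": "Y_LT15"}, ["BE", "MX"])
print(result)
```

Whether the merged result is shown directly to the end user or stored for further processing is what distinguishes the two pull scenarios; the dispatch step is the same in both.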
33. Eurostat
How the Census Hub works
[Figure: the Eurostat Census Hub and several National Statistical Institutes.]
34. Census vs Data Hub
[Figure: Census Hub and Data Hub architecture between Data Provider and Eurostat. Labels: Dissemination database, Mapping store, Mapping Assistant, Test Client, Web Client, WEBSERVICES, Euro SDMX Registry, Metadata repository, Census Hub, Data Hub, Query and transmission management, Execution plans, Query dispatcher, Query executor, Edamis WS, Transmission registration, Data request, Data response, Data Pull. Legend: Metadata flow, Data Hub flow.]
35. Eurostat
SDMX Reference Infrastructure
Set of IT modules, allowing a statistical office to
transform the data into SDMX format and to expose data
in SDMX format to the external world
Modular architecture, developed in both Java and .NET
Supports different database vendors
Supports SDMX 2.0 and in the future SDMX 2.1
Allows data collector organisation to access and retrieve
data on demand (pull approach)
Open Source Software – free of charge
In use: National Statistical Offices, Eurostat dissemination
chain, UN, etc.
37. Eurostat
SDMX Reference Infrastructure
Set of IT modules, allowing a statistical office to transform the data into SDMX format and to expose data in SDMX format to the external world
Can map dissemination DBs to SDMX structures
Provides tools to browse the statistical data
38. SDMX – RI modules and functionalities
- Mapping Assistant
Stores the SDMX structures agreed for the data exchange process
Allows users to define subsets of data to be disseminated
Creates and stores mappings between the internal data structure and SDMX concepts (e.g. My_column_A = AGE)
Creates and stores mappings between the internal classifications and SDMX codelists (e.g. My_code_AB = Y_LT15)
Result:
Control the exposed data
Preview the data in SDMX format
Identify errors
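The two kinds of mapping on this slide (local column to SDMX concept, local code to SDMX code) can be sketched in a few lines. This is not how the Mapping Assistant stores or applies its mappings; it only illustrates the idea, reusing the slide's examples `My_column_A = AGE` and `My_code_AB = Y_LT15`. All other column names and codes are invented.

```python
# Column-to-concept mapping; only My_column_A comes from the slide.
COLUMN_TO_CONCEPT = {
    "My_column_A": "AGE",
    "My_column_B": "GEO",
    "My_value": "OBS_VALUE",
}

# Transcoding rules per concept; only My_code_AB comes from the slide.
TRANSCODING = {
    "AGE": {"My_code_AB": "Y_LT15"},
    "GEO": {"Belgium": "BE"},
}

def to_sdmx(row):
    """Rename local columns to SDMX concepts and transcode local codes."""
    out = {}
    for column, value in row.items():
        concept = COLUMN_TO_CONCEPT[column]   # KeyError flags an unmapped column
        # Values without a transcoding rule (e.g. the measure) pass through.
        out[concept] = TRANSCODING.get(concept, {}).get(value, value)
    return out

print(to_sdmx({"My_column_A": "My_code_AB", "My_column_B": "Belgium", "My_value": 42}))
# {'AGE': 'Y_LT15', 'GEO': 'BE', 'OBS_VALUE': 42}
```

Keeping the mappings as data rather than code is what lets a tool preview the result and spot unmapped columns or codes before anything is exposed.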
39. SDMX – RI modules and functionalities
- Test Client
Allows users to view and extract data in SDMX format, using the mappings defined in the Mapping Assistant tool
Can extract data directly from the dissemination database
Can extract data using a web address (web service)
Result:
Allows testing of the data dissemination process
Test for SDMX compliance
Create custom extraction
Identify errors
Extract data in different formats
40. SDMX – RI modules and functionalities
- Web Client
Allows users to view and extract data in SDMX format, using the mappings defined in the Mapping Assistant tool
Provides a user-friendly interface, even for inexperienced users
Can extract data using a web address (web service)
Result:
Allows testing of the data dissemination process
Test for SDMX compliance
Create custom extraction
Identify errors
Extract data in different formats
41. SDMX – RI modules and functionalities
- (NSI) Web service
No graphical user interface
Modules invisible to the user, controlling the incoming data requests
Retrieving SDMX structure and mappings
Retrieving data from the dissemination database
Generating data response messages
Sending data in SDMX format
Result:
Data are made available to different data consumers via the internet
43. Eurostat
Future…
Further integration of SDMX in the statistical production
processes, used within and outside ESS, supporting global
data sharing between:
Statistical offices, agencies and national banks
International organisations
Improve quality of data exchange by introducing SDMX
compliant validation services
Maintain and further develop generic SDMX tools that
support SDMX implementation projects.
45. The slides hereafter are just for information and only available in English!
46. Data Structure Wizard (DSW) – usage
Offline mode
Creation and maintenance of SDMX artefacts: Data
Structure Definitions, Code Lists, Concept
Schemes, Data Flows, Hierarchical Code lists, etc.
Import/export DSDs
Online mode
Connection to SDMX Registry to update local
repository
Submission of artefacts to SDMX Registry
48. Euro SDMX Registry – usage
Repository of SDMX artefacts (DSDs, standard
code lists)
Used for SDMX-based data/metadata exchange by
Eurostat and Member States
Enabling IT applications, organisations (NSIs) and
individuals
To share data and metadata structures and other
SDMX artefacts
To allow applications to subscribe for notifications
49. Euro SDMX Registry – functionality
Search of artefacts
Upload and download of SDMX artefacts
Web service interface for machine to machine
interaction
Subscriptions to artefacts
50. Home page
[Screenshot: Registry home page. Callouts: Most recent items; Access to the content of the Registry: text search; Access to the content of the Registry by type.]
51. SDMX Converter – usage
Mainly developed to convert from/to SDMX
Continuously extended to offer new functionality,
conversion capabilities and supported formats
Has grown into an important tool for many data
exchange systems and processes
Supported formats:
SDMX-ML 2.0 and 2.1 formats
GESMES/TS, GESMES/2.1, GESMES/DSIS
CSV, FLR, DSPL, Excel
52. SDMX Converter – functionality
Reading input messages
Parsing & populating internal SDMX data model
Writing output messages
Writing in target format
Importing Data Structure Definition (DSD)
Provided locally or retrieved from a Registry
4 modes of operation
Graphical User Interface, Command Line,
Application Programming Interface, Web Service
53. SDMX-RI tools – usage
Mainly developed to support data exchange via
web services in SDMX-ML format
Cornerstone of the European Census Hub
Growing number of use cases
ESS.VIP.BUS ICT dissemination demo
Eurostat's Dissemination Web Service
SDMX data file creation by certain statistical offices
Adoption by Member States and international
organisations
55. SDMX-RI tools – functionality
Set of building blocks
Allowing an organization to expose data to third
parties (via Web Service)
Supporting mapping of dissemination databases to
given structural metadata (via Mapping Assistant)
Testing mappings and web services and exporting
data in SDMX format (via Test Client)
Browsing statistical data (via Web Client)
Supports
SDMX v2.0 (and shortly v2.1) WS guidelines
Java and .NET
56. SDMX Converter – functionality
• Convert from/to SDMX based on DSD
• Reading input data files and writing output files
• Supported formats:
• SDMX-ML 2.0 and 2.1 formats
• GESMES/TS, GESMES/2.1, GESMES/DSIS
• CSV, FLR, DSPL, Excel
• 4 modes of operation
• Graphical User Interface, Command Line,
Application Programming Interface, Web Service
58. Eurostat
Hub approach – PULL method for data collection and
dissemination
[Figure: PULL and PUSH paths from NSI to Eurostat. Labels: Pull Requestor, eDAMIS, Data Input, SDMX Registry, Intermediate storage, Verification / Conversion to SDMX, Received data in SDMX-ML, Loader, register, Warehouse storage, Eurobase, query, Dissemination, XSL for SDMX-ML, Hub Dissemination.]
59. Eurostat
ESTAT SDMX tools cover:
1) Data dissemination scenario
[Figure: GSBPM phases: Collect, Process, Analyse, Disseminate, Evaluate.]
60. Eurostat
ESTAT SDMX tools cover:
2) Database driven architecture
[Figure: database driven architecture. Labels: Data Providers (three, each with Database, WS and SDMX-ML data file), SDMX Registry, Provisioning metadata, Notification, Collection organization, Pull requestor, Data warehouse, website. GSBPM phases: Collect, Process, Analyse, Disseminate, Evaluate.]
61. Eurostat
ESTAT SDMX tools cover:
3) Data Hub driven architecture
[Figure: Data Hub driven architecture. Labels: Data Providers (three, each with Database and WS), SDMX Registry, Provisioning metadata, Notification, HUB. GSBPM phases: Collect, Process, Analyse, Disseminate, Evaluate.]
62. Eurostat
Interoperability Architecture
Support and service improvements
EU Public Licence and others
Community and forum pages
Shared/collaborative development
Security and availability
Shared services, housing, hosting
Auditing & Operations
Modular & interoperable
Reference architecture
Strategy
We will talk about the generic topic of data exchange and collection, with some general terms and particularities. We will then discuss how this process is organised within ESTAT and the different modes of data collection and exchange used, with special attention to the place of SDMX as a standard, reference architecture and implementation/tools.
Planned developments
Function
It is a mini web server including a database and provides a user interface where users can perform the data delivery and view the status of their data file transmission.
For the data exchange the files are encapsulated in an "envelope", which is an XML file. Because of this principle it is possible to exchange any file type. The XML file is afterwards handed over to the transport component Statel.
The key concept of Statel is a so-called Virtual File System or VFS. This VFS presents a unified view of the files (and only those files) that have been exchanged between the EWA and the EDAMIS / Stadium end system in ESTAT.
Activity
Preparation of the statistical data in the IT environment of the NSI.
The user logs on to the EWA client application installed in the NSI IT environment.
The user loads the data files into the EWA application via a transmission menu and selects, among other things, the EDAMIS name of the transferred file, which follows the EDAMIS naming convention. The original file name is transformed into the EDAMIS name.
The data transfer is performed. Internally the data file is wrapped in an "envelope" containing metadata such as date, time, name of EWA, original file name, user, etc. Finally the file is handed over to the transport layer (Statel).
The transport layer (Statel) forwards the file to its counterpart in EC / ESTAT. The transmission is protected as the data of the envelope is encrypted.
The EDAMIS application takes the file from the transport layer and starts the file processing, which consists of several steps.
One processing step is to inform the sender about the delivered file via an email acknowledgement.
In a next step the file is transferred to the IT environment of the production Unit.
Users in the production Unit get an email notification about the delivered file.
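The "envelope" idea in these notes, an XML file that carries metadata plus an arbitrary payload, can be sketched as follows. The element names, the use of Base64 for the payload and the function names are assumptions made for illustration; the notes do not describe the actual EDAMIS envelope format, and the encryption step is left out.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def wrap_in_envelope(payload: bytes, original_name: str,
                     edamis_name: str, user: str) -> bytes:
    """Wrap arbitrary file content in a small XML envelope (invented layout)."""
    env = ET.Element("envelope")
    meta = ET.SubElement(env, "metadata")
    ET.SubElement(meta, "timestamp").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(meta, "originalFileName").text = original_name
    ET.SubElement(meta, "edamisName").text = edamis_name
    ET.SubElement(meta, "user").text = user
    # Base64 (an assumption) lets any file type travel inside an XML document.
    content = ET.SubElement(env, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(env, encoding="utf-8")

def unwrap(envelope: bytes) -> bytes:
    """Recover the original file content from the envelope."""
    root = ET.fromstring(envelope)
    return base64.b64decode(root.findtext("content"))

data = b"\x00\x01 any binary or text content"
envelope = wrap_in_envelope(data, "population.csv", "EXAMPLE_DATASET_A", "jdoe")
assert unwrap(envelope) == data
```

Because the payload is opaque to the envelope, the transport layer only ever deals with one file type, which is the property the notes point to when they say any file type can be exchanged.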
Function
It is used for managing the dataset inventory, managing the user rights related to the transmissions, monitoring the traffic through its Management Information System (MIS) and transferring data files between Eurostat and Member States.
The dataset inventory is the basis for all data transmissions. It contains descriptive information about statistical domains and related datasets as well as links between datasets and countries.
All EDAMIS users with corresponding rights can send data via the user interface. Three transfer features are available.
For the data exchange the files are encapsulated in an "envelope", which is an XML file. Because of this principle it is possible to exchange any file type. The XML file is afterwards handed over to the internal transport layer.
Activity
Preparation of the statistical data in the IT environment of the user.
The user logs on to the EDAMIS Web Portal using the secure https protocol.
The user loads the data files into the EWP application via a transmission menu and selects, among other things, the EDAMIS name of the transferred file, which follows the EDAMIS naming convention. The original file name is transformed into the EDAMIS name. The data transfer is performed. On the EDAMIS Web server the data file is wrapped in an "envelope" containing metadata such as date, time, original file name, user, etc. Finally the file is handed over to the internal transport layer.
The EDAMIS application takes the file from the internal transport layer and starts the file processing, which consists of several steps.
One processing step is to inform the sender about the delivered file via an email acknowledgement.
In a next step the file is transferred to the IT environment of the production Unit.
Users in the production Unit get an email notification about the delivered file.
Data hub – module to create and execute SDMX queries
Metadata repository – storage of structural metadata
Query and transmission management – module to schedule the queries and transmission plans; administration of connection properties
Query dispatcher – Edamis WS client
Admin users are authenticated using an LDAP/ECAS server. Public users are authenticated using credentials stored in the Census Hub database. Notifications to users are sent using an SMTP server.
The meaningful modules respecting the high cohesion architectural goal are:
Offline downloads Module: A user can run a query asynchronously. This Architectural pattern is known as asynchronous request and aims at reusing existing assets defined in the generic architectural goals. This module uses the Census DB Architectural mechanism in order to store the data.
Administration Module: An Admin User (located in the Eurostat tier) can:
Manage specific configuration of the Census System: These configurations can act on SMTP, Web Services’ Endpoints, and Proxy configuration […].
This module uses the Census DB Architectural mechanism in order to store the data.
Multilingual support Module: This Module allows a User to select a different language that will translate the labels displayed by the User Interface. By default, the operating system language will be selected.
Local NSI Web Service: This Web Service aims at fetching the statistical data stored in the Local NSI Database.
Local NSI Database: This local Database contains the statistical data for testing purposes
DLBB: The Data Loader Building Block is a middleware mechanism that was developed in order to import datasets into the local NSI Database.
LAU Management Module: This tool should allow the countries to manage and update the LAU codes they will be using for the Census 2011 regulation. An ESTAT user (the LAU Validator) must be able to validate the changes made by the NSIs (LAU Managers) before they are published and used in the Census Hub. This module is independent from the Census Hub application.
SMD Application: The tool manages codelists, DSDs and concepts in the SMD database. The application also manages the principal marginals and hypercube categorization. Only the SMDManager has access to this application.
SMD WebService: The web service pulls the codelist, DSD and concept data from the SMD database and returns them as SDMX artefacts.
Reading input messages -> populating the internal SDMX model -> writing in the target format
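The three stages named in this note (read, populate an internal model, write) can be sketched as a pair of pluggable functions. This is not the SDMX Converter's code or API: the internal model is reduced to a list of dictionaries, and the writer emits a simplified XML layout rather than real SDMX-ML.

```python
import csv
import io
import xml.etree.ElementTree as ET

def read_csv(text):
    """Reader: parse CSV input into the internal model (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def write_xml(observations):
    """Writer: serialise the internal model as simple XML (not real SDMX-ML)."""
    root = ET.Element("DataSet")
    for obs in observations:
        ET.SubElement(root, "Obs", obs)   # each CSV column becomes an attribute
    return ET.tostring(root, encoding="unicode")

def convert(text, reader=read_csv, writer=write_xml):
    """Pipeline: input message -> internal model -> target format."""
    return writer(reader(text))

print(convert("FREQ,GEO,OBS_VALUE\nA,BE,10\nA,MX,20\n"))
```

Routing every conversion through one internal model means that supporting a new format takes one reader and one writer, rather than a converter for every pair of formats.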
Retrieve the DSD from a structural metadata source (e.g. an SDMX Registry), and create database tables.
Read an SDMX data set file and load the data into the database
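The two steps above (create tables from a DSD, then load a data set) can be sketched with an in-memory SQLite database. The DSD is a hypothetical, simplified dictionary, and the observations are assumed to have been parsed from the SDMX file already; retrieving the DSD from a Registry and parsing SDMX-ML are outside the sketch.

```python
import sqlite3

# Hypothetical, simplified DSD: ordered dimension ids plus one measure.
DSD = {
    "id": "DSD_EXAMPLE",
    "dimensions": ["FREQ", "GEO", "TIME_PERIOD"],
    "measure": "OBS_VALUE",
}

def quote(name):
    """Quote an SQL identifier (identifiers cannot be bound as parameters)."""
    return '"' + name.replace('"', '""') + '"'

def create_table(conn, dsd):
    """Create one table per DSD: a text column per dimension, a numeric measure."""
    dims = [quote(d) for d in dsd["dimensions"]]
    columns = [d + " TEXT NOT NULL" for d in dims]
    columns.append(quote(dsd["measure"]) + " REAL")
    columns.append("PRIMARY KEY (" + ", ".join(dims) + ")")
    conn.execute("CREATE TABLE " + quote(dsd["id"]) + " (" + ", ".join(columns) + ")")

def load(conn, dsd, observations):
    """Insert observations (dicts keyed by component id) into the DSD table."""
    names = dsd["dimensions"] + [dsd["measure"]]
    placeholders = ", ".join("?" for _ in names)
    rows = [tuple(obs[n] for n in names) for obs in observations]
    conn.executemany(
        "INSERT INTO " + quote(dsd["id"]) + " VALUES (" + placeholders + ")", rows
    )

conn = sqlite3.connect(":memory:")
create_table(conn, DSD)
load(conn, DSD, [
    {"FREQ": "A", "GEO": "BE", "TIME_PERIOD": "2011", "OBS_VALUE": 10.5},
    {"FREQ": "A", "GEO": "MX", "TIME_PERIOD": "2011", "OBS_VALUE": 20.0},
])
total = conn.execute('SELECT SUM("OBS_VALUE") FROM "DSD_EXAMPLE"').fetchone()[0]
print(total)
```

Using the dimensions as the primary key mirrors the SDMX rule that a series key plus time period identifies an observation, so loading the same observation twice fails instead of silently duplicating it.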
Data discovery system continually synchronises its metadata with the structural metadata source. A user makes a data selection from choices built from the information held in an SDMX Registry (structural metadata such as category scheme, dataflow, DSD, data provider, provision agreements and data registration)
These choices are logical choices, built from the dimension selections.
The logical choice is formatted as an SDMX data query. This is passed to the Data Base which responds with an SDMX data set.
Reference metadata relevant to the data returned is retrieved from a metadata repository.
The data and metadata are passed to a visualization tool to display the data in tables, charts, graphs, maps etc. Often a download is offered in various formats. The download options often include also the DSD or MSD.
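The step in which the logical choice is formatted as an SDMX data query can be sketched as below. The sketch uses the SDMX 2.1 REST syntax because it is compact: in the key, dimension values are separated by `.` in DSD order, `+` separates alternative values and an empty position is a wildcard. The base URL, dataflow id and dimension order are placeholders, and a SOAP-based service would carry the same selection in a query message instead.

```python
from urllib.parse import urlencode

BASE_URL = "https://nsi.example.org/rest"   # hypothetical endpoint
DIMENSION_ORDER = ["FREQ", "GEO", "AGE"]    # assumed order of dimensions in the DSD

def data_query_url(flow, selection, start=None, end=None):
    """Turn dimension selections into an SDMX 2.1 REST data query URL."""
    # One position per dimension; unselected dimensions stay empty (wildcard).
    key = ".".join("+".join(selection.get(d, [])) for d in DIMENSION_ORDER)
    params = {}
    if start:
        params["startPeriod"] = start
    if end:
        params["endPeriod"] = end
    url = f"{BASE_URL}/data/{flow}/{key}"
    return url + ("?" + urlencode(params) if params else "")

url = data_query_url(
    "DF_EXAMPLE",
    {"FREQ": ["A"], "AGE": ["Y_LT15", "Y_GE65"]},
    start="2011",
)
print(url)
# https://nsi.example.org/rest/data/DF_EXAMPLE/A..Y_LT15+Y_GE65?startPeriod=2011
```

Here `GEO` was left unselected, so its position in the key is empty and the query matches every country.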
Follow the approach of building blocks; cross border framework; reusable solutions with low maintenance cost; enhancements easy to implement as plug-ins; similar to MS software and compatible with environment; follows common components and reference framework/architecture established in the standard.
Aims at following "Towards a European Interoperability Architecture".
Interoperability solutions for European public administrations.
ISA² started on 1 January 2016 and will last until 31 December 2020.
The promotion of the use and maintenance of the European Interoperability Strategy (EIS), the European Interoperability Framework (EIF), the European Interoperability Reference Architecture