SlideShare a Scribd company logo
1 of 26
Download to read offline
Big Data and Nowcasting
Table 3 New and traditional data sources for local indicators –
is it possible reconciling small areas and big data?
Dario Buono*, European Commission, Eurostat
dario.buono@ec.europa.eu @darbuono
*The views expressed are the author’s alone and do not necessarily correspond to those of the corresponding organisations of affiliation
Workshop on Small Area Methods and living conditions indicators in
European poverty studies in the era of data deluge and Big Data
Pisa, 8-10 May 2018
Eurostat, the Statistical Office of EU
• About 700 people with 28 different nationalities
• Statistical Office of European Union, part of EC
• Core business:
• Euro-zone (19) & EU (28) aggregates
• harmonization, best practices, guidelines,
trainings & international cooperation, grants
research funds
• Methodology team: Time Series, Econometrics,
Data Analytics, Use of R in Statistical Offices
Digital (r) Evolution?
Eurostat
State-of-play
Explosion of information around us
Popularity of internet and other new technologies
Multitude of new data sources
Is change needed?
Do not start with data, but
start with a good question!
Why interested in Big Data for nowcasting?
• Big Data are complementary information to standard
data, being based on different information sets
• More granular perspective on the indicator of interest,
both in the temporal and cross-sectional dimensions
• It is timely available, generally not subject to revisions
A Big Data expert is… actually a team!
….ended in these research activities
Can Big Data help for Macroeconomic Nowcasting?
What are the potential Big Data sources?
1. Literature review
2. Models/methods to be used for Big data
3. Recommendations on how to handle Big Data
4. Case study: IPI, Inflation, unemployment of some EU
countries
Big Data types & dimensionality
• When the dimensionality increases, the volume of the space
increases so fast that the available data become sparse.
• Use of a typology based on Doornik and Hendry (2015):
• Tall data: many observation, few variables
• Fat data: many variables, few observations
• Huge data: many variables, many observations
Big Data Types potentially
interesting for Macro Nowcasting
Financial markets data Scanner prices data
Electronic payments data Online prices data
Mobile phones data Online search data
Sensor data Textual data
Satellite images data Social media data
Types of Big Data by Dominant Dimension
Type Fat Tall Huge
Financial Markets X
Electronic Payments X
Mobile Phones X X
Sensor Data / IoT X X
Satellite Images X
Scanner Prices X X
Online Prices X X
Online Search X
Fat (big cross-sectional
dimension, N, small
temporal dimension, T)
Tall (small N, big T)
Huge (big N, big T)
Eurostat
Models race
• Dynamic Factor Analysis
• Partial Least Squares
• Bayesian Regression
• LASSO regression
• U-Midas models
• Model averaging
255 models tested, macro-financial & google trend data
From Data Access to Modelling
Step-by-step approach, accompanied by specific
recommendations for the use of big data for
macroeconomic nowcasting, guiding to
• the identification and the choice of Big Data
• pre-treatment and econometric modelling
• the comparative evaluation of results to obtain a very useful
tool for decision about the use or not of Big Data
Step 1: Big Data usefulness within
a nowcasting exercise
Recommendations
1. Evaluate the quality of the existing nowcasts
and identify issue (bias or inefficiency or large
errors in specific periods), that can be fixed by
adding information in Big Data based indicators
2. Use of Big Data only when expecting to improve
the timeliness and/or the quality of nowcastings
3. Do not consider Big Data sources with spurious
correlations with the target variable
Step 2: Assessment of big-data
accessibility and quality
Recommendations
1. Privilege data providers with guarantee of continuity and of the
availability of a good metadata associated to the Big Data
2. Privilege Big Data sources ensuring sufficient time and cross-
sectional coverage
3. If a bias is observed a bias correction can be included in the
nowcasting strategy.
4. To deal with possible instabilities of the relationships between the
Big Data and the target variables, nowcasting models should be re-
specified on a regular basis (e.g. yearly) and occasionally in the
presence of unexpected events.
Step 3: Big data preparation
Recommendations
1. Big data often unstructured: proper mapping
2. Pre-treatment to remove deterministic patterns
• Outliers, calendar effects, missing observations
• Seasonal and non-seasonal short-term movements should
be dealt accordingly to the characteristic of the target
variable
3. Create a specific IT environment where the original
data are collected and stored with associated routines
4. Ensure the availability of an exhaustive
documentation of the Big Data conversion process
Step 4: Big Data modelling strategy
Recommendations
1. Identification of appropriate econometric techniques
2. First dimension: choice between the use of methods suited for
large but not huge datasets, therefore applied to summaries of the
Big Data (Google Trends)
• nowcasting with large datasets can be based on factor models,
large BVARs, or shrinkage regressions
3. Huge datasets can be handled by sparse principal components,
linear models combined with heuristic optimization, or a variety of
machine learning methods such as LASSO & LARS regression
4. In case of mixed frequency data, methods such as UMIDAS and,
as a second best, Bridge, should be privileged.
Step 5: Results evaluation of Big Data
based nowcasting
Recommendations
1. Run a critical and comprehensive assessment of the
contribution of Big Data for nowcasting the indicator of interest
based, e.g., on standard criteria such as MSE or MAE.
2. In order to reduce the extent of data and model snooping, a cross-
validation approach should be followed:
• various models and indicators, with and without Big Data,
estimated over a first sample and selected and/or pooled
according to their performance
• then the performance of the preferred approaches re-evaluated
over a second sample
Case study: No News, Good News?
- Implementation of all these steps for nowcasting IP growth, inflation
and unemployment in several EU countries in a pseudo out of
sample context, using Google trends for specific and carefully
selected keywords for each country and variable
-
- Overall, the results are mixed but there are several cases where
Google trends, when combined with rather sophisticated econometric
techniques, yield forecasting gains, though generally small.
Eurostat
Statistical Methods: findings
• Big Data specific features: transform unstructured into
structured data, time series decompositions, handling mixed
frequency data
• Sparse regression (LASSO) works for fat, huge data
• Data reduction techniques (PLS) helpful for large variables
• (U)-MIDAS or bridge modelling for mixed frequency data
• Dimensionality reduction improves nowcasting
http://ec.europa.eu/eurostat/publications/sta
tistical-working-papers
https://github.com/eurostat/econowcast
Some other lessons learned..
 Data access is challenging
 Quality of sources is an issue
o Timeliness gains
o Granularity benefits
o Validation impacts
 Data mesh-ups: new products
 New metrics: measure the unmeasured
Focus on
New Skills
for
New
Methods
for
New Data
…but it is also about culture & management
automate
model
document
version
metadata
program
process
reuse
share
test
How are resources (mis)used:
Duplication? Redundancy?
Cost?
?
Open questions for debate
• Reversing the approach: starting from the data
rather than from the goal?
• How can a statistician become a data scientist?
• Are Big data more useful for benchmarking
purposes or rather to measure new phenomena?
• Design based to model based to algorithm based:
maybe there is a possible link between SAE and
statistical learning?
• How to measure data uncertainty?
Dario.Buono@ec.europa.eu
@darbuonoorrespond to those of the
corresponding organisations of affiliation

More Related Content

What's hot

Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentationPiet J.H. Daas
 
Politics of numbers in the framework of redd+
Politics of numbers in the framework of redd+Politics of numbers in the framework of redd+
Politics of numbers in the framework of redd+CIFOR-ICRAF
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsIstituto nazionale di statistica
 
New Data for Innovation Policy
New Data for Innovation PolicyNew Data for Innovation Policy
New Data for Innovation PolicyJuan Mateos-Garcia
 
Arloesiadur: An analytics experiment in innovation policy
Arloesiadur: An analytics experiment in innovation policyArloesiadur: An analytics experiment in innovation policy
Arloesiadur: An analytics experiment in innovation policyJuan Mateos-Garcia
 
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityBig Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityData Con LA
 
Big Data & Text Analytics - Lesson Schedule
Big Data & Text Analytics - Lesson ScheduleBig Data & Text Analytics - Lesson Schedule
Big Data & Text Analytics - Lesson ScheduleMichael Lew
 
J. Van der Valk - From Labour Force Survey to Labour Market Statistics
J. Van der Valk - From Labour Force Survey to  Labour Market Statistics J. Van der Valk - From Labour Force Survey to  Labour Market Statistics
J. Van der Valk - From Labour Force Survey to Labour Market Statistics Istituto nazionale di statistica
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaPiet J.H. Daas
 
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...Tilastokeskus
 
Query reverse engineering in the context of the semantic web
Query reverse engineering in the context of the semantic webQuery reverse engineering in the context of the semantic web
Query reverse engineering in the context of the semantic webLeandro Tabares-Martin
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...Mindtrek
 
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsBig Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsPiet J.H. Daas
 
StatMine, visual exploration of output data
StatMine, visual exploration of output dataStatMine, visual exploration of output data
StatMine, visual exploration of output dataEdwin de Jonge
 
Session 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluationSession 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluationEconomic Research Forum
 
IAOS 2018 - Continuing to unlock the potential of new and existing data sourc...
IAOS 2018 - Continuing to unlock the potential ofnew and existing data sourc...IAOS 2018 - Continuing to unlock the potential ofnew and existing data sourc...
IAOS 2018 - Continuing to unlock the potential of new and existing data sourc...StatsCommunications
 
New data for innovation policy SPRU 50th presentation
New data for innovation policy SPRU 50th presentationNew data for innovation policy SPRU 50th presentation
New data for innovation policy SPRU 50th presentationJuan Mateos-Garcia
 

What's hot (20)

Strata Big data presentation
Strata Big data presentationStrata Big data presentation
Strata Big data presentation
 
Politics of numbers in the framework of redd+
Politics of numbers in the framework of redd+Politics of numbers in the framework of redd+
Politics of numbers in the framework of redd+
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European Statistics
 
New Data for Innovation Policy
New Data for Innovation PolicyNew Data for Innovation Policy
New Data for Innovation Policy
 
Arloesiadur: An analytics experiment in innovation policy
Arloesiadur: An analytics experiment in innovation policyArloesiadur: An analytics experiment in innovation policy
Arloesiadur: An analytics experiment in innovation policy
 
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityBig Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
 
Big Data & Text Analytics - Lesson Schedule
Big Data & Text Analytics - Lesson ScheduleBig Data & Text Analytics - Lesson Schedule
Big Data & Text Analytics - Lesson Schedule
 
J. Van der Valk - From Labour Force Survey to Labour Market Statistics
J. Van der Valk - From Labour Force Survey to  Labour Market Statistics J. Van der Valk - From Labour Force Survey to  Labour Market Statistics
J. Van der Valk - From Labour Force Survey to Labour Market Statistics
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics Canada
 
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...
A statistical approach to big data, Gustav Haraldsen and Arild Langseth, Stat...
 
Query reverse engineering in the context of the semantic web
Query reverse engineering in the context of the semantic webQuery reverse engineering in the context of the semantic web
Query reverse engineering in the context of the semantic web
 
dssi 10 12
dssi 10 12dssi 10 12
dssi 10 12
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
 
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics NetherlandsBig Data, the Future of Statistics: Experiences at Statistics Netherlands
Big Data, the Future of Statistics: Experiences at Statistics Netherlands
 
StatMine, visual exploration of output data
StatMine, visual exploration of output dataStatMine, visual exploration of output data
StatMine, visual exploration of output data
 
Session 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluationSession 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluation
 
Banner
BannerBanner
Banner
 
IAOS 2018 - Continuing to unlock the potential of new and existing data sourc...
IAOS 2018 - Continuing to unlock the potential ofnew and existing data sourc...IAOS 2018 - Continuing to unlock the potential ofnew and existing data sourc...
IAOS 2018 - Continuing to unlock the potential of new and existing data sourc...
 
Measuring the promise of Open Data: Development of the Impact Monitoring Fram...
Measuring the promise of Open Data: Development of the Impact Monitoring Fram...Measuring the promise of Open Data: Development of the Impact Monitoring Fram...
Measuring the promise of Open Data: Development of the Impact Monitoring Fram...
 
New data for innovation policy SPRU 50th presentation
New data for innovation policy SPRU 50th presentationNew data for innovation policy SPRU 50th presentation
New data for innovation policy SPRU 50th presentation
 

Similar to Big Data and Nowcasting

IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...
IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...
IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...StatsCommunications
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. maigva
 
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...StatsCommunications
 
Reliability of estimates in socio-demographic groups with small samples
Reliability of estimates in socio-demographic groups with small samplesReliability of estimates in socio-demographic groups with small samples
Reliability of estimates in socio-demographic groups with small samplesDario Buono
 
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...Big Data Value Association
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applicationsPadma Metta
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesKimberley Mitchell
 
Big Data Analysis: The curse of dimensionality in official statistics
Big Data Analysis: The curse of dimensionality in official statisticsBig Data Analysis: The curse of dimensionality in official statistics
Big Data Analysis: The curse of dimensionality in official statisticsDario Buono
 
dataminingppt-170616163835.pdf jejwwkwnwnn
dataminingppt-170616163835.pdf jejwwkwnwnndataminingppt-170616163835.pdf jejwwkwnwnn
dataminingppt-170616163835.pdf jejwwkwnwnnjainutkarsh078
 
OECD Blue Sky 3 Summary Presentation
OECD Blue Sky 3 Summary PresentationOECD Blue Sky 3 Summary Presentation
OECD Blue Sky 3 Summary Presentationinnovationoecd
 
Methodological network and strategy
Methodological network and strategy Methodological network and strategy
Methodological network and strategy Dario Buono
 
Use of Computational Tools to Support Planning & Policy by Johannes M. Bauer
Use of Computational Tools to Support Planning & Policy by Johannes M. BauerUse of Computational Tools to Support Planning & Policy by Johannes M. Bauer
Use of Computational Tools to Support Planning & Policy by Johannes M. BauerLaleah Fernandez
 
Big Data Socio-Economic Externalities – the BYTE Case Studies
Big Data Socio-Economic Externalities – the BYTE Case StudiesBig Data Socio-Economic Externalities – the BYTE Case Studies
Big Data Socio-Economic Externalities – the BYTE Case StudiesBYTE Project
 

Similar to Big Data and Nowcasting (20)

IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...
IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...
IAOS 2018 - Enhanced recommendations on step-by-step procedure and approach t...
 
Improving activity data for Tier 2 estimates of livestock emissions: End of W...
Improving activity data for Tier 2 estimates of livestock emissions: End of W...Improving activity data for Tier 2 estimates of livestock emissions: End of W...
Improving activity data for Tier 2 estimates of livestock emissions: End of W...
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
 
ppt1.pptx
ppt1.pptxppt1.pptx
ppt1.pptx
 
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...
IAOS 2018 - Evaluation of nowcasting/flash-estimation based on a big set of ...
 
Reliability of estimates in socio-demographic groups with small samples
Reliability of estimates in socio-demographic groups with small samplesReliability of estimates in socio-demographic groups with small samples
Reliability of estimates in socio-demographic groups with small samples
 
Big data
Big dataBig data
Big data
 
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...
BDVe Webinar Series - Big Data for Public Policy, the state of play - Roadmap...
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Big Data Analysis: The curse of dimensionality in official statistics
Big Data Analysis: The curse of dimensionality in official statisticsBig Data Analysis: The curse of dimensionality in official statistics
Big Data Analysis: The curse of dimensionality in official statistics
 
Data mining
Data mining Data mining
Data mining
 
dataminingppt-170616163835.pdf jejwwkwnwnn
dataminingppt-170616163835.pdf jejwwkwnwnndataminingppt-170616163835.pdf jejwwkwnwnn
dataminingppt-170616163835.pdf jejwwkwnwnn
 
OECD Blue Sky 3 Summary Presentation
OECD Blue Sky 3 Summary PresentationOECD Blue Sky 3 Summary Presentation
OECD Blue Sky 3 Summary Presentation
 
Methodological network and strategy
Methodological network and strategy Methodological network and strategy
Methodological network and strategy
 
Use of Computational Tools to Support Planning & Policy by Johannes M. Bauer
Use of Computational Tools to Support Planning & Policy by Johannes M. BauerUse of Computational Tools to Support Planning & Policy by Johannes M. Bauer
Use of Computational Tools to Support Planning & Policy by Johannes M. Bauer
 
Big Data Socio-Economic Externalities – the BYTE Case Studies
Big Data Socio-Economic Externalities – the BYTE Case StudiesBig Data Socio-Economic Externalities – the BYTE Case Studies
Big Data Socio-Economic Externalities – the BYTE Case Studies
 

More from Dario Buono

Reporting uncertainties - too much information?
Reporting uncertainties - too much information?Reporting uncertainties - too much information?
Reporting uncertainties - too much information?Dario Buono
 
Skills for the new generation of statisticians
Skills for the new generation of statisticians Skills for the new generation of statisticians
Skills for the new generation of statisticians Dario Buono
 
JDemetra+ Java Tool for Seasonal Adjustment
JDemetra+ Java Tool for Seasonal AdjustmentJDemetra+ Java Tool for Seasonal Adjustment
JDemetra+ Java Tool for Seasonal AdjustmentDario Buono
 
Physics4Stats & BMI vs. QoL
Physics4Stats & BMI vs. QoLPhysics4Stats & BMI vs. QoL
Physics4Stats & BMI vs. QoLDario Buono
 
Safebook quality grading
Safebook quality gradingSafebook quality grading
Safebook quality gradingDario Buono
 
MIP: Analysis of metadata and data revisions
MIP: Analysis of metadata and data revisionsMIP: Analysis of metadata and data revisions
MIP: Analysis of metadata and data revisionsDario Buono
 
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...New innovative 3 way anova a-priori test for direct vs. indirect approach in ...
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...Dario Buono
 
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...Dario Buono
 
Detecting outliers at the end of the series using forecast intervals
Detecting outliers at the end of the series using forecast intervals Detecting outliers at the end of the series using forecast intervals
Detecting outliers at the end of the series using forecast intervals Dario Buono
 
1 out of 20 scenarios
1 out of 20 scenarios1 out of 20 scenarios
1 out of 20 scenariosDario Buono
 
Eurostat methodological skills staff survey lesson learned final
Eurostat methodological skills staff survey lesson learned finalEurostat methodological skills staff survey lesson learned final
Eurostat methodological skills staff survey lesson learned finalDario Buono
 

More from Dario Buono (11)

Reporting uncertainties - too much information?
Reporting uncertainties - too much information?Reporting uncertainties - too much information?
Reporting uncertainties - too much information?
 
Skills for the new generation of statisticians
Skills for the new generation of statisticians Skills for the new generation of statisticians
Skills for the new generation of statisticians
 
JDemetra+ Java Tool for Seasonal Adjustment
JDemetra+ Java Tool for Seasonal AdjustmentJDemetra+ Java Tool for Seasonal Adjustment
JDemetra+ Java Tool for Seasonal Adjustment
 
Physics4Stats & BMI vs. QoL
Physics4Stats & BMI vs. QoLPhysics4Stats & BMI vs. QoL
Physics4Stats & BMI vs. QoL
 
Safebook quality grading
Safebook quality gradingSafebook quality grading
Safebook quality grading
 
MIP: Analysis of metadata and data revisions
MIP: Analysis of metadata and data revisionsMIP: Analysis of metadata and data revisions
MIP: Analysis of metadata and data revisions
 
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...New innovative 3 way anova a-priori test for direct vs. indirect approach in ...
New innovative 3 way anova a-priori test for direct vs. indirect approach in ...
 
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...
Eurostat tools for benchmarking and seasonal adjustment j_demetra+ and jecotr...
 
Detecting outliers at the end of the series using forecast intervals
Detecting outliers at the end of the series using forecast intervals Detecting outliers at the end of the series using forecast intervals
Detecting outliers at the end of the series using forecast intervals
 
1 out of 20 scenarios
1 out of 20 scenarios1 out of 20 scenarios
1 out of 20 scenarios
 
Eurostat methodological skills staff survey lesson learned final
Eurostat methodological skills staff survey lesson learned finalEurostat methodological skills staff survey lesson learned final
Eurostat methodological skills staff survey lesson learned final
 

Recently uploaded

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
knowledge representation in artificial intelligence
knowledge representation in artificial intelligenceknowledge representation in artificial intelligence
knowledge representation in artificial intelligencePriyadharshiniG41
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...boychatmate1
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 

Recently uploaded (20)

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
knowledge representation in artificial intelligence
knowledge representation in artificial intelligenceknowledge representation in artificial intelligence
knowledge representation in artificial intelligence
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 

Big Data and Nowcasting

  • 1. Big Data and Nowcasting Table 3 New and traditional data sources for local indicators – is it possible reconciling small areas and big data? Dario Buono*, European Commission, Eurostat dario.buono@ec.europa.eu @darbuono *The views expressed are the author’s alone and do not necessarily correspond to those of the corresponding organisations of affiliation Workshop on Small Area Methods and living conditions indicators in European poverty studies in the era of data deluge and Big Data Pisa, 8-10 May 2018
  • 2. Eurostat, the Statistical Office of EU • About 700 people with 28 different nationalities • Statistical Office of European Union, part of EC • Core business: • Euro-zone (19) & EU (28) aggregates • harmonization, best practices, guidelines, trainings & international cooperation, grants research funds • Methodology team: Time Series, Econometrics, Data Analytics, Use of R in Statistical Offices
  • 4. Eurostat State-of-play Explosion of information around us Popularity of internet and other new technologies Multitude of new data sources Is change needed? Do not start with data, but start with a good question!
  • 5. Why interested in Big Data for nowcasting? • Big Data are complementary information to standard data, being based on different information sets • More granular perspective on the indicator of interest, both in the temporal and cross-sectional dimensions • It is timely available, generally not subject to revisions
  • 6. A Big Data expert is… actually a team!
  • 7. ….ended in these research activities Can Big Data help for Macroeconomic Nowcasting? What are the potential Big Data sources? 1. Literature review 2. Models/methods to be used for Big data 3. Recommendations on how to handle Big Data 4. Case study: IPI, Inflation, unemployment of some EU countries
  • 8. Big Data types & dimensionality • When the dimensionality increases, the volume of the space increases so fast that the available data become sparse. • Use of a typology based on Doornik and Hendry (2015): • Tall data: many observation, few variables • Fat data: many variables, few observations • Huge data: many variables, many observations
  • 9.
  • 10.
  • 11. Big Data Types potentially interesting for Macro Nowcasting Financial markets data Scanner prices data Electronic payments data Online prices data Mobile phones data Online search data Sensor data Textual data Satellite images data Social media data
  • 12. Types of Big Data by Dominant Dimension Type Fat Tall Huge Financial Markets X Electronic Payments X Mobile Phones X X Sensor Data / IoT X X Satellite Images X Scanner Prices X X Online Prices X X Online Search X Fat (big cross-sectional dimension, N, small temporal dimension, T) Tall (small N, big T) Huge (big N, big T)
  • 13. Eurostat Models race • Dynamic Factor Analysis • Partial Least Squares • Bayesian Regression • LASSO regression • U-Midas models • Model averaging 255 models tested, macro-financial & google trend data
  • 14. From Data Access to Modelling Step-by-step approach, accompanied by specific recommendations for the use of big data for macroeconomic nowcasting, guiding to • the identification and the choice of Big Data • pre-treatment and econometric modelling • the comparative evaluation of results to obtain a very useful tool for decision about the use or not of Big Data
  • 15. Step 1: Big Data usefulness within a nowcasting exercise Recommendations 1. Evaluate the quality of the existing nowcasts and identify issue (bias or inefficiency or large errors in specific periods), that can be fixed by adding information in Big Data based indicators 2. Use of Big Data only when expecting to improve the timeliness and/or the quality of nowcastings 3. Do not consider Big Data sources with spurious correlations with the target variable
  • 16. Step 2: Assessment of big-data accessibility and quality Recommendations 1. Privilege data providers with guarantee of continuity and of the availability of a good metadata associated to the Big Data 2. Privilege Big Data sources ensuring sufficient time and cross- sectional coverage 3. If a bias is observed a bias correction can be included in the nowcasting strategy. 4. To deal with possible instabilities of the relationships between the Big Data and the target variables, nowcasting models should be re- specified on a regular basis (e.g. yearly) and occasionally in the presence of unexpected events.
  • 17. Step 3: Big data preparation Recommendations 1. Big data often unstructured: proper mapping 2. Pre-treatment to remove deterministic patterns • Outliers, calendar effects, missing observations • Seasonal and non-seasonal short-term movements should be dealt accordingly to the characteristic of the target variable 3. Create a specific IT environment where the original data are collected and stored with associated routines 4. Ensure the availability of an exhaustive documentation of the Big Data conversion process
  • 18. Step 4: Big Data modelling strategy Recommendations 1. Identification of appropriate econometric techniques 2. First dimension: choice between the use of methods suited for large but not huge datasets, therefore applied to summaries of the Big Data (Google Trends) • nowcasting with large datasets can be based on factor models, large BVARs, or shrinkage regressions 3. Huge datasets can be handled by sparse principal components, linear models combined with heuristic optimization, or a variety of machine learning methods such as LASSO & LARS regression 4. In case of mixed frequency data, methods such as UMIDAS and, as a second best, Bridge, should be privileged.
  • 19. Step 5: Results evaluation of Big Data based nowcasting Recommendations 1. Run a critical and comprehensive assessment of the contribution of Big Data for nowcasting the indicator of interest based, e.g., on standard criteria such as MSE or MAE. 2. In order to reduce the extent of data and model snooping, a cross- validation approach should be followed: • various models and indicators, with and without Big Data, estimated over a first sample and selected and/or pooled according to their performance • then the performance of the preferred approaches re-evaluated over a second sample
  • 20. Case study: No News, Good News? - Implementation of all these steps for nowcasting IP growth, inflation and unemployment in several EU countries in a pseudo out of sample context, using Google trends for specific and carefully selected keywords for each country and variable - - Overall, the results are mixed but there are several cases where Google trends, when combined with rather sophisticated econometric techniques, yield forecasting gains, though generally small.
  • 21. Eurostat Statistical Methods: findings • Big Data specific features: transform unstructured into structured data, time series decompositions, handling mixed frequency data • Sparse regression (LASSO) works for fat, huge data • Data reduction techniques (PLS) helpful for large variables • (U)-MIDAS or bridge modelling for mixed frequency data • Dimensionality reduction improves nowcasting
  • 23. Some other lessons learned..  Data access is challenging  Quality of sources is an issue o Timeliness gains o Granularity benefits o Validation impacts  Data mesh-ups: new products  New metrics: measure the unmeasured
  • 25. …but it is also about culture & management automate model document version metadata program process reuse share test How are resources (mis)used: Duplication? Redundancy? Cost? ?
  • 26. Open questions for debate • Reversing the approach: starting from the data rather than from the goal? • How can a statistician become a data scientist? • Are Big data more useful for benchmarking purposes or rather to measure new phenomena? • Design based to model based to algorithm based: maybe there is a possible link between SAE and statistical learning? • How to measure data uncertainty? Dario.Buono@ec.europa.eu @darbuonoorrespond to those of the corresponding organisations of affiliation