SlideShare a Scribd company logo
1 of 47
Download to read offline
Tamara Louie
October 21, 2018
PyData LA 2018
Applying statistical modeling and
machine learning to perform
time-series forecasting
1
● If you want to follow along with this tutorial presentation, the presentation
is available here:
https://goo.gl/xTgB7o
First things first.
2
3
Background and context
Data due diligence
Analysis of time series data
Modeling time series data
Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
● Please let me know if you want to go more or less in
depth into a particular subject.
● Please feel free to stop me and ask any clarifying
questions.
Some notes before we begin
4
5
Background and context
Data due diligence
Analysis of time series data
Modeling time series data
Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
Applications in many fields, including politics, finance, health, etc.
Why is forecasting important (or at least,
interesting)?
6
Source of images, from left to right. 1. FiveThirtyEight. Last updated October 10, 2018. Last accessed October 10, 2018. Link. 2. Bloomberg. Last accessed
October 10, 2018. Link. FiveThirtyEight. Last updated April 2, 2018. Last updated October 10, 2018. Link.
● Why time-series data is different.
● How to process time-series data.
● How to better understand time-series data.
● How to apply statistical and machine learning methods to time-series
problems.
● Understand some strengths and weaknesses of these models.
● How to evaluate, interpret, and convey output from forecasting models.
What are some things you may learn in this
tutorial?
7
Toolbox:
● Colaboratory (Access to Chrome / Firefox, Google account) OR
● Access to a iPython notebook-like environment (e.g., conda, own jupyter
environment)
Methodologies:
● Familiarity with data-related Python packages (e.g., numpy, pandas,
matplotlib, datetime).
● Basic knowledge of statistical modeling and machine learning.
What you will need for this tutorial
8
Who am I?
Why am I teaching this tutorial?
9Source: Placeholder
Numbers
2014 2018
Limits of Mechanistic versus
Machine Learning Models for
Influenza Forecasting in the
United States
Time-series
forecasting
Other things! Also some
forecasting work
Sequence
models
Cloud-based Electronic Health
Records for Real-time,
Region-specific Influenza
Surveillance
Source of images, from left to right. 1. Harvard T.H. Chan School of Public Health. Link. Legendary Entertainment. Link. Twitter. Link. Pinterest. Link. All last
accessed October 10, 2018.
10
Background and context
Data due diligence
Analysis of time series data
Modeling time series data
Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
● I think that it is important to start with a very concrete question that you
think can be answered with data, specifically with time-series data.
● Some examples of questions that I might answer with forecasting:
○ What is the expected voter turnout for California for the 2018 midterm
elections?
○ What is the future expected price of Apple stock over the next year?
○ What is the expected life expectancy of the average US female in 50
years?
What question do I want to answer?
11
● Time-series data can be defined in many ways. I tend
to describe it as “data collected on the same metric or
same object at regular or irregular time intervals.”
● In terms of whether one needs time-series data
depends on the question.
● I think about whether there is an inherent relationship or
structure between data at various time points (e.g., is
there a time-dependence), and whether we can
leverage that time-ordered information.
What is time-series data? Do I need time-series
data to answer this question?
12
Source of images. Alice Oglethorpe, Fitbit. Link. Last accessed October 10, 2018.
● I think that this step is very important, as it will determine if I can even
tackle this question.
● I tend to choose questions where there is available data, or the capability
to create the data myself.
● I am planning on using data from this link, which appears to be from public
data available about Airbnb.
● Before I start figuring out how to ingest this data, I first want to read about
this data and understand if it is what I want, and if it fulfills some basic
requirements about data quality.
How do we get some time-series data?
13
How is the data is generated?
14
Source of images. Abivin. Last updated December 25, 2017. Link. Last accessed October 10, 2018.
See here for some information. According to the site:
● Utilizes public information compiled from the Airbnb website, including the
availability calendar for 365 days in the future, and the reviews for each
listing
● Not associated with or endorsed by Airbnb or any of Airbnb's competitors.
Is the data clean or dirty? Is additional
processing necessary?
15
Source of images. Abivin. Last updated December 25, 2017. Link. All last accessed October 10, 2018.
Is the data clean or dirty?
● Data was verified, cleansed, analyzed and aggregated by someone. I do
not know how it was pre-processed.
● The author does not know if there has been additional processing or
modification from Airbnb in releasing their public data.
Is additional processing necessary?
● Are there nulls? Yes, there are a lack of entries (i.e., rows) for some dates.
● What are the data types? Units? Need to change the dates to date type.
Units appear okay.
● This aspect is quite time intensive if
your data is large and / or not in a
state to be ingested by Python.
● In fact, this step may take the
majority of your time when you are
performing a modeling or machine
learning task, between getting and
reading in the data, cleaning data,
and making sure the data is being
ingested correctly, and verifying that
the data is being logged correctly.
Where do I store my data? How do I access my
data?
16
Source of images. Steve Lohr, New York Times. Last updated June 19, 2013. Link. Last accessed October 10, 2018.
How do we access the data without crashing?
● It is just a 23MB CSV file. Most single clusters or machines can handle this
size data.
Do I need to sample my data?
● Not in this case.
Can I load it all into memory on a single machine?
● Yes, we can load it all into memory on a single machine.
Now, let’s download the data.
How do we load the data into a place where we
can access it?
17
Please use the link below to download the following CSV file (23 MB) from the
“Los Angeles” section http://insideairbnb.com/get-the-data.html
reviews.csv
Download the data
18
Source of images. Inside Airbnb. Link. Last accessed October 10, 2018.
For the rest of this tutorial, let’s refer to the
iPython notebook
19
20
[Colaboratory - preferred method]
● Try opening the Google Drive link below
https://goo.gl/r7CFcN
● Go to File → Save a copy in Drive…
● Save a copy of the iPython notebook:
PyData_LA_2018_Tutorial.ipynb
Download the iPython
notebook - option 1
[Alternative with Github]
● Download from my Github link below
https://github.com/tklouie/PyData_LA_2018
● Upload the following iPython notebook to your preferred python / Jupyter
environment PyData_LA_2018_Tutorial.ipynb
Download the iPython notebook - option 2
21
Let’s open the iPython notebook and proceed.
I will also add in some of the notes from the
iPython notebook here in the relevant sections.
22
● Ideally, one should set up their own virtual environment and determine the
versions of each library that they are using.
● Here, we will assume that the Colaboratory environment has some shared
environment with access to common Python libraries and the ability to
install other libraries necessary.
Import some relevant packages
23
● How many rows are in the dataset?
● How many columns are in this dataset?
● What data types are the columns?
● Is the data complete? Are there nulls? Do we have to infer values?
● What is the definition of these columns?
● What are some other caveats to the data?
Look at the data
24
● Understand the limitations of your data and what potential questions can
be answered by data is important. These questions can reduce, expand, or
modify the scope of your project.
● If you defined a scope or goal for your project before digging into the data,
this might be a good time to revisit it.
Data. We have daily count of reviews for given listing ids for given dates.
Questions I could try to answer.
● Forecast future number of reviews for the Los Angeles area.
● Forecast the future number of reviews for specific listings in the Los
Angeles area.
What are some questions I can answer with this
data?
25
Statistical models
● Ignore the time-series aspect completely and model using traditional
statistical modeling toolbox (e.g., regression-based models).
● Univariate statistical time-series modeling (e.g., averaging and
smoothing models, ARIMA models).
● Slight modifications to univariate statistical time-series modeling (e.g.,
external regressors, multi-variate models).
● Additive or component models (e.g., Facebook Prophet package).
● Structural time series modeling (e.g., Bayesian structural time series
modeling, hierarchical time series modeling).
What techniques may help answer these
questions?
26
Machine learning models
● Ignore the time-series aspect completely and model using traditional
machine learning modeling toolbox. (e.g., Support Vector Machines
(SVMs), Random Forest Regression, Gradient-Boosted Decision Trees
(GBDTs), Neural Networks (NNs).
● Hidden markov models (HMMs).
● Other sequence-based models.
● Gaussian processes (GPs).
● Recurrent neural networks (RNNs).
What techniques may help answer these
questions?
27
What techniques may help answer these
questions?
28
Additional data considerations before choosing a model
● Whether or not to incorporate external data
● Whether or not to keep as univariate or multivariate (i.e., which features
and number of features)
● Outlier detection and removal
● Missing value imputation
29
Background and context
Data due diligence
Analysis of time series data
Modeling time series data
Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
Most time-series models assume that the underlying time-series data is
stationary. This assumption gives us some nice statistical properties that
allows us to use various models for forecasting.
If we are using past data to predict future data, we should assume that the data
will follow the same general trends and patterns as in the past. This general
statement holds for most training data and modeling tasks.
Look at stationarity
30
Stationarity is a statistical assumption that a
time-series has:
● Constant mean
● Constant variance
● Autocovariance does not depend on time
There are some good diagrams and
explanations on stationarity here and here.
Sometimes we need to transform the data in
order to make it stationary. However, this
transformation then calls into question if this
data is truly stationary and is suited to be
modeled using these techniques.
Look at stationarity
31
What happens if you do not correct for these
things?
32
Many things can happen, including:
● Variance can be mis-specified
● Model fit can be worse.
● Not leveraging valuable time-dependent nature of the data.
Here are some resources on the pitfalls of using traditional methods for time
series analysis.
● Quora link
● Quora link
It is common for time series data to have to correct for non-stationarity.
2 common reasons behind non-stationarity are:
1. Trend – mean is not constant over time.
2. Seasonality – variance is not constant over time.
There are ways to correct for trend and seasonality, to make the time series
stationary.
Correct for stationarity
33
Transformation
● Examples. Log, square root, etc.
Smoothing
● Examples. Weekly average, monthly average, rolling averages.
Differencing
● Examples. First-order differencing.
Polynomial Fitting
● Examples. Fit a regression model.
Decomposition
Eliminating trend and seasonality
34
35
Background and context
Data due diligence
Analysis of time series data
Modeling time series data
Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
"Forecasting can take many forms—staring into crystal balls or bowls of tea
leaves, combining the opinions of experts, brainstorming, scenario generation,
what-if analysis, Monte Carlo simulation, solving equations that are dictated by
physical laws or economic theories—but statistical forecasting, which is the
main topic to be discussed here, is the art and science of forecasting from
data, with or without knowing in advance what equation you should use."
Robert Nau, Principles and Risks of Forecasting
Why is statistical forecasting important (or at
least, interesting)?
36
We can use ARIMA models when we know there is dependence between
values and we can leverage that information to forecast.
ARIMA = Auto-Regressive Integrated Moving Average.
Assumptions. The time-series is stationary.
Depends on:
1. Number of AR (Auto-Regressive) terms (p).
2. Number of I (Integrated or Difference) terms (d).
3. Number of MA (Moving Average) terms (q)
Let us model some time-series data! Finally!
ARIMA models.
37
What do ARIMA models look like?
38
Venkat Reddy. Last accessed October 15, 2018. <https://www.slideshare.net/21_venkat/arima-26196965>.
How do we determine p, d, and q? For p and q, we can use ACF and PACF
plots (below).
● Autocorrelation Function (ACF). Correlation between the time series with
a lagged version of itself (e.g., correlation of Y(t) with Y(t-1)).
● Partial Autocorrelation Function (PACF). Additional correlation explained
by each successive lagged term.
How do we interpret ACF and PACF plots?
● p – Lag value where the PACF chart crosses the upper confidence interval
for the first time.
● q – Lag value where the ACF chart crosses the upper confidence interval
for the first time.
ACF and PACF Plots
39
Facebook Prophet is a tool that allows folks to forecast using additive or
component models relatively easily. It can also include things like:
● Day of week effects
● Day of year effects
● Holiday effects
● Trend trajectory
● Can do MCMC sampling
Let us model some time-series data! Finally!
Facebook Prophet package.
40
Below are some resources
available to learn more about
Bayesian structural time series
modeling (BSTS):
● Causal Impact package from
Google (available in R).
● Example implementation in
python
● Example implementation in
python in github
Let us model some time-series data! Finally!
Bayesian structural time series modeling.
41
Google, Inc. Last accessed October 15, 2018. <https://google.github.io/CausalImpact/CausalImpact.html>.
Here are some resources on recurrent neural networks (RNN) and Long
Short-Term Memory networks (LSTMs):
● Link 1
● Link 2
● Link 3
Let us model some time-series data! Finally!
LSTM for regression
42
One to One. Classic Neural Network.
One to Many. Classic Neural Network.
Neelabh Pant. Last accessed October 15, 2018. <https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f>.
Let us model some time-series data! Finally!
LSTM for regression
43
Neelabh Pant. Last accessed October 15, 2018. <https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f>.
Recurrent Neural Network (RNN) Long Short-Term Memory Network
(LSTM)
Able to capture longer-term dependencies
in a sequence.
One way is through traditional continuous evaluation modeling metrics, such as
RMSE, MAPE, etc.
Here are some additional resources to learn about intricacies of time series
model validation, specifically around cross-validation:
● Link 1
● Link 2
● Link 3
● Link 4
How do we evaluate the success of our models?
How does this evaluation differ from traditional
modeling tasks?
44
Thank you!
tamaralouie@pinterest.com
45
Appendix
46
Random Walk
47
Here are some resources on random walks.
● Link 1
● Link 2

More Related Content

What's hot

Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingPierre Gutierrez
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperJames by CrowdProcess
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
House Price Prediction.pptx
House Price Prediction.pptxHouse Price Prediction.pptx
House Price Prediction.pptxCodingWorld5
 
Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom IndustrySatyam Barsaiyan
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringSri Ambati
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskQuantUniversity
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Linear regression in machine learning
Linear regression in machine learningLinear regression in machine learning
Linear regression in machine learningShajun Nisha
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaRahul Bhatia
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANShyam Krishna Khadka
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series DataArun Kejariwal
 
House Price Prediction An AI Approach.
House Price Prediction An AI Approach.House Price Prediction An AI Approach.
House Price Prediction An AI Approach.Nahian Ahmed
 
Research of adversarial example on a deep neural network
Research of adversarial example on a deep neural networkResearch of adversarial example on a deep neural network
Research of adversarial example on a deep neural networkNAVER Engineering
 
Markov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsFrancesco Casalegno
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theorySeongwon Hwang
 

What's hot (20)

Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modeling
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
House Price Prediction.pptx
House Price Prediction.pptxHouse Price Prediction.pptx
House Price Prediction.pptx
 
machine learning
machine learningmachine learning
machine learning
 
Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom Industry
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Linear regression in machine learning
Linear regression in machine learningLinear regression in machine learning
Linear regression in machine learning
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
 
Random forest
Random forestRandom forest
Random forest
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGAN
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
 
Machine learning & Time Series Analysis
Machine learning & Time Series AnalysisMachine learning & Time Series Analysis
Machine learning & Time Series Analysis
 
House Price Prediction An AI Approach.
House Price Prediction An AI Approach.House Price Prediction An AI Approach.
House Price Prediction An AI Approach.
 
Research of adversarial example on a deep neural network
Research of adversarial example on a deep neural networkResearch of adversarial example on a deep neural network
Research of adversarial example on a deep neural network
 
Markov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theory
 

Similar to Applying Statistical Modeling and Machine Learning to Perform Time-Series Forecasting

Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Raul Eulogio
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Trieu Nguyen
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisSwiss Big Data User Group
 
Using machine learning to try and predict taxi availability by Narahari Allam...
Using machine learning to try and predict taxi availability by Narahari Allam...Using machine learning to try and predict taxi availability by Narahari Allam...
Using machine learning to try and predict taxi availability by Narahari Allam...PYCON MY PLT
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...DATAVERSITY
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessChia-Chi Chang
 
Limits of Machine Learning
Limits of Machine LearningLimits of Machine Learning
Limits of Machine LearningAlexey Grigorev
 
Applied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML modelApplied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML modelDataiku
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryAli Dasdan
 
Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Benjamin Bengfort
 
Data analytics course archtype
Data analytics course archtypeData analytics course archtype
Data analytics course archtypenakshatraL
 
Fantastic Problems and Where to Find Them: Daryl Weir
Fantastic Problems and Where to Find Them: Daryl WeirFantastic Problems and Where to Find Them: Daryl Weir
Fantastic Problems and Where to Find Them: Daryl WeirFuturice
 
Introduction To Predictive Modelling
Introduction To Predictive ModellingIntroduction To Predictive Modelling
Introduction To Predictive ModellingSpotle.ai
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with PythonBenjamin Bengfort
 

Similar to Applying Statistical Modeling and Machine Learning to Perform Time-Series Forecasting (20)

Analyzing social media with Python and other tools (1/4)
Analyzing social media with Python and other tools (1/4)Analyzing social media with Python and other tools (1/4)
Analyzing social media with Python and other tools (1/4)
 
Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data Analysis
 
Evaluation of big data analysis
Evaluation of big data analysisEvaluation of big data analysis
Evaluation of big data analysis
 
Using machine learning to try and predict taxi availability by Narahari Allam...
Using machine learning to try and predict taxi availability by Narahari Allam...Using machine learning to try and predict taxi availability by Narahari Allam...
Using machine learning to try and predict taxi availability by Narahari Allam...
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darkness
 
Limits of Machine Learning
Limits of Machine LearningLimits of Machine Learning
Limits of Machine Learning
 
Intro big data.pdf
Intro big data.pdfIntro big data.pdf
Intro big data.pdf
 
Applied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML modelApplied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML model
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st century
 
BDACA - Lecture8
BDACA - Lecture8BDACA - Lecture8
BDACA - Lecture8
 
Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)
 
Data analytics course archtype
Data analytics course archtypeData analytics course archtype
Data analytics course archtype
 
Fantastic Problems and Where to Find Them: Daryl Weir
Fantastic Problems and Where to Find Them: Daryl WeirFantastic Problems and Where to Find Them: Daryl Weir
Fantastic Problems and Where to Find Them: Daryl Weir
 
Introduction To Predictive Modelling
Introduction To Predictive ModellingIntroduction To Predictive Modelling
Introduction To Predictive Modelling
 
Future se oct15
Future se oct15Future se oct15
Future se oct15
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with Python
 
Data Science at Udemy
Data Science at UdemyData Science at Udemy
Data Science at Udemy
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
 

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Applying Statistical Modeling and Machine Learning to Perform Time-Series Forecasting

  • 1. Tamara Louie October 21, 2018 PyData LA 2018 Applying statistical modeling and machine learning to perform time-series forecasting 1
  • 2. ● If you want to follow along with this tutorial presentation, the presentation is available here: https://goo.gl/xTgB7o First things first. 2
  • 3. 3 Background and context Data due diligence Analysis of time series data Modeling time series data Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
  • 4. ● Please let me know if you want to go more or less in depth into a particular subject. ● Please feel free to stop me and ask any clarifying questions. Some notes before we begin 4
  • 5. 5 Background and context Data due diligence Analysis of time series data Modeling time series data Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
  • 6. Applications in many fields, including politics, finance, health, etc. Why is forecasting important (or at least, interesting)? 6 Source of images, from left to right. 1. FiveThirtyEight. Last updated October 10, 2018. Last accessed October 10, 2018. Link. 2. Bloomberg. Last accessed October 10, 2018. Link. FiveThirtyEight. Last updated April 2, 2018. Last updated October 10, 2018. Link.
  • 7. ● Why time-series data is different. ● How to process time-series data. ● How to better understand time-series data. ● How to apply statistical and machine learning methods to time-series problems. ● Understand some strengths and weaknesses of these models. ● How to evaluate, interpret, and convey output from forecasting models. What are some things you may learn in this tutorial? 7
  • 8. Toolbox: ● Colaboratory (Access to Chrome / Firefox, Google account) OR ● Access to a iPython notebook-like environment (e.g., conda, own jupyter environment) Methodologies: ● Familiarity with data-related Python packages (e.g., numpy, pandas, matplotlib, datetime). ● Basic knowledge of statistical modeling and machine learning. What you will need for this tutorial 8
  • 9. Who am I? Why am I teaching this tutorial? 9Source: Placeholder Numbers 2014 2018 Limits of Mechanistic versus Machine Learning Models for Influenza Forecasting in the United States Time-series forecasting Other things! Also some forecasting work Sequence models Cloud-based Electronic Health Records for Real-time, Region-specific Influenza Surveillance Source of images, from left to right. 1. Harvard T.H. Chan School of Public Health. Link. Legendary Entertainment. Link. Twitter. Link. Pinterest. Link. All last accessed October 10, 2018.
  • 10. 10 Background and context Data due diligence Analysis of time series data Modeling time series data Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
  • 11. ● I think that it is important to start with a very concrete question that you think can be answered with data, specifically with time-series data. ● Some examples of questions that I might answer with forecasting: ○ What is the expected voter turnout for California for the 2018 midterm elections? ○ What is the future expected price of Apple stock over the next year? ○ What is the expected life expectancy of the average US female in 50 years? What question do I want to answer? 11
  • 12. ● Time-series data can be defined in many ways. I tend to describe it as “data collected on the same metric or same object at regular or irregular time intervals.” ● In terms of whether one needs time-series data depends on the question. ● I think about whether there is an inherent relationship or structure between data at various time points (e.g., is there a time-dependence), and whether we can leverage that time-ordered information. What is time-series data? Do I need time-series data to answer this question? 12 Source of images. Alice Oglethorpe, Fitbit. Link. Last accessed October 10, 2018.
  • 13. ● I think that this step is very important, as it will determine if I can even tackle this question. ● I tend to choose questions where there is available data, or the capability to create the data myself. ● I am planning on using data from this link, which appears to be from public data available about Airbnb. ● Before I start figuring out how to ingest this data, I first want to read about this data and understand if it is what I want, and if it fulfills some basic requirements about data quality. How do we get some time-series data? 13
  • 14. How is the data is generated? 14 Source of images. Abivin. Last updated December 25, 2017. Link. Last accessed October 10, 2018. See here for some information. According to the site: ● Utilizes public information compiled from the Airbnb website, including the availability calendar for 365 days in the future, and the reviews for each listing ● Not associated with or endorsed by Airbnb or any of Airbnb's competitors.
  • 15. Is the data clean or dirty? Is additional processing necessary? 15 Source of images. Abivin. Last updated December 25, 2017. Link. All last accessed October 10, 2018. Is the data clean or dirty? ● Data was verified, cleansed, analyzed and aggregated by someone. I do not know how it was pre-processed. ● The author does not know if there has been additional processing or modification from Airbnb in releasing their public data. Is additional processing necessary? ● Are there nulls? Yes, there are a lack of entries (i.e., rows) for some dates. ● What are the data types? Units? Need to change the dates to date type. Units appear okay.
  • 16. ● This aspect is quite time intensive if your data is large and / or not in a state to be ingested by Python. ● In fact, this step may take the majority of your time when you are performing a modeling or machine learning task, between getting and reading in the data, cleaning data, and making sure the data is being ingested correctly, and verifying that the data is being logged correctly. Where do I store my data? How do I access my data? 16 Source of images. Steve Lohr, New York Times. Last updated June 19, 2013. Link. Last accessed October 10, 2018.
  • 17. How do we access the data without crashing? ● It is just a 23MB CSV file. Most single clusters or machines can handle this size data. Do I need to sample my data? ● Not in this case. Can I load it all into memory on a single machine? ● Yes, we can load it all into memory on a single machine. Now, let’s download the data. How do we load the data into a place where we can access it? 17
  • 18. Please use the link below to download the following CSV file (23 MB) from the “Los Angeles” section http://insideairbnb.com/get-the-data.html reviews.csv Download the data 18 Source of images. Inside Airbnb. Link. Last accessed October 10, 2018.
  • 19. For the rest of this tutorial, let’s refer to the iPython notebook 19
  • 20. 20 [Colaboratory - preferred method] ● Try opening the Google Drive link below https://goo.gl/r7CFcN ● Go to File → Save a copy in Drive… ● Save a copy of the iPython notebook: PyData_LA_2018_Tutorial.ipynb Download the iPython notebook - option 1
  • 21. [Alternative with Github] ● Download from my Github link below https://github.com/tklouie/PyData_LA_2018 ● Upload the following iPython notebook to your preferred python / Jupyter environment PyData_LA_2018_Tutorial.ipynb Download the iPython notebook - option 2 21
  • 22. Let’s open the iPython notebook and proceed. I will also add in some of the notes from the iPython notebook here in the relevant sections. 22
  • 23. ● Ideally, one should set up their own virtual environment and determine the versions of each library that they are using. ● Here, we will assume that the Colaboratory environment has some shared environment with access to common Python libraries and the ability to install other libraries necessary. Import some relevant packages 23
  • 24. ● How many rows are in the dataset? ● How many columns are in this dataset? ● What data types are the columns? ● Is the data complete? Are there nulls? Do we have to infer values? ● What is the definition of these columns? ● What are some other caveats to the data? Look at the data 24
  • 25. ● Understand the limitations of your data and what potential questions can be answered by data is important. These questions can reduce, expand, or modify the scope of your project. ● If you defined a scope or goal for your project before digging into the data, this might be a good time to revisit it. Data. We have daily count of reviews for given listing ids for given dates. Questions I could try to answer. ● Forecast future number of reviews for the Los Angeles area. ● Forecast the future number of reviews for specific listings in the Los Angeles area. What are some questions I can answer with this data? 25
  • 26. Statistical models ● Ignore the time-series aspect completely and model using traditional statistical modeling toolbox (e.g., regression-based models). ● Univariate statistical time-series modeling (e.g., averaging and smoothing models, ARIMA models). ● Slight modifications to univariate statistical time-series modeling (e.g., external regressors, multi-variate models). ● Additive or component models (e.g., Facebook Prophet package). ● Structural time series modeling (e.g., Bayesian structural time series modeling, hierarchical time series modeling). What techniques may help answer these questions? 26
  • 27. Machine learning models ● Ignore the time-series aspect completely and model using traditional machine learning modeling toolbox. (e.g., Support Vector Machines (SVMs), Random Forest Regression, Gradient-Boosted Decision Trees (GBDTs), Neural Networks (NNs). ● Hidden markov models (HMMs). ● Other sequence-based models. ● Gaussian processes (GPs). ● Recurrent neural networks (RNNs). What techniques may help answer these questions? 27
  • 28. What techniques may help answer these questions? 28 Additional data considerations before choosing a model ● Whether or not to incorporate external data ● Whether or not to keep as univariate or multivariate (i.e., which features and number of features) ● Outlier detection and removal ● Missing value imputation
  • 29. 29 Background and context Data due diligence Analysis of time series data Modeling time series data Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
  • 30. Most time-series models assume that the underlying time-series data is stationary. This assumption gives us some nice statistical properties that allows us to use various models for forecasting. If we are using past data to predict future data, we should assume that the data will follow the same general trends and patterns as in the past. This general statement holds for most training data and modeling tasks. Look at stationarity 30
  • 31. Stationarity is a statistical assumption that a time-series has: ● Constant mean ● Constant variance ● Autocovariance does not depend on time There are some good diagrams and explanations on stationarity here and here. Sometimes we need to transform the data in order to make it stationary. However, this transformation then calls into question if this data is truly stationary and is suited to be modeled using these techniques. Look at stationarity 31
  • 32. What happens if you do not correct for these things? 32 Many things can happen, including: ● Variance can be mis-specified ● Model fit can be worse. ● Not leveraging valuable time-dependent nature of the data. Here are some resources on the pitfalls of using traditional methods for time series analysis. ● Quora link ● Quora link
  • 33. It is common for time series data to have to correct for non-stationarity. 2 common reasons behind non-stationarity are: 1. Trend – mean is not constant over time. 2. Seasonality – variance is not constant over time. There are ways to correct for trend and seasonality, to make the time series stationary. Correct for stationarity 33
  • 34. Transformation ● Examples. Log, square root, etc. Smoothing ● Examples. Weekly average, monthly average, rolling averages. Differencing ● Examples. First-order differencing. Polynomial Fitting ● Examples. Fit a regression model. Decomposition Eliminating trend and seasonality 34
  • 35. 35 Background and context Data due diligence Analysis of time series data Modeling time series data Jill Pelto Art. Last accessed October 6, 2018. <http://www.jillpelto.com/>.
  • 36. "Forecasting can take many forms—staring into crystal balls or bowls of tea leaves, combining the opinions of experts, brainstorming, scenario generation, what-if analysis, Monte Carlo simulation, solving equations that are dictated by physical laws or economic theories—but statistical forecasting, which is the main topic to be discussed here, is the art and science of forecasting from data, with or without knowing in advance what equation you should use." Robert Nau, Principles and Risks of Forecasting Why is statistical forecasting important (or at least, interesting)? 36
  • 37. We can use ARIMA models when we know there is dependence between values and we can leverage that information to forecast. ARIMA = Auto-Regressive Integrated Moving Average. Assumptions. The time-series is stationary. Depends on: 1. Number of AR (Auto-Regressive) terms (p). 2. Number of I (Integrated or Difference) terms (d). 3. Number of MA (Moving Average) terms (q) Let us model some time-series data! Finally! ARIMA models. 37
  • 38. What do ARIMA models look like? 38 Venkat Reddy. Last accessed October 15, 2018. <https://www.slideshare.net/21_venkat/arima-26196965>.
  • 39. How do we determine p, d, and q? For p and q, we can use ACF and PACF plots (below). ● Autocorrelation Function (ACF). Correlation between the time series with a lagged version of itself (e.g., correlation of Y(t) with Y(t-1)). ● Partial Autocorrelation Function (PACF). Additional correlation explained by each successive lagged term. How do we interpret ACF and PACF plots? ● p – Lag value where the PACF chart crosses the upper confidence interval for the first time. ● q – Lag value where the ACF chart crosses the upper confidence interval for the first time. ACF and PACF Plots 39
  • 40. Facebook Prophet is a tool that allows folks to forecast using additive or component models relatively easily. It can also include things like: ● Day of week effects ● Day of year effects ● Holiday effects ● Trend trajectory ● Can do MCMC sampling Let us model some time-series data! Finally! Facebook Prophet package. 40
  • 41. Below are some resources available to learn more about Bayesian structural time series modeling (BSTS): ● Causal Impact package from Google (available in R). ● Example implementation in python ● Example implementation in python in github Let us model some time-series data! Finally! Bayesian structural time series modeling. 41 Google, Inc. Last accessed October 15, 2018. <https://google.github.io/CausalImpact/CausalImpact.html>.
  • 42. Here are some resources on recurrent neural networks (RNN) and Long Short-Term Memory networks (LSTMs): ● Link 1 ● Link 2 ● Link 3 Let us model some time-series data! Finally! LSTM for regression 42 One to One. Classic Neural Network. One to Many. Classic Neural Network. Neelabh Pant. Last accessed October 15, 2018. <https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f>.
  • 43. Let us model some time-series data! Finally! LSTM for regression 43 Neelabh Pant. Last accessed October 15, 2018. <https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f>. Recurrent Neural Network (RNN) Long Short-Term Memory Network (LSTM) Able to capture longer-term dependencies in a sequence.
  • 44. One way is through traditional continuous evaluation modeling metrics, such as RMSE, MAPE, etc. Here are some additional resources to learn about intricacies of time series model validation, specifically around cross-validation: ● Link 1 ● Link 2 ● Link 3 ● Link 4 How do we evaluate the success of our models? How does this evaluation differ from traditional modeling tasks? 44
  • 47. Random Walk 47 Here are some resources on random walks. ● Link 1 ● Link 2