TIME SERIES FORECASTING
AUTOREGRESSIVE INTEGRATED MOVING AVERAGE
WITH EXOGENOUS VARIABLES (ARIMAX)
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Introduction with
Example
Introduction
• An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model
can be viewed as a multiple regression model with one or more autoregressive (AR) terms
and/or one or more moving average (MA) terms
• This method is suitable for forecasting multivariate data, whether stationary or
non-stationary, with any type of data pattern: level, trend, seasonality or cyclicity
• ARIMAX is simply an ARIMA model with additional explanatory variables in categorical
and/or numeric format
Example
Let’s take an example of year-wise
GDP values of India
As shown in the figure below, the plot of
these data suggests that this is
non-stationary data with an upward trend
Hence, we can choose the ARIMAX
algorithm for forecasting GDP, as there
would be more than one variable
affecting the GDP
Actual GDP (Trillion)
Years GDP
Y1 0.35
Y2 0.38
Y3 0.39
Y4 0.40
Y5 0.44
Y6 0.50
Y7 0.58
Y8 0.60
Y9 0.64
Y10 0.70
Forecasted GDP (Trillion)
Y11 0.82
Y12 0.94
Y13 1.00
Y14 1.22
Y15 1.42
Standard tuning parameters
Model parameters:
In ARIMA, there are mainly three parameters
to be set to fit the model:
• p: the order of the autoregressive (AR) component
applied to the series
• d: the order of differencing applied to the series.
Differencing converts non-stationary data to stationary
(stationary series: the series remains at a fairly constant
level over time)
• q: the order of the moving average (MA) component
applied to the series
Model approach:
In ARIMAX, there are two approaches to fitting a
model: Automatic and Manual
When p, d and q are automatically selected by the
system, it is called the automatic approach, and
when p, d and q are manually input by the user,
it is called the manual approach
For better forecasts, the automatic approach should
be chosen, as in this approach the model
automatically selects and applies the right
parameters based on the nature of the data
Forecast period:
For both approaches, the user has to input
the forecast period value
For example, if the user wants to predict the sales
value for 10 periods ahead, then this value should
be input as 10
Note: Refer to the Calculations section to understand the
model parameters
Sample UI For
Input/Tuning
Parameters And Output
Sample UI For Selecting Inputs
And Applying Tuning Parameters
Step 1: Select the variable you would like to forecast
Options: Year, GDP, Consumer Inflation, Wholesale Inflation, Industrial Index of Production
Step 2: Select the time stamp
Options: Year, GDP, Consumer Inflation, Wholesale Inflation, Industrial Index of Production
Step 3: Select the predictors
Options: Year, GDP, Consumer Inflation, Wholesale Inflation, Industrial Index of Production
In step 3, the user can select more than one predictor
Step 4: Tuning parameters
Automatic approach (default): fields are Approach and Forecast Period
By default this box should be displayed with the default approach
set to Automatic. In this case the parameters to fit ARIMAX will be
automatically detected and applied by the algorithm
Manual approach: fields are Approach, Forecast Period, AR(p), I(d) and MA(q)
In step 4, if the user changes the approach to Manual, then this
box should be displayed, with an additional provision to set the
p, d and q values
Sample UI For Output
MAPE should not exceed 10%, as it
represents the margin of error in forecasting
Accuracy shows how accurate the
forecasts are; ideally it should be greater
than or equal to 90%, else there is a need to
revise and fine tune the model (apply some
transformations on the input data, check if the basic
assumptions of ARIMAX are met, etc.)
Actual GDP (Trillion)
Years GDP
Y1 0.35
Y2 0.38
Y3 0.39
Y4 0.40
Y5 0.44
Y6 0.50
Y7 0.58
Y8 0.60
Y9 0.64
Y10 0.70
Forecasted GDP (Trillion)
Y11 0.82
Y12 0.94
Y13 1.00
Y14 1.22
Y15 1.42
Output will be forecasted values based on user specified time period
along with line charts showing actual and forecasted series and
prediction accuracy
Limitations
It is based on an assumption of a linear relationship
between the predictors (Xi) and the target variable (Y), i.e.
the scatter plot of each predictor versus the target variable
should be close to linear, as shown in Figures 1 and 2 on the right
Furthermore, there should not be multicollinearity in
the data
• Multicollinearity generally occurs when there are
high correlations between two or more predictor
variables
• Examples of correlated predictor variables (also
called multicollinear predictors) are: a person’s height
and weight, age and sales price of a car, or years of
education and annual income
• An easy way to detect multicollinearity is to
calculate correlation coefficients for all pairs of
predictor variables; if the absolute value of a coefficient
is close to or exactly 1, then one of the two predictors
should be removed from the model if at all possible
Note: Refer to the Calculations section to understand multicollinearity and autocorrelation
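The pairwise correlation check described above can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: the predictor values are made up, and the 0.9 cut-off is an assumed reading of "close to 1".

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(predictors, threshold=0.9):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Made-up illustrative values, not real economic data.
predictors = {
    "consumer_inflation": [4.0, 5.0, 6.0, 5.5, 7.0, 8.0],
    "wholesale_inflation": [3.0, 4.1, 5.0, 4.4, 6.1, 7.0],
    "industrial_index": [100.0, 98.0, 103.0, 101.0, 99.0, 102.0],
}
flagged = correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

With these sample values only the two inflation series are flagged, so one of them would be a candidate for removal.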
Figure 1 Figure 2
Limitations
The forecast error, also known as the
“residuals”, should show a nearly
constant trend over time, i.e. it
should be time independent, as
shown in Figure 1 below, in
contrast to the increasing/
decreasing trend shown in Figure 2
below:
Note: Refer to the Calculations section to understand multicollinearity and autocorrelation
Figure 1: Time independent error (fairly constant over time and lying within a certain range)
Figure 2: Time dependent error (decreasing with time)
Business use case
Business benefit:
• For various combinations of GDP, consumer
inflation and population growth rates, the
company would be able to forecast its
product growth
• Moreover, the company can analyze the gap
between targeted and estimated growth
and decide upon a strategy to reduce
this gap and achieve the desired results
Business problem:
• A company wants to forecast its product
line growth for the next couple of years
based on the past 30 years’ yearly data
• The predictor variables in this case
would be as follows:
  • Yearly consumer inflation rate
  • Yearly GDP data
  • Yearly population growth rate
• Data pattern: the input data exhibits
non-stationarity and an upward trend pattern
as well as seasonality
Calculations
Calculations - Autoregression (AR)
• In an autoregressive model, which is one of the components in ARIMAX model, we
forecast the variable of interest using a linear combination of past values of the variable
• The term autoregression indicates that it is a regression of the variable against itself
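• An autoregressive model of order p, denoted AR(p), can be written as
$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$
where,
$c$ is a constant,
$\phi_1, \dots, \phi_p$ are the coefficients of the lagged values,
$e_t$ is an error term,
$p$ is the order of the autoregressive model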
• This is like a multiple regression but with lagged values of yt as predictors
• Order of this component (order of autoregression : AR) is given by parameter p while
fitting the model : ARIMAX (p,d, q)
Lagged values : past values of the variable
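As a minimal sketch of how an AR(p) forecast is formed from lagged values, the function below evaluates the constant plus the weighted lags for one step ahead, with the unknown error term taken at its expected value of zero. The function name and the coefficient values are illustrative assumptions; in practice the coefficients are estimated from the data.

```python
def ar_forecast(history, c, phi):
    """One-step-ahead AR(p) forecast: c + phi_1*y[t-1] + ... + phi_p*y[t-p]."""
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # history[-1] is y[t-1], history[-2] is y[t-2], and so on.
    lagged = history[::-1][:p]
    return c + sum(coef * value for coef, value in zip(phi, lagged))

# Illustrative AR(2) coefficients only, not fitted to any data.
history = [1.0, 2.0]
forecast = ar_forecast(history, c=0.1, phi=[0.5, 0.3])
print(round(forecast, 2))  # 0.1 + 0.5*2.0 + 0.3*1.0 = 1.4
```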
Calculations - Integration (I) / Differencing (d)
• The second component of the ARIMAX model, i.e. I (for "integration"), is
used to replace the series with the differences between its current
values and its previous values (this differencing process can be
performed more than once, as required)
• For example,
• The equation for first-order differencing is $y'_t = y_t - y_{t-1}$
• Hence, for $y_t = 2$ and $y_{t-1} = 1$, $y'_t$ will be 1
• Similarly, second-order differencing is
$y''_t = y'_t - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$
• The order of this component (order of differencing) is set by
parameter d while fitting a model: ARIMAX(p,d,q)
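The differencing step can be illustrated with a short sketch applied to the GDP series from the example table. The helper name `difference` is an assumption made for this illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing d times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result

# Actual GDP values (trillion) for Y1..Y10 from the example table.
gdp = [0.35, 0.38, 0.39, 0.40, 0.44, 0.50, 0.58, 0.60, 0.64, 0.70]
first = difference(gdp, d=1)   # year-on-year changes
second = difference(gdp, d=2)  # changes of the changes
print([round(v, 2) for v in first])
print([round(v, 2) for v in second])
```

The first differences remove the upward level of the series and leave only the year-on-year changes, which is what d = 1 asks the model to work with.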
Calculations - Moving average (MA)
• A moving average model, the third component in ARIMAX, uses past
forecast errors as predictors in the model
• A moving average model of order q, denoted MA(q), can be
written as
$y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$
where,
$y_t$ is the value of the series being forecast,
$c$ is a constant,
$\theta_1, \dots, \theta_q$ are the coefficients of the lagged errors,
$e_t$ is an error term,
$q$ is the moving average order
• The order of this component (order of moving average: MA) is set by
parameter q while fitting a model: ARIMAX(p,d,q)
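A minimal sketch of a one-step MA(q) forecast follows. The current error is unknown at forecast time, so its expected value of zero is used; the function name and the numbers are illustrative assumptions rather than fitted values.

```python
def ma_forecast(past_errors, c, theta):
    """One-step-ahead MA(q) forecast: c + theta_1*e[t-1] + ... + theta_q*e[t-q]."""
    q = len(theta)
    if len(past_errors) < q:
        raise ValueError("need at least q past forecast errors")
    # past_errors[-1] is e[t-1], past_errors[-2] is e[t-2], and so on.
    lagged = past_errors[::-1][:q]
    return c + sum(coef * err for coef, err in zip(theta, lagged))

# Illustrative MA(1) coefficient and errors, not fitted to any data.
past_errors = [0.2, -0.1]
forecast = ma_forecast(past_errors, c=1.0, theta=[0.5])
print(round(forecast, 2))  # 1.0 + 0.5*(-0.1) = 0.95
```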
Calculations - Exogenous variables (X)
• ARIMAX is simply an ARIMA model with the inclusion of
exogenous variables (additional explanatory variables/predictors)
• It means you simply add one or more explanatory variables/
regressors to the forecasting equation
• For example, predictors such as the Consumer Price Index, the Producer Price
Index and employment statistics, which directly or indirectly impact
GDP, can be considered as exogenous variables to forecast GDP
using ARIMAX
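One common way to write the forecasting equation once exogenous terms are added is sketched below. The symbols $\beta_k$ (coefficient of the k-th exogenous variable), $x_{k,t}$ (its value at time t), $m$ (number of exogenous variables) and $y'_t$ (the series after d rounds of differencing) are notation assumed here for illustration and do not come from the slides. Some software instead fits a regression with ARIMA errors, which gives the coefficients a different interpretation.

```latex
\[
y'_t = c + \sum_{k=1}^{m} \beta_k x_{k,t}
         + \sum_{i=1}^{p} \phi_i y'_{t-i}
         + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t
\]
```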
Identification of p,d,q values
• Values of p and q are determined from the autocorrelation (ACF) and partial autocorrelation (PACF) plots,
and the value of d depends on the level of stationarity in the data
• In the PACF plot, the number of spikes falling outside the significance range indicates the order of the autoregression, AR (value of p in ARIMAX(p,d,q))
• For instance, as you can see in the right figure below, there is one spike falling out of range; hence, the order
of AR, i.e. the value of p, would be 1
• In the ACF plot, the number of spikes falling outside the significance range indicates the order of the moving average (value of q in ARIMAX(p,d,q))
• For instance, as you can see in the left figure, there are five spikes falling out of range; hence, the order of
MA, i.e. the value of q, would be 5
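The spike-counting idea can be sketched numerically. The code below computes sample autocorrelations, derives partial autocorrelations with the Durbin-Levinson recursion, and lists the lags whose values fall outside an approximate 95% band of ±1.96/√n. The band formula and all function names are assumptions made for this sketch; the slides only show the plots.

```python
from math import sqrt

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1 (Durbin-Levinson recursion)."""
    pacf, prev = [], []  # prev[j-1] holds phi_{k-1, j}
    for k in range(1, len(r)):
        if k == 1:
            phi_kk = r[1]
        else:
            num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
            phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        curr = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf

def significant_lags(values, n):
    """Lags (starting at 1) whose value lies outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / sqrt(n)
    return [lag for lag, v in enumerate(values, start=1) if abs(v) > bound]

# First differences of the example GDP series (Y1..Y10).
gdp = [0.35, 0.38, 0.39, 0.40, 0.44, 0.50, 0.58, 0.60, 0.64, 0.70]
diffed = [b - a for a, b in zip(gdp, gdp[1:])]
r = acf(diffed, max_lag=4)
print("ACF spikes (candidate q):", significant_lags(r[1:], len(diffed)))
print("PACF spikes (candidate p):", significant_lags(pacf_from_acf(r), len(diffed)))
```

With only nine differenced points the band is wide, so a series this short gives little evidence either way; longer histories make the plots far more informative.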
Identification of p,d,q values
• Thus, the p, d and q parameters in ARIMAX(p,d,q) are substituted with integer values, where p and q take
any value between 0 and 5 and the value of d is set between 0 and 2
• For example, ARIMAX(2,1,1) means that you have a second-order autoregressive model with a first-order
moving average component, and the series has been differenced once to induce stationarity
• A value of 0 can be used for any of the above-mentioned parameters, indicating that the particular
component (AR/I/MA) should not be used. This way, the ARIMAX model can be configured to perform
the function of an ARMAX model, or even a simple AR, I or MA model, depending on the data
Other default parameters
• Below are the other default parameters when taking the manual approach to
fitting the model:
• Max Lag: The maximum lag order should be set to 20 (the highest lag up to which
the model checks the ACF and PACF plots to set the p, d and q parameters)
• Include Original Xreg: A boolean flag indicating whether the non-lagged predictors
should be included in the model. Default should be set to True
  • True: Fit an ARIMAX model on the data using the matrix of predictors (Xi)
  • False: Fit an ARIMA model on the data, excluding the matrix of predictors (Xi)
• Include Intercept: A boolean flag indicating whether the model should be fit with
an intercept term. Default should be set to True
  • True: The final equation (model) will have a constant term added
  • False: The final equation (model) will not have any constant term added
  • The intercept is an adjustment factor which is constant over time; whether to set it to True or False depends on
the underlying business problem
Here the intercept is the baseline forecast value when all Xi = 0
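The tuning parameters and defaults listed above could be captured in a small settings object, sketched here. The class name `ArimaxSettings` and its field names are hypothetical and are not part of any real product API; only the default values and the allowed ranges come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ArimaxSettings:
    """Hypothetical container for the tuning parameters described in the slides."""
    forecast_period: int
    approach: str = "Automatic"          # "Automatic" or "Manual"
    p: Optional[int] = None              # AR order, 0..5 (Manual only)
    d: Optional[int] = None              # differencing order, 0..2 (Manual only)
    q: Optional[int] = None              # MA order, 0..5 (Manual only)
    max_lag: int = 20
    include_original_xreg: bool = True
    include_intercept: bool = True

    def __post_init__(self):
        if self.forecast_period < 1:
            raise ValueError("forecast_period must be at least 1")
        if self.approach not in ("Automatic", "Manual"):
            raise ValueError("approach must be 'Automatic' or 'Manual'")
        if self.approach == "Manual":
            limits = (("p", self.p, 5), ("d", self.d, 2), ("q", self.q, 5))
            for name, value, upper in limits:
                if value is None or not 0 <= value <= upper:
                    raise ValueError(f"{name} must be between 0 and {upper}")

auto = ArimaxSettings(forecast_period=5)
manual = ArimaxSettings(forecast_period=10, approach="Manual", p=2, d=1, q=1)
print(auto)
print(manual)
```

Validating the ranges at construction time means an out-of-range order such as d = 3 is rejected before any model fitting starts.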
Other Default Parameters
Include Intercept :
For instance , below are
the examples of forecasts
with and without
intercept for rainfall
forecasting model :
Multicollinearity & Autocorrelation
• Multicollinearity means correlation between two or more predictors
• The Variance Inflation Factor (VIF) test is used to detect multicollinearity in data
o For instance, VIF > 5 indicates multicollinearity, and hence one or more correlated variables
which are not significant for the business should be dropped from the analysis
o Alternatively, predictors can be rescaled with min-max scaling, (x - min(x)) / (max(x) - min(x)), to reduce the
multicollinearity
• Autocorrelated residuals mean a linear relationship between consecutive residuals
• To check for autocorrelation, the Durbin–Watson test is conducted
o For instance, at the 95% confidence level, if the p-value < 0.05, then we conclude that
autocorrelation exists in the residuals. If the p-value > 0.05, then there is no evidence of autocorrelation in the residuals
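The quantities mentioned above can be sketched in a few lines. The code computes the Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation, values toward 0 or 4 suggest positive or negative autocorrelation), the VIF for the special case of exactly two predictors, where it reduces to 1 / (1 - r²), and min-max scaling. The function names and sample numbers are assumptions; the p-value of the Durbin-Watson test is not computed here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

def vif_two_predictors(r):
    """VIF when the model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)

def min_max_scale(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up residuals for illustration.
print(durbin_watson([1, -1, 1, -1]))          # alternating signs: statistic above 2
print(durbin_watson([1, 2, 3, 4]))            # steadily trending: statistic near 0
print(round(vif_two_predictors(0.9), 2))      # exceeds the VIF > 5 rule of thumb
print(min_max_scale([2, 4, 6]))
```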
Example
• The automatic approach will select suitable values of the
autoregression (p), differencing (d) and moving
average (q) parameters based on the data pattern
• For instance, if there is non-stationarity in the data, the
algorithm will apply differencing by setting d=1 in
order to make it stationary
• In the manual approach, the user selects the
values of the p, d and q parameters that give the minimum
value of MAPE (mean absolute percentage error) in
order to get better accuracy. This is an iterative
process, as many iterations may be needed until
the desired accuracy is achieved
• After the ARIMAX model is run, it will provide
forecasted values of the target variable (GDP) for the user
specified number of periods ahead, let’s say 5, as shown in the
Forecasted GDP table
Actual GDP (Trillion)
Years GDP
Y1 0.35
Y2 0.38
Y3 0.39
Y4 0.40
Y5 0.44
Y6 0.50
Y7 0.58
Y8 0.60
Y9 0.64
Y10 0.70
Forecasted GDP (Trillion)
Y11 0.82
Y12 0.94
Y13 1.00
Y14 1.22
Y15 1.42
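The iterative manual search described above amounts to trying each (p, d, q) combination in the allowed ranges and keeping the one with the lowest MAPE. The sketch below shows only that loop: `placeholder_mape` is a stand-in that returns made-up numbers, where a real workflow would fit the model for each order and measure MAPE on held-out data.

```python
from itertools import product

def best_order(score, max_p=5, max_d=2, max_q=5):
    """Return the (p, d, q) tuple with the lowest score over the allowed grid."""
    candidates = product(range(max_p + 1), range(max_d + 1), range(max_q + 1))
    return min(candidates, key=score)

def placeholder_mape(order):
    """Stand-in for 'fit ARIMAX with this order and return its MAPE'.

    The numbers are invented so the example runs; they carry no meaning.
    """
    p, d, q = order
    return 5.0 + abs(p - 2) + abs(d - 1) + abs(q - 1)

best = best_order(placeholder_mape)
print(best)  # the placeholder score is lowest at (2, 1, 1)
```

With p and q from 0 to 5 and d from 0 to 2 the grid has 6 × 3 × 6 = 108 candidates, which is why the automatic approach is usually more convenient.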
Model Accuracy
• Along with the forecasted values, MAPE and prediction accuracy are displayed so the user knows how accurate the forecasts
are
• How MAPE and accuracy are calculated is explained below:
MAPE:
$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right|$
where $Y_t$ is the actual, known series value for time period t,
$\hat{Y}_t$ is the forecast value of the variable Y for time period t,
N is the number of observations
• Using this formula, MAPE can be calculated for the 5-years-ahead forecasts using the most recent 5 years’ actual and predicted
data, as shown in the table below:
Years  Actual Y  Predicted Y^  Abs((Actual - Predicted)/Actual)*100
Y6     0.50      0.49          2.00
Y7     0.58      0.57          1.72
Y8     0.60      0.61          1.67
Y9     0.64      0.64          0.00
Y10    0.70      0.69          1.43
Sum of absolute percentage errors (Y6 to Y10) = 6.82
MAPE = 6.82 / 5 ≈ 1.36%
Accuracy = 100 - MAPE ≈ 98.64%, so the model is accurate
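The table arithmetic can be checked with a short sketch. It takes MAPE as the mean of the absolute percentage errors, that is, their sum divided by the number of observations N, consistent with the definition of N above. For the five rows the sum is about 6.82, so dividing by N = 5 gives a MAPE of about 1.36%. The function name `mape` is an assumption made for this illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Actual and predicted values from the table.
actual = [0.50, 0.58, 0.60, 0.64, 0.70]
predicted = [0.49, 0.57, 0.61, 0.64, 0.69]

error = mape(actual, predicted)
accuracy = 100.0 - error
print(f"MAPE = {error:.2f}%, accuracy = {accuracy:.2f}%")
```

Either way the result is well under the 10% MAPE ceiling and above the 90% accuracy threshold stated earlier, so the conclusion that the model is accurate holds.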
Want to Learn
More?
Get in touch with us @
support@Smarten.com
And Do Checkout the Learning section
on
Smarten.com
June 2018

What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?

  • 1. Time Series Forecasting: Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX). A Simplistic Explainer Series for Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. Introduction: An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model can be viewed as a multiple regression model with one or more autoregressive (AR) terms and/or one or more moving average (MA) terms. This method is suitable for forecasting when the data is stationary or non-stationary, is multivariate, and has any type of pattern: level, trend, seasonality or cyclicity. ARIMAX is simply an ARIMA model with additional explanatory variables in categorical and/or numeric format.
  • 4. Example: Take the year-wise GDP values of India. The plot of these data suggests that the series is non-stationary with an upward trend. Because more than one variable is likely to affect GDP, ARIMAX is a suitable algorithm for forecasting it. Actual GDP (trillion): Y1 0.35, Y2 0.38, Y3 0.39, Y4 0.40, Y5 0.44, Y6 0.50, Y7 0.58, Y8 0.60, Y9 0.64, Y10 0.70. Forecasted GDP (trillion): Y11 0.82, Y12 0.94, Y13 1.00, Y14 1.22, Y15 1.42.
  • 6. Standard tuning parameters. Model parameters: In ARIMA, three main parameters are set to fit the model. p: the order of the autoregressive model applied to the series. d: the order of differencing applied to the series; differencing converts non-stationary data to stationary data (a stationary series remains at a fairly constant level over time). q: the order of the moving average model applied to the series. Model approach: In ARIMAX, there are two approaches to fitting a model, automatic and manual. When p, d and q are selected by the system, it is called the automatic approach; when p, d and q are entered by the user, it is called the manual approach. For better forecasts, the automatic approach should be chosen, because the model then selects and applies suitable parameters based on the nature of the data. Forecast period: For both approaches, the user has to enter the forecast period. For example, to predict the sales value for 10 periods ahead, this value should be entered as 10. Note: refer to the calculations section to understand the model parameters.
  • 8. Sample UI for selecting inputs and applying tuning parameters. Step 1: select the variable you would like to forecast (columns offered: Year, GDP, Consumer Inflation, Wholesale Inflation, Industrial Index of Production). Step 2: select the time stamp from the same list of columns. Step 3: select the predictors; more than one predictor can be selected. Step 4: set the tuning parameters. By default this box is displayed with the approach set to Automatic and a Forecast Period field; in this case the parameters to fit ARIMAX are automatically detected and applied by the algorithm. If the user changes the approach to Manual, the box is displayed with additional fields to set the AR(p), I(d) and MA(q) values.
  • 9. Sample UI for output: The output is the forecasted values for the user-specified time period, along with line charts showing the actual and forecasted series and the prediction accuracy. MAPE should not exceed 10%, as it represents the margin of error in forecasting. Accuracy shows how accurate the forecasts are; ideally it should be at least 90%, otherwise the model needs to be revised and fine-tuned (apply transformations to the input data, check whether the basic assumptions of ARIMAX are met, etc.). Actual GDP (trillion): Y1 0.35, Y2 0.38, Y3 0.39, Y4 0.40, Y5 0.44, Y6 0.50, Y7 0.58, Y8 0.60, Y9 0.64, Y10 0.70. Forecasted GDP (trillion): Y11 0.82, Y12 0.94, Y13 1.00, Y14 1.22, Y15 1.42.
  • 11. Limitations: The model assumes a linear relationship between each predictor (Xi) and the target variable (Y), i.e. the scatter plot of each predictor against the target variable should look roughly linear, as in figures 1 and 2 on the slide. Furthermore, there should be no multicollinearity in the data. Multicollinearity generally occurs when there are high correlations between two or more predictor variables. Examples of correlated predictor variables (also called multicollinear predictors) are a person's height and weight, the age and sales price of a car, or years of education and annual income. An easy way to detect multicollinearity is to calculate the correlation coefficient for every pair of predictor variables; if a coefficient is close to or exactly 1 in absolute value, one of the two predictors should be removed from the model if at all possible. Note: refer to the calculations section to understand multicollinearity and autocorrelation.
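The pairwise correlation check described above can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: the function names, the 0.9 cutoff standing in for "close to 1", and the sample data are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(predictors, threshold=0.9):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:  # threshold is an assumed cutoff for "close to 1"
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical illustrative data, not taken from the slides.
predictors = {
    "consumer_inflation": [4.0, 4.5, 5.0, 5.5, 6.0],
    "wholesale_inflation": [3.9, 4.6, 5.1, 5.4, 6.1],
    "industrial_production": [2.0, 1.0, 3.0, 1.5, 2.5],
}
flagged = correlated_pairs(predictors)
```

With this sample data only the two inflation series are flagged, which would suggest keeping just one of them as a predictor.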
  • 12. Limitations: The forecast error, also known as the residuals, should stay at a nearly constant level over time, i.e. it should be time independent as in figure 1, in contrast to the increasing or decreasing trend shown in figure 2. Figure 1: time-independent error (fairly constant over time and lying within a certain range). Figure 2: time-dependent error (decreasing with time). Note: refer to the calculations section to understand multicollinearity and autocorrelation.
  • 14. Business use case. Business problem: A company wants to forecast its product line growth for the next couple of years based on the past 30 years of yearly data. The predictor variables in this case are the yearly consumer inflation rate, yearly GDP data, and the yearly population growth rate. Data pattern: the input data exhibits non-stationarity, an upward trend, and seasonality. Business benefit: For various combinations of GDP, consumer inflation and population growth rates, the company can forecast its product growth. Moreover, the company can analyze the gap between targeted and estimated growth and decide on a strategy to reduce this gap and achieve the desired results.
  • 16. Calculations - Autoregression (AR): In an autoregressive model, which is one of the components of the ARIMAX model, we forecast the variable of interest using a linear combination of its own past values. The term autoregression indicates that it is a regression of the variable against itself. An autoregressive model of order p, denoted AR(p), can be written as $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$, where $c$ is a constant, $\phi_1, \dots, \phi_p$ are the lag coefficients, $e_t$ is the error term, and $p$ is the order of the autoregressive model. This is like a multiple regression, but with lagged values (past values) of $y_t$ as predictors. The order of this component (order of autoregression) is given by the parameter p when fitting the model ARIMAX(p, d, q).
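An AR(p) forecast as described above is a constant plus a weighted sum of the p most recent values. The sketch below illustrates only that arithmetic; the function name `ar_forecast` and the coefficient values are hypothetical and are not fitted to any data.

```python
def ar_forecast(history, c, phi):
    """One-step AR(p) forecast: c + phi[0]*y[t-1] + ... + phi[p-1]*y[t-p]."""
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    lags = history[::-1][:p]  # most recent value first
    return c + sum(coef * lag for coef, lag in zip(phi, lags))


# Last four actual GDP values from the example table.
gdp = [0.58, 0.60, 0.64, 0.70]
# Hypothetical coefficients chosen for illustration only.
next_value = ar_forecast(gdp, c=0.05, phi=[0.7, 0.3])
```

Here p = 2, so only the two latest values (0.70 and 0.64) enter the forecast. In practice the constant and the coefficients are estimated from the data when the model is fitted.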
  • 17. Calculations - Integration (I) / Differencing (d): The second component of the ARIMAX model, I (for "integration"), replaces the series with the difference between its current and previous values; this differencing can be performed more than once as required. For example, the equation for first-order differencing is $y'_t = y_t - y_{t-1}$. Hence, for $y_t = 2$ and $y_{t-1} = 1$, $y'_t$ will be 1. Similarly, second-order differencing is $y''_t = y'_t - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$. The order of this component (order of differencing) is set by the parameter d when fitting the model ARIMAX(p, d, q).
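Differencing of order d amounts to subtracting each value from the one that follows it, repeated d times. A minimal sketch, where the function name `difference` is an assumption for illustration:

```python
def difference(series, d=1):
    """Apply first-order differencing d times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


first = difference([1, 2])            # [1], matching the worked example above
second = difference([1, 2, 4, 7], 2)  # first pass [1, 2, 3], second pass [1, 1]

# Actual GDP values from the example table, differenced once.
gdp = [0.35, 0.38, 0.39, 0.40, 0.44, 0.50, 0.58, 0.60, 0.64, 0.70]
gdp_changes = difference(gdp)
```

Each pass removes one observation, so a series of 10 values yields 9 first differences and 8 second differences.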
  • 18. Calculations - Moving average (MA): A moving average model, the third component of ARIMAX, uses past forecast errors as predictors. A moving average model of order q, denoted MA(q), can be written as $y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$, where $y_t$ is the value of the series at time t, $c$ is a constant, $\theta_1, \dots, \theta_q$ are the lag coefficients, $e_t$ is the error term, and $q$ is the moving average order. The order of this component (order of moving average) is set by the parameter q when fitting the model ARIMAX(p, d, q).
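The MA(q) calculation combines a constant, the current error, and a weighted sum of the q most recent past errors. The sketch below shows only that arithmetic; the function name `ma_value`, the coefficients and the error values are hypothetical.

```python
def ma_value(c, theta, errors):
    """MA(q) value: c + e[t] + theta[0]*e[t-1] + ... + theta[q-1]*e[t-q].

    `errors` is ordered oldest to newest and ends with the current error e[t].
    """
    q = len(theta)
    if len(errors) < q + 1:
        raise ValueError("need the current error plus q past errors")
    current = errors[-1]
    past = errors[-2::-1][:q]  # e[t-1], e[t-2], ... (most recent first)
    return c + current + sum(coef * err for coef, err in zip(theta, past))


# Hypothetical values for illustration only.
value = ma_value(c=0.5, theta=[0.4, 0.2], errors=[0.1, -0.2, 0.05])
```

When forecasting a future period, the current error is not yet known and is usually replaced by its expected value of zero.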
  • 19. Calculations - Exogenous variables (X): ARIMAX is simply an ARIMA model with the inclusion of exogenous variables (additional explanatory variables or predictors). This means one or more explanatory variables (regressors) are added to the forecasting equation. For example, predictors such as the Consumer Price Index, the Producer Price Index and employment statistics, which directly or indirectly impact GDP, can be used as exogenous variables to forecast GDP with ARIMAX.
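One common way to write the combined forecasting equation is to add a regression term for the exogenous variables to the AR and MA terms. The form below is a sketch under assumed notation that does not appear in the slides: $x_{i,t}$ is the value of the i-th predictor at time t, $\beta_i$ is its coefficient, $k$ is the number of predictors, and $y'_t$ is the series after d rounds of differencing. Software packages differ in the exact formulation they fit.

```latex
y'_t = c + \sum_{i=1}^{k} \beta_i x_{i,t}
         + \sum_{j=1}^{p} \phi_j \, y'_{t-j}
         + \sum_{l=1}^{q} \theta_l \, e_{t-l}
         + e_t
```

Setting every $\beta_i$ to zero recovers a plain ARIMA(p, d, q) model.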
  • 20. Identification of p, d, q values: The values of p and q are determined from the autocorrelation (ACF) and partial autocorrelation (PACF) plots, and the value of d depends on the level of stationarity in the data. In the PACF plot, the number of spikes falling outside the range indicates the order of the autoregression (the value of p in ARIMAX(p, d, q)). For instance, in the PACF plot on the slide (right figure) one spike falls out of range, hence the order of AR, i.e. the value of p, would be 1. In the ACF plot, the number of spikes falling outside the range indicates the order of the moving average (the value of q in ARIMAX(p, d, q)). For instance, in the ACF plot on the slide (left figure) five spikes fall out of range, hence the order of MA, i.e. the value of q, would be 5.
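Counting the spikes that fall out of range in an ACF plot can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the range is taken to be the usual approximate 95% band of ±1.96/√n, and the ACF is computed on the raw GDP values purely as a demonstration (only the ACF is shown; the PACF needs a further recursion).

```python
from math import sqrt


def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(correlations, n):
    """Lags whose correlation lies outside the approximate band +/-1.96/sqrt(n)."""
    bound = 1.96 / sqrt(n)
    return [k for k, r in enumerate(correlations, start=1) if abs(r) > bound]


# Actual GDP values from the example table.
gdp = [0.35, 0.38, 0.39, 0.40, 0.44, 0.50, 0.58, 0.60, 0.64, 0.70]
spikes = significant_lags(acf(gdp, max_lag=3), len(gdp))
```

For this short series only lag 1 falls outside the band. With so few observations the band is wide, so such counts are a rough guide rather than a firm rule.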
  • 21. Identification of p, d, q values: The p, d and q parameters in ARIMAX(p, d, q) are replaced with integer values, where p and q take any value between 0 and 5 and d is set between 0 and 2. For example, ARIMAX(2,1,1) means a second-order autoregressive model with a first-order moving average component, fitted to a series that has been differenced once to induce stationarity. A value of 0 can be used for any of these parameters to indicate that the corresponding component (AR, I or MA) should not be used. This way, the ARIMAX model can be configured to perform the function of an ARMAX model, or even a simple AR, I or MA model, depending on the data.
  • 22. Other default parameters: Below are the other default parameters when taking the manual approach to fitting the model. Max Lag: the maximum lag order should be set to 20 (the lag up to which the model checks the ACF and PACF plots to set the p, d and q parameters). Include Original Xreg: a boolean flag indicating whether the non-lagged predictors should be included in the model; the default should be True. True: fit an ARIMAX model on the data using the matrix of predictors (Xi). False: fit an ARIMA model on the data, excluding the matrix of predictors (Xi). Include Intercept: a boolean flag indicating whether the model should be fit with an intercept term; the default should be True. True: the final equation (model) will have a constant term added. False: the final equation (model) will not have a constant term. The intercept is an adjustment factor that is constant over time; whether to set it to True or False depends on the underlying business problem. Here the intercept is the forecasted value when all Xi = 0.
  • 23. Other default parameters - Include Intercept: For instance, the slide shows example forecasts with and without an intercept for a rainfall forecasting model.
  • 24. Multicollinearity & Autocorrelation: Multicollinearity means correlation between two or more predictors. The Variance Inflation Factor (VIF) test is used to detect multicollinearity in the data. For instance, VIF > 5 indicates multicollinearity, and hence one or more correlated variables that are not significant for the business should be dropped from the analysis. Alternatively, predictors can be standardized, (x - min(x)) / (max(x) - min(x)), to reduce the multicollinearity. Autocorrelated residuals mean there is a linear relationship between consecutive residuals. To check for autocorrelation, the Durbin-Watson test is conducted. For instance, at the 5% significance level (95% confidence), if the p-value is < 0.05 we conclude that autocorrelation exists in the residuals; if the p-value is > 0.05, autocorrelation does not exist in the residuals.
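The Durbin-Watson test is built on a statistic computed from consecutive residuals: the sum of squared successive differences divided by the sum of squared residuals. The sketch below computes only that statistic; the function name and residual values are hypothetical, and turning the statistic into a p-value is left to a statistics package.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals."""
    num = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals for illustration only.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strongly negative autocorrelation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strongly positive autocorrelation
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.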
  • 25. Example: The automatic approach selects suitable values of the autoregression (p), differencing (d) and moving average (q) parameters based on the data pattern. For instance, if the data is non-stationary, the algorithm applies differencing with d = 1 in order to make it stationary. In the manual approach, the user selects the values of p, d and q that give the minimum MAPE (mean absolute percentage error) in order to get better accuracy. This is an iterative process, as many iterations may be needed until the desired accuracy is achieved. After the ARIMAX model is run, it provides forecasted values of the target variable (GDP) for the user-specified number of periods ahead, for example 5, as shown in the forecasted values below. Actual GDP (trillion): Y1 0.35, Y2 0.38, Y3 0.39, Y4 0.40, Y5 0.44, Y6 0.50, Y7 0.58, Y8 0.60, Y9 0.64, Y10 0.70. Forecasted GDP (trillion): Y11 0.82, Y12 0.94, Y13 1.00, Y14 1.22, Y15 1.42.
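The iterative manual search can be pictured as trying every (p, d, q) combination in the ranges given earlier (p and q from 0 to 5, d from 0 to 2) and keeping the one with the lowest MAPE. The sketch below shows only the search loop. `select_order` is a hypothetical helper, and `toy_mape` is a made-up stand-in for "fit ARIMAX(p, d, q) and return its MAPE", which would come from a modelling library in practice.

```python
from itertools import product


def select_order(score, p_values=range(6), d_values=range(3), q_values=range(6)):
    """Return the (p, d, q) tuple with the lowest score, e.g. the lowest MAPE."""
    return min(product(p_values, d_values, q_values), key=score)


def toy_mape(order):
    """Made-up score for illustration; a real score would fit a model and measure MAPE."""
    p, d, q = order
    return abs(p - 2) + abs(d - 1) + abs(q - 1)


best = select_order(toy_mape)  # (2, 1, 1) for this toy score
candidates = len(list(product(range(6), range(3), range(6))))  # 108 combinations
```

With these ranges there are 108 candidate models, which is why the manual approach can take many iterations.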
  • 26. Model Accuracy: Along with the forecasted values, MAPE and prediction accuracy are displayed so the user knows how accurate the forecasts are. MAPE is calculated as $\text{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right|$, where $Y_t$ is the actual, known series value for time period t, $\hat{Y}_t$ is the forecast value of the variable Y for time period t, and N is the number of observations. Using this formula, MAPE can be calculated from the most recent 5 years of actual and predicted data (the actual values are those of Y6 to Y10 in the GDP table). Year, Actual Y, Predicted Ŷ, Abs((Actual - Predicted)/Actual)*100: Y6, 0.50, 0.49, 2.00; Y7, 0.58, 0.57, 1.72; Y8, 0.60, 0.61, 1.67; Y9, 0.64, 0.64, 0.00; Y10, 0.70, 0.69, 1.43. The absolute percentage errors sum to 6.82, so MAPE = 6.82 / 5 ≈ 1.36% and accuracy = 100 - MAPE ≈ 98.64%, so the model is accurate.
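The MAPE calculation can be checked against the five actual and predicted pairs in the table with a short Python sketch. The function name `mape` is an assumption for illustration. The per-row percentage errors sum to about 6.82; dividing by N = 5, as the definition of N in the formula implies, gives a mean of roughly 1.36%.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) * 100 for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Actual and predicted values from the accuracy table.
actual = [0.50, 0.58, 0.60, 0.64, 0.70]
predicted = [0.49, 0.57, 0.61, 0.64, 0.69]

error_pct = mape(actual, predicted)  # roughly 1.36
accuracy = 100 - error_pct           # roughly 98.64
```

Both figures sit comfortably inside the thresholds stated for the output screen (MAPE of at most 10%, accuracy of at least 90%).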
  • 27. Want to learn more? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018