Rob J Hyndman
Automatic algorithms for time series forecasting
Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
Motivation
1 It is common in business to have over 1000 products that need forecasting at least monthly.
2 Forecasts are often required by people who are untrained in time series analysis.

Specifications
Automatic forecasting algorithms must:
- determine an appropriate time series model;
- estimate the parameters;
- compute the forecasts with prediction intervals.
Example: Asian sheep
[Figure: Numbers of sheep in Asia, 1960–2010 (millions of sheep)]
[Figure: Automatic ETS forecasts of the same series]
Example: Corticosteroid sales
[Figure: Monthly corticosteroid drug sales in Australia, 1995–2010 (total scripts, millions)]
[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for the same series]
Forecasting competitions
Makridakis and Hibon (1979)
This was the first large-scale empirical evaluation of time series forecasting methods. Highly controversial at the time.

Difficulties:
- How to measure forecast accuracy?
- How to apply methods consistently and objectively?
- How to explain unexpected results?

Common thinking was that the more sophisticated mathematical models (ARIMA models at the time) were necessarily better. If the results showed ARIMA models were not best, it must be because the analyst was unskilled.
Makridakis and Hibon (1979)
"It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors." — W.G. Gilchrist

"I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods ... these authors are more at home with simple procedures than with Box-Jenkins." — C. Chatfield
Consequences of MH (1979)
As a result of this paper, researchers started to:
- consider how to automate forecasting methods;
- study which methods give the best forecasts;
- be aware of the dangers of over-fitting;
- treat forecasting as a different problem from time series analysis.

Makridakis & Hibon followed up with a new competition in 1982:
- 1001 series
- Anyone could submit forecasts (avoiding the charge of incompetence)
- Multiple forecast accuracy measures used.
M-competition
Main findings (taken from Makridakis & Hibon, 2000):
1 Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
2 The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
3 The accuracy of combinations of methods outperforms, on average, the individual methods being combined, and does very well in comparison to other methods.
4 The accuracy of the various methods depends upon the length of the forecasting horizon involved.
M3 competition
Makridakis and Hibon (2000)
"The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods... The extension involves the inclusion of more methods/researchers (in particular in the areas of neural networks and expert systems) and more series."
- 3003 series
- All data from business, demography, finance and economics.
- Series length between 14 and 126.
- Either non-seasonal, monthly or quarterly.
- All time series positive.

M&H claimed that the M3-competition supported the findings of their earlier work. However, the best performing methods were far from "simple".
Makridakis and Hibon (2000)
Best methods:

Theta
- A very confusing explanation.
- Shown by Hyndman and Billah (2003) to be an average of linear regression and simple exponential smoothing with drift, applied to seasonally adjusted data.
- Later, the original authors claimed that their explanation was incorrect.

Forecast Pro
- A commercial software package with an unknown algorithm.
- Known to fit either exponential smoothing or ARIMA models using BIC.
M3 results (recalculated)

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
ForecastX       17.35  13.09   1.42
Automatic ANN   17.18  13.98   1.53
B-J automatic   19.13  13.72   1.54

Notes:
- Calculations do not match the published paper.
- Some contestants apparently submitted multiple entries, but only the best ones were published.
Exponential smoothing
Exponential smoothing methods

                            Seasonal Component
Trend Component             N (None)   A (Additive)   M (Multiplicative)
N (None)                    N,N        N,A            N,M
A (Additive)                A,N        A,A            A,M
Ad (Additive damped)        Ad,N       Ad,A           Ad,M
M (Multiplicative)          M,N        M,A            M,M
Md (Multiplicative damped)  Md,N       Md,A           Md,M

N,N:  Simple exponential smoothing
A,N:  Holt's linear method
Ad,N: Additive damped trend method
M,N:  Exponential trend method
Md,N: Multiplicative damped trend method
A,A:  Additive Holt-Winters' method
A,M:  Multiplicative Holt-Winters' method

- There are 15 separate exponential smoothing methods.
- Each can have an additive or multiplicative error, giving 30 separate models.
- Only 19 models are numerically stable.
- Multiplicative trend models give poor forecasts, leaving 15 models.
Exponential smoothing methods

General notation: E,T,S (ExponenTial Smoothing) with the three letters denoting Error, Trend and Seasonal components.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
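In the R forecast package this three-letter code is passed directly to ets() via its model argument; a minimal sketch (AirPassengers is just a convenient built-in series, not one used in this talk):

fit <- ets(AirPassengers, model = "MAM")  # multiplicative Holt-Winters' with multiplicative errors
fit <- ets(AirPassengers, model = "ZZZ")  # "Z" lets each component be selected automatically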
Innovations state space models
- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.

[Diagram: ETS state space model. The state $x_t$ = (level, slope, seasonal) evolves as $x_{t-1} \to x_t \to x_{t+1} \to \dots$, and each observation $y_t$ is generated from the previous state $x_{t-1}$ together with the innovation $\varepsilon_t$.]

Estimation
- Compute the likelihood L from $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_T$.
- Optimize L with respect to the model parameters.
Innovations state space models

Let $x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})$ and $\varepsilon_t \overset{iid}{\sim} N(0, \sigma^2)$.

$y_t = h(x_{t-1}) + k(x_{t-1})\varepsilon_t$   (observation equation)
$x_t = f(x_{t-1}) + g(x_{t-1})\varepsilon_t$   (state equation)

where $\mu_t = h(x_{t-1})$ and $e_t = k(x_{t-1})\varepsilon_t$.

Additive errors: $k(x_{t-1}) = 1$, so $y_t = \mu_t + \varepsilon_t$.
Multiplicative errors: $k(x_{t-1}) = \mu_t$, so $y_t = \mu_t(1 + \varepsilon_t)$, and $\varepsilon_t = (y_t - \mu_t)/\mu_t$ is a relative error.
Innovations state space models

Estimation

$L^*(\theta, x_0) = n \log\Big(\sum_{t=1}^{n} \varepsilon_t^2 / k^2(x_{t-1})\Big) + 2 \sum_{t=1}^{n} \log |k(x_{t-1})| = -2\log(\text{Likelihood}) + \text{constant}$

Minimize with respect to $\theta = (\alpha, \beta, \gamma, \phi)$ and the initial states $x_0 = (\ell_0, b_0, s_0, s_{-1}, \dots, s_{-m+1})$.
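To make the criterion concrete, here is a minimal sketch for ETS(A,N,N), where $k(x_{t-1}) = 1$ so the second sum vanishes; ann_lstar is a hypothetical helper written for illustration, not the internal code of ets():

# ETS(A,N,N): mu_t = l_{t-1}; level update l_t = l_{t-1} + alpha * e_t
ann_lstar <- function(y, alpha, l0) {
  l <- l0
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - l         # one-step innovation epsilon_t
    l <- l + alpha * e[t]    # state (level) update
  }
  length(y) * log(sum(e^2))  # L*(theta, x0); additive errors, so log|k| term is zero
}
# Minimize over theta = alpha and x0 = l0, e.g.:
# optim(c(0.5, mean(y)), function(p) ann_lstar(y, p[1], p[2]))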
Q: How to choose between the 15 useful ETS models?
Cross-validation
- Traditional evaluation: split the series into training data and test data.
- Standard cross-validation: randomly chosen hold-out observations.
- Time series cross-validation: a sequence of training sets, each containing one more observation than the last, with forecasts evaluated on the observations following each training set. Also known as "evaluation on a rolling forecast origin".

[Diagram: training/test splits for the three evaluation schemes.]
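A rolling forecast origin is easy to code directly; a sketch, assuming the livestock series from the earlier examples is loaded and using an arbitrary minimum training length of 20 observations:

k <- 20                                  # minimum training set size (arbitrary choice)
e <- numeric(length(livestock) - k)
for (i in seq_along(e)) {
  train <- window(livestock, end = time(livestock)[k + i - 1])
  fit <- ets(train)                      # re-fit on each training set
  e[i] <- livestock[k + i] - forecast(fit, h = 1)$mean[1]
}
mean(e^2)                                # one-step CV-MSE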
Akaike's Information Criterion

AIC = −2 log(L) + 2k
where L is the likelihood and k is the number of estimated parameters in the model.

- This is a penalized likelihood approach.
- If L is Gaussian, then AIC ≈ c + T log(MSE) + 2k, where c is a constant, MSE is computed from one-step forecasts on the training set, and T is the length of the series.
- Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE of one-step forecasts on the test set via time series cross-validation.
Automatic algorithms for time series forecasting Exponential smoothing 25
Akaike’s Information Criterion
AIC = −2 log(L) + 2k
Corrected AIC
For small T, AIC tends to over-fit. Bias-corrected
version:
AICC = AIC + 2(k+1)(k+2)
T−k
Bayesian Information Criterion
BIC = AIC + k[log(T) − 2]
BIC penalizes terms more heavily than AIC
Minimizing BIC is consistent if there is a true
model.
Automatic algorithms for time series forecasting Exponential smoothing 26
Akaike’s Information Criterion
AIC = −2 log(L) + 2k
Corrected AIC
For small T, AIC tends to over-fit. Bias-corrected
version:
AICC = AIC + 2(k+1)(k+2)
T−k
Bayesian Information Criterion
BIC = AIC + k[log(T) − 2]
BIC penalizes terms more heavily than AIC
Minimizing BIC is consistent if there is a true
model.
Automatic algorithms for time series forecasting Exponential smoothing 26
Akaike’s Information Criterion
AIC = −2 log(L) + 2k
Corrected AIC
For small T, AIC tends to over-fit. Bias-corrected
version:
AICC = AIC + 2(k+1)(k+2)
T−k
Bayesian Information Criterion
BIC = AIC + k[log(T) − 2]
BIC penalizes terms more heavily than AIC
Minimizing BIC is consistent if there is a true
model.
Automatic algorithms for time series forecasting Exponential smoothing 26
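As a worked example of these formulas, all three criteria can be computed by hand from any likelihood-based fit; the parameter count below (coefficients plus the innovation variance) is one common convention, not the only one:

fit <- arima(lh, order = c(1, 0, 0))  # lh: a small built-in ts, used purely for illustration
k <- length(coef(fit)) + 1            # estimated parameters, counting sigma^2
n <- length(residuals(fit))           # T in the slide notation
aic  <- -2 * as.numeric(logLik(fit)) + 2 * k
aicc <- aic + 2 * (k + 1) * (k + 2) / (n - k)
bic  <- aic + k * (log(n) - 2)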
What to use?

Choice: AIC, AICc, BIC, CV-MSE.
- CV-MSE is too time-consuming for most automatic forecasting purposes, and it requires large T.
- As T → ∞, BIC selects the true model if there is one. But that is never true!
- AICc focuses on forecasting performance, can be used on small samples, and is very fast to compute.
- Empirical studies in forecasting show AIC is better than BIC for forecast accuracy.
ets algorithm in R

Based on Hyndman, Koehler, Snyder & Grose (IJF 2002):
- Apply each of the 15 models that are appropriate to the data. Optimize parameters and initial values using MLE.
- Select the best model using AICc.
- Produce forecasts using the best model.
- Obtain prediction intervals using the underlying state space model.
Exponential smoothing

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,A,N); millions of sheep, 1960–2010]
Exponential smoothing

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,N,M); total scripts (millions), 1995–2010]
Exponential smoothing

> fit
ETS(M,N,M)

Smoothing parameters:
  alpha = 0.4597
  gamma = 1e-04

Initial states:
  l = 0.4501
  s = 0.8628 0.8193 0.7648 0.7675 0.6946 1.2921
      1.3327 1.1833 1.1617 1.0899 1.0377 0.9937

sigma: 0.0675

       AIC       AICc        BIC
-115.69960 -113.47738  -69.24592
M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
ForecastX       17.35  13.09   1.42
Automatic ANN   17.18  13.98   1.53
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43
Exponential smoothing
Further reading: www.OTexts.org/fpp
ARIMA modelling
ARIMA models

[Diagram: lagged values $y_{t-1}, y_{t-2}, y_{t-3}$ feed into the output $y_t$, giving an autoregression (AR) model; adding the lagged errors $\varepsilon_{t-1}, \varepsilon_{t-2}$ as inputs gives an autoregression moving average (ARMA) model.]

Estimation
- Compute the likelihood L from $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_T$.
- Use an optimization algorithm to maximize L.

ARIMA model
An autoregression moving average (ARMA) model applied to differences.
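To illustrate the last point with simulated data (orders chosen arbitrarily), fitting an ARMA(1,1) to the first differences amounts to fitting an ARIMA(1,1,1) to the level:

set.seed(123)
x <- cumsum(arima.sim(model = list(ar = 0.7, ma = -0.4), n = 200))  # integrated ARMA(1,1)
fit1 <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)    # ARMA on the differences
fit2 <- arima(x, order = c(1, 1, 1))                                # equivalent ARIMA fit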
Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(0,1,0) with drift; millions of sheep, 1960–2010]
Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12]; total scripts (millions), 1995–2010]
Auto ARIMA

> fit
Series: h02
ARIMA(3,1,3)(0,1,1)[12]

Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651

sigma^2 estimated as 0.002706: log likelihood = 290.25
AIC = -564.5   AICc = -563.71   BIC = -538.48
How does auto.arima() work?

A non-seasonal ARIMA process:
$\phi(B)(1 - B)^d y_t = c + \theta(B)\varepsilon_t$

Need to select appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select p, q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices are driven by forecast accuracy.
How does auto.arima() work?

A seasonal ARIMA process:
$\Phi(B^m)\phi(B)(1 - B)^d (1 - B^m)^D y_t = c + \Theta(B^m)\theta(B)\varepsilon_t$

Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
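These choices correspond to arguments of auto.arima(), so the defaults can be written out explicitly; a sketch (argument values shown are the documented defaults of the forecast package at the time):

fit <- auto.arima(h02,
  d = NA, D = NA,            # numbers of differences chosen by unit root tests
  test = "kpss",             # KPSS test selects d
  seasonal.test = "ocsb",    # OCSB test selects D
  ic = "aicc",               # AICc chooses p, q, P, Q and the constant
  stepwise = TRUE)           # stepwise traversal of the model space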
M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43
AutoARIMA       19.12  13.85   1.47
Automatic nonlinear forecasting?
Automatic nonlinear forecasting
- Automatic ANN in the M3 competition did poorly.
- Linear methods did best in the NN3 competition!
- Very few machine learning methods get published in the IJF because authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.
- Some good recent work by Kourentzes and Crone on automated ANNs for time series.
- Watch this space!
Time series with complex seasonality
Examples
[Figure: US finished motor gasoline products, weekly, 1992–2004 (thousands of barrels per day)]
[Figure: Number of calls to a large American bank (7am–9pm), 5-minute intervals, 3 March to 12 May (number of call arrivals)]
[Figure: Turkish electricity demand, daily, 2000–2008 (GW)]
TBATS model

TBATS
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)

Automatic algorithm described in A.M. De Livera, R.J. Hyndman & R.D. Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513–1527.
TBATS model

$y_t$ : observation at time $t$

$y_t^{(\omega)} = \begin{cases} (y_t^{\omega} - 1)/\omega & \text{if } \omega \neq 0; \\ \log y_t & \text{if } \omega = 0. \end{cases}$   [Box-Cox transformation]

$y_t^{(\omega)} = \ell_{t-1} + \phi b_{t-1} + \sum_{i=1}^{M} s_{t-m_i}^{(i)} + d_t$   [M seasonal periods]

$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha d_t$
$b_t = (1 - \phi) b + \phi b_{t-1} + \beta d_t$   [global and local trend]

$d_t = \sum_{i=1}^{p} \phi_i d_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$   [ARMA error]

$s_t^{(i)} = \sum_{j=1}^{k_i} s_{j,t}^{(i)}$   [Fourier-like seasonal terms]
$s_{j,t}^{(i)} = s_{j,t-1}^{(i)} \cos \lambda_j^{(i)} + s_{j,t-1}^{*(i)} \sin \lambda_j^{(i)} + \gamma_1^{(i)} d_t$
$s_{j,t}^{*(i)} = -s_{j,t-1}^{(i)} \sin \lambda_j^{(i)} + s_{j,t-1}^{*(i)} \cos \lambda_j^{(i)} + \gamma_2^{(i)} d_t$

TBATS: Trigonometric, Box-Cox, ARMA, Trend, Seasonal.
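In R, the multiple seasonal periods $m_i$ are declared with msts() before calling tbats(); a minimal sketch using the Turkish electricity periods from the example below (assumes the turk series from this talk is loaded):

y <- msts(turk, seasonal.periods = c(7, 354.37, 365.25))  # weekly plus two annual periods
fit <- tbats(y)  # omega, p, q and the k_i are then selected automatically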
Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {52.1785714285714,8}); thousands of barrels per day]
Examples

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {169,5, 845,3}); number of call arrivals, 5-minute intervals]
Examples

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {7,3, 354.37,12, 365.25,4}); electricity demand (GW)]
Hierarchical and grouped time series
Hierarchical time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

Total
  A: AA AB AC
  B: BA BB BC
  C: CA CB CC

Examples
- Net labour turnover
- Tourism by state and region
Hierarchical time series

Total
  A  B  C

- $Y_t$ : observed aggregate of all series at time $t$.
- $Y_{X,t}$ : observation on series X at time $t$.
- $b_t$ : vector of all series at the bottom level at time $t$.

$y_t = (Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t})' = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \end{pmatrix} = S\, b_t$
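Numerically, for this three-series hierarchy (base R only, made-up values):

S  <- rbind(c(1, 1, 1), diag(3))  # summing matrix: Total row on top of I_3
bt <- c(10, 20, 30)               # bottom-level values Y_A, Y_B, Y_C
yt <- S %*% bt                    # (60, 10, 20, 30)': Total, A, B, C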
Hierarchical time series

Total
  A: AX AY AZ
  B: BX BY BZ
  C: CX CY CZ

$y_t = (Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}, Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t})' = S\, b_t$

where $b_t = (Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t})'$ and

$S = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ & & & & I_9 & & & & \end{pmatrix}$

with the bottom block $I_9$ the $9 \times 9$ identity matrix.

$y_t = S\, b_t$
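The hts package (Hyndman et al.) builds such structures and reconciles forecasts automatically; a sketch, where bts stands for a hypothetical ts matrix holding the nine bottom-level series:

library(hts)
y  <- hts(bts, nodes = list(3, c(3, 3, 3)))  # 3 middle-level series, each with 3 children
fc <- forecast(y, h = 12, method = "comb", fmethod = "ets")  # reconciled forecasts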
Forecasting notation

Let $\hat{y}_n(h)$ be the vector of initial h-step forecasts, made at time $n$, stacked in the same order as $y_t$. (They may not add up.)

Reconciled forecasts are of the form:
$\tilde{y}_n(h) = S P \hat{y}_n(h)$
for some matrix P.

- P extracts and combines the base forecasts $\hat{y}_n(h)$ to get bottom-level forecasts.
- S adds them up.
General properties
ỹn(h) = S P ŷn(h)
Forecast bias
If the base forecasts ŷn(h) are unbiased, then the reconciled forecasts are unbiased iff SPS = S.
Forecast variance
For any given P satisfying SPS = S, the covariance matrix of the h-step-ahead reconciled forecast errors is
Var[yn+h − ỹn(h)] = S P Wh P′ S′,
where Wh is the covariance matrix of the h-step-ahead base forecast errors.
Automatic algorithms for time series forecasting Hierarchical and grouped time series 61
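The SPS = S condition is easy to verify numerically. A sketch in base R (assumptions: the bottom-up choice P = [0 | I], and an invented diagonal Wh) that checks the unbiasedness condition and evaluates the reconciled error covariance formula above:

S    <- rbind(c(1, 1, 1), diag(3))   # 4 x 3 summing matrix for Total -> A, B, C
P_bu <- cbind(0, diag(3))            # bottom-up: P picks out the bottom-level forecasts
# Unbiasedness condition: S P S must equal S
all.equal(S %*% P_bu %*% S, S)       # TRUE
# Reconciled h-step error covariance under an assumed base covariance W_h
W_h <- diag(c(4, 1, 1, 1))           # hypothetical base forecast error variances
S %*% P_bu %*% W_h %*% t(P_bu) %*% t(S)   # Var[y_{n+h} - ytilde_n(h)]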
BLUF via trace minimization
Theorem
For any P satisfying SPS = S, the problem
minP trace[S P Wh P′ S′]
has solution P = (S′ W†h S)−1 S′ W†h, where W†h is a generalized inverse of Wh.
ỹn(h) = S(S′ W†h S)−1 S′ W†h ŷn(h)
(reconciled forecasts on the left, base forecasts on the right)
Equivalent to the GLS estimate of the regression ŷn(h) = S βn(h) + εh, where εh ∼ N(0, Wh).
Problem: Wh is hard to estimate.
Automatic algorithms for time series forecasting Hierarchical and grouped time series 62
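Given any estimate of Wh, the optimal P can be computed directly. A minimal R sketch (assuming the MASS package is available for the Moore-Penrose generalized inverse; the Wh and base forecasts are invented):

library(MASS)                         # provides ginv()
S   <- rbind(c(1, 1, 1), diag(3))     # summing matrix for Total -> A, B, C
W_h <- diag(c(2.0, 0.8, 1.1, 0.9))    # hypothetical base error covariance
Wi  <- ginv(W_h)                      # generalized inverse of W_h
# P = (S' Wi S)^{-1} S' Wi, so that ytilde_n(h) = S P yhat_n(h)
P <- solve(t(S) %*% Wi %*% S, t(S) %*% Wi)
yhat   <- c(45, 12, 24, 6)            # incoherent base forecasts (45 != 12 + 24 + 6)
ytilde <- S %*% P %*% yhat            # reconciled forecasts add up exactly
sum(ytilde[2:4]) - ytilde[1]          # zero, up to numerical precision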
Optimal combination forecasts
ỹn(h) = S(S′ W†h S)−1 S′ W†h ŷn(h)
(reconciled forecasts from base forecasts)
Solution 1: OLS
ỹn(h) = S(S′S)−1 S′ ŷn(h)
Solution 2: WLS
Approximate W1 by its diagonal, and assume Wh = kh W1.
Easy to estimate, and places weight where we have the best one-step forecasts.
ỹn(h) = S(S′ΛS)−1 S′Λ ŷn(h)
Automatic algorithms for time series forecasting Hierarchical and grouped time series 63
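Both practical solutions are one-line computations once S is known. A sketch in base R (the base forecasts and the one-step error variances behind Λ are invented; Λ is taken here as the inverse of the diagonal of W1, one common weighting choice):

S    <- rbind(c(1, 1, 1), diag(3))
yhat <- c(45, 12, 24, 6)                  # base forecasts (do not add up)
# Solution 1: OLS reconciliation
y_ols <- S %*% solve(t(S) %*% S, t(S) %*% yhat)
# Solution 2: WLS with Lambda = diag(1 / sigma^2), sigma^2 from one-step errors
sigma2 <- c(2.0, 0.8, 1.1, 0.9)           # hypothetical one-step error variances
Lambda <- diag(1 / sigma2)
y_wls  <- S %*% solve(t(S) %*% Lambda %*% S, t(S) %*% Lambda %*% yhat)
cbind(base = yhat, OLS = y_ols, WLS = y_wls)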
Challenges
ỹn(h) = S(S′ΛS)−1 S′Λ ŷn(h)
Computational difficulties in big hierarchies, due to the size of the S matrix and the singular behavior of (S′ΛS).
Loss of information from ignoring the covariance matrix when computing point forecasts.
We still need to estimate the covariance matrix to produce prediction intervals.
Automatic algorithms for time series forecasting Hierarchical and grouped time series 64
Australian tourism
Hierarchy:
States (7)
Zones (27)
Regions (82)
Base forecasts: ETS (exponential smoothing) models
Automatic algorithms for time series forecasting Hierarchical and grouped time series 65
Base forecasts
[Figures: domestic tourism base forecasts, visitor nights by year (1998–2008), for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly]
Automatic algorithms for time series forecasting Hierarchical and grouped time series 66
Reconciled forecasts
[Figures: reconciled forecasts, 2000–2010, for Total; NSW, VIC, QLD, Other; Sydney, Other NSW, Melbourne, Other VIC, GC and Brisbane, Other QLD, Capital cities, Other]
Automatic algorithms for time series forecasting Hierarchical and grouped time series 67
Forecast evaluation
Select models using all observations.
Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts.
Increase the sample size one observation at a time, re-estimating the models and generating forecasts, until the end of the sample.
In total: 24 one-step-ahead forecasts, 23 two-steps-ahead, and so on down to 17 eight-steps-ahead, available for evaluation.
Automatic algorithms for time series forecasting Hierarchical and grouped time series 68
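This expanding-window scheme is straightforward to code with the forecast package. A sketch on a stand-in quarterly series (assumptions: the series is simulated, and for simplicity an ETS model is re-selected at each origin rather than fixed from the full sample as on the slide):

library(forecast)
set.seed(1)
y <- ts(100 + cumsum(rnorm(36)), frequency = 4)   # stand-in quarterly series, n = 36
h_max <- 8
err <- matrix(NA_real_, nrow = length(y), ncol = h_max)
# Expanding window: fit on the first 12 observations, then add one at a time
for (n in 12:(length(y) - 1)) {
  fit <- ets(window(y, end = time(y)[n]))         # re-estimate on first n observations
  fc  <- forecast(fit, h = min(h_max, length(y) - n))
  for (h in seq_along(fc$mean)) {
    err[n + h, h] <- y[n + h] - fc$mean[h]        # h-step-ahead forecast error
  }
}
round(colMeans(abs(err), na.rm = TRUE), 2)        # MAE by horizon, h = 1, ..., 8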
Hierarchy: states, zones, regions

MAPE                  h=1     h=2     h=4     h=6     h=8   Average
Top level: Australia
  Bottom-up          3.79    3.58    4.01    4.55    4.24      4.06
  OLS                3.83    3.66    3.88    4.19    4.25      3.94
  WLS                3.68    3.56    3.97    4.57    4.25      4.04
Level: States
  Bottom-up         10.70   10.52   10.85   11.46   11.27     11.03
  OLS               11.07   10.58   11.13   11.62   12.21     11.35
  WLS               10.44   10.17   10.47   10.97   10.98     10.67
Level: Zones
  Bottom-up         14.99   14.97   14.98   15.69   15.65     15.32
  OLS               15.16   15.06   15.27   15.74   16.15     15.48
  WLS               14.63   14.62   14.68   15.17   15.25     14.94
Bottom level: Regions
  Bottom-up         33.12   32.54   32.26   33.74   33.96     33.18
  OLS               35.89   33.86   34.26   36.06   37.49     35.43
  WLS               31.68   31.22   31.08   32.41   32.77     31.89

Automatic algorithms for time series forecasting Hierarchical and grouped time series 69
hts package for R
Automatic algorithms for time series forecasting Hierarchical and grouped time series 70
hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped
time series
Version: 4.5
Depends: forecast (≥ 5.0), SparseM
Imports: parallel, utils
Published: 2014-12-09
Author: Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman Rob.Hyndman at monash.edu
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)
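Typical usage (a sketch against the hts 4.x interface; the bottom-level matrix and the two-level hierarchy shape are invented stand-ins, not the tourism data):

library(hts)
# Nine made-up quarterly bottom-level series: 3 groups with 3 children each
bts <- ts(matrix(rnorm(36 * 9, mean = 50), ncol = 9), frequency = 4)
y <- hts(bts, nodes = list(3, rep(3, 3)))    # build the hierarchy
# Reconciled forecasts: ETS base forecasts, combined via least squares
fc <- forecast(y, h = 8, method = "comb", fmethod = "ets")
plot(fc)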
Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
Automatic algorithms for time series forecasting Recent developments 71
Further competitions
1 2011 tourism forecasting competition.
2 Kaggle and other forecasting platforms.
3 GEFCom 2012: Point forecasting of
electricity load and wind power.
4 GEFCom 2014: Probabilistic forecasting
of electricity load, electricity price,
wind energy and solar energy.
Automatic algorithms for time series forecasting Recent developments 72
Forecasts about forecasting
1 Automatic algorithms will become more
general — handling a wide variety of time
series.
2 Model selection methods will take account
of multi-step forecast accuracy as well as
one-step forecast accuracy.
3 Automatic forecasting algorithms for
multivariate time series will be developed.
4 Automatic forecasting algorithms that
include covariate information will be
developed.
Automatic algorithms for time series forecasting Recent developments 73
For further information
robjhyndman.com
Slides and references for this talk.
Links to all papers and books.
Links to R packages.
A blog about forecasting research.
Automatic algorithms for time series forecasting Recent developments 74

More Related Content

What's hot

What Is Prescriptive Analytics? Your 5-Minute Overview
What Is Prescriptive Analytics? Your 5-Minute OverviewWhat Is Prescriptive Analytics? Your 5-Minute Overview
What Is Prescriptive Analytics? Your 5-Minute OverviewShannon Kearns
 
Machine learning ~ Forecasting
Machine learning ~ ForecastingMachine learning ~ Forecasting
Machine learning ~ ForecastingShaswat Mandhanya
 
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic ArithmeticZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmeticharmonylab
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningKuppusamy P
 
R言語による簡便な有意差の検出と信頼区間の構成
R言語による簡便な有意差の検出と信頼区間の構成R言語による簡便な有意差の検出と信頼区間の構成
R言語による簡便な有意差の検出と信頼区間の構成Toshiyuki Shimono
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro9xdot
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisAditya Nag
 
Dynamic asset allocation strategy
Dynamic asset allocation strategyDynamic asset allocation strategy
Dynamic asset allocation strategyRifat Ahsan
 
Demand forecasting case study
Demand forecasting case studyDemand forecasting case study
Demand forecasting case studyRupam Devnath
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingDerek Kane
 
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0A study of Machine Learning approach for Predictive Maintenance in Industry 4.0
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0Mohsen Sadok
 
Prophet入門【Python編】Facebookの時系列予測ツール
Prophet入門【Python編】Facebookの時系列予測ツールProphet入門【Python編】Facebookの時系列予測ツール
Prophet入門【Python編】Facebookの時系列予測ツールhoxo_m
 
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかたMasaki Ito
 
言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみたTakuya Koumura
 
Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data AnalyticsProduct School
 

What's hot (20)

What Is Prescriptive Analytics? Your 5-Minute Overview
What Is Prescriptive Analytics? Your 5-Minute OverviewWhat Is Prescriptive Analytics? Your 5-Minute Overview
What Is Prescriptive Analytics? Your 5-Minute Overview
 
Machine learning ~ Forecasting
Machine learning ~ ForecastingMachine learning ~ Forecasting
Machine learning ~ Forecasting
 
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic ArithmeticZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
 
R言語による簡便な有意差の検出と信頼区間の構成
R言語による簡便な有意差の検出と信頼区間の構成R言語による簡便な有意差の検出と信頼区間の構成
R言語による簡便な有意差の検出と信頼区間の構成
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
NLP
NLPNLP
NLP
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting  Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting
 
Dynamic asset allocation strategy
Dynamic asset allocation strategyDynamic asset allocation strategy
Dynamic asset allocation strategy
 
Demand forecasting case study
Demand forecasting case studyDemand forecasting case study
Demand forecasting case study
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
 
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0A study of Machine Learning approach for Predictive Maintenance in Industry 4.0
A study of Machine Learning approach for Predictive Maintenance in Industry 4.0
 
Prophet入門【Python編】Facebookの時系列予測ツール
Prophet入門【Python編】Facebookの時系列予測ツールProphet入門【Python編】Facebookの時系列予測ツール
Prophet入門【Python編】Facebookの時系列予測ツール
 
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた
行政サービスにデータ資産を活かす: 公共交通データから考える行政の現場でのデータ活用のありかた
 
言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた
 
Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data Analytics
 

Viewers also liked

Advances in automatic time series forecasting
Advances in automatic time series forecastingAdvances in automatic time series forecasting
Advances in automatic time series forecastingRob Hyndman
 
Automatic time series forecasting
Automatic time series forecastingAutomatic time series forecasting
Automatic time series forecastingRob Hyndman
 
Visualization and forecasting of big time series data
Visualization and forecasting of big time series dataVisualization and forecasting of big time series data
Visualization and forecasting of big time series dataRob Hyndman
 
MEFM: An R package for long-term probabilistic forecasting of electricity demand
MEFM: An R package for long-term probabilistic forecasting of electricity demandMEFM: An R package for long-term probabilistic forecasting of electricity demand
MEFM: An R package for long-term probabilistic forecasting of electricity demandRob Hyndman
 
SimpleR: tips, tricks & tools
SimpleR: tips, tricks & toolsSimpleR: tips, tricks & tools
SimpleR: tips, tricks & toolsRob Hyndman
 
Exploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesExploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesRob Hyndman
 
Visualization of big time series data
Visualization of big time series dataVisualization of big time series data
Visualization of big time series dataRob Hyndman
 
Exploring the boundaries of predictability
Exploring the boundaries of predictabilityExploring the boundaries of predictability
Exploring the boundaries of predictabilityRob Hyndman
 
Forecasting Hierarchical Time Series
Forecasting Hierarchical Time SeriesForecasting Hierarchical Time Series
Forecasting Hierarchical Time SeriesRob Hyndman
 
Academia sinica jan-2015
Academia sinica jan-2015Academia sinica jan-2015
Academia sinica jan-2015Rob Hyndman
 
Forecasting without forecasters
Forecasting without forecastersForecasting without forecasters
Forecasting without forecastersRob Hyndman
 
Analyzing and forecasting time series data ppt @ bec doms
Analyzing and forecasting time series data ppt @ bec domsAnalyzing and forecasting time series data ppt @ bec doms
Analyzing and forecasting time series data ppt @ bec domsBabasab Patil
 
Chap15 time series forecasting & index number
Chap15 time series forecasting & index numberChap15 time series forecasting & index number
Chap15 time series forecasting & index numberUni Azza Aunillah
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 
Probabilistic forecasting of long-term peak electricity demand
Probabilistic forecasting of long-term peak electricity demandProbabilistic forecasting of long-term peak electricity demand
Probabilistic forecasting of long-term peak electricity demandRob Hyndman
 
Supply Chain Index Rankings for 2006-2013 and 2009-2013
Supply Chain Index Rankings for 2006-2013 and 2009-2013Supply Chain Index Rankings for 2006-2013 and 2009-2013
Supply Chain Index Rankings for 2006-2013 and 2009-2013Lora Cecere
 
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready?
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready? Demand Planning Leadership Exchange: Demand Sensing - Are You Ready?
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready? Plan4Demand
 
Supply Chains to Admire - An Analysis of Supply Chain Excellence for 2006-2013
Supply Chains to Admire -   An Analysis of Supply Chain Excellence for 2006-2013Supply Chains to Admire -   An Analysis of Supply Chain Excellence for 2006-2013
Supply Chains to Admire - An Analysis of Supply Chain Excellence for 2006-2013Lora Cecere
 
R tools for hierarchical time series
R tools for hierarchical time seriesR tools for hierarchical time series
R tools for hierarchical time seriesRob Hyndman
 

Viewers also liked (20)

Advances in automatic time series forecasting
Advances in automatic time series forecastingAdvances in automatic time series forecasting
Advances in automatic time series forecasting
 
Automatic time series forecasting
Automatic time series forecastingAutomatic time series forecasting
Automatic time series forecasting
 
Visualization and forecasting of big time series data
Visualization and forecasting of big time series dataVisualization and forecasting of big time series data
Visualization and forecasting of big time series data
 
MEFM: An R package for long-term probabilistic forecasting of electricity demand
MEFM: An R package for long-term probabilistic forecasting of electricity demandMEFM: An R package for long-term probabilistic forecasting of electricity demand
MEFM: An R package for long-term probabilistic forecasting of electricity demand
 
SimpleR: tips, tricks & tools
SimpleR: tips, tricks & toolsSimpleR: tips, tricks & tools
SimpleR: tips, tricks & tools
 
Exploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesExploring the feature space of large collections of time series
Exploring the feature space of large collections of time series
 
Visualization of big time series data
Visualization of big time series dataVisualization of big time series data
Visualization of big time series data
 
Exploring the boundaries of predictability
Exploring the boundaries of predictabilityExploring the boundaries of predictability
Exploring the boundaries of predictability
 
Forecasting Hierarchical Time Series
Forecasting Hierarchical Time SeriesForecasting Hierarchical Time Series
Forecasting Hierarchical Time Series
 
Academia sinica jan-2015
Academia sinica jan-2015Academia sinica jan-2015
Academia sinica jan-2015
 
Forecasting without forecasters
Forecasting without forecastersForecasting without forecasters
Forecasting without forecasters
 
Analyzing and forecasting time series data ppt @ bec doms
Analyzing and forecasting time series data ppt @ bec domsAnalyzing and forecasting time series data ppt @ bec doms
Analyzing and forecasting time series data ppt @ bec doms
 
Time series Forecasting
Time series ForecastingTime series Forecasting
Time series Forecasting
 
Chap15 time series forecasting & index number
Chap15 time series forecasting & index numberChap15 time series forecasting & index number
Chap15 time series forecasting & index number
 
Time Series
Time SeriesTime Series
Time Series
 
Probabilistic forecasting of long-term peak electricity demand
Probabilistic forecasting of long-term peak electricity demandProbabilistic forecasting of long-term peak electricity demand
Probabilistic forecasting of long-term peak electricity demand
 
Supply Chain Index Rankings for 2006-2013 and 2009-2013
Supply Chain Index Rankings for 2006-2013 and 2009-2013Supply Chain Index Rankings for 2006-2013 and 2009-2013
Supply Chain Index Rankings for 2006-2013 and 2009-2013
 
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready?
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready? Demand Planning Leadership Exchange: Demand Sensing - Are You Ready?
Demand Planning Leadership Exchange: Demand Sensing - Are You Ready?
 
Supply Chains to Admire - An Analysis of Supply Chain Excellence for 2006-2013
Supply Chains to Admire -   An Analysis of Supply Chain Excellence for 2006-2013Supply Chains to Admire -   An Analysis of Supply Chain Excellence for 2006-2013
Supply Chains to Admire - An Analysis of Supply Chain Excellence for 2006-2013
 
R tools for hierarchical time series
R tools for hierarchical time seriesR tools for hierarchical time series
R tools for hierarchical time series
 

Similar to Automatic algorithms for time series forecasting

Automatic time series forecasting
Automatic time series forecastingAutomatic time series forecasting
Automatic time series forecastingRob Hyndman
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgnirudra Sikdar
 
FPP 1. Getting started
FPP 1. Getting startedFPP 1. Getting started
FPP 1. Getting startedRob Hyndman
 
Demand Forecasting of a Perishable Dairy Drink: An ARIMA Approach
Demand Forecasting of a Perishable Dairy Drink: An ARIMA ApproachDemand Forecasting of a Perishable Dairy Drink: An ARIMA Approach
Demand Forecasting of a Perishable Dairy Drink: An ARIMA ApproachIJDKP
 
Presentation for lama.pptx
Presentation for lama.pptxPresentation for lama.pptx
Presentation for lama.pptxAdityaNath38
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingAkin Osman Kazakci
 
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...Carlos Pena
 
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...Carlos Pena
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESIRJET Journal
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESIRJET Journal
 
Notion of an algorithm
Notion of an algorithmNotion of an algorithm
Notion of an algorithmNisha Soms
 
Forecasting_Quantitative Forecasting.pptx
Forecasting_Quantitative Forecasting.pptxForecasting_Quantitative Forecasting.pptx
Forecasting_Quantitative Forecasting.pptxRituparnaDas584083
 
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdf
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdfAnnualAutomobileSalesPredictionusingARIMAModel (2).pdf
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdfFarhad Sagor
 
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing Discovery
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing DiscoveryNAA Maximize 2015 - Presentation on In-depth Analytics of Pricing Discovery
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing DiscoveryThe Rainmaker Group
 
06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdf06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdfAlexanderLerch4
 

Similar to Automatic algorithms for time series forecasting (20)

Automatic time series forecasting
Automatic time series forecastingAutomatic time series forecasting
Automatic time series forecasting
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity management
 
FPP 1. Getting started
FPP 1. Getting startedFPP 1. Getting started
FPP 1. Getting started
 
Undergraduate Modeling Workshop - Air Quality Working Group Final Presentatio...
Undergraduate Modeling Workshop - Air Quality Working Group Final Presentatio...Undergraduate Modeling Workshop - Air Quality Working Group Final Presentatio...
Undergraduate Modeling Workshop - Air Quality Working Group Final Presentatio...
 
Demand Forecasting of a Perishable Dairy Drink: An ARIMA Approach
Demand Forecasting of a Perishable Dairy Drink: An ARIMA ApproachDemand Forecasting of a Perishable Dairy Drink: An ARIMA Approach
Demand Forecasting of a Perishable Dairy Drink: An ARIMA Approach
 
Presentation for lama.pptx
Presentation for lama.pptxPresentation for lama.pptx
Presentation for lama.pptx
 
I045046066
I045046066I045046066
I045046066
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototyping
 
Stock analysis report
Stock analysis reportStock analysis report
Stock analysis report
 
Dj4201737746
Dj4201737746Dj4201737746
Dj4201737746
 
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
 
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
REVOLUTIONIZING PREDICTIVE REAL-TIME REMOTE MONITORING OF NATURAL GAS-FIRED R...
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIES
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIES
 
Notion of an algorithm
Notion of an algorithmNotion of an algorithm
Notion of an algorithm
 
Power ai t-imeseries
Power ai t-imeseriesPower ai t-imeseries
Power ai t-imeseries
 
Forecasting_Quantitative Forecasting.pptx
Forecasting_Quantitative Forecasting.pptxForecasting_Quantitative Forecasting.pptx
Forecasting_Quantitative Forecasting.pptx
 
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdf
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdfAnnualAutomobileSalesPredictionusingARIMAModel (2).pdf
AnnualAutomobileSalesPredictionusingARIMAModel (2).pdf
 
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing Discovery
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing DiscoveryNAA Maximize 2015 - Presentation on In-depth Analytics of Pricing Discovery
NAA Maximize 2015 - Presentation on In-depth Analytics of Pricing Discovery
 
06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdf06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdf
 

Recently uploaded

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 

Recently uploaded (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

Automatic algorithms for time series forecasting

  • 1. Rob J Hyndman Automatic algorithms for time series forecasting
  • 2. Outline 1 Motivation 2 Forecasting competitions 3 Exponential smoothing 4 ARIMA modelling 5 Automatic nonlinear forecasting? 6 Time series with complex seasonality 7 Hierarchical and grouped time series 8 Recent developments Automatic algorithms for time series forecasting Motivation 2
  • 3. Motivation Automatic algorithms for time series forecasting Motivation 3
  • 4. Motivation Automatic algorithms for time series forecasting Motivation 3
  • 5. Motivation Automatic algorithms for time series forecasting Motivation 3
  • 6. Motivation Automatic algorithms for time series forecasting Motivation 3
  • 7. Motivation Automatic algorithms for time series forecasting Motivation 3
  • 8. Motivation 1 Common in business to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Automatic algorithms for time series forecasting Motivation 4
  • 9. Motivation 1 Common in business to have over 1000 products that need forecasting at least monthly. 2 Forecasts are often required by people who are untrained in time series analysis. Specifications Automatic forecasting algorithms must: ¯ determine an appropriate time series model; ¯ estimate the parameters; ¯ compute the forecasts with prediction intervals. Automatic algorithms for time series forecasting Motivation 4
  • 10. Example: Asian sheep Automatic algorithms for time series forecasting Motivation 5 Numbers of sheep in Asia Year millionsofsheep 1960 1970 1980 1990 2000 2010 250300350400450500550
  • 11. Example: Asian sheep Automatic algorithms for time series forecasting Motivation 5 Automatic ETS forecasts Year millionsofsheep 1960 1970 1980 1990 2000 2010 250300350400450500550
  • 12. Example: Cortecosteroid sales Automatic algorithms for time series forecasting Motivation 6 Monthly cortecosteroid drug sales in Australia Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.4
  • 13. Example: Cortecosteroid sales Automatic algorithms for time series forecasting Motivation 6 Forecasts from ARIMA(3,1,3)(0,1,1)[12] Year Totalscripts(millions) 1995 2000 2005 2010 0.40.60.81.01.21.41.6
  • 14. Outline 1 Motivation 2 Forecasting competitions 3 Exponential smoothing 4 ARIMA modelling 5 Automatic nonlinear forecasting? 6 Time series with complex seasonality 7 Hierarchical and grouped time series 8 Recent developments Automatic algorithms for time series forecasting Forecasting competitions 7
  • 15. Makridakis and Hibon (1979) Automatic algorithms for time series forecasting Forecasting competitions 8
  • 16. Makridakis and Hibon (1979) Automatic algorithms for time series forecasting Forecasting competitions 8
  • 17. Makridakis and Hibon (1979) This was the first large-scale empirical evaluation of time series forecasting methods. Highly controversial at the time. Difficulties: How to measure forecast accuracy? How to apply methods consistently and objectively? How to explain unexpected results? Common thinking was that the more sophisticated mathematical models (ARIMA models at the time) were necessarily better. If results showed ARIMA models not best, it must be because analyst was unskilled. Automatic algorithms for time series forecasting Forecasting competitions 9
  • 18. Makridakis and Hibon (1979) It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors. — W.G. Gilchrist I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods . . . these authors are more at home with simple procedures than with Box-Jenkins. — C. Chatfield Automatic algorithms for time series forecasting Forecasting competitions 10
  • 19. Makridakis and Hibon (1979) It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors. — W.G. Gilchrist I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods . . . these authors are more at home with simple procedures than with Box-Jenkins. — C. Chatfield Automatic algorithms for time series forecasting Forecasting competitions 10
  • 20. Consequences of MH (1979) As a result of this paper, researchers started to: ¯ consider how to automate forecasting methods; ¯ study what methods give the best forecasts; ¯ be aware of the dangers of over-fitting; ¯ treat forecasting as a different problem from time series analysis. Makridakis Hibon followed up with a new competition in 1982: 1001 series Anyone could submit forecasts (avoiding the charge of incompetence) Multiple forecast measures used. Automatic algorithms for time series forecasting Forecasting competitions 11
  • 21. Consequences of MH (1979) As a result of this paper, researchers started to: ¯ consider how to automate forecasting methods; ¯ study what methods give the best forecasts; ¯ be aware of the dangers of over-fitting; ¯ treat forecasting as a different problem from time series analysis. Makridakis Hibon followed up with a new competition in 1982: 1001 series Anyone could submit forecasts (avoiding the charge of incompetence) Multiple forecast measures used. Automatic algorithms for time series forecasting Forecasting competitions 11
  • 22. M-competition Automatic algorithms for time series forecasting Forecasting competitions 12
  • 23. M-competition Main findings (taken from Makridakis Hibon, 2000) 1 Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones. 2 The relative ranking of the performance of the various methods varies according to the accuracy measure being used. 3 The accuracy when various methods are being combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods. 4 The accuracy of the various methods depends upon the length of the forecasting horizon involved. Automatic algorithms for time series forecasting Forecasting competitions 13
  • 24. M3 competition Automatic algorithms for time series forecasting Forecasting competitions 14
  • 25. Makridakis and Hibon (2000) “The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods. . . The extension involves the inclusion of more methods/ researchers (in particular in the areas of neural networks and expert systems) and more series.” 3003 series All data from business, demography, finance and economics. Series length between 14 and 126. Either non-seasonal, monthly or quarterly. All time series positive. MH claimed that the M3-competition supported the findings of their earlier work. However, best performing methods far from “simple”. Automatic algorithms for time series forecasting Forecasting competitions 15
  • 26. Makridakis and Hibon (2000)
Best methods:
Theta: A very confusing explanation. Shown by Hyndman and Billah (2003) to be an average of linear regression and simple exponential smoothing with drift, applied to seasonally adjusted data. Later, the original authors claimed that their explanation was incorrect.
Forecast Pro: A commercial software package with an unknown algorithm. Known to fit either exponential smoothing or ARIMA models using BIC.
  • 28. M3 results (recalculated)
Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
ForecastX       17.35   13.09   1.42
Automatic ANN   17.18   13.98   1.53
B-J automatic   19.13   13.72   1.54
Notes: these calculations do not match the published paper; some contestants apparently submitted multiple entries but only the best ones were published.
  • 29. Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
  • 37. Exponential smoothing methods
                            Seasonal Component
Trend Component             N (None)   A (Additive)   M (Multiplicative)
N  (None)                   N,N        N,A            N,M
A  (Additive)               A,N        A,A            A,M
Ad (Additive damped)        Ad,N       Ad,A           Ad,M
M  (Multiplicative)         M,N        M,A            M,M
Md (Multiplicative damped)  Md,N       Md,A           Md,M
N,N: Simple exponential smoothing
A,N: Holt’s linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters’ method
A,M: Multiplicative Holt-Winters’ method
  • 41. Exponential smoothing methods
¯ There are 15 separate exponential smoothing methods.
¯ Each can have an additive or multiplicative error, giving 30 separate models.
¯ Only 19 models are numerically stable.
¯ Multiplicative trend models give poor forecasts, leaving 15 usable models.
  • 48. Exponential smoothing methods
General notation E,T,S: ExponenTial Smoothing (E = Error, T = Trend, S = Seasonal).
Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt’s linear method with additive errors
M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors
Innovations state space models
¯ All ETS models can be written in innovations state space form (IJF, 2002).
¯ Additive and multiplicative versions give the same point forecasts but different prediction intervals.
  • 58. ETS state space model
[Diagram: each observation y_t is generated from the previous state x_{t−1} and an innovation ε_t; the state is then updated to x_t, and the chain continues through y_{t+1}, ε_{t+1}, x_{t+1}, . . .]
State space model: x_t = (level, slope, seasonal).
Estimation: compute the likelihood L from ε_1, ε_2, . . . , ε_T, then optimize L with respect to the model parameters.
  • 59. Innovations state space models
Let x_t = (ℓ_t, b_t, s_t, s_{t−1}, . . . , s_{t−m+1}) and ε_t ~ iid N(0, σ²).
Observation equation: y_t = h(x_{t−1}) + k(x_{t−1}) ε_t, where μ_t = h(x_{t−1}) and e_t = k(x_{t−1}) ε_t.
State equation: x_t = f(x_{t−1}) + g(x_{t−1}) ε_t.
Additive errors: k(x_{t−1}) = 1, so y_t = μ_t + ε_t.
Multiplicative errors: k(x_{t−1}) = μ_t, so y_t = μ_t(1 + ε_t), and ε_t = (y_t − μ_t)/μ_t is a relative error.
  • 65. Innovations state space models
¯ All models can be written in state space form.
¯ Additive and multiplicative versions give the same point forecasts but different prediction intervals.
Estimation
L*(θ, x_0) = n log( Σ_{t=1}^{n} ε_t² / k²(x_{t−1}) ) + 2 Σ_{t=1}^{n} log |k(x_{t−1})|
           = −2 log(Likelihood) + constant.
Minimize with respect to θ = (α, β, γ, φ) and the initial states x_0 = (ℓ_0, b_0, s_0, s_{−1}, . . . , s_{−m+1}).
Q: How to choose between the 15 useful ETS models?
  • 70. Cross-validation
[Diagram comparing three schemes, marking training data and test data along the time axis:]
¯ Traditional evaluation: one training set followed by one test set.
¯ Standard cross-validation: repeated random partitions of the data into training and test sets.
¯ Time series cross-validation: a sequence of training sets, each extended by one observation, with the following observation(s) used for testing.
Also known as “evaluation on a rolling forecast origin”.
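As an aside, rolling-origin cross-validation is easy to implement directly. The following is a minimal sketch in R (not from the talk), assuming the livestock series from the earlier slides is loaded (it ships with the fpp package); the minimum training size k is an arbitrary illustrative choice.
# Rolling-origin (time series) cross-validation: refit at each origin,
# record the one-step-ahead forecast error.
library(forecast)
k <- 20                                    # minimum training size (illustrative)
e <- rep(NA, length(livestock) - k)        # one-step forecast errors
for (i in seq_along(e)) {
  train <- window(livestock, end = time(livestock)[k + i - 1])
  fc    <- forecast(ets(train), h = 1)     # refit and forecast one step ahead
  e[i]  <- livestock[k + i] - fc$mean[1]
}
cv_mse <- mean(e^2)                        # CV-MSE, comparable across models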
  • 75. Akaike’s Information Criterion
AIC = −2 log(L) + 2k
where L is the likelihood and k is the number of estimated parameters in the model.
¯ This is a penalized likelihood approach.
¯ If L is Gaussian, then AIC ≈ c + T log(MSE) + 2k, where c is a constant, MSE is from one-step forecasts on the training set, and T is the length of the series.
¯ Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE of one-step forecasts on the test set via time series cross-validation.
  • 78. Akaike’s Information Criterion
AIC = −2 log(L) + 2k
Corrected AIC
For small T, the AIC tends to over-fit. A bias-corrected version is
AICc = AIC + 2(k+1)(k+2)/(T−k).
Bayesian Information Criterion
BIC = AIC + k[log(T) − 2]
¯ BIC penalizes terms more heavily than AIC.
¯ Minimizing BIC is consistent if there is a true model.
  • 83. What to use?
Choice: AIC, AICc, BIC, CV-MSE.
¯ CV-MSE is too time-consuming for most automatic forecasting purposes, and requires large T.
¯ As T → ∞, BIC selects the true model if there is one. But that is never true!
¯ AICc focuses on forecasting performance, can be used on small samples, and is very fast to compute.
¯ Empirical studies in forecasting show AIC is better than BIC for forecast accuracy.
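In the forecast package this choice is exposed directly via the ic argument to ets(); a small hedged example (defaults may vary by package version):
library(forecast)
fit_aicc <- ets(livestock, ic = "aicc")  # selection by AICc (the default)
fit_bic  <- ets(livestock, ic = "bic")   # selection by BIC, for comparison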
  • 87. ets algorithm in R
Based on Hyndman, Koehler, Snyder & Grose (IJF, 2002):
¯ Apply each of the 15 models that are appropriate to the data.
¯ Optimize parameters and initial values using MLE.
¯ Select the best model using AICc.
¯ Produce forecasts using the best model.
¯ Obtain prediction intervals using the underlying state space model.
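The selection step can be mimicked by hand. This sketch is illustrative only, not the package's internal code; the candidate list is an arbitrary non-seasonal subset, and it assumes ets objects expose an aicc component (as their printed output suggests):
library(forecast)
candidates <- c("ANN", "AAN", "MNN", "MAN")                # illustrative subset of the 15 models
fits  <- lapply(candidates, function(m) ets(livestock, model = m))
aiccs <- sapply(fits, function(f) f$aicc)                  # assumed component name
best  <- fits[[which.min(aiccs)]]                          # keep the model with lowest AICc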
  • 89. Exponential smoothing
fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ETS(M,A,N); x-axis Year (1960–2010), y-axis millions of sheep.]
  • 91. Exponential smoothing
fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ETS(M,N,M); x-axis Year (1995–2010), y-axis total scripts (millions).]
  • 92. Exponential smoothing
fit
ETS(M,N,M)
Smoothing parameters:
  alpha = 0.4597
  gamma = 1e-04
Initial states:
  l = 0.4501
  s = 0.8628 0.8193 0.7648 0.7675 0.6946 1.2921
      1.3327 1.1833 1.1617 1.0899 1.0377 0.9937
sigma: 0.0675
       AIC       AICc        BIC
-115.69960 -113.47738  -69.24592
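Hedged note: the fitted object can be queried further. accuracy() is part of the forecast package; the component names for the criteria are assumptions based on the printed output above:
accuracy(fit)                    # training-set accuracy measures (ME, RMSE, MAPE, MASE, ...)
c(fit$aic, fit$aicc, fit$bic)    # assumed component names for the criteria shown above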
  • 93. M3 comparisons
Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
ForecastX       17.35   13.09   1.42
Automatic ANN   17.18   13.98   1.53
B-J automatic   19.13   13.72   1.54
ETS             17.38   13.13   1.43
  • 95. Exponential smoothing
[Image] www.OTexts.org/fpp
  • 97. Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
  • 102. ARIMA models
[Diagram: lagged values y_{t−1}, y_{t−2}, y_{t−3} and lagged errors ε_t, ε_{t−1}, ε_{t−2} as inputs; y_t as output.]
Autoregression moving average (ARMA) model.
Estimation: compute the likelihood L from ε_1, ε_2, . . . , ε_T; use an optimization algorithm to maximize L.
ARIMA model: an ARMA model applied to differences of the data.
  • 103. ARIMA modelling
  • 107. Auto ARIMA
fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ARIMA(0,1,0) with drift; x-axis Year (1960–2010), y-axis millions of sheep.]
  • 109. Auto ARIMA
fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12]; x-axis Year (1995–2010), y-axis total scripts (millions).]
  • 110. Auto ARIMA
fit
Series: h02
ARIMA(3,1,3)(0,1,1)[12]
Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651
sigma^2 estimated as 0.002706: log likelihood = 290.25
AIC = -564.5   AICc = -563.71   BIC = -538.48
  • 113. How does auto.arima() work?
A non-seasonal ARIMA process:
φ(B)(1 − B)^d y_t = c + θ(B) ε_t
Need to select appropriate orders p, q, d, and whether to include c.
Hyndman & Khandakar (JSS, 2008) algorithm:
¯ Select the number of differences d via KPSS unit root tests.
¯ Select p, q, c by minimising the AICc.
¯ Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
Algorithm choices driven by forecast accuracy.
  • 114. How does auto.arima() work?
A seasonal ARIMA process:
Φ(B^m) φ(B) (1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m) θ(B) ε_t
Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.
Hyndman & Khandakar (JSS, 2008) algorithm:
¯ Select the number of differences d via KPSS unit root tests.
¯ Select D using the OCSB unit root test.
¯ Select p, q, P, Q, c by minimising the AICc.
¯ Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.
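A hedged usage note: the stepwise search and its shortcuts can be turned off in forecast::auto.arima at the cost of speed (argument names from the forecast package; behaviour may differ across versions):
library(forecast)
fit <- auto.arima(h02,
                  stepwise = FALSE,       # search the full model space instead of stepwise
                  approximation = FALSE,  # use exact likelihood when comparing models
                  ic = "aicc")            # selection criterion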
  • 115. M3 comparisons
Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
B-J automatic   19.13   13.72   1.54
ETS             17.38   13.13   1.43
AutoARIMA       19.12   13.85   1.47
  • 116. Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
  • 121. Automatic nonlinear forecasting
¯ Automatic ANN in the M3 competition did poorly.
¯ Linear methods did best in the NN3 competition!
¯ Very few machine learning methods get published in the IJF because authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.
¯ Some good recent work by Kourentzes and Crone on automated ANNs for time series.
¯ Watch this space!
  • 122. Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
  • 123. Examples
[Figure: US finished motor gasoline products; weekly, 1992–2004; y-axis thousands of barrels per day.]
  • 124. Examples
[Figure: Number of calls to a large American bank (7am–9pm); 5-minute intervals, 3 March to 12 May; y-axis number of call arrivals.]
  • 125. Examples
[Figure: Turkish electricity demand; daily, 2000–2008; y-axis electricity demand (GW).]
  • 126. TBATS model
TBATS:
¯ Trigonometric terms for seasonality
¯ Box-Cox transformations for heterogeneity
¯ ARMA errors for short-term dynamics
¯ Trend (possibly damped)
¯ Seasonal (including multiple and non-integer periods)
Automatic algorithm described in AM De Livera, RJ Hyndman, and RD Snyder (2011). “Forecasting time series with complex seasonal patterns using exponential smoothing”. Journal of the American Statistical Association 106(496), 1513–1527.
  • 133. TBATS model
y_t = observation at time t.
Box-Cox transformation:
y_t^{(ω)} = (y_t^ω − 1)/ω if ω ≠ 0;  log y_t if ω = 0.
Measurement equation (M seasonal periods):
y_t^{(ω)} = ℓ_{t−1} + φ b_{t−1} + Σ_{i=1}^{M} s^{(i)}_{t−m_i} + d_t
Global and local trend:
ℓ_t = ℓ_{t−1} + φ b_{t−1} + α d_t
b_t = (1 − φ) b + φ b_{t−1} + β d_t
ARMA error:
d_t = Σ_{i=1}^{p} φ_i d_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t
Fourier-like seasonal terms:
s^{(i)}_t = Σ_{j=1}^{k_i} s^{(i)}_{j,t}
s^{(i)}_{j,t}  =  s^{(i)}_{j,t−1} cos λ^{(i)}_j + s^{*(i)}_{j,t−1} sin λ^{(i)}_j + γ^{(i)}_1 d_t
s^{*(i)}_{j,t} = −s^{(i)}_{j,t−1} sin λ^{(i)}_j + s^{*(i)}_{j,t−1} cos λ^{(i)}_j + γ^{(i)}_2 d_t
TBATS: Trigonometric, Box-Cox, ARMA, Trend, Seasonal.
  • 134. Examples
fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {52.1785714285714,8}); x-axis weeks (1995–2005), y-axis thousands of barrels per day.]
  • 135. Examples
fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {169,5, 845,3}); 5-minute intervals, 3 March to 9 June; y-axis number of call arrivals.]
  • 136. Examples
fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)
[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {7,3, 354.37,12, 365.25,4}); x-axis days (2000–2010), y-axis electricity demand (GW).]
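For series with more than one seasonal period, tbats() is typically paired with the msts class from the same package. A minimal sketch, reusing the call-centre periods from the example above (169 five-minute intervals per day, 845 per week):
library(forecast)
y   <- msts(callcentre, seasonal.periods = c(169, 845))  # daily and weekly cycles
fit <- tbats(y)
plot(forecast(fit, h = 845))                             # forecast one further week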
  • 137. Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
  • 140. Hierarchical time series
A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
Total → A (AA, AB, AC), B (BA, BB, BC), C (CA, CB, CC)
Examples: net labour turnover; tourism by state and region.
  • 146. Hierarchical time series
Two-level hierarchy: Total → A, B, C.
Y_t: observed aggregate of all series at time t.
Y_{X,t}: observation on series X at time t.
b_t: vector of all series at the bottom level at time t.
y_t = (Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t})′ = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \end{pmatrix} = S b_t
  • 149. Hierarchical time series
Total → A (AX, AY, AZ), B (BX, BY, BZ), C (CX, CY, CZ).
y_t = (Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}, Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t})′ = S b_t,
where b_t = (Y_{AX,t}, . . . , Y_{CZ,t})′ and S is the 13 × 9 summing matrix whose first row is all ones (the total), whose next three rows aggregate the three children of A, B and C respectively, and whose last nine rows form the identity matrix I_9.
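A minimal base-R sketch (not from the talk) of constructing S for this two-level hierarchy and verifying y_t = S b_t on simulated bottom-level data:
agg <- rbind(rep(1, 9),                          # Total: sum of all nine bottom series
             kronecker(diag(3), t(rep(1, 3))))   # A, B, C: sums of their three children
S   <- rbind(agg, diag(9))                       # 13 x 9 summing matrix
b   <- matrix(rnorm(9 * 24), nrow = 9)           # nine bottom-level series, 24 periods
y   <- S %*% b                                   # all 13 series; rows 5-13 equal b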
  • 154. Forecasting notation
Let ŷ_n(h) be the vector of initial h-step forecasts, made at time n, stacked in the same order as y_t. (They may not add up.)
Reconciled forecasts are of the form ỹ_n(h) = S P ŷ_n(h) for some matrix P:
¯ P extracts and combines the base forecasts ŷ_n(h) to get bottom-level forecasts;
¯ S adds them up.
  • 157. General properties
ỹ_n(h) = S P ŷ_n(h)
Forecast bias: assuming the base forecasts ŷ_n(h) are unbiased, the revised forecasts are unbiased iff S P S = S.
Forecast variance: for any P satisfying S P S = S, the covariance matrix of the h-step-ahead reconciled forecast errors is
Var[y_{n+h} − ỹ_n(h)] = S P W_h P′ S′,
where W_h is the covariance matrix of the h-step-ahead base forecast errors.
  • 163. BLUF via trace minimization
Theorem: among all P satisfying S P S = S, the minimizer of trace[S P W_h P′ S′] is
P = (S′ W_h† S)^{−1} S′ W_h†,
where W_h† is a generalized inverse of W_h. Hence the revised forecasts are
ỹ_n(h) = S (S′ W_h† S)^{−1} S′ W_h† ŷ_n(h)   (revised ← base).
¯ Equivalent to the GLS estimate of the regression ŷ_n(h) = S β_n(h) + ε_h, where ε_h ∼ N(0, W_h).
¯ Problem: W_h is hard to estimate.
  • 164. Optimal combination forecasts Revised forecasts Base forecasts Solution 1: OLS ˜yn(h) = S(S S)−1 S ˆyn(h) Solution 2: WLS Approximate W1 by its diagonal. Assume Wh = khW1. Easy to estimate, and places weight where we have best one-step forecasts. ˜yn(h) = S(S ΛS)−1 S Λˆyn(h) Automatic algorithms for time series forecasting Hierarchical and grouped time series 63 ˜yn(h) = S(S W † hS)−1 S W † h ˆyn(h)
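Both special cases follow the same pattern as the GLS sketch above. In this illustrative version, sigma2 stands for the one-step base-forecast error variances, so diag(1/sigma2) plays the role of $\Lambda$; the function names are again hypothetical.

```r
# Solution 1: OLS reconciliation (ignores the error covariances entirely).
reconcile_ols <- function(yhat, S) {
  drop(S %*% solve(t(S) %*% S, t(S) %*% yhat))
}

# Solution 2: WLS reconciliation, weighting by inverse one-step variances.
reconcile_wls <- function(yhat, S, sigma2) {
  Lambda <- diag(1 / sigma2)
  drop(S %*% solve(t(S) %*% Lambda %*% S, t(S) %*% Lambda %*% yhat))
}
```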
• 171. Challenges

Computational difficulties in big hierarchies due to the size of the $S$ matrix and the singular behaviour of $(S' \Lambda S)$.

Loss of information from ignoring the covariance matrix when computing point forecasts.

Still need to estimate the covariance matrix to produce prediction intervals.
• 176. Australian tourism

Hierarchy: States (7), Zones (27), Regions (82).
Base forecasts: ETS (exponential smoothing) models.
• 177–185. Base forecasts

[Figures: ETS base forecasts of domestic tourism (visitor nights by year, 1998–2008) for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly.]
• 186–188. Reconciled forecasts

[Figures: reconciled domestic tourism forecasts, 2000–2010, at successive levels: Total; NSW, VIC, QLD, Other; Sydney, Other NSW, Melbourne, Other VIC, GC and Brisbane, Other QLD, Capital cities, Other.]
• 189. Forecast evaluation

Select models using all observations.
Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts.
Increase the sample size one observation at a time, re-estimating the models and generating forecasts until the end of the sample.
In total: 24 one-step-ahead forecasts, 23 two-step-ahead, and so on down to 17 eight-step-ahead forecasts for evaluation. (A sketch of this rolling-origin scheme is given below.)
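A minimal sketch of this rolling-origin evaluation, assuming a univariate numeric series y of length 36 (which yields 24 one-step-ahead and 17 eight-step-ahead forecasts) and ETS base models from the forecast package; the variable names are illustrative.

```r
library(forecast)

h <- 8; first <- 12; n <- length(y)
e <- matrix(NA, nrow = n - first, ncol = h)   # rows = origins, cols = horizons
for (i in seq_len(n - first)) {
  train <- ts(y[seq_len(first + i - 1)])      # expand the training set by one
  fc <- forecast(ets(train), h = h)           # re-estimate and forecast
  future <- y[(first + i):min(first + i - 1 + h, n)]
  e[i, seq_along(future)] <- future - fc$mean[seq_along(future)]
}
colMeans(abs(e), na.rm = TRUE)                # MAE by forecast horizon
```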
• 193. Hierarchy: states, zones, regions

MAPE                  h=1     h=2     h=4     h=6     h=8     Average
Top level: Australia
  Bottom-up           3.79    3.58    4.01    4.55    4.24    4.06
  OLS                 3.83    3.66    3.88    4.19    4.25    3.94
  WLS                 3.68    3.56    3.97    4.57    4.25    4.04
Level: States
  Bottom-up          10.70   10.52   10.85   11.46   11.27   11.03
  OLS                11.07   10.58   11.13   11.62   12.21   11.35
  WLS                10.44   10.17   10.47   10.97   10.98   10.67
Level: Zones
  Bottom-up          14.99   14.97   14.98   15.69   15.65   15.32
  OLS                15.16   15.06   15.27   15.74   16.15   15.48
  WLS                14.63   14.62   14.68   15.17   15.25   14.94
Bottom level: Regions
  Bottom-up          33.12   32.54   32.26   33.74   33.96   33.18
  OLS                35.89   33.86   34.26   36.06   37.49   35.43
  WLS                31.68   31.22   31.08   32.41   32.77   31.89
• 194. hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series.
Version: 4.5
Depends: forecast (≥ 5.0), SparseM
Imports: parallel, utils
Published: 2014-12-09
Authors: Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)
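A sketch of typical hts usage (version 4.5 as listed above); the bottom-level matrix bts and the node structure below are hypothetical, not the tourism hierarchy from the talk.

```r
library(hts)

# bts: a multivariate ts of bottom-level series. nodes describes a toy
# hierarchy: 2 nodes at level 1, split into 3 and 2 series at level 2.
y <- hts(bts, nodes = list(2, c(3, 2)))

# Optimal combination reconciliation with ETS base forecasts.
fc <- forecast(y, h = 8, method = "comb", fmethod = "ets")
aggts(fc)  # reconciled forecasts at every level of the hierarchy
```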
• 195. Outline

1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
• 196. Further competitions

1 2011 tourism forecasting competition.
2 Kaggle and other forecasting platforms.
3 GEFCom 2012: point forecasting of electricity load and wind power.
4 GEFCom 2014: probabilistic forecasting of electricity load, electricity price, wind energy and solar energy.
• 200. Forecasts about forecasting

1 Automatic algorithms will become more general, handling a wide variety of time series.
2 Model selection methods will take account of multi-step forecast accuracy as well as one-step forecast accuracy.
3 Automatic forecasting algorithms for multivariate time series will be developed.
4 Automatic forecasting algorithms that include covariate information will be developed.
• 204. For further information

robjhyndman.com
  Slides and references for this talk.
  Links to all papers and books.
  Links to R packages.
  A blog about forecasting research.