Solved – modeling time series data with lm()

Tags: r, regression, time series

After you decompose a univariate time series with the stl() function in R, you are left with the seasonal, trend and remainder (random) components of the series. Is it valid to then use those components, together with some additional variables, to model the original time series?

For example:

> tsData
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012  22  26  34  33  40  39  39  45  50  58  64  78
2013  51  60  80  80  93 100  96 108 111 119 140 164
2014 103 112 154 135 156 170 146 156 166 176 193 204

> stl(tsData, s.window = "periodic")
 Call:
 stl(x = tsData, s.window = "periodic")

Components
            seasonal     trend   remainder
Jan 2012 -24.0219753  36.19189   9.8300831
Feb 2012 -20.2516062  37.82808   8.4235219
Mar 2012  -0.4812396  39.46428  -4.9830367
Apr 2012 -10.1034302  41.32047   1.7829612
...
Sep 2014   2.2193527 165.55136  -1.7707170
Oct 2014   7.3239448 169.33893  -0.6628760
Nov 2014  18.4285405 173.12650   1.4449614
Dec 2014  30.5244146 176.84390  -3.3683103

Now, if I wanted to model the time series with a linear model that includes some other variables, would that be valid?

lm(index ~ trend + seasonal + s1 + s2, data = data)

When I run that model I get an R-squared of 0.98, which makes sense considering that the original time series index is just the sum of trend + seasonal + remainder. What I'm concerned about is using a linear model with time series data: I want to make sure I'm not violating any major assumptions of linear regression. I figure that since I include the seasonal variable I'm essentially controlling for that element and hopefully reducing the autocorrelation, or am I, given that the R-squared is so high? Any help is appreciated!
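
For reference, here is a minimal sketch of what I mean; tsData is the series shown above, and s1 and s2 are just random placeholders standing in for my actual extra variables:

set.seed(1)
tsData <- ts(c( 22,  26,  34,  33,  40,  39,  39,  45,  50,  58,  64,  78,
                51,  60,  80,  80,  93, 100,  96, 108, 111, 119, 140, 164,
               103, 112, 154, 135, 156, 170, 146, 156, 166, 176, 193, 204),
             start = c(2012, 1), frequency = 12)

fit_stl <- stl(tsData, s.window = "periodic")

# assemble the stl components and the extra regressors into one data frame
data <- data.frame(
  index    = as.numeric(tsData),
  seasonal = as.numeric(fit_stl$time.series[, "seasonal"]),
  trend    = as.numeric(fit_stl$time.series[, "trend"]),
  s1       = rnorm(length(tsData)),   # placeholder regressor
  s2       = rnorm(length(tsData))    # placeholder regressor
)

fit_lm <- lm(index ~ trend + seasonal + s1 + s2, data = data)
summary(fit_lm)$r.squared             # very close to 1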

Best Answer

Once you have decomposed your original $index$ series into $seasonal$, $trend$ and $remainder$, you know that

$$index=seasonal+trend+remainder$$

holds exactly with unit coefficients in front of the three components.
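
You can check this directly, reusing tsData and the fit_stl object from the sketch in the question:

# the additive stl decomposition reproduces the original series exactly
all.equal(as.numeric(tsData),
          as.numeric(rowSums(fit_stl$time.series)))   # TRUE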

You then remove the last component $remainder$ and put in two regressors $s_1$ and $s_2$ instead.
If you kept the coefficients in front of $seasonal$ and $trend$ fixed at 1 and ran a regression

$$index=\beta_0+1 \cdot seasonal+1 \cdot trend+\beta_1 s_1+\beta_2 s_2+\varepsilon$$

then it would be equivalent to running the following regression

$$remainder=\beta_0+\beta_1 s_1+\beta_2 s_2+\varepsilon$$

This could very well make sense if you were interested in explaining the $remainder$ component using regressors $s_1$ and $s_2$ (and did not care about explaining $seasonal$ and $trend$ components).
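
If you wanted to impose those unit coefficients in practice, one way to do it (a sketch, reusing the data frame from the question's snippet and adding the $remainder$ column) is an offset() term; the estimates coincide with those from regressing $remainder$ on $s_1$ and $s_2$ directly:

data$remainder <- as.numeric(fit_stl$time.series[, "remainder"])

# coefficients on seasonal and trend fixed at 1 via offset()
fit_restricted <- lm(index ~ offset(seasonal + trend) + s1 + s2, data = data)
# the equivalent regression of the remainder alone
fit_remainder  <- lm(remainder ~ s1 + s2, data = data)

all.equal(coef(fit_restricted), coef(fit_remainder))   # TRUE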

What you actually do is leave the coefficients in front of $seasonal$ and $trend$ unrestricted. This implies that you do not completely "agree" with the stl decomposition, as you allow $seasonal$ and $trend$ to be multiplied by some coefficients.

I wonder how you would interpret that. If you got an OLS estimate of 1.2 for the $seasonal$ coefficient, would you say that seasonality is 1.2 times more variable than stl suggests? I am not sure this makes much sense, especially since the three components are not observed but derived using stl. So first you "agree" with the stl assumptions to derive the components, and later you start to "disagree" and try to refit those components using OLS.

Regarding the high $R^2$ value, it need not surprise you. The three stl components give a perfect fit for $index$. If the variability in $seasonal$ and $trend$ is large compared with that of $remainder$, you should expect a high $R^2$ in the OLS regression of $index$ on just $seasonal$ and $trend$ (excluding $remainder$), even without the extra regressors $s_1$ and $s_2$.
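
You can see this with the example data above: even without $s_1$ and $s_2$, the two components alone already give a near-perfect fit.

summary(lm(index ~ seasonal + trend, data = data))$r.squared   # close to 1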

Regarding autocorrelation and possible violations of OLS assumptions, you can simply test for those once you have estimated your regression.
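
For instance (a sketch assuming the fit_lm object from the earlier snippet; dwtest() and bgtest() come from the lmtest package, which you may need to install):

acf(residuals(fit_lm))        # visual check of residual autocorrelation

library(lmtest)               # install.packages("lmtest") if needed
dwtest(fit_lm)                # Durbin-Watson test for first-order autocorrelation
bgtest(fit_lm, order = 12)    # Breusch-Godfrey test up to lag 12 (one seasonal cycle)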