Seasonal Regression Analysis – Differences Between Raw Data and Residuals

arma, generalized-least-squares, regression, seasonality, time series

I have a multiple linear regression model for time series data whose residuals are autocorrelated and display seasonal behavior. This seasonal behavior is induced deterministically by a cyclic variable included in the model. In order to calculate corrected standard errors for the regression coefficients, I intend to use generalized least squares with correction for the autocorrelation in the residuals.

But for the seasonality, I am not sure to which time series I should apply the differencing: the raw original data (observations) or the regression residuals. For the data, I could superimpose each cycle and take the mean, thus removing seasonal effects, but this would alter the data and would also generalize badly to other kinds of data. For the regression residuals, I don't know whether I have to difference the residual time series by subtracting for each period, or handle it as some kind of seasonal ARMA process.

Any hints?

LAST EDIT: The problem behind this question may already have been resolved, and may have been the product of a misconception.

The regression was being run on simulations of the model. These simulations did not contain any stochastic error term, so they were purely deterministic. The regression errors were just showing an imperfect fit to the data, and the regression residuals were obviously following a deterministic pattern. This pattern was not the result of some neglected explanatory variable in the model, and thus there was no reason to model it (by means of an ARMA model or any other kind). When white noise was added to the data (which should be a crucial step in any simulation of real data), the regression residuals were mostly dominated by stochasticity, losing the autocorrelated behavior.
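The effect described in this edit can be reproduced with a small sketch in base R. Everything here is hypothetical (the series length, the period of 12, and the nonlinear response that a linear fit cannot match exactly); it is only meant to show how a purely deterministic misfit produces strongly patterned residuals, and how that pattern is swamped once white noise is added.

```r
set.seed(1)
n <- 240                                   # hypothetical: 20 cycles of period 12
time_idx <- seq_len(n)
cyc <- sin(2 * pi * time_idx / 12)         # cyclic explanatory variable

# Hypothetical deterministic "simulation": the response is a nonlinear
# function of the cyclic variable, so a linear fit cannot be exact.
y_det <- 2 + 1.5 * cyc + 0.8 * cyc^2

# Residual autocorrelation at lag 12 for the noise-free data
fit_det <- lm(y_det ~ cyc)
r_det <- acf(resid(fit_det), lag.max = 12, plot = FALSE)$acf[13]

# Same signal with white noise added
y_noisy <- y_det + rnorm(n, sd = 2)
fit_noisy <- lm(y_noisy ~ cyc)
r_noisy <- acf(resid(fit_noisy), lag.max = 12, plot = FALSE)$acf[13]

c(deterministic = r_det, with_noise = r_noisy)
```

In the noise-free case the residuals are an exact periodic function of time, so their sample autocorrelation at the seasonal lag is close to one; with noise it should be close to zero.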

Best Answer

Edited after comments

In order to calculate corrected standard errors for the regression coefficients, I intend to use generalized least squares with correction for the autocorrelation in the residuals.

Note that generalized least squares (GLS) would affect not only the standard errors but also the point estimates. In any case, you could gain power by estimating a regression with an explicitly specified error structure, e.g. a regression with ARMA errors, as can be done using the functions stats::arima or forecast::auto.arima in R. There you use maximum likelihood estimation instead of GLS. See the related blog posts by Francis X. Diebold, "The HAC Emperor has no Clothes" and "The HAC Emperor has no Clothes: Part 2", where he encourages explicit error specification as a way to get better coefficient estimates and gain predictive power. Although he discusses the case of HAC there, I believe similar conclusions apply here, too.
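A minimal sketch of a regression with ARMA errors using stats::arima follows. The data are simulated and hypothetical (one regressor x, AR(1) errors with coefficient 0.7), and the error order is assumed known here; in practice it would have to be identified, for example with forecast::auto.arima.

```r
set.seed(42)
n <- 300
x <- rnorm(n)                                               # hypothetical regressor
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))   # AR(1) errors
y <- 1 + 2 * x + e

# Ordinary least squares, ignoring the autocorrelation
ols <- lm(y ~ x)

# Regression with AR(1) errors, estimated by maximum likelihood
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))

coef(ols)                  # OLS point estimates
coef(fit)                  # ar1, intercept, and the coefficient on x
sqrt(diag(vcov(fit)))      # standard errors under the AR(1) error model
```

Comparing coef(ols) with coef(fit) shows that the slope estimate itself changes, not only its standard error.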

But for the seasonality, I am not sure to which time series I should apply the differencing: the raw original data (observations) or the regression residuals.

Since the problem arises due to a cyclic regressor, you could remove the deterministic component of the cyclic variable before including it in the model, or alternatively you could include some seasonal terms (dummies or Fourier terms) in the model.
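The two kinds of seasonal terms could look roughly like this in base R. The data are simulated and hypothetical (period 12, one ordinary regressor x, a sinusoidal seasonal effect), and a single pair of Fourier terms is an assumption that happens to match the simulated pattern.

```r
set.seed(7)
n <- 120
period <- 12                                  # hypothetical monthly data
time_idx <- seq_len(n)
x <- rnorm(n)                                 # hypothetical regressor
y <- 0.5 * x + 3 * sin(2 * pi * time_idx / period) + rnorm(n)

# Option A: one pair of Fourier terms
s1 <- sin(2 * pi * time_idx / period)
c1 <- cos(2 * pi * time_idx / period)
fit_fourier <- lm(y ~ x + s1 + c1)

# Option B: seasonal dummies (one level per position in the cycle)
season <- factor((time_idx - 1) %% period + 1)
fit_dummy <- lm(y ~ x + season)

# Baseline without any seasonal terms, for comparison
fit_plain <- lm(y ~ x)

# Residual autocorrelation at the seasonal lag
acf_at_season <- function(fit) {
  acf(resid(fit), lag.max = period, plot = FALSE)$acf[period + 1]
}
sapply(list(plain = fit_plain, fourier = fit_fourier, dummy = fit_dummy),
       acf_at_season)
```

Fourier terms use fewer parameters when the seasonal shape is smooth, while dummies can capture an arbitrary shape at the cost of period - 1 coefficients.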

For the data, I could superimpose each cycle and take the mean, thus removing seasonal effects, but this would alter the data and would also generalize badly to other kinds of data.

I am a little confused here, but I will try addressing this nevertheless.
With regard to the regressor, you can adjust it using a model, so altering the data is not really a problem: you keep track of how you did it, and you can recreate the original variable if you need to.
Regarding generalization, if the cyclic behaviour is unique to this instance, leaving it untreated would not help. If, on the other hand, it is similar across this instance and the ones you want to generalize to, you would lose nothing by removing the deterministic component before running the regression and then using it to adjust the other cases similarly.

A technical note: If you are doing a regression with ARMA errors, then it is the error that gets differenced. If the errors follow some SARIMA process, the regular treatment of SARIMA models applies (roughly speaking, you do not have to worry that it is a regression error rather than raw data).
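As a rough illustration of handling the regression error as a seasonal ARMA process, the sketch below uses the seasonal argument of stats::arima. The data are simulated and hypothetical: the errors have a regular AR(1) part (0.5) and a seasonal AR(1) part (0.6) at period 12, and the error orders are assumed known rather than identified from the data.

```r
set.seed(2024)
n <- 360
period <- 12                       # hypothetical: 30 years of monthly data
x <- rnorm(n)                      # hypothetical regressor

# Errors following (1 - 0.5 B)(1 - 0.6 B^12) e_t = w_t,
# written out as one long AR polynomial with terms at lags 1, 12 and 13
phi <- 0.5
Phi <- 0.6
ar_coefs <- c(phi, rep(0, period - 2), Phi, -phi * Phi)
e <- as.numeric(arima.sim(model = list(ar = ar_coefs), n = n))
y <- 1 + 2 * x + e

# Regression on x with seasonal AR errors, fitted by maximum likelihood
fit <- arima(y, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = period),
             xreg = cbind(x = x), method = "ML")
coef(fit)                          # ar1, sar1, intercept, x
```

The call is the same one you would use for a plain seasonal ARIMA model of a raw series, with the regressors passed through xreg.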

Related Solutions

Solved – Fitting multiple linear regression in R: autocorrelated residuals

Try

library(forecast)
# rate is the response series; askings and questions are the regressors
fit <- auto.arima(rate, xreg = cbind(askings, questions))
summary(fit)

That will fit the linear model as well as automatically identify an ARMA structure for the errors. It uses MLE rather than GLS, but they are asymptotically equivalent.

Time Series – Difference Between Autocorrelated Time-Series and Serially Autocorrelated Errors

It seems to me that you are getting hung up on the difference between autoregression (temperature today is influenced by temperature yesterday, or my consumption of heroin today depends on my previous drug use) and autocorrelated errors (which have to do with the off-diagonal terms in the variance-covariance matrix of $\epsilon$ being non-zero). Sticking with your weather example, suppose you model temperature as a function of time, but it is also influenced by things like volcanic eruptions, which you left out of your model. The volcano sends up clouds of dust, which block out the sun, lowering the temperature. This random disturbance will persist over more than one period. This will make your time trend appear less steep than it should be. To be fair, it is probably the case that both autoregression and autocorrelated errors are an issue with temperature.

Autocorrelated errors can also arise in cross-sectional spatial data, where a random shock that affects economic activity in one region will spill over to other areas because they have economic ties. A shock that kills grapes in California will also lower sales of beef from Montana. You can also induce autocorrelated disturbances if you omit a relevant and autocorrelated independent variable from your time-series model.
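The last point, that omitting a relevant autocorrelated regressor induces autocorrelated disturbances, can be checked with a small simulation. All quantities here are hypothetical: z is an AR(1) variable with coefficient 0.8 that affects y, and it is left out of one of the two fits.

```r
set.seed(123)
n <- 300
x <- rnorm(n)                                               # included regressor
z <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))   # autocorrelated regressor
y <- 1 + 2 * x + 1.5 * z + rnorm(n)                         # true errors are white noise

fit_omit <- lm(y ~ x)        # z omitted: its effect ends up in the residuals
fit_full <- lm(y ~ x + z)    # z included

# Lag-1 autocorrelation of the residuals
lag1 <- function(fit) acf(resid(fit), lag.max = 1, plot = FALSE)$acf[2]
c(omitted = lag1(fit_omit), included = lag1(fit_full))
```

With z omitted, the residuals inherit its persistence even though the true error term is white noise.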
