Solved – Can you use heteroskedastic time series variables within a regression model?

heteroscedasticity · multicollinearity · regression · stepwise regression

We are working on a multiple linear regression model. Our objective is to forecast the quarterly % growth in mortgage loans outstanding.

The independent variables are:
1) Dow Jones level.
2) % change in Dow Jones over past quarter.
3) Case Shiller housing price index.
4) % change in Case Shiller housing price index over past quarter.
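For concreteness, here is a quick sketch of how the level and %-change regressors above could be assembled from quarterly data. All column names and values are illustrative, not from the original post:

```python
import pandas as pd

# Hypothetical quarterly DataFrame of level series; the numbers are made up.
df = pd.DataFrame(
    {
        "dow_jones": [33000.0, 34100.0, 33500.0, 35200.0, 36000.0],
        "case_shiller": [290.0, 295.0, 301.0, 298.0, 305.0],
        "mortgage_loans": [11.8, 11.9, 12.1, 12.0, 12.3],
    },
    index=pd.period_range("2022Q1", periods=5, freq="Q"),
)

# Quarter-over-quarter % changes, kept alongside the raw levels
df["dow_jones_pct"] = df["dow_jones"].pct_change() * 100
df["case_shiller_pct"] = df["case_shiller"].pct_change() * 100
df["mortgage_growth"] = df["mortgage_loans"].pct_change() * 100  # dependent variable

print(df.round(2))
```

The first row of each %-change column is necessarily NaN and would be dropped before fitting.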

In a stepwise regression process, all of the above variables were selected, and variables 1) and 3) were, surprisingly, significant, but only when used in combination. Somehow, something about the difference between those two indices partly explains the % change in mortgage loans outstanding.

For my part, I find variables 1) and 3) problematic, because I believe they are heteroskedastic, and their coefficients' confidence intervals are therefore unreliable. I also suspect they have multicollinearity issues with their related %-change variables, and that they may introduce autocorrelation as well.

However, at first glance some of my concerns may be overstated. After graphing the residuals of the whole model, they look fine: their spread does not trend upward, so heteroskedasticity does not appear to be an issue for the model as a whole. Multicollinearity is not too bad either; the variable with the highest VIF is around 5, well below the usual threshold of 10.

Nevertheless, I am still somewhat concerned that even though the whole model seems OK, the regression coefficients of the variables mentioned above (or, more specifically, their confidence intervals) may not be.

Best Answer

Your dependent variable is growth. For economic time-series data, growth rates are more likely to be stationary processes, meaning they have a constant mean. Level data, on the other hand, are usually non-stationary. Since your model is a linear regression, you assume the true data-generating process is

$$Y_t=\alpha_0+X_{1t}\alpha_1+...+X_{kt}\alpha_k+u_t$$

where $u_t$ is white noise, $Y_t$ is stationary, and the $X_{kt}$ are non-stationary. Stationarity then implies that

$$EY_t=const=\alpha_0+\alpha_1EX_{1t}+...+\alpha_kEX_{kt}.$$

Now the $EX_{kt}$ are functions of time which, for non-stationary processes, change over time. So you are implying that

$$\alpha_0+\alpha_1\mu_1(t)+...+\alpha_k\mu_k(t)=const$$

for some non-constant functions $\mu_k(t)$. This places quite severe restrictions on the non-stationary processes $X_{kt}$. For example, if we have only one independent variable, this restriction becomes

$$\alpha_1\mu_1(t)=const-\alpha_0$$

so either $\mu_1$ is constant or $\alpha_1$ is zero. The first case contradicts the presumption that $X_1$ is non-stationary; in the second case the regression model is of no use.

So, in general, this is why it is not a good idea to mix levels and growth rates in a regression, unless you are really sure that they are all stationary.

Another problem with time-series regression is that for a certain class of non-stationary processes the regression can be spurious. In that case you cannot trust the least-squares estimates $\hat{\alpha}_k$: in the spurious-regression case their distribution is not normal and does not tend to normality, so the usual regression statistics do not apply. For example, you can find that $\alpha_k$ is significantly non-zero when it is actually zero. So before running the regression it is always a good idea to test whether your variables are integrated, using some variant of the Dickey-Fuller test. I strongly suspect that the Dow Jones index is an integrated process.

Now, as others have pointed out, heteroscedasticity in an independent regression variable is harmless. Problems arise only when the regression errors are heteroscedastic. In that case the least-squares estimates remain consistent but are inefficient, and the standard errors must be adjusted for hypothesis testing.
