Interpreting Regression with First Differenced Variables – Applied Guide

regression, time-series

I have two time-series:

A proxy for the market risk premium (ERP; red line)
The risk-free rate, proxied by a government bond (blue line)

[Figure: Risk premium proxy and risk-free rate over time]

I want to test whether the risk-free rate can explain the ERP. To do so, I followed the advice of Tsay (2010), Analysis of Financial Time Series, 3rd edition, p. 96:

Fit the linear regression model and check serial correlations of the residuals.
If the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables.
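
The two steps can be sketched in base R roughly as follows. This is a minimal illustration on simulated data, not the data from the question: the series `rf` and `erp`, the seed, and the true slope of -0.9 are all assumptions made for the example, and the residual check uses only the sample ACF rather than a formal unit-root test.

```r
set.seed(1)
n <- 300

# Hypothetical data: a random-walk regressor and random-walk (unit-root) errors
rf  <- cumsum(rnorm(n, sd = 0.2))
erp <- 6 - 0.9 * rf + cumsum(rnorm(n, sd = 0.2))

# Step 1: regression in levels, then inspect the residual autocorrelation
fit_lvl <- lm(erp ~ rf)
acf_lvl <- acf(resid(fit_lvl), plot = FALSE)

# Step 2: first-difference both variables and refit
d_erp <- diff(erp)
d_rf  <- diff(rf)
fit_diff <- lm(d_erp ~ d_rf)
acf_diff <- acf(resid(fit_diff), plot = FALSE)

# Lag-1 residual autocorrelation before and after differencing
lag1_lvl  <- acf_lvl$acf[2]
lag1_diff <- acf_diff$acf[2]
```

In this simulated setup the levels residuals are strongly autocorrelated at lag 1, while the residuals of the differenced regression are close to uncorrelated.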

Doing the first step, I get the following results:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     6.77019    0.25103   26.97   <2e-16 ***
Risk_Free_Rate -0.65320    0.04123  -15.84   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As expected from the figure, the relation is negative and significant. However, the residuals are serially correlated:

[Figure: ACF of the residuals from the regression of ERP on the risk-free rate]

Therefore, I first-difference both the dependent and the explanatory variable. Here is what I get:

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.002077   0.016497  -0.126      0.9    
Risk_Free_Rate -0.958267   0.053731 -17.834   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And the ACF of the residuals looks like:

[Figure: ACF of the residuals from the regression of ERP on the risk-free rate (both differenced)]

This result looks great: First, the residuals are now uncorrelated. Second, the relation seems to be more negative now.

Here are my questions (you have probably been wondering when I would get to them 😉). Econometric problems aside, I would have interpreted the first regression as "if the risk-free rate rises by one percentage point, the ERP falls by 0.65 percentage points." After pondering this for a while, I would interpret the second regression in just the same way (now with a fall of 0.96 percentage points).

Is this interpretation correct? It feels odd that I transform my variables but don't have to change my interpretation. If it is correct, why do the results change? Is this just the result of econometric problems? If so, does anyone have an idea why my second regression seems to be even "better"? Normally, I read that spurious correlations vanish once you do the analysis correctly. Here, it seems to be the other way round.

Best Answer

Suppose that we have the model $$\begin{equation*} y_t = \beta_0 + \beta_1 x_t + \beta_2 t + \epsilon_t. \end{equation*}$$ You say that these coefficients are easier to interpret. Let's subtract $y_{t-1}$ from the left-hand side and $\beta_0 + \beta_1 x_{t-1} + \beta_2 (t-1) + \epsilon_{t-1}$, which equals $y_{t-1}$, from the right-hand side. Writing $\Delta z_t = z_t - z_{t-1}$ for any series $z_t$, we have $$\begin{equation*} \Delta y_t = \beta_1 \Delta x_t + \beta_2 + \Delta \epsilon_t. \end{equation*}$$ The intercept in the difference equation is $\beta_2$, the coefficient on the time trend. And the coefficient on $\Delta x_t$ has the same interpretation as $\beta_1$ in the original model.
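
A small simulation can illustrate this correspondence. The sketch below assumes made-up parameter values (`b0`, `b1`, `b2`), a random-walk regressor, and unit-root errors; none of these come from the question's data.

```r
set.seed(42)
n  <- 500
tt <- seq_len(n)

# Assumed true parameters of the levels model
b0 <- 2
b1 <- -0.8
b2 <- 0.3

x   <- cumsum(rnorm(n))             # random-walk regressor
eps <- cumsum(rnorm(n, sd = 0.5))   # unit-root errors
y   <- b0 + b1 * x + b2 * tt + eps

# Regression in first differences: slope estimates b1, intercept estimates b2
fit_diff <- lm(diff(y) ~ diff(x))
est <- unname(coef(fit_diff))

intercept_hat <- est[1]
slope_hat     <- est[2]
```

Here `slope_hat` should land near `b1` and `intercept_hat` near `b2`, while `b0` drops out of the differenced equation entirely.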

If the errors were non-stationary such that $$\begin{equation*} \epsilon_t = \sum_{s=0}^{t-1}{\nu_s}, \end{equation*}$$ where $\nu_s$ is white noise, then the differenced error is white noise.

If the errors instead follow a stationary AR(p) process, say, then the differenced error term would have a more complicated distribution and, notably, would retain serial correlation. Or if the original $\epsilon_t$ are already white noise (an AR(1) with a correlation coefficient of 0, if you like), then differencing induces serial correlation between the errors.
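
For the white-noise case, a short worked calculation makes the induced correlation explicit. It assumes the $\epsilon_t$ are uncorrelated with common variance $\sigma^2$ (the symbol $\sigma^2$ is introduced here for illustration): $$\begin{equation*} \operatorname{Var}(\Delta \epsilon_t) = 2\sigma^2, \qquad \operatorname{Cov}(\Delta \epsilon_t, \Delta \epsilon_{t-1}) = \operatorname{Cov}(\epsilon_t - \epsilon_{t-1},\ \epsilon_{t-1} - \epsilon_{t-2}) = -\sigma^2. \end{equation*}$$ The lag-1 autocorrelation of the differenced error is therefore $-\sigma^2 / (2\sigma^2) = -1/2$, and it is zero at all higher lags, which is the signature of an over-differenced series.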

For these reasons, it is important to only difference processes that are non-stationary due to unit roots and use detrending for so-called trend stationary ones.
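
The contrast between detrending and differencing can be sketched on a simulated trend-stationary series. Everything in this example (the trend slope of 0.1, the seed, the helper `lag1`) is an assumption chosen for illustration.

```r
set.seed(7)
n  <- 400
tt <- seq_len(n)

# Trend-stationary series: white noise around a deterministic line
y <- 1 + 0.1 * tt + rnorm(n)

# Appropriate treatment: remove the trend by regression
res_detrended <- resid(lm(y ~ tt))

# Inappropriate treatment: first-difference the series
dy <- diff(y)

# Helper returning the lag-1 sample autocorrelation
lag1 <- function(z) acf(z, plot = FALSE)$acf[2]

lag1_detrended   <- lag1(res_detrended)
lag1_differenced <- lag1(dy)
```

In this setup the detrended residuals stay close to uncorrelated, whereas the differenced series picks up a clearly negative lag-1 autocorrelation.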

(A unit root causes the variance of a series to change over time and, in fact, to explode; the expected value of the series is constant, however. A trend-stationary process has the opposite properties.)
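
These properties can be checked with a short calculation, assuming $\nu_s$ is white noise with mean zero and variance $\sigma^2$, and writing $u_t$ (a symbol introduced here) for a stationary error. For the unit-root error defined earlier, $$\begin{equation*} E[\epsilon_t] = \sum_{s=0}^{t-1} E[\nu_s] = 0, \qquad \operatorname{Var}(\epsilon_t) = \sum_{s=0}^{t-1} \operatorname{Var}(\nu_s) = t\sigma^2, \end{equation*}$$ so the mean is constant while the variance grows linearly in $t$. For a trend-stationary series $y_t = \beta_0 + \beta_2 t + u_t$, $$\begin{equation*} E[y_t] = \beta_0 + \beta_2 t, \qquad \operatorname{Var}(y_t) = \operatorname{Var}(u_t), \end{equation*}$$ so the mean changes with $t$ while the variance stays constant.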

Related Solutions

Solved – Comparing two linear regression models

If you set up the data in one long column, with A and B identified in a new column, you can then run your regression model as a GLM with a continuous time variable and a nominal "experiment" variable (A, B). The ANOVA output will give you the significance of the difference between the parameters. The "intercept" is the common intercept, and the "experiment" factor reflects differences between the intercepts (actually overall means) of the experiments. The "time" factor is the common slope, and the interaction is the difference between the experiments with respect to the slope.

I have to admit I cheat (?) and run the models separately first to get the two sets of parameters and their errors and then run the combined model to acquire the differences between the treatments (in your case A and B)...
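
A minimal sketch of this setup in base R, using simulated data: the data frame `dat`, its column names, and the true intercepts and slopes are all hypothetical. Note that with R's default treatment contrasts the `time` coefficient is the slope for experiment A (the baseline) and the interaction term is the slope of B minus the slope of A.

```r
set.seed(3)
n <- 50

# Long format: one response column plus a factor identifying the experiment
dat <- data.frame(
  time       = rep(seq_len(n), times = 2),
  experiment = factor(rep(c("A", "B"), each = n))
)
# Hypothetical truth: A has intercept 2 and slope 1.0, B has intercept 3 and slope 1.5
dat$y <- ifelse(dat$experiment == "A",
                2 + 1.0 * dat$time,
                3 + 1.5 * dat$time) + rnorm(2 * n)

# Combined model with an interaction between time and experiment
fit <- lm(y ~ time * experiment, data = dat)
aov_table  <- anova(fit)                        # significance of each term
slope_diff <- coef(fit)[["time:experimentB"]]   # slope(B) - slope(A)

# Separate fits, as described above, for comparison
fit_A <- lm(y ~ time, data = dat, subset = experiment == "A")
fit_B <- lm(y ~ time, data = dat, subset = experiment == "B")
separate_diff <- coef(fit_B)[["time"]] - coef(fit_A)[["time"]]
```

Because the combined model is fully interacted, `slope_diff` matches `separate_diff`; what the combined model adds is a standard error and test for that difference.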

Multiple Regression – How to Perform Multiple Regression as a Sequence of Univariate Regressions

As per Bill Huber's comments and answer elsewhere, the trick is to remove the influence of the independent variables on each other whenever producing each sequential regression. In other words instead of:

lm(lm(x ~ y1)$residuals ~ y2)

We want:

lm(lm(x ~ y1)$residuals ~ lm(y2 ~ y1)$residuals)

In this case, we DO get back to the multiple regression.

Moreover, we can show the coefficients are the same:

> round(coef(lm(lm(it30 ~ itpc1)$residuals ~ lm(itpc2 ~ itpc1)$residuals)), 5)
(Intercept) lm(itpc2 ~ itpc1)$residuals
    0.00000                    -0.21846
> round(coef(lm(lm(it30 ~ itpc2)$residuals ~ lm(itpc1 ~ itpc2)$residuals)), 5)
(Intercept) lm(itpc1 ~ itpc2)$residuals
    0.00000                     0.29197
> round(coef(lm(it30 ~ itpc1 + itpc2)), 5)
(Intercept)       itpc1       itpc2
    0.01186     0.29197    -0.21846
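
The same check can be reproduced on simulated data. The sketch below does not use the original `it30`, `itpc1`, and `itpc2` series, which are not available here; the variables `x1`, `x2`, `y` and their coefficients are assumptions for illustration.

```r
set.seed(11)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)                   # deliberately correlated with x1
y  <- 0.5 + 0.3 * x1 - 0.2 * x2 + rnorm(n)

# Multiple regression as the reference
full <- coef(lm(y ~ x1 + x2))

# Partial out x1 from both the response and the other regressor
ry  <- resid(lm(y ~ x1))
rx2 <- resid(lm(x2 ~ x1))
partial <- coef(lm(ry ~ rx2))

# Naive sequential regression, which does not partial x1 out of x2
naive <- coef(lm(ry ~ x2))

c(full = full[["x2"]], partial = partial[["rx2"]], naive = naive[["x2"]])
```

The partialled-out slope agrees with the multiple-regression coefficient on `x2` up to floating-point error, while the naive version does not.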

Interestingly, and as expected, if the independent variables are orthogonal as in PCA regression, then we do not need to take out the influence of each of the regressors against each other. In this case it is true that:

lm(lm(x ~ y1)$residuals ~ y2)$residuals

is perfectly correlated with:

lm(x ~ y1 + y2)$residuals

This is because the orthogonal principal components have a zero-slope regression line and thus the residuals are equal to the dependent variable (with a vertical translation to mean=0).
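
The orthogonal case can be sketched with principal-component scores from `prcomp`. The data below are simulated and the names `pc1`, `pc2`, and `y` are hypothetical; the point is only that the scores are centered and mutually uncorrelated.

```r
set.seed(5)
n <- 200

# Two correlated raw variables, rotated into orthogonal principal components
X <- cbind(rnorm(n), rnorm(n))
X[, 2] <- X[, 2] + 0.7 * X[, 1]
pcs <- prcomp(X)$x
pc1 <- pcs[, 1]
pc2 <- pcs[, 2]

y <- 1 + 0.4 * pc1 - 0.3 * pc2 + rnorm(n)

# Sequential regression without partialling pc1 out of pc2
seq_res  <- resid(lm(resid(lm(y ~ pc1)) ~ pc2))
# Multiple regression on both components
full_res <- resid(lm(y ~ pc1 + pc2))

max_gap <- max(abs(seq_res - full_res))            # largest residual discrepancy
slope_pc2_on_pc1 <- coef(lm(pc2 ~ pc1))[["pc1"]]   # slope between the components
```

In this sketch the two residual vectors coincide up to floating-point error, and the slope from regressing one component on the other is numerically zero, which is why no partialling out is needed.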
