Interpreting Regression with First Differenced Variables – Applied Guide

regression, time series

I have two time-series:

  1. A proxy for the market risk premium (ERP; red line)
  2. The risk-free rate, proxied by a government bond (blue line)

Risk premium proxy and risk-free rate over time

I want to test whether the risk-free rate can explain the ERP. To do so, I basically followed the advice of Tsay (2010), Analysis of Financial Time Series, 3rd edition, p. 96:

  1. Fit the linear regression model and check serial correlations of the residuals.
  2. If the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables.

Doing the first step, I get the following results:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     6.77019    0.25103   26.97   <2e-16 ***
Risk_Free_Rate -0.65320    0.04123  -15.84   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As expected from the figure, the relation is negative and significant. However, the residuals are serially correlated:

ACF of the residuals from the regression of ERP on the risk-free rate

Therefore, I first-difference both the dependent and explanatory variables. Here is what I get:

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.002077   0.016497  -0.126      0.9    
Risk_Free_Rate -0.958267   0.053731 -17.834   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And the ACF of the residuals looks like:

ACF of the residuals from the regression of ERP on the risk-free rate (both differenced)

This result looks great: First, the residuals are now uncorrelated. Second, the relation seems to be more negative now.

Here are my questions (you have probably been wondering by now 😉). Econometric problems aside, I would have interpreted the first regression as "if the risk-free rate rises by one percentage point, the ERP falls by 0.65 percentage points." After pondering this for a while, I would interpret the second regression in just the same way (now with a fall of 0.96 percentage points). Is this interpretation correct? It feels weird that I transform my variables but don't have to change my interpretation. If it is correct, however, why do the results change? Is this just the result of econometric problems? If so, does anyone have an idea why my second regression seems to be even "better"? Normally, I read that spurious correlations vanish once the regression is done correctly. Here, it seems to be the other way round.

Best Answer

Suppose that we have the model $$\begin{equation*} y_t = \beta_0 + \beta_1 x_t + \beta_2 t + \epsilon_t. \end{equation*}$$ You say that these coefficients are easier to interpret. Let's subtract $y_{t-1}$ from the left-hand side and the equal quantity $\beta_0 + \beta_1 x_{t-1} + \beta_2 (t-1) + \epsilon_{t-1}$ from the right-hand side. We have $$\begin{equation*} \Delta y_t = \beta_1 \Delta x_t + \beta_2 + \Delta \epsilon_t. \end{equation*}$$ The intercept in the difference equation is the coefficient on the time trend, $\beta_2$. And the coefficient on $\Delta x_t$ has the same interpretation as $\beta_1$ in the original model: the change in $y$ associated with a one-unit change in $x$.

If the errors were non-stationary such that $$\begin{equation*} \epsilon_t = \sum_{s=0}^{t-1}{\nu_s}, \end{equation*}$$ where $\nu_s$ is white noise, then the differenced error is white noise.
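The argument so far can be checked with a small simulation. The following is only a sketch under stated assumptions: the parameter values (`beta0`, `beta1`, `beta2`), the sample size, and the helper names `ols` and `lag1_autocorr` are hypothetical and are not taken from the question's data. It generates $y_t$ from the model above with random-walk errors, then compares a regression in levels with a regression in first differences.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a series."""
    e = e - e.mean()
    return float(np.dot(e[1:], e[:-1]) / np.dot(e, e))

rng = np.random.default_rng(0)
n = 5000
beta0, beta1, beta2 = 2.0, -0.9, 0.05   # hypothetical true values

t = np.arange(n, dtype=float)
x = np.cumsum(rng.normal(size=n))                 # regressor: a random walk
eps = np.cumsum(rng.normal(scale=0.5, size=n))    # unit-root errors
y = beta0 + beta1 * x + beta2 * t + eps

# Levels regression (with trend): residuals inherit the unit root.
coef_lvl, resid_lvl = ols(y, x, t)

# Differenced regression: the intercept estimates beta2, the slope beta1.
coef_diff, resid_diff = ols(np.diff(y), np.diff(x))

print("levels residual lag-1 autocorrelation:     ", lag1_autocorr(resid_lvl))
print("differenced intercept and slope:           ", coef_diff)
print("differenced residual lag-1 autocorrelation:", lag1_autocorr(resid_diff))
```

In this setup the differenced regression should recover a slope close to `beta1` and an intercept close to `beta2`, with residuals that show almost no lag-1 autocorrelation, whereas the levels residuals stay highly persistent.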

If the errors instead follow a stationary AR(p) process, say, then the differenced error term has a more complicated distribution and, notably, retains serial correlation. And if the original $\epsilon_t$ are already white noise (an AR(1) with a correlation coefficient of 0, if you like), then differencing induces serial correlation in the errors: $\Delta \epsilon_t$ is then an MA(1) process with a lag-1 autocorrelation of $-0.5$.
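To see the over-differencing effect numerically, here is a minimal sketch, assuming Gaussian innovations and hypothetical AR(1) coefficients of 0 and 0.5. For a stationary AR(1) with coefficient $\phi$, the lag-1 autocorrelation of the differenced series works out to $-(1-\phi)/2$, which the simulation should approximately reproduce. The helper names `simulate_ar1` and `lag1_autocorr` are illustrative only.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a series."""
    e = e - e.mean()
    return float(np.dot(e[1:], e[:-1]) / np.dot(e, e))

def simulate_ar1(phi, n, rng):
    """Simulate a stationary AR(1): e_t = phi * e_{t-1} + nu_t."""
    nu = rng.normal(size=n)
    e = np.empty(n)
    e[0] = nu[0] / np.sqrt(1.0 - phi**2)  # start in the stationary distribution
    for i in range(1, n):
        e[i] = phi * e[i - 1] + nu[i]
    return e

rng = np.random.default_rng(1)
n = 20000
results = {}
for phi in (0.0, 0.5):                    # hypothetical AR(1) coefficients
    e = simulate_ar1(phi, n, rng)
    results[phi] = (lag1_autocorr(e), lag1_autocorr(np.diff(e)))
    print(f"phi={phi}: levels {results[phi][0]:.3f}, "
          f"differenced {results[phi][1]:.3f}, theory {-(1 - phi) / 2:.3f}")
```

With `phi = 0.0` the original series is white noise, yet its first difference is clearly negatively autocorrelated, which is the serial correlation that differencing introduces.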

For these reasons, it is important to only difference processes that are non-stationary due to unit roots and use detrending for so-called trend stationary ones.

(A unit root causes the variance of a series to grow over time without bound; the expected value of the series, however, is constant. A trend-stationary process has the opposite properties: its variance is constant, but its expected value changes over time.)
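The contrast between the two kinds of process can be illustrated by simulating many paths and comparing cross-path means and variances at an early and a late date. This is a sketch only: the number of paths, the horizon, and `trend_slope` are hypothetical choices, and the unit-root process is taken to be a driftless random walk with standard normal shocks.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 5000, 200
trend_slope = 0.1                          # hypothetical trend coefficient

shocks = rng.normal(size=(n_paths, n_steps))
t = np.arange(1, n_steps + 1)

random_walk = np.cumsum(shocks, axis=1)        # unit root, no drift
trend_stationary = trend_slope * t + shocks    # deterministic trend + white noise

summary = {}
for name, paths in [("unit root", random_walk),
                    ("trend stationary", trend_stationary)]:
    early, late = paths[:, 9], paths[:, -1]    # t = 10 and t = 200
    summary[name] = {
        "mean_early": early.mean(), "mean_late": late.mean(),
        "var_early": early.var(), "var_late": late.var(),
    }
    print(name, {k: round(float(v), 2) for k, v in summary[name].items()})
```

Under these assumptions the random walk's variance should be roughly proportional to $t$ while its mean stays near zero, and the trend-stationary series should show a variance near one at both dates with a mean that moves along the trend.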
