Solved – smooth the dependent variable with a moving average in an OLS (or feasible GLS) regression

moving-average, panel-data, smoothing

I have an unbalanced monthly panel of balance-sheet data for 70 banks (on average 170 observations per bank) covering 20 years.
I have autocorrelation and heteroskedasticity within panels.

I am trying to test the hypothesis that there is a relation between transactional deposits and credit commitments (the limit of a line of credit). I would like to say "all else being equal, on average a bank with more transactional deposits offers more credit commitments to its clients". Namely, the argument is that in order to "produce" credit commitments a bank needs transactional deposits.
In another step of the research the final goal is to build some kind of "production function" for credit commitments.

My dependent variable is the ratio of credit commitments to total loans (comitRatio).
My independent variable (regressor) is the ratio of transactional deposits to total deposits (depRatio).

I am using natural logarithms of these variables because I would like to interpret the estimated coefficients as elasticities. So I have ln(comitRatio) and ln(depRatio).

$\ln(comitRatio_{it} ) = b_0 + b_1 \ln(depRatio_{it}) + b_2 X_{it} + e_{it}$

where $X_{it}$ are control variables, most of them time-invariant.
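
To make the elasticity interpretation explicit (this just restates the specification above), differentiating the log-log equation gives

$b_1 = \frac{\partial \ln(comitRatio_{it})}{\partial \ln(depRatio_{it})} = \frac{\partial comitRatio_{it}/comitRatio_{it}}{\partial depRatio_{it}/depRatio_{it}},$

so $b_1$ is read as the approximate percentage change in comitRatio associated with a 1% change in depRatio, holding $X_{it}$ fixed.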

I chose to use moving averages of 3, 6 and 12 months for the independent variable ln(depRatio). This is because the hypothesis states that, at point $t$ in time, the commitment offer (comitRatio) is decided based on depRatio from previous periods.
Also, since the regressor depRatio for some banks (particularly the smaller ones) varies a lot from one month to the next, I suppose a moving average would give me a better picture.
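
For concreteness, here is a minimal sketch of how such a moving-average regressor could be built in Stata, assuming the panel is declared with xtset and using hypothetical variable names (bank_id, mdate, lndepRatio):

```
* hypothetical variable names; mdate is a monthly date variable
xtset bank_id mdate

* 3-month backward-looking moving average of ln(depRatio):
* the current month plus the two previous months
gen ma3_lndep = (lndepRatio + L1.lndepRatio + L2.lndepRatio) / 3

* alternatively, tssmooth ma with 2 lags and the current observation
* (check how it treats the first months of each bank's series);
* 6- and 12-month versions would use window(5 1) and window(11 1)
tssmooth ma ma3_lndep_alt = lndepRatio, window(2 1)
```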

My main concern is the ceteris paribus effect of the independent variable on the dependent variable.

I am working with a fixed-effects model with AR(1) autocorrelation and panel-corrected standard errors (the xtpcse command in Stata).
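
As a rough sketch of such a setup (xtpcse has no fixed-effects option of its own, so the bank effects enter as indicator variables; the variable names and the $controls macro are hypothetical):

```
* Prais-Winsten regression with AR(1) errors, panel-corrected standard
* errors, and bank indicators as fixed effects (LSDV); $controls is a
* hypothetical global macro holding the control variables
xtpcse lncomitRatio ma3_lndep $controls i.bank_id, correlation(ar1)

* alternative: the within (fixed-effects) estimator with AR(1) disturbances
* note: time-invariant controls are absorbed by the bank fixed effects
xtregar lncomitRatio ma3_lndep $controls, fe
```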

The model works better (higher $R^2$, higher coefficient estimates, higher $z$-statistics) when I use the moving average than when I use the original values (even when I use lagged values such as L.depRatio).

My question is: given this setup, can I use a moving average only on the independent variable?
Should I smooth the dependent variable as well as the independent one, or would smoothing the dependent variable be inadvisable?

Best Answer

The purpose of your modelling could be, e.g., (1) descriptive, (2) explanatory or (3) predictive.

  • If (1), smoothing could be useful. You could extract the slow-moving trend and use it for data visualizations. You would see the relations between the slow-moving trend components of the different series more clearly than with the original series. Of course, you would have to acknowledge that smoothing has taken place and that the relations you have found only hold for the smoothed components, while the real variables are more erratic.
  • If (2), directly using smoothed variables would mess up point estimates and their standard errors in your models. Therefore, you could not test hypotheses in a straightforward way. Time series decomposition (see below) could probably be helpful here.
  • If (3), instead of smoothing you could try decomposing the time series into slow-moving trend, seasonal and remainder components. Then you could try modelling and forecasting each of them separately, and put these forecasts together to obtain a forecast of the original variable (a rough decomposition sketch follows this list). Pure smoothing, on the other hand, could make you lose valuable information.
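
A minimal decomposition sketch in Stata, assuming an xtset monthly panel and hypothetical variable names; it uses a roughly 12-month moving average as the trend and month-of-year means of the detrended series as the seasonal component (a dedicated decomposition routine would be more refined):

```
* crude trend / seasonal / remainder decomposition (hypothetical names)
xtset bank_id mdate
tssmooth ma trend_dep = lndepRatio, window(6 1 5)    // 12-term moving average
gen detrended_dep = lndepRatio - trend_dep
gen moy = month(dofm(mdate))                         // calendar month, 1-12
bysort bank_id moy: egen seasonal_dep = mean(detrended_dep)
gen remainder_dep = detrended_dep - seasonal_dep
```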

Your case seems to be explanatory. If you are interested in the long-term relation between the variables, you should probably use time series decomposition and interpret your findings accordingly. That is, you should not claim a relationship between the original variables but only between specific components. You should then also consider whether that has a sensible subject-matter interpretation.

Edit (after an edit of the question):

Removing the noise from the independent variable by smoothing gives you a higher $R^2$ (as you note), but this is an artefact of smoothing, so it should be taken with a grain of salt. Once you have smoothed the independent variable, you should not make direct inferences about the original variable; that is something to be careful about (see my point (2) above). However, in your case it seems that smoothing could make sense, as the dependent variable at time $t$ does not depend on the regressor at a single point in the past but rather on its behaviour over a time interval. Thus you would explicitly define your regressor as a smoothed version of the original variable and make inference with respect to this smoothed regressor. That could work.

If you smooth the dependent variable, too, you will probably increase the $R^2$ even more, but you will depart even further from direct interpretation, because again the change in $R^2$ will be an artefact of smoothing.

As an alternative, you could probably sample your data less frequently. Then you should see more signal relative to noise (as signal would accumulate between the infrequent sample points while noise would not), but you would still be able to interpret the results directly (unlike in the case of smoothing). However, this approach could immediately be criticized as throwing away data. There could probably be better alternatives. If the smoothed regressor makes sense on its own, you do not need to do this.
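
As a sketch of the less-frequent-sampling idea, assuming a monthly date variable and hypothetical names, you could keep only the last available month of each bank-quarter (point sampling rather than quarterly averaging, which would again be a form of smoothing):

```
* point-sample the monthly panel at quarterly frequency (hypothetical names)
gen qdate = qofd(dofm(mdate))
format qdate %tq
bysort bank_id qdate (mdate): keep if _n == _N   // last available month per quarter
xtset bank_id qdate
```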
