I have an unbalanced monthly panel of balance sheet data for 70 banks (on average 170 observations per bank) over 20 years.
I have autocorrelation and heteroskedasticity within panels.
I am trying to test the hypothesis that there is a relation between transactional deposits and credit commitments (limits on lines of credit). I would like to say "all else being equal, on average a bank with more transactional deposits would offer more credit commitments to its clients". Namely, the argument is that in order to "produce" credit commitments a bank needs transactional deposits.
In another step of the research the final goal is to build some kind of "production function" for credit commitments.
My dependent variable is the ratio of credit commitments to total loans (comitRatio).
My independent variable (regressor) is the ratio of transactional deposits to total deposits (depRatio).
I am using natural logarithms of these variables because I would like to interpret the estimated coefficients as elasticities. So I have ln(comitRatio) and ln(depRatio).
$\ln(comitRatio_{it} ) = b_0 + b_1 \ln(depRatio_{it}) + b_2 X_{it} + e_{it}$
where $X_{it}$ are control variables, most of them time-invariant.
I chose to use moving averages of 3, 6 and 12 months for the independent variable ln(depRatio). This is because the hypothesis states that, at point $t$ in time, the offer of commitments (comitRatio) is decided based on depRatio from previous periods.
Also, since the regressor depRatio for some banks (particularly the smaller ones) varies too much from one month to the next, I suppose a moving average would give me a better picture.
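To make the construction concrete, here is a minimal sketch of a lagged moving average of ln(depRatio) computed separately for each bank. Several things in it are assumptions for illustration only: the toy numbers, the helper name `lagged_moving_average`, the choice to average the logs rather than take the log of the average, the choice to use only months strictly before $t$, and the assumption that each bank's months are consecutive (in an unbalanced panel with gaps, the window would need to be defined on calendar months instead).

```python
import math
from collections import defaultdict


def lagged_moving_average(values, window):
    """Mean of the `window` observations strictly before each position.

    Returns None where fewer than `window` earlier observations exist.
    """
    out = []
    for t in range(len(values)):
        if t < window:
            out.append(None)
        else:
            past = values[t - window:t]
            out.append(sum(past) / window)
    return out


# Hypothetical toy panel: (bank id, month index, depRatio).
rows = [
    ("A", 1, 0.40), ("A", 2, 0.50), ("A", 3, 0.45), ("A", 4, 0.55), ("A", 5, 0.50),
    ("B", 1, 0.20), ("B", 2, 0.35), ("B", 3, 0.15), ("B", 4, 0.30),
]

# Group by bank so the window never mixes observations from different banks.
by_bank = defaultdict(list)
for bank, month, dep_ratio in rows:
    by_bank[bank].append((month, math.log(dep_ratio)))

# ma3[(bank, month)] is the 3-month lagged moving average of ln(depRatio).
ma3 = {}
for bank, series in by_bank.items():
    series.sort()  # order by month within the bank
    months = [m for m, _ in series]
    logs = [v for _, v in series]
    for m, value in zip(months, lagged_moving_average(logs, 3)):
        ma3[(bank, m)] = value
```

The first three months of each bank have no value (`None`), so longer windows cost more observations at the start of every bank's series, which matters for banks with short histories.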
My main concern is the ceteris paribus effect of the independent variable over the dependent.
I am working with a fixed effects model with autocorrelation (AR(1) model) and robust standard errors (xtpcse command in Stata).
The model works better (higher $R^2$, higher coefficient estimates, higher $z$-statistics) when I use a moving average than when I use the original values (even when I use lagged values such as L.depRatio).
My question is, given this setup, can I use a moving average on the independent variable only?
Should I smooth the dependent variable as well as the independent one, or is smoothing the dependent variable inadvisable?
Best Answer
The purpose of your modelling could be, e.g., (1) descriptive, (2) explanatory or (3) predictive.
Your case seems to be explanatory. If you are interested in the long-term relation between variables, you should probably use time series decomposition and interpret your findings accordingly. That is, you should not claim a relationship between the original variables but just between specific components. You should then also think whether that has a sensible subject-matter interpretation.
Edit (after an edit of the question):
Removing the noise from the independent variable by smoothing gives you higher $R^2$ (as you note), but this is an artefact of smoothing, so it should be taken with a grain of salt. Once you have smoothed the independent variable, you should not be making direct inference w.r.t. the original variable. That is something to be careful about -- see my point (2) above. However, in your case it seems that smoothing could make sense as the dependent variable at time $t$ does not depend on the regressor as of a precise time point in the past, but rather over a time interval. Thus you would explicitly define your regressor as a smooth version of the original variable and you would make inference with respect to this smoothed regressor. That could work.
If you smooth the dependent variable, too, you will probably increase the $R^2$ even more, but you will depart even further from direct interpretation, because again the change in $R^2$ will be an artefact of smoothing.
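A small simulation can illustrate how smoothing alone changes $R^2$. This is only a sketch under assumptions of my own, not a model of your data: a persistent AR(1) "signal" drives both variables, the regressor is the signal plus large month-to-month noise, the true slope on the signal is 1, and plain OLS on a single series stands in for the panel estimator. All names and parameter values are hypothetical.

```python
import random


def trailing_mean(values, window):
    """Trailing moving average including the current point.

    The first window - 1 points are dropped, so the result is shorter.
    """
    return [sum(values[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(values))]


def ols_slope_r2(x, y):
    """Slope and R^2 of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


rng = random.Random(0)
n, window = 5000, 12

# Assumed slow-moving signal shared by both variables.
signal, level = [], 0.0
for _ in range(n):
    level = 0.98 * level + rng.gauss(0, 0.1)
    signal.append(level)

x = [s + rng.gauss(0, 0.5) for s in signal]  # noisy regressor
y = [s + rng.gauss(0, 0.2) for s in signal]  # outcome, true slope 1 on the signal

x_ma = trailing_mean(x, window)
y_ma = trailing_mean(y, window)
y_cut = y[window - 1:]  # align the unsmoothed outcome with the smoothed regressor

slope_raw, r2_raw = ols_slope_r2(x, y)            # nothing smoothed
slope_x, r2_x = ols_slope_r2(x_ma, y_cut)         # regressor smoothed
slope_both, r2_both = ols_slope_r2(x_ma, y_ma)    # both smoothed

print(f"raw:           slope={slope_raw:.2f}  R2={r2_raw:.2f}")
print(f"x smoothed:    slope={slope_x:.2f}  R2={r2_x:.2f}")
print(f"both smoothed: slope={slope_both:.2f}  R2={r2_both:.2f}")
```

In this setup the relation between the signal and the outcome never changes, yet the fit improves at each step purely because noise is averaged away. That is why the $R^2$ values from the smoothed specifications are not comparable with the one from the raw specification.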
As an alternative, you could sample your data less frequently. Then you should see more signal relative to noise (as signal would accumulate between the infrequent sample points while noise would not), and you would still be able to interpret the results directly (unlike in the case of smoothing). However, this approach could immediately be criticized as throwing away data, and there are probably better alternatives. If the smoothed regressor makes sense on its own, you do not need to do this.
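The claim that signal accumulates between infrequent sample points while noise does not can be sketched as follows. The assumptions are mine: the signal is a random walk, the noise is independent from month to month, and the comparison is made on changes between consecutive sample points. Under those assumptions the variance of a 12-month change in the signal is about 12 times that of a 1-month change, while the variance of the change in the noise is the same at both horizons.

```python
import random


def variance(values):
    """Sample variance with the n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def changes(series, step):
    """Changes between consecutive sample points taken every `step` months."""
    sampled = series[::step]
    return [b - a for a, b in zip(sampled, sampled[1:])]


rng = random.Random(1)
n = 6000

# Assumed random-walk signal: its increments add up over time.
signal, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0, 0.1)
    signal.append(level)

# Assumed month-to-month noise: independent draws that do not add up.
noise = [rng.gauss(0, 0.5) for _ in range(n)]


def signal_share(step):
    """Share of the variance of observed changes that comes from the signal."""
    sig_var = variance(changes(signal, step))
    noise_var = variance(changes(noise, step))
    return sig_var / (sig_var + noise_var)


share_monthly = signal_share(1)
share_yearly = signal_share(12)
print(f"signal share, monthly sampling: {share_monthly:.3f}")
print(f"signal share, yearly sampling:  {share_yearly:.3f}")
```

The price is visible in the code as well: `series[::step]` keeps one observation in twelve, which is the "throwing away data" criticism mentioned above.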