Solved – the advantage of transforming variables into First Difference of the Natural Log instead of % change from one period to the next

data transformation, econometrics, macroeconomics, multiple regression, time series

I am dealing with macroeconomic time series data, and I build econometric models. I am aware that some econometricians like to transform such variables into the First Difference of the Natural Log (FDNL) from one period to the next. I have more commonly used the % change from one period to the next. Both variable forms give very similar values, but they are not the same.
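For concreteness, here is a minimal numerical sketch of how the two forms relate (the GDP levels below are made up, purely for illustration). Since FDNL equals ln(1 + % change), the two nearly coincide for small changes and drift apart for large ones:

```python
import numpy as np

# Hypothetical quarterly Real GDP levels (made-up numbers, illustrative only)
gdp = np.array([100.0, 100.8, 103.0, 101.5, 105.0, 120.0])

pct_change = gdp[1:] / gdp[:-1] - 1      # % change from one period to the next
fdnl = np.diff(np.log(gdp))              # First Difference of the Natural Log

# Identity: fdnl = log(1 + pct_change), so the gap is tiny for small changes
# and grows for large ones.
for p, d in zip(pct_change, fdnl):
    print(f"% change = {p:+.4f}   FDNL = {d:+.4f}   gap = {p - d:+.4f}")
```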

Using the FDNL has certain advantages:
1) It reduces the Skewness and Kurtosis of the distribution of the variable, which strengthens the Normal distribution assumption of the regression (see the simulation sketch after this list).
2) It may improve the Goodness-of-fit of the model (higher R Square, lower standard error).
3) It may also improve the diagnostic testing of the model: residuals may be less heteroskedastic and closer to Normally distributed.
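The skewness/kurtosis point can be checked directly. The sketch below compares the two transforms on a simulated level series with occasional large jumps; the data-generating process is made up, and scipy/numpy are assumed to be available:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)

# Simulated level series whose growth has occasional large upward jumps
# (a made-up process, used only to illustrate the claim)
growth = rng.normal(0.005, 0.01, 400) + rng.choice([0.0, 0.15], size=400, p=[0.97, 0.03])
level = 100 * np.cumprod(1 + growth)

pct = level[1:] / level[:-1] - 1
fdnl = np.diff(np.log(level))

for name, x in [("% change", pct), ("FDNL", fdnl)]:
    print(f"{name:>9}: skewness = {skew(x):+.3f}, excess kurtosis = {kurtosis(x):+.3f}")
```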

On the other hand, using % change has its own advantages:
1) It is far more transparent and easy to communicate to various audiences. When you forecast 3% annual growth in Real GDP, you mean exactly that. When you convey a similar figure using FDNL, you actually mean something slightly different, and that is not so easy to explain.
2) When using % change, you do not tone down the Outliers (you do when using FDNL). Those Outliers may carry a lot of information, which can be very useful in different circumstances: for instance, if you run a VAR version of the model to explore the related Impulse Response Functions (IRFs), or when running Stress Test scenarios (see the sketch after this list). Using a model based on % change, you may be less likely to underestimate the impact of a recession or of various other economic shocks.
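As a rough sketch of the VAR/IRF workflow mentioned above, the snippet below fits a small VAR on two made-up series already expressed as % change and plots the impulse responses. It assumes statsmodels and pandas are available; the series names, lag order, and horizon are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)

# Two made-up macro series in % change form (illustrative only)
df = pd.DataFrame({
    "real_gdp_growth": rng.normal(0.005, 0.01, 200),
    "inflation": rng.normal(0.005, 0.005, 200),
})

res = VAR(df).fit(2)          # a VAR(2); lag order picked arbitrarily here
irf = res.irf(periods=12)     # impulse responses out to 12 periods
irf.plot(orth=True)           # orthogonalized IRFs
```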

Am I missing something? How do you see the pros and cons of either variable form? What do you use yourself when developing such models?

Best Answer

Transformations are like drugs: some are good for you and some aren't. You should presume neither transformation, but rather identify the appropriate one from the data you are trying to model. My answer to the log question When (and why) should you take the log of a distribution (of numbers)? suggests a logical procedure for deciding whether or not to take logs. Taking logs, or any other power transformation, can remedy a non-constant error process. Unnecessary differencing of a time series can inject structure which then has to be taken back out via model coefficients. Differencing is a form of auto-regressive model and, as such, should be adopted only if it is identified as useful in separating signal from noise, i.e. in developing a useful ARIMA model. Outliers are not remedied by differencing; that is accomplished by adding deterministic structure to the ARIMA model. See http://www.unc.edu/~jbhill/tsay.pdf, which focuses on alternative schemes to deal with a deterministic non-constant error process and the possible need for adding deterministic structure, i.e. Pulses, Level Shifts, Seasonal Pulses and/or Local Time Trends.
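To make the "deterministic structure instead of differencing away outliers" point concrete, here is a minimal sketch of an ARIMA fit with a Pulse regressor. It assumes statsmodels is available; the series, the outlier date, and the ARIMA(1,1,0) order are all made up for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)

# Made-up level series with a single additive outlier at t = 120
y = pd.Series(np.cumsum(rng.normal(0.1, 1.0, 200)))
y.iloc[120] += 10.0

# Deterministic structure: a pulse dummy marking the outlier period
pulse = pd.Series(0.0, index=y.index, name="pulse_t120")
pulse.iloc[120] = 1.0

# ARIMA(1,1,0) with the pulse as an exogenous regressor, so the outlier is
# absorbed by deterministic structure instead of distorting the AR coefficient
res = SARIMAX(y, exog=pulse, order=(1, 1, 0)).fit(disp=False)
print(res.params)
```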