Regression – Interpreting Log-Transformed Percentages in OLS

data-transformation, econometrics, logarithm, regression

In a log-log model, such as $\log(y) = b_0 + b_1 \log(x)$, I know that with OLS the standard interpretation is that a "1% increase in x is associated with a $b_1$% increase in y."

I have three related questions:

  1. If x is a percentage variable that was logged to correct skew (e.g.,
     x ranges from 0.01 to 0.99), what is the correct interpretation of
     the resulting regression? A 1% increase in x no longer seems easily
     interpretable in this case.
  2. How does the interpretation change (if at all) if the model is
     differenced? For example: $\log(y_t) - \log(y_{t-1}) = b_1 \left( \log(x_t) -
     \log(x_{t-1}) \right)$. This is frequently seen in panel data. To me, this
     also seems to model a percentage relationship between y and x, and
     it's unclear whether the interpretation would differ.
  3. Do native predict functions in either R or Stata handle logarithmic
    transformations accurately, or do they need to be
    exponentiated/corrected for bias?

To sum up, I would like to know how to generate an accurate predicted percent change in y given a (percentage-point) value of x, in contexts where x may be logged and/or differenced.

Best Answer

As you're going through this, keep in mind that interpreting a "unit change in a logarithm" as a "percent change" is only a local approximation: it is accurate for small changes and drifts for large ones.
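A quick numeric check makes the point. This is just an illustrative sketch: a change of $d$ in $\log{x}$ corresponds to an exact proportional change of $e^d - 1$ in $x$, which is close to $d$ only when $d$ is small.

```python
import math

# A change of d in log(x) multiplies x by exp(d), an exact proportional
# change of exp(d) - 1. The "log change = percent change" reading uses d.
for d in (0.01, 0.05, 0.25, 1.0):
    exact = math.exp(d) - 1   # true proportional change
    approx = d                # local approximation
    print(f"d={d:.2f}  exact={exact:.4f}  approx={approx:.4f}")
```

At $d=0.01$ the two agree to four decimal places; at $d=1$ the exact change is about 172%, not 100%.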

1.

You're looking at a percent change in percentage points. Say $x$ measures how full a glass of water is. Some glasses are 25% full, others are 26% full. Un-logged, a 1-unit change in $x$ (i.e. moving from 25% to 26%) is associated with a $b_1$-unit change in $y$. The fact that the unit is a percentage point is irrelevant here.

Now take the log of $x$ and $y$. A 1-unit change in $\log{x}$ is associated with a $b_1$-unit change in $\log{y}$. So in the percent-change interpretation, a 1% change in $x$ is associated with a $b_1$% change in $y$. That is, moving from a glass that is 25% full to one that is 25.25% full is associated with a $b_1$% change in $y$.
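To see the elasticity reading concretely, here is a small sketch (with made-up coefficients $b_0 = 0.5$, $b_1 = 2$) where $\log(y) = b_0 + b_1 \log(x)$ holds exactly. Scaling $x$ by 1.01 scales $y$ by exactly $1.01^{b_1}$, which is approximately $1 + 0.01\,b_1$, i.e. a $b_1$% change:

```python
import math

# Hypothetical coefficients for illustration only.
b0, b1 = 0.5, 2.0

def y(x):
    # y implied by an exact log-log relation: log(y) = b0 + b1*log(x)
    return math.exp(b0 + b1 * math.log(x))

x = 0.25                      # e.g. a glass that is 25% full
ratio = y(1.01 * x) / y(x)    # exact multiplicative change in y
print(ratio, 1.01 ** b1)      # equal; both are roughly 1 + 0.01*b1
```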

What if $x$ is already a percent change in something else? Say that, instead of "glass fullness," $x$ is now how much water has evaporated from a glass over some period of time, measured as a percentage of the original water level. Then a 1% change in $x$, i.e. going from a 25% change to a 25.25% change, is associated with a $b_1$% change in $y$.

Is that meaningful? Sure, if it's what you want to model. And chances are good that taking a logarithm to "correct skew" is unnecessary for the independent variable in a regression.

2.

Recall that $\log{u}-\log{v}=\log{(u/v)}$. So in the "percent change" interpretation, a 1% increase in the ratio of $x_t$ to $x_{t-1}$ is associated with a $b_1$% increase in the ratio of $y_t$ to $y_{t-1}$. This case is slightly messier than before, but it's still a percent change in percentage points, as above. Say $x_t=1$ and $x_{t-1}=2$, so their ratio is $0.5$. Moving from $\log{0.5}$ to $\log{0.5}+1$ is the same thing as moving that ratio from $0.5$ to $0.5e^{1}=0.5e$, since $\log{e^1}=1$. By the same logic, this is associated with moving the $y$ ratio from $r$ to $re^{b_1}$.

This, of course, is completely different from taking the logarithm of the first differences.
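A two-line check (with arbitrary made-up values of $x_t$ and $x_{t-1}$) shows just how different the two quantities are:

```python
import math

x_t, x_tm1 = 1.5, 1.2
diff_of_logs = math.log(x_t) - math.log(x_tm1)  # log(x_t / x_tm1), a growth rate
log_of_diff = math.log(x_t - x_tm1)             # log of the first difference
print(diff_of_logs)   # ≈ 0.2231
print(log_of_diff)    # ≈ -1.2040, something else entirely
```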

3.

There's no "bias" to correct for. I'm going to assume you mean to ask whether the predict functions automatically transform predictions back to the original scale. They don't.

R's built-in lm function does record right-hand-side transformations written into the formula: if you fit l = lm(log(y) ~ log(x)), then predict(l, newdata) will apply log to the x values you supply. What it will not do is transform the response back: the prediction it returns is $\widehat{\log{y}}$, not $\widehat{y}$, so you must exponentiate it yourself. If instead you pre-compute the logged variables and fit l = lm(logy ~ logx), then predict assumes the input you feed it is already on the log scale. That doesn't mean you can't write a wrapper function for lm that keeps track of such transformations and a corresponding predict method that undoes them, but that's one for Stack Overflow.
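The bookkeeping is the same in any language. Here is a minimal, language-neutral sketch in Python, with made-up data, that fits $\log(y) = b_0 + b_1 \log(x)$ by hand (mirroring what lm(log(y) ~ log(x)) would estimate for one regressor) and shows that the back-transform of the prediction is up to you:

```python
import math

# Hypothetical data, roughly y = 10x, chosen only for illustration.
xs = [0.1, 0.2, 0.4, 0.8]
ys = [1.0, 2.1, 3.9, 8.2]

# Fit log(y) = b0 + b1*log(x) via the closed-form simple-OLS slope.
lx = [math.log(v) for v in xs]
ly = [math.log(v) for v in ys]
n = len(xs)
mx, my = sum(lx) / n, sum(ly) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
     sum((a - mx) ** 2 for a in lx)
b0 = my - b1 * mx

x_new = 0.5
log_yhat = b0 + b1 * math.log(x_new)  # what predict() gives you: log scale
yhat = math.exp(log_yhat)             # the back-transform is your job
print(b1, log_yhat, yhat)
```

With this data $b_1$ lands near 1 and the back-transformed prediction near $10 \times 0.5 = 5$, as expected.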

The same is even more true in Stata, where a command like reg log(y) log(x) is outright invalid. You first have to run something like gen logx = log(x) and gen logy = log(y), and then reg logy logx. So predict yhat will, as in R, return a variable on the log scale, and any back-transformation to the original scale is up to you.