Solved – Percentage interpretation of negative values when you can’t use log transformation

data-transformation, mathematical-statistics, regression

I have a data set of five stock-market indicators. Two of the indicators take negative values: for example, they range from roughly -50 to 100. After running a regression I would like to be able to compare the results on a percentage basis.

For example, if my model is $y = \beta_1 + \beta_2 x$, with $y$ the dependent variable and $x$ the independent variable, then I want to know the percentage change in $y$ when $x$ changes by $1\%$.

Notes:

  1. Because some of the values are negative, I can't use the log transformation.
  2. The indicators are on very different scales (some range roughly from 1000 to 2000, others from 50 to 100), which is why I want a percentage interpretation: it would make the comparison easier.
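In other words, the quantity I am after is the elasticity of $y$ with respect to $x$. Writing $x_0$ for a baseline value of $x$ (a symbol introduced here only to make the definition explicit):

$$
\text{elasticity at } x_0 \;=\; \frac{\Delta y / y}{\Delta x / x_0} \;=\; \frac{\beta_2\, x_0}{\beta_1 + \beta_2\, x_0},
$$

i.e. the percentage change in $y$ produced by a $1\%$ change in $x$ away from $x_0$.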

Best Answer

You need to think carefully both about your model and about what you mean by "relative importance."

First, it's not completely clear from your question whether you intend to compare results from several single-variable regressions or to compare coefficients within a combined multiple regression. The latter is almost certainly the way to proceed.
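As a rough illustration of why the combined regression is preferable, here is a small simulated sketch (the data and variable names are made up, not taken from your indicators): when predictors are correlated, separate single-variable slopes can make a predictor with no direct effect look important.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)             # x2 correlated with x1
y = 2.0 + 1.0 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no direct effect on y

# Separate single-variable regressions
b1_single = sm.OLS(y, sm.add_constant(x1)).fit().params[1]
b2_single = sm.OLS(y, sm.add_constant(x2)).fit().params[1]

# Combined multiple regression
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print("single-variable slopes:", b1_single, b2_single)  # x2 looks strongly related to y on its own
print("multiple-regression slopes:", fit.params[1:])    # but contributes ~0 once x1 is included
```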

Second, you need to make sure that your predictor variables have been transformed in a way that provides reliable linear regression results. If you have reliable results with untransformed predictor variables, then each coefficient represents the change in $y$ per unit change in the corresponding $x$. In that case, any attempt to convert to percentages will depend on the baseline value you assume for $x$, and the resulting percentage changes may be misleading. This is a particular problem with predictors that run from negative to positive: what would you take as the percentage change in a variable whose initial value is 0?
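A small numerical sketch of that baseline problem, using made-up coefficient values rather than anything fitted to real data:

```python
# With untransformed predictors, a slope is a per-unit effect; converting it to
# "% change in y per 1% change in x" depends entirely on the baseline chosen.
b0, b2 = 50.0, 0.4        # hypothetical intercept and slope for one indicator

def pct_effect(x0):
    """% change in y when x rises 1% from baseline x0, under y = b0 + b2*x."""
    y0 = b0 + b2 * x0
    return 100 * (b2 * 0.01 * x0) / y0

for x0 in (-50, -10, 10, 100):
    print(f"baseline x0={x0:>4}: 1% change in x -> {pct_effect(x0):+.3f}% change in y")
# At x0 = 0 a "1% change in x" is zero change, so no percentage effect is defined.
```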

Third, even if you were in a situation where percentage changes were appropriate both for the regression (on log-transformed variables) and for interpretation, you would have to consider your purpose in evaluating "relative importance" in your model. Say, for example, that two predictors had identical regression coefficients but one coefficient had a much higher standard error than the other: would you really want to consider them equally important?
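A simulated sketch of that point (hypothetical names and data): two slopes with the same true value, one estimated far less precisely than the other.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(scale=0.1, size=n)               # much less spread -> much noisier estimate
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.params[1:])      # estimated slopes (true values are both 0.5)
print(fit.bse[1:])         # the second slope's standard error is roughly 10x larger
print(fit.conf_int()[1:])  # confidence intervals tell the fuller story
```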

Your best approach would be to get a reliable regression model first, then to use some well-chosen examples to illustrate how the predictor variables can inform decisions. Such illustrations would incorporate the coefficients themselves, the typical ranges of the predictor variables, and your confidence in the values of the regression coefficients. Useful illustrations don't need to be based on percentage changes, and should not be if percentage changes are misleading.
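One way such an illustration might look in practice, sketched with simulated data and hypothetical scenario values (an assumed workflow, not a prescription): predict the outcome at a few representative indicator settings and report the change together with its uncertainty.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
ind1 = rng.uniform(-50, 100, size=n)          # an indicator spanning negative values
ind2 = rng.uniform(1000, 2000, size=n)        # an indicator on a much larger scale
y = 5.0 + 0.3 * ind1 + 0.02 * ind2 + rng.normal(scale=10, size=n)

X = sm.add_constant(np.column_stack([ind1, ind2]))
fit = sm.OLS(y, X).fit()

# Compare two scenarios: indicator 1 moves from -25 to +25 while indicator 2
# is held at a typical value (1500).
scenarios = sm.add_constant(np.array([[-25.0, 1500.0],
                                      [ 25.0, 1500.0]]), has_constant="add")
pred = fit.get_prediction(scenarios).summary_frame()   # point predictions + intervals
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])
```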