Solved – Linear regression effect sizes when using transformed variables

data-transformation, effect-size, regression

When performing linear regression, it is often useful to apply a transformation such as a log transformation of the dependent variable so that it conforms better to a normal distribution. It is also often useful to inspect the betas from the regression to better assess the effect size / real-world relevance of the results.

This raises the problem that when using, e.g., a log transformation, the effect sizes will be on the log scale, and I've been told that, because of the non-linearity of that scale, back-transforming these betas gives values that are not meaningful and have no real-world use.

So far we have usually performed linear regression with the transformed variables to assess significance, and then linear regression with the original, non-transformed variables to determine the effect size.

Is there a right/better way of doing this? We mostly work with clinical data, so a real-life example would be determining how a certain exposure affects continuous variables such as height, weight, or some laboratory measurement, and we would like to be able to conclude something like "exposure A had the effect of increasing weight by 2 kg".

Best Answer

I would suggest that transformations aren't important for getting a normal distribution for your errors. Normality isn't a necessary assumption. If you have "enough" data, the central limit theorem kicks in and the sampling distributions of your estimates become asymptotically normal. Alternatively, you can use the bootstrap as a non-parametric means to estimate the standard errors, as in the sketch below. (Homoskedasticity, i.e., a common error variance across observations, is required for the usual standard errors to be right; robust options permit heteroskedasticity.)
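For intuition, here is a minimal sketch of the pairs bootstrap for a regression slope, using numpy and entirely made-up data (the sample size, noise distribution, and number of resamples are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: one predictor, deliberately skewed (non-normal) errors
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.exponential(scale=2.0, size=n) - 2.0  # centered, skewed noise

def slope(x, y):
    """OLS slope from a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Nonparametric bootstrap: resample (x, y) pairs with replacement
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[b] = slope(x[idx], y[idx])

print("OLS slope:", slope(x, y))
print("Bootstrap SE of slope:", boot.std(ddof=1))
```

The bootstrap standard error here comes straight from the spread of the resampled slopes, with no normality assumption about the errors.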

Instead, transformations help to ensure that a linear model is appropriate. To give a sense of this, let's consider how we can interpret the coefficients in transformed models (a small simulated example follows the list):

  • outcome in units, predictor in units: A one unit change in the predictor leads to a beta unit change in the outcome.
  • outcome in units, predictor in log units: A one percent change in the predictor leads to a beta/100 unit change in the outcome.
  • outcome in log units, predictor in units: A one unit change in the predictor leads to a beta x 100% change in the outcome.
  • outcome in log units, predictor in log units: A one percent change in the predictor leads to a beta percent change in the outcome.
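As a rough illustration of the log-outcome case, here is a minimal Python sketch with simulated data; the names (exposure, weight) and the coefficient values are placeholders, not anything from the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative log-level model: log(weight) = a + b * exposure + error
n = 500
exposure = rng.normal(0, 1, n)
log_weight = np.log(70) + 0.03 * exposure + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), exposure])
coef, *_ = np.linalg.lstsq(X, log_weight, rcond=None)
b = coef[1]

# Interpretation: a one-unit increase in exposure multiplies the outcome
# by exp(b), which is roughly a b*100 percent change when b is small.
print("beta:", b)
print("approx percent change per unit of exposure:", 100 * b)
print("exact multiplicative change per unit of exposure:", np.exp(b))
```

Note that the "beta x 100%" readings above are approximations that work well for small betas; the exact multiplicative effect is exp(beta).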

If transformations are necessary to have your model make sense (i.e., for linearity to hold), then the estimate from this model should be used for inference. An estimate from a model that you don't believe isn't very helpful. The interpretations above can be quite useful in understanding the estimates from a transformed model and can often be more relevant to the question at hand. For example, economists like the log-log formulation because the interpretation of beta is an elasticity, an important measure in economics.
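To make the elasticity reading concrete, the standard calculus identity behind it (nothing specific to this question's data) is

$$ \log y = \alpha + \beta \log x \;\Longrightarrow\; \beta = \frac{d\,\log y}{d\,\log x} = \frac{dy/y}{dx/x}, $$

so beta is the percent change in y associated with a one percent change in x.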

I'd add that the back-transformation doesn't work because the expectation of a function is not the function of the expectation: the expected value of the exponentiated estimate is not the exponential of the expected estimate, and likewise on the outcome scale, exponentiating the mean of log(Y) does not give the mean of Y. Hence, your back-transformed estimator is not unbiased. This throws off standard errors, too.
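For intuition, here is a minimal numpy sketch of the outcome-scale version of this point, using simulated lognormal data (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative lognormal outcome: log(y) ~ Normal(mu, sigma^2)
mu, sigma = 2.0, 0.8
y = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# Naive back-transformation of the mean on the log scale...
naive = np.exp(np.log(y).mean())
# ...underestimates the mean on the original scale, which for a
# lognormal is exp(mu + sigma^2 / 2), not exp(mu).
print("exp(mean of log y):", naive)     # close to exp(mu)
print("mean of y:         ", y.mean())  # close to exp(mu + sigma**2 / 2)
```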
