Life after the Box-Cox transformation

data transformation, normal distribution

Suppose we have a set of measurements of some quantity in some units of measurement. We also have a nice model that relies heavily on the properties of the Gaussian distribution. The model is tailored to data in specific units that carry a physical meaning (watts, ohms, etc.). It turns out that the data do not exactly follow a normal distribution and have some undesirable features, such as skewness. We apply the popular Box-Cox transformation and obtain a more or less normally distributed data set. The problem now is that we have logarithms, powers, etc. of the original measurements, which conflicts with our nice model.

The question is: what can one do in such a situation? Do I need to change the model so that it can handle the new data? And in general, if I have understood everything correctly, why do people want to study transformed data that have lost their physical meaning? At the end of the day, one will probably have to return to the original units of measurement.

Best Answer

First of all, if you mean a linear regression model, it does not assume that the data are normally distributed. It assumes that the errors, as estimated by the residuals, are normally distributed (in fact, they should be iid $\mathcal{N}(0,\sigma^2)$).
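To illustrate the distinction, here is a minimal sketch on simulated data (the coefficients, sample size, and the helper `skewness` are all assumptions made up for this example). The response is clearly skewed because the predictor is skewed, yet the residuals of an ordinary least squares fit are close to symmetric, which is what the model actually requires.

```python
import numpy as np

def skewness(values):
    """Sample skewness: third central moment divided by variance^(3/2)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

rng = np.random.default_rng(0)
n = 5000

# Hypothetical setup: a skewed predictor and normally distributed errors.
x = rng.exponential(scale=2.0, size=n)
noise = rng.normal(loc=0.0, scale=1.0, size=n)
y = 1.0 + 3.0 * x + noise

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

skew_y = skewness(y)                  # strongly positive: y inherits the skew of x
skew_residuals = skewness(residuals)  # near zero: the errors are symmetric
print(f"skewness of y:         {skew_y:.2f}")
print(f"skewness of residuals: {skew_residuals:.2f}")
```

So a skewed histogram of the raw measurements is not, by itself, a reason to transform them; it is the residuals that should be inspected.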

Second, if that assumption is violated and you want to keep your original units, you can use some other form of regression: there are a variety of robust regression models, loess models, spline models, and so on.
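As one possible illustration of the robust option, the sketch below fits a line in the original units using Huber weights and iteratively reweighted least squares. It is a hand-rolled example rather than any particular library's implementation: the function `huber_irls`, the tuning constant 1.345, and the simulated data (with one-sided outliers that make the errors skewed) are assumptions chosen for the demonstration.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Linear fit with Huber weights via iteratively reweighted least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from the OLS solution
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal data.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, shrinking for large ones.
        weights = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(weights)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data in original units: true line y = 2 + 0.5 x.
x = rng.uniform(0.0, 10.0, size=n)
errors = rng.normal(0.0, 0.5, size=n)
# About 10% of points get a large positive shock, so the errors are skewed.
is_outlier = rng.random(n) < 0.10
errors = errors + is_outlier * rng.exponential(scale=20.0, size=n)
y = 2.0 + 0.5 * x + errors

X = np.column_stack([np.ones(n), x])
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
robust_beta = huber_irls(X, y)

print("OLS    intercept, slope:", ols_beta)
print("Huber  intercept, slope:", robust_beta)
```

The coefficients stay in the units of the measurements, so no back-transformation is needed. Note that with skewed errors a robust fit describes the bulk of the data rather than the conditional mean, which may or may not be what the application calls for.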