Plotting raw data, but running statistics on log-transformed data

lognormal distribution, mean, median

My data are non-normal. In a scientific journal, I want to present the raw data as median ± MAD (median absolute deviation), to show the true nature of the data.

However, once log-transformed, the data are normal. Can I then indicate significance based on calculations on the log-transformed data? That would mean presenting median ± MAD alongside results from parametric tests.

Best Answer

My rule of thumb is that if you do a statistical test on a transformation of the data that you will plot, then you should plot that transformation of the data.

Ideally the transformation should be motivated by the data type; for example, suppose you are looking at cell counts in a Petri dish. Since these grow exponentially (at least until they hit the limit allowed by the dish), a log transformation is well justified, both scientifically and in terms of making the data look more normal. In this case, the log-transformed data better answers the scientific question of interest and eases the statistical methodology, so it's clearly the form of the data the reader should be interested in.
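As a rough illustration of why the log scale suits exponentially growing counts, here is a minimal Python sketch on simulated data. The starting count, the growth-rate distribution, and the elapsed time are all made-up assumptions, not values from the question.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: third central moment divided by the cubed SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed model: each dish starts at 100 cells and grows exponentially for
# 10 time units, at a rate that varies normally from dish to dish.
rates = [random.gauss(0.5, 0.1) for _ in range(2000)]
counts = [100 * math.exp(r * 10) for r in rates]
log_counts = [math.log(c) for c in counts]

raw_skew = skewness(counts)      # long right tail on the raw scale
log_skew = skewness(log_counts)  # roughly symmetric on the log scale
print(f"skewness of raw counts: {raw_skew:.2f}")
print(f"skewness of log counts: {log_skew:.2f}")
```

Under this assumed model the log counts are just a linear function of the normally distributed rates, which is why they come out roughly symmetric while the raw counts do not.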

In the real world, log transformations are sometimes used almost entirely because they simplify the analysis (e.g. by taming outliers). In this case, I would still plot the transformed data: this is the form of the data about which you are answering questions (i.e. "are the means of the log-transformed data different?") and as such, the reader should be most interested in this form of the data.
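To make this concrete, the sketch below computes the plotted summary and the test statistic on the same log scale, so the figure and the test describe the same quantity. The two groups are hypothetical numbers invented for illustration, and Welch's t statistic is only one possible choice of parametric test; looking up the p-value is left out.

```python
import math
import statistics

def log_scale_summary(values):
    """Mean and sample SD of the natural-log-transformed values."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)

def welch_t_on_logs(a, b):
    """Welch's t statistic and approximate df for log(a) versus log(b)."""
    la = [math.log(v) for v in a]
    lb = [math.log(v) for v in b]
    va = statistics.variance(la) / len(la)  # squared standard error, group a
    vb = statistics.variance(lb) / len(lb)  # squared standard error, group b
    t = (statistics.mean(la) - statistics.mean(lb)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(la) - 1) + vb ** 2 / (len(lb) - 1))
    return t, df

# Hypothetical measurements, made up for this example
control = [12.0, 15.0, 9.0, 30.0, 18.0, 22.0]
treated = [40.0, 55.0, 28.0, 120.0, 61.0, 75.0]

mean_log_control, sd_log_control = log_scale_summary(control)
mean_log_treated, sd_log_treated = log_scale_summary(treated)
t, df = welch_t_on_logs(control, treated)

# These log-scale means and SDs are what would go on the plot.
print(f"control: {mean_log_control:.3f} +/- {sd_log_control:.3f} (log scale)")
print(f"treated: {mean_log_treated:.3f} +/- {sd_log_treated:.3f} (log scale)")
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

If the axis needs to be labelled in original units, note that exponentiating the mean of the logs gives the geometric mean, not the arithmetic mean or the median of the raw data.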