R: Box-plot on log scale vs. log-transforming *then* creating box-plot: Don’t get same result

Tags: boxplot, data transformation, lognormal distribution, r

In the boxplot() function in R, there exists the log = argument for specifying whether or not an axis should be on the log scale.

To me, if I choose this option (specify log = "y" as an argument), the shape of the box-plot should look the same as if I manually transform the data first with the log, then plot that log-transformed data (I recognize the labels on the axis will be different, but I'm referring to the shape of the plot). However, this isn't the case.

Here is a simple working example:

set.seed(923489)
data <- rlnorm(300, meanlog = 0, sdlog = 1)
boxplot(data) # Raw data: highly right-skewed
boxplot(data, log="y") # Raw data drawn on a log axis: less right-skewed
boxplot(log10(data)) # Base-10 log of the data: shape differs from the log="y" plot
boxplot(log(data)) # Natural log gives the same shape as base 10 (only the axis labels differ)

Why is this so?

Best Answer

The box with the median "belt" looks the same in both plots. The difference is in the whiskers. For the default settings, ?boxplot tells us that

If ‘range’ is positive, the whiskers extend to the most extreme data point which is no more than ‘range’ times the interquartile range from the box.

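range is positive, namely 1.5 by default, so the whiskers reach at most 1.5 times the interquartile range beyond the box. But on which scale? If you call boxplot(data, log="y"), the statistics are computed on the untransformed data and only the axis is drawn on the log scale. For positive, right-skewed data like these, the lower limit of 1.5 times the IQR below the box falls below zero, so the lower whisker runs all the way down to the smallest observation and looks long on the log axis. If you call boxplot(log(data)), the 1.5 times IQR limits are computed on the log scale and sit symmetrically around the box. The whiskers stop at the most extreme data points inside those limits, so for log-normal data they come out roughly (not exactly) symmetric.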
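To see the two whisker rules side by side without relying on the plots, one can compute the box-plot statistics directly. The sketch below assumes that boxplot() gets its numbers from boxplot.stats() with coef = 1.5, which is what its documentation describes for the default range. The names x, raw_stats, log_stats and comparison are only illustrative.

```r
# Same simulated data as in the question (named x here to avoid masking utils::data)
set.seed(923489)
x <- rlnorm(300, meanlog = 0, sdlog = 1)

# Five statistics: lower whisker end, lower hinge, median, upper hinge, upper whisker end.
# This is the rule applied by boxplot(x, log = "y"): computed on the raw scale.
raw_stats <- boxplot.stats(x, coef = 1.5)$stats

# This is the rule applied by boxplot(log(x)): computed on the log scale.
log_stats <- boxplot.stats(log(x), coef = 1.5)$stats

# Put both on the log scale so the rows can be compared directly
comparison <- rbind(raw_scale_rule = log(raw_stats),
                    log_scale_rule = log_stats)
colnames(comparison) <- c("lower", "Q1", "median", "Q3", "upper")
print(round(comparison, 3))
```

The Q1, median and Q3 columns should agree almost exactly (small differences can arise because the hinges may be averages of two neighbouring observations), while the lower and upper columns differ. Those two columns are the whisker ends, which is where the two plots part ways.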
