In the boxplot() function in R, there exists the log = argument for specifying whether or not an axis should be on the log scale.
To me, if I choose this option (specify log = "y" as an argument), the shape of the box-plot should look the same as if I manually transform the data first with the log, then plot that log-transformed data (I recognize the labels on the axis will be different, but I'm referring to the shape of the plot). However, this isn't the case.
Here is a simple working example:
set.seed(923489)
data <- rlnorm(300, meanlog = 0, sdlog = 1)
boxplot(data) # Highly skewed right raw data
boxplot(data, log="y") # Data on log scale; less right-skewed
boxplot(log10(data)) # Log base 10-transformed data; shape differs from the log="y" plot
boxplot(log(data)) # Natural log and base 10 give same shape plot (just different axis labels)
Why is this so?
Best Answer
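The box with the median "belt" looks the same in both plots. The difference is in the whiskers. ?boxplot tells us that range is positive by default, namely 1.5. So the whiskers extend to the most extreme data point that lies no more than 1.5 times the box length (the interquartile range) from the box, but on which scale? If you call boxplot(data, log="y"), the 1.5 is applied to the untransformed data and only the axis is drawn on a log scale; thus the lower whisker becomes longer. If you call boxplot(log(data)), the 1.5 is applied on the log scale, where this log-normal sample is roughly symmetric, so the whiskers come out roughly symmetric as well.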
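One way to check this numerically, rather than by eye, is to compare the statistics that boxplot() draws. The sketch below uses boxplot.stats() from grDevices, which returns the lower whisker end, lower hinge, median, upper hinge and upper whisker end, and reuses the data from the question. The names raw_stats, log_stats and comparison are illustrative only, and the assumption that boxplot(data, log = "y") draws exactly the raw-scale statistics is the reading given in this answer.

```r
set.seed(923489)
data <- rlnorm(300, meanlog = 0, sdlog = 1)

# Statistics computed on the raw scale. Under the reading above, these are
# what boxplot(data, log = "y") draws, just on a log-scaled axis.
raw_stats <- boxplot.stats(data)$stats

# Statistics computed after the log transform, mapped back to the raw
# scale with exp() so the two rows are directly comparable.
log_stats <- exp(boxplot.stats(log(data))$stats)

comparison <- rbind(raw = raw_stats, logged = log_stats)
colnames(comparison) <- c("lower whisker", "lower hinge", "median",
                          "upper hinge", "upper whisker")
print(comparison)

# Number of points flagged as outliers under each rule.
n_out_raw <- length(boxplot.stats(data)$out)
n_out_log <- length(boxplot.stats(log(data))$out)
print(c(raw = n_out_raw, logged = n_out_log))
```

The hinge and median columns should agree closely between the two rows, while the whisker columns and the outlier counts differ, because the 1.5 × IQR fences are computed on different scales. If the goal is a plot whose whiskers do not depend on that choice, range = 0 extends them to the data extremes in both versions.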