Interquartile range and outliers in boxplot with logarithmic y-axis

Let's say I have a set of data which are lognormally distributed and I want to create boxplots. The values represented by the box may lie somewhere around 1. In that case, the interquartile range will also be small, in the single digits. With the lower values lying somewhere below 0.1, even the most extreme low values will be covered by 1.5 times the IQR, and thus by the lower whisker, because the lower fence (first quartile minus 1.5 × IQR) then falls below zero. By contrast, the upper whisker will be quite short, making every value of 10 or higher a possible outlier.
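To see the asymmetry concretely, the sketch below applies base R's `boxplot.stats()` (the function `boxplot()` uses to compute its statistics, with the default coefficient 1.5) to simulated standard lognormal data. The sample size of 1000 and the seed are arbitrary assumptions. The fences are computed on the raw, linear-scale values, exactly as `boxplot()` does even when `log = "y"` is set.

```r
set.seed(42)                      # arbitrary seed, for reproducibility
x <- rlnorm(1000)                 # standard lognormal: meanlog = 0, sdlog = 1

s <- boxplot.stats(x)             # same statistics boxplot() draws (coef = 1.5)
hinges <- s$stats[c(2, 4)]        # lower and upper hinges (close to Q1 and Q3)
spread <- diff(hinges)            # hinge spread, the IQR analogue used here

lower_fence <- hinges[1] - 1.5 * spread
upper_fence <- hinges[2] + 1.5 * spread

cat("lower fence:", round(lower_fence, 2), "\n")  # negative: no positive value lies below it
cat("upper fence:", round(upper_fence, 2), "\n")
cat("lower whisker equals min(x):", s$stats[1] == min(x), "\n")
cat("outliers above the upper fence:", sum(s$out > upper_fence), "of", length(x), "\n")
cat("outliers below the lower fence:", sum(s$out < lower_fence), "\n")
```

Because the lower fence is negative while all lognormal values are positive, the lower whisker always reaches the sample minimum, and every flagged point sits above the upper fence.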

As I have up to 9000 observations in each plot, ranging from 10^-7 to 1, I always get plots with very long lower whiskers and a huge cloud of outliers at the top.
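The same effect at this scale can be reproduced with simulated data. In the sketch below, `meanlog` and `sdlog` are hypothetical values picked only so that 9000 draws span roughly 10^-7 to 1; they are not taken from the question. The script counts how many points `boxplot.stats()` flags when the fences are computed on the raw values versus on their base-10 logarithms.

```r
set.seed(1)                                   # arbitrary seed
# Hypothetical parameters: chosen only so that most of the 9000 values
# fall between roughly 1e-7 and 1; not taken from the question.
x <- rlnorm(9000, meanlog = log(10^-3.5), sdlog = 2)

out_linear <- boxplot.stats(x)$out            # fences computed on the raw values
out_log    <- boxplot.stats(log10(x))$out     # fences computed on log10 values

cat("flagged on linear scale:", length(out_linear), "\n")
cat("flagged on log scale:   ", length(out_log), "\n")
cat("all linear-scale outliers are high values:",
    all(out_linear > median(x)), "\n")
```

On the raw scale every flagged point is a high value, matching the pile of outliers at the top of the plot; on the log scale only a small handful is flagged, typically split between both ends.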

This makes me wonder:
1. Is there a rule (of thumb) for what should be considered an outlier among lognormally distributed data?
2. Is there a more elegant way to configure the whiskers for boxplots with a logarithmic y-axis?
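If the data really are lognormal, their logarithms are normal, so one possible rule of thumb can be read off the normal distribution. The short calculation below (base R only) shows where Tukey's 1.5 × IQR fences sit for normally distributed values, such as the logs of lognormal data, and what share of points they flag on average. This is a property of the normal distribution, not an official outlier definition, and the sample hinges used by `boxplot()` will vary around these theoretical values.

```r
# Theoretical fences for normally distributed values (e.g. the logs of
# lognormal data), expressed in units of the standard deviation.
q3    <- qnorm(0.75)           # upper quartile of N(0, 1), about 0.674
iqr   <- 2 * q3                # interquartile range of N(0, 1), about 1.349
fence <- q3 + 1.5 * iqr        # upper 1.5 * IQR fence, about 2.698

# Expected share of points beyond either fence (two-sided, by symmetry)
share_flagged <- 2 * pnorm(-fence)

cat("fence at", round(fence, 3), "standard deviations\n")
cat("expected share flagged:", round(100 * share_flagged, 2), "%\n")
```

So on the log scale, the usual whiskers flag points more than about 2.7 standard deviations from the center, roughly 0.7% of a perfectly normal sample; with 9000 observations that would still be around 60 points even if nothing were unusual.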

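For R users, this may serve as a minimal example:

```r
set.seed(1)               # make the random draw reproducible
data <- rlnorm(50)        # randomly generated lognormal data
IQR(data)                 # interquartile range on the original (linear) scale
boxplot(data, log = "y")  # log = "y" rescales the axis only; whiskers use raw values
```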

Best Answer

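Since none of your values are negative or zero, you can simply take the log of each value and draw a boxplot of the logged values. Given that you say your values are lognormally distributed, this seems like the easiest way to proceed: the logs are then normally distributed, so the usual 1.5 × IQR whiskers become symmetric and behave as they would for normal data.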
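In base R the answer's approach is essentially `boxplot(log10(data))`, but the axis is then labelled in log units. A minimal sketch that keeps the original units wraps the same idea in a hypothetical helper, `log_box_stats()` (the name and the choice of base 10 are illustrative assumptions): it lets `boxplot(..., plot = FALSE)` compute the statistics on `log10(x)`, back-transforms them, and leaves the drawing to `bxp()`.

```r
# Hypothetical helper (not from the answer): compute boxplot statistics on
# log10(x), then back-transform them so whiskers and outliers follow the
# log-scale 1.5 * IQR rule while the values stay in original units.
log_box_stats <- function(x, coef = 1.5) {
  stopifnot(all(x > 0))                                # logs need positive values
  b <- boxplot(log10(x), range = coef, plot = FALSE)   # stats on the log10 scale
  b$stats <- 10^b$stats                                # hinges, median, whisker ends
  b$conf  <- 10^b$conf                                 # notch limits (used only if notch = TRUE)
  b$out   <- 10^b$out                                  # outliers, back in original units
  b
}

set.seed(1)
data <- rlnorm(50)
z <- log_box_stats(data)
```

Calling `bxp(z, log = "y")` then draws the box on a logarithmic axis in the original units. Because the logarithm is monotone and changing its base only rescales hinges and spread by a constant, the base does not change which points are flagged.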