Solved – Boxplots vs. Confidence Intervals

boxplot, confidence interval

I designed a heuristic that solves a problem concerning network graphs. It was tested on thousands of instances with varying characteristics: topology, template, number and position of users, capacities, … It produced more than 300000 results, which also depend on the random seed that was used.

In order to evaluate the results, I decided to use boxplots that I created with JFreeChart. I created different diagrams for the different topologies and with separate plots for every template. I felt that was a good way to visually summarize the results.

I was asked why I didn't use confidence intervals instead to give an estimate. From what I know, these depend on an underlying distribution of a population parameter, while boxplots don't. However, I summarize results across different seeds, numbers of users, and capacities, all of which influence the results. So I think it would not be possible to use confidence intervals unless I distinguished every single network characteristic.

Is that true? What are other advantages and disadvantages? And how could I argue that I only use boxplots and not confidence intervals?

Best Answer

Choosing box plots means that you show the 25th and 75th percentiles (along with the median). Why not show the 2.5th and 97.5th percentiles instead? With n = 300000 and an unknown distribution, that empirical range covering the central 95% of the results would be the most sensible definition of a confidence interval. You might even consider showing both in a single plot.
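As a rough illustration of the percentile idea, here is a minimal Python sketch that computes both the box plot quartiles and the 2.5th/97.5th percentiles for one group of results. The helper names `percentile` and `summarize`, the linear-interpolation rule, and the simulated log-normal data are all assumptions made for this example; they stand in for the real results of one topology/template combination and are not taken from the question.

```python
import random


def percentile(sorted_values, p):
    """Return the p-th percentile (0 <= p <= 100) of an already sorted list,
    using linear interpolation between the two nearest ranks."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def summarize(values):
    """Map each percentile of interest to its empirical value."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in (2.5, 25, 50, 75, 97.5)}


# Hypothetical stand-in data: skewed results for a single group.
rng = random.Random(42)
results = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]

summary = summarize(results)
for p, value in summary.items():
    print(f"{p:>5}th percentile: {value:.3f}")
```

The 25th to 75th range is what the box of a box plot shows, while the 2.5th to 97.5th range brackets the central 95% of the observed values. Neither requires an assumption about the underlying distribution, but both describe the spread of the results rather than the uncertainty of an estimated parameter such as the mean.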

The purpose of the data evaluation is not perfectly clear, so I cannot say that one option is better or worse than the other. If this is all about description, I personally feel that both descriptors convey too little of the available information. Have you considered violin plots? They might tell a lot more about the data's distribution than a boxplot or a confidence interval, and they take no more space than boxplots.