There are at least two different factors potentially contributing to differences in the boxplots produced by Matlab and pgfplots.

## 1. `<=`

and `>=`

(Matlab) vs `<`

and `>`

(pgfplots)

There is a difference in the definitions of whiskers and outliers.

From the manual of pgfplots (I have emphasized the key fact):

lower whisker is the smallest data value which is *larger than* lower quartile−1.5 · IQR

and

upper whisker is the largest data value which is *smaller than* upper quartile+1.5 · IQR

From the manual of Matlab (emphasis added):

Points are drawn as outliers if they are *larger than* Q3+W*(Q3-Q1) or
*smaller than* Q1-W*(Q3-Q1), where Q1 and Q3 are the 25th and 75th
percentiles, respectively. The default value 1.5 corresponds to
approximately +/- 2.7 sigma and 99.3 coverage if the data are normally
distributed. *The plotted whisker extends to* the adjacent value, which
is *the most extreme data value that is not an outlier*.

## 2. Different methods for computing box limits / quartiles

It would be all too easy to say that the box limits are the 1st and 3rd quartile (a.k.a. 25th and 75th percentile, a.k.a. quantiles with probabilities 0.25 and 0.75) and leave it at that. Alas, there are many methods for computing quantiles. Without going into too much detail, there are no less than 9 different `quantile()`

method variants in R. For the example data set, these methods give 7 unique results for the pair of numbers (25th and 75th percentile). These are:

- 0 and 5
- 0.5 and 4.5
- 0.75 and 6.25
- 0.9166667 and 5.4166667
- 0.9375 and 5.3125
- 1 and 5
- 1.25 and 4.75

Matlab finds the box limits to be 1 and 5. According to @Jake (see comments), the limits in pgfplots are 1.5 and 4.5, which can also be approximately confirmed by looking at the picture attached by the original poster. Note that this corresponds to yet another definition of quantile. The pgfplots manual, Revision 1.11 (2014/08/04), states the following about the computation of quantiles:

I am not sure how exactly this maps the quartiles of the example data set to *1.5* and *4.5*. We have x1=-9, x2=0, x3=1, x4=2, x5=x6=3, x7=4, x8=5, x9=10 and x10=14. Following the formula in the pgfplots manual, we get *lower quartile* 0.5*(x2+x3)=0.5*(0+1)=*0.5* and *upper quartile* 0.5*(x7+x8)=0.5*(4+5)=*4.5*. One consequence of using the formula presented in the pgfplots manual is that for computing small quantiles, one would need the non-existent value x0.

## How this works with the example data

Assuming what I wrote above is correct and everything works as documented, we can work through a boxplot of the example data from the perspective of both Matlab and pgfplots.

*Matlab*

Assuming that the quartiles are 1 and 5, the inter-quartile range (IQR) is 4. Now 1.5*IQR is 6. Third quartile + 1.5*IQR is 11. Data value 14 is larger than that which makes it an outlier. However, 10 is not an outlier. Thus the upper whisker extends to 10.

*pgfplots*

Assuming that the quartiles are 1.5 and 4.5, the inter-quartile range (IQR) is 3. Now 1.5*IQR is 4.5. Third quartile + 1.5*IQR is 9. Data value 14 is larger than that which makes it an outlier. Also 10 is an outlier. 5 is the largest number which is not an outlier. Thus the upper whisker extends to 5.

## Conclusion

I cannot tell for sure which of the boxplot computation methods is better or correct. I can add another data point: In case of the example data, `boxplot()`

in R gives the same result as Matlab.

One should also note that due to the limitations of computer arithmetics and the discrete nature of the whisker locations, different implementations using the same formula may also produce a different result depending on the data set.

## Best Answer

To prevent PGFPlots from adding padding around your data, set

`enlargelimits=false`

: