I am working with a small dataset (21 observations) and have the following normal QQ plot in R:

Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed to the right would be a better fit, is that right? Also, what other conclusions can we draw from the data?

## Best Answer

If the values lie along a line the distribution has the same shape (up to location and scale) as the theoretical distribution we have supposed.

Local behaviour: When looking at sorted sample values on the y-axis and (approximate) expected quantiles on the x-axis, we can identify from how the values in some section of the plot differ locally from an overall linear trend by seeing whether the values are more or less concentrated than the theoretical distribution would suppose in that section of a plot:As we see, less concentrated points increase more and more concentrated points increase less rapidly than an overall linear relation would suggest, and in the extreme cases correspond to a gap in the density of the sample (shows as a near-vertical jump) or a spike of constant values (values aligned horizontally). This allows us to spot a heavy tail or a light tail and hence, skewness greater or smaller than the theoretical distribution, and so on.

Overall apppearance:Here's what QQ-plots look like (for particular choices of distribution)

on average:But randomness tends to obscure things, especially with small samples:

Note that at $n=21$ the results may be much more variable than shown there - I generated several such sets of six plots and chose a 'nice' set where you could kind of see the shape in all six plots at the same time. Sometimes straight relationships look curved, curved relationships look straight, heavy-tails just look skew, and so on - with such small samples, often the situation may be much less clear:

It's possible to discern more features than those (such as discreteness, for one example), but with $n=21$, even such basic features may be hard to spot; we shouldn't try to 'over-interpret' every little wiggle. As sample sizes become larger, generally speaking the plots 'stabilize' and the features become more clearly interpretable rather than representing noise. [With some very heavy-tailed distributions, the rare large outlier might prevent the picture stabilizing nicely even at quite large sample sizes.]

You may also find the suggestion here useful when trying to decide how much you should worry about a particular amount of curvature or wiggliness.

A more suitable guide for interpretation in general would also include displays at smaller and larger sample sizes.