Help me understand this qqplot

normal distribution, normality-assumption, qq-plot, residuals

I have plotted the qqplot of the residuals that my model generates, using the Python module statsmodels:

sm.qqplot(data, line='r')

and it looks like this:

[Figure: qqplot of the residuals produced by sm.qqplot]

The points are placed on a straight line but the sample quantiles do not correspond to the theoretical quantiles expected from a normal distribution.

What does it mean?

Furthermore, I also tried the scipy function probplot, called as probplot(data, dist='norm', plot=plt), and I got

[Figure: probability plot of the residuals produced by scipy's probplot]

I don't understand: are the points on the y-axis the sorted values or the quantiles?
The scipy documentation says:

probplot generates a probability plot, which should not be confused
with a Q-Q or a P-P plot. Statsmodels has more extensive functionality
of this type, see statsmodels.api.ProbPlot.

Best Answer

It's the same plot. I am not an expert on your software, but the following is a confident series of guesses. The sorted residuals are one and the same as the quantiles in this context.

On the vertical axis are your residuals and on the horizontal axis are what you would get on average with a sample of the same size drawn from a normal distribution with the same mean (zero) and SD. If all points fell on the line, you would have a perfect normal distribution, but that is just an ideal. In fact, experienced statistical people would suspect faking of data in that case as readily as a genuine perfect fit.
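To make the two axes concrete, here is a minimal sketch that builds the coordinates of a normal quantile plot by hand, using only the Python standard library. The residual values are made up, the helper names `qq_points` and `rescale` are hypothetical, and the plotting positions i/(n+1) are an assumption: statsmodels and scipy each use their own, slightly different, formula. As far as I can tell, sm.qqplot leaves the horizontal axis in standard normal units unless `fit=True` is passed, which would explain why the numbers on the two axes differ even though the points lie on a straight line.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) coordinates for a normal quantile plot."""
    n = len(residuals)
    # Vertical axis: the sorted residuals are themselves the sample quantiles.
    sample = sorted(residuals)
    # Horizontal axis: standard normal quantiles at plotting positions i/(n+1).
    # This position formula is an assumption; libraries differ slightly here.
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return theoretical, sample


def rescale(theoretical, residuals):
    """Express the theoretical quantiles in the residuals' units (mean + SD * z)."""
    m, s = fmean(residuals), stdev(residuals)
    return [m + s * z for z in theoretical]


residuals = [0.8, -1.9, 0.1, 2.6, -0.4, -0.7, 1.2, -3.1, 0.3]  # made-up values
x, y = qq_points(residuals)
x_scaled = rescale(x, residuals)

# Columns: standard normal quantile, rescaled quantile, sorted residual.
for z, xs, r in zip(x, x_scaled, y):
    print(f"{z:6.2f}  {xs:6.2f}  {r:6.2f}")
```

Rescaling the horizontal axis only relabels it. The points are just as straight either way, so the shape of the plot, not the axis numbers, is what carries the information about normality.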

In practice you have slightly fatter tails in the residuals than a normal distribution, which is not in itself cause for alarm. In essence, the model passes this particular health check. That doesn't rule out other diagnostics pointing to a better model.

It takes a bit of experience to know how much variability is acceptable and how much points to systematic departures that need to be addressed. One handle is a line-up test that goes back at least to Shewhart. Call up a random number routine to get several normal quantile plots, all drawn from a normal with zero mean and the same SD. Then ask: does the observed quantile plot stick out as very different from the fake plots? The idea is similar to a line-up in police procedure: show not just the suspect but other people too in a line-up and see whether a witness identifies the suspect. Another handle, and an even better one, is whether you can identify a change to the model that improves the quantile plot.
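The line-up idea can be sketched in a few lines. This is only an illustration under stated assumptions: the residuals are made-up values, the function name `lineup` and the choice of eight fake panels are hypothetical, and the fakes are drawn with the standard library's `random.gauss` using zero mean and the sample SD of the residuals.

```python
import random
from statistics import stdev


def lineup(residuals, n_fake=8, seed=None):
    """Hide the sorted residuals among sorted fake normal samples."""
    rng = random.Random(seed)
    n, sd = len(residuals), stdev(residuals)
    # Fake panels: same sample size, zero mean, same SD as the residuals.
    panels = [sorted(rng.gauss(0.0, sd) for _ in range(n)) for _ in range(n_fake)]
    # Slot the real data into a random position among the fakes.
    real_index = rng.randrange(n_fake + 1)
    panels.insert(real_index, sorted(residuals))
    return panels, real_index


residuals = [0.8, -1.9, 0.1, 2.6, -0.4, -0.7, 1.2, -3.1, 0.3]  # made-up values
panels, real_index = lineup(residuals, n_fake=8, seed=1)
print(len(panels), "panels; the real one is hidden at index", real_index)
```

Each panel would then be drawn as its own quantile plot against the same theoretical quantiles and shown to someone who has not seen `real_index`. If they cannot pick out the real panel, the departure from normality is within what chance alone produces.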
