QQ-plot doesn’t correspond with histogram

Tags: data-visualization, histogram, qq-plot, r

I made a histogram and a QQ-plot using this code:

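# Histogram on the density scale, with a fitted normal curve overlaid
hist(ang$Pkt,
     ylim = c(0, 0.05),
     freq = FALSE,
     breaks = 10)
curve(dnorm(x, mean = mean(ang$Pkt), sd = sd(ang$Pkt)),
      add = TRUE, col = "red", lty = "dotted")
# Normal Q-Q plot, with a reference line through the quartiles
qqnorm(ang$Pkt)
qqline(ang$Pkt, col = "red")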

I got these two plots:

Histogram of ang$Pkt with fitted normal curve

Normal Q-Q plot of ang$Pkt

According to what I found, this Q-Q plot shape means that the values should be concentrated in the centre. But according to the histogram, they are not. What could cause the Q-Q plot to be wrong?

Best Answer

I don't think there's much discrepancy between the impressions the two plots give. The bin placement slightly distorts the histogram's impression, but not by much; I'd perhaps suggest roughly doubling the number of bins. Overall, it looks much as I'd have expected.
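A minimal sketch of the bin-count suggestion. Since `ang$Pkt` wasn't posted, `x` below is a hypothetical stand-in (200 uniform values). Note that a single number passed as `breaks` is only a suggestion: `hist()` rounds to "pretty" break points, so the number of bins you get may differ from the number you ask for.

```r
# Hypothetical stand-in for ang$Pkt (the original data were not posted)
set.seed(1)
x <- runif(200, min = 10, max = 50)

# plot = FALSE returns the histogram object without drawing it
h10 <- hist(x, breaks = 10, plot = FALSE)
h20 <- hist(x, breaks = 20, plot = FALSE)

# 'breaks' is only a suggestion; compare the bin counts actually used
length(h10$counts)
length(h20$counts)

# Draw the finer version on the density scale, as in the question
hist(x, breaks = 20, freq = FALSE)
curve(dnorm(x, mean = mean(x), sd = sd(x)),
      add = TRUE, col = "red", lty = "dotted")
```

Checking `length(h$counts)` is a quick way to see how many bins R really used before judging the shape.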

Let's begin with what a uniform distribution would look like in a Q-Q plot and then we can discuss discrepancies from that.

Here's a normal Q-Q plot for 200 uniform values in (10,50):

Normal Q-Q plot of uniform data

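You can see it roughly follows the line in the middle but flattens out at each end, because the values stop abruptly at 10 and 50 while the theoretical normal quantiles keep spreading out.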
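A minimal R sketch that draws this kind of plot; the seed is arbitrary, so the points won't match the figure exactly. It also compares the local slope of the points in the middle half with the slope across the top ten points, which is one way to put a number on "flattens at each end".

```r
# 200 uniform values on (10, 50); the seed is an arbitrary choice
set.seed(1)
u <- runif(200, min = 10, max = 50)

qqnorm(u, main = "Normal Q-Q plot of uniform data")
qqline(u, col = "red")

# plot.it = FALSE returns the plotted coordinates instead of drawing them
q <- qqnorm(u, plot.it = FALSE)
o <- order(q$x)
z <- q$x[o]   # theoretical normal quantiles, increasing
y <- q$y[o]   # data values in the same (sorted) order

# Chord slopes: middle half of the points vs. the top ten points
slope_mid <- (y[150] - y[51]) / (z[150] - z[51])
slope_top <- (y[200] - y[191]) / (z[200] - z[191])
c(middle = slope_mid, top = slope_top)
```

The top slope comes out much smaller than the middle one: that is the flattening at the upper end.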

The uniform is not "concentrated in the center". It has very short tails, but within its range it's not concentrated at all.

So what does your data do?

In the absence of the original data, I have stretched the Q-Q plot so that its vertical axis lines up with the axis of the (rotated) histogram, which lets us see which histogram bin each Q-Q plot point falls into:

Combined Q-Q plot and rotated histogram
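The correspondence in the combined plot can also be checked numerically. In a normal Q-Q plot the vertical coordinate of each point is the data value itself, so cutting those values at the histogram's break points assigns each point to a bin. A minimal sketch, again using a hypothetical stand-in `x` because the original data aren't available:

```r
# Hypothetical stand-in for ang$Pkt (the original data were not posted)
set.seed(2)
x <- runif(200, min = 10, max = 50)

h <- hist(x, breaks = 10, plot = FALSE)
q <- qqnorm(x, plot.it = FALSE)

# hist() uses right-closed bins, with the lowest bin closed on both sides;
# cut() with these settings reproduces that convention
bin <- cut(q$y, breaks = h$breaks, include.lowest = TRUE, right = TRUE)

# Counting Q-Q points per bin reproduces the histogram's bar counts
points_per_bin <- as.vector(table(bin))
data.frame(bin = levels(bin), qq_points = points_per_bin, hist_count = h$counts)
```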

Each point in the Q-Q plot falls into one bin of the histogram. Notice that in the upper half, your Q-Q plot roughly follows the normal line until the top bin, where it flattens sharply. Now consider that when the Q-Q plot is nearly flat, a lot of points stack up at nearly the same value (see the top-right panel):

Sections of four Q-Q plots

(plot taken from this answer)

This is what is happening in the upper part of your Q-Q plot, producing a taller top bin. In the lower part the departure from the line is much more gradual and is spread across more than one bin, producing slightly shorter bins.
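The link between a flat stretch of the Q-Q plot and a tall bar can be made explicit. As a sketch, assume a continuous distribution with CDF $F$ and density $f$, and write $\Phi$ and $\phi$ for the standard normal CDF and density. A point at theoretical quantile $z$ has a data value of roughly $y(z)$ with $F(y) = \Phi(z)$; differentiating both sides with respect to $z$ gives

$$
y(z) = F^{-1}\bigl(\Phi(z)\bigr)
\quad\Longrightarrow\quad
\frac{dy}{dz} = \frac{\phi(z)}{f\bigl(y(z)\bigr)}
$$

At a given $z$, a small slope means a large density $f(y)$, i.e. many points per unit of the data scale, which shows up as a tall histogram bar; a steep slope means a low density. For the uniform on (10, 50), $f = 1/40$, so the slope is $40\,\phi(z)$, which shrinks toward zero in both tails.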

So if I had only the Q-Q plot, I'd expect the histogram to look not too far from uniform, with a taller bin at the right end and shorter bins at the left end, which is more or less what we see.

To get the histogram to show a relative "concentration" in the center, you'd need longer tails than the normal and a flatter slope in the center, almost the exact reverse of what's going on in this Q-Q plot.
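For contrast, here is a sketch of the opposite pattern. It uses a t distribution with 3 degrees of freedom as an assumed example of a heavy-tailed, centrally concentrated distribution (my choice, not something from the question). To avoid simulation noise it plots exact quantiles at the plotting positions `ppoints(n)` that `qqnorm()` uses, instead of random draws.

```r
# Exact t(3) quantiles at the usual plotting positions (no simulation)
n <- 200
z <- qnorm(ppoints(n))          # theoretical normal quantiles
y <- qt(ppoints(n), df = 3)     # corresponding t(3) quantiles

plot(z, y, xlab = "Theoretical Quantiles", ylab = "t(3) quantiles",
     main = "Heavy tails: flat center, steep ends")

# Reference line through the quartiles, as qqline() would draw it;
# the intercept is 0 because both distributions are symmetric about 0
b <- diff(qt(c(0.25, 0.75), df = 3)) / diff(qnorm(c(0.25, 0.75)))
abline(a = 0, b = b, col = "red")

# Chord slopes: the 20 central points vs. the top ten points
slope_center <- (y[110] - y[91]) / (z[110] - z[91])
slope_top <- (y[200] - y[191]) / (z[200] - z[191])
c(reference = b, center = slope_center, top = slope_top)
```

The center is flatter than the reference line and the upper end is much steeper, the mirror image of the uniform case above.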

Related Question