+1 to @NickSabbe, for 'the plot just tells you that "something is wrong"', which is often the best way to use a qq-plot (as it can be difficult to understand how to interpret them). It is possible to learn how to interpret a qq-plot by thinking about how to make one, however.
You would start by sorting your data, then you would count your way up from the minimum value taking each as an equal percentage. For example, if you had 20 data points, when you counted the first one (the minimum), you would say to yourself, 'I counted 5% of my data'. You would follow this procedure until you got to the end, at which point you would have passed through 100% of your data. These percentage values can then be compared to the same percentage values from the corresponding theoretical normal (i.e., the normal with the same mean and SD).
When you go to plot these, you will discover that you have trouble with the last value, which is 100%, because when you've passed through 100% of a theoretical normal you are 'at' infinity. This problem is dealt with by adding a small constant to the denominator at each point in your data before calculating the percentages. A typical value would be to add 1 to the denominator; for example, you would call your 1st (of 20) data point 1/(20+1)≈5%, and your last would be 20/(20+1)≈95%. Now if you plot these points against a corresponding theoretical normal, you will have a pp-plot (for plotting probabilities against probabilities). Such a plot would most likely show the deviations between your distribution and a normal in the center of the distribution. This is because 68% of a normal distribution lies within +/- 1 SD, so pp-plots have excellent resolution there, and poor resolution elsewhere. (For more on this point, it may help to read my answer here: PP-plots vs. QQ-plots.)
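As a rough illustration of the steps so far, here is a minimal R sketch. The sample `y`, the seed, and the choice of which set of probabilities goes on which axis are assumptions made for the example only.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
y <- rnorm(20, mean = 10, sd = 2)    # hypothetical sample of 20 points

n <- length(y)
y_sorted <- sort(y)

# Empirical probabilities, with 1 added to the denominator: i / (n + 1)
p_emp <- seq_len(n) / (n + 1)

# Probabilities of the same values under the normal with the same mean and SD
p_theo <- pnorm(y_sorted, mean = mean(y), sd = sd(y))

plot(p_theo, p_emp,
     xlab = "Theoretical probability", ylab = "Empirical probability",
     main = "pp-plot")
abline(0, 1, lty = 2)                # points near this line agree with the normal

round(range(p_emp), 3)               # 0.048 0.952
```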
Often, we are most concerned about what is happening in the tails of our distribution. To get better resolution there (and thus worse resolution in the middle), we can construct a qq-plot instead. We do this by taking our sets of probabilities and passing them through the inverse of the normal distribution's CDF (this is like reading the z-table in the back of a stats book backwards--you read in a probability and read out a z-score). The result of this operation is two sets of quantiles, which can be plotted against each other similarly.
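A minimal sketch of that transformation in R, assuming a simulated sample `y` and the `i/(n+1)` probabilities described earlier (both are illustrative choices):

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
y <- rnorm(20, mean = 10, sd = 2)    # hypothetical sample of 20 points

n <- length(y)
y_sorted <- sort(y)
p_emp  <- seq_len(n) / (n + 1)                          # empirical probabilities
p_theo <- pnorm(y_sorted, mean = mean(y), sd = sd(y))   # theoretical probabilities

# Pass both sets of probabilities through the inverse normal CDF
q_theo <- qnorm(p_emp)    # theoretical quantiles (z-scores)
q_samp <- qnorm(p_theo)   # sample quantiles; these are just the standardized data

plot(q_theo, q_samp,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "qq-plot")
```

Base R's `qqnorm()` does the same job but takes its probabilities from `ppoints()`, which uses a slightly different constant, so its points will not coincide exactly with these.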
@whuber is right that the reference line is plotted afterwards (typically) by finding the best fitting line through the middle 50% of the points (i.e., from the first quartile to the third). This is done to make the plot easier to read. Using this line, you can interpret the plot as showing you whether the quantiles of your distribution progressively diverge from a true normal as you move into the tails. (Note that the positions of points further out from the center are not really independent of those closer in; so the fact that, in your specific histogram, the tails seem to come together after having the 'shoulders' differ does not mean that the quantiles are now the same again.)
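As a rough sketch, a reference line can be computed by hand. This version assumes the convention used by R's `qqline()`, which draws the line through the first- and third-quartile points rather than fitting a line to the middle 50% of the points; the sample `y` is simulated purely for illustration.

```r
set.seed(2)                 # arbitrary seed, for reproducibility only
y <- rt(200, df = 3)        # hypothetical heavy-tailed sample

probs <- c(0.25, 0.75)
qx <- qnorm(probs)                        # theoretical quartiles
qy <- quantile(y, probs, names = FALSE)   # sample quartiles

# Line through the two quartile points
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

qqnorm(y)
abline(intercept, slope, col = "red")     # should coincide with qqline(y)
```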
You can interpret a qq-plot analytically by considering how the values read from the axes compare for a given plotted point. If the data were well described by a normal distribution, the values should be about the same. For example, take the extreme point at the very far left bottom corner: its $x$ value is somewhere past $-3$, but its $y$ value is only a little past $-.2$, so it is much further out than it 'should' be. In general, a simple rubric to interpret a qq-plot is that if a given tail twists off counterclockwise from the reference line, there is more data in that tail of your distribution than in a theoretical normal, and if a tail twists off clockwise there is less data in that tail of your distribution than in a theoretical normal. In other words:
- if both tails twist counterclockwise, you have heavy tails (leptokurtosis),
- if both tails twist clockwise, you have light tails (platykurtosis),
- if your right tail twists counterclockwise and your left tail twists clockwise, you have right skew,
- if your left tail twists counterclockwise and your right tail twists clockwise, you have left skew.
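The rubric can be checked numerically with a small simulation. This is only a sketch under stated assumptions: theoretical quantiles on the x-axis, a reference line through the quartile points, and a hypothetical helper `tail_dev()` that measures how far the sample's 1st and 99th percentiles sit from that line. A positive value means the point lies above the line, which is a counterclockwise twist in the right tail and a clockwise twist in the left tail.

```r
# Deviation of the sample quantiles from the quartile-based reference line
# at the 1st and 99th percentiles (hypothetical helper, for illustration).
tail_dev <- function(y, probs = c(0.01, 0.99)) {
  qx <- qnorm(c(0.25, 0.75))
  qy <- quantile(y, c(0.25, 0.75), names = FALSE)
  slope <- diff(qy) / diff(qx)
  intercept <- qy[1] - slope * qx[1]
  dev <- quantile(y, probs, names = FALSE) - (intercept + slope * qnorm(probs))
  setNames(dev, c("left", "right"))
}

set.seed(42)                       # arbitrary seed, for reproducibility only
n <- 1e4
samples <- list(
  heavy_tails = rt(n, df = 3),     # t with 3 df: heavier tails than the normal
  light_tails = runif(n),          # uniform: lighter tails than the normal
  right_skew  = rexp(n),           # exponential: long right tail
  left_skew   = -rexp(n)           # mirrored exponential: long left tail
)

devs <- t(sapply(samples, tail_dev))
sign(devs)                         # one row per sample, columns "left" and "right"
```

With samples this large, the signs should follow the list above: below the line on the left and above it on the right for heavy tails, the reverse for light tails, above the line at both ends for right skew, and below it at both ends for left skew.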
I think further clarification of exactly what you want to compute or estimate ultimately will help (most of us aren't physicists, so explain-like-we're-intelligent-eight-year-olds). One possibility would be to draw a diagram of an ideal situation (what it would be like if your bins were able to be super-narrow relative to the spot-widths), clarifying what you ideally want to find, and then perhaps draw another with wider bins/narrower spots to clarify the circumstances and again explain what you want to calculate/estimate in relation to what you're observing.
If this is a per-spot problem, i.e., something you're trying to do for each individual spot (where the spots are well separated), perhaps you could describe your problem in terms of just doing it for a single spot. (If that's not the case, additional clarification may be needed on that as well.)
It sounds like you're trying to get the uncertainty (or perhaps a variance) in the estimate of the center of the spot in the presence of your data being (unavoidably) binned.
It looks like you have two sources of variation: one is the underlying error in the intensity around a spot, and the other is the error introduced by binning, which dominates when the breadth of the spot is small.
You can't just assume your values are all at the center of the histogram bin; when the spot is so wide that the distribution within a bin is nearly uniform, maybe you can approximate it that way (though that biases your variance estimates), but when it's far smaller than one bin, you don't know where it is in the bin. If the spot is really narrow its center might be very far from the middle of the bin.
You can't just ignore that.
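To make the point concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: a Gaussian spot, bins of width 1, a true center placed uniformly at random within a bin, and a naive estimator that puts every observation at the midpoint of its bin. It compares that estimator's RMSE with the RMSE you would get from the unbinned values.

```r
set.seed(123)                               # arbitrary seed, for reproducibility only

# RMSE of the estimated spot center, with and without binning.
# `sigma` is the spot width measured in bin widths (bin width = 1).
sim_center_rmse <- function(sigma, n = 500, reps = 1000) {
  err <- replicate(reps, {
    mu <- runif(1, -0.5, 0.5)               # true center, anywhere within a bin
    x <- rnorm(n, mean = mu, sd = sigma)    # hypothetical unbinned positions
    mid <- floor(x + 0.5)                   # midpoint of the bin [k - 0.5, k + 0.5)
    c(binned = mean(mid) - mu, unbinned = mean(x) - mu)
  })
  sqrt(rowMeans(err^2))
}

narrow <- sim_center_rmse(sigma = 0.05)     # spot much narrower than one bin
wide   <- sim_center_rmse(sigma = 2)        # spot spread over several bins
rbind(narrow, wide)
```

Under these assumptions the binned and unbinned errors should be close for the wide spot, while for the narrow spot the binned error should be far larger than the unbinned one, because nearly all the data land in a single bin.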
Of course, why not?
Here's an example (one of dozens I found with a simple google search):
(Image source is the measuring usability blog, here.)
I've seen means, means plus or minus a standard deviation, various quantiles (like median, quartiles, 10th and 90th percentiles) all displayed in various ways.
Instead of drawing a line right across the plot, you might mark information along the bottom of it - like so:
There's an example (one of many to be found) with a boxplot across the top instead of at the bottom, here.
Sometimes people mark in the data:
(I have jittered the data locations slightly because the values were rounded to integers and you couldn't see the relative density well.)
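A minimal R sketch of a plot of this kind, assuming made-up integer-valued data (not the data behind the figures) and arbitrary colors and line types:

```r
set.seed(7)                                   # arbitrary seed, for reproducibility only
y <- round(rnorm(100, mean = 50, sd = 10))    # hypothetical data, rounded to integers

hist(y, main = "Histogram with mean, quartiles and data marked", xlab = "y")
abline(v = mean(y), col = "red", lwd = 2)            # vertical line at the mean
abline(v = quantile(y, c(0.25, 0.75)), lty = 2)      # dashed lines at the quartiles
rug(jitter(y))                                       # jittered tick marks, one per value
```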
There's an example of this kind, done in Stata, on this page (see the third one here).
Histograms are better with a little extra information - they can be misleading on their own.
You just need to take care to explain what your plot consists of! (You'd want a better title and x-axis label than I used here, for starters. Plus a figure caption explaining what you had marked on it.)
--
One last plot:
--
My plots are generated in R.
Edit:
As @gung surmised, `abline(v=mean...` was used to draw the mean-line across the plot and `rug` was used to draw the data values (though I actually used `rug(jitter(...` because the data was rounded to integers).

Here's a way to do the boxplot in between the histogram and the axis:
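A minimal sketch of the idea, using made-up data and the rule-of-thumb values for `at` and `boxwex` given in the footnote. The data, variable names, and color are assumptions for illustration and not necessarily what produced the figure.

```r
set.seed(7)                                   # arbitrary seed, for reproducibility only
y <- round(rnorm(100, mean = 50, sd = 10))    # hypothetical data

h <- hist(y, main = "", xlab = "y")
y_upper <- max(h$counts)        # upper y-limit of a default histogram of counts

bw  <- 0.045 * y_upper          # boxwex: 0.04 to 0.05 times the upper y-limit
pos <- -0.5 * bw                # at: about -0.5 times boxwex, just below y = 0

# Draw a horizontal boxplot on top of the existing histogram
boxplot(y, horizontal = TRUE, add = TRUE, at = pos, boxwex = bw,
        axes = FALSE, col = "grey80")
```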
I'm not going to list what everything there is for, but you can check the arguments in the help (`?boxplot`) to find out what they're for, and play with them yourself.

However, it's not a general solution - I don't guarantee it will always work as well as it does here (note I already changed the `at` and `boxwex` options*). If you don't write an intelligent function to take care of everything, it's necessary to pay attention to what everything does to make sure it's doing what you want.

Here's how to create the data I used (I was trying to show how Theil regression was really able to handle several influential outliers). It just happened to be data I was playing with when I first answered this question.
* -- an appropriate value for `at` is around -0.5 times the value of `boxwex`; that would be a good default if you write a function to do it; `boxwex` would need to be scaled in a way that relates to the y-scale (height) of the boxplot; I'd suggest 0.04 to 0.05 times the upper y-limit might often be okay.

Code for the marginal stripchart:
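A minimal sketch of a marginal stripchart, assuming made-up data and hand-picked offsets relative to the histogram's height (not necessarily the code behind the figure):

```r
set.seed(7)                                   # arbitrary seed, for reproducibility only
y <- round(rnorm(100, mean = 50, sd = 10))    # hypothetical data, rounded to integers

h <- hist(y, main = "", xlab = "y")
y_upper <- max(h$counts)        # upper y-limit of a default histogram of counts

# Jittered points in the narrow strip between the bars and the x-axis;
# the jitter is applied vertically, so tied values stay at their x position.
stripchart(y, method = "jitter", jitter = 0.01 * y_upper,
           at = -0.02 * y_upper, add = TRUE, pch = 16, cex = 0.5)
```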