Histogram Image Processing – How to Compare Two Histograms Effectively
histogramimage processing
I would like to know what are the common techniques to compare two histograms?
I have histogram of two images and I want to see are they similar or not meaning that is there any correlation between them or not. The histograms are for two different parts of tissue.
Best Answer
If you understand a graphical technique under ''comparison'' you should probably try a QQ-plot (qqplot under R).
If you are thinking of an analytical way (i.e. statistical test), the two-sample Kolmogorov-Smirnov test is the most classical way (ks.test under R). There are other, more modern goodness-of-fit tests (this is how your task called statistically, this case in a two-sample sense) available, such as Anderson-Darling test (ad.test from the package kSamples).
You might also consider binning the data (with cut) and then performing a $\chi^2$-squared test (chisq.test).
(If you don't want to compare the whole distribution, just some characteristics, such as the mean or variance, you get back to elementary statistical tests. Also note that the above was a non-parametrical approach; if you're willing to assume some functional form for the distributions, you can also use classical tests.)
Note that for very large sample sizes (which might be the case judging from your histogram) you will likely obtain significant results from every test. (They'll have extremely high power, detecting even minuscule deviations from the null.) In that case, graphical techniques (QQ-plot) might be still useful to decide how large is the deviation.
Do the histograms both have to start at the same point (0 or 20)?
Highly recommended, and the axes should also end at the same number, and better if they are of the same length as well.
More babbling: It depends what do you mean by "comparing." From just the two histograms I can definitely compare and contrast the skewness of the distribution. But beyond that, visually it's not easy to say anything about the frequencies at different age because the class widths are different, and what's more crucial, the x-axes are different.
A panelled histogram that can successfully enhance visual comparison should have common y and x axes like this example:
If you allow the x- and y-axes to extend to wherever they want, it's difficult to compare across graphs.
Are these histograms sufficient/correct?
By themselves they are correct, but if it's for cross-gender comparison it's insufficient.
What if I said it is mandatory to use the formulas above to find the # of classes and class width?
I'd have a hard time not laughing. For many reasons:
The formula was probably aimed for making one histogram, and not for making two or more then put together for comparison. It sounds to me like a shoe-smith always wants to use up the whole piece of leather in making a shoe, and he can never sell any pair cause no any two of them actually make a pair.
Histogram has more parameters than just sample size. The min and the max, the distribution of the people, the nuances one would like to highlight, etc. It's unfair to overlook all other important information.
When it comes to visualization of data, it's important to realize that we are to communicate the data, not package it and force them down the throat of the viewers'. I'd probably stay away from all these pragmatic "you have to do this and do that..." bossy instructions. (Though I give those instructions as well, guilty as charged.)
But(!), if your professor/supervisor--who can decide if you will pass/fail the course--said you have to use this formula, then I'll suggest pick your battle wisely.
Should I use the same class start for each in the histograms?
Not necessarily, but recommended because this form is easiest to perceive. There are histogram that uses unequal bin sizes. If for some reason a certain age categorization is more important in females than males, you may make them unequal.
For instance, if some country's legal drinking age is 18, and another is 21, you may group all <18 into one column for the first country, and <21 into another column for the second country. The question is that, would this be more meaningful? Or would you see the same pattern if you bin them at single year age?
In a nut shell, you'll need to know what you want the viewers to know, and work backward. Avoid starting with recipes.
Best Answer
If you understand a graphical technique under ''comparison'' you should probably try a QQ-plot (
qqplot
under R).If you are thinking of an analytical way (i.e. statistical test), the two-sample Kolmogorov-Smirnov test is the most classical way (
ks.test
under R). There are other, more modern goodness-of-fit tests (this is how your task called statistically, this case in a two-sample sense) available, such as Anderson-Darling test (ad.test
from the packagekSamples
).You might also consider binning the data (with
cut
) and then performing a $\chi^2$-squared test (chisq.test
).(If you don't want to compare the whole distribution, just some characteristics, such as the mean or variance, you get back to elementary statistical tests. Also note that the above was a non-parametrical approach; if you're willing to assume some functional form for the distributions, you can also use classical tests.)
Note that for very large sample sizes (which might be the case judging from your histogram) you will likely obtain significant results from every test. (They'll have extremely high power, detecting even minuscule deviations from the null.) In that case, graphical techniques (QQ-plot) might be still useful to decide how large is the deviation.