Chi-Square Test – Comparing Two Histograms Using Chi-Square Distance

chi-squared-test, correspondence-analysis, distance, histogram, image-processing

I want to compare two images of faces. I have calculated their LBP histograms, so now I need to compare these two histograms and get a value that tells how similar they are (0–100%).

There are many ways of solving this task, but the authors of the LBP method emphasize (Face Description with Local Binary Patterns: Application to Face Recognition, 2004) that the Chi-Square distance performs better than histogram intersection and the log-likelihood statistic.

The authors also give a formula for the Chi-Square distance:

$$
\sum_{i=1}^{n} \cfrac{(x_i - y_i)^2} {(x_i + y_i)}
$$

where $n$ is the number of bins, $x_i$ is the value of the $i$-th bin of the first histogram, and $y_i$ is the value of the $i$-th bin of the second histogram.
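
For illustration, this is a direct NumPy implementation of that formula as I understand it (the function name and the small `eps` guard against empty bins are my own additions, not part of the paper):

```python
import numpy as np

def chi_square_distance(x, y, eps=1e-10):
    """Symmetric chi-square distance between two histograms (formula above)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # eps guards against 0/0 in bins that are empty in both histograms
    return np.sum((x - y) ** 2 / (x + y + eps))
```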

In some papers (for example, The Quadratic-Chi Histogram Distance Family) I have seen the Chi-Square distance defined as:

$$
\cfrac{1}{2}\sum_{i=1}^{n} \cfrac{(x_i - y_i)^2} {(x_i + y_i)}
$$

And here, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm, the Chi-Square distance is given as:

$$
\sum_{i=1}^{n} \cfrac{(x_i - y_i)^2} {y_i}
$$

I am stuck. I have several questions:

  1. Which expression should I use?
  2. How should I interpret the result? I know that a difference of 0 means the two histograms are identical, but how do I know when they are completely different? Do I need to use a Chi-Square table, or a threshold? Basically, I want to map the difference to a percentage.
  3. Why are these three expressions different?

Best Answer

@Silverfish asked for an expansion of the answer by PolatAlemdar, which was not given, so I will try to expand on it here.

Why the name chi-square distance? The chi-square test for contingency tables is based on $$ \chi^2 = \sum_{\text{cells}} \frac{(O_i-E_i)^2}{E_i} $$ so the idea is to keep this form and use it as a distance measure. This gives the third formula in the OP, with $x_i$ interpreted as the observation and $y_i$ as the expectation, which explains PolatAlemdar's comment "It is used in discrete probability distributions", as for instance in goodness-of-fit testing.

This third form is not a distance function, since it is asymmetric in the variables $x$ and $y$. For histogram comparison we want a distance function that is symmetric in $x$ and $y$, and the first two forms give this. The difference between them is only a constant factor $\frac12$, which is unimportant as long as you choose one form and use it consistently (though the version with the extra factor $\frac12$ is preferable if you want to compare with the asymmetric form).

Note the similarity of these formulas to the squared Euclidean distance; that is no coincidence, since the chi-square distance is a kind of weighted Euclidean distance. For that reason the formulas in the OP are usually put under a root sign to get distances, and we do so in the following.
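
To make the relationship between the three forms concrete, here is a small sketch (the example histograms are made up for illustration, not taken from the question):

```python
import numpy as np

# Two small example histograms (made up for illustration)
x = np.array([10., 20., 30., 40.])
y = np.array([12., 18., 35., 35.])

sym  = np.sum((x - y) ** 2 / (x + y))   # OP's first form (symmetric)
half = 0.5 * sym                        # OP's second form: the same, up to the factor 1/2
asym = np.sum((x - y) ** 2 / y)         # OP's third form: the chi-square statistic

print(sym, half, asym)
print(np.sum((y - x) ** 2 / x))   # swapping x and y changes only the asymmetric form
print(np.sqrt(sym))               # put under a root sign to get a (weighted Euclidean) distance
```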

The chi-square distance is also used in correspondence analysis. To see the relationship to the form used there, let $x_{ij}$ be the cells of a contingency table with $R$ rows and $C$ columns. Denote the row totals by $x_{i+}=\sum_j x_{ij}$ and the column totals by $x_{+j}=\sum_i x_{ij}$. Then the chi-square distance between rows $l$ and $k$ is given by $$ \chi^2(l,k) = \sqrt{\sum_j \frac1{x_{+j}}\left(\frac{x_{lj}}{x_{l+}}-\frac{x_{kj}}{x_{k+}} \right)^2 } $$ For the case with only two rows (the two histograms) this recovers the OP's first formula (modulo the root sign).
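
As a quick check of that claim, here is a sketch in my own notation (function name and example data are mine) computing the correspondence-analysis formula on a two-row table:

```python
import numpy as np

def chi_square_row_distance(table, l, k):
    """Chi-square distance between rows l and k of a contingency table
    (the correspondence-analysis formula above)."""
    table = np.asarray(table, dtype=float)
    col_totals = table.sum(axis=0)           # x_{+j}
    profile_l = table[l] / table[l].sum()    # x_{lj} / x_{l+}
    profile_k = table[k] / table[k].sum()    # x_{kj} / x_{k+}
    return np.sqrt(np.sum((profile_l - profile_k) ** 2 / col_totals))

# Two normalized histograms stacked as the rows of a 2 x n table;
# in this case the result equals the root of the OP's first formula.
hists = np.array([[0.10, 0.20, 0.30, 0.40],
                  [0.12, 0.18, 0.35, 0.35]])
print(chi_square_row_distance(hists, 0, 1))
print(np.sqrt(np.sum((hists[0] - hists[1]) ** 2 / (hists[0] + hists[1]))))
```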

EDIT

Answering the question in the comments below: a book with long discussions of the chi-square distance is "CORRESPONDENCE ANALYSIS in PRACTICE (Second Edition)" by Michael Greenacre (Chapman & Hall). The name is well established, coming from its similarity to the chi-square statistic as used with contingency tables. What distribution does it have? I have never studied that, but probably (under some conditions ...) it would approximately follow some chi-square distribution. Proofs should be similar to what is done with contingency tables; most literature about correspondence analysis does not go into distribution theory. A paper with some possibly relevant theory is "ALTERNATIVE METHODS TO MULTIPLE CORRESPONDENCE ANALYSIS IN RECONSTRUCTING THE RELEVANT INFORMATION IN A BURT'S TABLE". Also see the other relevant posts on this site.