Two of the most famous approaches to comparing two normalized histograms $P_i$ and $Q_i$, where $i = 1, \dots, d$ is the bin number, are as follows:
- Histogram intersection $~~~s_{IS} = \sum_i \min(P_i,Q_i)$
- Chi-square distance (squared) $~~~d_{sq\text{-}chi} = \sum_i \frac{(P_i-Q_i)^2}{P_i+Q_i}$
where the first is a "similarity metric" and the second a "distance metric". Refer to Cha's survey for more examples of similarity and distance metrics.
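As a concrete illustration, here is a minimal NumPy sketch of both measures (the function names are my own, and guarding against empty bins where $P_i + Q_i = 0$ is an implementation choice, not part of the formulas above):

```python
import numpy as np

def histogram_intersection(p, q):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return np.minimum(p, q).sum()

def sq_chi_distance(p, q):
    """Squared chi-square distance; bins where p + q == 0 contribute 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    denom = p + q
    mask = denom > 0
    return (((p - q) ** 2)[mask] / denom[mask]).sum()

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
print(histogram_intersection(p, q))  # close to 0.9
print(sq_chi_distance(p, q))
```

Identical histograms give intersection 1 and distance 0; the intersection shrinks and the distance grows as the histograms diverge.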
@Silverfish asked for an expansion of the answer by PolatAlemdar, which was not given, so I will try to expand on it here.
Why the name chi-square distance? The chi-square test for contingency tables is based on
$$
\chi^2 = \sum_{\text{cells}} \frac{(O_i-E_i)^2}{E_i}
$$
so the idea is to keep this form and use it as a distance measure. This gives the third formula of the OP, with $x_i$ interpreted as observation and $y_i$ as expectation, which explains PolatAlemdar's comment "It is used in discrete probability distributions", as for instance in goodness-of-fit testing. This third form is not a distance function, as it is asymmetric in the variables $x$ and $y$.

For histogram comparison we want a distance function that is symmetric in $x$ and $y$, and the first two forms provide this. The difference between them is only a constant factor $\frac12$, which is unimportant as long as you just choose one form consistently (though the version with the extra factor $\frac12$ is better if you want to compare with the asymmetric form). Note the similarity of these formulas to squared Euclidean distance; that is no coincidence: chi-square distance is a kind of weighted Euclidean distance. For that reason, the formulas in the OP are usually put under a root sign to obtain distances. We follow this convention below.
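To make the symmetry point concrete, here is a small Python sketch (function names are mine) contrasting the asymmetric goodness-of-fit form with the symmetric variant in which the expectation $y_i$ is replaced by the average $(x_i+y_i)/2$; it assumes all bins have positive mass:

```python
import numpy as np

def chi2_gof(obs, exp):
    """Asymmetric goodness-of-fit statistic: sum over bins of (O-E)^2 / E."""
    return ((obs - exp) ** 2 / exp).sum()

def chi2_sym(x, y):
    """Symmetric variant: the expectation is replaced by the average (x+y)/2."""
    return ((x - y) ** 2 / ((x + y) / 2)).sum()

x = np.array([0.25, 0.25, 0.50])
y = np.array([0.24, 0.26, 0.50])

print(chi2_gof(x, y), chi2_gof(y, x))    # differ: the form is asymmetric
print(chi2_sym(x, y) == chi2_sym(y, x))  # True: symmetric in x and y
```

When $x$ and $y$ are close, $(x_i+y_i)/2 \approx y_i$, so the symmetric variant is numerically close to the goodness-of-fit statistic, which is why that scaling is convenient for comparison with the asymmetric form.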
Chi-square distance is also used in correspondence analysis. To see the relationship to the form used there, let $x_{ij}$ be the cells of a contingency table with $R$ rows and $C$ columns. Denote the row totals by $x_{i+}=\sum_j x_{ij}$ and the column totals by $x_{+j}=\sum_i x_{ij}$. Then the chi-square distance between rows $l,k$ is given by
$$
\chi^2(l,k) = \sqrt{\sum_j \frac1{x_{+j}}\left(\frac{x_{lj}}{x_{l+}}-\frac{x_{kj}}{x_{k+}} \right)^2 }
$$
For the case with only two rows (the two histograms), this recovers the OP's first formula (modulo the root sign).
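The formula above translates directly into a short NumPy sketch (the function name and example table are my own):

```python
import numpy as np

def chi2_row_distance(table, l, k):
    """Chi-square distance between rows l and k of a contingency table,
    as used in correspondence analysis."""
    table = np.asarray(table, float)
    col_tot = table.sum(axis=0)                          # x_{+j}
    row_prof = table / table.sum(axis=1, keepdims=True)  # x_{ij} / x_{i+}
    return np.sqrt((((row_prof[l] - row_prof[k]) ** 2) / col_tot).sum())

# A toy 2x3 contingency table: two rows play the role of the two histograms.
T = np.array([[10, 20, 30],
              [20, 20, 20]])
print(chi2_row_distance(T, 0, 1))
```

Dividing each row by its total turns it into a row profile (a normalized histogram), and the column totals supply the weights $1/x_{+j}$, which is the weighted-Euclidean view mentioned earlier.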
EDIT
Answering the question in the comments below: a book with long discussions of the chi-square distance is "Correspondence Analysis in Practice (Second Edition)" by Michael Greenacre (Chapman & Hall). The name is well established, coming from its similarity to the chi-square statistic as used with contingency tables. What distribution does it have? I have never studied that, but probably (under some conditions ...) it would have some chi-square distribution, approximately. Proofs should be similar to those for contingency tables; most of the literature about correspondence analysis does not go into distribution theory. A paper with some possibly relevant theory is "Alternative Methods to Multiple Correspondence Analysis in Reconstructing the Relevant Information in a Burt's Table". See also other relevant posts on this site.
Best Answer
This is not correct. Histograms show how many units fall into each of a finite number of bins, or what probability an observation has of falling into one of these bins.
Sometimes you already have a "natural" finite number of bins, as in the dice throwing example here: 11 bins. Then everything is straightforward.
However, sometimes you have too many bins to plot usefully, or even an infinite number of possible values, e.g., when the underlying distribution is continuous. Then, to build a histogram, you discretize the observations into a small number of bins, where every bin corresponds to a range of possible observations (and the ranges should of course not overlap) - and then you proceed as above.
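This binning step can be sketched with NumPy's `np.histogram` (the normal sample data, seed, and bin count here are arbitrary choices for illustration):

```python
import numpy as np

# Continuous data: infinitely many possible values, so we must bin.
rng = np.random.default_rng(42)
samples = rng.normal(size=1000)

# Discretize into 10 non-overlapping bins covering the observed range,
# then normalize the counts to get a probability per bin.
counts, edges = np.histogram(samples, bins=10)
probs = counts / counts.sum()

print(len(edges))   # 11 boundaries enclose 10 bins
print(probs.sum())  # approximately 1.0: a normalized histogram
```

The resulting `probs` is exactly the kind of normalized histogram that the similarity and distance measures above operate on.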