I used Euclidean distance to compute the distance between two probability distribution. The example of computation shown in the Figure below.
As my understanding, the maximum distance occur while $A_1 = 0$ and $B_1 = 1$, $A_2 = 1$ and $B_1 = 0$. Whereas the Euclidean distance will be $$d = \sqrt{ 1^2 + 1^2 }$$
Is anyone have the theorem and mathematically prove about this? and does my assumption is correct, that $\sqrt2$ is the maximum bound for this distance?
Another question, while the bins more than two/ many bins, is that $\sqrt 2 $ still the maximum bound?
Best Answer
$d_{xy}^2 = \sum{(x-y)^2} = \sum x^2 + \sum y^2 - 2\sum xy$.
Given that in probability vectors all values are nonnegative, $d^2$ is max when the last term is zero. Then $d^2 = \sum x^2 + \sum y^2$.
In a probability vector all values (which are between 0 and 1) sum up to 1, $\sum x = \sum y = 1$. In such a vector, its theoretical maximum of $\sum v^2$ is attained when all its entries are 0 except one which is 1, and this maximum is $1 = \sum v$. Thus the theoretically maximal squared distance between two such vectors is 2: it is when $\sum x^2 = \sum x$ and $\sum y^2 = \sum y$. It also follows from the above description, that then $\sum xy$ can very easily happen to be zero (since in each vector there is just single nonzero element).