Solved – Calculating the Bayes decision boundary on a practical example

bayesian, classification

In "The elements of Statistical Learning", Chapter 2, the following example is presented: first generate $10$ means $m_k$ from a bivariate Gaussian distribution $\mathcal{N}((1, 0)^t, I)$ and label this class BLUE. Similarly $10$ more is drawn from $\mathcal{N}((0, 1)^t, I)$ and are labelled ORANGE. Then for each class we generate $100$ observations as follows: for each observation pick $m_k$ at random with probability $1/10$ and then generate a $\mathcal{N}(m_k, I/5)$.

At the end of that chapter there's a question (Exercise 2.2):

Show how to compute the Bayes decision boundary for the simulation example.

In an unofficial solution manual for the book, the following answer is given:

For this problem one is supposed to regard the points $p_i$ and $q_i$ below
as fixed. If one does not do this, and instead averages over possible
choices, then since the controlling points are $(1,0)$ and $(0,1)$, and all
probabilities are otherwise symmetric between these points when one
integrates out all the variation, the answer must be that the boundary is the
perpendicular bisector of the interval joining these two points. The
simulation draws 10 points $p_1, \dots , p_{10} \in \mathbb{R}^2$ from
$\mathcal{N}((1, 0)^t, I)$, and $10$ points $q_1, \dots, q_{10} \in \mathbb{R}^2$ from $\mathcal{N}((0, 1)^t, I)$. The formula for the Bayes
decision boundary is given by equating likelihoods. We get an equation in the
unknown $z \in \mathbb{R}^2$, giving a curve in the plane:

$$\sum_i \exp \left( -5 ||p_i - z||^2 / 2\right) = \sum_j \exp \left( -5 ||q_j - z||^2 / 2\right).$$

In this solution, the boundary is defined by the equality of the two class likelihoods, with the $p_i$ and $q_j$ treated as constants fixed by the previously performed sampling. Each time one re-samples the $p_i$ and $q_j$, one obtains a different Bayes decision boundary.
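For illustration, here is a sketch of how this boundary can be traced numerically, assuming NumPy and Matplotlib and reusing the `blue_means`, `orange_means`, `blue`, and `orange` arrays from the simulation sketch above: evaluate both sides of the equation on a grid and plot the zero contour of their difference.

```python
import numpy as np
import matplotlib.pyplot as plt

def mixture_likelihood(grid, means):
    # Sum of exp(-5 ||m - z||^2 / 2) over the 10 means; the 1/10 weight and
    # the Gaussian normalizing constant are identical on both sides of the
    # boundary equation, so they are omitted here.
    sq_dist = ((grid[..., None, :] - means) ** 2).sum(axis=-1)  # (..., 10)
    return np.exp(-5 * sq_dist / 2).sum(axis=-1)

xs, ys = np.meshgrid(np.linspace(-3, 4, 400), np.linspace(-3, 4, 400))
grid = np.stack([xs, ys], axis=-1)          # shape (400, 400, 2)
diff = (mixture_likelihood(grid, blue_means)
        - mixture_likelihood(grid, orange_means))
plt.contour(xs, ys, diff, levels=[0.0])     # the Bayes decision boundary
plt.scatter(*blue.T, c="blue", s=8)
plt.scatter(*orange.T, c="orange", s=8)
plt.show()
```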

I have a few questions about the proposed solution:

1) Why are the likelihoods just sums?

2) Why can we keep the $p_i$ and $q_j$ fixed? The reasoning isn't clear. And why, if we instead average over the draws, must the boundary be the perpendicular bisector?

3) Shouldn't the summation indices range over the set $\{1, \dots, 100\}$?

Best Answer

1) I am guessing that the likelihoods are sums because the Gaussian means are drawn with equal probability. Strictly, every term in each sum should carry a $1/10$ weight, but since all terms have the same weight, it appears on both sides of the equation and cancels, so it was simply dropped.

2) The data were generated conditional on the given $p_i$ and $q_j$; that is why they are treated as fixed. As for the perpendicular bisector: if one instead integrates the means out, the two class densities become mirror images of each other under the reflection that swaps $(1, 0)$ and $(0, 1)$, so they are equal exactly on the axis of that reflection, which is the perpendicular bisector of the segment joining the two points.

3) There are only $10$ Gaussians for each of the two classes, so the sums range over $1, \dots, 10$; you do not need to sum to $100$.
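To make point 1 explicit: with covariance $I/5$, each mixture component has density $\frac{5}{2\pi} \exp\left( -5 ||m - z||^2 / 2 \right)$, so the two class densities are

$$\hat f_{\text{BLUE}}(z) = \frac{1}{10} \sum_{i=1}^{10} \frac{5}{2\pi} \exp\left( -5 ||p_i - z||^2 / 2 \right), \qquad \hat f_{\text{ORANGE}}(z) = \frac{1}{10} \sum_{j=1}^{10} \frac{5}{2\pi} \exp\left( -5 ||q_j - z||^2 / 2 \right).$$

Setting $\hat f_{\text{BLUE}}(z) = \hat f_{\text{ORANGE}}(z)$ and cancelling the common factor $\frac{1}{10} \cdot \frac{5}{2\pi}$ recovers exactly the boundary equation quoted above.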