Solved – Bayes decision theory: Classification error probability

bayes, bayesian, classification, decision-theory

In Bayesian decision theory: let $\omega_1$ and $\omega_2$ be two classes for classification, $P\left( \omega_1 \right)$ and $P\left( \omega_2\right)$ their prior probabilities, $x$ the feature vector representing an unknown pattern, $P\left( \omega_1 | x\right)$ and $P\left( \omega_2 | x\right)$ the posterior probabilities, $p\left( x |\omega_i\right)$ the likelihood function of $\omega_i$ with respect to $x$, and $R_i$ the region of feature space where the decision is made in favor of $\omega_i$.

Why is it that minimizing the probability of error $P_e$, given by:

\begin{align*}
P_e &= P\left( x \in R_1, \omega_2 \right) + P\left( x \in R_2, \omega_1 \right)\\
&=P\left(\omega_2 \right) \int \limits_{R_1}p\left( x |\omega_2 \right) dx + P\left(\omega_1 \right) \int \limits_{R_2}p\left( x |\omega_1 \right)dx\\
&= \int \limits_{R_1}P\left( \omega_2 | x\right)p(x) dx + \int \limits_{R_2}P\left(\omega_1 | x \right)p(x)dx
\end{align*}

by choosing regions $R_1$ and $R_2$ of feature space so that:

\begin{align}
R_1 &: P\left(\omega_1 | x \right) > P\left( \omega_2 | x\right)\\
R_2 &: P\left(\omega_2 | x \right) > P\left( \omega_1 | x\right)
\end{align}

is said to be not always the best for minimizing $P_e$?

Best Answer

I found the assertion you refer to. Actually, that is not exactly what the author of the book claims. First, I will assume you understand where that Bayes decision rule comes from; it is demonstrated in the book.
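For reference, here is a compact version of that demonstration, using only the quantities defined in the question. Since every $x$ falls in exactly one of $R_1$ and $R_2$,

\begin{align*}
P_e &= \int \limits_{R_1}P\left( \omega_2 | x\right)p(x)\, dx + \int \limits_{R_2}P\left(\omega_1 | x \right)p(x)\, dx\\
&\geq \int \min\left\{ P\left( \omega_1 | x\right), P\left( \omega_2 | x\right) \right\} p(x)\, dx,
\end{align*}

with equality exactly when every $x$ is assigned to the class with the larger posterior. So the rule in the question really does minimize $P_e$; the author's point is a different one.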

The problem with this approach is that all kinds of mistakes are weighted equally: confusing class $\omega_{2}$ for $\omega_{1}$ is treated the same as the reverse, with no fundamental difference between the two errors. In practice, however, this is often not the case. Telling a patient that they do not suffer from an illness when they actually have it is not the same as the opposite mistake. So there are cases where you need to weight the errors differently, as sketched below.
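For concreteness, here is one standard way to do that weighting (the notation $\lambda_{ij}$ is mine, not necessarily the book's): let $\lambda_{12}$ be the loss incurred by deciding $\omega_2$ when $\omega_1$ is true, and $\lambda_{21}$ the loss for the opposite mistake. The quantity to minimize becomes the average risk

\begin{align*}
r = \lambda_{12}\, P\left(\omega_1\right) \int \limits_{R_2} p\left( x | \omega_1 \right) dx + \lambda_{21}\, P\left(\omega_2\right) \int \limits_{R_1} p\left( x | \omega_2 \right) dx,
\end{align*}

which, by the same pointwise argument as above, is minimized by choosing $R_1 : \lambda_{12} P\left( \omega_1 | x \right) > \lambda_{21} P\left( \omega_2 | x \right)$ and $R_2$ otherwise. With $\lambda_{12} = \lambda_{21}$ this reduces to the original rule.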

When using the Bayes decision criterion you obtain the minimum classification error, i.e. the maximum accuracy. But in medical diagnosis, for example, you want to minimize the false negative rate: it is better to order further tests than to release a sick person.
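A quick numerical sketch of that trade-off, assuming two Gaussian class conditionals with equal priors ($\omega_1$ = ill, $\omega_2$ = healthy); all numbers here are invented for illustration:

```python
import math
from scipy.stats import norm

# Hypothetical 1-D setup: p(x|omega_1) = N(0, 1) ("ill"),
# p(x|omega_2) = N(2, 1) ("healthy"), equal priors P(omega_i) = 1/2.
mu1, mu2 = 0.0, 2.0

# Minimum-P_e rule: decide omega_1 when P(omega_1|x) > P(omega_2|x);
# for equal priors and unit variances this reduces to x < (mu1 + mu2) / 2.
t_min_error = (mu1 + mu2) / 2

# Weighted rule: decide omega_1 when lam12 * P(omega_1|x) > lam21 * P(omega_2|x),
# where lam12 is the cost of calling an ill patient healthy (false negative).
lam12, lam21 = 10.0, 1.0
t_weighted = t_min_error + math.log(lam12 / lam21) / (mu2 - mu1)

for name, t in [("min-error", t_min_error), ("weighted ", t_weighted)]:
    p_fn = 1 - norm.cdf(t, loc=mu1, scale=1)  # ill classified as healthy
    p_fp = norm.cdf(t, loc=mu2, scale=1)      # healthy classified as ill
    p_e = 0.5 * (p_fn + p_fp)                 # plain (unweighted) P_e
    print(f"{name}: threshold={t:.3f}  FN={p_fn:.4f}  FP={p_fp:.4f}  P_e={p_e:.4f}")
```

Here the weighted threshold pushes the boundary toward the healthy class: the false negative rate drops from about 0.16 to about 0.016, at the cost of more false positives and a larger unweighted $P_e$, which is exactly the point.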

And that is why the author says it is not always optimal to use that decision rule, and goes on in the book to introduce a weighted error function. So, basically, you are no longer minimizing the same $P_{e}$.
