Solved – Minimizing the misclassification rate

decision-theory, machine-learning

I am reading the book Pattern Recognition and Machine Learning and have a question about a specific passage. Here are the relevant lines from the text:

Suppose that our goal is simply to make as few misclassifications as possible. We need a rule that assigns each value of x to one of the available classes. Such a rule will divide the input space into regions Rk called decision regions, one for each class, such that all points in Rk are assigned to class Ck. {...skipping a few lines...} In order to find the optimal decision rule, consider first of all the case of two classes, as in the cancer problem for instance. A mistake occurs when an input vector belonging to class C1 is assigned to class C2 or vice versa. The probability of this occurring is given by

$$
\begin{aligned}
p(\text{mistake}) &= p(x \in R_1, C_2) + p(x \in R_2, C_1) \\
&= \int_{R_1} p(x, C_2)\,dx + \int_{R_2} p(x, C_1)\,dx
\end{aligned}
$$
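To make the two integrals concrete, here is a minimal Python sketch. It assumes a hypothetical one-dimensional problem that is not from the book: each class-conditional density p(x | Ck) is a unit-variance Gaussian, the priors p(Ck) are made-up numbers, and p(x, Ck) = p(x | Ck) p(Ck). The decision regions are taken to be R1 = {x < t} and R2 = {x ≥ t} for a threshold t, and each integral is approximated with a midpoint Riemann sum over a finite interval that stands in for the real line.

```python
import math

# Hypothetical example: class-conditional densities p(x|C_k) are Gaussians,
# and the priors p(C_k) are made-up numbers that sum to 1.
PRIORS = {1: 0.6, 2: 0.4}
MEANS = {1: 0.0, 2: 2.0}
STD = 1.0


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint(x, k):
    """Joint density p(x, C_k) = p(x | C_k) * p(C_k)."""
    return gaussian_pdf(x, MEANS[k], STD) * PRIORS[k]


def integrate(f, a, b, n=20_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def p_mistake(t, lo=-10.0, hi=12.0):
    """p(mistake) when R_1 = [lo, t) and R_2 = [t, hi).

    [lo, hi) stands in for the whole real line; the Gaussian tails
    outside it are negligible for these parameters.
    """
    in_r1_but_c2 = integrate(lambda x: joint(x, 2), lo, t)  # p(x in R_1, C_2)
    in_r2_but_c1 = integrate(lambda x: joint(x, 1), t, hi)  # p(x in R_2, C_1)
    return in_r1_but_c2 + in_r2_but_c1


print(p_mistake(1.0))
```

With these parameters, p_mistake(1.0) comes out at about 0.159; moving t changes how much of each class's joint density falls on the wrong side of the boundary.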

To minimize p(mistake) we should arrange that each x is assigned to whichever class has the smaller value of the integrand in the above equation. Thus if p(x, C1) > p(x, C2) for a given value of x, then we should assign that x to class C1.
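The phrase "smaller value of the integrand" can be unpacked by writing both integrals as one integral over the whole input space, using indicator functions for the two regions. Every x lies in exactly one region, so at each x exactly one of the two joint densities counts towards the error:

$$
\begin{aligned}
p(\text{mistake}) &= \int \Big[\, \mathbb{1}[x \in R_1]\, p(x, C_2) + \mathbb{1}[x \in R_2]\, p(x, C_1) \Big]\, dx \\
&\ge \int \min\{\, p(x, C_1),\; p(x, C_2) \,\}\, dx
\end{aligned}
$$

The inequality holds pointwise, and it becomes an equality when every x is placed in the region whose integrand is the smaller density: x goes into R1 (where it contributes p(x, C2)) exactly when p(x, C2) < p(x, C1), which is the rule "if p(x, C1) > p(x, C2), assign x to C1". Points where the two densities are equal contribute the same amount either way.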

Can anyone explain what the last line means? From what I understand, the probability of a point x in region R2 being classified as C2 would increase the misclassification, but the sentence says the opposite.

Please help.

Best Answer

For this you don't need to think about regions yet. For a given point x, you calculate

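  • p(x, C1) = p(C1 | x) p(x), i.e. the probability that x belongs to C1, times p(x)
  • p(x, C2) = p(C2 | x) p(x), i.e. the probability that x belongs to C2, times p(x)

Since p(x) is the same for both classes, comparing these two numbers is the same as comparing the probabilities of the two classes.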

You would of course simply choose the class with the higher probability. That is exactly what the last sentence states: if p(x, C1) > p(x, C2), choose C1; otherwise choose C2.
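As a small sketch of this rule, assume a hypothetical two-class problem with Gaussian p(x | Ck) (the means, standard deviation and priors below are illustrative, not from the book). Because p(x, Ck) = p(Ck | x) p(x) and p(x) is shared by both classes, comparing the joint densities gives the same decision as comparing the posteriors.

```python
import math

# Hypothetical two-class setup: Gaussian p(x|C_k) with illustrative priors p(C_k).
PRIORS = {1: 0.6, 2: 0.4}
MEANS = {1: 0.0, 2: 2.0}
STD = 1.0


def joint(x, k):
    """p(x, C_k) = p(x | C_k) * p(C_k) for a Gaussian class-conditional."""
    z = (x - MEANS[k]) / STD
    return PRIORS[k] * math.exp(-0.5 * z * z) / (STD * math.sqrt(2.0 * math.pi))


def posterior(x, k):
    """p(C_k | x) = p(x, C_k) / p(x), with p(x) = p(x, C_1) + p(x, C_2)."""
    return joint(x, k) / (joint(x, 1) + joint(x, 2))


def decide(x):
    """Assign x to C_1 if p(x, C_1) > p(x, C_2), otherwise to C_2."""
    return 1 if joint(x, 1) > joint(x, 2) else 2


print(decide(0.0), decide(3.0))
```

With these parameters decide(0.0) returns 1 and decide(3.0) returns 2; the boundary between R1 and R2 is the point where the two joint densities are equal.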

Now let's think about regions. The region where you choose C1 under the decision rule above is denoted R1; similarly, the region where C2 is chosen is denoted R2. The error then consists of:

  • points x that actually belong to C2 but lie in region R1, and
  • points x that actually belong to C1 but lie in region R2.

The probability of error is thus the integral of p(x, C2) over R1 (points in R1 that belong to C2) plus the integral of p(x, C1) over R2 (points in R2 that belong to C1), which is exactly the equation in the question.
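To check numerically that this rule gives the smallest error, here is a Python sketch for a hypothetical two-class problem (Gaussian p(x | Ck) with a shared standard deviation and illustrative priors; none of these numbers are from the book). It computes p(mistake) for R1 = {x < t} in closed form using `math.erf`, sweeps t over a grid, and compares the best threshold with the point where p(x, C1) = p(x, C2). Restricting R1 to a half-line is itself an assumption; it happens to be enough here because, with equal variances, the log-ratio of the two joint densities is linear in x.

```python
import math

# Hypothetical two-class problem: p(x|C_k) = N(MEANS[k], STD^2), priors p(C_k).
PRIORS = {1: 0.6, 2: 0.4}
MEANS = {1: 0.0, 2: 2.0}
STD = 1.0


def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def p_mistake(t):
    """Error with R_1 = {x < t}, R_2 = {x >= t}.

    p(x in R_1, C_2) = p(C_2) * P(x < t | C_2)
    p(x in R_2, C_1) = p(C_1) * P(x >= t | C_1)
    """
    in_r1_but_c2 = PRIORS[2] * normal_cdf(t, MEANS[2], STD)
    in_r2_but_c1 = PRIORS[1] * (1.0 - normal_cdf(t, MEANS[1], STD))
    return in_r1_but_c2 + in_r2_but_c1


# Sweep thresholds on a fine grid and keep the one with the smallest error.
thresholds = [i / 1000.0 for i in range(-2000, 4001)]
best_t = min(thresholds, key=p_mistake)

# Point where the joint densities cross, p(x, C_1) = p(x, C_2).
# With equal variances this has a closed form.
crossing = (MEANS[1] + MEANS[2]) / 2 + (
    STD**2 * math.log(PRIORS[1] / PRIORS[2]) / (MEANS[2] - MEANS[1])
)

print(f"best threshold   {best_t:.3f}, error {p_mistake(best_t):.4f}")
print(f"density crossing {crossing:.3f}")
```

With these parameters the best grid threshold lands next to the crossing point 1 + ln(1.5)/2 ≈ 1.203, as the pointwise argument predicts.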

Does this help to clarify?