Solved – Decision theory – reject option


In decision theory, we define a reject option with threshold $\theta$ so that when making a decision is difficult, the case is rejected rather than classified.

Suppose $1/k \leq \theta \leq 1$:

  • If $\theta = 1/k$, no cases will be rejected, and

  • If $\theta = 1$, all cases will be rejected.

Why does this happen, and how can I prove it for $\theta = 1$ and $\theta = 1/k$?

Section 1.5.3 (reject option), Chapter 1 of Pattern Recognition and Machine Learning by Bishop:

We have seen that classification errors arise from the regions of input space
where the largest of the posterior probabilities p(Ck|x) is significantly less than unity,
or equivalently where the joint distributions p(x, Ck) have comparable values. These
are the regions where we are relatively uncertain about class membership. In some
applications, it will be appropriate to avoid making decisions on the difficult cases
in anticipation of a lower error rate on those examples for which a classification decision
is made. This is known as the reject option. For example, in our hypothetical
medical illustration, it may be appropriate to use an automatic system to classify
those X-ray images for which there is little doubt as to the correct class, while leaving
a human expert to classify the more ambiguous cases. We can achieve this by
introducing a threshold θ and rejecting those inputs x for which the largest of the
posterior probabilities p(Ck|x) is less than or equal to θ. This is illustrated for the
case of two classes, and a single continuous input variable x, in Figure 1.26. Note
that setting θ = 1 will ensure that all examples are rejected, whereas if there are K
classes then setting θ < 1/K will ensure that no examples are rejected. Thus the
fraction of examples that get rejected is controlled by the value of θ.
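The rule Bishop describes can be sketched in a few lines. This is a minimal illustration, not code from the book; the function name `classify_with_reject` and the example posterior vectors are my own:

```python
import numpy as np

def classify_with_reject(posteriors, theta):
    """Return the predicted class index, or None to reject.

    posteriors: the values p(C_k | x) for one input x (they sum to 1).
    Reject when the largest posterior is <= theta.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    if posteriors.max() <= theta:
        return None  # reject: defer the ambiguous case, e.g. to a human expert
    return int(posteriors.argmax())

# A confident case is classified; an ambiguous one is rejected.
print(classify_with_reject([0.9, 0.05, 0.05], theta=0.7))  # -> 0
print(classify_with_reject([0.4, 0.35, 0.25], theta=0.7))  # -> None
```

Raising `theta` toward 1 makes the condition `max <= theta` easier to satisfy, so more inputs are rejected, which is exactly the "fraction of examples that get rejected is controlled by θ" behaviour in the excerpt.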

Best Answer

An input $x$ is rejected if the largest posterior probability satisfies $\max_{i=1,\ldots,k}{\rm P}(\mbox{belongs to class } i|x)\leq\theta$.

There are $k$ classes and the posterior probabilities must sum to $1$: $$\sum_{i=1}^k{\rm P}(\mbox{belongs to class } i|x)=1.$$ A consequence is that $$\max_{i=1,\ldots,k}{\rm P}(\mbox{belongs to class } i|x)\geq 1/k,$$ since if every posterior were strictly less than $1/k$, the sum would be less than $1$. This means that if $\theta < 1/k$ the input is never rejected: the maximum posterior always exceeds $\theta$. At $\theta = 1/k$ exactly, rejection can occur only in the edge case where all $k$ posteriors are equal to $1/k$, which is why Bishop states the no-rejection condition as $\theta < 1/K$.

Similarly, we have that $$\max_{i=1,\ldots,k}{\rm P}(\mbox{belongs to class } i|x)\leq 1,$$ with equality only for inputs $x$ that belong to one class with certainty. Since the rejection rule uses $\leq$, setting $\theta=1$ means the condition $\max_{i}{\rm P}(\mbox{belongs to class } i|x)\leq\theta$ holds for every input, so all inputs are rejected.
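Both bounds can be checked numerically. A small sketch, using random posterior vectors drawn from a Dirichlet distribution (my choice for the illustration, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
# 1000 random posterior vectors over k classes; each row sums to 1.
posteriors = rng.dirichlet(np.ones(k), size=1000)
max_post = posteriors.max(axis=1)

# The maximum posterior always lies in [1/k, 1] (small float tolerance).
assert np.all(max_post >= 1/k - 1e-9) and np.all(max_post <= 1.0)

def rejected_fraction(theta):
    """Fraction of inputs rejected under the rule max posterior <= theta."""
    return np.mean(max_post <= theta)

print(rejected_fraction(1.0))  # 1.0: theta = 1 rejects every input
print(rejected_fraction(1/k))  # 0.0: rejection would need an exactly uniform posterior
```

With continuous random posteriors the exactly-uniform edge case essentially never occurs, so the rejected fraction at $\theta = 1/k$ is zero, matching the argument above.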
