# Machine Learning – Understanding the Accuracy of a Random Classifier

Tags: classification, machine-learning

I was wondering how to compare the accuracy of my classifier to that of a random one.

Let me elaborate. Suppose we have a binary classification problem, with $n^+$ positive examples and $n^-$ negative examples in the test set. The random classifier labels each record as positive with probability $p$.

I can estimate that, on average, I get:

\begin{align}
TP &= pn^+ \\[2pt]
TN &= (1-p)n^- \\[2pt]
FN &= (1-p)n^+ \\[2pt]
FP &= pn^-
\end{align}

thus:

\begin{align}
\mbox{acc} &= \frac{TP+TN}{TP+TN+FP+FN} \\[9pt]
&= \frac{pn^+ + (1-p)n^-}{pn^+ + (1-p)n^- + pn^- + (1-p)n^+} \\[9pt]
&= \frac{pn^+ + (1-p)n^-}{n^+ + n^-}
\end{align}

For example, if $n^+ = n^-$, the accuracy is $1/2$ for any $p$.
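A quick Monte Carlo check of the closed form above (a sketch in Python; the function names are mine, not from any library):

```python
import random

random.seed(0)  # reproducible sketch

def expected_accuracy(n_pos, n_neg, p):
    """Closed-form expected accuracy from the derivation above."""
    return (p * n_pos + (1 - p) * n_neg) / (n_pos + n_neg)

def simulated_accuracy(n_pos, n_neg, p, trials=2000):
    """Average accuracy of a classifier that predicts 'positive'
    with probability p, estimated by simulation."""
    total = 0.0
    for _ in range(trials):
        # Positive examples are correct when we predict positive (prob. p)
        correct = sum(1 for _ in range(n_pos) if random.random() < p)
        # Negative examples are correct when we predict negative (prob. 1-p)
        correct += sum(1 for _ in range(n_neg) if random.random() >= p)
        total += correct / (n_pos + n_neg)
    return total / trials

print(expected_accuracy(100, 100, 0.3))   # exactly 0.5 when n+ = n-
print(simulated_accuracy(100, 100, 0.3))  # should be close to 0.5
```

With a balanced test set the simulated accuracy hovers around $1/2$ no matter which $p$ you pick, matching the formula.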

This can be extended to multiclass classification:
$$\mbox{acc} = \frac{\sum_{i=1}^c p_i n_i}{\sum n_i}$$
where $p_i$ is the probability of predicting "class $i$", and $n_i$ is the number of records of class $i$. Again, if $n_i = n/c \ \forall i$, then
$$\mbox{acc} = 1/c$$
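The multiclass formula is equally easy to check numerically (another sketch; `expected_accuracy_multiclass` is a name I made up):

```python
def expected_accuracy_multiclass(probs, counts):
    """acc = sum_i p_i * n_i / sum_i n_i, where probs sum to 1."""
    assert abs(sum(probs) - 1.0) < 1e-9, "class probabilities must sum to 1"
    return sum(p * n for p, n in zip(probs, counts)) / sum(counts)

# Uniform guessing over c balanced classes gives 1/c:
c = 4
print(expected_accuracy_multiclass([1 / c] * c, [50] * c))  # 0.25

# With imbalanced classes, always guessing the majority class does better:
print(expected_accuracy_multiclass([1, 0, 0], [80, 15, 5]))  # 0.8
```

The second call illustrates why accuracy alone is misleading on imbalanced data: a degenerate "random" classifier that always predicts the majority class already reaches the majority-class frequency.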

But how can I compare my classifier's accuracy to a random one without reference to a specific test set? For example, if I say my classifier's accuracy is 70% (estimated somehow, e.g. by cross-validation), is that good or bad compared to a random classifier?