Solved – Relation between precision, recall and sample size

Tags: descriptive statistics, machine learning, precision-recall, sample-size, sampling

I have a large data set for a binary classification problem. To fit a model to the data, I have been training on samples of various sizes. For each sample size I get different precision and recall values. One way to select a model would be to pick the one with the highest precision or recall among those whose sample size exceeds a certain threshold. How can I determine this sample-size threshold, or, more generally, what is the relation between precision, recall and sample size?

Best Answer

You can calculate the sample-size threshold needed for given precision/recall assurances. Refer to this article: Statistics: An Introduction to sample size calculations

Edit:

As documented in the linked article,

There are two approaches to sample size calculations:

Precision-based: With what precision do you want to estimate the proportion, mean difference . . . (or whatever it is you are measuring)?

Power-based: How small a deviation from hypothesis is important to detect and with what degree of certainty? The smaller the difference you regard as important to detect, the greater the sample size required.

Suppose you want to be able to estimate your unknown parameter with a certain degree of precision. What you are essentially saying is that you want your confidence interval to be a certain width. In general a 95% confidence interval is given by the formula:

Estimate ± 2(approx) × SE

where SE is the standard error of whatever you are estimating. 95% confidence intervals are usually based on the normal distribution or a t-distribution; for a normal distribution the value is 1.96; for t-distributions the value is generally just over 2.

The formula for any standard error always contains n, the sample size. Therefore, if you specify the width of the 95% confidence interval, you have a formula that you can solve to find n.
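As a rough sketch of the precision-based approach applied to the question: a classifier's precision or recall is a proportion, so under the normal approximation its standard error is sqrt(p(1 − p)/n), and the interval half-width h = z × SE can be solved for n. The function name and the example values below are assumptions chosen for illustration, not taken from the linked article. Note that n here counts the cases in the denominator of the metric: predicted positives for precision, actual positives for recall.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_guess, half_width, confidence=0.95):
    """Smallest n for which the normal-approximation confidence interval
    of a proportion has at most the requested half-width.

    p_guess    -- prior guess of the proportion (0.5 is the worst case)
    half_width -- desired half-width of the interval, e.g. 0.05
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve half_width = z * sqrt(p * (1 - p) / n) for n.
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / half_width ** 2)


# Worst case (p = 0.5) versus an optimistic guess (p = 0.9),
# both for a 95% interval of +/- 0.05.
n_worst = sample_size_for_proportion(0.5, 0.05)
n_high = sample_size_for_proportion(0.9, 0.05)
print(n_worst, n_high)
```

The normal approximation gets unreliable when the proportion is close to 0 or 1 or when n is small, so treat the result as a planning figure rather than a guarantee.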

Power-based sample size calculations relate to hypothesis testing.

As a matter of good scientific practice, a significance level is chosen before data collection and is often set to 0.05 (5%). This significance level, denoted by α, represents the conditional probability of type I error.

Suppose you want to compare the mean in one group to the mean in another (i.e. carry out an unpaired t-test). The number, n, required in each group is given by

n = f(α, β) · 2s^2/δ^2

Where: α is the significance level (using a two-sided test) — i.e. your cut-off for regarding the result as statistically significant.

1 − β is the statistical power of your test.

f(α, β) is a value calculated from α and β; see the table for f(α, β) in the linked article.

δ is the smallest difference in means that you regard as being important to be able to detect.

s is the standard deviation of whatever it is we’re measuring — this will need to be estimated from previous studies.

Similar formulae can be obtained for other types of analysis by reference to appropriate texts.
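To make the power-based formula concrete, here is a minimal sketch. It assumes the usual normal-approximation form f(α, β) = (z_{1−α/2} + z_{1−β})², which gives roughly 7.85 for α = 0.05 and 80% power; the article's table may round differently. The function names and the example values of s and δ are made up for illustration.

```python
import math
from statistics import NormalDist


def f_alpha_beta(alpha, beta):
    """Assumed normal-approximation form of f(alpha, beta):
    (z_{1 - alpha/2} + z_{1 - beta}) ** 2 for a two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


def n_per_group(alpha, beta, s, delta):
    """n = f(alpha, beta) * 2 * s^2 / delta^2, rounded up."""
    return math.ceil(f_alpha_beta(alpha, beta) * 2 * s ** 2 / delta ** 2)


# Illustrative values: 5% significance, 80% power (beta = 0.2),
# standard deviation 10, smallest difference worth detecting 5.
f_80 = f_alpha_beta(0.05, 0.2)
n = n_per_group(0.05, 0.2, s=10, delta=5)
print(round(f_80, 2), n)
```

Halving δ quadruples the required n, which is the point made above about small differences needing large samples.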

Related Solutions

Solved – Threshold in precision/recall curve

Short answer: Torgo describes the usual method of generating such curves.

You can choose your threshold (= cut-off limit in the cited text) at any value. The cited text refers to one such choice as a working point.
That is, for a given working point, you'll observe exactly one (precision; recall) pair, i.e. one point in your graph. The precision-recall-curve is obtained by varying the threshold over the whole range of the classifier's continuous output ("scores", posterior probabilities, "votes") thus generating a curve from many working points.

Edit with respect to the comment:

I think "varying the threshold" is the usual way to explain or define the curve.

For the calculation, it is more efficient to sort the scores, and then see how precision and recall change when adding the next case: precision and recall can only change when the change in the threshold is large enough to cover the next score.

Consider this example:

case   true class   predicted score (high => class B)
1      A            0.2
3      B            0.5
2      A            0.6
4      B            0.9

threshold      recall    precision
> 0.9          0.0       N/A
(0.6, 0.9]     0.5       1.0
(0.5, 0.6]     0.5       0.5
(0.2, 0.5]     1.0       0.67
≤ 0.2          1.0       0.5

That is, the precision-recall-curve actually consists of points. It jumps from one point to the next when the threshold "crosses" an actually predicted score. A smooth curve will result only for large numbers of test cases.
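The sort-and-sweep calculation described above can be sketched as follows. This is only an illustration, assuming a case is predicted as class B when its score is at least the threshold; the function name `pr_points` is hypothetical. It yields one (threshold, recall, precision) point per distinct score, which corresponds to the rows of the example in which at least one case is predicted as B.

```python
def pr_points(labels, scores, positive="B"):
    """Return (threshold, recall, precision) tuples, one per distinct score,
    from the highest score down. A case counts as predicted positive when
    its score is >= the threshold. Assumes at least one positive label."""
    n_pos = sum(1 for y in labels if y == positive)
    # Sort cases by score, highest first, then add them one at a time.
    ranked = sorted(zip(scores, labels), reverse=True)
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case with this score,
        # so that tied scores produce a single point.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((score, tp / n_pos, tp / (tp + fp)))
    return points


# The four cases from the example, in the order they are listed.
labels = ["A", "B", "A", "B"]
scores = [0.2, 0.5, 0.6, 0.9]
points = pr_points(labels, scores)
for threshold, recall, precision in points:
    print(f"{threshold:.1f}  recall={recall:.2f}  precision={precision:.2f}")
```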

Solved – Precision and recall are equal when the size is same

Let's call the number of users who are correctly classified as experts $tp$ (true positives), the number of users who are incorrectly classified as non-experts (although they are experts) $fn$ (false negatives), and the number who are incorrectly classified as experts (although they are not) $fp$ (false positives).

The precision is defined as $p = \frac{tp}{tp + fp}$, while the recall is defined as $r = \frac{tp}{tp + fn}$. If precision and recall are equal, we have $p=r$, and since they have the same numerator $tp$ (assumed nonzero), their denominators must be equal, so we get $fp = fn$.

This means that our algorithm has produced as many false positives as false negatives. This may be a good thing if the data set had equal numbers of experts and non-experts, but it may also hint that too many non-experts were classified as experts (if there were not many experts in the set), or that too many experts were not classified as experts (if there were many experts in the set).
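A quick numeric check of the $fp = fn$ argument, using made-up counts (the numbers below are purely illustrative and the helper name is hypothetical):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts (tp must be > 0)."""
    return tp / (tp + fp), tp / (tp + fn)


# Equal numbers of false positives and false negatives: p equals r.
p_eq, r_eq = precision_recall(tp=30, fp=10, fn=10)

# More false negatives than false positives: recall drops below precision.
p_ne, r_ne = precision_recall(tp=30, fp=10, fn=30)

print(p_eq, r_eq, p_ne, r_ne)
```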
