Solved – the empirical size of a test

Tags: hypothesis-testing, simulation, statistical-power, type-i-and-ii-errors

I am researching a proposed test statistic. I want to calculate its empirical size for different sample sizes at a nominal type I error rate, such as 0.05. What is the formula for the empirical size? How can it be computed in R? Is there an R function that does this?
What is the difference between empirical size and power?

Best Answer

The empirical size of a test is its actual rejection frequency when the null hypothesis is true. It need not coincide with the nominal size that the user of the test chooses (say, 5%). This may be the case, for example, when an assumption required to derive the test statistic's null distribution is not met. E.g., many null distributions are derived asymptotically, i.e., under the assumption that $n\to\infty$. In finite samples, the empirical size may (and generally will) then differ from 5%.

A simple example is given by the t-test when sampling from a normal population. Here, we are in the exceptional situation that we can actually derive the exact null distribution ($t(n-1)$). If we instead approximate the null distribution by the standard normal distribution, as justified by the CLT as $n\to\infty$, we would use 1.96 as the (two-sided) critical value, although the 0.975 quantile of the $t(n-1)$ distribution would be the accurate choice. The fraction of rejections under the null when using 1.96 as the critical value is then the empirical size, and its difference from 0.05 is commonly called size distortion.
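
Because the exact null distribution is known in this special case, the size distortion can be computed directly rather than simulated. The snippet below is a minimal sketch; the choice $n = 20$ is an assumption made here to match the simulation further down, and `exact_size` is just an illustrative name.

```r
n <- 20  # assumed sample size, chosen to match the simulation example

# True rejection probability of the two-sided test that uses the normal
# critical value, when the statistic actually follows t(n - 1) under the null
exact_size <- 2 * pt(-qnorm(0.975), df = n - 1)
exact_size
```

The value should come out at about 0.065, i.e. noticeably above the nominal 0.05.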

Unfortunately, we do not know the exact finite-sample distribution in most cases. One alternative is to resort to simulation studies.
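
As a sketch of the formula behind such a simulation (the symbols $M$, $T_m$ and $c_\alpha$ are notation introduced here, not taken from the question): generate $M$ independent samples of size $n$ under the null, compute the statistic $T_m$ on each, and record how often it exceeds the critical value $c_\alpha$ that the test uses at nominal level $\alpha$. For a two-sided test this gives

$$\hat\alpha_n = \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{|T_m| > c_\alpha\}, \qquad \widehat{\mathrm{se}}(\hat\alpha_n) = \sqrt{\frac{\hat\alpha_n\,(1-\hat\alpha_n)}{M}}.$$

The standard error follows from the rejection indicators being independent Bernoulli draws, and it indicates how many replications are needed before a deviation from $\alpha$ can be distinguished from simulation noise.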

For my example (where it is actually superfluous, because analytical results are available), this might look as follows:

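reps <- 100000  # number of Monte Carlo replications
n <- 20         # sample size

# Rejection decisions for each replication, one vector per critical value
DecisionAsymptoticCriticalValue <- DecisionExactCriticalValue <- logical(reps)

AsymptoticCriticalValue <- qnorm(.975)   # normal approximation, about 1.96
ExactCriticalValue <- qt(.975, n - 1)    # exact t(n-1) critical value

for (i in 1:reps) {
  x <- rnorm(n)                          # draw a sample under the null (mean 0)
  tstat <- sqrt(n) * mean(x) / sd(x)     # one-sample t-statistic
  DecisionAsymptoticCriticalValue[i] <- (abs(tstat) > AsymptoticCriticalValue)
  DecisionExactCriticalValue[i] <- (abs(tstat) > ExactCriticalValue)
}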

The results confirm the analytical predictions:

> (mean(DecisionAsymptoticCriticalValue))
[1] 0.06459

> (mean(DecisionExactCriticalValue))
[1] 0.05012
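
To the question of a ready-made function: I am not aware of a base R function that returns the empirical size of an arbitrary test, so the usual route is to wrap a simulation like the one above in a small helper. The following is a minimal sketch for the t-test example only; `empirical_size` and its arguments are hypothetical names, and for a different proposed statistic the lines that compute the statistic and the critical value would have to be swapped out.

```r
# Hypothetical helper (not a built-in R function): Monte Carlo estimate of the
# empirical size of the two-sided one-sample t-test for a given sample size n.
empirical_size <- function(n, reps = 10000, alpha = 0.05, exact = TRUE) {
  # Critical value: exact t(n - 1) quantile or the asymptotic normal quantile
  crit <- if (exact) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  rejections <- replicate(reps, {
    x <- rnorm(n)                        # data generated under the null (mean 0)
    tstat <- sqrt(n) * mean(x) / sd(x)   # one-sample t-statistic
    abs(tstat) > crit                    # TRUE if the null is rejected
  })
  mean(rejections)                       # fraction of rejections
}

set.seed(1)                              # for reproducibility of this sketch
sample_sizes <- c(10, 20, 50, 200)
sizes_asymptotic <- sapply(sample_sizes, empirical_size, exact = FALSE)
sizes_exact <- sapply(sample_sizes, empirical_size, exact = TRUE)
data.frame(n = sample_sizes, asymptotic = sizes_asymptotic, exact = sizes_exact)
```

With the asymptotic critical value, the estimated size should move toward 0.05 as $n$ grows, while with the exact critical value it should stay near 0.05 for every $n$, up to simulation noise.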

Empirical size is therefore at best indirectly related to power: size is the rejection rate when the null is true, whereas power is the rejection rate when the null is false. The two are indirectly related because a liberal test (i.e., empirical size > nominal size) rejects too often when the null is true, and will therefore typically also reject more often when the null is false, i.e., have higher power. That is, however, typically not viewed as a good thing, because size is not "controlled" and the additional rejections are therefore spurious.
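
The distinction can be made concrete by running the same simulation twice, once under the null and once under an alternative. This is a minimal sketch: the helper name `reject_rate` and the alternative mean of 0.5 are assumptions chosen for illustration, not part of the question.

```r
set.seed(1)                        # for reproducibility of this sketch
reps <- 10000                      # number of Monte Carlo replications
n <- 20                            # sample size
crit <- qt(0.975, df = n - 1)      # exact two-sided 5% critical value

# Hypothetical helper: rejection rate of the t-test of H0: mean = 0
# when the data are actually drawn from N(mu, 1)
reject_rate <- function(mu) {
  mean(replicate(reps, {
    x <- rnorm(n, mean = mu)
    abs(sqrt(n) * mean(x) / sd(x)) > crit
  }))
}

size_hat <- reject_rate(0)         # null is true: empirical size
power_hat <- reject_rate(0.5)      # null is false: empirical power at mu = 0.5
c(size = size_hat, power = power_hat)
```

The first number estimates the size and should be close to 0.05; the second estimates the power against this particular alternative and should be considerably larger. For the t-test specifically, the simulated power can be cross-checked against `power.t.test(n = 20, delta = 0.5, sd = 1, type = "one.sample")` from the `stats` package.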