Solved – Does the logic of “family-wise error” apply to effect size estimation

effect-sizefamilywise-errorstatistical significancetype-i-and-ii-errors

Background: As I understand, family-wise error refers to the inflation of Type I error when performing multiple hypothesis tests. For example, if I were to perform multiple post-hoc comparisons following an omnibus ANOVA test, then the collection of post-hoc comparisons would be the "family". As the number of tests increases, the risk of Type I error also increases. Hence, family-wise error corrections (like Bonferroni) adjust the alpha criterion in order to account for this Type I error inflation. For example, if I were to perform 6 tests, then I might divide the normative 0.05 alpha by 6 to obtain an adjusted alpha criterion of 0.008, and use this adjusted alpha to determine significance.

Question: Does the family-wise error logic also apply to effect size calculations? If so, are there any common correction procedures like Bonferroni that can be used to adjust effect sizes like eta-squared or Cohen's D? If not, why not?

Best Answer

Penalized maximum likelihood estimation is a good approach that leads to enhanced ability to take a point estimate out of context and have it not be overstated. For example, if one selects the group whose mean is farthest from the others, the result will be significantly biased, and penalization reduces this bias. James-Stein estimators also work, and the best of all approaches is a Bayesian hierarchical model because that is one of the few methods that allows full statistical inference to be carried out in the presence of shrinkage.

Related Solutions

ANOVA F-Test vs Multiple T-Tests – How Much Smaller Can P-Values Be?

Assuming equal $n$s [but see note 2 below] for each treatment in a one-way layout, and that the pooled SD from all the groups is used in the $t$ tests (as is done in usual post hoc comparisons), the maximum possible $p$ value for a $t$ test is $2\Phi(-\sqrt{2}) \approx .1573$ (here, $\Phi$ denotes the $N(0,1)$ cdf). Thus, no $p_t$ can be as high as $0.5$. Interestingly (and rather bizarrely), the $.1573$ bound holds not just for $p_F=.05$, but for any significance level we require for $F$.

The justification is as follows: For a given range of sample means, $\max_{i,j}|\bar y_i - \bar y_j| = 2a$, the largest possible $F$ statistic is achieved when half the $\bar y_i$ are at one extreme and the other half are at the other. This represents the case where $F$ looks the most significant given that two means differ by at most $2a$.

So, without loss of generality, suppose that $\bar y_.=0$ so that $\bar y_i=\pm a$ in this boundary case. And again, without loss of generality, suppose that $MS_E=1$, as we can always rescale the data to this value. Now consider $k$ means (where $k$ is even for simplicity [but see note 1 below]), we have $F=\frac{\sum n\bar y^2/(k-1)}{MS_E}= \frac{kna^2}{k-1}$. Setting $p_F=\alpha$ so that $F=F_\alpha=F_{\alpha,k-1,k(n-1)}$, we obtain $a =\sqrt{\frac{(k-1)F_\alpha}{kn}}$. When all the $\bar y_i$ are $\pm a$ (and still $MS_E=1$), each nonzero $t$ statistic is thus $t=\frac{2a}{1\sqrt{2/n}} = \sqrt{\frac{2(k-1)F_\alpha}{k}}$. This is the smallest maximum $t$ value possible when $F=F_\alpha$.

So you can just try different cases of $k$ and $n$, compute $t$, and its associated $p_t$. But notice that for given $k$, $F_\alpha$ is decreasing in $n$ [but see note 3 below]; moreover, as $n\rightarrow\infty$, $(k-1)F_{\alpha,k-1,k(n-1)} \rightarrow \chi^2_{\alpha,k-1}$; so $t \ge t_{min} =\sqrt{2\chi^2_{\alpha,k-1}/k}$. Note that $\chi^2/k=\frac{k-1}k \chi^2/(k-1)$ has mean $\frac{k-1}k$ and SD$\frac{k-1}k\cdot\sqrt{\frac2{k-1}}$. So $\lim_{k\rightarrow\infty}t_{min} = \sqrt{2}$, regardless of $\alpha$, and the result I stated in the first paragraph above is obtained from asymptotic normality.

It takes a long time to reach that limit, though. Here are the results (computed using R) for various values of $k$, using $\alpha=.05$:

k       t_min    max p_t   [ Really I mean min(max|t|) and max(min p_t)) ]
2       1.960     .0500
4       1.977     .0481   <--  note < .05 !
10      1.840     .0658
100     1.570     .1164
1000    1.465     .1428
10000   1.431     .1526

A few loose ends...

When k is odd: The maximum $F$ statistic still occurs when the $\bar y_i$ are all $\pm a$; however, we will have one more at one end of the range than the other, making the mean $\pm a/k$, and you can show that the factor $k$ in the $F$ statistic is replaced by $k-\frac 1k$. This also replaces the denominator of $t$, making it slightly larger and hence decreasing $p_t$.
Unequal $n$s: The maximum $F$ is still achieved with the $\bar y_i = \pm a$, with the signs arranged to balance the sample sizes as nearly equally as possible. Then the $F$ statistic for the same total sample size $N = \sum n_i$ will be the same or smaller than it is for balanced data. Moreover, the maximum $t$ statistic will be larger because it will be the one with the largest $n_i$. So we can't obtain larger $p_t$ values by looking at unbalanced cases.
A slight correction: I was so focused on trying to find the minimum $t$ that I overlooked the fact that we are trying to maximize $p_t$, and it is less obvious that a larger $t$ with fewer df won't be less significant than a smaller one with more df. However, I verified that this is the case by computing the values for $n=2,3,4,\ldots$ until the df are high enough to make little difference. For the case $\alpha=.05, k\ge 3$ I did not see any cases where the $p_t$ values did not increase with $n$. Note that the $df=k(n-1)$ so the possible df are $k,2k,3k,\ldots$ which get large fast when $k$ is large. So I'm still on safe ground with the claim above. I also tested $\alpha=.25$, and the only case I observed where the $.1573$ threshold was exceeded was $k=3,n=2$.

Best Answer

Related Solutions

ANOVA F-Test vs Multiple T-Tests – How Much Smaller Can P-Values Be?

Related Question