For a two-sample t test on samples from populations with
the same variance $\sigma^2,$ you have two proposed
variance estimates
$$ S_p^2 = \frac{(n_1 - 1)S^2_1+(n_2-1)S_2^2}{n_1+n_2-2},$$
and
$$ S_a^2 = \frac{n_1S^2_1+n_2S^2_2}{n_1+n_2}. $$
For $S_p^2,$ you have found $S_i^2,\; i=1,2,$ each of which requires computing a sample mean $\bar X_i,\; i=1,2,$ and so costs one degree of freedom. So, for normal data,
$$ \frac{\nu S_p^2}{\sigma^2} \sim \mathsf{Chisq}(\nu), $$
where $\nu = n_1+n_2 - 2.$
For $S_a^2,$ the distribution theory is not so clear.
You say something about $S_a^2$ being unbiased, but that
hardly specifies a distribution. Let's use the same
degrees of freedom $\nu$ as above for an experiment.
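
One way to see why the theory is less clean is to write the scaled statistic in terms of the two sample variances. This is a sketch, reading $S_a^2$ as $(n_1S_1^2+n_2S_2^2)/(n_1+n_2)$ and assuming independent normal samples:
$$ \frac{(n_1+n_2)S_a^2}{\sigma^2} = \frac{n_1}{n_1-1}\cdot\frac{(n_1-1)S_1^2}{\sigma^2} + \frac{n_2}{n_2-1}\cdot\frac{(n_2-1)S_2^2}{\sigma^2}, \qquad \frac{(n_i-1)S_i^2}{\sigma^2} \sim \mathsf{Chisq}(n_i-1). $$
Taking moments of the two independent terms gives
$$ E\left[\frac{(n_1+n_2)S_a^2}{\sigma^2}\right] = n_1+n_2, \qquad \mathrm{Var}\left[\frac{(n_1+n_2)S_a^2}{\sigma^2}\right] = \frac{2n_1^2}{n_1-1}+\frac{2n_2^2}{n_2-1}. $$
When $n_1 \ne n_2$ the two weights differ, so this is a weighted sum of independent chi-squared variables rather than a chi-squared variable. For $n_1 = 2,\, n_2 = 3$ the mean is $5$ and the variance is $17,$ whereas a chi-squared variable with mean $5$ would have variance $10.$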
Simulation: Begin by looking at $m = 10^5$ samples `x1` of size $n_1 = 2$ from
$\mathsf{Norm}(\mu_1 = 100, \sigma_1 = 15)$ and `x2`
of size $n_2=3$ from $\mathsf{Norm}(\mu_2 = 110, \sigma_2 = 15).$
We find the sample variances, the pooled variance estimate,
and the average variance estimate. Then we look at the
corresponding chi-squared random variables.
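
    set.seed(2022)
    n1 = 2; m = 10^5
    M1 = matrix(rnorm(n1*m, 100, 15), nrow=m)  # each row is one sample x1
    v1 = apply(M1, 1, var)                     # sample variances S_1^2
    n2 = 3
    M2 = matrix(rnorm(n2*m, 110, 15), nrow=m)  # each row is one sample x2
    v2 = apply(M2, 1, var)                     # sample variances S_2^2
    pool = ((n1-1)*v1 + (n2-1)*v2)/(n1+n2-2)   # pooled estimate S_p^2
    q.p = (n1+n2-2)*pool/15^2
    avg.v = (n1*v1 + n2*v2)/(n1+n2)  # weights n1, n2 as in the formula for S_a^2
    q.a = (n1+n2)*avg.v/15^2
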
Then we compare the results with the density functions
of the corresponding chi-squared distribution.
For the pooled estimate $S_p^2$ we get a good match,
but for $S_a^2$ the fit is not good.
R code for graphs:

    par(mfrow=c(1,2))
    hist(q.p, prob=TRUE, ylim=c(0,.35), col="skyblue2", main="Pooled")
     curve(dchisq(x, n1+n2-2), add=TRUE, lwd=2, col="orange")
    hist(q.a, prob=TRUE, ylim=c(0,.35), col="skyblue2", main="Averaged")
     # same degrees of freedom nu = n1+n2-2 as for the pooled estimate
     curve(dchisq(x, n1+n2-2), add=TRUE, lwd=2, col="orange")
    par(mfrow=c(1,1))

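
As a numerical complement to the histograms, here is a minimal sketch that compares the simulated means and variances with what a chi-squared distribution requires (mean $k$ and variance $2k$ for $k$ degrees of freedom). It regenerates the data so that it runs on its own; the name `moments` is just an illustrative choice, and `q.a` uses the weights $n_1, n_2$ from the formula for $S_a^2.$

```r
set.seed(2022)
m <- 10^5; n1 <- 2; n2 <- 3; sigma <- 15

# Each row of a matrix is one simulated sample.
v1 <- apply(matrix(rnorm(n1 * m, 100, sigma), nrow = m), 1, var)
v2 <- apply(matrix(rnorm(n2 * m, 110, sigma), nrow = m), 1, var)

q.p <- ((n1 - 1) * v1 + (n2 - 1) * v2) / sigma^2  # nu * S_p^2 / sigma^2
q.a <- (n1 * v1 + n2 * v2) / sigma^2              # (n1 + n2) * S_a^2 / sigma^2

# For a chi-squared variable the variance is twice the mean.
moments <- rbind(pooled   = c(mean = mean(q.p), var = var(q.p)),
                 averaged = c(mean = mean(q.a), var = var(q.a)))
print(round(moments, 2))
```

For the pooled version the mean and variance should come out near $3$ and $6,$ as $\mathsf{Chisq}(3)$ requires. For the averaged version the variance should be well above twice the mean, which no chi-squared distribution allows.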
Best Answer
There appears to be a difference in the interpretation of a statistical formula. One quick, simple, and compelling way to resolve such differences is to simulate the situation. Here, you have noted there will be a difference when the players play different numbers of games. Let's therefore retain every aspect of the question but change the number of games played by the second player. We will run a large number ($10^5$) of iterations, collecting the two versions of the $F$ statistic in each case, and draw histograms of their results. Overplotting these histograms with the $F$ distribution ought to determine, without any further debate, which formula (if any!) is correct.
Here is `R` code to do this. It takes only a couple of seconds to execute.

Although it is unnecessary, this code uses the common mean ($375$) and pooled standard deviation (computed as `s` in the first line) for the simulation. Also of note is that the histograms are drawn on logarithmic scales, because when the numbers of games get small (`n2`, equal to $3$ here), the $F$ distribution can be extremely skewed.

Here is the output. Which formula actually matches the $F$ distribution (the red curve)?
(The difference in the right hand side is so dramatic that even just $100$ iterations would suffice to show its formula has serious problems. Thus in the future you probably won't need to run $10^5$ iterations; one-tenth as many will usually do fine.)
If you like, modify this to fit some of the other examples you have looked at.
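
As a starting point for such modifications, here is a minimal sketch of this kind of simulation. It is not the code from this answer, and it does not reproduce the two competing formulas from the question: it only checks the standard one-way ANOVA $F$ statistic for two groups against the $F(1, n_1+n_2-2)$ distribution. The values `n1 = 5` and `s = 50` are hypothetical stand-ins; only the common mean $375$ and `n2 = 3` come from the text above.

```r
set.seed(2022)
m  <- 10^4   # iterations
n1 <- 5      # hypothetical: number of games for the first player
n2 <- 3      # number of games for the second player, as in the text
mu <- 375    # common mean, as in the text
s  <- 50     # hypothetical stand-in for the pooled standard deviation

f.stat <- replicate(m, {
  x1 <- rnorm(n1, mu, s)
  x2 <- rnorm(n2, mu, s)
  grand <- mean(c(x1, x2))
  # Between-group mean square (1 degree of freedom for two groups).
  msb <- n1 * (mean(x1) - grand)^2 + n2 * (mean(x2) - grand)^2
  # Within-group mean square: the pooled variance.
  msw <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  msb / msw
})

# Histogram on a logarithmic scale; the density of log(F) is f(e^y) * e^y.
hist(log(f.stat), freq = FALSE, breaks = 50, col = "skyblue2",
     main = "Standard F statistic", xlab = "log(F)")
curve(df(exp(x), 1, n1 + n2 - 2) * exp(x), add = TRUE, lwd = 2, col = "red")
```

To test another formula, replace the last line inside `replicate` with that formula and see whether the histogram still follows the red curve.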