Solved – R code for two sample t-test in case of equal variances

hypothesis-testing, r, self-study

In R, how can I perform a two sample t-test in the case of equal variances?

My actual problem is to compare different algorithms for length measurements in machine vision.

I have studied Applied Statistics and Probability for Engineers by Montgomery & Runger and I have tried to implement the test in R and compare the results to the R command t.test.

This is my code:

# Douglas C. Montgomery, George C. Runger
# Applied Statistics and Probability for Engineers
# Third Edition
# 10-3.1 Hypotheses Tests for a Difference in Means, Variances Unknown, p.337
# Case 1: sigma_1^2 = sigma_2^2 = sigma^2
#
# See also: EXAMPLE 10-5, p.339
two_sample_t_test_equal_variance <- function(x1,x2,Delta_0,alpha)
{
   n1 <- length(x1)
   n2 <- length(x2)
   dof <- n1+n2-2
   S1_squared <- var(x1)
   S2_squared <- var(x2)
   #pooled estimator:
   Sp_squared <- ((n1-1)*S1_squared + (n2-1)*S2_squared)/(n1+n2-2) 
   x1_bar <- mean(x1)
   x2_bar <- mean(x2)
   Sp <- sqrt(Sp_squared)
   t0 <- (x1_bar - x2_bar - Delta_0)/(Sp*sqrt(1/n1+1/n2))
   t_half_alpha_dof <- -qt(alpha/2,dof)
   reject_H0 <- (t0 > t_half_alpha_dof || t0 < - t_half_alpha_dof)
   if ( reject_H0 ) {
      cat("Reject H0 (alpha =",alpha,").\nH0 is mu1 - mu2 =", Delta_0, "\n\n" )
   } else {
      cat("Cannot reject H0 (alpha =",alpha,").\nH0 is mu1 - mu2 =", Delta_0, "\n\n" )
   }

   test_result <- t.test(x1, x2, alternative="two.sided", mu=Delta_0, 
                          paired=FALSE, var.equal = TRUE )

   accept_H0_by_R <- (test_result$p.value > alpha)
   if ( reject_H0 != !accept_H0_by_R ) {
      cat("WARNING: R t.test gives a different answer for accepting H0.\n" )
   }

   rel_error_warn <- 0.01
   if ( abs(test_result$statistic - t0)/abs(test_result$statistic) > rel_error_warn ) {
      cat("WARNING: t0 relative error is >", rel_error_warn, "\n"  )
   }
}

# data from EXAMPLE 10-5, p.339
cat1<-c(91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21)
cat2<-c(89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75)
two_sample_t_test_equal_variance(cat1,cat2,0,0.05)

Is my invocation of t.test correct for the case at hand?

Is there any error in the code?

I tagged the question as "homework" even though I have been out of school for more than 10 years; I am self-studying statistics for job-related reasons.

Thank you.
Alessandro

Best Answer

If you were still in school then you might have a couple points taken off for saying that you accept the null hypothesis. The purists (well frequentist purists) will always say that we never accept the null, just fail to reject it.

As a style thing, generally test functions will return an object with the test statistic, p-value, etc. and not print anything. Then a print method will be used to print the results nicely. But for learning or simple use what you have done is fine (I would go the other route if you are planning on building on this, or doing more of your own tests).
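To illustrate the return-an-object style, here is a minimal sketch rather than a definitive implementation. The function name `pooled_t_test` is hypothetical. It reuses R's existing `"htest"` class (the class that `t.test` itself returns) so that the built-in print method handles the formatting, and it assumes a two-sided alternative only.

```r
# Sketch: pooled-variance two-sample t-test that returns an object
# instead of printing. `pooled_t_test` is a hypothetical name.
pooled_t_test <- function(x1, x2, Delta_0 = 0) {
  data_name <- paste(deparse(substitute(x1)), "and", deparse(substitute(x2)))
  n1 <- length(x1)
  n2 <- length(x2)
  dof <- n1 + n2 - 2
  # Pooled variance estimator
  Sp_squared <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / dof
  t0 <- (mean(x1) - mean(x2) - Delta_0) / sqrt(Sp_squared * (1 / n1 + 1 / n2))
  result <- list(
    statistic   = c(t = t0),
    parameter   = c(df = dof),
    p.value     = 2 * pt(-abs(t0), dof),   # two-sided p-value
    estimate    = c("mean of x" = mean(x1), "mean of y" = mean(x2)),
    null.value  = c("difference in means" = Delta_0),
    alternative = "two.sided",
    method      = "Two Sample t-test (pooled variance, hand-rolled sketch)",
    data.name   = data_name
  )
  class(result) <- "htest"   # lets print() dispatch to the built-in htest method
  result
}

# Data from EXAMPLE 10-5 in the question
cat1 <- c(91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21)
cat2 <- c(89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75)
res <- pooled_t_test(cat1, cat2)
print(res)                 # formatted like t.test output
reject_H0 <- res$p.value < 0.05
```

The caller decides what to do with `res$p.value`, so the significance level no longer has to be an argument of the test function.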

In your last cat statement you use relative_error_warning, but I don't see it defined anywhere, did you mean rel_error_warn? Not that that line is ever likely to be run.

Everything looks correct, both your calculations and your call to t.test. I would expect the only differences you ever see to come from rounding error or from the handling of missing values.
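The missing-value case can be seen in a small sketch. The data below are made up for illustration (they are not from the book): `mean` and `var` propagate `NA` by default, while `t.test` drops missing observations from each sample before computing anything.

```r
# Hypothetical data: the first sample has one missing measurement.
x1 <- c(91.50, 94.18, NA, 95.39, 91.79)
x2 <- c(89.19, 90.95, 90.46, 93.21, 97.19)

# By default these return NA, so a hand-computed t0 would be NA too.
hand_mean <- mean(x1)
hand_var  <- var(x1)

# t.test() removes the NA first, so it still returns a finite statistic.
ref <- t.test(x1, x2, var.equal = TRUE)

# To match it by hand, drop the missing values before computing anything.
x1_ok <- x1[!is.na(x1)]
dof   <- length(x1_ok) + length(x2) - 2
Sp2   <- ((length(x1_ok) - 1) * var(x1_ok) + (length(x2) - 1) * var(x2)) / dof
t0    <- (mean(x1_ok) - mean(x2)) / sqrt(Sp2 * (1 / length(x1_ok) + 1 / length(x2)))
```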

Related Solutions

Solved – Two sample test for both equal variance and mean

You could do a likelihood ratio test. Calculate the MLE for each data set separately:

$$ L_1 \equiv \max_{\mu_{1}, \sigma_{1}} L_{1}(\mu_{1}, \sigma_{1}) $$

$$ L_2 \equiv \max_{\mu_{2}, \sigma_{2}} L_{2}(\mu_{2}, \sigma_{2}) $$

where $L_1$ is the log-likelihood function for the first data set and $L_2$ is the log-likelihood function for the second. Then, if the two data sets are independent, the maximized log-likelihood for the full data set (i.e. the two data sets together) is $L_1 + L_2$. This is the maximized log-likelihood when the two data sets are not restricted to having the same mean and variance.

Now, to get the MLE under the constraint that the two populations have the same mean and the same variance, you calculate

$$ L_{0} = \max_{\mu, \sigma} L(\mu, \sigma) $$

where $L$ is the log-likelihood function for the full data set. Then, under the null hypothesis you specified in your question,

$$ \lambda = 2 \bigg( (L_1 + L_2) - L_0 \bigg) $$

has an approximate (i.e. asymptotic) $\chi^2$ distribution on 2 degrees of freedom, assuming that the null hypothesis being tested doesn't include $\sigma_1 = \sigma_2 = 0$, which clearly can't be the case if you observe non-zero variance in your data. You can use that null distribution for significance testing.

Note: The joint MLE for the normally distributed data is the sample mean:

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i $$

and the sample variance:

$$ \hat{\sigma}^{2} = \frac{1}{n} \sum_{i=1}^{n} (X_i-\hat{\mu})^2$$
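The steps above can be sketched in R as follows. This is an illustration under the normality assumption, not a packaged test: the helper names `normal_loglik_max` and `lrt_equal_mean_and_variance` are hypothetical, and the example data are the two samples from the t-test question, used only to have something to run.

```r
# Maximized normal log-likelihood for one sample, using the MLEs
# mu_hat = mean(x) and sigma2_hat = (1/n) * sum((x - mu_hat)^2).
normal_loglik_max <- function(x) {
  n <- length(x)
  sigma2_hat <- mean((x - mean(x))^2)   # divisor n, not n - 1
  -n / 2 * (log(2 * pi * sigma2_hat) + 1)
}

# Likelihood ratio test of H0: equal means AND equal variances.
lrt_equal_mean_and_variance <- function(x1, x2) {
  L1 <- normal_loglik_max(x1)           # unrestricted fit, sample 1
  L2 <- normal_loglik_max(x2)           # unrestricted fit, sample 2
  L0 <- normal_loglik_max(c(x1, x2))    # one common mean and variance
  lambda <- 2 * ((L1 + L2) - L0)
  # Asymptotic chi-squared reference distribution on 2 degrees of freedom
  list(lambda = lambda,
       p.value = pchisq(lambda, df = 2, lower.tail = FALSE))
}

x1 <- c(91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21)
x2 <- c(89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75)
lrt <- lrt_equal_mean_and_variance(x1, x2)
lrt$lambda
lrt$p.value
```

With only 8 observations per group the chi-squared approximation is rough, so the p-value from such small samples should be read with caution.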

Solved – Two-Sample t-Test for Equal Means with unequal variances for large samples

While you can compute the z-statistic, actually an ordinary Welch t-test will do that just fine - in R that's t.test with all its default options.

The form of test statistic is the same in both cases. The only difference is in which table is used, and if the size of the smaller group is large enough, the tests will give almost identical p-values.

The Welch test will handle very large sample sizes.

e.g. in R:

> x=rnorm(1e7,1.00001,1)
> y=rnorm(1e7,1.00002,2)
> t.test(x,y)

    Welch Two Sample t-test

data:  x and y
t = 0.9052, df = 14708415, p-value = 0.3654
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0007458214  0.0020259201
sample estimates:
mean of x mean of y 
 0.999757  0.999117

I don't see a problem

> # compare:
> 2*pnorm((-abs(mean(y)-mean(x))/sqrt(var(y)/length(y)+var(x)/length(x))))
[1] 0.3653657

The two p-values agree to all four decimal places that t.test prints (0.3654 versus 0.3653657).

If that's not what you want, you need to more carefully explain what you do want.

Example with very different $n$:

> x=rnorm(1e7,1.00001,1)
> y=rnorm(1e2,1.002,2)
> t.test(x,y)

    Welch Two Sample t-test

data:  x and y
t = 0.7382, df = 99.001, p-value = 0.4622
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2409398  0.5264124
sample estimates:
mean of x mean of y 
0.9998066 0.8570703 

> 2*pnorm((-abs(mean(y)-mean(x))/sqrt(var(y)/length(y)+var(x)/length(x))))
[1] 0.4604087

Once we're at 99 d.f. for the Welch test, we start to notice a small difference in p-value from the asymptotic result, but at 99 d.f. we're not really in the 'consider it as converged to normal' region.
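To make "same statistic, different table" concrete, here is a small sketch that computes the Welch statistic and its Welch-Satterthwaite degrees of freedom by hand and then looks the statistic up in both reference distributions. The sample sizes and the seed are arbitrary choices for illustration, not the ones used above.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(1000, mean = 1, sd = 1)
y <- rnorm(100,  mean = 1, sd = 2)

# Squared standard errors of the two sample means
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# The statistic is identical for the z-test and the Welch t-test
stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

p_t <- 2 * pt(-abs(stat), df = welch_df)   # t table (what t.test uses)
p_z <- 2 * pnorm(-abs(stat))               # normal table (z-test)
c(welch_df = welch_df, p_t = p_t, p_z = p_z)
```

Because the t distribution has heavier tails than the normal, `p_z` is never larger than `p_t`, and the gap shrinks as `welch_df` grows.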
