Solved – Comparing means with unequal variance and very different sample sizes

Tags: mean, sample-size, small-sample, statistical-significance, t-test

I am trying to compare the means of the same variable between men and women. These are the statistics:

     N        Mean        Variance    Coef. Var.     Gender    
   2000      26.12         10.89         0.13         Male        
     50      56.10         25.01         0.09        Female

Neither group is normally distributed, but taking the log makes both pretty darn close. What is the appropriate way to test for a difference in means between males and females? Should I use the log or not? Any additional advice on doing this in Stata would be helpful.

My initial reaction is that females fare better than males, but I want to be statistically rigorous.

Best Answer

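The traditional test for comparing two sample means is the t-test. It makes no assumption about the sizes of the samples, so unequal group sizes are not a problem in themselves. Your variances do differ, though (10.89 vs 25.01), and the smaller group has the larger variance. In that situation the pooled-variance (Student) t-test is unreliable, so use Welch's unequal-variance version instead. In Stata that is `ttest y, by(gender) unequal` (substitute your own variable names), or `ttesti` with the `unequal` option if you only have the summary statistics.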
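A minimal sketch of Welch's test computed directly from the summary statistics in the question, shown next to the pooled Student t for comparison. It uses only the Python standard library, so it reports the t statistic and the Welch-Satterthwaite degrees of freedom but not a p-value; the function names are hypothetical, and the inputs are the question's means, variances and sample sizes.

```python
import math

def welch_from_stats(m1, v1, n1, m2, v2, n2):
    """Welch's t statistic for m2 - m1 and the Welch-Satterthwaite df.

    v1 and v2 are sample variances (not standard deviations).
    """
    se1, se2 = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m2 - m1) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

def pooled_from_stats(m1, v1, n1, m2, v2, n2):
    """Student's pooled-variance t statistic (assumes equal population variances)."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Summary statistics from the question: (mean, variance, n)
male = (26.12, 10.89, 2000)
female = (56.10, 25.01, 50)

t_w, df_w = welch_from_stats(*male, *female)
t_p, df_p = pooled_from_stats(*male, *female)
print(f"Welch:  t = {t_w:.1f}, df = {df_w:.1f}")
print(f"Pooled: t = {t_p:.1f}, df = {df_p}")
```

With these numbers Welch's t is roughly 42 on about 50 degrees of freedom, far beyond the two-sided 5% critical value of about 2.01, so the difference itself is not in doubt. The pooled t comes out near 62.5 because the 2,000 low-variance male observations dominate the pooled variance, which understates the uncertainty in the female mean.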

You also raise the normality assumption. Even if the population is not normally distributed, the Central Limit Theorem says that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. The test is therefore approximate: with 2,000 men the approximation is very good, but the female sample (n = 50) is on the small side.
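To see what the Central Limit Theorem does and does not promise, here is a small simulation sketch. It assumes a hypothetical lognormal population (log-scale mean 3.0, log-scale SD 0.3, chosen only to make the raw data visibly right-skewed, not estimated from the question's data) and compares the skewness of individual observations with the skewness of means of samples of size 50.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness (biased moment estimator): mean of cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def sample_means(n, reps, rng, mu=3.0, sigma=0.3):
    """Means of `reps` samples of size n from a lognormal population.

    mu and sigma are hypothetical log-scale parameters, not fitted values.
    """
    return [statistics.fmean(rng.lognormvariate(mu, sigma) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(1)
raw = [rng.lognormvariate(3.0, 0.3) for _ in range(20000)]
means_50 = sample_means(50, 2000, rng)

print(f"skewness of raw data:       {skewness(raw):.3f}")
print(f"skewness of means (n = 50): {skewness(means_50):.3f}")
```

The raw observations stay skewed (near 1 for these parameters), while the skewness of the sample means shrinks roughly by a factor of the square root of 50. The data never become normal; only the mean's sampling distribution does, and that is what the t-test relies on.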

Finally, the t-test will give a different result on the original and the logged data, because on the log scale you are comparing means of logs (equivalently, geometric means) rather than arithmetic means. Do you have a specific reason, based on your data, to use the logarithm? Perhaps there is another assumption you would like to test about the behavior of the log of your data? Do not take the log simply to produce a normal curve if there is no deeper meaning behind it, but for fun, compare the two results anyway!
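The sketch below makes that difference concrete on hypothetical data: lognormal samples whose sizes and rough spreads mimic the question's groups (the log-scale parameters are assumptions, not estimates from the real data). It runs the same Welch t statistic on the raw values and on their logs, then back-transforms the difference in log means, which gives a ratio of geometric means rather than a difference of arithmetic means.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(va + vb)

# Hypothetical right-skewed data standing in for the two groups.
rng = random.Random(7)
men = [rng.lognormvariate(3.25, 0.13) for _ in range(2000)]
women = [rng.lognormvariate(4.02, 0.09) for _ in range(50)]

log_men = [math.log(x) for x in men]
log_women = [math.log(x) for x in women]

t_raw = welch_t(men, women)          # compares arithmetic means
t_log = welch_t(log_men, log_women)  # compares means of logs

# exp(difference of log means) equals the ratio of geometric means.
ratio = math.exp(statistics.fmean(log_women) - statistics.fmean(log_men))
gm_ratio = statistics.geometric_mean(women) / statistics.geometric_mean(men)

print(f"t on raw data: {t_raw:.2f}")
print(f"t on logs:     {t_log:.2f}")
print(f"ratio of geometric means: {ratio:.3f} (check: {gm_ratio:.3f})")
```

The two t statistics answer different questions, so pick the scale whose comparison you actually care about: "how many units higher" (raw) or "how many times higher, in geometric-mean terms" (log).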
