Statistical Tests – How to Compare Two Groups Generated Through Subtraction of Different Control Groups?

t-test

I am still learning how to apply statistics properly, so please bear with me (and point it out) if I ask something stupid. I suspect this question has already been answered, but I don't know the right search terms to find the answer. I have searched both here and on Google without finding anything useful. Feel free simply to point me to the right terminology for asking my question properly.

I will soon be doing a simple statistical comparison between two groups to find out whether their means are significantly different. However, each group is created by subtracting its data from the mean of a separate control dataset. I was planning simply to do a two-sample t-test, but my statistical spidey sense tingled. I feel this is an unreasonable comparison, because these data have been generated from a distribution whose variation hasn't been taken into account.

Could I use a two-sample t-test if the two control groups were not significantly different from each other? If so, could a significant result from that test be taken to mean that the two groups are significantly different?

I would like to know whether my statistical intuition is correct in thinking this isn't really a valid way to make the comparison. I don't know whether it makes any difference, but $n_{\text{control}} = 5$ and $n_{\text{samples}} = 8$, and these sizes are the same for both groups. All the distributions are expected to be normal.

For clarity my data would be:

$\bar{C}_1 - A_{1i} = X_{1i}$

$\bar{C}_2 - A_{2i} = X_{2i}$

Where $C_1$ and $C_2$ are the control data and $A_1$ and $A_2$ are the initial datasets.
My aim is to compare $X_1$ with $X_2$. To be clear, the two control groups are different datasets.
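A minimal R sketch of the worry behind question 1, using made-up numbers (the vectors `C1` and `A1` and their simulation parameters are hypothetical, not the real data). Because $\bar{C}_1$ is a single constant shared by every $X_{1i}$, subtracting it shifts the values but leaves their spread unchanged, so a t-test run on the $X_{1i}$ alone only sees the variability of the $A_{1i}$. If the controls and samples are independent, $\operatorname{Var}(\bar{X}_1) = \sigma_{C_1}^2/n_{\text{control}} + \sigma_{A_1}^2/n_{\text{samples}}$, and the naive standard error drops the first term.

```r
# Hypothetical data, for illustration only
set.seed(1)
C1 <- rnorm(5, mean = 10, sd = 1)   # control group, n_control = 5
A1 <- rnorm(8, mean = 7,  sd = 1)   # initial data,  n_samples = 8

# X_1i = mean(C1) - A_1i
X1 <- mean(C1) - A1

# Subtracting a constant does not change the sample SD
sd_X1 <- sd(X1)
sd_A1 <- sd(A1)
isTRUE(all.equal(sd_X1, sd_A1))   # TRUE

# Standard error of mean(X1): the naive one a t-test on X1 would use,
# versus one that also carries the uncertainty in mean(C1)
se_naive    <- sd(X1) / sqrt(length(X1))
se_with_ctl <- sqrt(var(C1) / length(C1) + var(A1) / length(A1))
c(naive = se_naive, with_controls = se_with_ctl)
```

The second standard error is always the larger of the two whenever the control values vary at all.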

My specific questions on this are:

  1. Am I correct that a t-test that ignores the variability of the controls is unreliable in this situation?
  2. If so, what is the correct way to compare these two groups?
  3. If the standard deviations of the two control groups are not significantly different, could I use the t-test as is?
  4. If the answer to 3 is yes, could the mean of the pooled data from both controls be used instead? (Somehow this feels wrong to me.)

Apologies in advance if someone needs to retitle the question.

Thank you in advance for your help.

Best Answer

This is really a comment that I hope will lead you to a formal answer to your question.

For Group 1, with $n_1$ subjects, maybe you have pre-treatment scores $C_{1j}$ and paired post-treatment scores $A_{1j},$ for $j = 1,2,\dots, n_1,$ for each of the $n_1$ subjects.

Then, if the data are normal, you could compare the C's with the A's using a paired t test to see whether the treatment made a difference in Group 1. [A paired t test is the same as a one-sample t test on the differences $X_{1j}$.]
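A quick R check of the bracketed claim, on hypothetical simulated scores (the names `C` and `A` and the simulation settings are made up for illustration): `t.test(C, A, paired = TRUE)` returns the same t statistic and P-value as `t.test()` applied to the differences `C - A`.

```r
set.seed(2)
n <- 8
C <- rnorm(n, mean = 10, sd = 2)      # hypothetical pre-treatment scores
A <- C + rnorm(n, mean = -1, sd = 1)  # hypothetical post-treatment scores, same subjects

paired   <- t.test(C, A, paired = TRUE)  # paired t test
one_samp <- t.test(C - A)                # one-sample t test on the differences

c(paired = unname(paired$statistic), one_sample = unname(one_samp$statistic))
c(paired = paired$p.value,           one_sample = one_samp$p.value)
```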

Similarly, for Group 2, with $n_2$ subjects, use another paired t test to see if the treatment made a difference in Group 2.

Then you have two independent samples: one of differences $X_{1j}$ ($n_1$ of them) and the other of differences $X_{2j}$ ($n_2$ of them). You can use a Welch two-sample t test to see whether the gains (or losses) due to the treatments are the same for the two groups.
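To make the Welch test less of a black box, here is a sketch in R that computes the Welch statistic and its Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`, whose default is `var.equal = FALSE`. The vectors `x1` and `x2` are hypothetical simulated differences, with deliberately unequal sizes and spreads.

```r
# Hypothetical differences for two independent groups
set.seed(3)
x1 <- rnorm(12, mean = 0, sd = 2)
x2 <- rnorm(15, mean = 2, sd = 0.5)

v1 <- var(x1) / length(x1)   # squared standard error of mean(x1)
v2 <- var(x2) / length(x2)   # squared standard error of mean(x2)

t_welch  <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)   # two-sided P-value

# Compare with R's built-in Welch test
builtin <- t.test(x1, x2)
c(by_hand = t_welch,  builtin = unname(builtin$statistic))
c(by_hand = df_welch, builtin = unname(builtin$parameter))
c(by_hand = p_welch,  builtin = builtin$p.value)
```

The non-integer degrees of freedom (like `df = 20.32` in the output further down) come from this formula.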

What I'm describing is sometimes called a difference in differences (DID) design, which you can search for on this site or online. Particularly if the variances of the $X_{1j}$ and the $X_{2j}$ are about equal, you might use an appropriate ANOVA, potentially testing everything from a single ANOVA table.
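As a small illustration of the ANOVA route, and only its simplest piece rather than a full DID model: when the variances are assumed equal, a one-way ANOVA of the differences on a group factor gives exactly the same P-value as the pooled (`var.equal = TRUE`) two-sample t test, with $F = t^2$. The data below are hypothetical simulated differences, and the column names `diff` and `group` are made up for the sketch.

```r
set.seed(4)
x1 <- rnorm(20, mean = 0, sd = 1)   # hypothetical differences, Group 1
x2 <- rnorm(20, mean = 1, sd = 1)   # hypothetical differences, Group 2

d <- data.frame(diff  = c(x1, x2),
                group = factor(rep(c("G1", "G2"), times = c(length(x1), length(x2)))))

fit   <- aov(diff ~ group, data = d)
tab   <- summary(fit)[[1]]          # the ANOVA table as a data frame
F_val <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

pooled <- t.test(x1, x2, var.equal = TRUE)   # pooled-variance t test
c(F = F_val, t_squared = unname(pooled$statistic)^2)
c(anova = p_aov, pooled_t = pooled$p.value)
```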

Example: Very briefly, here are P-values for the three t tests based on fictitious data with $n_1=n_2 = 20,$ in which the treatment made no significant difference in Group 1 and did make a significant difference in Group 2. Then, not surprisingly, differences for the two groups show a significant difference.

summary(x1); length(x1); sd(x1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-4.8013 -0.9169  0.7345  0.4436  1.7917  3.0391 
[1] 20       # sample size
[1] 2.000593 # sample SD

summary(x2); length(x2); sd(x2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.221   4.867   5.075   5.068   5.348   5.606 
[1] 20
[1] 0.373142

Only P-values are shown for the paired t tests:

t.test(x1)$p.val
[1] 0.3338668     # not signif. at 5% level

t.test(x2)$p.val
[1] 3.151544e-23  # P-val near 0; highly signif

Full output from the Welch two-sample t test is as follows:

t.test(x1,x2)

        Welch Two Sample t-test

data:  x1 and x2
t = -10.163, df = 20.32, p-value = 2.049e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.572848 -3.676284
sample estimates:
 mean of x mean of y 
 0.4435764 5.0681425
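For anyone who wants to try the same three-test workflow, here is a sketch in R that generates comparable fictitious differences and collects the three P-values. The means and SDs are hypothetical choices loosely matching the summaries above; this is not the data behind the output shown, so the exact numbers will differ.

```r
# Hypothetical parameters; not the data behind the output above
set.seed(5)
x1 <- rnorm(20, mean = 0, sd = 2)    # Group 1 differences: no treatment effect
x2 <- rnorm(20, mean = 5, sd = 0.4)  # Group 2 differences: clear treatment effect

p_values <- c(
  group1 = t.test(x1)$p.value,       # paired test for Group 1 (one-sample on differences)
  group2 = t.test(x2)$p.value,       # paired test for Group 2
  welch  = t.test(x1, x2)$p.value    # do the gains differ between the groups?
)
signif(p_values, 3)
```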

Note: I have made several assumptions in the above. If what I have shown does not seem quite right, then please edit your question to give a more explicit picture of your actual scenario. Then maybe someone else can provide alternate ideas.