Solved – How can two different experiments be compared when they have different controls?

multiple-comparisons normalization t-test

Experiment 1: mice of genotype1 compared to their wild type littermates,
Experiment 2: mice of genotype2 compared to their wild type littermates.

(Each experiment in its own right is pretty straightforward: To determine whether the experimental treatment (=the genotype) has a significant effect on the variable measured, do a two-sample t-test.)

My problem is the following:
Experiments 1 and 2 were conducted independently of each other. They measure the same variable and have different experimental groups (mice of two different genotypes), with their respective wild type littermates serving as the control group (the wild type littermates of both experiments have the same genotype, but are from different cages and different generations of mice and cannot, for various reasons, be considered identical).

Another confounding problem is that experiment 1 and 2 were conducted and analyzed by 2 different observers, which is yet another reason to consider them independent from each other.

What I would like to do is compare the mice of genotype1 with the mice of genotype2. I could basically just go ahead and do that with the data I have, but because of the reasons outlined above and because it's considered good practice in mouse experiments to always only compare littermates (in an effort to reduce inter-subject variability), this is not feasible.

So probably the only way to achieve this comparison (somewhat dirtily) is to somehow normalize the data. Unfortunately, there is no inherent association or pairing between subjects in the experimental and control groups, so the only strategy I can think of is to subtract the control group mean from the measured variable of each experimental subject and divide by the control group SD. I would then compare the normalized data using an independent two-sample t-test (after testing for equal variances).
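To make the proposed normalization concrete, here is a minimal sketch using only the Python standard library. All numbers are made up for illustration, and the function names are hypothetical. Note one deliberate swap: instead of testing for equal variances first, the sketch computes Welch's t statistic, which does not assume equal variances. It stops at the statistic and degrees of freedom and does not compute a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_control(treated, control):
    """Express each treated value as a z-score relative to its own control group."""
    m, s = mean(control), stdev(control)  # sample mean and sample SD of the controls
    return [(x - m) / s for x in treated]


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of mean(a)
    vb = stdev(b) ** 2 / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up measurements, for illustration only
geno1 = [12.0, 14.0, 13.0, 15.0]
wt1 = [10.0, 11.0, 9.0, 10.0]
geno2 = [22.0, 25.0, 23.0, 26.0]
wt2 = [20.0, 21.0, 19.0, 20.0]

z1 = normalize_to_control(geno1, wt1)
z2 = normalize_to_control(geno2, wt2)
t, df = welch_t(z1, z2)
print(f"t = {t:.3f}, df = {df:.2f}")
```

One caveat with this approach: the control mean and SD are themselves estimates from small samples, and the test on the normalized values treats them as if they were known exactly, so it likely understates the real uncertainty.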

I am skeptical whether this approach is legitimate and would greatly appreciate any comment or clarifying question.

Best Answer

"My problem is the following: Experiment 1 and 2 were conducted in an independent way from each other. They measure the same variable and have different experimental groups (mice of two different genotypes), with their respective wild type littermates serving as control group (the wild type littermates of both experiments have the same genotype, but are from different cages & different generation of mice and cannot, for various reasons, be considered identical)."

If you are unwilling to consider the controls identical due to a difference in generation, then why are you willing to use a null hypothesis that mutant and wild type are identical? In the latter case you know there is a difference, and probably even have some theory that predicts an effect on the outcome measure. It sounds like the mere presence of a difference (the answer significance testing will provide) is not of interest; instead, you are interested in the apparent effect size.
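As a rough illustration of reporting an effect size rather than only a significance test, the sketch below computes the raw mean difference and a standardized difference (Cohen's d with a pooled SD) for one experiment. Cohen's d is just one common choice and is my assumption here, not something the question specifies; the data are made up and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference(treated, control):
    """Raw effect size: difference in group means, in the units of the measurement."""
    return mean(treated) - mean(control)


def cohens_d(treated, control):
    """Standardized effect size: mean difference divided by the pooled sample SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * stdev(treated) ** 2 + (n2 - 1) * stdev(control) ** 2
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Made-up measurements, for illustration only
mutant = [12.0, 14.0, 13.0, 15.0]
wild_type = [10.0, 11.0, 9.0, 10.0]

print("mean difference:", mean_difference(mutant, wild_type))
print("Cohen's d:", round(cohens_d(mutant, wild_type), 3))
```

With only a handful of animals per group, either number comes with wide uncertainty, so it is worth reporting an interval alongside it.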

"it's considered good practice in mouse experiments to always only compare littermates (in an effort to reduce inter-subject variability), this is not feasible."

This brings up the question of what "population" you are attempting to draw inferences about. By only using littermates, it seems that the population is only that specific set of animals. In turn, this brings up the question of how you will justify generalizing a treatment effect to other sets of animals. Do you have reason to expect that the possibly large littermate effect does not interact with the treatment? Also, if genotype differences affect individuals in a variety of ways depending on other factors, this would seem to be of scientific interest. Perhaps the individual variability should be studied rather than reduced.

If the controls have similar outcomes in both studies, this could be taken as evidence (albeit limited evidence with n=2 studies, but that is all you have) that cage/generation/observer/unknown effects are not very strong. I would just graphically compare the confidence intervals of mutant1, mutant2, wt1 and wt2. Or better, if you have the individual data, compare the distributions.
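A minimal sketch of the interval comparison, using only the Python standard library. The choice of interval is an assumption on my part: this uses a percentile bootstrap of the group mean, which is only one option and is not very reliable with very small groups. The data are made up, the function names are hypothetical, and the intervals are printed rather than plotted.

```python
import random
from statistics import mean


def bootstrap_ci(data, level=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up measurements, for illustration only
groups = {
    "mutant1": [12.0, 14.0, 13.0, 15.0],
    "wt1": [10.0, 11.0, 9.0, 10.0],
    "mutant2": [22.0, 25.0, 23.0, 26.0],
    "wt2": [20.0, 21.0, 19.0, 20.0],
}

cis = {name: bootstrap_ci(values) for name, values in groups.items()}
for name, (lo, hi) in cis.items():
    print(f"{name}: mean = {mean(groups[name]):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")

# First check: do the two control groups look alike?
print("wt1 and wt2 intervals overlap:", intervals_overlap(cis["wt1"], cis["wt2"]))
```

Overlap between the two control intervals is only a rough visual check, not a formal test. If the control intervals are far apart, that already suggests cage, generation or observer effects of a size comparable to the effects you want to compare.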

If the controls are not similar, and no one knows why beyond the vague "littermate effect", I would be skeptical that the experimental situation is understood and controlled well enough to draw any strong inference about treatment effects anyway. Instead of further research comparing treatments, the controls should be studied until the important influences on the outcome are understood well enough to get consistent results in different labs.
