Hypothesis Testing – Is There a Statistical Test to Compare Two Samples of Size 1 and 3?

hypothesis-testing, sample-size, t-test

For an ecology project, my lab group added vinegar to 4 tanks containing equal volumes of pond water: 1 control with no elodea (an aquatic plant) and 3 treatments with the same amount of elodea in each. The purpose of adding the vinegar was to reduce the pH. The hypothesis was that the tanks with elodea would return to their normal pH more quickly. This was indeed the case. We measured the pH of each tank daily for about two weeks. All the tanks eventually returned to their natural pH, but this took much less time for the tanks with elodea.

When we told our professor about our experimental design, he said that no statistical test can be performed on the data to compare the control to the treatment. His reasoning was that because there was no replicate for the control (we used only one control tank), we cannot calculate its variance, and so we can't compare the sample means of the control and the treatment. So my question is: is this true? I definitely understand what he means. For example, if you measured the height of one man and one woman, you couldn't draw conclusions about their respective populations. But we did 3 treatments, and the variance was small. Isn't it reasonable to assume that the variance would be similar in the control?

Update:

Thank you for the excellent answer. We got more water and elodea from the wetland and decided we would run the experiment again with smaller tanks but this time with 5 controls and 5 treatments. We were going to combine this with our original data but the starting pH of the tanks was different enough that it doesn't seem valid to consider the new experiment to be sampled from the same population as the original experiment.

We considered adding different amounts of elodea and trying to correlate speed of pH remediation (measured as time elapsed until pH returned to its original value) with amount of elodea, but we decided that wasn't necessary. Our objective is only to show that the elodea makes a positive difference, not to construct some kind of predictive model for exactly how the pH responds to differing amounts of elodea. It would be interesting to determine the optimal amount of elodea, but that's probably just the maximum amount that can survive. Trying to fit a regression curve to the data wouldn't be especially illuminating because of the various complicated changes that occur in the community when a large amount is added: the elodea dies and decomposes, new organisms start to dominate, and so on.

Best Answer

Note gung's question; it matters. I will assume that the treatment was the same for every tank in the treatment group.

If you can argue the variance would be equal for the two groups (which you would typically assume for a two-sample t-test anyway), you can do a test. You just can't check that assumption, no matter how badly it might be violated.

The concerns expressed in this answer to a related question are even more relevant to your situation, but there's less you can do about it.

[You ask whether it is reasonable to assume the variances are equal. We can't answer that for you; you would have to convince subject matter experts (i.e. ecologists) that it is a reasonable assumption. Are there other studies where such levels have been measured under both treatment and control? Others where similar tests (t-tests or ANOVA especially - I bet you can find a better precedent) have been done or similar assumptions made? Some form of general reasoning you can see to apply?]

If $\bar{x}$ is the sample mean of the treatment and $\bar{y}$ is the mean of the control, and both are from normal distributions with variance $\sigma^2$, then $\bar{x}-\bar{y}$ will have mean $\mu_x - \mu_y$ and variance $\sigma^2 (1/n_x + 1/n_y)$ irrespective of whether one of the $n$'s is 1.

So when $n_y$ is 1,

$$ \frac{(\bar{x}-\bar{y})}{s_x\sqrt{1/n_x+1}} $$

(where $s_x$ is the standard deviation computed from the treatments) will be $t$-distributed (with $n_x - 1$ degrees of freedom) under the null.
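To see why, here is a brief sketch under the same assumptions (normality, common variance $\sigma^2$, independent tanks). Under the null,

$$ Z = \frac{\bar{x}-\bar{y}}{\sigma\sqrt{1/n_x+1}} \sim N(0,1), \qquad V = \frac{(n_x-1)\,s_x^2}{\sigma^2} \sim \chi^2_{n_x-1}, $$

and $Z$ and $V$ are independent, since $s_x$ is independent of $\bar{x}$ for normal samples and of the single control observation $\bar{y}$. Hence

$$ \frac{Z}{\sqrt{V/(n_x-1)}} = \frac{\bar{x}-\bar{y}}{s_x\sqrt{1/n_x+1}} \sim t_{n_x-1}. $$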

You may notice that, with $s_x$ (the best available estimate of $\sigma$) used in place of the pooled standard deviation $s_p$, this is exactly the ordinary two-sample t-test formula with $n_y$ set to 1.
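As a rough illustration, the statistic above could be computed along these lines. This is only a sketch: the function names are my own, the recovery times are made-up numbers (not the asker's data), and the p-value is obtained by numerically integrating the t density with the standard library rather than by calling a statistics package. It relies on the normality and equal-variance assumptions discussed above.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, using Simpson's rule on the density over [0, |t|]."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|), by symmetry
    return max(0.0, 1.0 - central_mass)


def single_control_t_test(treatment, control):
    """Compare several treatment values with ONE control value.

    Assumes both groups are normal with the same variance; that
    assumption cannot be checked from a single control observation.
    """
    n_x = len(treatment)
    s_x = stdev(treatment)  # SD estimated from the treatment group only
    t_stat = (mean(treatment) - control) / (s_x * math.sqrt(1 / n_x + 1))
    df = n_x - 1
    return t_stat, df, t_two_sided_p(t_stat, df)


# Hypothetical days-to-recovery values, for illustration only.
treatment_days = [5.0, 6.0, 7.0]
control_days = 12.0

t_stat, df, p_value = single_control_t_test(treatment_days, control_days)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.4f}")
```

With only $n_x - 1 = 2$ degrees of freedom the critical value is large (about 4.3 at the 5% level), so the difference has to be big relative to $s_x$ before the test rejects.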

Edit:

Here's a simulated power curve for this test. The number of simulations was 10000 at the null and 1000 at each of the other points. As you can see, the rejection rate at the null is 0.05, and the power curve has the right shape, although it takes a large difference in population means to get decent power. That is, this test does what it is supposed to.

[Figure: simulated power curve]

(End edit)
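A simulation of the kind described in the edit could be sketched as follows. The details are assumptions on my part, since the original settings are not stated: $n_x = 3$ and $n_y = 1$ as in the question, unit variance in both groups, a 5% two-sided test, and an arbitrary grid of mean differences. The critical value uses the closed form available for the t distribution with 2 degrees of freedom.

```python
import math
import random
from statistics import mean, stdev

N_X = 3        # assumed treatment-group size (matches the question)
ALPHA = 0.05   # assumed two-sided significance level
C = 1 - ALPHA
# For 2 df, P(|T| <= t) = t / sqrt(2 + t^2); solving for t gives the
# two-sided critical value (about 4.303 at the 5% level).
T_CRIT = C * math.sqrt(2 / (1 - C * C))


def rejection_rate(delta, n_sims, seed=0):
    """Fraction of simulated experiments in which the test rejects.

    delta is the true difference in means, in units of the common SD.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(delta, 1.0) for _ in range(N_X)]  # treatment tanks
        y = rng.gauss(0.0, 1.0)                          # single control tank
        t_stat = (mean(x) - y) / (stdev(x) * math.sqrt(1 / N_X + 1))
        if abs(t_stat) > T_CRIT:
            rejections += 1
    return rejections / n_sims


# 10000 simulations at the null, 1000 elsewhere, as in the edit above.
for delta in (0, 2, 4, 6, 8, 10):
    n_sims = 10000 if delta == 0 else 1000
    print(delta, rejection_rate(delta, n_sims, seed=delta))
```

The rate printed for a difference of 0 estimates the type I error and should be close to 0.05; the remaining rows trace out the power curve.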

With sample sizes this small, however, the test will be somewhat sensitive to distributional assumptions.

If you're prepared to make different assumptions, or want to test equality of some other population quantity, some test may still be possible.

So all is not lost... but where possible, it's generally better to have at least some replication in both groups.