The problem is that, as you say, this is a very poorly designed experiment: there is no control group of sick people who received no medication, no group of sick people who got Type 1 but not Type 2, and no group who got Type 2 but not Type 1. I don't think any amount of statistics will let you reliably test your second and third hypotheses. For example, if you find that protein levels changed after the Type 2 treatment, you have no way of deciding whether the change is a delayed effect of Type 1 or simply a natural effect of the passage of time. So I won't offer suggestions for testing those hypotheses, as any result would be misleading.
Your first hypothesis can be tested if and only if you are confident that people do not get better without treatment. You cannot conclude this from your experiment, so you would need to know it from other evidence, e.g. clinical experience with this illness showing that people do not recover naturally. I have no idea whether this is realistic.
Assuming the condition in the above paragraph is correct, I would measure the difference in the sick people's protein levels at the end of the experiment (after they got both treatments) from their protein levels at the beginning (when they turned up sick but before getting any treatment).
First, look for evidence that the protein levels increased over this period. This is a one-sided, paired t test of the differences (hopefully improvements) measured above against zero.
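A minimal sketch of that test in R, using made-up protein levels (the variable names and values are illustrative, not from your data):

```r
# Hypothetical data: protein levels for the sick group before any
# treatment and after both treatments (same subjects, same order).
before <- c(3.1, 2.8, 3.4, 2.9, 3.0, 3.2)
after  <- c(4.0, 3.5, 4.2, 3.6, 3.9, 4.1)

# One-sided paired t test: H1 is that the mean difference (after - before)
# is greater than zero, i.e. protein levels increased.
t.test(after, before, paired = TRUE, alternative = "greater")
```

With paired = TRUE, t.test computes the differences itself, so this is equivalent to t.test(after - before, alternative = "greater", mu = 0).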
The second part of your hypothesis was that the improvement brings the sick people up to the level of the well people. Assume there is no controversy about the fact that the illness reduces protein levels in the first place (as this wasn't one of the hypotheses you wanted to check). In that case, compare the average protein level in the sick group at the end of the experiment with the average protein level in the well group. Again this is a one-sided t test (assuming protein levels are normally distributed), but this time comparing the two group means (rather than, as in the previous paragraph, comparing the mean improvement to zero).
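That comparison might look like this in R; again the data are invented for illustration:

```r
# Hypothetical data: post-treatment levels of the sick group, and levels
# of the well group (values are illustrative).
sick_post <- c(4.0, 3.5, 4.2, 3.6, 3.9, 4.1)
well      <- c(4.1, 3.8, 4.3, 3.9, 4.0, 4.2)

# One-sided two-sample t test: H1 is that the treated sick group's mean
# is still below the well group's mean. A non-significant result is
# consistent with (but does not prove) full recovery.
t.test(sick_post, well, alternative = "less", var.equal = TRUE)
```

Note the usual caveat that failing to reject "sick mean < well mean" is not positive evidence of equality; an equivalence test (e.g. TOST) would be the stricter way to claim the groups match.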
I don't think the set of measurements after treatment 1 but before treatment 2 can tell us anything.
I think you will find it easier to analyse this in R than in Matlab - R has many more statistical functions built in and ready to go. However, if my answer above is right, you only need t-tests, which are pretty straightforward. I would also advocate some graphical data analysis - if only to check plausibility, outliers, and distributions - which will certainly be easier in R.
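A quick sketch of the kind of graphical checks I mean, applied to hypothetical difference scores `d`:

```r
# Hypothetical difference scores (post - pre) for the sick group.
d <- c(0.9, 0.7, 0.8, 0.7, 0.9, 0.9)

hist(d, main = "Differences (post - pre)")  # rough look at the distribution
boxplot(d)                                  # flags outliers
qqnorm(d); qqline(d)                        # normality check relevant to the t test
```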
ANCOVA. I believe the econometric term is difference-in-differences (DID). You may also want to see the Wikipedia pages for ANCOVA and DID, as there may be important differences in assumptions between the two framings. The model can be estimated as a general linear model with ordinary least squares, though whether this is optimal will depend on the specific nature of your data. Here's some code for an OLS GLM anyway:
v3 <- with(data.frame(v2),
           data.frame(pre = c(pre1, pre2), post = c(post1, post2),
                      condition = rep(c(0, 1), c(4, 4))))  # reorganize the data into pre-post form with a dummy variable
summary(lm(post ~ scale(pre, scale = F) * condition, v3))  # centering pre scales out nonessential multicollinearity
Results:
Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      909.6146    40.5553  22.429 2.34e-05 ***
scale(pre, scale = F)              1.0405     0.1096   9.491 0.000688 ***
condition                       -155.9272    57.3534  -2.719 0.053058 .  
scale(pre, scale = F):condition   -0.1447     0.1533  -0.943 0.398892    
Residual standard error: 81.09 on 4 degrees of freedom
Multiple R-squared: 0.9764, Adjusted R-squared: 0.9588
F-statistic: 55.24 on 3 and 4 DF, p-value: 0.001033
A scatterplot using the ggplot2 package (figure omitted) and its code:

ggplot(v3, aes(x = pre, y = post, colour = factor(condition))) + geom_point() +
  stat_smooth(method = 'lm', formula = y ~ scale(x, scale = F))
Looks like the residuals are bigger for your second group. If you like, test the null hypothesis that they aren't, using leveneTest() from the car package:

leveneTest(summary(lm(post ~ pre, v3))$resid ~ factor(condition), v3)

This gives $F_{(1,6)}=6.2$, $p=.05$.
This heteroscedasticity violates an ANCOVA assumption, but that may not matter greatly (Olejnik & Algina, 1984).
If you want, it's easy to repeat the above after transforming your post scores to ranks using rank(). The transformation reduces heteroscedasticity $(F_{(1,6)}=2.5, p=.17)$, though the residuals are distributed a little less normally. The group difference comes out a little clearer, but the within-subjects differences get obscured slightly:
Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      5.5488993  0.4110978  13.498 0.000174 ***
scale(pre, scale = F)            0.0054333  0.0011114   4.889 0.008109 ** 
condition                       -2.0952137  0.5813763  -3.604 0.022680 *  
scale(pre, scale = F):condition -0.0002872  0.0015544  -0.185 0.862395    
And model fit worsens a little bit:
Residual standard error: 0.822 on 4 degrees of freedom
Multiple R-squared: 0.9357, Adjusted R-squared: 0.8874
F-statistic: 19.39 on 3 and 4 DF, p-value: 0.007594
And here's that scatterplot (figure omitted). You can see why this emphasizes the group effect relative to the within-subjects effect: ranking mostly wipes out the interaction and makes the confidence bands more even. Whether this is actually an improvement may depend on your purposes and, again, on the specific nature of your data. As for why you shouldn't use an independent-samples $t$ test on change scores, see "Best practice when analysing pre-post treatment-control designs". There's quite a lot of literature on the topic, and even some room for debate, but not within this answer.
Conclusion:
Your two groups appear to have been sampled from different populations. The second group scores lower in general, and lower pre-scores relate to lower post-scores. I see that changes are consistently negative in your second group, and changes in your first group are consistently $\ge0$, but you'd probably want to collect more observations of this difference in the relationship of pre-scores to post-scores across conditions before concluding that the difference in change generalizes to your samples' populations.
Reference
Olejnik, S. F., & Algina, J. (1984). Parametric ANCOVA and the rank transform ANCOVA when the data are conditionally non-normal and heteroscedastic. Journal of Educational and Behavioral Statistics, 9(2), 129–149.
Best Answer
Note gung's question; it matters. I will assume that the treatment was the same for every tank in the treatment group.
If you can argue the variance would be equal for the two groups (which you would typically assume for a two sample t-test anyway), you can do a test. You just can't check that assumption, no matter how badly violated it might be.
The concerns expressed in this answer to a related question are even more relevant to your situation, but there's less you can do about it.
[You ask about it being reasonable to assume the variances are equal. We can't answer that for you, that's something you'd have to convince subject matter experts (i.e. ecologists) was a reasonable assumption. Are there other studies where such levels have been measured under both treatment and control? Others where similar tests (t-tests or anova especially - I bet you can find a better precedent) have been done or similar assumptions made? Some form of general reasoning you can see to apply?]
If $\bar{x}$ is the sample mean of the treatment and $\bar{y}$ is the mean of the control, and both are from normal distributions with variance $\sigma^2$, then $\bar{x}-\bar{y}$ will have mean $\mu_x - \mu_y$ and variance $\sigma^2 (1/n_x + 1/n_y)$ irrespective of whether one of the $n$'s is 1.
So when $n_y$ is 1,
$$ \frac{(\bar{x}-\bar{y})}{s_x\sqrt{1/n_x+1}} $$
(where $s_x$ is the standard deviation computed from the treatments) will be $t$-distributed (with $n_x - 1$ degrees of freedom) under the null.
You may notice that, with $s_x$ used for $s_p$ as the best available estimate of $\sigma$, this is exactly the ordinary two-sample t-test formula with $n_y$ set to 1.
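The statistic above is easy to compute directly. This sketch wraps it in a small function (the function name and data are illustrative):

```r
# Two-sided t test of a treatment sample against a single control
# observation, using the statistic (xbar - y1) / (s_x * sqrt(1/n_x + 1))
# with n_x - 1 degrees of freedom, as derived above.
t1_test <- function(x, y1) {
  nx    <- length(x)
  tstat <- (mean(x) - y1) / (sd(x) * sqrt(1 / nx + 1))
  df    <- nx - 1
  p     <- 2 * pt(-abs(tstat), df)  # two-sided p-value under the null
  list(statistic = tstat, df = df, p.value = p)
}

# Hypothetical data: five treatment tanks and one control tank.
t1_test(c(5.2, 4.8, 5.5, 5.0, 5.1), 6.3)
```

For a one-sided alternative, replace the p-value line with pt(tstat, df) or 1 - pt(tstat, df) as appropriate.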
Edit:
Here's a simulated power curve for this test (figure omitted). The sample size at the null was 10000; at the other points it was 1000. The rejection rate at the null was 0.05, and the power curve, while requiring a large difference in population means to achieve decent power, had the right shape. That is, this test does what it is supposed to do.
(End edit)
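A simulation along those lines is easy to reproduce. This sketch checks only the test's size (the rejection rate under the null); the settings are illustrative:

```r
# Monte Carlo check: under the null (treatment and control drawn from the
# same normal population), the two-sided test at alpha = 0.05 should
# reject about 5% of the time.
set.seed(1)
nx   <- 10    # treatment sample size (illustrative)
nsim <- 5000  # number of simulated experiments

rej <- replicate(nsim, {
  x  <- rnorm(nx)  # treatment observations
  y1 <- rnorm(1)   # the single control observation
  tstat <- (mean(x) - y1) / (sd(x) * sqrt(1 / nx + 1))
  abs(tstat) > qt(0.975, nx - 1)  # reject at the 5% level?
})
mean(rej)  # should be near 0.05
```

Sweeping a mean shift into y1 (e.g. y1 <- rnorm(1, mean = delta) over a grid of delta values) reproduces the full power curve.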
With sample sizes so small, this will be somewhat sensitive to distributional assumptions, however.
If you're prepared to make different assumptions, or want to test equality of some other population quantity, some test may still be possible.
So all is not lost... but where possible, it's generally better to have at least some replication in both groups.