In my opinion your design more strongly resembles a nested design: individuals are nested within the sites. Hence, I would advise either treating site simply as a fixed-effect covariate (due to the small number of levels) or using some sort of hierarchical modeling (which could be difficult, again due to the low number of site levels).
The question of whether or not you have a split-plot design depends on the smallest experimental unit. If this were indeed the site, you would have a split-plot (or repeated-measures) design, as your experimental units would undergo different treatments. However, from your description it sounds like you sample experimental units within each site, making those the smallest experimental units. As long as experimental units are not assigned to more than one condition, it is not a split-plot design. Rather, you seem to have a two-level multilevel model (e.g., sampling students within classes).
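To make those two modeling options concrete, here is a minimal sketch, assuming a continuous response; the data frame `d` and the column names `y`, `treatment`, and `site` are hypothetical stand-ins, not names from the question:

```r
library(lme4)  # for mixed (hierarchical) models

# Option 1: site as a fixed-effect covariate (sensible with few sites)
fit_fixed <- lm(y ~ treatment + factor(site), data = d)

# Option 2: site as a random intercept (a two-level multilevel model;
# variance estimates may be unstable with very few site levels)
fit_mixed <- lmer(y ~ treatment + (1 | site), data = d)

summary(fit_fixed)
summary(fit_mixed)
```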
If you are interested in the possibility of interaction between treatments A and B, you should be examining the interaction first, and what you call "main effects" later. If there is a significant interaction then there is no single effect of either A or B: there is an effect of each absent the other, and an effect of their combination. If there is no significant interaction then you might be justified in removing the interaction term from the analysis.
Logistic regression, including an interaction term, would be an obvious way to model this. (Fisher's exact test or the chi-square test for contingency tables are more like tests for interactions.) Instead of re-inventing the wheel with your own code, take advantage of standard, vetted software packages to do the calculations. Such software will provide z-tests of the hypotheses that each of the regression coefficients (including for the interaction term) equals zero.
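For concreteness, a minimal sketch of such a model in R; the data frame `d` and the column names `response`, `A`, and `B` are hypothetical stand-ins, not names from the question:

```r
# Logistic regression with both main effects and their interaction;
# `response` is binary, `A` and `B` are 0/1 treatment indicators.
# The formula A * B expands to A + B + A:B.
fit <- glm(response ~ A * B, data = d, family = binomial)

# Wald z-tests that each coefficient (including A:B) equals zero
summary(fit)

# Coefficients on the odds-ratio scale, often easier to read
exp(coef(fit))
```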
It's not clear that your study is sufficiently powered to answer your question cleanly, however. For logistic regression one usually wants to restrict analysis to 1 predictor (including interactions) per 15 or so cases in the smaller outcome category; it looks like you have only about 30 responses to treatment in total (many of which seem to be in the no-treatment group) for 3 predictors (A, B, and the A+B combination), whereas by that rule of thumb about 30 cases would support only around 2 predictors.
Added after seeing logistic regression results
As noted in a comment on your question there is still an issue about whether the outcomes have been coded correctly. It certainly seems surprising that the combination of treatments A and B would lead to a lower probability of "yes" outcomes than would the absence of treatment. That doesn't affect what follows, however, except for a reversal of signs of the coefficients.
First, if you had reason to suspect that the combination of A+B would affect the log-odds of outcome differently than the sum of their individual effects, then you were correct to include the interaction term in your model. In your model including the interaction, however, none of the estimated regression coefficient values is significantly different from 0. The magnitude of the interaction term is nevertheless greater than the magnitudes of the individual effects of A or B, consistent with its having been wise to include the interaction. The interaction term is nominally the closest to "significant" of any. In this case some would argue that it is better to stay with the pre-specified model rather than remove the interaction term. Additional thoughts can be found on many other pages on this site via a search for "drop interaction term".
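If you do want to weigh the interaction term formally, one common approach (sketched here with the same hypothetical names as above, not the original analysis) is a likelihood-ratio test between the nested models:

```r
# Pre-specified model with the A:B interaction vs. the additive model
fit_full    <- glm(response ~ A * B, data = d, family = binomial)
fit_reduced <- glm(response ~ A + B, data = d, family = binomial)

# Likelihood-ratio test of the interaction term
anova(fit_reduced, fit_full, test = "Chisq")
```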
Second, even if one could argue for proceeding without the interaction term, in logistic regression it is dangerous to omit predictors that might be related to outcome, due to inherent omitted-variable bias. So the best analysis in that case would be a model combining treatments A and B but without the interaction term.* In that case the coefficient for B does come out nominally significant (p = 0.039), while the coefficient value for A is 0 (p = 1). The Wald test provided by the rms package in R (via anova(lrm())) for overall model significance is not, however, significant (p = 0.12). As these nominal p-values are based on the assumption that you did not look at the data to design the tests, you should be very cautious in attributing significance to the results. One benefit of statistical analysis is that it minimizes the chance that you will fool yourself into believing a false-positive finding, and in that way can prevent you from wasting further resources.
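For reference, a sketch of that rms-based check, again with the hypothetical data frame `d` and columns from above:

```r
library(rms)  # for lrm() and its anova() method

# datadist bookkeeping needed by some rms functions (e.g., summary)
dd <- datadist(d); options(datadist = "dd")

# Additive model, without the interaction term
fit_add <- lrm(response ~ A + B, data = d)

# Wald chi-square tests for each predictor and for the model as a whole
anova(fit_add)
```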
In terms of how to proceed, make sure that you have the outcomes coded correctly. You certainly don't want to be wasting time on treatment B if by itself it is worse than no treatment at all and if it's even less beneficial when combined with A. Yet that is what your initial table of probabilities and your frequency table suggest.
If your outcomes are coded/analyzed incorrectly (as I suspect), then your results at least aren't inconsistent with some effect of treatment B, an effect that might be enhanced by co-treatment with A. But you don't have enough data to make that claim reliably. If you need to write up your results to date, it might be best to show results both with and without the interaction term, while including some discussion of the pros and cons. For designing future work, you might want to explore in more detail both treatment B and its potential interaction with treatment A, informed both by these preliminary results and your knowledge of the subject matter.
*In your particular case the coefficients for the treatments and their individual p-values come out the same whether you do this as a combined 2-predictor analysis or as 2 separate single-predictor analyses. I think that's because the coefficient for A comes out to be 0 and you have a balanced design. But in more general cases that won't hold, hence the recommendation of a combined analysis as the default.
Best Answer
What's interesting or not is something that should be decided based on what you know about the topic. It's perfectly fine to run a 2x2 experiment, use only a single manipulation, or even study the distribution of a single variable (with no experimental manipulation or bivariate statistics at all) if that variable presents some theoretical or practical interest.
One issue here is that you seem to be overly concerned about significance testing. In fact, it's often not reasonable to expect any effect to be exactly zero (this is obviously true and has been stressed many times in relation to observational research – modeling in economics, sociology, political science, etc. – but it is also true for psychology experiments, although perhaps not always in chemistry or physics).
It's also incorrect to consider that failing to reject the null hypothesis is strong evidence for the absence of any effect as the power to detect a given effect depends on the sample size and the actual magnitude of this effect. So if you actually expect some variable not to have an effect and you are specifically interested in that, just running an ANOVA in the hope that the p-values are above the usual threshold is not the right strategy, no matter how many factors you have in your design.
For all these reasons, your adviser has a point in the sense that blindly running a 2x2 experiment and finding that no effect is significant might very well leave you with uninterpretable and unpublishable data. Adding some manipulation that appears to “cancel” an effect (as shown by a simple effect and a significant interaction) could be a way to “sneak in” the story about the absence of the effect under some condition, but the fact remains that a non-significant effect alone is difficult to interpret and next to impossible to publish in some journals/disciplines.
However, I think the solution here is not necessarily to add a few other factors but to think hard about effect size and formulate more sophisticated hypotheses than “something or other has an effect”. You can then use power analysis, equivalence testing, or precision in parameter estimates to ensure that your experiment provides valuable information on these hypotheses in any case.
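As one small illustration of planning around effect size rather than bare significance, base R's power.t.test() can translate a hypothesized effect into a required sample size; the delta, sd, and n values below are placeholders, not numbers from the question:

```r
# Sample size per group needed to detect a standardized mean difference
# of 0.5 with 80% power at alpha = 0.05 (all values hypothetical)
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# Conversely, with n fixed at 30 per group, the power actually achieved
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)
```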
Also, in some fields like psychology, researchers work very hard to bring everything to a 2x2x… ANOVA design even when some variables naturally have a quantitative or continuous interpretation. Another way to make an experiment more informative is to use several levels for such independent variables or try to directly model the relationship between these quantitative variables and your response.
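For instance, a hedged sketch in R (dose and score are hypothetical variable names) contrasting the forced-categorical ANOVA with a direct model of the quantitative relationship:

```r
# Treating a manipulated quantity as a factor: classic k-level ANOVA
fit_anova <- lm(score ~ factor(dose), data = d)

# Modeling the dose-response relationship directly as continuous,
# which uses the ordering and spacing of the levels
fit_trend <- lm(score ~ dose, data = d)

summary(fit_trend)
```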
Finally, one consideration would be the cost of additional conditions. In survey research, common wisdom is that people don't mind long questionnaires, the difficulty is getting them to participate at all. In that case, researchers often add variables/questions “just in case”, to address ancillary hypotheses or get more out of the effort. At the other extreme, if the study is necessarily between-subject and adding participants is very costly (say brain imaging with hard-to-recruit patients), you would need to think very carefully before adding any factor or condition.