ANOVA vs ANCOVA – How to Choose Between ANOVA and ANCOVA in a Designed Experiment

ancova, anova, continuous data, multiple regression

I am conducting an experiment which has the following:

  • DV: Slice consumption (continuous or could be categorical)

  • IV: Healthy message, unhealthy message, no message (control). These are 3 groups to which people are randomly assigned (categorical). This is a manipulated message about the healthiness of the slice.

The following IVs could be considered as individual difference variables:

  • Impulsivity (this could be categorical, i.e., high versus low, or continuous, and is measured by a scale)

  • Sweet taste preference (this is also measured by a questionnaire, which has 3 options to choose from for each question)

  • BMI – participants will be weighed and measured accordingly (this could also be considered either categorical or continuous).

As participants will be randomly assigned to one of 3 groups, I assume I am doing an ANOVA of some sort and would possibly use factorial ANOVA, as I am interested in which IV affects the DV the most but also in the interactions between the IVs, as research indicates that there are relationships between some combinations.

But I am not completely sure of this, as I need to know whether it's best to have the IVs all categorical, all continuous, or mixed.

Or is ANCOVA a possibility, or even regression? I am not sure about that, given that participants are assigned to groups and then categorised based on their answers to surveys.

I hope this makes sense and look forward to hearing from someone about my query.

Best Answer

As a matter of history, regression and ANOVA developed separately and, due in part to tradition, are still often taught separately. In addition, people often think of ANOVA as appropriate for designed experiments (i.e., the manipulation of a variable / random assignment) and regression as appropriate for observational research (e.g., downloading data from a government website and looking for relationships). However, all of this is a little misleading. An ANOVA is a regression, just one where all of the covariates are categorical. An ANCOVA is a regression with qualitative and continuous covariates, but without interaction terms between the factors and the continuous explanatory variables (i.e., the so-called 'parallel slopes assumption'). As for whether a study is experimental or observational, this is unrelated to the analysis itself.
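To make the "ANOVA is a regression" point concrete, here is a minimal sketch in Python with NumPy. The data are made-up numbers for the three message groups, and the function names are only illustrative. It computes the one-way ANOVA F statistic from sums of squares, then the overall F statistic of a least-squares regression on dummy-coded group membership, so the two can be compared.

```python
import numpy as np

# Hypothetical slice-consumption scores for the three message groups
# (invented numbers, purely for illustration).
groups = {
    "control": np.array([5.0, 6.0, 7.0, 6.0]),
    "healthy": np.array([7.0, 8.0, 9.0, 8.0]),
    "unhealthy": np.array([4.0, 5.0, 4.0, 3.0]),
}


def anova_f(groups):
    """One-way ANOVA F statistic from between/within sums of squares."""
    y = np.concatenate(list(groups.values()))
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def regression_f(groups):
    """Overall F statistic of an OLS regression on dummy-coded groups."""
    names = list(groups)
    y = np.concatenate([groups[name] for name in names])
    labels = np.concatenate([[i] * len(groups[name]) for i, name in enumerate(names)])
    # Intercept plus one 0/1 dummy for each non-reference group.
    dummies = [(labels == i).astype(float) for i in range(1, len(names))]
    X = np.column_stack([np.ones(len(y))] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_resid = ((y - X @ beta) ** 2).sum()
    ss_total = ((y - y.mean()) ** 2).sum()
    df_model = X.shape[1] - 1
    df_resid = len(y) - X.shape[1]
    return ((ss_total - ss_resid) / df_model) / (ss_resid / df_resid)


f_anova = anova_f(groups)
f_reg = regression_f(groups)
print(f_anova, f_reg)
```

The two statistics agree up to floating-point error, because the fitted values of the dummy-coded regression are simply the group means.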

Your experiment sounds good. I would analyze this as a regression (in my mind, I tend to call everything regression). I would include all the covariates if you are interested in them, and/or if the theories you are working with suggest they may be important. If you think the effect of some of the variables may depend on other variables, be sure to add in all of the requisite interaction terms. One thing to bear in mind is that each estimated coefficient (including those for interaction terms!) will consume a degree of freedom, and a factor with $k$ levels needs $k-1$ of them, so make sure your sample size is adequate. I would not dichotomize, or otherwise make categorical, any of your continuous variables (it is unfortunate that this practice is widespread; it's really a bad thing to do). Otherwise, it sounds like you're on your way.
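As a rough illustration of how quickly interaction terms use up degrees of freedom, the sketch below builds two design matrices with NumPy: one with main effects only (the ANCOVA-style, parallel-slopes model) and one that adds group-by-covariate interactions. The sample size, the simulated covariate values, and the helper name `design_matrix` are all assumptions made for the example, not part of the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # assumed sample size: 10 participants per message group

group = np.repeat([0, 1, 2], n // 3)           # 0 = control, 1 = healthy, 2 = unhealthy
impulsivity = rng.normal(size=n)               # simulated continuous scale score
bmi = rng.normal(loc=25.0, scale=4.0, size=n)  # simulated continuous BMI


def design_matrix(group, covariates, interactions=False):
    """Intercept, group dummies, covariates, and optionally dummy x covariate terms."""
    dummies = [(group == g).astype(float) for g in (1, 2)]
    columns = [np.ones(len(group))] + dummies + list(covariates)
    if interactions:
        # One extra column (and coefficient) per dummy/covariate pair.
        columns += [d * c for d in dummies for c in covariates]
    return np.column_stack(columns)


X_main = design_matrix(group, [impulsivity, bmi])
X_inter = design_matrix(group, [impulsivity, bmi], interactions=True)

# Residual degrees of freedom = observations minus estimated coefficients.
df_main = n - np.linalg.matrix_rank(X_main)
df_inter = n - np.linalg.matrix_rank(X_inter)
print(X_main.shape[1], df_main)
print(X_inter.shape[1], df_inter)
```

With two dummies and two continuous covariates, the interactions add four coefficients, so the residual degrees of freedom drop by four. Adding sweet taste preference and its interactions would cost more again.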

Update: There seems to be some concern here about whether or not to convert continuous variables into variables with just two (or more) categories. Let me address that here, rather than in a comment. I would keep all of your variables as continuous. There are several reasons to avoid categorizing continuous variables:

  1. By categorizing you would be throwing information away: some observations are further from the dividing line and others are closer to it, but they're treated as though they were the same. In science, our goal is to gather more and better information and to better organize and integrate that information. Throwing information away is simply antithetical to good science, in my opinion;
  2. You tend to lose statistical power as @Florian points out (thanks for the link!);
  3. You lose the ability to detect non-linear relationships as @rolando2 points out;
  4. What if someone reads your work and wonders what would happen if we drew the line between categories in a different place? (For example, consider your BMI example: what if someone else, 10 years from now, based on what's happening in the literature at that time, also wants to know about people who are underweight and those who are morbidly obese?) They would simply be out of luck, but if you keep everything in its original form, each reader can assess their own preferred categorization scheme;
  5. There are rarely 'bright lines' in nature, and so by categorizing you fail to reflect the situation under study as it really is. If you are concerned that there may be an actual bright line at some point for a priori theoretical reasons, you could fit a spline to assess this. Imagine a variable, $X$, that runs from 0 to 1, and you think the relationship between this variable and a response variable suddenly and fundamentally changes at .7. You then create a new variable (called a spline) like this: $$ \begin{aligned} X_{spline} &= 0 &\text{if } X\le{.7} \\ X_{spline} &= X-.7 &\text{if } X>.7 \end{aligned} $$ and add this new $X_{spline}$ variable to your model in addition to your original $X$ variable. The fitted line is then allowed to change slope sharply at .7, the coefficient on $X_{spline}$ estimates that change, and you can assess whether this enhances our understanding of the data.

Reasons 1 and 5 are the most important, in my opinion.
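The spline idea in reason 5 can be sketched in a few lines of Python with NumPy. Everything here is simulated under stated assumptions: the true intercept, the slopes, the noise level, and the knot at .7 are invented so that the fit has something to recover.

```python
import numpy as np

rng = np.random.default_rng(1)
knot = 0.7  # hypothesized 'bright line', chosen a priori

# Simulated data (assumption): slope 2 below the knot, slope 2 + 3 = 5 above it.
x = np.linspace(0.0, 1.0, 201)
x_spline = np.where(x > knot, x - knot, 0.0)  # 0 up to the knot, X - .7 beyond it
y = 1.0 + 2.0 * x + 3.0 * x_spline + rng.normal(scale=0.05, size=x.size)

# Regress y on the original X and the added spline variable.
X = np.column_stack([np.ones_like(x), x, x_spline])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope_below, slope_change = beta
slope_above = slope_below + slope_change

print(slope_below, slope_change, slope_above)
```

The coefficient on `x_spline` is the estimated change in slope at the knot. If it is close to zero, the data give little support for a break at that point, and the plain linear term in $X$ is enough.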
