In an ANCOVA, you typically model

$$E(Y|T,X)=\gamma T+X \beta$$

where $Y$ is your outcome variable, $T$ is your treatment indicator ($T=0$ to indicate control, and $T=1$ to indicate treatment), and $X$ is a covariate (or a vector of covariates). Then $\gamma$ is the average treatment effect (ATE) conditional on $X$.
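A minimal simulation sketch of this model, assuming a single covariate, an explicit intercept (absorbed into $X\beta$ in the formula above) and made-up true values $\gamma=2$ and $\beta=1.5$. Ordinary least squares via `numpy.linalg.lstsq` should approximately recover both:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
gamma_true, beta_true = 2.0, 1.5      # hypothetical values for the simulation

x = rng.normal(size=n)                # single covariate X
t = rng.integers(0, 2, size=n)        # treatment indicator T: 0 = control, 1 = treatment
y = 1.0 + gamma_true * t + beta_true * x + rng.normal(size=n)

# ANCOVA design matrix: intercept, T, X
design = np.column_stack([np.ones(n), t, x])
intercept_hat, gamma_hat, beta_hat = np.linalg.lstsq(design, y, rcond=None)[0]
print(f"gamma_hat = {gamma_hat:.3f}, beta_hat = {beta_hat:.3f}")
```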

Now let $Y=TY^T+(1-T)Y^C$, where $Y^T$ is the (potential) outcome under treatment and $Y^C$ is the (potential) outcome under control. The primary assumption exploited by ANCOVA is that the outcome variables $Y^T$ and $Y^C$ are independent of $T$ conditional on $X$. This is also called 'unconfoundedness', written as:

$$P(T|Y^T,Y^C,X)=P(T|X)$$

Otherwise, outcome variables and treatment assignment are confounded, and (conditional) mean differences between $Y^T$ and $Y^C$ may be caused by factors other than the manipulation (i.e., *even given* $X$). If $T$ is unconfounded with $Y^C$ and $Y^T$ conditional on $X$, the ATE estimate $\gamma$ from ANCOVA will be unbiased, provided all other model assumptions are also met.
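A small simulation sketch of what unconfoundedness buys you, assuming a binary covariate so that conditioning on $X$ is simply stratifying. Both potential outcomes are generated for every unit (possible only in a simulation), assignment depends on $X$ alone, and the observed $Y$ is built as $TY^T+(1-T)Y^C$. The raw group difference is confounded by $X$, while averaging the within-stratum differences over $P(X)$ should recover the true effect of 1 (an arbitrary value chosen here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

x = rng.integers(0, 2, size=n)                    # binary covariate X
y_c = 1.0 + 2.0 * x + rng.normal(size=n)          # potential outcome under control, Y^C
y_t = y_c + 1.0                                   # potential outcome under treatment, Y^T (effect = 1)
t = rng.binomial(1, np.where(x == 1, 0.8, 0.2))   # P(T=1) depends on X only: unconfounded given X
y = t * y_t + (1 - t) * y_c                       # observed outcome Y = T*Y^T + (1-T)*Y^C

true_ate = np.mean(y_t - y_c)                     # computable only because both outcomes are simulated
naive = y[t == 1].mean() - y[t == 0].mean()       # ignores X, so it is confounded

# Condition on X: compare treated vs controls within each stratum, weight by P(X = k)
stratified = 0.0
for k in (0, 1):
    in_k = x == k
    diff_k = y[in_k & (t == 1)].mean() - y[in_k & (t == 0)].mean()
    stratified += in_k.mean() * diff_k

print(f"true ATE: {true_ate:.2f}  naive: {naive:.2f}  stratified: {stratified:.2f}")
```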

You may ask how to tell whether unconfoundedness holds: this can never be assessed with absolute certainty, and it is the key weakness of bias adjustment in observational studies. It is recommended (see ref. below) that you include all covariates that are even weakly (p < .10) associated (correlated) with $T$, $Y^C$, or $Y^T$. This implies that a correlation between $X$ and $T$ is not a problem when using ANCOVA but rather desirable (**your first question**).
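A minimal sketch of that screening rule, assuming Pearson correlations as the measure of association and a large-sample normal approximation for the p-values (both are assumptions of this sketch, not necessarily the reference's procedure). Because $Y^C$ and $Y^T$ are only observed in their own groups, the outcome correlations are computed within the control and treatment groups. The covariate names and data are hypothetical:

```python
import math
import numpy as np

def approx_corr_pvalue(a, b):
    """Two-sided p-value for Pearson's r, using a large-sample normal
    approximation to the t statistic (an assumption; prefer an exact
    t test in small samples)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    r = float(np.corrcoef(a, b)[0, 1])
    t_stat = r * math.sqrt((n - 2) / (1 - r**2))
    return math.erfc(abs(t_stat) / math.sqrt(2))

def screen_covariates(covariates, t, y, alpha=0.10):
    """Keep covariates associated (p < alpha) with T, with Y among controls
    (standing in for Y^C), or with Y among the treated (standing in for Y^T)."""
    kept = []
    for name, x in covariates.items():
        p_values = (
            approx_corr_pvalue(x, t),
            approx_corr_pvalue(x[t == 0], y[t == 0]),
            approx_corr_pvalue(x[t == 1], y[t == 1]),
        )
        if min(p_values) < alpha:
            kept.append(name)
    return kept

rng = np.random.default_rng(3)
n = 2_000
covs = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}  # hypothetical covariates
t = rng.binomial(1, 1 / (1 + np.exp(-covs["x1"])))              # x1 drives selection
y = 2.0 * t + 1.0 * covs["x2"] + rng.normal(size=n)               # x2 drives the outcome
print(screen_covariates(covs, t, y))
```

The pure-noise covariate `x3` can still slip through by chance, which is the price of a deliberately liberal threshold.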

In fact, a correlation of the covariate(s) with the dependent variable 'within the groups' (i.e., of $X$ with $Y^C$ or $Y^T$) indicates that the unconfoundedness assumption is more plausible once $X$ is included (**your second question**). A correlation of $X$ with $T$ indicates this likewise; an 'ideal' covariate $X$ is associated with both the treatment indicator and the outcome variables. Since ANOVA does not include $X$ (**your third question**), it assumes unconfoundedness unconditionally on $X$, i.e.,

$$P(T|Y^T,Y^C)=P(T)$$

which is a very strong assumption, and a dependence between $X$ and $T$ points to its potential violation. ANOVA is therefore not recommended in your hypothetical situation and should be reserved for fully randomized experiments, in which $T$ is, by design, independent of any $X$ and of the potential outcomes $Y^T$ and $Y^C$.
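A simulation sketch of this contrast, assuming treatment assignment that depends on a continuous $X$ through a logistic curve (a made-up selection mechanism) and a true effect of 2. The ANOVA-style estimate (the raw difference in group means) absorbs the $X$ imbalance between groups, whereas the ANCOVA coefficient on $T$ adjusts for it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
gamma_true = 2.0                                  # hypothetical true treatment effect

x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))   # selection depends on X, so X and T correlate
y = gamma_true * t + 1.5 * x + rng.normal(size=n)

# ANOVA-style estimate: raw difference in group means, ignoring X
anova_est = y[t == 1].mean() - y[t == 0].mean()

# ANCOVA estimate: coefficient on T after adjusting for X
design = np.column_stack([np.ones(n), t, x])
ancova_est = np.linalg.lstsq(design, y, rcond=None)[0][1]
print(f"ANOVA: {anova_est:.2f}  ANCOVA: {ancova_est:.2f}  (true: {gamma_true})")
```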

It is important to note that all of the **other model assumptions** of ANCOVA must also be met to obtain unbiased ATE estimates (e.g., with least squares estimators). Chiefly, the standard model assumes that there is **no interaction** between $T$ and $X$. This is sometimes referred to as effect homogeneity (as opposed to heterogeneous effects, if there is an interaction). Therefore, the model should at least include the $T \times X$ interactions as well, which is not standard in ANCOVA models. Furthermore, you assume linearity (inspect the residuals to check this assumption), and you also assume that the Y-model is correct (i.e., that you included all relevant $X$ to model $Y$).
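A sketch of adding the interaction, assuming a hypothetical heterogeneous effect $2+X$ with $E(X)=1$, so the ATE is about 3. Centering $X$ at its sample mean before forming $T \times X$ makes the coefficient on $T$ the average effect over the sample's $X$ distribution; without the interaction, and with selection on $X$, the $T$ coefficient is generally a differently weighted average of the effects and need not match the ATE:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(loc=1.0, size=n)                   # covariate with nonzero mean
t = rng.binomial(1, 1 / (1 + np.exp(-x)))         # selection on X
# Hypothetical heterogeneous effect: the effect of T is 2 + X
y = 1.5 * x + (2.0 + x) * t + rng.normal(size=n)

xc = x - x.mean()                                 # center X so the T coefficient is the ATE
ones = np.ones(n)

# Standard ANCOVA, no T-by-X interaction
no_int = np.linalg.lstsq(np.column_stack([ones, t, xc]), y, rcond=None)[0][1]
# ANCOVA with the T-by-X interaction
with_int = np.linalg.lstsq(np.column_stack([ones, t, xc, t * xc]), y, rcond=None)[0][1]
print(f"no interaction: {no_int:.2f}  with interaction: {with_int:.2f}  (target: {2.0 + x.mean():.2f})")
```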

Propensity score methods and nonparametric matching methods are sometimes superior to ANCOVA because they do not rely on the linearity assumption and can accommodate interactions 'on the go'. Moreover, so-called doubly robust methods combine Y-modeling with propensity score methods. They yield consistent effect estimates even if the model for $Y$ is incorrect, as long as the propensity score model is correct (and vice versa). Still, all of these methods make the unconfoundedness assumption.
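A minimal sketch of one common doubly robust estimator, augmented inverse probability weighting (AIPW), written from scratch in numpy rather than taken from any package. It assumes a logistic propensity model fitted by Newton-Raphson and separate linear outcome models per group. In the simulated data the outcome is quadratic in $X$, so the linear outcome models are deliberately wrong, while the propensity model is correct; the estimate should still land near the true effect of 2:

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ w))
        grad = X.T @ (t - p)                         # score of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])    # Fisher information
        w = w + np.linalg.solve(info, grad)
    return w

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aipw_ate(X, t, y):
    """Augmented IPW estimate of the ATE."""
    e = 1 / (1 + np.exp(-X @ fit_logistic(X, t)))    # propensity scores P(T=1|X)
    mu1 = X @ fit_ols(X[t == 1], y[t == 1])          # outcome model fitted on the treated
    mu0 = X @ fit_ols(X[t == 0], y[t == 0])          # outcome model fitted on the controls
    return np.mean(mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e))

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))            # true propensity: logistic in X
y = 2.0 * t + x + 0.5 * x**2 + rng.normal(size=n)    # true outcome: quadratic in X
X = np.column_stack([np.ones(n), x])                 # both working models are linear in X

print(f"AIPW estimate: {aipw_ate(X, t, y):.2f}  (true ATE: 2.0)")
```

Swapping the roles (correct outcome model, wrong propensity model) is the other half of the double robustness property.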

For an excellent treatment of ANCOVA adjustment for selection bias (and of other methods), see:

Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. *Psychological Methods, 13*(4), 279–313. doi:10.1037/a0014268

## Best Answer

This is a frustrating inconsistency in terminology that has confused a lot of people. My understanding is this:

Both factors and covariates predict the dependent variable, and both have a similar relationship to it. The variance due to both types of variables is accounted for in a linear model (e.g., regression, ANCOVA). So a covariate is not just a third variable that is not directly related to the dependent variable; it is simply a dimensional (continuous) variable.

Statistical packages offer options for both because they treat them differently. For example, a factor is treated as a set of groups and may allow contrasts between them, while a covariate is treated as a single continuous predictor and would not.
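To make the difference concrete, a minimal sketch of how a design matrix often encodes each kind of variable, assuming treatment (reference-level) dummy coding, a common default that not every package uses. A factor with $k$ levels becomes $k-1$ indicator columns whose coefficients are contrasts against the reference group, while a covariate enters as one numeric column with a single slope. The variables are hypothetical:

```python
import numpy as np

group = np.array(["a", "b", "c", "a", "c"])      # hypothetical 3-level factor
age = np.array([23.0, 35.0, 41.0, 29.0, 52.0])   # hypothetical continuous covariate

levels = np.unique(group)                         # sorted levels; the first ("a") is the reference
dummies = np.column_stack([(group == level).astype(float) for level in levels[1:]])

# Columns: intercept | b vs a | c vs a | age
design = np.column_stack([np.ones(len(group)), dummies, age])
print(design)
```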

When someone asks you to use something as a covariate, make sure you know what they mean. Asking is the only way to be sure, since this misunderstanding is rampant.